Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
Oh, so it's not working as intended even though the ssd-primary rule is
officially listed in the Ceph documentation. Should I file a feature request
or a bug ticket for it?

Regards, 
Horace Ng 


From: "Paul Emmerich"  
To: "horace"  
Cc: "ceph-users"  
Sent: Wednesday, May 23, 2018 8:37:07 PM 
Subject: Re: [ceph-users] SSD-primary crush rule doesn't work as intended 

You can't mix HDDs and SSDs in a server if you want to use such a rule. 
The new selection step after "emit" can't know what server was selected 
previously. 

Paul 

2018-05-23 11:02 GMT+02:00 Horace <hor...@hkisl.net>:


Add to the info, I have a slightly modified rule to take advantage of the new 
storage class. 

rule ssd-hybrid { 
id 2 
type replicated 
min_size 1 
max_size 10 
step take default class ssd 
step chooseleaf firstn 1 type host 
step emit 
step take default class hdd 
step chooseleaf firstn -1 type host 
step emit 
} 

Regards, 
Horace Ng 

- Original Message - 
From: "horace" < [ mailto:hor...@hkisl.net | hor...@hkisl.net ] > 
To: "ceph-users" < [ mailto:ceph-users@lists.ceph.com | 
ceph-users@lists.ceph.com ] > 
Sent: Wednesday, May 23, 2018 3:56:20 PM 
Subject: [ceph-users] SSD-primary crush rule doesn't work as intended 

I've set up the rule according to the doc, but some of the PGs are still being 
assigned to the same host. 

http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/

rule ssd-primary { 
ruleset 5 
type replicated 
min_size 5 
max_size 10 
step take ssd 
step chooseleaf firstn 1 type host 
step emit 
step take platter 
step chooseleaf firstn -1 type host 
step emit 
} 

Crush tree: 

[root@ceph0 ~]# ceph osd crush tree 
ID CLASS WEIGHT TYPE NAME 
-1 58.63989 root default 
-2 19.55095 host ceph0 
0 hdd 2.73000 osd.0 
1 hdd 2.73000 osd.1 
2 hdd 2.73000 osd.2 
3 hdd 2.73000 osd.3 
12 hdd 4.54999 osd.12 
15 hdd 3.71999 osd.15 
18 ssd 0.2 osd.18 
19 ssd 0.16100 osd.19 
-3 19.55095 host ceph1 
4 hdd 2.73000 osd.4 
5 hdd 2.73000 osd.5 
6 hdd 2.73000 osd.6 
7 hdd 2.73000 osd.7 
13 hdd 4.54999 osd.13 
16 hdd 3.71999 osd.16 
20 ssd 0.16100 osd.20 
21 ssd 0.2 osd.21 
-4 19.53799 host ceph2 
8 hdd 2.73000 osd.8 
9 hdd 2.73000 osd.9 
10 hdd 2.73000 osd.10 
11 hdd 2.73000 osd.11 
14 hdd 3.71999 osd.14 
17 hdd 4.54999 osd.17 
22 ssd 0.18700 osd.22 
23 ssd 0.16100 osd.23 

#ceph pg ls-by-pool ssd-hybrid 

27.8 1051 0 0 0 0 4399733760 1581 1581 active+clean 2018-05-23 06:20:56.088216 
27957'185553 27959:368828 [23,1,11] 23 [23,1,11] 23 27953'182582 2018-05-23 
06:20:56.088172 27843'162478 2018-05-20 18:28:20.118632 

With osd.23 and osd.11 being assigned on the same host. 

Regards, 
Horace Ng 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






-- 
-- 
Paul Emmerich 

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH 
Freseniusstr. 31h 
81247 München 
www.croit.io
Tel: +49 89 1896585 90 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-23 Thread Yan, Zheng
On Thu, May 24, 2018 at 12:00 AM, Sean Sullivan  wrote:
> Thanks Yan! I did this for the bug ticket and missed these replies. I hope I
> did it correctly. Here are the pastes of the dumps:
>
> https://pastebin.com/kw4bZVZT -- primary
> https://pastebin.com/sYZQx0ER -- secondary
>
>
> they are not that long here is the output of one:
>
> Thread 17 "mds_rank_progr" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fe3b100a700 (LWP 120481)]
> 0x5617aacc48c2 in Server::handle_client_getattr
> (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at
> /build/ceph-12.2.5/src/mds/Server.cc:3065
> 3065/build/ceph-12.2.5/src/mds/Server.cc: No such file or directory.
> (gdb) t
> [Current thread is 17 (Thread 0x7fe3b100a700 (LWP 120481))]
> (gdb) bt
> #0  0x5617aacc48c2 in Server::handle_client_getattr
> (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at
> /build/ceph-12.2.5/src/mds/Server.cc:3065
> #1  0x5617aacfc98b in Server::dispatch_client_request
> (this=this@entry=0x5617b5acbcd0, mdr=...) at
> /build/ceph-12.2.5/src/mds/Server.cc:1802
> #2  0x5617aacfce9b in Server::handle_client_request
> (this=this@entry=0x5617b5acbcd0, req=req@entry=0x5617bdfa8700)at
> /build/ceph-12.2.5/src/mds/Server.cc:1716
> #3  0x5617aad017b6 in Server::dispatch (this=0x5617b5acbcd0,
> m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:258
> #4  0x5617aac6afac in MDSRank::handle_deferrable_message
> (this=this@entry=0x5617b5d22000, m=m@entry=0x5617bdfa8700)at
> /build/ceph-12.2.5/src/mds/MDSRank.cc:716
> #5  0x5617aac795cb in MDSRank::_dispatch
> (this=this@entry=0x5617b5d22000, m=0x5617bdfa8700,
> new_msg=new_msg@entry=false) at /build/ceph-12.2.5/src/mds/MDSRank.cc:551
> #6  0x5617aac7a472 in MDSRank::retry_dispatch (this=0x5617b5d22000,
> m=) at /build/ceph-12.2.5/src/mds/MDSRank.cc:998
> #7  0x5617aaf0207b in Context::complete (r=0, this=0x5617bd568080) at
> /build/ceph-12.2.5/src/include/Context.h:70
> #8  MDSInternalContextBase::complete (this=0x5617bd568080, r=0) at
> /build/ceph-12.2.5/src/mds/MDSContext.cc:30
> #9  0x5617aac78bf7 in MDSRank::_advance_queues (this=0x5617b5d22000) at
> /build/ceph-12.2.5/src/mds/MDSRank.cc:776
> #10 0x5617aac7921a in MDSRank::ProgressThread::entry
> (this=0x5617b5d22d40) at /build/ceph-12.2.5/src/mds/MDSRank.cc:502
> #11 0x7fe3bb3066ba in start_thread (arg=0x7fe3b100a700) at
> pthread_create.c:333
> #12 0x7fe3ba37241d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
>
>
> I
> * set the debug level to mds=20 mon=1,
> *  attached gdb prior to trying to mount aufs from a separate client,
> *  typed continue, attempted the mount,
> * then backtraced after it seg faulted.
>
> I hope this is more helpful. Is there something else I should try to get
> more info? I was hoping for something closer to a python trace where it says
> a variable is a different type or a missing delimiter. womp. I am definitely
> out of my depth but now is a great time to learn! Can anyone shed some more
> light as to what may be wrong?
>

I updated https://tracker.ceph.com/issues/23972. It's a kernel bug: the
kernel client sends a malformed request to the MDS.

Regards
Yan, Zheng
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph replication factor of 2

2018-05-23 Thread Jack
Hi,

About BlueStore: sure, there are checksums, but are they fully used?
Rumor has it that on a replicated pool they are not used during recovery.


> My thoughts on the subject are that even though checksums do allow to find 
> which replica is corrupt without having to figure which 2 out of 3 copies are 
> the same, this is not the only reason min_size=2 was required. Even if you 
> are running all SSD which are more reliable than HDD and are keeping the disk 
> size small so you could backfill quickly in case of a single disk failure, 
> you would still occasionally have longer periods of degraded operation. To 
> name a couple - a full node going down; or operator deliberately wiping an 
> OSD to rebuild it. min_size=1 in this case would leave you running with no 
> redundancy at all. DR scenario with pool-to-pool mirroring probably means 
> that you can not just replace the lost or incomplete PGs in your main site 
> from your DR, cause DR is likely to have a different PG layout, so full 
> resync from DR would be required in case of one disk lost during such 
> unprotected times.

I have to say, this is a common yet worthless argument.
If I have 3000 OSDs, using 2 or 3 replicas will not change much: the
probability of losing 2 devices is still "high".

On the other hand, if I have a small cluster of fewer than a hundred OSDs,
that same probability becomes "low".

I do not buy the "if someone is making a maintenance and a device fails"
either : this is a no-limit goal: what is X servers burns at the same
time ? What if an admin make a mistake and drop 5 OSD ? What is some
network tor or routers blow away ?
Should we do one replica par OSD ?


Thus, I would like to distinguish the technical sanity of using 2 replicas
from the organisational sanity of doing so.

Organisational concerns are specific to each operator; the technical side is
shared by all clusters.

I would like people, especially the Ceph devs and other people who know how
it works deeply (who have read the code!), to give us their advice.

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph replication factor of 2

2018-05-23 Thread Anthony Verevkin
This week at the OpenStack Summit Vancouver I heard people entertaining the
idea of running Ceph with a replication factor of 2.

Karl Vietmeier of Intel suggested that we use 2x replication because Bluestore 
comes with checksums.
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21370/supporting-highly-transactional-and-low-latency-workloads-on-ceph

Later, there was a question from the audience during the Ceph DR/mirroring talk 
on whether we could use 2x replication if we also mirror to DR.
https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/20749/how-to-survive-an-openstack-cloud-meltdown-with-ceph

So the interest is definitely there: getting back 1/3 of your disk space and
some performance is promising. But on the other hand it comes with higher risk.

I wonder if we as a community could come to some consensus, now that the
established practice of requiring size=3, min_size=2 is being challenged.
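
For concreteness, the two configurations being compared boil down to something
like this (the pool name is just an example):

  # the established practice
  ceph osd pool set mypool size 3
  ceph osd pool set mypool min_size 2

  # the challenged alternative
  ceph osd pool set mypool size 2
  ceph osd pool set mypool min_size 1

(size=2 with min_size=2 is also possible, but then losing a single OSD blocks
I/O on the affected PGs until recovery completes.)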


My thoughts on the subject are that even though checksums do allow you to find
which replica is corrupt without having to figure out which 2 out of 3 copies
agree, that is not the only reason min_size=2 was required. Even if you are
running all SSDs, which are more reliable than HDDs, and are keeping the disk
size small so you could backfill quickly after a single disk failure, you would
still occasionally have longer periods of degraded operation. To name a couple:
a full node going down, or an operator deliberately wiping an OSD to rebuild it.
min_size=1 in those cases would leave you running with no redundancy at all. A
DR scenario with pool-to-pool mirroring probably means that you cannot just
replace the lost or incomplete PGs in your main site from your DR, because the
DR site is likely to have a different PG layout, so a full resync from DR would
be required if even one disk is lost during such unprotected times.

What are your thoughts? Would you run a 2x replication factor in production,
and in what scenarios?

Regards,
Anthony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Flush very, very slow

2018-05-23 Thread Philip Poten
Hi,

the flush from the overlay cache for my ec-based cephfs is very very slow,
as are all operations on the cephfs. The flush accelerates when the mds is
stopped.

I think this is due to a large number of files that were deleted all at
once, but I'm not sure how to verify that. Are there any counters I can
look up that show things like "pending deletions"? How else can I debug the
problem?
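
For reference, this is the kind of thing I have been poking at so far (a
sketch only; the MDS name is mine and the counter names may differ between
versions):

  # run on the MDS host: stray (deleted but not yet purged) inode counters
  ceph daemon mds.lxt-prod-ceph-mds01 perf dump | grep -i stray

  # run on the MDS host: purge queue progress
  ceph daemon mds.lxt-prod-ceph-mds01 perf dump | grep -A8 purge_queue

  # objects still sitting in the cache pool
  rados -p cephfs-cache ls | wc -l

My assumption is that num_strays and the purge_queue counters are the closest
thing to "pending deletions", but please correct me if that's wrong.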

Any insight is very much appreciated.

Philip

(potentially helpful debug output follows)

status:

root@lxt-prod-ceph-mon02:~# ceph -s
  cluster:
id: 066f558c-6789-4a93-aaf1-5af1ba01a3ad
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
102 slow requests are blocked > 32 sec

  services:
mon: 2 daemons, quorum lxt-prod-ceph-mon01,lxt-prod-ceph-mon02
mgr: lxt-prod-ceph-mon02(active), standbys: lxt-prod-ceph-mon01
mds: plexfs-1/1/1 up  {0=lxt-prod-ceph-mds01=up:active}
osd: 13 osds: 7 up, 7 in
 flags noscrub,nodeep-scrub

  data:
pools:   3 pools, 536 pgs
objects: 5431k objects, 21056 GB
usage:   28442 GB used, 5319 GB / 33761 GB avail
pgs: 536 active+clean

  io:
client:   687 kB/s wr, 0 op/s rd, 9 op/s wr
cache:    345 kB/s flush

(Throughput is currently in the kilobyte/ low megabyte range, but could go
to 100MB/s under healthy conditions)

health:

root@lxt-prod-ceph-mon02:~# ceph health detail
HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 105 slow requests are blocked
> 32 sec
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
REQUEST_SLOW 105 slow requests are blocked > 32 sec
45 ops are blocked > 262.144 sec
29 ops are blocked > 131.072 sec
20 ops are blocked > 65.536 sec
11 ops are blocked > 32.768 sec
osds 1,7 have blocked requests > 262.144 sec

(all osds have a high system load, but not a lot of iowait. cephfs/flushing
usually performs much better with the same conditions)

pool configuration:

root@lxt-prod-ceph-mon02:~# ceph osd pool ls detail
pool 6 'cephfs-metadata' replicated size 1 min_size 1 crush_rule 0
object_hash rjenkins pg_num 16 pgp_num 16 last_change 12515 lfor 0/12412
flags hashpspool stripe_width 0 application cephfs
pool 9 'cephfs-data' erasure size 4 min_size 3 crush_rule 4 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 12482 lfor 12481/12481 flags
hashpspool crash_replay_interval 45 tiers 17 read_tier 17 write_tier 17
stripe_width 4128 application cephfs
pool 17 'cephfs-cache' replicated size 1 min_size 1 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 12553 lfor 12481/12481
flags hashpspool,incomplete_clones,noscrub,nodeep-scrub tier_of 9
cache_mode writeback target_bytes 2000 target_objects 15
hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0}
180s x1 decay_rate 20 search_last_n 1 min_write_recency_for_promote 1
stripe_width 0

metadata and cache are both on the same ssd osd:

root@lxt-prod-ceph-mon02:~# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME
-5    0.24399 root ssd
 7   ssd  0.24399 osd.7
-4    5.74399 host sinnlich
 6   hdd  5.5 osd.6
 7   ssd  0.24399 osd.7
-1   38.40399 root hdd
-2   16.45799 host hn-lxt-ceph01
 1   hdd  5.5 osd.1
 9   hdd  5.5 osd.9
12   hdd  5.5 osd.12
-3   16.44600 host hn-lxt-ceph02
 2   hdd  5.5 osd.2
 3   hdd  5.5 osd.3
 4   hdd  2.72299 osd.4
 5   hdd  2.72299 osd.5
 6   hdd  5.5 osd.6

cache tier settings:

root@lxt-prod-ceph-mon02:~# ceph osd pool get cephfs-cache all
size: 1
min_size: 1
crash_replay_interval: 0
pg_num: 8
pgp_num: 8
crush_rule: replicated_ruleset
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: true
nodeep-scrub: true
hit_set_type: bloom
hit_set_period: 180
hit_set_count: 1
hit_set_fpp: 0.05
use_gmt_hitset: 1
auid: 0
target_max_objects: 15
target_max_bytes: 2000
cache_target_dirty_ratio: 0.01
cache_target_dirty_high_ratio: 0.1
cache_target_full_ratio: 0.8
cache_min_flush_age: 60
cache_min_evict_age: 0
min_read_recency_for_promote: 0
min_write_recency_for_promote: 1
fast_read: 0
hit_set_grade_decay_rate: 20
hit_set_search_last_n: 1

(I'm not sure the values make much sense, I copied them from online
examples and adapted them minimally if at all)
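
(If the problem is simply that the cache flushes too aggressively, I assume
the knobs to experiment with are along these lines; the values here are made
up, not recommendations:)

  ceph osd pool set cephfs-cache cache_target_dirty_ratio 0.4
  ceph osd pool set cephfs-cache cache_target_dirty_high_ratio 0.6
  ceph osd pool set cephfs-cache cache_min_flush_age 600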

the mds shows no ops in flight, but the ssd osd shows a lot of those
operations that seem to be slow (all of them with the same events timeline
stopping at reached_pg):

root@sinnlich:~# ceph daemon  osd.7 dump_ops_in_flight| head -30
{
"ops": [
{
"description": "osd_op(mds.0.3479:170284 17.1
17:98fc84de:::12a830d.:head [delete] snapc 1=[]
ondisk+write+known_if_redirected+full_force e12553)",
"initiated_at": "2018-05-23 21:27:00.140552",
"age": 47.611064,
"duration": 47.611077,
"type_data": {
"flag_point": "reached pg",
   

Re: [ceph-users] Too many objects per pg than average: deadlock situation

2018-05-23 Thread Sage Weil
On Wed, 23 May 2018, Mike A wrote:
> Hello
> 
> > On 21 May 2018, at 2:05, Sage Weil wrote:
> > 
> > On Sun, 20 May 2018, Mike A wrote:
> >> Hello!
> >> 
> >> In our cluster, we see a deadlock situation.
> >> This is a standard cluster for an OpenStack without a RadosGW, we have a 
> >> standard block access pools and one for metrics from a gnocchi.
> >> The amount of data in the gnocchi pool is small, but objects are just a 
> >> lot.
> >> 
> >> When planning a distribution of PG between pools, the PG are distributed 
> >> depending on the estimated data size of each pool. Correspondingly, as 
> >> suggested by pgcalc for the gnocchi pool, it is necessary to allocate a 
> >> little PG quantity.
> >> 
> >> As a result, the cluster is constantly hanging with the error "1 pools 
> >> have many more objects per pg than average" and this is understandable: 
> >> the gnocchi produces a lot of small objects and in comparison with the 
> >> rest of pools it is tens times larger.
> >> 
> >> And here we are at a deadlock:
> >> 1. We can not increase the amount of PG on the gnocchi pool, since it is 
> >> very small in data size
> >> 2. Even if we increase the number of PG - we can cross the recommended 200 
> >> PGs limit for each OSD in cluster
> >> 3. Constantly holding the cluster in the HEALTH_WARN mode is a bad idea
> >> 4. We can set the parameter "mon pg warn max object skew", but we do not 
> >> know how the Ceph will work when there is one pool with a huge object / 
> >> pool ratio
> >> 
> >> There is no obvious solution.
> >> 
> >> How to solve this problem correctly?
> > 
> > As a workaround, I'd just increase the skew option to make the warning go 
> > away.
> > 
> > It seems to me like the underlying problem is that we're looking at object 
> > count vs pg count, but ignoring the object sizes.  Unfortunately it's a 
> > bit awkward to fix because we don't have a way to quantify the size of 
> > omap objects via the stats (currently).  So for now, just adjust the skew 
> > value enough to make the warning go away!
> > 
> > sage
> 
> This situation can somehow negatively affect the work of the cluster?

Eh, you'll end up with a PG count that is possibly suboptimal.  You'd have 
to work pretty hard to notice any difference, though.  I wouldn't worry 
about it.
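
For reference, bumping the threshold looks something like this (pick a value
above your current objects-per-PG skew; depending on the release the option
may need to be set on the mgr instead, and injectargs may not survive a
restart, so keep it in ceph.conf as well):

  ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 50'

  # ceph.conf
  [mon]
  mon pg warn max object skew = 50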

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Too many objects per pg than average: deadlock situation

2018-05-23 Thread Mike A
Hello

> On 21 May 2018, at 2:05, Sage Weil wrote:
> 
> On Sun, 20 May 2018, Mike A wrote:
>> Hello!
>> 
>> In our cluster, we see a deadlock situation.
>> This is a standard cluster for an OpenStack without a RadosGW, we have a 
>> standard block access pools and one for metrics from a gnocchi.
>> The amount of data in the gnocchi pool is small, but objects are just a lot.
>> 
>> When planning a distribution of PG between pools, the PG are distributed 
>> depending on the estimated data size of each pool. Correspondingly, as 
>> suggested by pgcalc for the gnocchi pool, it is necessary to allocate a 
>> little PG quantity.
>> 
>> As a result, the cluster is constantly hanging with the error "1 pools have 
>> many more objects per pg than average" and this is understandable: the 
>> gnocchi produces a lot of small objects and in comparison with the rest of 
>> pools it is tens times larger.
>> 
>> And here we are at a deadlock:
>> 1. We can not increase the amount of PG on the gnocchi pool, since it is 
>> very small in data size
>> 2. Even if we increase the number of PG - we can cross the recommended 200 
>> PGs limit for each OSD in cluster
>> 3. Constantly holding the cluster in the HEALTH_WARN mode is a bad idea
>> 4. We can set the parameter "mon pg warn max object skew", but we do not 
>> know how the Ceph will work when there is one pool with a huge object / pool 
>> ratio
>> 
>> There is no obvious solution.
>> 
>> How to solve this problem correctly?
> 
> As a workaround, I'd just increase the skew option to make the warning go 
> away.
> 
> It seems to me like the underlying problem is that we're looking at object 
> count vs pg count, but ignoring the object sizes.  Unfortunately it's a 
> bit awkward to fix because we don't have a way to quantify the size of 
> omap objects via the stats (currently).  So for now, just adjust the skew 
> value enough to make the warning go away!
> 
> sage

Can this situation somehow negatively affect the operation of the cluster?

— 
Mike, runs!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS_DAMAGE: 1 MDSs report damaged metadata

2018-05-23 Thread Marc-Antoine Desrochers
Dear Ceph Experts,

 

I recently deleted a very big directory on my CephFS, and a few minutes later
my dashboard started yelling:

Overall status: HEALTH_ERR

MDS_DAMAGE: 1 MDSs report damaged metadata

 

So I immediately logged in to my Ceph admin node and ran ceph -s:

cluster:

id: 472dfc88-84dc-4284-a1cf-0810ea45ae19

health: HEALTH_ERR

1 MDSs report damaged metadata

 

  services:

mon: 3 daemons, quorum ceph-n1,ceph-n2,ceph-n3

mgr: ceph-admin(active), standbys: ceph-n1

mds: cephfs-2/2/2 up  {0=ceph-admin=up:active,1=ceph-n1=up:active}, 1
up:standby

osd: 17 osds: 17 up, 17 in

rgw: 1 daemon active

 

  data:

pools:   9 pools, 1584 pgs

objects: 1093 objects, 418 MB

usage:   2765 MB used, 6797 GB / 6799 GB avail

pgs: 1584 active+clean

 

  io:

client:   35757 B/s rd, 0 B/s wr, 34 op/s rd, 23 op/s wr

 

and after some research I tried # ceph tell mds.0 damage ls, which returned:

"damage_type": "backtrace",

"id": 2744661796,

"ino": 1099512314364,

"path": "/M3/sogetel.net/t/te/testmda3/Maildir/dovecot.index.log.2"

 

So I tried to do what is described at
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35682.html

but it did not work, so now I don't know how to fix it.

 

Can you help me?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Vasu Kulkarni
On Wed, May 23, 2018 at 10:03 AM, Alfredo Deza  wrote:
> On Wed, May 23, 2018 at 12:12 PM, Vasu Kulkarni  wrote:
>> Alfredo,
>>
>> Do we have the migration docs link from ceph-disk deployment to
>> ceph-volume? the current docs as i see lacks scenario migration, maybe
>> there is another link ?
>> http://docs.ceph.com/docs/master/ceph-volume/simple/#ceph-volume-simple
>>
>> If it doesn't exist can we document, how a) ceph-disk with filestore
>> (with/without) journal can migrate to ceph-volume  and b)
>> ceph-disk/bluestore with wal/db on same/different partitions.
>
> There is no "scenario" because ceph-volume scans the existing OSD and
> whatever that gives us we work with it:
>
> * filestore with collocated/separate journals
> * bluestore in any kind of deployment (with wal, with db, with db and
> wal, with main only)
>
> multiply that with *both* ceph-disk's way of encrypting.
>
> In short: we support them all. No special command or flag needed.

Cool that sounds great. Thanks

>
>
>>
>> Regards
>> Vasu
>>
>>
>> On Wed, May 23, 2018 at 8:12 AM, Alfredo Deza  wrote:
>>> Now that Mimic is fully branched out from master, ceph-disk is going
>>> to be removed from master so that it is no longer available for the N
>>> release (pull request to follow)
>>>
>>> ceph-disk should be considered as "frozen" and deprecated for Mimic,
>>> in favor of ceph-volume.
>>>
>>> This means that if you are relying on ceph-disk *at all*, you should
>>> plan on migrating to ceph-volume for Mimic, and should expect breakage
>>> if using/testing it in master.
>>>
>>> Please refer to the guide to migrate away from ceph-disk [0]
>>>
>>> Willem, we don't have a way of directly supporting FreeBSD, I've
>>> suggested that a plugin would be a good way to consume ceph-volume
>>> with whatever FreeBSD needs, alternatively forking ceph-disk could be
>>> another option?
>>>
>>>
>>> Thanks
>>>
>>>
>>> [0] http://docs.ceph.com/docs/master/ceph-volume/#migrating
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
On Wed, May 23, 2018 at 11:47 AM, Willem Jan Withagen  wrote:
> On 23-5-2018 17:12, Alfredo Deza wrote:
>> Now that Mimic is fully branched out from master, ceph-disk is going
>> to be removed from master so that it is no longer available for the N
>> release (pull request to follow)
>
>> Willem, we don't have a way of directly supporting FreeBSD, I've
>> suggested that a plugin would be a good way to consume ceph-volume
>> with whatever FreeBSD needs, alternatively forking ceph-disk could be
>> another option?
>
> Yup, I'm aware of my "trouble"/commitment.
>
> Now that you have riped out most/all of the partitioning stuff there
> should not much that one would need to do in ceph-volume other than
> accept the filestore directories to format the MON/OSD stuff in.

I worry about the way we poke at devices for setups (blkid, lsblk,
/proc/mounts, etc...)

The creation of the OSD (aside from devices) is straightforward

>
> IFF I could find the time to dive into ceph-volume. :(
> ATM I'm having a hard time keeping up with the changes as it is.
>
> I'd appreciate if you could delay yanking ceph-disk until we are close
> to the nautilus release. At which point feel free to use the axe.

We can't delay this for ~8 months because it will obfuscate what
breakage we will find on our end by ripping it up (teuthology suites,
etc...)

I've already started working on it, and we should be looking at 2 to 3
weeks from today.

>
> --WjW
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
On Wed, May 23, 2018 at 12:12 PM, Vasu Kulkarni  wrote:
> Alfredo,
>
> Do we have the migration docs link from ceph-disk deployment to
> ceph-volume? the current docs as i see lacks scenario migration, maybe
> there is another link ?
> http://docs.ceph.com/docs/master/ceph-volume/simple/#ceph-volume-simple
>
> If it doesn't exist can we document, how a) ceph-disk with filestore
> (with/without) journal can migrate to ceph-volume  and b)
> ceph-disk/bluestore with wal/db on same/different partitions.

There is no "scenario" because ceph-volume scans the existing OSD and
whatever that gives us we work with it:

* filestore with collocated/separate journals
* bluestore in any kind of deployment (with wal, with db, with db and
wal, with main only)

multiply that with *both* ceph-disk's way of encrypting.

In short: we support them all. No special command or flag needed.
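
For anyone looking for the concrete commands, the flow from the migration
guide linked below is roughly (the OSD path is just an example):

  # capture the existing ceph-disk OSD metadata into /etc/ceph/osd/{id}-{fsid}.json
  ceph-volume simple scan /var/lib/ceph/osd/ceph-0

  # take over startup from ceph-disk/udev for everything that was scanned
  ceph-volume simple activate --all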


>
> Regards
> Vasu
>
>
> On Wed, May 23, 2018 at 8:12 AM, Alfredo Deza  wrote:
>> Now that Mimic is fully branched out from master, ceph-disk is going
>> to be removed from master so that it is no longer available for the N
>> release (pull request to follow)
>>
>> ceph-disk should be considered as "frozen" and deprecated for Mimic,
>> in favor of ceph-volume.
>>
>> This means that if you are relying on ceph-disk *at all*, you should
>> plan on migrating to ceph-volume for Mimic, and should expect breakage
>> if using/testing it in master.
>>
>> Please refer to the guide to migrate away from ceph-disk [0]
>>
>> Willem, we don't have a way of directly supporting FreeBSD, I've
>> suggested that a plugin would be a good way to consume ceph-volume
>> with whatever FreeBSD needs, alternatively forking ceph-disk could be
>> another option?
>>
>>
>> Thanks
>>
>>
>> [0] http://docs.ceph.com/docs/master/ceph-volume/#migrating
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Vasu Kulkarni
Alfredo,

Do we have a link to migration docs for going from ceph-disk deployments to
ceph-volume? The current docs, as far as I can see, lack per-scenario
migration steps; maybe there is another link?
http://docs.ceph.com/docs/master/ceph-volume/simple/#ceph-volume-simple

If it doesn't exist, can we document how a) ceph-disk with filestore
(with/without a separate journal) can migrate to ceph-volume, and b)
ceph-disk/bluestore with wal/db on the same/different partitions?

Regards
Vasu


On Wed, May 23, 2018 at 8:12 AM, Alfredo Deza  wrote:
> Now that Mimic is fully branched out from master, ceph-disk is going
> to be removed from master so that it is no longer available for the N
> release (pull request to follow)
>
> ceph-disk should be considered as "frozen" and deprecated for Mimic,
> in favor of ceph-volume.
>
> This means that if you are relying on ceph-disk *at all*, you should
> plan on migrating to ceph-volume for Mimic, and should expect breakage
> if using/testing it in master.
>
> Please refer to the guide to migrate away from ceph-disk [0]
>
> Willem, we don't have a way of directly supporting FreeBSD, I've
> suggested that a plugin would be a good way to consume ceph-volume
> with whatever FreeBSD needs, alternatively forking ceph-disk could be
> another option?
>
>
> Thanks
>
>
> [0] http://docs.ceph.com/docs/master/ceph-volume/#migrating
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-23 Thread Sean Sullivan
Thanks Yan! I did this for the bug ticket and missed these replies. I hope
I did it correctly. Here are the pastes of the dumps:

https://pastebin.com/kw4bZVZT -- primary
https://pastebin.com/sYZQx0ER -- secondary


They are not that long; here is the output of one:


Thread 17 "mds_rank_progr" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe3b100a700 (LWP 120481)]
0x5617aacc48c2 in Server::handle_client_getattr (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at /build/ceph-12.2.5/src/mds/Server.cc:3065
3065    /build/ceph-12.2.5/src/mds/Server.cc: No such file or directory.
(gdb) t
[Current thread is 17 (Thread 0x7fe3b100a700 (LWP 120481))]
(gdb) bt
#0  0x5617aacc48c2 in Server::handle_client_getattr (this=this@entry=0x5617b5acbcd0, mdr=..., is_lookup=is_lookup@entry=true) at /build/ceph-12.2.5/src/mds/Server.cc:3065
#1  0x5617aacfc98b in Server::dispatch_client_request (this=this@entry=0x5617b5acbcd0, mdr=...) at /build/ceph-12.2.5/src/mds/Server.cc:1802
#2  0x5617aacfce9b in Server::handle_client_request (this=this@entry=0x5617b5acbcd0, req=req@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:1716
#3  0x5617aad017b6 in Server::dispatch (this=0x5617b5acbcd0, m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/Server.cc:258
#4  0x5617aac6afac in MDSRank::handle_deferrable_message (this=this@entry=0x5617b5d22000, m=m@entry=0x5617bdfa8700) at /build/ceph-12.2.5/src/mds/MDSRank.cc:716
#5  0x5617aac795cb in MDSRank::_dispatch (this=this@entry=0x5617b5d22000, m=0x5617bdfa8700, new_msg=new_msg@entry=false) at /build/ceph-12.2.5/src/mds/MDSRank.cc:551
#6  0x5617aac7a472 in MDSRank::retry_dispatch (this=0x5617b5d22000, m=<optimized out>) at /build/ceph-12.2.5/src/mds/MDSRank.cc:998
#7  0x5617aaf0207b in Context::complete (r=0, this=0x5617bd568080) at /build/ceph-12.2.5/src/include/Context.h:70
#8  MDSInternalContextBase::complete (this=0x5617bd568080, r=0) at /build/ceph-12.2.5/src/mds/MDSContext.cc:30
#9  0x5617aac78bf7 in MDSRank::_advance_queues (this=0x5617b5d22000) at /build/ceph-12.2.5/src/mds/MDSRank.cc:776
#10 0x5617aac7921a in MDSRank::ProgressThread::entry (this=0x5617b5d22d40) at /build/ceph-12.2.5/src/mds/MDSRank.cc:502
#11 0x7fe3bb3066ba in start_thread (arg=0x7fe3b100a700) at pthread_create.c:333
#12 0x7fe3ba37241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109



I:
* set the debug level to mds=20 mon=1,
* attached gdb prior to trying to mount aufs from a separate client,
* typed continue, attempted the mount,
* then backtraced after it seg faulted (a rough shell sketch follows below).
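
(Roughly, as a shell sketch of the steps above; daemon names and the client
mount command will differ on other setups:)

  ceph tell mds.* injectargs '--debug_mds 20 --debug_ms 1'
  gdb -p $(pidof ceph-mds)   # attach to the running MDS
  (gdb) continue             # let it run, then trigger the mount from the client
  (gdb) bt                   # once gdb stops on the SIGSEGV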

I hope this is more helpful. Is there something else I should try to get
more info? I was hoping for something closer to a python trace where it
says a variable is a different type or a missing delimiter. womp. I am
definitely out of my depth but now is a great time to learn! Can anyone
shed some more light as to what may be wrong?



On Fri, May 4, 2018 at 7:49 PM, Yan, Zheng  wrote:

> On Wed, May 2, 2018 at 7:19 AM, Sean Sullivan  wrote:
> > Forgot to reply to all:
> >
> > Sure thing!
> >
> > I couldn't install the ceph-mds-dbg packages without upgrading. I just
> > finished upgrading the cluster to 12.2.5. The issue still persists in
> 12.2.5
> >
> > From here I'm not really sure how to do generate the backtrace so I hope
> I
> > did it right. For others on Ubuntu this is what I did:
> >
> > * firstly up the debug_mds to 20 and debug_ms to 1:
> > ceph tell mds.* injectargs '--debug-mds 20 --debug-ms 1'
> >
> > * install the debug packages
> > ceph-mds-dbg in my case
> >
> > * I also added these options to /etc/ceph/ceph.conf just in case they
> > restart.
> >
> > * Now allow pids to dump (stolen partly from redhat docs and partly from
> > ubuntu)
> > echo -e 'DefaultLimitCORE=infinity\nPrivateTmp=true' | tee -a
> > /etc/systemd/system.conf
> > sysctl fs.suid_dumpable=2
> > sysctl kernel.core_pattern=/tmp/core
> > systemctl daemon-reload
> > systemctl restart ceph-mds@$(hostname -s)
> >
> > * A crash was created in /var/crash by apport but gdb cant read it. I
> used
> > apport-unpack and then ran GDB on what is inside:
> >
>
> core dump should be in /tmp/core
>
> > apport-unpack /var/crash/$(ls /var/crash/*mds*) /root/crash_dump/
> > cd /root/crash_dump/
> > gdb $(cat ExecutablePath) CoreDump -ex 'thr a a bt' | tee
> > /root/ceph_mds_$(hostname -s)_backtrace
> >
> > * This left me with the attached backtraces (which I think are wrong as I
> > see a lot of ?? yet gdb says
> > /usr/lib/debug/.build-id/1d/23dc5ef4fec1dacebba2c6445f05c8fe6b8a7c.debug
> was
> > loaded)
> >
> >  kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
> >  kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY
> >
>

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Willem Jan Withagen
On 23-5-2018 17:12, Alfredo Deza wrote:
> Now that Mimic is fully branched out from master, ceph-disk is going
> to be removed from master so that it is no longer available for the N
> release (pull request to follow)

> Willem, we don't have a way of directly supporting FreeBSD, I've
> suggested that a plugin would be a good way to consume ceph-volume
> with whatever FreeBSD needs, alternatively forking ceph-disk could be
> another option?

Yup, I'm aware of my "trouble"/commitment.

Now that you have ripped out most/all of the partitioning stuff, there
should not be much that one would need to do in ceph-volume other than
accept the filestore directories to format the MON/OSD stuff in.

IFF I could find the time to dive into ceph-volume. :(
ATM I'm having a hard time keeping up with the changes as it is.

I'd appreciate if you could delay yanking ceph-disk until we are close
to the nautilus release. At which point feel free to use the axe.

--WjW

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
Now that Mimic is fully branched out from master, ceph-disk is going
to be removed from master so that it is no longer available for the N
release (pull request to follow)

ceph-disk should be considered as "frozen" and deprecated for Mimic,
in favor of ceph-volume.

This means that if you are relying on ceph-disk *at all*, you should
plan on migrating to ceph-volume for Mimic, and should expect breakage
if using/testing it in master.

Please refer to the guide to migrate away from ceph-disk [0]

Willem, we don't have a way of directly supporting FreeBSD, I've
suggested that a plugin would be a good way to consume ceph-volume
with whatever FreeBSD needs, alternatively forking ceph-disk could be
another option?


Thanks


[0] http://docs.ceph.com/docs/master/ceph-volume/#migrating
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] open vstorage

2018-05-23 Thread Brady Deetz
http://www.openvstorage.com
https://www.openvstorage.org

I came across this the other day and am curious if anybody has run it in
front of their Ceph cluster. I'm looking at it for a clean-ish Ceph
integration with VMWare.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Paul Emmerich
You can't mix HDDs and SSDs in a server if you want to use such a rule.
The new selection step after "emit" can't know what server was selected
previously.
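
You can confirm which host each OSD of the acting set lives on with something
like (OSD ids taken from your example):

  ceph osd find 23 | grep host
  ceph osd find 11 | grep host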

Paul

2018-05-23 11:02 GMT+02:00 Horace :

> Add to the info, I have a slightly modified rule to take advantage of the
> new storage class.
>
> rule ssd-hybrid {
> id 2
> type replicated
> min_size 1
> max_size 10
> step take default class ssd
> step chooseleaf firstn 1 type host
> step emit
> step take default class hdd
> step chooseleaf firstn -1 type host
> step emit
> }
>
> Regards,
> Horace Ng
>
> - Original Message -
> From: "horace" 
> To: "ceph-users" 
> Sent: Wednesday, May 23, 2018 3:56:20 PM
> Subject: [ceph-users] SSD-primary crush rule doesn't work as intended
>
> I've set up the rule according to the doc, but some of the PGs are still
> being assigned to the same host.
>
> http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
>
>   rule ssd-primary {
>   ruleset 5
>   type replicated
>   min_size 5
>   max_size 10
>   step take ssd
>   step chooseleaf firstn 1 type host
>   step emit
>   step take platter
>   step chooseleaf firstn -1 type host
>   step emit
>   }
>
> Crush tree:
>
> [root@ceph0 ~]#ceph osd crush tree
> ID CLASS WEIGHT   TYPE NAME
> -1   58.63989 root default
> -2   19.55095 host ceph0
>  0   hdd  2.73000 osd.0
>  1   hdd  2.73000 osd.1
>  2   hdd  2.73000 osd.2
>  3   hdd  2.73000 osd.3
> 12   hdd  4.54999 osd.12
> 15   hdd  3.71999 osd.15
> 18   ssd  0.2 osd.18
> 19   ssd  0.16100 osd.19
> -3   19.55095 host ceph1
>  4   hdd  2.73000 osd.4
>  5   hdd  2.73000 osd.5
>  6   hdd  2.73000 osd.6
>  7   hdd  2.73000 osd.7
> 13   hdd  4.54999 osd.13
> 16   hdd  3.71999 osd.16
> 20   ssd  0.16100 osd.20
> 21   ssd  0.2 osd.21
> -4   19.53799 host ceph2
>  8   hdd  2.73000 osd.8
>  9   hdd  2.73000 osd.9
> 10   hdd  2.73000 osd.10
> 11   hdd  2.73000 osd.11
> 14   hdd  3.71999 osd.14
> 17   hdd  4.54999 osd.17
> 22   ssd  0.18700 osd.22
> 23   ssd  0.16100 osd.23
>
> #ceph pg ls-by-pool ssd-hybrid
>
> 27.8   1051  00 0   0 4399733760
> 1581 1581   active+clean 2018-05-23 06:20:56.088216
> 27957'185553 27959:368828  [23,1,11] 23  [23,1,11] 23
> 27953'182582 2018-05-23 06:20:56.08817227843'162478 2018-05-20
> 18:28:20.118632
>
> With osd.23 and osd.11 being assigned on the same host.
>
> Regards,
> Horace Ng
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HDFS with CEPH, only single RGW works with the hdfs

2018-05-23 Thread 한승진
Hello Cephers,

Our team is currently trying to replace HDFS with Ceph object storage.

However, there is a big problem: the "*hdfs dfs -put*" operation is
very slow.

I suspect it is related to how Hadoop opens sessions to the RGW.

Only one RGW node does any work for Hadoop, even though we have 4 RGWs,
and there does not seem to be a Hadoop-side configuration for using
multiple endpoints.
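
I am wondering whether putting all four RGWs behind a single load-balanced
endpoint (haproxy, DNS round-robin, ...) and pointing Hadoop at that would
help. Assuming the S3A connector is in use, that would look roughly like this
in core-site.xml (the endpoint name is a placeholder):

  fs.s3a.endpoint = http://rgw-vip.example.com:7480
  fs.s3a.path.style.access = true
  fs.s3a.connection.maximum = 100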

Have you experienced similar issues, and how did you overcome them?

I would appreciate if anybody give me advice.

Best Regards,

John Haan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Several questions on the radosgw-openstack integration

2018-05-23 Thread Massimo Sgaravatto
For #2, I think I found the answer myself. The admin can simply generate
the S3 keys for the user, e.g.:

radosgw-admin key create --key-type=s3 --gen-access-key --gen-secret
--uid="a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565"
--rgw-realm=cloudtest

and then the user can access her data using S3 as well as Swift.
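
For example, with those generated keys a client such as s3cmd can then talk
to the RGW endpoint (host names here are placeholders):

  s3cmd --access_key=<generated-access-key> --secret_key=<generated-secret-key> \
    --host=rgw.example.com:7480 --host-bucket='%(bucket)s.rgw.example.com:7480' ls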

Cheers, Massimo

On Wed, May 23, 2018 at 12:49 PM, Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> For #1 I guess this is a known issue (http://tracker.ceph.com/issues/20570
> )
>
> On Tue, May 22, 2018 at 1:03 PM, Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
>
>> I have several questions on the radosgw - OpenStack integration.
>>
>> I was more or less able to set it (using a Luminous ceph cluster
>> and an Ocata OpenStack cloud), but I don't know if it working as expected.
>>
>>
>> So, the questions:
>>
>>
>> 1.
>> I miss the meaning of the attribute "rgw keystone implicit tenants"
>> If I set "rgw keystone implicit tenants = false", accounts are created
>> using id:
>>
>>  and the display name is the name of the OpenStack
>> project
>>
>>
>> If I set "rgw keystone implicit tenants = true", accounts are created
>> using id:
>>
>> $<
>>
>> and, again, the display name is the name of the OpenStack project
>>
>>
>> So one account per openstack project in both cases
>> I would have expected two radosgw accounts for 2 openstack users
>> belonging to the same project, setting "rgw keystone implicit tenants =
>> true"
>>
>>
>> 2
>> Are OpenStack users supposed to access to their data only using swift, or
>> also via S3 ?
>> In the latter case, how can the user find her S3 credentials ?
>> I am not able to find the S3 keys for such OpenStack users also using
>> radosgw-admin
>>
>> # radosgw-admin user info --uid="a22db12575694c9e9f8650d
>> de73ef565\$a22db12575694c9e9f8650dde73ef565" --rgw-realm=cloudtest
>> ...
>> ...
>>  "keys": [],
>> ...
>> ...
>>
>>
>> 3
>> How is the admin supposed to set default quota for each project/user ?
>> How can then the admin modify the quota for a user ?
>> How can the user see the assigned quota ?
>>
>> I tried relying on the "rgw user default quota max size" attribute to
>> set the default quota. It works for users created using "radosgw-admin
>> user create" while
>> I am not able to see it working for OpenStack users (see also the thread
>> "rgw default user quota for OpenStack users")
>>
>> If I explicitly set the quota for a OpenStack user using:
>>
>> radosgw-admin quota set --quota-scope=user --max-size=2G
>> --uid="a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565"
>> --rgw-realm=cloudtest
>> radosgw-admin quota enable --quota-scope=user
>> --uid="a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565"
>> --rgw-realm=cloudtest
>>
>>
>> this works (i.e. quota is enforced) but such quota is not exposed to the
>> user (at least it is not reported anywhere in the OpenStack dashboard nor
>> in the "swift stat" output)
>>
>>
>> 4
>> I tried creating (using the OpenStack dashboard) containers with public
>> access.
>> It looks like this works only if "rgw keystone implicit tenants" is set
>> to false
>> Is this expected ?
>>
>>
>> Many thanks, Massimo
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
Yes, it is 68 disks. Will mon_osd_reporter_subtree_level = host have any
impact on mon_osd_min_down_reporters?

And related to min_size: yes, there have been many suggestions for us to move
to 2. Due to storage-efficiency concerns we are still staying with 1, while
trying to convince customers to go with 2 for better data integrity.

thanks,
Muthu

On Wed, May 23, 2018 at 3:31 PM, David Turner  wrote:

> How many disks in each node? 68? If yes, then change it to 69. Also
> running with ec 4+1 is bad for the same reason as running with size=2
> min_size=1 which has been mentioned and discussed multiple times on the ML.
>
>
> On Wed, May 23, 2018, 3:39 AM nokia ceph  wrote:
>
>> Hi David Turner,
>>
>> This is our ceph config under mon section , we have EC 4+1 and set the
>> failure domain as host and osd_min_down_reporters to 4 ( osds from 4
>> different host ) .
>>
>> [mon]
>> mon_compact_on_start = True
>> mon_osd_down_out_interval = 86400
>> mon_osd_down_out_subtree_limit = host
>> mon_osd_min_down_reporters = 4
>> mon_osd_reporter_subtree_level = host
>>
>> We have 68 disks , can we increase  sd_min_down_reporters  to 68 ?
>>
>> Thanks,
>> Muthu
>>
>> On Tue, May 22, 2018 at 5:46 PM, David Turner 
>> wrote:
>>
>>> What happens when a storage node loses its cluster network but not it's
>>> public network is that all other osss on the cluster see that it's down and
>>> report that to the mons, but the node call still talk to the mons telling
>>> the mons that it is up and in fact everything else is down.
>>>
>>> The setting osd _min_reporters (I think that's the name of it off the
>>> top of my head) is designed to help with this scenario. It's default is 1
>>> which means any osd on either side of the network problem will be trusted
>>> by the mons to mark osds down. What you want to do with this seeing is to
>>> set it to at least 1 more than the number of osds in your failure domain.
>>> If the failure domain is host and each node has 32 osds, then setting it to
>>> 33 will prevent a full problematic node from being able to cause havoc.
>>>
>>> The osds will still try to mark themselves as up and this will still
>>> cause problems for read until the osd process stops or the network comes
>>> back up. There might be a seeing for how long an odd will try telling the
>>> mons it's up, but this isn't really a situation I've come across after
>>> initial testing and installation of nodes.
>>>
>>> On Tue, May 22, 2018, 1:47 AM nokia ceph 
>>> wrote:
>>>
 Hi Ceph users,

 We have a cluster with 5 node (67 disks) and EC 4+1 configuration and
 min_size set as 4.
 Ceph version : 12.2.5
 While executing one of our resilience usecase , making private
 interface down on one of the node, till kraken we saw less outage in rados
 (60s) .

 Now with luminous, we could able to see rados read/write outage for
 more than 200s . In the logs we could able to see that peer OSDs inform
 that one of the node OSDs are down however the OSDs  defend like it is
 wrongly marked down and does not move to down state for long time.

 2018-05-22 05:37:17.871049 7f6ac71e6700  0 log_channel(cluster) log
 [WRN] : Monitor daemon marked osd.1 down, but it is still running
 2018-05-22 05:37:17.871072 7f6ac71e6700  0 log_channel(cluster) log
 [DBG] : map e35690 wrongly marked me down at e35689
 2018-05-22 05:37:17.878347 7f6ac71e6700  0 osd.1 35690 crush map has
 features 1009107927421960192, adjusting msgr requires for osds
 2018-05-22 05:37:18.296643 7f6ac71e6700  0 osd.1 35691 crush map has
 features 1009107927421960192, adjusting msgr requires for osds


 Only when all 67 OSDs are move to down state , the read/write traffic
 is resumed.

 Could you please help us in resolving this issue and if it is bug , we
 will create corresponding ticket.

 Thanks,
 Muthu
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph_vms performance

2018-05-23 Thread Thomas Bennett
Hi,

I'm testing out ceph_vms vs a cephfs mount with a cifs export.

I currently have 3 active Ceph MDS servers to maximise throughput, and
with a CephFS mount exported via CIFS I'm getting reasonable benchmark
results.

However, when I tried some benchmarking with the ceph_vms module, I
only got a third of the comparable write throughput.

I'm just wondering if this is expected, or if there is an obvious
configuration setup that I'm missing?

Configuration:

I've compiled git branch samba 4_8_test.

I'm using ceph 12.2.5

Kind regards,
Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Several questions on the radosgw-openstack integration

2018-05-23 Thread Massimo Sgaravatto
For #1 I guess this is a known issue (http://tracker.ceph.com/issues/20570)

On Tue, May 22, 2018 at 1:03 PM, Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> I have several questions on the radosgw - OpenStack integration.
>
> I was more or less able to set it (using a Luminous ceph cluster
> and an Ocata OpenStack cloud), but I don't know if it working as expected.
>
>
> So, the questions:
>
>
> 1.
> I miss the meaning of the attribute "rgw keystone implicit tenants"
> If I set "rgw keystone implicit tenants = false", accounts are created
> using id:
>
>  and the display name is the name of the OpenStack
> project
>
>
> If I set "rgw keystone implicit tenants = true", accounts are created
> using id:
>
> $<
>
> and, again, the display name is the name of the OpenStack project
>
>
> So one account per openstack project in both cases
> I would have expected two radosgw accounts for 2 openstack users belonging
> to the same project, setting "rgw keystone implicit tenants = true"
>
>
> 2
> Are OpenStack users supposed to access to their data only using swift, or
> also via S3 ?
> In the latter case, how can the user find her S3 credentials ?
> I am not able to find the S3 keys for such OpenStack users also using
> radosgw-admin
>
> # radosgw-admin user info --uid="a22db12575694c9e9f8650dde73ef565\$
> a22db12575694c9e9f8650dde73ef565" --rgw-realm=cloudtest
> ...
> ...
>  "keys": [],
> ...
> ...
>
>
> 3
> How is the admin supposed to set default quota for each project/user ?
> How can then the admin modify the quota for a user ?
> How can the user see the assigned quota ?
>
> I tried relying on the "rgw user default quota max size" attribute to
> set the default quota. It works for users created using "radosgw-admin
> user create" while
> I am not able to see it working for OpenStack users (see also the thread
> "rgw default user quota for OpenStack users")
>
> If I explicitly set the quota for a OpenStack user using:
>
> radosgw-admin quota set --quota-scope=user --max-size=2G --uid="
> a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565"
> --rgw-realm=cloudtest
> radosgw-admin quota enable --quota-scope=user --uid="
> a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565"
> --rgw-realm=cloudtest
>
>
> this works (i.e. quota is enforced) but such quota is not exposed to the
> user (at least it is not reported anywhere in the OpenStack dashboard nor
> in the "swift stat" output)
>
>
> 4
> I tried creating (using the OpenStack dashboard) containers with public
> access.
> It looks like this works only if "rgw keystone implicit tenants" is set to
> false
> Is this expected ?
>
>
> Many thanks, Massimo
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread David Turner
How many disks in each node? 68? If yes, then change it to 69. Also running
with ec 4+1 is bad for the same reason as running with size=2 min_size=1
which has been mentioned and discussed multiple times on the ML.
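
(Concretely, applying that suggestion would be something like the following,
persisted in ceph.conf under [mon] as well; whether injectargs takes effect
without a mon restart depends on the release:)

  ceph tell mon.* injectargs '--mon_osd_min_down_reporters 69'

  [mon]
  mon_osd_min_down_reporters = 69   # one more than the OSDs in a single host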

On Wed, May 23, 2018, 3:39 AM nokia ceph  wrote:

> Hi David Turner,
>
> This is our ceph config under mon section , we have EC 4+1 and set the
> failure domain as host and osd_min_down_reporters to 4 ( osds from 4
> different host ) .
>
> [mon]
> mon_compact_on_start = True
> mon_osd_down_out_interval = 86400
> mon_osd_down_out_subtree_limit = host
> mon_osd_min_down_reporters = 4
> mon_osd_reporter_subtree_level = host
>
> We have 68 disks , can we increase  sd_min_down_reporters  to 68 ?
>
> Thanks,
> Muthu
>
> On Tue, May 22, 2018 at 5:46 PM, David Turner 
> wrote:
>
>> What happens when a storage node loses its cluster network but not it's
>> public network is that all other osss on the cluster see that it's down and
>> report that to the mons, but the node call still talk to the mons telling
>> the mons that it is up and in fact everything else is down.
>>
>> The setting osd _min_reporters (I think that's the name of it off the top
>> of my head) is designed to help with this scenario. It's default is 1 which
>> means any osd on either side of the network problem will be trusted by the
>> mons to mark osds down. What you want to do with this seeing is to set it
>> to at least 1 more than the number of osds in your failure domain. If the
>> failure domain is host and each node has 32 osds, then setting it to 33
>> will prevent a full problematic node from being able to cause havoc.
>>
>> The osds will still try to mark themselves as up and this will still
>> cause problems for read until the osd process stops or the network comes
>> back up. There might be a seeing for how long an odd will try telling the
>> mons it's up, but this isn't really a situation I've come across after
>> initial testing and installation of nodes.
>>
>> On Tue, May 22, 2018, 1:47 AM nokia ceph 
>> wrote:
>>
>>> Hi Ceph users,
>>>
>>> We have a cluster with 5 node (67 disks) and EC 4+1 configuration and
>>> min_size set as 4.
>>> Ceph version : 12.2.5
>>> While executing one of our resilience usecase , making private interface
>>> down on one of the node, till kraken we saw less outage in rados (60s) .
>>>
>>> Now with luminous, we could able to see rados read/write outage for more
>>> than 200s . In the logs we could able to see that peer OSDs inform that one
>>> of the node OSDs are down however the OSDs  defend like it is wrongly
>>> marked down and does not move to down state for long time.
>>>
>>> 2018-05-22 05:37:17.871049 7f6ac71e6700  0 log_channel(cluster) log
>>> [WRN] : Monitor daemon marked osd.1 down, but it is still running
>>> 2018-05-22 05:37:17.871072 7f6ac71e6700  0 log_channel(cluster) log
>>> [DBG] : map e35690 wrongly marked me down at e35689
>>> 2018-05-22 05:37:17.878347 7f6ac71e6700  0 osd.1 35690 crush map has
>>> features 1009107927421960192, adjusting msgr requires for osds
>>> 2018-05-22 05:37:18.296643 7f6ac71e6700  0 osd.1 35691 crush map has
>>> features 1009107927421960192, adjusting msgr requires for osds
>>>
>>>
>>> Only when all 67 OSDs are move to down state , the read/write traffic is
>>> resumed.
>>>
>>> Could you please help us in resolving this issue and if it is bug , we
>>> will create corresponding ticket.
>>>
>>> Thanks,
>>> Muthu
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] IO500 Call for Submissions for ISC 2018

2018-05-23 Thread John Bent
IO500 Call for Submission
Deadline: 23 June 2018 AoE

The IO500 is now accepting and encouraging submissions for the upcoming
IO500 list revealed at ISC 2018 in Frankfurt, Germany. The benchmark suite
is designed to be easy to run and the community has multiple active support
channels to help with any questions. Please submit and we look forward to
seeing many of you at ISC 2018! Please note that submissions of all size
are welcome; the site has customizable sorting so it is possible to submit
on a small system and still get a very good per-client score for example.
Additionally, the list is about much more than just the raw rank; all
submissions help the community by collecting and publishing a wider corpus
of data. More details below.

Following the success of the Top500 in collecting and analyzing historical
trends in supercomputer technology and evolution, the IO500 was created in
2017 and published its first list at SC17. The need for such an initiative
has long been known within High Performance Computing; however, defining
appropriate benchmarks had long been challenging. Despite this challenge,
the community, after long and spirited discussion, finally reached
consensus on a suite of benchmarks and a metric for resolving the scores
into a single ranking.

The multi-fold goals of the benchmark suite are as follows:

* Maximizing simplicity in running the benchmark suite
* Encouraging complexity in tuning for performance
* Allowing submitters to highlight their “hero run” performance numbers
* Forcing submitters to simultaneously report performance for challenging
IO patterns.

Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish an
upper-bound for performance. It also includes an IOR and mdtest run with
highly prescribed parameters in an attempt to determine a lower-bound.
Finally, it includes a namespace search as this has been determined to be a
highly sought-after feature in HPC storage systems that has historically
not been well-measured. Submitters are encouraged to share their tuning
insights for publication.
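
For readers who have not run the pieces before, a hero run is conceptually
just an unconstrained IOR/mdtest invocation, roughly like the sketch below
(illustrative only -- process counts, sizes, and paths are placeholders, and
an actual submission must use the IO500 scripts from the submission link
below, which also add the prescribed-parameter runs):

# IOR hero run: file-per-process, large sequential transfers, write then read
mpirun -np 64 ior -w -r -t 16m -b 16g -F -o /mnt/pfs/io500/ior_easy/testfile
# mdtest hero run: create/stat/remove many small files per process
mpirun -np 64 mdtest -n 10000 -d /mnt/pfs/io500/mdt_easy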

The goals of the community are also multi-fold:

* Gather historical data for the sake of analysis and to aid predictions of
storage futures
* Collect tuning information to share valuable performance optimizations
across the community
* Encourage vendors and designers to optimize for workloads beyond “hero
runs”
* Establish bounded expectations for users, procurers, and administrators

Once again, we encourage you to submit (see http://io500.org/submission),
to join our community, and to attend our BoF “The IO-500 and the Virtual
Institute of I/O” at ISC 2018 where we will announce the second ever IO500
list. The current list includes results from BeeGFS, DataWarp, IME,
Lustre, and Spectrum Scale. We hope that the next list has even more!

We look forward to answering any questions or concerns you might have.

Thank you!

IO500 Committee
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-23 Thread Siegfried Höllrigl

Hi !

We have now deleted all snapshots of the pool in question.

With "ceph pg dump" we can see that pg 5.9b has a SNAPTRIMQ_LEN of 27826.

All other PGs have 0.
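
(For reference, a rough way to pull the row for a single PG out of the dump --
the exact column layout can differ between point releases:)

# header line, to locate the SNAPTRIMQ_LEN column
ceph pg dump pgs 2>/dev/null | head -n 1
# stats row for the affected PG
ceph pg dump pgs 2>/dev/null | grep '^5\.9b '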

It looks like this value does not decrease. LAST_SCRUB and
LAST_DEEP_SCRUB are both from 2018-04-24, almost 1 month ago.



The OSD is still crashing a while after we start it. OSD log:

*** Caught signal (Aborted) **

and

/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != 
recovery_info.ss.clone_snaps.end())



Any ideas how to fix this? Is there a way to "force" the snaptrim of the
PG in question? Or any other way to "clean" this PG?


We have searched a lot in the mail archives but couldn't find anything
that could help us in this case.



Br,



Am 17.05.2018 um 00:12 schrieb Gregory Farnum:
On Wed, May 16, 2018 at 6:49 AM Siegfried Höllrigl wrote:


Hi Greg !

Thank you for your fast reply.

We have now deleted the PG on OSD.130 like you suggested and
started it :

ceph-s-06 # ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-130/ --pgid 5.9b --op remove --force
  marking collection for removal
setting '_remove' omap key
finish_remove_pgs 5.9b_head removing 5.9b
Remove successful
ceph-s-06 # systemctl start ceph-osd@130.service

The cluster recovered again until it came to PG 5.9b. Then OSD.130
crashed again. -> No change

So we wanted to go the other way and export the PG from the primary
(healthy) OSD (OSD.19), but that fails:

root@ceph-s-03:/tmp5.9b# ceph-objectstore-tool --op export --pgid 5.9b \
    --data-path /var/lib/ceph/osd/ceph-19 --file /tmp5.9b/5.9b.export
OSD has the store locked

But we don't want to stop OSD.19 on this server because this pool has
size=3 and min_size=2.
(This would make pg 5.9b inaccessible.)


I'm a bit confused. Are you saying that
1) the ceph-objectstore-tool you pasted there successfully removed pg 
5.9b from osd.130 (as it appears), AND
2) pg 5.9b was active with one of the other nodes as primary, so all 
data remained available, AND
3) when pg 5.9b got backfilled into osd.130, osd.130 crashed again? 
(But the other OSDs kept the PG fully available, without crashing?)


That sequence of events is *deeply* confusing and I really don't 
understand how it might happen.


Sadly I don't think you can grab a PG for export without stopping the 
OSD in question.
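
If you do decide to take that route, the usual sequence is roughly the sketch
below -- but only during a maintenance window, since with the third copy
broken pg 5.9b will drop below min_size (and stop serving IO) while osd.19 is
down:

# keep the stopped OSD from being marked out and triggering rebalancing
ceph osd set noout
systemctl stop ceph-osd@19.service
# with the daemon stopped the store lock is released and the export can run
ceph-objectstore-tool --op export --pgid 5.9b \
    --data-path /var/lib/ceph/osd/ceph-19 --file /tmp5.9b/5.9b.export
systemctl start ceph-osd@19.service
ceph osd unset noout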



When we query the PG, we can see a lot of "snap_trimq".
Can this be cleaned somehow, even if the PG is undersized and
degraded?


I *think* the PG will keep trimming snapshots even if 
undersized+degraded (though I don't remember for sure), but snapshot 
trimming is often heavily throttled and I'm not aware of any way to 
specifically push one PG to the front. If you're interested in 
speeding snaptrimming up you can search the archives or check the docs 
for the appropriate config options.
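
(As a starting point, the knobs I am thinking of are along these lines --
the values are only examples, and please double-check the option names
against the Luminous docs before applying anything:)

# allow more concurrent trims per OSD and remove the per-trim sleep
ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 4 --osd_snap_trim_sleep 0'
# raise the priority of snap trim work relative to client/recovery ops
ceph tell osd.* injectargs '--osd_snap_trim_priority 10'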

-Greg


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
To add to the info, I have a slightly modified rule to take advantage of the
new storage classes.

rule ssd-hybrid {
id 2
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 1 type host
step emit
step take default class hdd
step chooseleaf firstn -1 type host
step emit
}
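
To sanity-check how the rule actually maps, one thing you can do offline is
run the compiled crush map through crushtool (rule id 2 and 3 replicas
assumed here). Note that a result with both OSDs on the same host still
counts as a "complete" mapping, so --show-bad-mappings will not flag it; you
have to eyeball the --show-mappings output for that:

ceph osd getcrushmap -o /tmp/crushmap.bin
# print sample mappings produced by rule 2 for 3 replicas
crushtool -i /tmp/crushmap.bin --test --rule 2 --num-rep 3 --show-mappings | head -20
# only mappings that return fewer OSDs than requested
crushtool -i /tmp/crushmap.bin --test --rule 2 --num-rep 3 --show-bad-mappings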

Regards,
Horace Ng

- Original Message -
From: "horace" 
To: "ceph-users" 
Sent: Wednesday, May 23, 2018 3:56:20 PM
Subject: [ceph-users] SSD-primary crush rule doesn't work as intended

I've set up the rule according to the doc, but some of the PGs are still being 
assigned to the same host.

http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/

  rule ssd-primary {
  ruleset 5
  type replicated
  min_size 5
  max_size 10
  step take ssd
  step chooseleaf firstn 1 type host
  step emit
  step take platter
  step chooseleaf firstn -1 type host
  step emit
  }

Crush tree:

[root@ceph0 ~]# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME  
-1   58.63989 root default   
-2   19.55095 host ceph0 
 0   hdd  2.73000 osd.0  
 1   hdd  2.73000 osd.1  
 2   hdd  2.73000 osd.2  
 3   hdd  2.73000 osd.3  
12   hdd  4.54999 osd.12 
15   hdd  3.71999 osd.15 
18   ssd  0.2 osd.18 
19   ssd  0.16100 osd.19 
-3   19.55095 host ceph1 
 4   hdd  2.73000 osd.4  
 5   hdd  2.73000 osd.5  
 6   hdd  2.73000 osd.6  
 7   hdd  2.73000 osd.7  
13   hdd  4.54999 osd.13 
16   hdd  3.71999 osd.16 
20   ssd  0.16100 osd.20 
21   ssd  0.2 osd.21 
-4   19.53799 host ceph2 
 8   hdd  2.73000 osd.8  
 9   hdd  2.73000 osd.9  
10   hdd  2.73000 osd.10 
11   hdd  2.73000 osd.11 
14   hdd  3.71999 osd.14 
17   hdd  4.54999 osd.17 
22   ssd  0.18700 osd.22 
23   ssd  0.16100 osd.23 

#ceph pg ls-by-pool ssd-hybrid

27.8  1051  0  0  0  0  4399733760  1581  1581  active+clean
2018-05-23 06:20:56.088216  27957'185553  27959:368828  [23,1,11]  23
[23,1,11]  23  27953'182582  2018-05-23 06:20:56.088172  27843'162478
2018-05-20 18:28:20.118632

With osd.23 and osd.11 being assigned on the same host.

Regards,
Horace Ng
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
I've set up the rule according to the doc, but some of the PGs are still being 
assigned to the same host.

http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/

  rule ssd-primary {
  ruleset 5
  type replicated
  min_size 5
  max_size 10
  step take ssd
  step chooseleaf firstn 1 type host
  step emit
  step take platter
  step chooseleaf firstn -1 type host
  step emit
  }

Crush tree:

[root@ceph0 ~]# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME  
-1   58.63989 root default   
-2   19.55095 host ceph0 
 0   hdd  2.73000 osd.0  
 1   hdd  2.73000 osd.1  
 2   hdd  2.73000 osd.2  
 3   hdd  2.73000 osd.3  
12   hdd  4.54999 osd.12 
15   hdd  3.71999 osd.15 
18   ssd  0.2 osd.18 
19   ssd  0.16100 osd.19 
-3   19.55095 host ceph1 
 4   hdd  2.73000 osd.4  
 5   hdd  2.73000 osd.5  
 6   hdd  2.73000 osd.6  
 7   hdd  2.73000 osd.7  
13   hdd  4.54999 osd.13 
16   hdd  3.71999 osd.16 
20   ssd  0.16100 osd.20 
21   ssd  0.2 osd.21 
-4   19.53799 host ceph2 
 8   hdd  2.73000 osd.8  
 9   hdd  2.73000 osd.9  
10   hdd  2.73000 osd.10 
11   hdd  2.73000 osd.11 
14   hdd  3.71999 osd.14 
17   hdd  4.54999 osd.17 
22   ssd  0.18700 osd.22 
23   ssd  0.16100 osd.23 

#ceph pg ls-by-pool ssd-hybrid

27.8  1051  0  0  0  0  4399733760  1581  1581  active+clean
2018-05-23 06:20:56.088216  27957'185553  27959:368828  [23,1,11]  23
[23,1,11]  23  27953'182582  2018-05-23 06:20:56.088172  27843'162478
2018-05-20 18:28:20.118632

With osd.23 and osd.11 being assigned on the same host.

Regards,
Horace Ng
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
Hi David Turner,

This is our ceph config under the [mon] section. We have EC 4+1, the failure
domain set to host, and mon_osd_min_down_reporters set to 4 (i.e. OSDs from 4
different hosts).

[mon]
mon_compact_on_start = True
mon_osd_down_out_interval = 86400
mon_osd_down_out_subtree_limit = host
mon_osd_min_down_reporters = 4
mon_osd_reporter_subtree_level = host
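
(A quick sanity check of what the monitors actually picked up, and how to
change it at runtime -- a sketch; the mon id is usually the short hostname,
and the value here is just a placeholder:)

# show the value the running monitor is using
ceph daemon mon.$(hostname -s) config get mon_osd_min_down_reporters
# change it on all mons without a restart (keep ceph.conf in sync for restarts)
ceph tell mon.* injectargs '--mon_osd_min_down_reporters 14'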

We have 68 disks; can we increase mon_osd_min_down_reporters to 68?

Thanks,
Muthu

On Tue, May 22, 2018 at 5:46 PM, David Turner  wrote:

> What happens when a storage node loses its cluster network but not its
> public network is that all other OSDs on the cluster see that it is down and
> report that to the mons, but the node can still talk to the mons, telling
> the mons that it is up and that in fact everything else is down.
>
> The setting mon_osd_min_down_reporters (I think that's the name of it off
> the top of my head) is designed to help with this scenario. Its default is 1,
> which means any OSD on either side of the network problem will be trusted by
> the mons to mark OSDs down. What you want to do with this setting is to set
> it to at least 1 more than the number of OSDs in your failure domain. If the
> failure domain is host and each node has 32 OSDs, then setting it to 33
> will prevent a whole problematic node from being able to cause havoc.
>
> The OSDs will still try to mark themselves as up, and this will still cause
> problems for reads until the OSD process stops or the network comes back up.
> There might be a setting for how long an OSD will keep telling the mons it
> is up, but this isn't really a situation I've come across after initial
> testing and installation of nodes.
>
> On Tue, May 22, 2018, 1:47 AM nokia ceph  wrote:
>
>> Hi Ceph users,
>>
>> We have a cluster with 5 nodes (67 disks), an EC 4+1 configuration, and
>> min_size set to 4.
>> Ceph version: 12.2.5
>> While executing one of our resilience use cases (taking the private
>> interface down on one of the nodes), up to Kraken we saw only a short
>> rados outage (around 60s).
>>
>> Now with Luminous, we see a rados read/write outage of more than 200s.
>> In the logs we can see that peer OSDs report that one of the node's OSDs
>> are down; however, those OSDs insist that they are wrongly marked down
>> and do not move to the down state for a long time.
>>
>> 2018-05-22 05:37:17.871049 7f6ac71e6700  0 log_channel(cluster) log [WRN]
>> : Monitor daemon marked osd.1 down, but it is still running
>> 2018-05-22 05:37:17.871072 7f6ac71e6700  0 log_channel(cluster) log [DBG]
>> : map e35690 wrongly marked me down at e35689
>> 2018-05-22 05:37:17.878347 7f6ac71e6700  0 osd.1 35690 crush map has
>> features 1009107927421960192, adjusting msgr requires for osds
>> 2018-05-22 05:37:18.296643 7f6ac71e6700  0 osd.1 35691 crush map has
>> features 1009107927421960192, adjusting msgr requires for osds
>>
>>
>> Only when all 67 OSDs move to the down state does the read/write traffic
>> resume.
>>
>> Could you please help us resolve this issue? If it is a bug, we will
>> create a corresponding ticket.
>>
>> Thanks,
>> Muthu
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [client.rgw.hostname] or [client.radosgw.hostname] ?

2018-05-23 Thread Massimo Sgaravatto
Ok, understood

Thanks a lot

Cheers, Massimo

On Tue, May 22, 2018 at 1:57 PM, David Turner  wrote:

> We use radosgw in our deployment. It doesn't really matter as you can
> specify the key in the config file.  You could call it
> client.thatobjectthing.hostname and it would work fine.
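
(For the archives, a minimal sketch of such a section -- the section name is
taken from the directory listing below, with the civetweb frontend that is
the Luminous default; the port and paths are illustrative only:)

[client.rgw.ceph-test-rgw-01]
host = ceph-test-rgw-01
keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-test-rgw-01/keyring
log_file = /var/log/ceph/ceph-rgw.ceph-test-rgw-01.log
rgw_frontends = civetweb port=7480
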
>
> On Tue, May 22, 2018, 5:54 AM Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
>
>> # ls /var/lib/ceph/radosgw/
>> ceph-rgw.ceph-test-rgw-01
>>
>>
>> So [client.rgw.ceph-test-rgw-01]
>>
>> Thanks, Massimo
>>
>>
>> On Tue, May 22, 2018 at 6:28 AM, Marc Roos 
>> wrote:
>>
>>>
>>> I can relate to your issue, I am always looking at
>>>
>>> /var/lib/ceph/
>>>
>>> See what is used there
>>>
>>>
>>> -Original Message-
>>> From: Massimo Sgaravatto [mailto:massimo.sgarava...@gmail.com]
>>> Sent: dinsdag 22 mei 2018 11:46
>>> To: Ceph Users
>>> Subject: [ceph-users] [client.rgw.hostname] or [client.radosgw.hostname]
>>> ?
>>>
>>> I am really confused about the use of [client.rgw.hostname] or
>>> [client.radosgw.hostname] in the configuration file. I don't understand
>>> if they have different purposes or if there is just a problem with
>>> documentation.
>>>
>>>
>>> E.g.:
>>>
>>> http://docs.ceph.com/docs/luminous/start/quick-rgw/
>>>
>>>
>>> says that [client.rgw.hostname] should be used
>>>
>>> while:
>>>
>>> http://docs.ceph.com/docs/luminous/radosgw/config-ref/
>>>
>>>
>>> talks about [client.radosgw.{instance-name}]
>>>
>>>
>>> In my luminous-centos7 cluster it looks like only [client.rgw.hostname]
>>> works
>>>
>>>
>>>
>>> Thanks, Massimo
>>>
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com