Re: [ceph-users] Ceph Day Germany :)

2018-02-11 Thread Kai Wagner

On 12.02.2018 00:33, c...@elchaka.de wrote:
> I absolutely agree, too. This was really great! It would be fantastic if the
> Ceph Days happened again in Darmstadt - or Düsseldorf ;)
>
> Btw, will the slides and perhaps videos of the presentations be available
> online?

AFAIK Danny is working on that.

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)






[ceph-users] rbd feature overheads

2018-02-11 Thread Blair Bethwaite
Hi all,

Wondering if anyone can clarify whether there are any significant overheads
from rbd features like object-map, fast-diff, etc. I'm interested in the
overheads from both a latency and a space perspective, e.g., can object-map
be sanely deployed on a 100TB volume, or does the client try to read the
whole thing into memory...?
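
For reference, a rough sketch of the commands involved in toggling these
features ("rbd/bigvol" is just a placeholder image name):

    rbd info rbd/bigvol                                  # list enabled features
    rbd feature enable rbd/bigvol object-map fast-diff
    rbd object-map rebuild rbd/bigvol                    # build the map for pre-existing data
    rbd feature disable rbd/bigvol fast-diff object-map  # back out again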

-- 
Cheers,
~Blairo


[ceph-users] ceph mons de-synced from rest of cluster?

2018-02-11 Thread Chris Apsey

All,

We recently doubled the number of OSDs in our cluster, and towards the end 
of the rebalancing I noticed that recovery IO fell to nothing and that 
ceph -s on the mons eventually reported the following:


  cluster:
    id: 6a65c3d0-b84e-4c89-bbf7-a38a1966d780
    health: HEALTH_WARN
            34922/4329975 objects misplaced (0.807%)
            Reduced data availability: 542 pgs inactive, 49 pgs peering, 13502 pgs stale
            Degraded data redundancy: 248778/4329975 objects degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs undersized

  services:
    mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
    mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
    osd: 376 osds: 376 up, 376 in

  data:
    pools:   9 pools, 13952 pgs
    objects: 1409k objects, 5992 GB
    usage:   31528 GB used, 1673 TB / 1704 TB avail
    pgs:     3.225% pgs unknown
             0.659% pgs not active
             248778/4329975 objects degraded (5.745%)
             34922/4329975 objects misplaced (0.807%)
             6141 stale+active+clean
             4537 stale+active+remapped+backfilling
             1575 stale+active+undersized+degraded
             489  stale+active+clean+remapped
             450  unknown
             396  stale+active+recovery_wait+degraded
             216  stale+active+undersized+degraded+remapped+backfilling
             40   stale+peering
             30   stale+activating
             24   stale+active+undersized+remapped
             22   stale+active+recovering+degraded
             13   stale+activating+degraded
             9    stale+remapped+peering
             4    stale+active+remapped+backfill_wait
             3    stale+active+clean+scrubbing+deep
             2    stale+active+undersized+degraded+remapped+backfill_wait
             1    stale+active+remapped

The problem is, everything works fine.  If I run ceph health detail and 
do a pg query against one of the 'degraded' placement groups, it reports 
back as active+clean.  All clients in the cluster can write and read at 
normal speeds, but no IO information is ever reported in ceph -s.
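
For illustration, the kind of check described above looks roughly like this
(the PG ID is just a placeholder):

    ceph health detail | grep degraded | head -1   # pick one of the listed PGs
    ceph pg 1.2f query | grep '"state"'            # comes back active+clean here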


From what I can see, everything in the cluster is working properly 
except the actual reporting on the status of the cluster.  Has anyone 
seen this before, or does anyone know how to sync the mons up to what the 
OSDs are actually reporting?  I see no connectivity errors in the logs of 
the mons or the osds.


Thanks,

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net


Re: [ceph-users] max number of pools per cluster

2018-02-11 Thread Konstantin Shalygin

And if for any reason even a single PG is damaged - for example, stuck
inactive - then all RBDs will be affected.

The first thing that comes to mind is to create a separate pool for every RBD.


I think this is insane.
It is better to think about how Ceph places data via CRUSH. Plan your failure 
domains and perform full-stack monitoring (hosts, power, network...).
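
A hedged sketch of what that looks like in practice, assuming a Luminous-era
cluster (the rule and pool names are placeholders):

    # replicate across hosts rather than OSDs, so no single host failure
    # can take out more than one copy of a PG
    ceph osd crush rule create-replicated rep_by_host default host
    ceph osd pool set rbd crush_rule rep_by_host
    ceph osd pool set rbd size 3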






k



Re: [ceph-users] Ceph Day Germany :)

2018-02-11 Thread ceph


On 9 February 2018 at 11:51:08 CET, Lenz Grimmer wrote:
>Hi all,
>
>On 02/08/2018 11:23 AM, Martin Emrich wrote:
>
>> I just want to thank all organizers and speakers for the awesome Ceph
>> Day at Darmstadt, Germany yesterday.
>> 
>> I learned of some cool stuff I'm eager to try out (NFS-Ganesha for RGW,
>> openATTIC, ...). The organization and food were great, too.
>
>I agree - thanks a lot to Danny Al-Gaaf and Leonardo for the overall
>organization, and of course the sponsors and speakers who made it
>happen! I too learned a lot.
>
>Lenz

I absolutely agree, too. This was really great! It would be fantastic if the 
Ceph Days happened again in Darmstadt - or Düsseldorf ;)

Btw, will the slides and perhaps videos of the presentations be available 
online?

Thanks again, guys - great day!
- Mehmet


Re: [ceph-users] degraded PGs when adding OSDs

2018-02-11 Thread Brad Hubbard
On Mon, Feb 12, 2018 at 8:51 AM, Simon Ironside  wrote:
> On 09/02/18 09:05, Janne Johansson wrote:
>>
>> 2018-02-08 23:38 GMT+01:00 Simon Ironside:
>>
>> Hi Everyone,
>> I recently added an OSD to an active+clean Jewel (10.2.3) cluster
>> and was surprised to see a peak of 23% objects degraded. Surely this
>> should be at or near zero and the objects should show as misplaced?
>> I've searched and found Chad William Seys' thread from 2015 but
>> didn't see any conclusion that explains this:
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html
>>
>> 
>>
>>   I agree. I always viewed it like this: you have three copies of your PG,
>> you add a new OSD, and that PG decides one of the copies should be on that
>> OSD instead of one of the 3 older ones. It simply stops caring about the old
>> copy, creates a new, empty PG on the new OSD, and while the sync towards the
>> new PG is running that copy is "behind" in the data it contains - but it
>> (and its 2 previous copies) are correctly placed for the new crush map.
>> Misplaced would probably be a more natural way of seeing it, at least if the
>> now-abandoned PG were still being updated while the sync runs, but I don't
>> think it is. It gets orphaned rather quickly as the new OSD kicks in.
>>
>> I guess this design choice boils down to "being able to handle someone
>> adding more OSDs to a cluster that is close to getting full", at the expense
>> of "discarding one or more of the old copies and scaring the admin as if
>> there was a huge issue when just adding one or many new shiny OSDs".
>
>
> It certainly does scare me, especially as this particular cluster is size=2,
> min_size=1.
>
> My worry is that I could experience a disk failure while adding a new OSD
> and potentially lose data

You've already indicated you are willing to accept data loss by
configuring size=2, min_size=1.

Search for "2x replication: A BIG warning"

> while if the same disk failed when the cluster was
> active+clean I wouldn't. That doesn't seem like a very safe design choice
> but perhaps the real answer is to use size=3.
>
> Reweighting an active OSD to 0 does the same thing on my cluster, causing the
> objects to go degraded instead of misplaced as I'd expect.
>
>
> Thanks,
> Simon.



-- 
Cheers,
Brad


Re: [ceph-users] degraded PGs when adding OSDs

2018-02-11 Thread Simon Ironside

On 09/02/18 09:05, Janne Johansson wrote:
2018-02-08 23:38 GMT+01:00 Simon Ironside:


Hi Everyone,
I recently added an OSD to an active+clean Jewel (10.2.3) cluster
and was surprised to see a peak of 23% objects degraded. Surely this
should be at or near zero and the objects should show as misplaced?
I've searched and found Chad William Seys' thread from 2015 but
didn't see any conclusion that explains this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html


  I agree. I always viewed it like this: you have three copies of your PG, 
you add a new OSD, and that PG decides one of the copies should be on that 
OSD instead of one of the 3 older ones. It simply stops caring about the 
old copy, creates a new, empty PG on the new OSD, and while the sync 
towards the new PG is running that copy is "behind" in the data it 
contains - but it (and its 2 previous copies) are correctly placed for 
the new crush map. Misplaced would probably be a more natural way of 
seeing it, at least if the now-abandoned PG were still being updated 
while the sync runs, but I don't think it is. It gets orphaned rather 
quickly as the new OSD kicks in.


I guess this design choice boils down to "being able to handle someone 
adding more OSDs to a cluster that is close to getting full", at the 
expense of "discarding one or more of the old copies and scaring the 
admin as if there was a huge issue when just adding one or many new 
shiny OSDs".


It certainly does scare me, especially as this particular cluster is 
size=2, min_size=1.


My worry is that I could experience a disk failure while adding a new 
OSD and potentially lose data while if the same disk failed when the 
cluster was active+clean I wouldn't. That doesn't seem like a very safe 
design choice but perhaps the real answer is to use size=3.


Reweighting an active OSD to 0 does the same thing on my cluster, causing 
the objects to go degraded instead of misplaced as I'd expect.
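
For illustration, a sketch of that experiment as I understand it (osd.42 is a
placeholder, and I'm assuming a CRUSH reweight is what is meant):

    ceph osd crush reweight osd.42 0   # drain the OSD out of the CRUSH map
    ceph pg stat                       # degraded object counts show up here,
    watch ceph -s                      # rather than just "misplaced"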


Thanks,
Simon.


Re: [ceph-users] ceph-disk vs. ceph-volume: both error prone

2018-02-11 Thread Willem Jan Withagen

On 09/02/2018 21:56, Alfredo Deza wrote:

On Fri, Feb 9, 2018 at 10:48 AM, Nico Schottelius wrote:


Dear list,

for a few days we have been dissecting ceph-disk and ceph-volume to find out
what the appropriate way of creating partitions for ceph is.


ceph-volume does not create partitions for ceph



For years already I have found ceph-disk (and especially ceph-deploy) very
error prone, and we at ungleich are considering rewriting both into a
ceph-block-do-what-I-want-tool.


This is not very simple; that is the reason why there are tools that
do this for you.



Only considering bluestore, I see that ceph-disk creates two partitions:

Device  StartEndSectors   Size Type
/dev/sde12048 206847 204800   100M Ceph OSD
/dev/sde2  206848 2049966046 2049759199 977.4G unknown

Does somebody know what exactly belongs on the XFS-formatted first
partition, and how the data/wal/db device sde2 is formatted?


If you must, I would encourage you to try ceph-disk out with full
verbosity and dissect all the system calls, which will answer how the
partitions are formatted
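
For example, something like the following (a hedged sketch; the device name is
a placeholder) should log every partitioning, mkfs and symlink step it runs:

    ceph-disk -v prepare --bluestore /dev/sde 2>&1 | tee ceph-disk-prepare.log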



What I would really like to know is how we can best extract this
information, so that we are not depending on ceph-{disk,volume} anymore.


Initially you mentioned partitions, but you want to avoid ceph-disk
and ceph-volume wholesale? That is going to take a lot more effort.
These tools not only "prepare" devices for Ceph consumption, they also
"activate" them when a system boots, talk to the cluster to register
the OSDs, etc... It isn't just partitioning (for ceph-disk).


I personally find it very annoying that ceph-disk tries to be friends 
with all the init tools that ship with the various Linux distributions. 
Let alone all the udev stuff that starts working on disks once they are 
introduced into the system.


And for FreeBSD I'm not suggesting using that, since it does not fit 
the FreeBSD paradigm: things like this are not really started 
automagically there.


So if it is only about creating the ceph-infra, things are relatively easy.

The actual work on the partitions is done with ceph-osd --mkfs and there 
is little magic about it. A few more options then tell BlueStore where 
its parts go if you want something other than the standard location.
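
A rough sketch of that manual route, under my own assumptions (the ceph-disk
style data-dir layout, with device names purely as placeholders):

    UUID=$(uuidgen)
    OSD_ID=$(ceph osd new "$UUID")        # register a new OSD id with the cluster
    mkdir -p /var/lib/ceph/osd/ceph-$OSD_ID
    ln -s /dev/sde2 /var/lib/ceph/osd/ceph-$OSD_ID/block        # BlueStore data device
    # optionally point db/wal elsewhere, e.g.:
    # ln -s /dev/nvme0n1p1 /var/lib/ceph/osd/ceph-$OSD_ID/block.db
    ceph-osd -i $OSD_ID --mkfs --mkkey --osd-uuid "$UUID"
    ceph auth add osd.$OSD_ID osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-$OSD_ID/keyring

Plus whatever your platform needs to mount/start it at boot, which is exactly 
the part ceph-disk/ceph-volume try to automate.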


Also, a large part of ceph-disk is complicated/obfuscated by the desire to 
run on encrypted disks and/or multipath disk providers...
Running it with verbose on gives a bit of info, but the Python code is 
convoluted and complex until you have it figured out. Then it starts to 
become simpler, but never easy. ;-)


Writing a script that does what ceph-disk does? Take a look at 
src/vstart in the source. That script builds a full cluster during 
testing and is way more legible.
I did so for my FreeBSD multi-server cluster tests, and it is not 
complex at all.
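
A hedged example of a typical vstart invocation from a build directory (the
exact flags vary between Ceph versions):

    MON=3 OSD=3 MDS=0 ../src/vstart.sh -n -d   # -n: new cluster, -d: debug output
    ./bin/ceph -s                              # talk to the throwaway dev cluster
    ../src/stop.sh                             # tear it down again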


Just my 2cts,
--WjW


Re: [ceph-users] Is there a "set pool readonly" command?

2018-02-11 Thread David Turner
If you set min_size to 2 or more, it will disable reads and writes to the
pool by blocking requests. min_size is the minimum number of copies of a PG
that need to be online to allow IO to the data, so if you only have 1 copy it
will prevent IO. It's not a flag you can set on the pool, but it should work
out. If you have size=3, then min_size=3 should block most IO until the
pool is almost fully backfilled.
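
A hedged sketch of that approach (the pool name is a placeholder):

    ceph osd pool set rbd min_size 3   # block IO until 3 copies are available
    # ...wait for backfill/recovery to finish, then restore normal service:
    ceph osd pool set rbd min_size 2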

On Sun, Feb 11, 2018, 9:46 AM Nico Schottelius wrote:

>
> Hello,
>
> we have one pool in which about 10 disks failed last week (fortunately
> mostly sequentially), which now has some pgs that are only left on
> one disk.
>
> Is there a command to set one pool into "read-only" mode or even
> "recovery io-only" mode, so that the only thing it is doing is
> recovering and no client i/o will disturb that process?
>
> Best,
>
> Nico
>
>
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch


[ceph-users] Is there a "set pool readonly" command?

2018-02-11 Thread Nico Schottelius

Hello,

we have one pool in which about 10 disks failed last week (fortunately
mostly sequentially), which now has some pgs that are only left on
one disk.

Is there a command to set one pool into "read-only" mode or even
"recovery io-only" mode, so that the only thing it is doing is
recovering and no client i/o will disturb that process?

Best,

Nico



--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch