Re: [ceph-users] fixable inconsistencies but more appears

2018-08-21 Thread Alfredo Daniel Rezinovsky

I'm running kernel 4.15.0-32-generic from Ubuntu.

Some kernel upgrade might have triggered the errors...



On 21/08/18 18:24, Paul Emmerich wrote:

You might be hitting http://tracker.ceph.com/issues/22464 in this
configuration (it doesn't always show up as the CRC error described
there)
Which kernel are you running?


Paul

2018-08-21 21:41 GMT+02:00 Alfredo Daniel Rezinovsky
:

Nope. I have plenty of RAM: 8 GB for 3 OSDs per node. Most of it is used for
buffering.




On 21/08/18 16:09, Paul Emmerich wrote:

Are you running tight on memory?

Paul

2018-08-21 20:37 GMT+02:00 Alfredo Daniel Rezinovsky
:

My cluster suddenly shows many inconsistent PGs.

with this kind of log

2018-08-21 15:29:39.065613 osd.2 osd.2 10.64.1.1:6801/1310438 146 :
cluster
[ERR] 2.61 shard 5: soid 2:864a5b37:::170510e.0004:head candidate
had a read error
2018-08-21 15:31:38.542447 osd.2 osd.2 10.64.1.1:6801/1310438 147 :
cluster
[ERR] 2.61 shard 5: soid 2:86783f28:::1241f7f.:head candidate
had a read error

All errors eventually get fixed with "ceph pg repair", but new inconsistencies
keep appearing.

SMART and kernel logs show no HDD problems.

I have BlueStore OSDs on HDDs with the journal on an SSD partition.

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo






--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Testing Weekly Tomorrow — With Kubernetes/Install discussion

2018-08-21 Thread Gregory Farnum
Hey all,
We've had the testing weekly call going on for several months now
(URL: https://ceph.com/testing/) and some people have found it useful,
but we haven't gotten many new attendees so here's a reminder:
Wednesday 8AM Pacific time, there's a BlueJeans session to discuss
anything about testing Ceph (particularly, but not exclusively,
teuthology). Etherpad containing occasionally-updated notes is at
https://pad.ceph.com/p/community-testing-weekly

But this week, there's an extra-special bonus topic! We're going to
discuss options for updating our test installation strategy so we can
include ceph-ansible/DeepSea/Rook testing in our regular suites, and
options for working with Kubernetes going forward (inside the tests?
Outside them as part of the framework? So many choices!). If you're
interested in helping shape how we test Ceph going forward, don't miss
it! :)
Thanks,
-Greg

To join the Meeting:
https://bluejeans.com/185197632
To join via phone :
1)  Dial:
 408-915-6466 (United States)
 (see all numbers - https://www.redhat.com/en/conference-numbers)
2)  Enter Conference ID : 185197632
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Still risky to remove RBD-Images?

2018-08-21 Thread ceph



On 20 August 2018 17:22:35 MESZ, Mehmet wrote:
>Hello,

Hello me,

>
>AFAIK removing big RBD images would lead Ceph to produce blocked
>requests - I don't mean caused by poor disks.
>
>Is this still the case with "Luminous (12.2.4)"?
>

To answer my own question :)
There is no problem; I had to delete the 2 TB image first and did not see any
blocked requests.

The space is freed over a few minutes.
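
For anyone who finds this thread later: the removal itself is just a plain
"rbd rm"; since Luminous an image can also be moved to the trash first and
purged later, which defers the slow part. A minimal sketch (names are
placeholders):

  rbd rm <pool>/<image>
  # or, deferred:
  rbd trash mv <pool>/<image>
  rbd trash rm <pool>/<image-id>   # later; the id is shown by "rbd trash ls"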

- Mehmet 

>I have a few images with
>
>- 2 terabytes
>- 5 terabytes
>and
>- 20 terabytes
>
>in size and have to delete the images.
>
>Would be nice if you could enlighten me :)
>
>- Mehmet
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] packages names for ubuntu/debian

2018-08-21 Thread Ken Dreyer
Yes, this is a bummer.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-November/022687.html

Unfortunately we chose to add the Ubuntu distro codename suffixes
like "xenial" to the ceph.com packages long ago, because who knew that
the release names would ever wrap around :)

If we were to switch to "ubuntuXX.XX" suffixes (like ubuntu.com does),
our existing "x" suffix would still sort higher than the "u" in "ubuntu". Maybe
we can fix this bug before Ubuntu releases hit "u" codenames again in
2028.

If you're do-release-upgrade'ing from Xenial to Bionic, once you've
ensured that you've disabled the xenial repos and enabled the bionic
ones on your cluster nodes:

A) If you were running the latest point release of Ceph, you'll need
to "downgrade" to get the bionic builds

B) If your Xenial boxes happened to be behind the latest Ceph point
release, then you can use apt to get to the latest Ceph point release
without the apt "downgrade" operation.
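
For case A, one way to force that "downgrade" is an apt pin; a rough sketch
(the version string is just whatever the bionic repo currently carries):

  # /etc/apt/preferences.d/ceph-bionic
  Package: *
  Pin: version 13.2.1-1bionic
  Pin-Priority: 1001

  # then, on each node:
  apt-get update
  apt-get --allow-downgrades dist-upgrade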

- Ken



On Mon, Aug 20, 2018 at 6:56 PM, Alfredo Daniel Rezinovsky
 wrote:
> On 20/08/18 03:50, Bastiaan Visser wrote:
>>
>> you should only use the 18.04 repo in 18.04, and remove the 16.04 repo.
>>
>> use:
>> https://download.ceph.com/debian-luminous bionic main
>>
>> - Bastiaan
>
>
> Right. But if I came from a working 16.04 system upgraded to 18.04, the ceph
> (xenial) packages are already there and won't upgrade to the bionic ones
> because the version names imply a downgrade.
>
>
>> - Original Message -
>> From: "Alfredo Daniel Rezinovsky"
>> 
>> To: "ceph-users" 
>> Sent: Sunday, August 19, 2018 10:15:00 PM
>> Subject: [ceph-users] packages names for ubuntu/debian
>>
>> The latest packages for Ubuntu 16.04 are version 13.2.1-1xenial,
>> while the latest packages for Ubuntu 18.04 are 13.2.1-1bionic.
>>
>> I recently upgraded from ubuntu 16 to 18 and the ceph packages stayed in
>> xenial because alphabetically xenial > bionic.
>>
>> I had to set the pinning to force the upgrade to bionic (which was treated
>> as a downgrade).
>>
>> In Ubuntu mailing lists they said those packages are "wrongly versioned".
>>
>> I think the names should be 13.2.1-1ubuntu16.04-xenial and
>> 13.2.1-1ubuntu18.04-bionic.
>>
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-fuse slow cache?

2018-08-21 Thread Stefan Kooman
Hi,

I'm trying to find out why ceph-fuse client(s) are slow. Luminous 12.2.7
Ceph cluster, Mimic 13.2.1 ceph-fuse client. Ubuntu xenial, 4.13.0-38-generic 
kernel.

Test case:
25 curl requests directed at a single threaded apache process (apache2
-X).

When the requests are handled by ceph-kernel client it takes about 1.5
seconds for the first "GET /" (all fs /inode dentrie caches dropped
before hand). Subsequent requests only take ~ 0.4 seconds. So fs caches
seem to do their job.

With ceph-fuse it's a bit different. The first and subsequent requests
("GET /") will take around 4-5 seconds every single time. As if the
ceph-fuse / vfs cache does not work. ceph daemon client.id dump_cache
shows that all dentries and inodes are in the cache (~589 of them). The relevant
part of one of them:

"caps": [
{
"mds": 0,
"auth": 1,
"mds": 0,
"ino": "0x1096a6a",
"cap_id": 50538956,
"issued": "pAsLsXsFscr",
"wanted": "-",
"seq": 1,
"issue_seq": 1,
"mseq": 0,
"gen": 0
}
],
"auth_cap": 0,
"dirty_caps": "-",
"shared_gen": 1,
"cache_gen": 1,
"hold_caps_until": "0.00",

Not sure what the "issued" caps indicate. This client is currently the only
client for this directory, so cache should not be invalidated.
On the MDS side I hardly see any request from this client (I checked
"dump_ops_in_flight" every second). So, I guess they should come from the
cache. But why does it take so long? I have run ceph-fuse in debug mode
(--debug-client=20) but this of course results in a lot of output, and I'm not
sure what to look for.

Watching "mds_requests" on the client every second does not show any request.

I know the performance of ceph kernel client is (much) better than ceph-fuse,
but does this also apply to objects in cache?
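
For completeness, besides dump_cache I can also pull counters from the admin
socket, and there are a couple of client cache options that could be worth
experimenting with (just candidates, not a known fix; the values below are the
defaults as far as I know):

  ceph daemon client.id perf dump

  # ceph.conf [client]
  # client cache size = 16384        # number of cached inodes
  # client oc size = 209715200       # object cache size in bytes
  # fuse disable pagecache = false   # bypass the kernel page cache if true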

Thanks for any hints.

Gr. Stefan

P.s. the ceph-fuse luminous 12.2.7 client shows the same result. The only
active MDS server has 256 GB of cache and has hardly any load, so most
inodes / dentries should be cached there as well.


-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-container - rbd map failing since upgrade?

2018-08-21 Thread Ilya Dryomov
On Tue, Aug 21, 2018 at 9:19 PM Jacob DeGlopper  wrote:
>
> I'm seeing an error from the rbd map command running in ceph-container;
> I had initially deployed this cluster as Luminous, but a pull of the
> ceph/daemon container unexpectedly upgraded me to Mimic 13.2.1.
>
> [root@nodeA2 ~]# ceph version
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
> (stable)
>
> [root@nodeA2 ~]# rbd info mysqlTB
> rbd image 'mysqlTB':
>  size 360 GiB in 92160 objects
>  order 22 (4 MiB objects)
>  id: 206a962ae8944a
>  block_name_prefix: rbd_data.206a962ae8944a
>  format: 2
>  features: layering
>  op_features:
>  flags:
>  create_timestamp: Sat Aug 11 00:00:36 2018
>
> [root@nodeA2 ~]# rbd map mysqlTB
> rbd: failed to add secret 'client.admin' to kernel
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (1) Operation not permitted
>
> [root@nodeA2 ~]# type rbd
> rbd is a function
> rbd ()
> {
>  sudo docker exec ceph-mon-nodeA2 rbd --cluster ceph ${@}
> }
>
> [root@nodeA2 ~]# ls -alF /etc/ceph/ceph.client.admin.keyring
> -rw--- 1 ceph ceph 159 May 21 09:27 /etc/ceph/ceph.client.admin.keyring
>
> System is CentOS 7 with the elrepo mainline kernel:
>
> [root@nodeA2 ~]# uname -a
> Linux nodeA2 4.18.3-1.el7.elrepo.x86_64 #1 SMP Sat Aug 18 09:30:18 EDT
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> I see a similar question here with no answer:
> https://github.com/ceph/ceph-container/issues/1030

Hi Jacob,

You mentioned an upgrade in the subject, did it work with luminous
ceph-container?

It seems unlikely -- docker blocks add_key(2) and other key management
related system calls with seccomp because the kernel keyring is global.
See https://docs.docker.com/engine/security/seccomp/.
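
If mapping from inside the container is really needed, one workaround (with
the obvious security trade-off, and only a sketch) is to relax the seccomp
profile for that container, e.g.:

  # broadest hammer: disable seccomp filtering for this container
  docker run --security-opt seccomp=unconfined ...

  # or: take Docker's default profile, whitelist add_key/keyctl/request_key,
  # and point the container at the modified copy
  docker run --security-opt seccomp=/path/to/modified-default.json ...

Otherwise, running "rbd map" on the host rather than inside the container
sidesteps the problem entirely.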

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] There's a way to remove the block.db ?

2018-08-21 Thread David Turner
They have talked about allowing people to do this, but for now there is
nothing you can do to remove the block.db or block.wal
from a bluestore OSD. However, there is an option to completely replace the
SSD, not remove it.  There are a few ML threads discussing how to utilize
dd to copy all of the contents of the current SSD to a new device and
configure your OSDs to use the new SSD.
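
The rough shape of that procedure is something like the following (a sketch
only; device names and uuids are placeholders, the new partition must be at
least as large as the old one, and the OSD has to stay stopped throughout):

  systemctl stop ceph-osd@$ID
  # copy the old DB partition onto the new SSD partition
  dd if=/dev/old-ssd-db-part of=/dev/new-ssd-db-part bs=1M conv=fsync
  # repoint the block.db symlink (ceph-disk style OSD shown here) and the
  # recorded uuid, then bring the OSD back up
  ln -sf /dev/disk/by-partuuid/<new-partuuid> /var/lib/ceph/osd/ceph-$ID/block.db
  echo "<new-partuuid>" > /var/lib/ceph/osd/ceph-$ID/block.db_uuid
  systemctl start ceph-osd@$ID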

On Tue, Aug 21, 2018 at 3:44 PM Alfredo Daniel Rezinovsky <
alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:

> My ceph-$ID mount point looks like this
>
> -rw-r--r-- 1 root root 438 Aug 15 11:05 activate.monmap
> -rw-r--r-- 1 ceph ceph   3 Aug 15 11:05 active
> lrwxrwxrwx 1 ceph ceph  58 Aug 15 11:05 block ->
> /dev/disk/by-partuuid/bd9f8501-2958-4294-8982-1e5cae80deef
> lrwxrwxrwx 1 ceph ceph  58 Aug 15 11:05 block.db ->
> /dev/disk/by-partuuid/ee7d30db-4899-49e7-a86d-018e4f97608b
> -rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 block.db_uuid
> -rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 block_uuid
> -rw-r--r-- 1 ceph ceph   2 Aug 15 11:05 bluefs
> -rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 ceph_fsid
> -rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 fsid
> -rw--- 1 ceph ceph  56 Aug 15 11:05 keyring
> -rw-r--r-- 1 ceph ceph   8 Aug 15 11:05 kv_backend
> -rw-r--r-- 1 ceph ceph  21 Aug 15 11:05 magic
> -rw-r--r-- 1 ceph ceph   4 Aug 15 11:05 mkfs_done
> -rw-r--r-- 1 ceph ceph   6 Aug 15 11:05 ready
> -rw-r--r-- 1 ceph ceph   0 Aug 16 15:18 systemd
> -rw-r--r-- 1 ceph ceph  10 Aug 15 11:05 type
> -rw-r--r-- 1 ceph ceph   2 Aug 15 11:05 whoami
>
> It is a BlueStore OSD with block.db on an SSD. I don't trust the SSD and I
> want to remove the block.db without destroying and re-creating the OSD.
>
> Is there a way to do this?
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] There's a way to remove the block.db ?

2018-08-21 Thread Alfredo Daniel Rezinovsky

My ceph-$ID mount point looks like this

-rw-r--r-- 1 root root 438 Aug 15 11:05 activate.monmap
-rw-r--r-- 1 ceph ceph   3 Aug 15 11:05 active
lrwxrwxrwx 1 ceph ceph  58 Aug 15 11:05 block -> 
/dev/disk/by-partuuid/bd9f8501-2958-4294-8982-1e5cae80deef
lrwxrwxrwx 1 ceph ceph  58 Aug 15 11:05 block.db -> 
/dev/disk/by-partuuid/ee7d30db-4899-49e7-a86d-018e4f97608b

-rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 block.db_uuid
-rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 block_uuid
-rw-r--r-- 1 ceph ceph   2 Aug 15 11:05 bluefs
-rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 ceph_fsid
-rw-r--r-- 1 ceph ceph  37 Aug 15 11:05 fsid
-rw--- 1 ceph ceph  56 Aug 15 11:05 keyring
-rw-r--r-- 1 ceph ceph   8 Aug 15 11:05 kv_backend
-rw-r--r-- 1 ceph ceph  21 Aug 15 11:05 magic
-rw-r--r-- 1 ceph ceph   4 Aug 15 11:05 mkfs_done
-rw-r--r-- 1 ceph ceph   6 Aug 15 11:05 ready
-rw-r--r-- 1 ceph ceph   0 Aug 16 15:18 systemd
-rw-r--r-- 1 ceph ceph  10 Aug 15 11:05 type
-rw-r--r-- 1 ceph ceph   2 Aug 15 11:05 whoami

It is a BlueStore OSD with block.db on an SSD. I don't trust the SSD and I
want to remove the block.db without destroying and re-creating the OSD.


Is there a way to do this?

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-container - rbd map failing since upgrade?

2018-08-21 Thread Jacob DeGlopper
I'm seeing an error from the rbd map command running in ceph-container; 
I had initially deployed this cluster as Luminous, but a pull of the 
ceph/daemon container unexpectedly upgraded me to Mimic 13.2.1.


[root@nodeA2 ~]# ceph version
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic 
(stable)


[root@nodeA2 ~]# rbd info mysqlTB
rbd image 'mysqlTB':
    size 360 GiB in 92160 objects
    order 22 (4 MiB objects)
    id: 206a962ae8944a
    block_name_prefix: rbd_data.206a962ae8944a
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Sat Aug 11 00:00:36 2018

[root@nodeA2 ~]# rbd map mysqlTB
rbd: failed to add secret 'client.admin' to kernel
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted

[root@nodeA2 ~]# type rbd
rbd is a function
rbd ()
{
    sudo docker exec ceph-mon-nodeA2 rbd --cluster ceph ${@}
}

[root@nodeA2 ~]# ls -alF /etc/ceph/ceph.client.admin.keyring
-rw--- 1 ceph ceph 159 May 21 09:27 /etc/ceph/ceph.client.admin.keyring

System is CentOS 7 with the elrepo mainline kernel:

[root@nodeA2 ~]# uname -a
Linux nodeA2 4.18.3-1.el7.elrepo.x86_64 #1 SMP Sat Aug 18 09:30:18 EDT 
2018 x86_64 x86_64 x86_64 GNU/Linux


I see a similar question here with no answer: 
https://github.com/ceph/ceph-container/issues/1030


dmesg shows nothing related:

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fixable inconsistencies but more appears

2018-08-21 Thread Paul Emmerich
Are you running tight on memory?

Paul

2018-08-21 20:37 GMT+02:00 Alfredo Daniel Rezinovsky
:
> My cluster suddenly shows many inconsistent PGs.
>
> with this kind of log
>
> 2018-08-21 15:29:39.065613 osd.2 osd.2 10.64.1.1:6801/1310438 146 : cluster
> [ERR] 2.61 shard 5: soid 2:864a5b37:::170510e.0004:head candidate
> had a read error
> 2018-08-21 15:31:38.542447 osd.2 osd.2 10.64.1.1:6801/1310438 147 : cluster
> [ERR] 2.61 shard 5: soid 2:86783f28:::1241f7f.:head candidate
> had a read error
>
> All errors eventually get fixed with "ceph pg repair", but new
> inconsistencies keep appearing.
>
> SMART and kernel logs show no HDD problems.
>
> I have BlueStore OSDs on HDDs with the journal on an SSD partition.
>
> --
> Alfredo Daniel Rezinovsky
> Director de Tecnologías de Información y Comunicaciones
> Facultad de Ingeniería - Universidad Nacional de Cuyo
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fixable inconsistencies but more appears

2018-08-21 Thread Alfredo Daniel Rezinovsky

My cluster suddenly shows many inconsistent PGs.

with this kind of log

2018-08-21 15:29:39.065613 osd.2 osd.2 10.64.1.1:6801/1310438 146 : 
cluster [ERR] 2.61 shard 5: soid 2:864a5b37:::170510e.0004:head 
candidate had a read error
2018-08-21 15:31:38.542447 osd.2 osd.2 10.64.1.1:6801/1310438 147 : 
cluster [ERR] 2.61 shard 5: soid 2:86783f28:::1241f7f.:head 
candidate had a read error


All errors eventually get fixed with "ceph pg repair", but new inconsistencies
keep appearing.


SMART and kernel logs show no HDD problems.

I have BlueStore OSDs on HDDs with the journal on an SSD partition.
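
For reference, the inconsistencies can be inspected and repaired roughly like
this (2.61 is just the PG from the log above):

  ceph health detail                                      # lists the inconsistent PGs
  rados list-inconsistent-obj 2.61 --format=json-pretty   # shows the failing shard/object
  ceph pg repair 2.61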

--
Alfredo Daniel Rezinovsky
Director de Tecnologías de Información y Comunicaciones
Facultad de Ingeniería - Universidad Nacional de Cuyo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-21 Thread David Turner
The problem with the current OSDs was a poorly advised chmod of the OSD
data store.  From what I've pieced together the chmod was run against a
running OSD.

On Tue, Aug 21, 2018 at 1:13 PM Paul Emmerich 
wrote:

> I would continue with the upgrade of all OSDs in this scenario, as the old
> ones are crashing, not the new ones.
> Maybe with all the flags set (pause, norecover, ...)
>
>
> Paul
>
> 2018-08-21 19:08 GMT+02:00 Kees Meijs :
> > Hello David,
> >
> > Thank you and I'm terribly sorry; I was unaware I was starting new
> > threads.
> >
> > Off the top of my head I'd say "yes, it'll fit", but obviously I'll make
> > sure first.
> >
> > Regards,
> > Kees
> >
> > On 21-08-18 16:34, David Turner wrote:
> >>
> >> Ceph does not support downgrading OSDs.  When you removed the single
> OSD,
> >> it was probably trying to move data onto the other OSDs in the node with
> >> Infernalis OSDs.  I would recommend stopping every OSD in that node and
> >> marking them out so the cluster will rebalance without them.  Assuming
> your
> >> cluster is able to get healthy after that, we'll see where things are.
> >>
> >> Also, please stop opening so many email threads about this same issue.
> It
> >> makes tracking this in the archives impossible.
> >>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90 <+49%2089%20189658590>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-08-21 Thread Jason Dillaman
Can you collect any librados / librbd debug logs and provide them via
pastebin? Just add / tweak the following in your "/etc/ceph/ceph.conf"
file's "[client]" section and re-run to gather the logs.

[client]
log file = /path/to/a/log/file
debug ms = 1
debug monc = 20
debug objecter = 20
debug rados = 20
debug rbd = 20
On Mon, Aug 20, 2018 at 12:55 PM Andre Goree  wrote:
>
> This issue first started while using Luminous 12.2.5, I upgraded to 12.2.7 
> and it's still present.  This issue is _not_ present in 12.2.4.
>
> With Ceph 12.2.4, using QEMU/KVM + Libvirt, I'm able to mount an rbd image 
> using the following syntax and populated xml:
>
> 'virsh attach-device $vm foo.xml --persistent'
>
> xml contents:
> [disk device XML omitted; the markup was stripped by the list archive]
>
> I receive this error:
> ~# virsh attach-device $vm foo.xml --persistent
> error: Failed to attach device from foo.xml
> error: internal error: unable to execute QEMU command 'device_add': Property 
> 'scsi-hd.drive' can't find value 'drive-scsi0-0-0-1'
>
> I've tried different things with the XML, but nothing seems to work, always 
> failing with the above error.  This does _not_ happen with our cluster 
> running 12.2.4, the same exact command with a cluster using an identical 
> configuration (for all intents and purposes).
>
> Any thoughts?  Hard to believe I'm the only one to hit this if it's indeed a 
> bug, but I haven't found anyone else having the issue through interweb 
> searches.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-21 Thread Paul Emmerich
I would continue with the upgrade of all OSDs in this scenario, as the old
ones are crashing, not the new ones.
Maybe with all the flags set (pause, norecover, ...)


Paul

2018-08-21 19:08 GMT+02:00 Kees Meijs :
> Hello David,
>
> Thank you and I'm terribly sorry; I was unaware I was starting new threads.
>
> Off the top of my head I'd say "yes, it'll fit", but obviously I'll make sure
> first.
>
> Regards,
> Kees
>
> On 21-08-18 16:34, David Turner wrote:
>>
>> Ceph does not support downgrading OSDs.  When you removed the single OSD,
>> it was probably trying to move data onto the other OSDs in the node with
>> Infernalis OSDs.  I would recommend stopping every OSD in that node and
>> marking them out so the cluster will rebalance without them.  Assuming your
>> cluster is able to get healthy after that, we'll see where things are.
>>
>> Also, please stop opening so many email threads about this same issue.  It
>> makes tracking this in the archives impossible.
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-21 Thread Kees Meijs

Hello David,

Thank you and I'm terribly sorry; I was unaware I was starting new threads.

Off the top of my head I'd say "yes, it'll fit", but obviously I'll make sure
first.


Regards,
Kees

On 21-08-18 16:34, David Turner wrote:
Ceph does not support downgrading OSDs.  When you removed the single 
OSD, it was probably trying to move data onto the other OSDs in the 
node with Infernalis OSDs.  I would recommend stopping every OSD in 
that node and marking them out so the cluster will rebalance without 
them.  Assuming your cluster is able to get healthy after that, we'll 
see where things are.


Also, please stop opening so many email threads about this same 
issue.  It makes tracking this in the archives impossible.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-08-21 Thread Konstantin Shalygin

On 08/21/2018 11:44 PM, Andre Goree wrote:


Thank you for your reply.


Interestingly, the same (or similar enough) settings still fail here.  
Which version of libvirt are you using?  I think maybe this is a 
libvirt or QEMU bug, and not specifically Ceph...?



~# qemu-system-x86_64 --version

QEMU emulator version 2.5.0
~# virsh --version
1.3.1




# /usr/libexec/qemu-kvm --version
QEMU emulator version 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
# virsh --version
3.9.0




k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question about 'firstn|indep'

2018-08-21 Thread Cody
Hi everyone,

I read an earlier thread [1] that gave a good explanation of the 'step
choose|chooseleaf' option. Could someone further help me understand
the 'firstn|indep' part? Also, what is the relationship between 'step
take' and 'step choose|chooseleaf' when it comes to defining a failure
domain?

Thank you very much.

Regards,
Cody

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/010370.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Documentation regarding log file structure

2018-08-21 Thread Gregory Farnum
I don't think so. Sorry. :(

On Tue, Aug 21, 2018 at 12:06 AM Uwe Sauter  wrote:

> Hi list,
>
> does documentation exist that explains the structure of Ceph log files?
> Other than the source code?
>
> Thanks,
>
> Uwe
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Gregory Farnum
You should be able to create issues now; we had a misconfiguration in
the tracker following the recent spam attack.
-Greg

On Tue, Aug 21, 2018 at 3:07 AM, Stefan Priebe - Profihost AG
 wrote:
>
> Am 21.08.2018 um 12:03 schrieb Stefan Priebe - Profihost AG:
>>
>> Am 21.08.2018 um 11:56 schrieb Dan van der Ster:
>>> On Tue, Aug 21, 2018 at 11:54 AM Stefan Priebe - Profihost AG
>>>  wrote:

 Am 21.08.2018 um 11:47 schrieb Dan van der Ster:
> On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
>  wrote:
>>
>>
>> Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
>>> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
>>>  wrote:


 Am 20.08.2018 um 21:52 schrieb Sage Weil:
> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> since Loic seems to have left Ceph development and his wonderful crush
>> optimization tool isn't working anymore, I'm trying to get a good
>> distribution with the ceph balancer.
>>
>> Sadly it does not work as well as I want.
>>
>> # ceph osd df | sort -k8
>>
>> shows 75 to 83% usage, which is an 8% difference and too much for me.
>> I'm optimizing by bytes.
>>
>> # ceph balancer eval
>> current cluster score 0.005420 (lower is better)
>>
>> # ceph balancer eval $OPT_NAME
>> plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
>>
>> I'm unable to optimize further ;-( Is there any chance to optimize
>> further, even at the cost of more rebalancing?
>
> The scoring that the balancer module is doing is currently a hybrid 
> of pg
> count, bytes, and object count.  Picking a single metric might help a 
> bit
> (as those 3 things are not always perfectly aligned).

 Hi,

 OK, I found a bug in the balancer code which seems to be present in all
 releases.

  861 best_ws = next_ws
  862 best_ow = next_ow


 should be:

  861 best_ws = copy.deepcopy(next_ws)
  862 best_ow = copy.deepcopy(next_ow)

 otherwise it does not use the best but the last.
>>>
>>> Interesting... does that change improve things?
>>
>> It fixes the following (mgr debug output):
>> 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, trying smaller step 0.000244
>> 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001152, misplacing 0.001141
>> 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
>> 0.001155 -> 0.001152
>>
>> BUT:
>> # ceph balancer eval myplan
>> plan myplan final score 0.001180 (lower is better)
>>
>> So the final plan does NOT contain the expected optimization. The
>> deepcopy fixes it.
>>
>> After:
>> # ceph balancer eval myplan
>> plan myplan final score 0.001152 (lower is better)
>>
>
> OK that looks 

Re: [ceph-users] OSD Crash When Upgrading from Jewel to Luminous?

2018-08-21 Thread Kenneth Van Alstyne
After looking into this further, is it possible that adjusting CRUSH weight of 
the OSDs while running mis-matched versions of the ceph-osd daemon across the 
cluster can cause this issue?  Under certain circumstances in our cluster, this 
may happen automatically on the backend.  I can’t duplicate the issue in a lab, 
but highly suspect this is what happened.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 2 / ISO 27001 / CMMI Level 3


On Aug 17, 2018, at 4:01 PM, Gregory Farnum <gfar...@redhat.com> wrote:

Do you have more logs that indicate what state machine event the crashing OSDs 
received? This obviously shouldn't have happened, but it's a plausible failure 
mode, especially if it's a relatively rare combination of events.
-Greg

On Fri, Aug 17, 2018 at 4:49 PM Kenneth Van Alstyne <kvanalst...@knightpoint.com> wrote:
Hello all:
I ran into an issue recently with one of my clusters when upgrading 
from 10.2.10 to 12.2.7.  I have previously tested the upgrade in a lab and 
upgraded one of our five production clusters with no issues.  On the second 
cluster, however, I ran into an issue where all OSDs that were NOT running 
Luminous yet (which was about 40% of the cluster at the time) all crashed with 
the same backtrace, which I have pasted below:

===
 0> 2018-08-13 17:35:13.160849 7f145c9ec700 -1 osd/PG.cc: In 
function 
'PG::RecoveryState::Crashed::Crashed(boost::statechart::state::my_context)' thread 7f145c9ec700 time 
2018-08-13 17:35:13.157319
osd/PG.cc: 5860: FAILED assert(0 == "we got a bad state machine 
event")

 ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) 
[0x55b9bf08614f]
 2: 
(PG::RecoveryState::Crashed::Crashed(boost::statechart::state, (boost::statechart::history_mode)0>::my_context)+0xc4) 
[0x55b9bea62db4]
 3: (()+0x447366) [0x55b9bea9a366]
 4: (boost::statechart::simple_state, 
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base 
const&, void const*)+0x2f7) [0x55b9beac8b77]
 5: (boost::statechart::state_machine, 
boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
 const&)+0x6b) [0x55b9beaab5bb]
 6: (PG::handle_peering_event(std::shared_ptr, 
PG::RecoveryCtx*)+0x384) [0x55b9bea7db14]
 7: (OSD::process_peering_events(std::__cxx11::list > 
const&, ThreadPool::TPHandle&)+0x263) [0x55b9be9d1723]
 8: (ThreadPool::BatchWorkQueue::_void_process(void*, 
ThreadPool::TPHandle&)+0x2a) [0x55b9bea1274a]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb0) [0x55b9bf076d40]
 10: (ThreadPool::WorkThread::entry()+0x10) [0x55b9bf077ef0]
 11: (()+0x7507) [0x7f14e2c96507]
 12: (clone()+0x3f) [0x7f14e0ca214f]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
===

Once I restarted the impacted OSDs, which brought them up to 12.2.7, everything 
recovered just fine and the cluster is healthy.  The only rub is that losing 
that many OSDs simultaneously caused a significant I/O disruption to the 
production servers for several minutes while I brought up the remaining OSDs.  
I have been trying to duplicate this issue in a lab again before continuing the 
upgrades on the other three clusters, but am coming up short.  Has anyone seen 
anything like this and am I missing something obvious?

Given how quickly the issue happened and the fact that I’m having a hard time 
reproducing this issue, I am limited in the amount of logging and debug 
information I have available, unfortunately.  If it helps, all ceph-mon, 
ceph-mds, radosgw, and ceph-mgr daemons were running 12.2.7, while 30 of the 50 
total ceph-osd daemons were also on 12.2.7 when the remaining 20 ceph-osd 
daemons (on 10.2.10) crashed.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 
20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 2 / ISO 27001 / CMMI 

Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-21 Thread David Turner
Ceph does not support downgrading OSDs.  When you removed the single OSD,
it was probably trying to move data onto the other OSDs in the node with
Infernalis OSDs.  I would recommend stopping every OSD in that node and
marking them out so the cluster will rebalance without them.  Assuming your
cluster is able to get healthy after that, we'll see where things are.

Also, please stop opening so many email threads about this same issue.  It
makes tracking this in the archives impossible.

On Mon, Aug 20, 2018 at 9:45 PM Kees Meijs  wrote:

> Hi there,
>
> A few hours ago I started the given OSD again and gave it weight 1.0.
> Backfilling started and more PGs became active+clean.
>
> After a while the same crashing behaviour started to act up so I stopped
> the backfilling.
>
> Running with noout,nobackfill,norebalance,noscrub,nodeep-scrub flags now
> but at least it seems the cluster seems stable (fingers crossed...)
>
> Possible plan of attack:
>
>1. Stopping all Infernalis OSDs.
>2. Remove Ceph Infernalis packages from OSD node.
>3. Install Hammer packages.
>4. Start the OSDs (or maybe the package installation does this
>already.)
>
> Effectively this is an OSD downgrade. Is this supported or does Ceph
> "upgrade" data structures on disk as well?
>
> Recap: this would imply going from Infernalis back to Hammer.
>
> Any thoughts are more than welcome (maybe a completely different approach
> makes sense...) Meanwhile, I'll try to catch some sleep.
>
> Thanks, thanks!
>
> Best regards,
> Kees
> On 20-08-18 21:46, Kees Meijs wrote:
>
> Other than restarting the "out" and stopped OSD for the time being
> (haven't tried that yet) I'm quite lost.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] backporting to luminous librgw: export multitenancy support

2018-08-21 Thread Marc Roos


Can this be added to luminous?

https://github.com/ceph/ceph/pull/19358

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions on CRUSH map

2018-08-21 Thread Cody
Hi Konstantin,

I could only dream of reading this answer! Thank you so much!!!

Regards,
Cody


On Tue, Aug 21, 2018 at 8:50 AM Konstantin Shalygin  wrote:
>
> On 08/20/2018 08:15 PM, Cody wrote:
>
> Hi Konstantin,
>
> Thank you for looking into my question.
>
> I was trying to understand how to set up CRUSH hierarchies and set
> rules for different failure domains. I am particularly confused by the
> 'step take' and 'step choose|chooseleaf' settings for which I think
> are the keys for defining a failure domain in a CRUSH rule.
>
> As for my hypothetical cluster, it is made of 3 racks with 2 hosts on
> each. One host has 3 SSD-based OSDs and the other has 3 HDD-based
> OSDs. I wished to create two rules: one uses SSD-only and another
> HDD-only. Both rules should have a rack level failure domain.
>
> I have attached a diagram that may help to explain my setup. The
> following is my CRUSH map configuration (with all typos fixed) for
> review:
>
> device 0 osd.0 class ssd
> device 1 osd.1 class ssd
> device 2 osd.2 class ssd
> device 3 osd.3 class hdd
> device 4 osd.4 class hdd
> device 5 osd.5 class hdd
> device 6 osd.6 class ssd
> device 7 osd.7 class ssd
> device 8 osd.8 class ssd
> device 9 osd.9 class hdd
> device 10 osd.10 class hdd
> device 11 osd.11 class hdd
> device 12 osd.12 class ssd
> device 13 osd.13 class ssd
> device 14 osd.14 class ssd
> device 15 osd.15 class hdd
> device 16 osd.17 class hdd
> device 17 osd.17 class hdd
>
>   host a1-1 {
>   id -1
>   alg straw
>   hash 0
>   item osd.0 weight 1.00
>   item osd.1 weight 1.00
>   item osd.2 weight 1.00
>   }
>
>   host a1-2 {
>   id -2
>   alg straw
>   hash 0
>   item osd.3 weight 1.00
>   item osd.4 weight 1.00
>   item osd.5 weight 1.00
>   }
>
>   host a2-1 {
>   id -3
>   alg straw
>   hash 0
>   item osd.6 weight 1.00
>   item osd.7 weight 1.00
>   item osd.8 weight 1.00
>   }
>
>   host a2-2 {
>   id -4
>   alg straw
>   hash 0
>   item osd.9 weight 1.00
>   item osd.10 weight 1.00
>   item osd.11 weight 1.00
>   }
>
>   host a3-1 {
>   id -5
>   alg straw
>   hash 0
>   item osd.12 weight 1.00
>   item osd.13 weight 1.00
>   item osd.14 weight 1.00
>   }
>
>   host a3-2 {
>   id -6
>   alg straw
>   hash 0
>   item osd.15 weight 1.00
>   item osd.16 weight 1.00
>   item osd.17 weight 1.00
>   }
>
>   rack a1 {
>   id -7
>   alg straw
>   hash 0
>   item a1-1 weight 3.0
>   item a1-2 weight 3.0
>   }
>
>   rack a2 {
>   id -5
>   alg straw
>   hash 0
>   item a2-1 weight 3.0
>   item a2-2 weight 3.0
>   }
>
>   rack a3 {
>   id -6
>   alg straw
>   hash 0
>   item a3-1 weight 3.0
>   item a3-2 weight 3.0
>   }
>
>   row a {
>   id -7
>   alg straw
>   hash 0
>   item a1 6.0
>   item a2 6.0
>   item a3 6.0
>   }
>
>   rule ssd {
>   id 1
>   type replicated
>   min_size 2
>   max_size 11
>   step take a class ssd
>   step chooseleaf firstn 0 type rack
>   step emit
>   }
>
>   rule hdd {
>   id 2
>   type replicated
>   min_size 2
>   max_size 11
>   step take a class hdd
>   step chooseleaf firstn 0 type rack
>   step emit
>   }
>
>
> Are the two rules correct?
>
>
>
> The times when you needed to manually edit the CRUSH map are gone. Manual
> editing, even in your case, has already led to errors.
>
>
>
> # create new datacenter and move it to default root
> ceph osd crush add-bucket new_datacenter datacenter
> ceph osd crush move new_datacenter root=default
> # create our racks
> ceph osd crush add-bucket rack_a1 rack
> ceph osd crush add-bucket rack_a2 rack
> ceph osd crush add-bucket rack_a3 rack
> # move our racks to our datacenter
> ceph osd crush move rack_a1 datacenter=new_datacenter
> ceph osd crush move rack_a2 datacenter=new_datacenter
> ceph osd crush move rack_a3 datacenter=new_datacenter
> # create our hosts
> ceph osd crush add-bucket host_a1-1 host
> ceph osd crush add-bucket host_a1-2 host
> ceph osd crush add-bucket host_a2-1 host
> ceph osd crush add-bucket host_a2-2 host
> ceph osd crush add-bucket host_a3-1 host
> ceph osd crush add-bucket host_a3-2 host
> # and move it to racks
> ceph osd crush move host_a1-1 rack=rack_a1
> ceph osd crush move host_a1-2 rack=rack_a1
> ceph osd crush move host_a2-1 rack=rack_a2
> ceph osd crush move host_a2-2 rack=rack_a2
> ceph osd crush move host_a3-1 rack=rack_a3
> ceph osd crush move host_a3-2 rack=rack_a3
> # now it's time to deploy osds. when osds are 'up' and 'in' and the proper class
> # is assigned we can move them to hosts. In case the class is wrong, i.e.
> # 'nvme' is detected as 'ssd' we can rewrite device class like this:
> ceph osd crush rm-device-class osd.5
> ceph osd crush set-device-class nvme osd.5
> # okay, `ceph osd tree` show our osds with device classes, move it to hosts:
> ceph osd crush move 

Re: [ceph-users] Questions on CRUSH map

2018-08-21 Thread Konstantin Shalygin

On 08/20/2018 08:15 PM, Cody wrote:

Hi Konstantin,

Thank you for looking into my question.

I was trying to understand how to set up CRUSH hierarchies and set
rules for different failure domains. I am particularly confused by the
'step take' and 'step choose|chooseleaf' settings for which I think
are the keys for defining a failure domain in a CRUSH rule.

As for my hypothetical cluster, it is made of 3 racks with 2 hosts on
each. One host has 3 SSD-based OSDs and the other has 3 HDD-based
OSDs. I wished to create two rules: one uses SSD-only and another
HDD-only. Both rules should have a rack level failure domain.

I have attached a diagram that may help to explain my setup. The
following is my CRUSH map configuration (with all typos fixed) for
review:

device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class hdd
device 16 osd.17 class hdd
device 17 osd.17 class hdd

   host a1-1 {
   id -1
   alg straw
   hash 0
   item osd.0 weight 1.00
   item osd.1 weight 1.00
   item osd.2 weight 1.00
   }

   host a1-2 {
   id -2
   alg straw
   hash 0
   item osd.3 weight 1.00
   item osd.4 weight 1.00
   item osd.5 weight 1.00
   }

   host a2-1 {
   id -3
   alg straw
   hash 0
   item osd.6 weight 1.00
   item osd.7 weight 1.00
   item osd.8 weight 1.00
   }

   host a2-2 {
   id -4
   alg straw
   hash 0
   item osd.9 weight 1.00
   item osd.10 weight 1.00
   item osd.11 weight 1.00
   }

   host a3-1 {
   id -5
   alg straw
   hash 0
   item osd.12 weight 1.00
   item osd.13 weight 1.00
   item osd.14 weight 1.00
   }

   host a3-2 {
   id -6
   alg straw
   hash 0
   item osd.15 weight 1.00
   item osd.16 weight 1.00
   item osd.17 weight 1.00
   }

   rack a1 {
   id -7
   alg straw
   hash 0
   item a1-1 weight 3.0
   item a1-2 weight 3.0
   }

   rack a2 {
   id -5
   alg straw
   hash 0
   item a2-1 weight 3.0
   item a2-2 weight 3.0
   }

   rack a3 {
   id -6
   alg straw
   hash 0
   item a3-1 weight 3.0
   item a3-2 weight 3.0
   }

   row a {
   id -7
   alg straw
   hash 0
   item a1 6.0
   item a2 6.0
   item a3 6.0
   }

   rule ssd {
   id 1
   type replicated
   min_size 2
   max_size 11
   step take a class ssd
   step chooseleaf firstn 0 type rack
   step emit
   }

   rule hdd {
   id 2
   type replicated
   min_size 2
   max_size 11
   step take a class hdd
   step chooseleaf firstn 0 type rack
   step emit
   }


Are the two rules correct?




The times when you needed to manually edit the CRUSH map are gone. Manual
editing, even in your case, has already led to errors.




# create new datacenter and move it to default root
ceph osd crush add-bucket new_datacenter datacenter
ceph osd crush move new_datacenter root=default
# create our racks
ceph osd crush add-bucket rack_a1 rack
ceph osd crush add-bucket rack_a2 rack
ceph osd crush add-bucket rack_a3 rack
# move our racks to our datacenter
ceph osd crush move rack_a1 datacenter=new_datacenter
ceph osd crush move rack_a2 datacenter=new_datacenter
ceph osd crush move rack_a3 datacenter=new_datacenter
# create our hosts
ceph osd crush add-bucket host_a1-1 host
ceph osd crush add-bucket host_a1-2 host
ceph osd crush add-bucket host_a2-1 host
ceph osd crush add-bucket host_a2-2 host
ceph osd crush add-bucket host_a3-1 host
ceph osd crush add-bucket host_a3-2 host
# and move it to racks
ceph osd crush move host_a1-1 rack=rack_a1
ceph osd crush move host_a1-2 rack=rack_a1
ceph osd crush move host_a2-1 rack=rack_a2
ceph osd crush move host_a2-2 rack=rack_a2
ceph osd crush move host_a3-1 rack=rack_a3
ceph osd crush move host_a3-2 rack=rack_a3
# now it's time to deploy osds. when osds are 'up' and 'in' and the proper class
# is assigned we can move them to hosts. In case the class is wrong, i.e.
# 'nvme' is detected as 'ssd' we can rewrite device class like this:
ceph osd crush rm-device-class osd.5
ceph osd crush set-device-class nvme osd.5
# okay, `ceph osd tree` show our osds with device classes, move it to hosts:
ceph osd crush move osd.0 host=host_a1-1
ceph osd crush move osd.1 host=host_a1-1
ceph osd crush move osd.2 host=host_a1-1
ceph osd crush move osd.3 host=host_a1-2
ceph osd crush move osd.4 host=host_a1-2
ceph osd crush move osd.5 host=host_a1-2
...
# when this done we should reweight osds on crush map
# ssd drives is 960Gb
ceph osd crush reweight osd.0 0.960
ceph osd crush reweight osd.1 0.960
ceph osd crush reweight osd.2 
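
The class-aware rules from the original question can then be created straight
from the CLI as well; a sketch, using the bucket names from above and
placeholder pool names:

ceph osd crush rule create-replicated rule_ssd new_datacenter rack ssd
ceph osd crush rule create-replicated rule_hdd new_datacenter rack hdd
# and assign them to the pools:
ceph osd pool set <ssd_pool> crush_rule rule_ssd
ceph osd pool set <hdd_pool> crush_rule rule_hdd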

Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-21 Thread Ilya Dryomov
On Mon, Aug 20, 2018 at 9:49 PM Dan van der Ster  wrote:
>
> On Mon, Aug 20, 2018 at 5:37 PM Ilya Dryomov  wrote:
> >
> > On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder
> >  wrote:
> > >
> > > Hi Cephers,
> > >
> > >
> > > I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to
> > > luminous?
> > > As far as I see there is some luminous related stuff that was
> > > backported, however,
> > > the "ceph features" command just reports "jewel" as release of my cephfs
> > > clients running CentOS 7.5 (kernel 3.10.0-862.11.6.el7.x86_64)
> > >
> > >
> > > {
> > > "mon": {
> > > "group": {
> > > "features": "0x3ffddff8eea4fffb",
> > > "release": "luminous",
> > > "num": 3
> > > }
> > > },
> > > "mds": {
> > > "group": {
> > > "features": "0x3ffddff8eea4fffb",
> > > "release": "luminous",
> > > "num": 3
> > > }
> > > },
> > > "osd": {
> > > "group": {
> > > "features": "0x3ffddff8eea4fffb",
> > > "release": "luminous",
> > > "num": 240
> > > }
> > > },
> > > "client": {
> > > "group": {
> > > "features": "0x7010fb86aa42ada",
> > > "release": "jewel",
> > > "num": 23
> > > },
> > > "group": {
> > > "features": "0x3ffddff8eea4fffb",
> > > "release": "luminous",
> > > "num": 4
> > > }
> > > }
> > > }
> > >
> > >
> > > This prevents me to run ceph balancer using the upmap mode.
> > >
> > >
> > > Any idea?
> >
> > Hi Dietmar,
> >
> > All luminous features are supported in RedHat/CentOS 7.5, but it shows
> > up as jewel due to a technicality.
>
> Except rados namespaces, right? Manila CephFS shares are not yet
> mountable with 7.5.

Yes, I was talking about cluster-wide feature bits, as that is what
"ceph features" is about.  CephFS layouts with namespaces are indeed
not supported in 7.5.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-21 Thread Dietmar Rieder
On 08/21/2018 02:22 PM, Ilya Dryomov wrote:
> On Tue, Aug 21, 2018 at 9:12 AM Dietmar Rieder
>  wrote:
>>
>> On 08/20/2018 05:36 PM, Ilya Dryomov wrote:
>>> On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder
>>>  wrote:

 Hi Cephers,


 I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to
 luminous?
 As far as I see there is some luminous related stuff that was
 backported, however,
 the "ceph features" command just reports "jewel" as release of my cephfs
 clients running CentOS 7.5 (kernel 3.10.0-862.11.6.el7.x86_64)


 {
 "mon": {
 "group": {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 3
 }
 },
 "mds": {
 "group": {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 3
 }
 },
 "osd": {
 "group": {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 240
 }
 },
 "client": {
 "group": {
 "features": "0x7010fb86aa42ada",
 "release": "jewel",
 "num": 23
 },
 "group": {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 4
 }
 }
 }


 This prevents me to run ceph balancer using the upmap mode.


 Any idea?
>>>
>>> Hi Dietmar,
>>>
>>> All luminous features are supported in RedHat/CentOS 7.5, but it shows
>>> up as jewel due to a technicality.  Just do
>>>
>>>   $ ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
>>>
>>> to override the safety check.
>>>
>>> See https://www.spinics.net/lists/ceph-users/msg45071.html for details.
>>> It references an upstream kernel, but both the problem and the solution
>>> are the same.
>>>
>>
>> Hi Ilya,
>>
>> thank you for your answer.
>>
>> Just to make sure:
>> The thread you are referring to, is about kernel 4.13+, is this also
>> true for the "official" RedHat/CentOS 7.5 kernel 3.10
>> (3.10.0-862.11.6.el7.x86_64) ?
> 
> Yes, it is.
> 

Thanks

Dietmar




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network cluster / addr

2018-08-21 Thread David Turner
Private is only for OSDs. Nothing else communicates on that. MONs, MGRs,
MDSs, RGWs, and clients all communicate on the public network. Even OSDs
need to communicate with MONs on the public network.

All of that said, it is generally considered useless to split your private
and public subnets. Even on very large clusters there is very little
difference in modern ceph. It is definitely not worth the complexity of multiple
switches; multiple VLANs, maybe.
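
For reference, the split itself is just two options in ceph.conf; the subnets
below are examples only:

[global]
public network = 192.168.100.0/24
cluster network = 10.10.10.0/24

Everything except OSD-to-OSD replication and recovery traffic uses the public
network.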

On Tue, Aug 21, 2018, 3:34 AM Janne Johansson  wrote:

> Den tis 21 aug. 2018 kl 09:31 skrev Nino Bosteels <
> n.boste...@proximedia.be>:
>
>>
>> * Does ceph interpret multiple values for this in the ceph.conf (I
>> wouldn’t say so out of my tests)?
>>
>> * Shouldn’t public network be your internet facing range and cluster
>> network the private range?
>>
>
> "Public" doesn't necessarily mean "reachable from internet", it means
> "where ceph consumers and clients can talk", and the private network is
> "where only OSDs and ceph infrastructure can talk to eachother".
>
> Ceph clients can still be non-reachable from the internet, it's not the
> same meaning that firewall vendors place on "private" and "public".
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-21 Thread Ilya Dryomov
On Tue, Aug 21, 2018 at 9:12 AM Dietmar Rieder
 wrote:
>
> On 08/20/2018 05:36 PM, Ilya Dryomov wrote:
> > On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder
> >  wrote:
> >>
> >> Hi Cephers,
> >>
> >>
> >> I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to
> >> luminous?
> >> As far as I see there is some luminous related stuff that was
> >> backported, however,
> >> the "ceph features" command just reports "jewel" as release of my cephfs
> >> clients running CentOS 7.5 (kernel 3.10.0-862.11.6.el7.x86_64)
> >>
> >>
> >> {
> >> "mon": {
> >> "group": {
> >> "features": "0x3ffddff8eea4fffb",
> >> "release": "luminous",
> >> "num": 3
> >> }
> >> },
> >> "mds": {
> >> "group": {
> >> "features": "0x3ffddff8eea4fffb",
> >> "release": "luminous",
> >> "num": 3
> >> }
> >> },
> >> "osd": {
> >> "group": {
> >> "features": "0x3ffddff8eea4fffb",
> >> "release": "luminous",
> >> "num": 240
> >> }
> >> },
> >> "client": {
> >> "group": {
> >> "features": "0x7010fb86aa42ada",
> >> "release": "jewel",
> >> "num": 23
> >> },
> >> "group": {
> >> "features": "0x3ffddff8eea4fffb",
> >> "release": "luminous",
> >> "num": 4
> >> }
> >> }
> >> }
> >>
> >>
> >> This prevents me to run ceph balancer using the upmap mode.
> >>
> >>
> >> Any idea?
> >
> > Hi Dietmar,
> >
> > All luminous features are supported in RedHat/CentOS 7.5, but it shows
> > up as jewel due to a technicality.  Just do
> >
> >   $ ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
> >
> > to override the safety check.
> >
> > See https://www.spinics.net/lists/ceph-users/msg45071.html for details.
> > It references an upstream kernel, but both the problem and the solution
> > are the same.
> >
>
> Hi Ilya,
>
> thank you for your answer.
>
> Just to make sure:
> The thread you are referring to, is about kernel 4.13+, is this also
> true for the "official" RedHat/CentOS 7.5 kernel 3.10
> (3.10.0-862.11.6.el7.x86_64) ?

Yes, it is.
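
Once that flag is set, upmap balancing is enabled with something along the
lines of:

  ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
  ceph balancer mode upmap
  ceph balancer on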

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alert conditions

2018-08-21 Thread Jan Fajerski
Fwiw I added a few things to https://pad.ceph.com/p/alert-conditions and will 
circulate this mail a bit wider.

Or maybe there is not all that much interest in alerting...

On Mon, Jul 23, 2018 at 06:10:04PM +0200, Jan Fajerski wrote:

Hi community,
the topic of alerting conditions for a ceph cluster comes up in 
various contexts. Some folks use prometheus or grafana, (I believe) 
some people would like snmp traps from ceph, the mgr dashboard could 
provide basic alerting capabilities and there is of course ceph -s.

Also see "Improving alerting/health checks" on ceph-devel.

Working on some prometheus stuff I think it would be nice to have some 
basic alerting rules in the ceph repo. This could serve as an 
out-of-the-box default as well as an example or best practice for which 
conditions should be watched.


So I'm wondering what does the community think? What do operators use 
as alert conditions or find alert-worthy?
I'm aware that this is very open-ended, highly dependent on the 
cluster and its workload and can range from obvious (health_err 
anyone?) to intricate conditions that are designed for a certain 
cluster. I'm wondering if we can distill some non-trivial alert 
conditions that ceph itself does not (yet) provide.


If you have any conditions fitting that description, feel free to add 
them to https://pad.ceph.com/p/alert-conditions. Otherwise looking 
forward to feedback.


jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph configuration; Was: FreeBSD rc.d script: sta.rt not found

2018-08-21 Thread Willem Jan Withagen

Norman,

I'm cc-ing this back to ceph-users so that others can reply to it, or 
find it in the future.


On 21/08/2018 12:01, Norman Gray wrote:


Willem Jan, hello.

Thanks for your detailed notes on my list question.

On 20 Aug 2018, at 21:32, Willem Jan Withagen wrote:


 # zpool create -m/var/lib/ceph/osd/osd.0 osd.0 gpt/zd000 gpt/zd001


Over the weekend I updated the FreeBSD section of the Ceph manual with 
exactly that.
I'm not sure what sort of devices zd000 and zd001 are, but concatenating 
devices seriously lowers the MTBF of the vdev. As such it is likely 
better to create 2 OSDs on these 2 devices.


My sort-of problem is that the machine I'm doing this on was not specced 
with Ceph in mind: it has 16 3.5TB disks.  Given that 
 
suggests that 20 is a 'high' number of OSDs on a host, I thought it 
might be better to aim for an initial setup of 6 two-disk OSDs rather 
than 12 one-disk ones (leaving four disks free).


That said, 12 < 20, so I think that, especially bearing in mind your 
advice here, I should probably stick to 1-disk OSDs with one (default) 
5GB SSD journal each, and not complicate things.


Only one way to find out: try both...
But I certainly do not advise putting concatenated disks in an OSD, 
especially not for production: break one disk and you break the whole vdev.


And the most important thing for OSDs is 1 GB of RAM per 1 TB of disk.
So with 70 TB of disk you'd need 64 GB of RAM or more, preferably more 
since ZFS will want its share as well.
CPU is not going to be that much of an issue, unless you have really 
tiny CPUs.


What I still have not figured out is what to do with the SSDs.
There are 3 things you can do (in any combination):
1) Ceph standard: make it a journal. Mount the SSD on a separate dir and
   get ceph-disk to start using it as the journal.
2) Attach a ZFS cache (L2ARC) to the vdev, which will improve reading.
3) Attach a ZFS log (SLOG) on SSD to the vdev, which will improve sync writes.

At the moment I'm doing all three:
[~] w...@freetest.digiware.nl> zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
osd.0.journal  316K  5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0/journal-ssd
osd.1.journal  316K  5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1/journal-ssd
osd.2.journal  316K  5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2/journal-ssd
osd.3.journal  316K  5.33G    88K  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3/journal-ssd
osd.4.journal  316K  5.33G    88K  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4/journal-ssd
osd.5.journal  316K  5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5/journal-ssd
osd.6.journal  316K  5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6/journal-ssd
osd.7.journal  316K  5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7/journal-ssd
osd_0         5.16G   220G  5.16G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0
osd_1         5.34G   219G  5.34G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1
osd_2         5.42G   219G  5.42G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2
osd_3         6.62G  1.31T  6.62G  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3
osd_4         6.83G  1.75T  6.83G  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4
osd_5         5.92G  1.31T  5.92G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5
osd_6         6.00G  1.31T  6.00G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6
osd_7         6.10G  1.31T  6.10G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7


[~] w...@freetest.digiware.nl> zpool list -v osd_1
NAME                SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
osd_1               232G  5.34G   227G        -         -    0%    2%  1.00x  ONLINE  -
  gpt/osd_1         232G  5.34G   227G        -         -    0%    2%
log                    -      -      -        -         -     -     -
  gpt/osd.1.log     960M    12K   960M        -         -    0%    0%
cache                  -      -      -        -         -     -     -
  gpt/osd.1.cache  22.0G  1.01G  21.0G        -         -    0%    4%

So each OSD has an SSD journal (a zfs volume) and each OSD volume has a 
cache and a log. ATM the cluster is idle, hence the log is "empty".


But I would first work out the architecture of how you want the cluster 
to be, and then start tuning. ZFS log and cache devices are easily added 
and removed after the fact.
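
For example, with GPT labels like the ones in the zpool listing above,
attaching and later detaching the SSD-backed log and cache would look
roughly like this (a sketch only, not taken from the manual):

  # zpool add osd_1 log gpt/osd.1.log
  # zpool add osd_1 cache gpt/osd.1.cache
  # zpool remove osd_1 gpt/osd.1.log
  # zpool remove osd_1 gpt/osd.1.cache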


I found what appear to be a couple of typos in your script which I can 
report back to you.  I hope to make significant progress with this work 
this week, so should be able to give you more feedback on the script, on 
my experiences, and on the FreeBSD page in the manual.


Sure, keep'm coming

--WjW


I'll work through your various notes.  Below are a couple of specific 
points.



When I attempt to start the service, I get:

# service ceph start
=== mon.pochhammer ===


You're sort of free to pick names, but most of the time tooling 
expects naming 

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Stefan Priebe - Profihost AG


On 21.08.2018 at 12:03, Stefan Priebe - Profihost AG wrote:
> 
> Am 21.08.2018 um 11:56 schrieb Dan van der Ster:
>> On Tue, Aug 21, 2018 at 11:54 AM Stefan Priebe - Profihost AG
>>  wrote:
>>>
>>> Am 21.08.2018 um 11:47 schrieb Dan van der Ster:
 On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
  wrote:
>
>
> Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
>> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
>>  wrote:
>>>
>>>
>>> Am 20.08.2018 um 21:52 schrieb Sage Weil:
 On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
> Hello,
>
> since loic seems to have left ceph development and his wunderful crush
> optimization tool isn'T working anymore i'm trying to get a good
> distribution with the ceph balancer.
>
> Sadly it does not work as good as i want.
>
> # ceph osd df | sort -k8
>
> show 75 to 83% Usage which is 8% difference which is too much for me.
> I'm optimization by bytes.
>
> # ceph balancer eval
> current cluster score 0.005420 (lower is better)
>
> # ceph balancer eval $OPT_NAME
> plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
>
> I'm unable to optimize further ;-( Is there any chance to optimize
> further even in case of more rebelancing?

 The scoring that the balancer module is doing is currently a hybrid of 
 pg
 count, bytes, and object count.  Picking a single metric might help a 
 bit
 (as those 3 things are not always perfectly aligned).
>>>
>>> Hi,
>>>
>>> ok i found a bug in the balancer code which seems to be present in all
>>> releases.
>>>
>>>  861 best_ws = next_ws
>>>  862 best_ow = next_ow
>>>
>>>
>>> should be:
>>>
>>>  861 best_ws = copy.deepcopy(next_ws)
>>>  862 best_ow = copy.deepcopy(next_ow)
>>>
>>> otherwise it does not use the best but the last.
>>
>> Interesting... does that change improve things?
>
> It fixes the following (mgr debug output):
> 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, trying smaller step 0.000244
> 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001152, misplacing 0.001141
> 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
> 0.001155 -> 0.001152
>
> BUT:
> # ceph balancer eval myplan
> plan myplan final score 0.001180 (lower is better)
>
> So the final plan does NOT contain the expected optimization. The
> deepcopy fixes it.
>
> After:
> # ceph balancer eval myplan
> plan myplan final score 0.001152 (lower is better)
>

 OK that looks like a bug. Did you create a tracker or PR?
>>>
>>> No not yet. Should i create a PR on github with the fix?
>>
>> Yeah, probably tracker first (requesting luminous,mimic backports),
>> then a PR on master with "Fixes: tracker..."
> 
> Will do but can't find a create button in the tracker. I've opened
> several 

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Stefan Priebe - Profihost AG


On 21.08.2018 at 11:56, Dan van der Ster wrote:
> On Tue, Aug 21, 2018 at 11:54 AM Stefan Priebe - Profihost AG
>  wrote:
>>
>> Am 21.08.2018 um 11:47 schrieb Dan van der Ster:
>>> On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
>>>  wrote:


 Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
>  wrote:
>>
>>
>> Am 20.08.2018 um 21:52 schrieb Sage Weil:
>>> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
 Hello,

 since loic seems to have left ceph development and his wunderful crush
 optimization tool isn'T working anymore i'm trying to get a good
 distribution with the ceph balancer.

 Sadly it does not work as good as i want.

 # ceph osd df | sort -k8

 show 75 to 83% Usage which is 8% difference which is too much for me.
 I'm optimization by bytes.

 # ceph balancer eval
 current cluster score 0.005420 (lower is better)

 # ceph balancer eval $OPT_NAME
 plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)

 I'm unable to optimize further ;-( Is there any chance to optimize
 further even in case of more rebelancing?
>>>
>>> The scoring that the balancer module is doing is currently a hybrid of 
>>> pg
>>> count, bytes, and object count.  Picking a single metric might help a 
>>> bit
>>> (as those 3 things are not always perfectly aligned).
>>
>> Hi,
>>
>> ok i found a bug in the balancer code which seems to be present in all
>> releases.
>>
>>  861 best_ws = next_ws
>>  862 best_ow = next_ow
>>
>>
>> should be:
>>
>>  861 best_ws = copy.deepcopy(next_ws)
>>  862 best_ow = copy.deepcopy(next_ow)
>>
>> otherwise it does not use the best but the last.
>
> Interesting... does that change improve things?

 It fixes the following (mgr debug output):
 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001180, misplacing 0.000912
 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
 worse, taking another step
 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
 default (pools ['cephstor2']) by bytes
 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001180, misplacing 0.000912
 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
 worse, taking another step
 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
 default (pools ['cephstor2']) by bytes
 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001180, misplacing 0.000912
 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
 worse, taking another step
 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
 default (pools ['cephstor2']) by bytes
 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001180, misplacing 0.000912
 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
 worse, trying smaller step 0.000244
 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
 default (pools ['cephstor2']) by bytes
 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001152, misplacing 0.001141
 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
 default (pools ['cephstor2']) by bytes
 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
 score 0.001152 -> 0.001180, misplacing 0.000912
 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
 worse, taking another step
 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
 0.001155 -> 0.001152

 BUT:
 # ceph balancer eval myplan
 plan myplan final score 0.001180 (lower is better)

 So the final plan does NOT contain the expected optimization. The
 deepcopy fixes it.

 After:
 # ceph balancer eval myplan
 plan myplan final score 0.001152 (lower is better)

>>>
>>> OK that looks like a bug. Did you create a tracker or PR?
>>
>> No not yet. Should i create a PR on github with the fix?
> 
> Yeah, probably tracker first (requesting luminous,mimic backports),
> then a PR on master with "Fixes: tracker..."

Will do, but I can't find a create button in the tracker. I've opened
several reports in the past, but right now it seems I can't create a ticket.

Stefan

> 
> -- dan
> 
> 
>>
>>> -- Dan
>>>
>>>
>
> Also, if most of your data is in one pool you can try ceph 

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Dan van der Ster
On Tue, Aug 21, 2018 at 11:54 AM Stefan Priebe - Profihost AG
 wrote:
>
> Am 21.08.2018 um 11:47 schrieb Dan van der Ster:
> > On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
> >  wrote:
> >>
> >>
> >> Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
> >>> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
> >>>  wrote:
> 
> 
>  Am 20.08.2018 um 21:52 schrieb Sage Weil:
> > On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
> >> Hello,
> >>
> >> since loic seems to have left ceph development and his wunderful crush
> >> optimization tool isn'T working anymore i'm trying to get a good
> >> distribution with the ceph balancer.
> >>
> >> Sadly it does not work as good as i want.
> >>
> >> # ceph osd df | sort -k8
> >>
> >> show 75 to 83% Usage which is 8% difference which is too much for me.
> >> I'm optimization by bytes.
> >>
> >> # ceph balancer eval
> >> current cluster score 0.005420 (lower is better)
> >>
> >> # ceph balancer eval $OPT_NAME
> >> plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
> >>
> >> I'm unable to optimize further ;-( Is there any chance to optimize
> >> further even in case of more rebelancing?
> >
> > The scoring that the balancer module is doing is currently a hybrid of 
> > pg
> > count, bytes, and object count.  Picking a single metric might help a 
> > bit
> > (as those 3 things are not always perfectly aligned).
> 
>  Hi,
> 
>  ok i found a bug in the balancer code which seems to be present in all
>  releases.
> 
>   861 best_ws = next_ws
>   862 best_ow = next_ow
> 
> 
>  should be:
> 
>   861 best_ws = copy.deepcopy(next_ws)
>   862 best_ow = copy.deepcopy(next_ow)
> 
>  otherwise it does not use the best but the last.
> >>>
> >>> Interesting... does that change improve things?
> >>
> >> It fixes the following (mgr debug output):
> >> 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001180, misplacing 0.000912
> >> 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
> >> worse, taking another step
> >> 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
> >> default (pools ['cephstor2']) by bytes
> >> 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001180, misplacing 0.000912
> >> 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
> >> worse, taking another step
> >> 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
> >> default (pools ['cephstor2']) by bytes
> >> 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001180, misplacing 0.000912
> >> 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
> >> worse, taking another step
> >> 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
> >> default (pools ['cephstor2']) by bytes
> >> 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001180, misplacing 0.000912
> >> 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
> >> worse, trying smaller step 0.000244
> >> 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
> >> default (pools ['cephstor2']) by bytes
> >> 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001152, misplacing 0.001141
> >> 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
> >> default (pools ['cephstor2']) by bytes
> >> 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
> >> score 0.001152 -> 0.001180, misplacing 0.000912
> >> 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
> >> worse, taking another step
> >> 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
> >> 0.001155 -> 0.001152
> >>
> >> BUT:
> >> # ceph balancer eval myplan
> >> plan myplan final score 0.001180 (lower is better)
> >>
> >> So the final plan does NOT contain the expected optimization. The
> >> deepcopy fixes it.
> >>
> >> After:
> >> # ceph balancer eval myplan
> >> plan myplan final score 0.001152 (lower is better)
> >>
> >
> > OK that looks like a bug. Did you create a tracker or PR?
>
> No not yet. Should i create a PR on github with the fix?

Yeah, probably tracker first (requesting luminous,mimic backports),
then a PR on master with "Fixes: tracker..."

-- dan


>
> > -- Dan
> >
> >
> >>>
> >>> Also, if most of your data is in one pool you can try ceph balancer
> >>> eval 
> >>
> >> Already tried this doesn't help much.
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >>> -- dan
> >>>
> 
>  I'm also using this one:
>  

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Stefan Priebe - Profihost AG
On 21.08.2018 at 11:47, Dan van der Ster wrote:
> On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
>  wrote:
>>
>>
>> Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
>>> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
>>>  wrote:


 Am 20.08.2018 um 21:52 schrieb Sage Weil:
> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> since loic seems to have left ceph development and his wunderful crush
>> optimization tool isn'T working anymore i'm trying to get a good
>> distribution with the ceph balancer.
>>
>> Sadly it does not work as good as i want.
>>
>> # ceph osd df | sort -k8
>>
>> show 75 to 83% Usage which is 8% difference which is too much for me.
>> I'm optimization by bytes.
>>
>> # ceph balancer eval
>> current cluster score 0.005420 (lower is better)
>>
>> # ceph balancer eval $OPT_NAME
>> plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
>>
>> I'm unable to optimize further ;-( Is there any chance to optimize
>> further even in case of more rebelancing?
>
> The scoring that the balancer module is doing is currently a hybrid of pg
> count, bytes, and object count.  Picking a single metric might help a bit
> (as those 3 things are not always perfectly aligned).

 Hi,

 ok i found a bug in the balancer code which seems to be present in all
 releases.

  861 best_ws = next_ws
  862 best_ow = next_ow


 should be:

  861 best_ws = copy.deepcopy(next_ws)
  862 best_ow = copy.deepcopy(next_ow)

 otherwise it does not use the best but the last.
>>>
>>> Interesting... does that change improve things?
>>
>> It fixes the following (mgr debug output):
>> 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, trying smaller step 0.000244
>> 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001152, misplacing 0.001141
>> 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
>> default (pools ['cephstor2']) by bytes
>> 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
>> score 0.001152 -> 0.001180, misplacing 0.000912
>> 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
>> worse, taking another step
>> 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
>> 0.001155 -> 0.001152
>>
>> BUT:
>> # ceph balancer eval myplan
>> plan myplan final score 0.001180 (lower is better)
>>
>> So the final plan does NOT contain the expected optimization. The
>> deepcopy fixes it.
>>
>> After:
>> # ceph balancer eval myplan
>> plan myplan final score 0.001152 (lower is better)
>>
> 
> OK that looks like a bug. Did you create a tracker or PR?

No, not yet. Should I create a PR on GitHub with the fix?

> -- Dan
> 
> 
>>>
>>> Also, if most of your data is in one pool you can try ceph balancer
>>> eval 
>>
>> Already tried this doesn't help much.
>>
>> Greets,
>> Stefan
>>
>>
>>> -- dan
>>>

 I'm also using this one:
 https://github.com/ceph/ceph/pull/20665/commits/c161a74ad6cf006cd9b33b40fd7705b67c170615

 to optimize by bytes only.

 Greets,
 Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph balancer: further optimizations?

2018-08-21 Thread Dan van der Ster
On Mon, Aug 20, 2018 at 10:45 PM Stefan Priebe - Profihost AG
 wrote:
>
>
> Am 20.08.2018 um 22:38 schrieb Dan van der Ster:
> > On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
> >  wrote:
> >>
> >>
> >> Am 20.08.2018 um 21:52 schrieb Sage Weil:
> >>> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
>  Hello,
> 
>  since loic seems to have left ceph development and his wunderful crush
>  optimization tool isn'T working anymore i'm trying to get a good
>  distribution with the ceph balancer.
> 
>  Sadly it does not work as good as i want.
> 
>  # ceph osd df | sort -k8
> 
>  show 75 to 83% Usage which is 8% difference which is too much for me.
>  I'm optimization by bytes.
> 
>  # ceph balancer eval
>  current cluster score 0.005420 (lower is better)
> 
>  # ceph balancer eval $OPT_NAME
>  plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
> 
>  I'm unable to optimize further ;-( Is there any chance to optimize
>  further even in case of more rebelancing?
> >>>
> >>> The scoring that the balancer module is doing is currently a hybrid of pg
> >>> count, bytes, and object count.  Picking a single metric might help a bit
> >>> (as those 3 things are not always perfectly aligned).
> >>
> >> Hi,
> >>
> >> ok i found a bug in the balancer code which seems to be present in all
> >> releases.
> >>
> >>  861 best_ws = next_ws
> >>  862 best_ow = next_ow
> >>
> >>
> >> should be:
> >>
> >>  861 best_ws = copy.deepcopy(next_ws)
> >>  862 best_ow = copy.deepcopy(next_ow)
> >>
> >> otherwise it does not use the best but the last.
> >
> > Interesting... does that change improve things?
>
> It fixes the following (mgr debug output):
> 2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, trying smaller step 0.000244
> 2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001152, misplacing 0.001141
> 2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
> default (pools ['cephstor2']) by bytes
> 2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
> score 0.001152 -> 0.001180, misplacing 0.000912
> 2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
> worse, taking another step
> 2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
> 0.001155 -> 0.001152
>
> BUT:
> # ceph balancer eval myplan
> plan myplan final score 0.001180 (lower is better)
>
> So the final plan does NOT contain the expected optimization. The
> deepcopy fixes it.
>
> After:
> # ceph balancer eval myplan
> plan myplan final score 0.001152 (lower is better)
>

OK that looks like a bug. Did you create a tracker or PR?

-- Dan


> >
> > Also, if most of your data is in one pool you can try ceph balancer
> > eval 
>
> Already tried this doesn't help much.
>
> Greets,
> Stefan
>
>
> > -- dan
> >
> >>
> >> I'm also using this one:
> >> https://github.com/ceph/ceph/pull/20665/commits/c161a74ad6cf006cd9b33b40fd7705b67c170615
> >>
> >> to optimize by bytes only.
> >>
> >> Greets,
> >> Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
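
For readers skimming the thread: the underlying problem is ordinary Python
aliasing, not anything balancer-specific. A minimal standalone illustration
(a made-up dict, not the actual balancer weight-set structures):

  import copy

  next_ws = {'osd.0': 1.0}          # candidate weights for the current step
  best_ws = next_ws                 # aliases the same dict object
  next_ws['osd.0'] = 0.5            # mutating the candidate also changes "best"
  print(best_ws['osd.0'])           # prints 0.5 -- the best result was lost

  best_ws = copy.deepcopy(next_ws)  # independent snapshot of the current best
  next_ws['osd.0'] = 0.25
  print(best_ws['osd.0'])           # still prints 0.5 -- the snapshot is kept

That is why the final plan reflected the last attempt rather than the best
one until the deepcopy was added.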


Re: [ceph-users] Mimic osd fails to start.

2018-08-21 Thread Daznis
Thanks for all the help. For some bizarre reason I had an empty
host bucket inside the default root. Once I dumped a "fake" OSD into it,
everything started working.
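
In case someone else runs into the same thing: the empty bucket is easy
to spot and to clean up (a rough sketch, with a hypothetical bucket name):

  # ceph osd tree
  # ceph osd crush rm empty-host

"ceph osd tree" shows host buckets with no OSDs under them, and "ceph osd
crush rm" should refuse to remove a bucket that still contains items, so
it is safe to try.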
On Mon, Aug 20, 2018 at 7:36 PM Daznis  wrote:
>
> Hello,
>
>
> Medic shows everything fine. Whole cluster is on the latest mimic
> version. It was updated to mimic when stable version of mimic was
> release and recently it was updated to "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)". For some
> reason one mgr service is running, but it's not connected to the
> cluster.
>
> Versions output:
>
> {
> "mon": {
> "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 3
> },
> "mgr": {
> "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 2
> },
> "osd": {
> "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 47
> },
> "mds": {},
> "overall": {
> "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 52
> }
> }
>
> Medic output:
> ===  Starting remote check session  
> 
> Version: 1.0.4Cluster Name: "ceph"
> Total hosts: [10]
> OSDs:5MONs:3 Clients:0
> MDSs:0RGWs:0 MGRs:   2
>
> 
>
> -- managers --
>  mon03
>  mon02
>  mon01
>
>  osds 
>  node03
>  node02
>  node01
>  node05
>  node04
>
>  mons 
>  mon01
>  mon03
>  mon02
>
> 107 passed, on 11 hosts
> On Mon, Aug 20, 2018 at 6:13 PM Alfredo Deza  wrote:
> >
> > On Mon, Aug 20, 2018 at 10:23 AM, Daznis  wrote:
> > > Hello,
> > >
> > > It appears that something is horribly wrong with the cluster itself. I
> > > can't create or add any new osds to it at all.
> >
> > Have you added new monitors? Or replaced monitors? I would check that
> > all your versions match, something seems to be expecting different
> > versions.
> >
> > The "Invalid argument" problem is a common thing we see when that happens.
> >
> > Something that might help a bit here is if you run ceph-medic against
> > your cluster:
> >
> > http://docs.ceph.com/ceph-medic/master/
> >
> >
> >
> > > On Mon, Aug 20, 2018 at 11:04 AM Daznis  wrote:
> > >>
> > >> Hello,
> > >>
> > >>
> > >> Zapping the journal didn't help. I tried to create the journal after
> > >> zapping it. Also failed. I'm not really sure why this happens.
> > >>
> > >> Looking at the monitor logs with 20/20 debug I'm seeing these errors:
> > >>
> > >> 2018-08-20 08:57:58.753 7f9d85934700  0 mon.mon02@1(peon) e4
> > >> handle_command mon_command({"prefix": "osd crush set-device-class",
> > >> "class": "ssd", "ids": ["48"]} v 0) v1
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=osd
> > >> command=osd crush set-device-class read write on cap allow profile osd
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant
> > >> allow profile osd
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  match
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> > >> _allowed_command capable
> > >> 2018-08-20 08:57:58.753 7f9d85934700  0 log_channel(audit) log [INF] :
> > >> from='osd.48 10.24.52.17:6800/153683' entity='osd.48' cmd=[{"prefix":
> > >> "osd crush set-device-class", "class": "ssd", "ids": ["48"]}]:
> > >> dispatch
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).osd e46327
> > >> preprocess_query mon_command({"prefix": "osd crush set-device-class",
> > >> "class": "ssd", "ids": ["48"]} v 0) v1 from osd.48
> > >> 10.24.52.17:6800/153683
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> > >> forward_request 4 request mon_command({"prefix": "osd crush
> > >> set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1 features
> > >> 4611087854031142907
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4
> > >> _ms_dispatch existing session 0x55b4ec482a80 for mon.1
> > >> 10.24.52.11:6789/0
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4  caps allow 
> > >> *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
> > >> v10758065 preprocess_query log(1 entries from seq 4 at 2018-08-20
> > >> 08:57:58.755306) v1 from mon.1 10.24.52.11:6789/0
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
> > >> v10758065 preprocess_log log(1 entries from seq 4 at 2018-08-20
> > >> 08:57:58.755306) v1 from mon.1
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=log
> > >> command= write on cap allow *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant 
> > >> allow *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow all
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> > >> forward_request 5 request log(1 entries from seq 4 at 2018-08-20
> > >> 

Re: [ceph-users] Removing all rados objects based on a prefix

2018-08-21 Thread John Spray
On Mon, Aug 20, 2018 at 5:40 PM Wido den Hollander  wrote:
>
>
>
> On 08/20/2018 05:20 PM, David Turner wrote:
> > The general talk about the rados cleanup command is to clean things up
> > after benchmarking.  Could this command also be used for deleting an old
> > RGW bucket or an RBD.  For instance, a bucket with a prefix of
> > `25ff9eff-058b-41e3-8724-cfffecb979c0.9709451.1` such that all objects
> > in the default.rgw.buckets.data pool for that bucket start with that
> > string.  Could I run [1] this command to clean all of those up?  Listing
> > the full pool contents and grepping out for that string returns 100M
> > objects and every way I've come up with to iterate over that list will
> > take us about a month to get through it.  I would think this has a
> > decent chance to work, except for the description of the [2] cleanup
> > option from the rados man page.
> >
> > Perhaps I'm also barking up the wrong tree.  Does anyone have a better
> > way to delete large RBDs or buckets?
> >
>
> Nope, you can't filter on prefixes of objects. You'll have to do a
> listing and filter on the output.. That's a very long list.

If someone is writing scripts for this kind of thing, the only help I
can suggest is to use the rados_object_list_slice functionality in
librados, to divide up the object space into multiple ranges and issue
parallel list operations.  That's what the --worker_n etc options to
cephfs-data-scan do.

John
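
A brute-force, single-threaded version of the list-and-filter approach
would look roughly like this (a sketch; it still has to walk the whole
pool, so at 100M objects it will not be quick):

  # rados -p default.rgw.buckets.data ls > /tmp/objects
  # grep '^25ff9eff-058b-41e3-8724-cfffecb979c0.9709451.1' /tmp/objects | \
        xargs -P 8 -n 1 rados -p default.rgw.buckets.data rm

For a bucket that still exists in RGW, "radosgw-admin bucket rm
--bucket=<name> --purge-objects" may be the saner route than touching the
data pool directly.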

> Wido
>
> >
> > [1] rados -p default.rgw.buckets.data cleanup --prefix
> > 25ff9eff-058b-41e3-8724-cfffecb979c0.9709451.1
> >
> > [2] cleanup [ --run-name run_name ] [ --prefix prefix ]
> > Clean up a previous benchmark operation.  Note: the default
> > run-name is "benchmark_last_metadata"
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-08-21 Thread Konstantin Shalygin

This issue first started while using Luminous 12.2.5; I upgraded to 12.2.7 and
it's still present.  The issue is _not_ present in 12.2.4.

With Ceph 12.2.4, using QEMU/KVM + Libvirt, I'm able to mount an rbd image 
using the following syntax and populated xml:

'virsh attach-device $vm foo.xml --persistent'

xml contents: (the XML did not survive the list archive)

I receive this error:
~# virsh attach-device $vm foo.xml --persistent
error: Failed to attach device from foo.xml
error: internal error: unable to execute QEMU command 'device_add': Property 
'scsi-hd.drive' can't find value 'drive-scsi0-0-0-1'

I've tried different things with the XML, but nothing seems to work, always 
failing with the above error.  This does _not_ happen with our cluster running 
12.2.4, the same exact command with a cluster using an identical configuration 
(for all intents and purposes).

Any thoughts?  Hard to believe I'm the only one to hit this if it's indeed a 
bug, but I haven't found anyone else having the issue through interweb searches.



oVirt uses these settings. We haven't had issues since Jewel; now on
Luminous 12.2.5.

(The disk XML got mangled by the list archive; the surviving fragments
include unit="0",
name="replicated_rbd_nvme/volume-06f11659-073a-4407-899e-1cc7fa002f05"
protocol="rbd", and name="qemu" type="raw".)



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Network cluster / addr

2018-08-21 Thread Janne Johansson
On Tue, 21 Aug 2018 at 09:31, Nino Bosteels wrote:

>
> * Does ceph interpret multiple values for this in the ceph.conf (I
> wouldn’t say so out of my tests)?
>
> * Shouldn’t public network be your internet facing range and cluster
> network the private range?
>

"Public" doesn't necessarily mean "reachable from internet", it means
"where ceph consumers and clients can talk", and the private network is
"where only OSDs and ceph infrastructure can talk to eachother".

Ceph clients can still be non-reachable from the internet, it's not the
same meaning that firewall vendors place on "private" and "public".
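
A minimal ceph.conf sketch of the two options (the subnets are just
placeholders; both can perfectly well be RFC1918 ranges):

  [global]
      public network  = 192.168.100.0/24
      cluster network = 192.168.200.0/24

As far as I know both options accept a comma-separated list of subnets,
but they only control which networks the daemons bind to and advertise;
clients still need a route to the public network.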

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Network cluster / addr

2018-08-21 Thread Nino Bosteels
Dear mailinglist,

I've been struggling to find a working configuration for cluster network /
addr or even public addr.

* Does ceph interpret multiple values for this in the ceph.conf (I wouldn't say 
so out of my tests)?
* Shouldn't public network be your internet facing range and cluster network 
the private range?

Thanks for your time,
Nino
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-21 Thread Dietmar Rieder
On 08/20/2018 05:36 PM, Ilya Dryomov wrote:
> On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder
>  wrote:
>>
>> Hi Cephers,
>>
>>
>> I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to
>> luminous?
>> As far as I see there is some luminous related stuff that was
>> backported, however,
>> the "ceph features" command just reports "jewel" as release of my cephfs
>> clients running CentOS 7.5 (kernel 3.10.0-862.11.6.el7.x86_64)
>>
>>
>> {
>> "mon": {
>> "group": {
>> "features": "0x3ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 3
>> }
>> },
>> "mds": {
>> "group": {
>> "features": "0x3ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 3
>> }
>> },
>> "osd": {
>> "group": {
>> "features": "0x3ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 240
>> }
>> },
>> "client": {
>> "group": {
>> "features": "0x7010fb86aa42ada",
>> "release": "jewel",
>> "num": 23
>> },
>> "group": {
>> "features": "0x3ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 4
>> }
>> }
>> }
>>
>>
>> This prevents me to run ceph balancer using the upmap mode.
>>
>>
>> Any idea?
> 
> Hi Dietmar,
> 
> All luminous features are supported in RedHat/CentOS 7.5, but it shows
> up as jewel due to a technicality.  Just do
> 
>   $ ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
> 
> to override the safety check.
> 
> See https://www.spinics.net/lists/ceph-users/msg45071.html for details.
> It references an upstream kernel, but both the problem and the solution
> are the same.
> 

Hi Ilya,

thank you for your answer.

Just to make sure:
The thread you are referring to is about kernel 4.13+; is this also
true for the "official" RedHat/CentOS 7.5 kernel 3.10
(3.10.0-862.11.6.el7.x86_64)?

Best
  Dietmar





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Documentation regarding log file structure

2018-08-21 Thread Uwe Sauter
Hi list,

Does documentation exist that explains the structure of Ceph log files, other 
than the source code?

Thanks,

Uwe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com