[ceph-users] CephFS Ganesha NFS for VMWare

2019-10-29 Thread Glen Baars
Hello Ceph Users,

I am trialing CephFS / Ganesha NFS for VMware usage. We are on Mimic / CentOS 
7.7 / 130 x 12TB 7200rpm OSDs / 13 hosts / 3x replica.

So far the read performance has been great. The write performance ( NFS sync ) 
hasn't been great. We use a lot of 64KB NFS read / writes and the latency is 
around 50-60ms from esxtop.

I have been benchmarking different CephFS block / stripe sizes, but I would like 
to hear what others have settled on. The default 4MB object size with a single 
stripe doesn't seem to give great 64KB performance.
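
For reference, this is roughly how I switch layouts between test runs; a sketch 
only, the export path and the values are placeholders, and the layout only 
applies to files created after it is set:

# set a smaller object size / stripe unit on an empty test export directory
# (object_size must be a multiple of stripe_unit)
mkdir /mnt/cephfs/nfs-test-64k
setfattr -n ceph.dir.layout.object_size -v 1048576 /mnt/cephfs/nfs-test-64k
setfattr -n ceph.dir.layout.stripe_unit -v 65536 /mnt/cephfs/nfs-test-64k
setfattr -n ceph.dir.layout.stripe_count -v 4 /mnt/cephfs/nfs-test-64k
getfattr -n ceph.dir.layout /mnt/cephfs/nfs-test-64k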

I would also like to know if I am experiencing PG locking, but I haven't found a 
way to check for that.
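
The closest thing I can think of is sampling the in-flight ops on a busy OSD and 
counting the ones blocked waiting on locks; a rough check only, and osd.12 is 
just an example:

# count in-flight ops currently blocked waiting on object/PG locks
ceph daemon osd.12 dump_ops_in_flight | grep -c 'waiting for rw locks'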

Glen


Re: [ceph-users] Any CEPH's iSCSI gateway users?

2019-06-11 Thread Glen Baars
Interesting performance increase! I'm running iSCSI at a few installations and 
now wonder what version of CentOS is required to improve performance! Did the 
cluster go from Luminous to Mimic?

Glen

-Original Message-
From: ceph-users  On Behalf Of Heðin 
Ejdesgaard Møller
Sent: Saturday, 8 June 2019 8:00 AM
To: Paul Emmerich ; Igor Podlesny 
Cc: Ceph Users 
Subject: Re: [ceph-users] Any CEPH's iSCSI gateway users?

I recently upgraded a RHCS-3.0 cluster with 4 iGWs to RHCS-3.2 on top of RHEL-7.6.
Big block size performance went from ~350MB/s to about 1100MB/s on each LUN, 
seen from a VM in vSphere-6.5 with data read from an SSD pool and written to an 
HDD pool, both being 3/2 replica.
I have not experienced any hiccup since the upgrade.
You will always have a degree of performance hit when using the iGW, because 
it's both an extra layer between consumer and hardware, and a potential 
choke-point, just like any "traditional" iSCSI based SAN solution.

If you are considering deploying the iGW on the upstream bits, I would recommend 
sticking to CentOS, since a lot of its development has happened on the RHEL 
platform.

Regards
Heðin Ejdesgaard
Synack sp/f

On frí, 2019-06-07 at 12:44 +0200, Paul Emmerich wrote:
> Hi,
>
> ceph-iscsi 3.0 fixes a lot of problems and limitations of the older gateway.
>
> Best way to run it on Debian/Ubuntu is to build it yourself
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at
> https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, May 28, 2019 at 10:02 AM Igor Podlesny  wrote:
> > What is your experience?
> > Does it make sense to use it -- is it solid enough or beta quality
> > rather (both in terms of stability and performance)?
> >
> > I've read it was more or less packaged to work with RHEL. Does it
> > hold true still?
> > What's the best way to install it on, say, CentOS or Debian/Ubuntu?
> >



Re: [ceph-users] op_w_latency

2019-04-02 Thread Glen Baars
Thanks for the updated command – much cleaner!

The OSD nodes have a single 6-core X5650 @ 2.67GHz, 72GB RAM and around 8 x 10TB 
HDD OSDs / 4 x 2TB SSD OSDs. CPU usage is around 20% and there is 22GB of RAM 
available.
The 3 MON nodes are the same but with no OSDs.
The cluster has around 150 drives and is only doing 500-1000 ops overall.
The network is dual 10Gbit using LACP, with a VLAN for private Ceph traffic and 
untagged for public.

Glen
From: Konstantin Shalygin 
Sent: Wednesday, 3 April 2019 11:39 AM
To: Glen Baars 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] op_w_latency


Hello Ceph Users,



I am finding that the write latency across my ceph clusters isn't great and I 
wanted to see what other people are getting for op_w_latency. Generally I am 
getting 70-110ms latency.



I am using: ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | 
grep -A3 '\"op_w_latency' | grep 'avgtime'

Better like this:

ceph daemon osd.102 perf dump | jq '.osd.op_w_latency.avgtime'
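
To compare the whole host at once, a loop over the local admin sockets should 
work; a sketch, assuming the default /var/run/ceph socket paths:

# print op_w_latency avgtime for every OSD with an admin socket on this host
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo -n "${sock}: "
    ceph daemon "${sock}" perf dump | jq '.osd.op_w_latency.avgtime'
done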



Ram, CPU and network don't seem to be the bottleneck. The drives are behind a 
dell H810p raid card with a 1GB writeback cache and battery. I have tried with 
LSI JBOD cards and haven't found it faster ( as you would expect with write 
cache ). The disks through iostat -xyz 1 show 10-30% usage with general service 
+ write latency around 3-4ms. Queue depth is normally less than one. RocksDB 
write latency is around 0.6ms, read 1-2ms. Usage is RBD backend for Cloudstack.


What is your hardware? Your CPU, RAM, Eth?





k



[ceph-users] op_w_latency

2019-04-01 Thread Glen Baars
Hello Ceph Users,

I am finding that the write latency across my ceph clusters isn't great and I 
wanted to see what other people are getting for op_w_latency. Generally I am 
getting 70-110ms latency.

I am using: ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | 
grep -A3 '\"op_w_latency' | grep 'avgtime'

Ram, CPU and network don't seem to be the bottleneck. The drives are behind a 
dell H810p raid card with a 1GB writeback cache and battery. I have tried with 
LSI JBOD cards and haven't found it faster ( as you would expect with write 
cache ). The disks through iostat -xyz 1 show 10-30% usage with general service 
+ write latency around 3-4ms. Queue depth is normally less than one. RocksDB 
write latency is around 0.6ms, read 1-2ms. Usage is RBD backend for Cloudstack.

Dumping the ops seems to show the latency here: (ceph --admin-daemon 
/var/run/ceph/ceph-osd.102.asok dump_historic_ops_by_duration  |less)

{
"time": "2019-04-01 22:24:38.432000",
"event": "queued_for_pg"
},
{
"time": "2019-04-01 22:24:38.438691",
"event": "reached_pg"
},
{
"time": "2019-04-01 22:24:38.438740",
"event": "started"
},
{
"time": "2019-04-01 22:24:38.727820",
"event": "sub_op_started"
},
{
"time": "2019-04-01 22:24:38.728448",
"event": "sub_op_committed"
},
{
"time": "2019-04-01 22:24:39.129175",
"event": "commit_sent"
},
{
"time": "2019-04-01 22:24:39.129231",
"event": "done"
}
]
}
}

This was one of the very slow writes, and I am wondering if I have a few ops 
that are taking a long time while most are fine.

What else can I do to figure out where the issue is?
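
One thing that may help is condensing the historic ops dump so the gap between 
events is easier to spot; a sketch, assuming the top-level field names match the 
Mimic output above:

# one line per event for the slowest ops, plus overall duration and flag point
ceph daemon osd.102 dump_historic_ops_by_duration | \
  jq '.ops[] | {duration, flag_point: .type_data.flag_point,
                events: [.type_data.events[] | "\(.time)  \(.event)"]}'

( In the op above, the big gaps are between started and sub_op_started and 
between sub_op_committed and commit_sent. )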


[ceph-users] When to use a separate RocksDB SSD

2019-03-21 Thread Glen Baars
Hello Ceph,

What is the best way to find out how RocksDB is currently performing? I need to 
build a business case for NVMe devices for RocksDB.
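
So far the only numbers I have to work from are the OSD perf counters; this is a 
sketch of what I am pulling, and the exact counter names may differ slightly 
between releases:

# bluestore commit / kv latencies and rocksdb-level latencies for one OSD
ceph daemon osd.0 perf dump | jq '.bluestore | {commit_lat, kv_flush_lat, kv_commit_lat}'
ceph daemon osd.0 perf dump | jq '.rocksdb | {get_latency, submit_latency, submit_sync_latency}'
# how much of the DB currently lives on the main (slow) device
ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes, wal_used_bytes}'
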
Kind regards,
Glen Baars


Re: [ceph-users] Slow OPS

2019-03-21 Thread Glen Baars
Hello Brad,

It doesn't seem to be a set of OSDs, the cluster has 160ish OSDs over 9 hosts.

I seem to get a lot of these ops also that don't show a client.

"description": "osd_repop(client.14349712.0:4866968 15.36 
e30675/22264 15:6dd17247:::rbd_data.2359ef6b8b4567.0042766
a:head v 30675'5522366)",
"initiated_at": "2019-03-21 16:51:56.862447",
"age": 376.527241,
        "duration": 1.331278,

Kind regards,
Glen Baars

-Original Message-
From: Brad Hubbard 
Sent: Thursday, 21 March 2019 1:43 PM
To: Glen Baars 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow OPS

Actually, the lag is between "sub_op_committed" and "commit_sent". Is there any 
pattern to these slow requests? Do they involve the same osd, or set of osds?

On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard  wrote:
>
> On Thu, Mar 21, 2019 at 3:20 PM Glen Baars  
> wrote:
> >
> > Thanks for that - we seem to be experiencing the wait in this section of 
> > the ops.
> >
> > {
> > "time": "2019-03-21 14:12:42.830191",
> > "event": "sub_op_committed"
> > },
> > {
> > "time": "2019-03-21 14:12:43.699872",
> > "event": "commit_sent"
> > },
> >
> > Does anyone know what that section is waiting for?
>
> Hi Glen,
>
> These are documented, to some extent, here.
>
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
>
> It looks like it may be taking a long time to communicate the commit
> message back to the client? Are these slow ops always the same client?
>
> >
> > Kind regards,
> > Glen Baars
> >
> > -Original Message-
> > From: Brad Hubbard 
> > Sent: Thursday, 21 March 2019 8:23 AM
> > To: Glen Baars 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Slow OPS
> >
> > On Thu, Mar 21, 2019 at 12:11 AM Glen Baars  
> > wrote:
> > >
> > > Hello Ceph Users,
> > >
> > >
> > >
> > > Does anyone know what the flag point ‘Started’ is? Is that ceph osd 
> > > daemon waiting on the disk subsystem?
> >
> > This is set by "mark_started()" and is roughly set when the pg starts 
> > processing the op. Might want to capture dump_historic_ops output after the 
> > op completes.
> >
> > >
> > >
> > >
> > > Ceph 13.2.4 on centos 7.5
> > >
> > >
> > >
> > > "description": "osd_op(client.1411875.0:422573570
> > > 5.18ds0
> > > 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read
> > >
> > > 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected
> > > e30622)",
> > >
> > > "initiated_at": "2019-03-21 01:04:40.598438",
> > >
> > > "age": 11.340626,
> > >
> > > "duration": 11.342846,
> > >
> > > "type_data": {
> > >
> > > "flag_point": "started",
> > >
> > > "client_info": {
> > >
> > > "client": "client.1411875",
> > >
> > > "client_addr": "10.4.37.45:0/627562602",
> > >
> > > "tid": 422573570
> > >
> > > },
> > >
> > > "events": [
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598438",
> > >
> > > "event": "initiated"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598438",
> > >
> > > "event": "header_read"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598439",
> > >

Re: [ceph-users] Slow OPS

2019-03-20 Thread Glen Baars
Thanks for that - we seem to be experiencing the wait in this section of the 
ops.

{
"time": "2019-03-21 14:12:42.830191",
"event": "sub_op_committed"
},
{
"time": "2019-03-21 14:12:43.699872",
"event": "commit_sent"
    },

Does anyone know what that section is waiting for?

Kind regards,
Glen Baars

-Original Message-
From: Brad Hubbard 
Sent: Thursday, 21 March 2019 8:23 AM
To: Glen Baars 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow OPS

On Thu, Mar 21, 2019 at 12:11 AM Glen Baars  wrote:
>
> Hello Ceph Users,
>
>
>
> Does anyone know what the flag point ‘Started’ is? Is that ceph osd daemon 
> waiting on the disk subsystem?

This is set by "mark_started()" and is roughly set when the pg starts 
processing the op. Might want to capture dump_historic_ops output after the op 
completes.

>
>
>
> Ceph 13.2.4 on centos 7.5
>
>
>
> "description": "osd_op(client.1411875.0:422573570 5.18ds0
> 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read
>
> 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected e30622)",
>
> "initiated_at": "2019-03-21 01:04:40.598438",
>
> "age": 11.340626,
>
> "duration": 11.342846,
>
> "type_data": {
>
> "flag_point": "started",
>
> "client_info": {
>
> "client": "client.1411875",
>
> "client_addr": "10.4.37.45:0/627562602",
>
> "tid": 422573570
>
> },
>
> "events": [
>
> {
>
> "time": "2019-03-21 01:04:40.598438",
>
> "event": "initiated"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598438",
>
> "event": "header_read"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598439",
>
> "event": "throttled"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598450",
>
> "event": "all_read"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598499",
>
> "event": "dispatched"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598504",
>
> "event": "queued_for_pg"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598883",
>
> "event": "reached_pg"
>
> },
>
> {
>
> "time": "2019-03-21 01:04:40.598905",
>
> "event": "started"
>
> }
>
> ]
>
> }
>
> }
>
> ],
>
>
>
> Glen
>



--
Cheers,
Brad


[ceph-users] Slow OPS

2019-03-20 Thread Glen Baars
Hello Ceph Users,

Does anyone know what the flag point 'Started' is? Is that ceph osd daemon 
waiting on the disk subsystem?

Ceph 13.2.4 on centos 7.5

"description": "osd_op(client.1411875.0:422573570 5.18ds0 
5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read
1703936~16384] snapc 0=[] ondisk+read+known_if_redirected e30622)",
"initiated_at": "2019-03-21 01:04:40.598438",
"age": 11.340626,
"duration": 11.342846,
"type_data": {
"flag_point": "started",
"client_info": {
"client": "client.1411875",
"client_addr": "10.4.37.45:0/627562602",
"tid": 422573570
},
"events": [
{
"time": "2019-03-21 01:04:40.598438",
"event": "initiated"
},
{
"time": "2019-03-21 01:04:40.598438",
"event": "header_read"
},
{
"time": "2019-03-21 01:04:40.598439",
"event": "throttled"
},
{
"time": "2019-03-21 01:04:40.598450",
"event": "all_read"
},
{
"time": "2019-03-21 01:04:40.598499",
"event": "dispatched"
},
{
"time": "2019-03-21 01:04:40.598504",
"event": "queued_for_pg"
},
{
"time": "2019-03-21 01:04:40.598883",
"event": "reached_pg"
},
{
"time": "2019-03-21 01:04:40.598905",
"event": "started"
}
]
}
}
],

Glen


Re: [ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-28 Thread Glen Baars
Here is the strace result.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.94    0.236170         790       299         5 futex
  0.06    0.000136           0       365           brk
  0.00    0.000000           0        41         2 read
  0.00    0.000000           0        48           write
  0.00    0.000000           0        72        27 open
  0.00    0.000000           0        43           close
  0.00    0.000000           0        10         5 stat
  0.00    0.000000           0        36           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0       103           mmap
  0.00    0.000000           0        70           mprotect
  0.00    0.000000           0        19           munmap
  0.00    0.000000           0        11           rt_sigaction
  0.00    0.000000           0        32           rt_sigprocmask
  0.00    0.000000           0        26        26 access
  0.00    0.000000           0         3           pipe
  0.00    0.000000           0        19           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         7           uname
  0.00    0.000000           0        12           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           sysinfo
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           prctl
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         3           epoll_create
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           membarrier
------ ----------- ----------- --------- --------- ----------------
100.00    0.236306                  1231        65 total
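
Almost everything above is futex time, and without -f strace only traces the 
main thread, so the librbd worker threads are probably not captured. A follow-up 
along these lines might show more; the pool/image name is a placeholder:

# follow all threads this time and summarise again
strace -c -f rbd du RBD/<large-image>
# or profile where the single hot core is spending its time
perf top -p $(pgrep -n rbd)
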
From: David Turner 
Sent: Friday, 1 March 2019 11:46 AM
To: Glen Baars 
Cc: Wido den Hollander ; ceph-users 
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness

Have you used strace on the du command to see what it's spending its time doing?

On Thu, Feb 28, 2019, 8:45 PM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Wido,

The cluster layout is as follows:

3 x Monitor hosts ( 2 x 10Gbit bonded )
9 x OSD hosts (
2 x 10Gbit bonded,
LSI cachecade and write cache drives set to single,
All HDD in this pool,
no separate DB / WAL. With the write cache and the SSD read cache on the LSI 
card it seems to perform well.
168 OSD disks

No major increase in OSD disk usage or CPU usage. The RBD DU process uses 100% 
of a single 2.4Ghz core while running - I think that is the limiting factor.

I have just tried removing most of the snapshots for that volume ( from 14 
snapshots down to 1 snapshot ) and the rbd du command now takes around 2-3 
minutes.

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander mailto:w...@42on.com>>
Sent: Thursday, 28 February 2019 5:05 PM
To: Glen Baars 
mailto:g...@onsitecomputers.com.au>>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness



On 2/28/19 9:41 AM, Glen Baars wrote:
> Hello Wido,
>
> I have looked at the libvirt code and there is a check to ensure that 
> fast-diff is enabled on the image and only then does it try to get the real 
> disk usage. The issue for me is that even with fast-diff enabled it takes 
> 25min to get the space usage for a 50TB image.
>
> I had considered turning off fast-diff on the large images to get
> around to issue but I think that will hurt my snapshot removal times (
> untested )
>

Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on SSD?

Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these images?

Wido

> I can't see in the code any other way of bypassing the disk usage check but I 
> am not that familiar with the code.
>
> ---
> if (volStorageBackendRBDUseFastDiff(features)) {
> VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
>   "Querying for actual allocation",
>   def->source.name, vol->name);
>
> if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
> goto cleanup;
> } else {
> vol->target.allocation = info.obj_size * info.num_objs; }
> --
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Wido den Hollander mailto:w...@42on.com>>
> Sent: Thursday, 28 February 2019 3:49 PM
> To: Glen Baars 
> mailto:g...@onsitecomputers.com.au>>;
> cep

Re: [ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-28 Thread Glen Baars
Hello Wido,

The cluster layout is as follows:

3 x Monitor hosts ( 2 x 10Gbit bonded )
9 x OSD hosts (
2 x 10Gbit bonded,
LSI cachecade and write cache drives set to single,
All HDD in this pool,
no separate DB / WAL. With the write cache and the SSD read cache on the LSI 
card it seems to perform well.
168 OSD disks

No major increase in OSD disk usage or CPU usage. The RBD DU process uses 100% 
of a single 2.4Ghz core while running - I think that is the limiting factor.

I have just tried removing most of the snapshots for that volume ( from 14 
snapshots down to 1 snapshot ) and the rbd du command now takes around 2-3 
minutes.

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander 
Sent: Thursday, 28 February 2019 5:05 PM
To: Glen Baars ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness



On 2/28/19 9:41 AM, Glen Baars wrote:
> Hello Wido,
>
> I have looked at the libvirt code and there is a check to ensure that 
> fast-diff is enabled on the image and only then does it try to get the real 
> disk usage. The issue for me is that even with fast-diff enabled it takes 
> 25min to get the space usage for a 50TB image.
>
> I had considered turning off fast-diff on the large images to get
> around to issue but I think that will hurt my snapshot removal times (
> untested )
>

Can you tell a bit more about the Ceph cluster? HDD? SSD? DB and WAL on SSD?

Do you see OSDs spike in CPU or Disk I/O when you do a 'rbd du' on these images?

Wido

> I can't see in the code any other way of bypassing the disk usage check but I 
> am not that familiar with the code.
>
> ---
> if (volStorageBackendRBDUseFastDiff(features)) {
> VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
>   "Querying for actual allocation",
>   def->source.name, vol->name);
>
> if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
> goto cleanup;
> } else {
> vol->target.allocation = info.obj_size * info.num_objs; }
> --
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Wido den Hollander 
> Sent: Thursday, 28 February 2019 3:49 PM
> To: Glen Baars ;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness
>
>
>
> On 2/28/19 2:59 AM, Glen Baars wrote:
>> Hello Ceph Users,
>>
>> Has anyone found a way to improve the speed of the rbd du command on large 
>> rbd images? I have object map and fast diff enabled - no invalid flags on 
>> the image or it's snapshots.
>>
>> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu 
>> 18.04. The upgrades libvirt to version 4. When libvirt 4 adds an rbd pool it 
>> discovers all images in the pool and tries to get their disk usage. We are 
>> seeing a 50TB image take 25min. The pool has over 300TB of images in it and 
>> takes hours for libvirt to start.
>>
>
> This is actually a pretty bad thing imho. As a lot of images people will be 
> using do not have fast-diff enabled (images from the past) and that will kill 
> their performance.
>
> Isn't there a way to turn this off in libvirt?
>
> Wido
>
>> We can replicate the issue without libvirt by just running a rbd du on the 
>> large images. The limiting factor is the cpu on the rbd du command, it uses 
>> 100% of a single core.
>>
>> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04 
>> hosts.
>>
>> Kind regards,
>> Glen Baars

Re: [ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-28 Thread Glen Baars
Hello Wido,

I have looked at the libvirt code and there is a check to ensure that fast-diff 
is enabled on the image and only then does it try to get the real disk usage. 
The issue for me is that even with fast-diff enabled it takes 25min to get the 
space usage for a 50TB image.

I had considered turning off fast-diff on the large images to get around the 
issue but I think that will hurt my snapshot removal times ( untested )

I can't see in the code any other way of bypassing the disk usage check but I 
am not that familiar with the code.

---
if (volStorageBackendRBDUseFastDiff(features)) {
VIR_DEBUG("RBD image %s/%s has fast-diff feature enabled. "
  "Querying for actual allocation",
  def->source.name, vol->name);

if (virStorageBackendRBDSetAllocation(vol, image, &info) < 0)
goto cleanup;
} else {
vol->target.allocation = info.obj_size * info.num_objs;
}
------

Kind regards,
Glen Baars

-Original Message-
From: Wido den Hollander 
Sent: Thursday, 28 February 2019 3:49 PM
To: Glen Baars ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mimic 13.2.4 rbd du slowness



On 2/28/19 2:59 AM, Glen Baars wrote:
> Hello Ceph Users,
>
> Has anyone found a way to improve the speed of the rbd du command on large 
> rbd images? I have object map and fast diff enabled - no invalid flags on the 
> image or it's snapshots.
>
> We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu 
> 18.04. The upgrades libvirt to version 4. When libvirt 4 adds an rbd pool it 
> discovers all images in the pool and tries to get their disk usage. We are 
> seeing a 50TB image take 25min. The pool has over 300TB of images in it and 
> takes hours for libvirt to start.
>

This is actually a pretty bad thing imho. As a lot of images people will be 
using do not have fast-diff enabled (images from the past) and that will kill 
their performance.

Isn't there a way to turn this off in libvirt?

Wido

> We can replicate the issue without libvirt by just running a rbd du on the 
> large images. The limiting factor is the cpu on the rbd du command, it uses 
> 100% of a single core.
>
> Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04 
> hosts.
>
> Kind regards,
> Glen Baars


[ceph-users] Mimic 13.2.4 rbd du slowness

2019-02-27 Thread Glen Baars
Hello Ceph Users,

Has anyone found a way to improve the speed of the rbd du command on large rbd 
images? I have object map and fast diff enabled - no invalid flags on the image 
or its snapshots.

We recently upgraded our Ubuntu 16.04 KVM servers for Cloudstack to Ubuntu 
18.04. This upgrades libvirt to version 4. When libvirt 4 adds an rbd pool it 
discovers all images in the pool and tries to get their disk usage. We are 
seeing a 50TB image take 25min. The pool has over 300TB of images in it and 
takes hours for libvirt to start.

We can replicate the issue without libvirt by just running a rbd du on the 
large images. The limiting factor is the cpu on the rbd du command, it uses 
100% of a single core.

Our cluster is completely bluestore/mimic 13.2.4. 168 OSDs, 12 Ubuntu 16.04 
hosts.

Kind regards,
Glen Baars


[ceph-users] Mimic Bluestore memory optimization

2019-02-24 Thread Glen Baars
Hello Ceph!

I am tracking down a performance issue with some of our Mimic 13.2.4 OSDs. It 
feels like a lack of memory but I have no real proof of the issue. I have used 
memory profiling ( the pprof tool ) and the OSDs are maintaining their 4GB 
allocated limit.

My questions are:

1. How do you know if the allocated memory is enough for the OSD? My 1TB disks 
and 12TB disks take the same memory, and I wonder if the OSDs should have memory 
allocated based on the size of the disks?
2. In the past, SSD disks needed 3 times the memory and now they don't; why is 
that? ( 1GB RAM per HDD and 3GB RAM per SSD both went to 4GB )
3. I have read that the number of placement groups per OSD is a significant 
factor in memory usage. Generally I have ~200 placement groups per OSD, which is 
at the higher end of the recommended values, and I wonder if it's causing high 
memory usage?

For reference the hosts are 1 x 6 core CPU, 72GB ram, 14 OSDs, 2 x 10Gbit. LSI 
cachecade / writeback cache for the HDD and LSI JBOD for SSDs. 9 hosts in this 
cluster.
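
For what it's worth, this is roughly how I am checking the per-OSD usage and 
what a per-OSD override would look like; a sketch only, the OSD id and the 6GB 
value are examples:

# what the OSD itself accounts for (bluestore caches, onodes, buffers)
ceph daemon osd.0 dump_mempools
# current target, and a per-OSD override for the bigger disks
ceph daemon osd.0 config get osd_memory_target
ceph config set osd.0 osd_memory_target 6442450944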

Kind regards,
Glen Baars


[ceph-users] Segfaults on 12.2.9 and 12.2.8

2019-01-14 Thread Glen Baars
dedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) 
[0x55565ee0c1a4]
18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55565ee0f1e0]
19: (()+0x76ba) [0x7fec8af206ba]
20: (clone()+0x6d) [0x7fec89f9741d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

Kind regards,
Glen Baars


[ceph-users] Hyper-v ISCSI support

2018-09-21 Thread Glen Baars
Hello Ceph Users,

We have been using ceph-iscsi-cli for some time now with VMware and it is 
performing OK.

We would like to use the same iSCSI service to store our Hyper-V VMs via 
Windows Cluster Shared Volumes. When we add the volume to Windows Failover 
Cluster Manager we get a 'device is not ready' error. I am assuming this is due 
to SCSI-3 persistent reservations.

Has anyone managed to get Ceph to serve iSCSI to Windows Cluster Shared 
Volumes? If so, how?
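
One check that might narrow it down is whether the LUN actually reports SCSI-3 
persistent reservation support; from a Linux initiator something like this 
should show it ( sg3_utils, /dev/sdX is a placeholder ):

# report persistent reservation capabilities and any registered keys
sg_persist --in --report-capabilities /dev/sdX
sg_persist --in --read-keys /dev/sdX
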
Kind regards,
Glen Baars


Re: [ceph-users] Invalid Object map without flags set

2018-08-20 Thread Glen Baars
Hello K,

We have found our issue – we were only fixing the main RBD image in our script 
rather than the snapshots. Working fine now.
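
For anyone hitting the same thing, the check now loops over the snapshots as 
well; a simplified sketch, the pool name is a placeholder and the snapshot-name 
parsing is naive:

#!/bin/bash
POOL=RBD-HDD
for img in $(rbd ls "$POOL"); do
    # rebuild the image's own object map if it is flagged invalid
    if rbd info "$POOL/$img" | grep -q 'invalid'; then
        rbd object-map rebuild "$POOL/$img"
    fi
    # and do the same for every snapshot of the image
    for snap in $(rbd snap ls "$POOL/$img" | awk 'NR>1 {print $2}'); do
        if rbd info "$POOL/$img@$snap" | grep -q 'invalid'; then
            rbd object-map rebuild "$POOL/$img@$snap"
        fi
    done
done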

Thanks for your help.
Kind regards,
Glen Baars
From: Konstantin Shalygin 
Sent: Friday, 17 August 2018 11:20 AM
To: ceph-users@lists.ceph.com; Glen Baars 
Subject: Re: [ceph-users] Invalid Object map without flags set


We are having issues with ensuring that object-map and fast-diff is working 
correctly. Most of the time when there is an invalid fast-diff map, the flag is 
set to correctly indicate this. We have a script that checks for this and 
rebuilds object maps as required. If we don't fix these, snapshot removal and 
rbd usage commands are too slow.



About 10% of the time when we issue an `rbd du` command we get the following 
situation.



warning: fast-diff map is invalid for 
2d6b4502-f720-4c00-b4a4-8879e415f283@18-D-2018-07-11:1109. operation may be 
slow.



When we check the `rbd info` it doesn't have any flags set.



[INFO]
{"name":"2d6b4502-f720-4c00-b4a4-8879e415f283","size":536870912000,"objects":128000,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.17928643c9869","format":2,"features":["layering","exclusive-lock","object-map","fast-diff","deep-flatten"],"flags":[],"create_timestamp":"Sat
 Apr 28 19:45:59 2018"}

[Feat]["layering","exclusive-lock","object-map","fast-diff","deep-flatten"]

[Flag][]



Is there another way to detect invalid object maps?



Ceph 12.2.7 - All Bluestore
As far as I remember, when the object map is bad you will see this flag in rbd 
info.



for e in `rbd ls replicated_rbd`; do echo "replicated_rbd/${e}"; rbd info 
replicated_rbd/${e} | grep "flag"; done





k



[ceph-users] Invalid Object map without flags set

2018-08-16 Thread Glen Baars
Hello Ceph Users,

We are having issues with ensuring that object-map and fast-diff is working 
correctly. Most of the time when there is an invalid fast-diff map, the flag is 
set to correctly indicate this. We have a script that checks for this and 
rebuilds object maps as required. If we don't fix these, snapshot removal and 
rbd usage commands are too slow.

About 10% of the time when we issue an `rbd du` command we get the following 
situation.

warning: fast-diff map is invalid for 
2d6b4502-f720-4c00-b4a4-8879e415f283@18-D-2018-07-11:1109. operation may be 
slow.

When we check the `rbd info` it doesn't have any flags set.

[INFO]
{"name":"2d6b4502-f720-4c00-b4a4-8879e415f283","size":536870912000,"objects":128000,"order":22,"object_size":4194304,"block_name_prefix":"rbd_data.17928643c9869","format":2,"features":["layering","exclusive-lock","object-map","fast-diff","deep-flatten"],"flags":[],"create_timestamp":"Sat
 Apr 28 19:45:59 2018"}
[Feat]["layering","exclusive-lock","object-map","fast-diff","deep-flatten"]
[Flag][]

Is there another way to detect invalid object maps?

Ceph 12.2.7 - All Bluestore

Kind regards,

Glen Baars



Re: [ceph-users] RBD journal feature

2018-08-16 Thread Glen Baars
Thanks for your help 
Kind regards,
Glen Baars
From: Jason Dillaman 
Sent: Thursday, 16 August 2018 10:21 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Thu, Aug 16, 2018 at 2:37 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Is there any workaround that you can think of to correctly enable journaling on 
locked images?

You could add the "rbd journal pool = XYZ" configuration option to the 
ceph.conf on the hosts currently using the images (or use 'rbd image-meta set 
 conf_rbd_journal_pool SSDPOOL' on each image), 
restart/live-migrate the affected VMs(?) to pick up the config changes, and 
enable journaling.
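
For the image from earlier in the thread that would look something like this 
(untested sketch):

rbd image-meta set RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a conf_rbd_journal_pool RBD_SSD
# restart / live-migrate the VM holding the exclusive lock, then:
rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling
rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a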

Kind regards,
Glen Baars

From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
On Behalf Of Glen Baars
Sent: Tuesday, 14 August 2018 9:36 PM
To: dilla...@redhat.com<mailto:dilla...@redhat.com>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

Hello Jason,

Thanks for your help. Here is the output you asked for also.

https://pastebin.com/dKH6mpwk
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:33 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 9:31 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.

OK, that makes sense. If you have an active client on the image holding the 
lock, the request to enable journaling is sent over to that client but it's 
missing all the journal options. I'll open a tracker ticket to fix the issue.

Thanks.

Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current image

Re: [ceph-users] RBD journal feature

2018-08-16 Thread Glen Baars
Is there any workaround that you can think of to correctly enable journaling on 
locked images?
Kind regards,
Glen Baars

From: ceph-users  On Behalf Of Glen Baars
Sent: Tuesday, 14 August 2018 9:36 PM
To: dilla...@redhat.com
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature

Hello Jason,

Thanks for your help. Here is the output you asked for also.

https://pastebin.com/dKH6mpwk
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:33 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 9:31 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.

OK, that makes sense. If you have an active client on the image holding the 
lock, the request to enable journaling is sent over to that client but it's 
missing all the journal options. I'll open a tracker ticket to fix the issue.

Thanks.

Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HD

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

Thanks for your help. Here is the output you asked for also.

https://pastebin.com/dKH6mpwk
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:33 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 9:31 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.

OK, that makes sense. If you have an active client on the image holding the 
lock, the request to enable journaling is sent over to that client but it's 
missing all the journal options. I'll open a tracker ticket to fix the issue.

Thanks.

Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I have now narrowed it down.

If the image has an exclusive lock – the journal doesn’t go on the correct pool.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:29 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature


On Tue, Aug 14, 2018 at 9:19 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. it 
doesn’t seem to make a difference.

It should be SSDPOOL, but regardless, I am at a loss as to why it's not working 
for you. You can try appending "--debug-rbd=20" to the end of the "rbd feature 
enable" command and provide the generated logs in a pastebin link.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 12

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I have tried with and without ‘rbd journal pool = rbd’ in the ceph.conf. It 
doesn’t seem to make a difference.

Also, here is the output:

rbd image-meta list RBD-HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
There are 0 metadata on this image.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 9:00 PM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

I tried w/ a rbd CLI from 12.2.7 and I still don't have an issue enabling 
journaling on a different pool:

$ rbd info rbd/foo
rbd image 'foo':
   size 1024 MB in 256 objects
   order 22 (4096 kB objects)
   block_name_prefix: rbd_data.101e6b8b4567
   format: 2
   features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten
   flags:
   create_timestamp: Tue Aug 14 08:51:19 2018
$ rbd feature enable rbd/foo journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd --image foo
rbd journal '101e6b8b4567':
   header_oid: journal.101e6b8b4567
   object_oid_prefix: journal_data.1.101e6b8b4567.
   order: 24 (16384 kB objects)
   splay_width: 4
   object_pool: rbd_ssd

Can you please run "rbd image-meta list " to see if you are 
overwriting any configuration settings? Do you have any client configuration 
overrides in your "/etc/ceph/ceph.conf"?

On Tue, Aug 14, 2018 at 8:25 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
    mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I will also complete testing of a few combinations tomorrow to try and isolate 
the issue now that we can get it to work with a new image.

The cluster started out at 12.2.3 bluestore so there shouldn’t be any old 
issues from previous versions.
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 7:43 PM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Tue, Aug 14, 2018 at 4:08 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.

I wouldn't think so. I used master branch for my testing but I'll try 12.2.7 
just in case it's an issue that's only in the luminous release.

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman mailto:jdill...@redhat.com>>
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars mailto:g...@onsitecomputers.com.au>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   320332.01  1359896.98
2   736360.83  1477975.96
3  1040351.17  1438393.57
4  1392350.94  1437437.51
5  1744350.24  1434576.94
6  2080349.82  1432866.06
7  2416341.73  1399731.23
8  2784348.37  1426930.69
9  3152347.40  1422966.67
   10  3520356.04  1458356.70
   11  3920361.34  1480050.97
elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
header_oid: journal.10746b8b4567
object_oid_prefix: journal_data.2.10746b8b4567.
order: 24 (16 MiB objects)
splay_width: 4
object_pool: rbd_ssd
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   240248.54  1018005.17
2   512263.47  1079154.06
3   768258.74  1059792.10
4  1040258.50  1058812.60
5  1312258.06  1057001.34
6  1536258.21  1057633.14
7  1792253.81  1039604.73
8  2032253.66  1038971.01
9  2256241.41  988800.93
   10  2480237.87  974335.65
   11  2752239.41  980624.20
   12  2992239.61  981440.94
   13  3200233.13  954887.84
   14  3440237.36  972237.80
   15  3680239.47  980853.37
   16  3920238.75  977920.70
el

Re: [ceph-users] RBD journal feature

2018-08-14 Thread Glen Baars
Hello Jason,

I can confirm that your tests work on our cluster with a newly created image.

We still can’t get the current images to use a different object pool. Do you 
think that maybe another feature is incompatible with this feature? Below is a 
log of the issue.


:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Sat May  5 11:39:07 2018

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd: journaling is not enabled for image 2ef34a96-27e0-4ae7-9888-fd33c38f657a

:~# rbd feature enable RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a journaling 
--journal-pool RBD_SSD

:~# rbd journal info --pool RBD_HDD --image 2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd journal '37c8974b0dc51':
header_oid: journal.37c8974b0dc51
object_oid_prefix: journal_data.1.37c8974b0dc51.
order: 24 (16384 kB objects)
splay_width: 4
*** 

:~# rbd info RBD_HDD/2ef34a96-27e0-4ae7-9888-fd33c38f657a
rbd image '2ef34a96-27e0-4ae7-9888-fd33c38f657a':
size 51200 MB in 12800 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.37c8974b0dc51
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, journaling
flags:
create_timestamp: Sat May  5 11:39:07 2018
journal: 37c8974b0dc51
mirroring state: disabled

Kind regards,
Glen Baars
From: Jason Dillaman 
Sent: Tuesday, 14 August 2018 12:04 AM
To: Glen Baars 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Sun, Aug 12, 2018 at 12:13 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?

You won't see any journal objects in the SSDPOOL until you issue a write:

$ rbd create --size 1G --image-feature exclusive-lock rbd_hdd/test
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   320332.01  1359896.98
2   736360.83  1477975.96
3  1040351.17  1438393.57
4  1392350.94  1437437.51
5  1744350.24  1434576.94
6  2080349.82  1432866.06
7  2416341.73  1399731.23
8  2784348.37  1426930.69
9  3152347.40  1422966.67
   10  3520356.04  1458356.70
   11  3920361.34  1480050.97
elapsed:11  ops: 4096  ops/sec:   353.61  bytes/sec: 1448392.06
$ rbd feature enable rbd_hdd/test journaling --journal-pool rbd_ssd
$ rbd journal info --pool rbd_hdd --image test
rbd journal '10746b8b4567':
header_oid: journal.10746b8b4567
object_oid_prefix: journal_data.2.10746b8b4567.
order: 24 (16 MiB objects)
splay_width: 4
object_pool: rbd_ssd
$ rbd bench --io-type=write --io-pattern=rand --io-size=4K --io-total=16M 
rbd_hdd/test --rbd-cache=false
bench  type write io_size 4096 io_threads 16 bytes 16777216 pattern random
  SEC   OPS   OPS/SEC   BYTES/SEC
1   240248.54  1018005.17
2   512263.47  1079154.06
3   768258.74  1059792.10
4  1040258.50  1058812.60
5  1312258.06  1057001.34
6  1536258.21  1057633.14
7  1792253.81  1039604.73
8  2032253.66  1038971.01
9  2256241.41  988800.93
   10  2480237.87  974335.65
   11  2752239.41  980624.20
   12  2992239.61  981440.94
   13  3200233.13  954887.84
   14  3440237.36  972237.80
   15  3680239.47  980853.37
   16  3920238.75  977920.70
elapsed:16  ops: 4096  ops/sec:   245.04  bytes/sec: 1003692.81
$ rados -p rbd_ssd ls | grep journal_data.2.10746b8b4567.
journal_data.2.10746b8b4567.3
journal_data.2.10746b8b4567.0
journal_data.2.10746b8b4567.2
journal_data.2.10746b8b4567.1

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL
The symptoms that we are experiencing is a huge decrease in write speed ( 1QD 
128K writes from 160MB/s down to 14MB/s ). We see no improvement when moving 
the journal to SSDPOOL ( but we don’t think it is really moving )

If you are trying to optimize for 128KiB writes, you might need to tweak the 
"rbd_journal_max_payload_bytes" setting since it currently is defaulted to 
split journal write events into a maximum of 16KiB payload [1] in order to 
optimize the worst-case memory usage of the r
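If experimenting with that setting, a minimal client-side override in ceph.conf would look roughly like this (the 128 KiB value is only an example, chosen to match the write size discussed above):

[client]
rbd journal max payload bytes = 131072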

Re: [ceph-users] RBD journal feature

2018-08-11 Thread Glen Baars
Hello Jason,

Interesting, I used ‘rados ls’ to view the SSDPOOL and can’t see any objects. 
Is this the correct way to view the journal objects?
rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL
The symptoms that we are experiencing is a huge decrease in write speed ( 1QD 
128K writes from 160MB/s down to 14MB/s ). We see no improvement when moving 
the journal to SSDPOOL ( but we don’t think it is really moving )
Kind regards,
Glen Baars

From: Jason Dillaman 
Sent: Saturday, 11 August 2018 11:28 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] RBD journal feature

On Fri, Aug 10, 2018 at 3:01 AM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Ceph Users,

I am trying to implement image journals for our RBD images ( required for 
mirroring )

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL

When we run the above command we still find the journal on the SLOWPOOL and not 
on the SSDPOOL. We are running 12.2.7 and all bluestore. We have also tried the 
ceph.conf option (rbd journal pool = SSDPOOL )
Has anyone else gotten this working?
The journal header was on SLOWPOOL or the journal data objects? I would expect 
that the journal metadata header is located on SLOWPOOL but all data objects 
should be created on SSDPOOL as needed.
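(A quick way to confirm that after a few writes, reusing the pool/image names above as a sketch:)

$ rbd journal info --pool SLOWPOOL --image RBDImage   # look for an object_pool line
$ rados -p SSDPOOL ls | grep journal_data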

Kind regards,
Glen Baars
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Jason
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-10 Thread Glen Baars
I have now gotten this working. Thanks everyone for the help. The RBD-Mirror 
service is co-located on a MON server.

Key points are:

Start the services on the boxes with the following syntax ( depending on your 
config file names )

On primary
systemctl start ceph-rbd-mirror@primary

On secondary
systemctl start ceph-rbd-mirror@secondary

Ensure this works on both boxes
ceph --cluster secondary -n client.secondary -s
ceph --cluster primary -n client.primary -s

check the log files under - /var/log/ceph/ceph-client.primary.log and 
/var/log/ceph/ceph-client.secondary.log

My primary server had these files in it.

ceph.client.admin.keyring
ceph.client.primary.keyring
ceph.conf
primary.client.primary.keyring
primary.conf
secondary.client.secondary.keyring
secondary.conf
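With those files in place, the pool-level mirroring setup is roughly as follows (a sketch only; the pool name 'rbd' and pool-mode mirroring are assumptions, adjust to suit):

rbd --cluster primary mirror pool enable rbd pool
rbd --cluster secondary mirror pool enable rbd pool
rbd --cluster secondary mirror pool peer add rbd client.primary@primary
rbd --cluster secondary mirror pool status rbd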

Kind regards,
Glen Baars

-Original Message-
From: Thode Jocelyn 
Sent: Thursday, 9 August 2018 1:41 PM
To: Erik McCormick 
Cc: Glen Baars ; Vasu Kulkarni 
; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name

Hi Erik,

The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph file to 
determine which configuration file to use (from CLUSTER_NAME). So you need to 
set this to the name you chose for rbd-mirror to work. However setting this 
CLUSTER_NAME variable in /etc/sysconfig/ceph makes it so that the mon, osd etc 
services will also use this variable. Because of this they cannot start anymore 
as all their paths are set with "ceph" as the cluster name.

However there might be something that I missed which would make this point moot
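For reference, the setting in question is a single line in /etc/sysconfig/ceph, for example (the value shown is illustrative only):

CLUSTER=secondary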

Best Regards
Jocelyn Thode

-Original Message-
From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
Sent: mercredi, 8 août 2018 16:39
To: Thode Jocelyn 
Cc: Glen Baars ; Vasu Kulkarni 
; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

I'm not using this feature, so maybe I'm missing something, but from the way I 
understand cluster naming to work...

I still don't understand why this is blocking for you. Unless you are 
attempting to mirror between two clusters running on the same hosts (why would 
you do this?) then systemd doesn't come into play. The --cluster flag on the 
rbd command will simply set the name of a configuration file with the FSID and 
settings of the appropriate cluster. Cluster name is just a way of telling ceph 
commands and systemd units where to find the configs.

So, what you end up with is something like:

/etc/ceph/ceph.conf (your local cluster configuration) on both clusters 
/etc/ceph/local.conf (config of the source cluster. Just a copy of ceph.conf of 
the source cluster) /etc/ceph/remote.conf (config of destination peer cluster. 
Just a copy of ceph.conf of the remote cluster).

Run all your rbd mirror commands against local and remote names.
However when starting things like mons, osds, mds, etc. you need no cluster 
name as it can use ceph.conf (cluster name of ceph).
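Concretely, something along these lines (the pool name 'rbd' is just an example):

$ rbd --cluster local mirror pool info rbd
$ rbd --cluster remote mirror pool info rbd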

Am I making sense, or have I completely missed something?

-Erik

On Wed, Aug 8, 2018 at 8:34 AM, Thode Jocelyn  wrote:
> Hi,
>
>
>
> We are still blocked by this problem on our end. Glen did you  or
> someone else figure out something for this ?
>
>
>
> Regards
>
> Jocelyn Thode
>
>
>
> From: Glen Baars [mailto:g...@onsitecomputers.com.au]
> Sent: jeudi, 2 août 2018 05:43
> To: Erik McCormick 
> Cc: Thode Jocelyn ; Vasu Kulkarni
> ; ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name
>
>
>
> Hello Erik,
>
>
>
> We are going to use RBD-mirror to replicate the clusters. This seems
> to need separate cluster names.
>
> Kind regards,
>
> Glen Baars
>
>
>
> From: Erik McCormick 
> Sent: Thursday, 2 August 2018 9:39 AM
> To: Glen Baars 
> Cc: Thode Jocelyn ; Vasu Kulkarni
> ; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
>
>
>
> Don't set a cluster name. It's no longer supported. It really only
> matters if you're running two or more independent clusters on the same
> boxes. That's generally inadvisable anyway.
>
>
>
> Cheers,
>
> Erik
>
>
>
> On Wed, Aug 1, 2018, 9:17 PM Glen Baars  wrote:
>
> Hello Ceph Users,
>
> Does anyone know how to set the Cluster Name when deploying with
> Ceph-deploy? I have 3 clusters to configure and need to correctly set
> the name.
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: ceph-users  On Behalf Of Glen
> Baars
> Sent: Monday, 23 July 2018 5:59 PM
> To: Thode Jocelyn ; Vasu Kulkarni
> 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
>
> How very timely, I am facing the exact same issue.
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: ceph-users  On Behalf Of
> Thode Jocelyn
> Sent: M

[ceph-users] RBD journal feature

2018-08-10 Thread Glen Baars
Hello Ceph Users,

I am trying to implement image journals for our RBD images ( required for 
mirroring )

rbd feature enable SLOWPOOL/RBDImage journaling --journal-pool SSDPOOL

When we run the above command we still find the journal on the SLOWPOOL and not 
on the SSDPOOL. We are running 12.2.7 and all bluestore. We have also tried the 
ceph.conf option (rbd journal pool = SSDPOOL )
Has anyone else gotten this working?
Kind regards,
Glen Baars
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-01 Thread Glen Baars
Hello Erik,

We are going to use RBD-mirror to replicate the clusters. This seems to need 
separate cluster names.
Kind regards,
Glen Baars

From: Erik McCormick 
Sent: Thursday, 2 August 2018 9:39 AM
To: Glen Baars 
Cc: Thode Jocelyn ; Vasu Kulkarni ; 
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Don't set a cluster name. It's no longer supported. It really only matters if 
you're running two or more independent clusters on the same boxes. That's 
generally inadvisable anyway.

Cheers,
Erik

On Wed, Aug 1, 2018, 9:17 PM Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Ceph Users,

Does anyone know how to set the Cluster Name when deploying with Ceph-deploy? I 
have 3 clusters to configure and need to correctly set the name.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
On Behalf Of Glen Baars
Sent: Monday, 23 July 2018 5:59 PM
To: Thode Jocelyn mailto:jocelyn.th...@elca.ch>>; Vasu 
Kulkarni mailto:vakul...@redhat.com>>
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni mailto:vakul...@redhat.com>>
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is colocated with my mon/osd. It only affects nodes where 
they are colocated as they all use the "/etc/sysconfig/ceph" configuration 
file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com<mailto:vakul...@redhat.com>]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn mailto:jocelyn.th...@elca.ch>>
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn 
mailto:jocelyn.th...@elca.ch>> wrote:
> Hi,
>
>
>
> I noticed that in commit
> https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a980
> 23b60efe421f3, the ability to specify a cluster name was removed. Is
> there a reason for this removal ?
>
>
>
> Because right now, there is no possibility to create a ceph cluster
> with a different name with ceph-deploy which is a big problem when
> having two clusters replicating with rbd-mirror as we need different names.
>
>
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/h
> tml/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-w
> ith-the-same-name
>
>
>
> This is not sufficient as once we change the CLUSTER variable in the
> sysconfig file, mon,osd, mds etc. all use it and fail to start on a
> reboot as they then try to load data from a path in /var/lib/ceph
> containing the cluster name.

Is your rbd-mirror client also colocated with mon/osd? This needs to be changed 
only on the client side where you are doing mirroring; the rest of the nodes are 
not affected?


>
>
>
> Is there a solution to this problem ?
>
>
>
> Best Regards
>
> Jocelyn Thode
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-01 Thread Glen Baars
Hello Ceph Users,

Does anyone know how to set the Cluster Name when deploying with Ceph-deploy? I 
have 3 clusters to configure and need to correctly set the name.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users  On Behalf Of Glen Baars
Sent: Monday, 23 July 2018 5:59 PM
To: Thode Jocelyn ; Vasu Kulkarni 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users  On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is colocated with my mon/osd. It only affects nodes where 
they are colocated as they all use the "/etc/sysconfig/ceph" configuration 
file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn  wrote:
> Hi,
>
>
>
> I noticed that in commit
> https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a980
> 23b60efe421f3, the ability to specify a cluster name was removed. Is
> there a reason for this removal ?
>
>
>
> Because right now, there is no possibility to create a ceph cluster
> with a different name with ceph-deploy which is a big problem when
> having two clusters replicating with rbd-mirror as we need different names.
>
>
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/h
> tml/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-w
> ith-the-same-name
>
>
>
> This is not sufficient as once we change the CLUSTER variable in the
> sysconfig file, mon,osd, mds etc. all use it and fail to start on a
> reboot as they then try to load data from a path in /var/lib/ceph
> containing the cluster name.

Is your rbd-mirror client also colocated with mon/osd? This needs to be changed 
only on the client side where you are doing mirroring; the rest of the nodes are 
not affected?


>
>
>
> Is there a solution to this problem ?
>
>
>
> Best Regards
>
> Jocelyn Thode
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-07-23 Thread Glen Baars
How very timely, I am facing the exact same issue.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users  On Behalf Of Thode Jocelyn
Sent: Monday, 23 July 2018 1:42 PM
To: Vasu Kulkarni 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

Hi,

Yes, my rbd-mirror is colocated with my mon/osd. It only affects nodes where 
they are colocated as they all use the "/etc/sysconfig/ceph" configuration 
file.

Best
Jocelyn Thode

-Original Message-
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: vendredi, 20 juillet 2018 17:25
To: Thode Jocelyn 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name

On Fri, Jul 20, 2018 at 7:29 AM, Thode Jocelyn  wrote:
> Hi,
>
>
>
> I noticed that in commit
> https://github.com/ceph/ceph-deploy/commit/b1c27b85d524f2553af2487a980
> 23b60efe421f3, the ability to specify a cluster name was removed. Is
> there a reason for this removal ?
>
>
>
> Because right now, there is no possibility to create a ceph cluster
> with a different name with ceph-deploy which is a big problem when
> having two clusters replicating with rbd-mirror as we need different names.
>
>
>
> And even when following the doc here:
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/h
> tml/block_device_guide/block_device_mirroring#rbd-mirroring-clusters-w
> ith-the-same-name
>
>
>
> This is not sufficient as once we change the CLUSTER variable in the
> sysconfig file, mon,osd, mds etc. all use it and fail to start on a
> reboot as they then try to load data from a path in /var/lib/ceph
> containing the cluster name.

Is your rbd-mirror client also colocated with mon/osd? This needs to be changed 
only on the client side where you are doing mirroring; the rest of the nodes are 
not affected?


>
>
>
> Is there a solution to this problem ?
>
>
>
> Best Regards
>
> Jocelyn Thode
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-21 Thread Glen Baars
Thanks for the reply! It ended up being that the HDD pool in this server is 
larger than in the other servers. This increases the server's weight and therefore 
the SSD pool in this server is affected.

I will add more SSDs to this server to keep the ratio of HDDs to SSDs the same 
across all hosts.
Kind regards,
Glen Baars

From: Linh Vu 
Sent: Sunday, 22 July 2018 7:46 AM
To: Glen Baars ; ceph-users 

Subject: Re: 12.2.7 - Available space decreasing when adding disks


Something funny going on with your  new disks:


138   ssd 0.90970  1.0  931G  820G  111G 88.08 2.71 216 Added
139   ssd 0.90970  1.0  931G  771G  159G 82.85 2.55 207 Added
140   ssd 0.90970  1.0  931G  709G  222G 76.12 2.34 197 Added
141   ssd 0.90970  1.0  931G  664G  267G 71.31 2.19 184 Added


The last 3 columns are: % used, variation, and PG count. These 4 have much 
higher %used and PG count than the rest, almost double. You probably have these 
disks in multiple pools and therefore have too many PGs on them.



One of them is at 88% used. The max available capacity of a pool is calculated 
based on the most full OSD in it, which is why your total available capacity 
drops to 0.6TB.


From: ceph-users 
mailto:ceph-users-boun...@lists.ceph.com>> 
on behalf of Glen Baars 
mailto:g...@onsitecomputers.com.au>>
Sent: Saturday, 21 July 2018 10:43:16 AM
To: ceph-users
Subject: [ceph-users] 12.2.7 - Available space decreasing when adding disks


Hello Ceph Users,



We have added more ssd storage to our ceph cluster last night. We added 4 x 1TB 
drives and the available space went from 1.6TB to 0.6TB ( in `ceph df` for the 
SSD pool ).



I would assume that the weight needs to be changed but I didn't think I would 
need to? Should I change them to 0.75 from 0.9 and hopefully it will rebalance 
correctly?



#ceph osd tree | grep -v hdd
ID  CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
-1   534.60309 root default
-1962.90637 host NAS-AUBUN-RK2-CEPH06
115   ssd   0.43660 osd.115   up  1.0 1.0
116   ssd   0.43660 osd.116   up  1.0 1.0
117   ssd   0.43660 osd.117   up  1.0 1.0
118   ssd   0.43660 osd.118   up  1.0 1.0
-22   105.51169 host NAS-AUBUN-RK2-CEPH07
138   ssd   0.90970 osd.138   up  1.0 1.0 Added
139   ssd   0.90970 osd.139   up  1.0 1.0 Added
-25   105.51169 host NAS-AUBUN-RK2-CEPH08
140   ssd   0.90970 osd.140   up  1.0 1.0 Added
141   ssd   0.90970 osd.141   up  1.0 1.0 Added
-356.32617 host NAS-AUBUN-RK3-CEPH01
60   ssd   0.43660 osd.60up  1.0 1.0
61   ssd   0.43660 osd.61up  1.0 1.0
62   ssd   0.43660 osd.62up  1.0 1.0
63   ssd   0.43660 osd.63up  1.0 1.0
-556.32617 host NAS-AUBUN-RK3-CEPH02
64   ssd   0.43660 osd.64up  1.0 1.0
65   ssd   0.43660 osd.65up  1.0 1.0
66   ssd   0.43660 osd.66up  1.0 1.0
67   ssd   0.43660 osd.67up  1.0 1.0
-756.32617 host NAS-AUBUN-RK3-CEPH03
68   ssd   0.43660 osd.68up  1.0 1.0
69   ssd   0.43660 osd.69up  1.0 1.0
70   ssd   0.43660 osd.70up  1.0 1.0
71   ssd   0.43660 osd.71up  1.0 1.0
-1345.84741 host NAS-AUBUN-RK3-CEPH04
72   ssd   0.54579 osd.72up  1.0 1.0
73   ssd   0.54579 osd.73up  1.0 1.0
76   ssd   0.54579 osd.76up  1.0 1.0
77   ssd   0.54579 osd.77up  1.0 1.0
-1645.84741 host NAS-AUBUN-RK3-CEPH05
74   ssd   0.54579 osd.74up  1.0 1.0
75   ssd   0.54579 osd.75up  1.0 1.0
78   ssd   0.54579 osd.78up  1.0 1.0
79   ssd   0.54579 osd.79up  1.0 1.0

# ceph osd df | grep -v hdd
ID  CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
115   ssd 0.43660  1.0  447G  250G  196G 56.00 1.72 103
116   ssd 0.43660  1.0  447G  191G  255G 42.89 1.32  84
117   ssd 0.43660  1.0  447G  213G  233G 47.79 1.47  92
118   ssd 0.43660  1.0  447G  208G  238G 46.61 1.43  85
138   ssd 0.90970  1.0  931G  820G  111G 88.08 2.71 216 Added
139   ssd 0.90970  1.0  931G  771G  159G 82.85 2.55 207 Added
140   ssd 0.90970  1

Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-21 Thread Glen Baars
   0.43660 osd.68up  1.0 1.0
69   ssd   0.43660 osd.69up  1.0 1.0
70   ssd   0.43660 osd.70up  1.0 1.0
71   ssd   0.43660 osd.71up  1.0 1.0
-1345.84741 host NAS-AUBUN-RK3-CEPH04
80   hdd   3.63869 osd.80up  1.0 1.0
81   hdd   3.63869 osd.81up  1.0 1.0
82   hdd   3.63869 osd.82up  1.0 1.0
83   hdd   3.63869 osd.83up  1.0 1.0
84   hdd   3.63869 osd.84up  1.0 1.0
85   hdd   3.63869 osd.85up  1.0 1.0
86   hdd   3.63869 osd.86up  1.0 1.0
87   hdd   3.63869 osd.87up  1.0 1.0
88   hdd   3.63869 osd.88up  1.0 1.0
89   hdd   3.63869 osd.89up  1.0 1.0
90   hdd   3.63869 osd.90up  1.0 1.0
91   hdd   3.63869 osd.91up  1.0 1.0
72   ssd   0.54579 osd.72up  1.0 1.0
73   ssd   0.54579 osd.73up  1.0 1.0
76   ssd   0.54579 osd.76up  1.0 1.0
77   ssd   0.54579 osd.77up  1.0 1.0
-1645.84741 host NAS-AUBUN-RK3-CEPH05
92   hdd   3.63869 osd.92up  1.0 1.0
93   hdd   3.63869 osd.93up  1.0 1.0
94   hdd   3.63869 osd.94up  1.0 1.0
95   hdd   3.63869 osd.95up  1.0 1.0
96   hdd   3.63869 osd.96up  1.0 1.0
97   hdd   3.63869 osd.97up  1.0 1.0
98   hdd   3.63869 osd.98up  1.0 1.0
99   hdd   3.63869 osd.99up  1.0 1.0
100   hdd   3.63869 osd.100   up  1.0 1.0
101   hdd   3.63869 osd.101   up  1.0 1.0
102   hdd   3.63869 osd.102   up  1.0 1.0
103   hdd   3.63869 osd.103   up  1.0 1.0
74   ssd   0.54579 osd.74up  1.0 1.0
75   ssd   0.54579 osd.75up  1.0 1.0
78   ssd   0.54579 osd.78up  1.0 1.0
79   ssd   0.54579 osd.79up  1.0 1.0
Kind regards,
Glen Baars

From: Shawn Iverson 
Sent: Saturday, 21 July 2018 9:21 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] 12.2.7 - Available space decreasing when adding disks

Glen,

Correction...looked at the wrong column for weights, my bad...

I was looking at the wrong column for weight.  You have varying weights, but 
the process is still the same.  Balance your buckets (hosts) in your crush map, 
and balance your osds in each bucket (host).

On Sat, Jul 21, 2018 at 9:14 AM, Shawn Iverson 
mailto:ivers...@rushville.k12.in.us>> wrote:
Glen,

It appears you have 447G, 931G, and 558G disks in your cluster, all with a 
weight of 1.0.  This means that although the new disks are bigger, they are not 
going to be utilized by pgs any more than any other disk.

I would suggest reweighting your other disks (they are smaller), so that you 
balance your cluster.  You should do this gradually over time, preferably 
during off-peak times, when remapping will not affect operations.

I do a little math, first by taking total cluster capacity and dividing it by 
total capacity of each bucket.  I then do the same thing in each bucket, until 
everything is proportioned appropriately down to the osds.
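(For reference, those gradual adjustments would be made with something like the command below; the OSD id and value are purely illustrative and simply echo the 0.75 figure floated earlier in the thread:)

$ ceph osd crush reweight osd.138 0.75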

On Fri, Jul 20, 2018 at 8:43 PM, Glen Baars 
mailto:g...@onsitecomputers.com.au>> wrote:
Hello Ceph Users,

We have added more ssd storage to our ceph cluster last night. We added 4 x 1TB 
drives and the available space went from 1.6TB to 0.6TB ( in `ceph df` for the 
SSD pool ).

I would assume that the weight needs to be changed but I didn’t think I would 
need to? Should I change them to 0.75 from 0.9 and hopefully it will rebalance 
correctly?

#ceph osd tree | grep -v hdd
ID  CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
-1   534.60309 root default
-1962.90637 host NAS-AUBUN-RK2-CEPH06
115   ssd   0.43660 osd.115   up  1.0 1.0
116   ssd   0.43660 osd.116   up  1.0 1.0
117   ssd   0.43660 osd.117   up  1.0 1.0
118   ssd   0.43660 osd.118   up  1.0 1.0
-22   105.51169 host NAS-AUBUN-RK2-CEPH07
138   ssd   0.90970 osd.138  

[ceph-users] 12.2.7 - Available space decreasing when adding disks

2018-07-20 Thread Glen Baars
Hello Ceph Users,

We have added more ssd storage to our ceph cluster last night. We added 4 x 1TB 
drives and the available space went from 1.6TB to 0.6TB ( in `ceph df` for the 
SSD pool ).

I would assume that the weight needs to be changed but I didn't think I would 
need to? Should I change them to 0.75 from 0.9 and hopefully it will rebalance 
correctly?

#ceph osd tree | grep -v hdd
ID  CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
-1   534.60309 root default
-1962.90637 host NAS-AUBUN-RK2-CEPH06
115   ssd   0.43660 osd.115   up  1.0 1.0
116   ssd   0.43660 osd.116   up  1.0 1.0
117   ssd   0.43660 osd.117   up  1.0 1.0
118   ssd   0.43660 osd.118   up  1.0 1.0
-22   105.51169 host NAS-AUBUN-RK2-CEPH07
138   ssd   0.90970 osd.138   up  1.0 1.0 Added
139   ssd   0.90970 osd.139   up  1.0 1.0 Added
-25   105.51169 host NAS-AUBUN-RK2-CEPH08
140   ssd   0.90970 osd.140   up  1.0 1.0 Added
141   ssd   0.90970 osd.141   up  1.0 1.0 Added
-356.32617 host NAS-AUBUN-RK3-CEPH01
60   ssd   0.43660 osd.60up  1.0 1.0
61   ssd   0.43660 osd.61up  1.0 1.0
62   ssd   0.43660 osd.62up  1.0 1.0
63   ssd   0.43660 osd.63up  1.0 1.0
-556.32617 host NAS-AUBUN-RK3-CEPH02
64   ssd   0.43660 osd.64up  1.0 1.0
65   ssd   0.43660 osd.65up  1.0 1.0
66   ssd   0.43660 osd.66up  1.0 1.0
67   ssd   0.43660 osd.67up  1.0 1.0
-756.32617 host NAS-AUBUN-RK3-CEPH03
68   ssd   0.43660 osd.68up  1.0 1.0
69   ssd   0.43660 osd.69up  1.0 1.0
70   ssd   0.43660 osd.70up  1.0 1.0
71   ssd   0.43660 osd.71up  1.0 1.0
-1345.84741 host NAS-AUBUN-RK3-CEPH04
72   ssd   0.54579 osd.72up  1.0 1.0
73   ssd   0.54579 osd.73up  1.0 1.0
76   ssd   0.54579 osd.76up  1.0 1.0
77   ssd   0.54579 osd.77up  1.0 1.0
-1645.84741 host NAS-AUBUN-RK3-CEPH05
74   ssd   0.54579 osd.74up  1.0 1.0
75   ssd   0.54579 osd.75up  1.0 1.0
78   ssd   0.54579 osd.78up  1.0 1.0
79   ssd   0.54579 osd.79up  1.0 1.0

# ceph osd df | grep -v hdd
ID  CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
115   ssd 0.43660  1.0  447G  250G  196G 56.00 1.72 103
116   ssd 0.43660  1.0  447G  191G  255G 42.89 1.32  84
117   ssd 0.43660  1.0  447G  213G  233G 47.79 1.47  92
118   ssd 0.43660  1.0  447G  208G  238G 46.61 1.43  85
138   ssd 0.90970  1.0  931G  820G  111G 88.08 2.71 216 Added
139   ssd 0.90970  1.0  931G  771G  159G 82.85 2.55 207 Added
140   ssd 0.90970  1.0  931G  709G  222G 76.12 2.34 197 Added
141   ssd 0.90970  1.0  931G  664G  267G 71.31 2.19 184 Added
60   ssd 0.43660  1.0  447G  275G  171G 61.62 1.89 100
61   ssd 0.43660  1.0  447G  237G  209G 53.04 1.63  90
62   ssd 0.43660  1.0  447G  275G  171G 61.58 1.89  95
63   ssd 0.43660  1.0  447G  260G  187G 58.15 1.79  97
64   ssd 0.43660  1.0  447G  232G  214G 52.08 1.60  83
65   ssd 0.43660  1.0  447G  207G  239G 46.36 1.42  75
66   ssd 0.43660  1.0  447G  217G  230G 48.54 1.49  84
67   ssd 0.43660  1.0  447G  252G  195G 56.36 1.73  92
68   ssd 0.43660  1.0  447G  248G  198G 55.56 1.71  94
69   ssd 0.43660  1.0  447G  229G  217G 51.25 1.57  84
70   ssd 0.43660  1.0  447G  259G  187G 58.01 1.78  87
71   ssd 0.43660  1.0  447G  267G  179G 59.83 1.84  97
72   ssd 0.54579  1.0  558G  217G  341G 38.96 1.20 100
73   ssd 0.54579  1.0  558G  283G  275G 50.75 1.56 121
76   ssd 0.54579  1.0  558G  286G  272G 51.33 1.58 129
77   ssd 0.54579  1.0  558G  246G  312G 44.07 1.35 104
74   ssd 0.54579  1.0  558G  273G  285G 48.91 1.50 122
75   ssd 0.54579  1.0  558G  281G  276G 50.45 1.55 114
78   ssd 0.54579  1.0  558G  289G  269G 51.80 1.59 133
79   ssd 0.54579  1.0  558G  276G  282G 49.39 1.52 119
Kind regards,
Glen Baars
BackOnline Manager

This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
Thanks, we are fully bluestore and therefore just set osd skip data digest = 
true
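(For reference, persisted in ceph.conf that is simply:)

[osd]
osd skip data digest = true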

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster 
Sent: Friday, 20 July 2018 4:08 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] 12.2.6 upgrade

That's right. But please read the notes carefully to understand if you need to 
set
   osd skip data digest = true
or
   osd distrust data digest = true

.. dan

On Fri, Jul 20, 2018 at 10:02 AM Glen Baars  wrote:
>
> I saw that on the release notes.
>
> Does that mean that the active+clean+inconsistent PGs will be OK?
>
> Is the data still getting replicated even if inconsistent?
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Friday, 20 July 2018 3:57 PM
> To: Glen Baars 
> Cc: ceph-users 
> Subject: Re: [ceph-users] 12.2.6 upgrade
>
> CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore.
> See
> https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12
> -2-6
>
> On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  
> wrote:
> >
> > Hello Ceph Users,
> >
> >
> >
> > We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub 
> > errors ) to fix from the time when we ran 12.2.6. It doesn’t seem to be 
> > affecting production at this time.
> >
> >
> >
> > Below is the log of a PG repair. What is the best way to correct these 
> > errors? Is there any further information required?
> >
> >
> >
> > rados list-inconsistent-obj 1.275 --format=json-pretty
> >
> > {
> >
> > "epoch": 38481,
> >
> > "inconsistents": []
> >
> > }
> >
> >
> >
> > Is it odd that it doesn’t list any inconsistents?
> >
> >
> >
> > Ceph.log entries for this PG.
> >
> > 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422
> > 81 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head
> > data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'3148
> > 36 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304
> > uv 314836 dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422
> > 82 : cluster [ERR] 1.275 shard 124: soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head
> > data_digest 0x1a131dab != data_digest 0x92f2c4c8 from auth oi
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'3148
> > 36 client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304
> > uv 314836 dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422
> > 83 : cluster [ERR] 1.275 soid
> > 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to
> > pick suitable auth object
> >
> > 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422
> > 84 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head
> > data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'3306
> > 51 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304
> > uv 307138 dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422
> > 85 : cluster [ERR] 1.275 shard 124: soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head
> > data_digest 0xdf907335 != data_digest 0x38400b00 from auth oi
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'3306
> > 51 client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304
> > uv 307138 dd 38400b00 od  alloc_hint [4194304 4194304 0])
> >
> > 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422
> > 86 : cluster [ERR] 1.275 soid
> > 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to
> > pick suitable auth object
> >
> > 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422
> > 87 : cluster [ERR] 1.275 shard 100: soid
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head
> > data_digest 0x6555a7c9 != data_digest 0xbad822f from auth oi
> > 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'3148
> > 79 client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304
> > uv 314879 dd bad82

Re: [ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
I saw that on the release notes.

Does that mean that the active+clean+inconsistent PGs will be OK?

Is the data still getting replicated even if inconsistent?

Kind regards,
Glen Baars

-Original Message-
From: Dan van der Ster 
Sent: Friday, 20 July 2018 3:57 PM
To: Glen Baars 
Cc: ceph-users 
Subject: Re: [ceph-users] 12.2.6 upgrade

CRC errors are expected in 12.2.7 if you ran 12.2.6 with bluestore. See
https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6

On Fri, Jul 20, 2018 at 8:30 AM Glen Baars  wrote:
>
> Hello Ceph Users,
>
>
>
> We have upgraded all nodes to 12.2.7 now. We have 90PGs ( ~2000 scrub errors 
> ) to fix from the time when we ran 12.2.6. It doesn’t seem to be affecting 
> production at this time.
>
>
>
> Below is the log of a PG repair. What is the best way to correct these 
> errors? Is there any further information required?
>
>
>
> rados list-inconsistent-obj 1.275 --format=json-pretty
>
> {
>
> "epoch": 38481,
>
> "inconsistents": []
>
> }
>
>
>
> Is it odd that it doesn’t list any inconsistents?
>
>
>
> Ceph.log entries for this PG.
>
> 2018-07-20 12:13:28.381903 osd.124 osd.124 10.4.35.36:6810/1865422 81 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381907 osd.124 osd.124 10.4.35.36:6810/1865422 82 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head data_digest 
> 0x1a131dab != data_digest 0x92f2c4c8 from auth oi 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head(37917'314836 
> client.1079025.0:24453722 dirty|data_digest|omap_digest s 4194304 uv 314836 
> dd 92f2c4c8 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:13:28.381909 osd.124 osd.124 10.4.35.36:6810/1865422 83 : 
> cluster [ERR] 1.275 soid 
> 1:ae423e16:::rbd_data.37c2374b0dc51.0004917b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:15:15.310579 osd.124 osd.124 10.4.35.36:6810/1865422 84 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310582 osd.124 osd.124 10.4.35.36:6810/1865422 85 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head data_digest 
> 0xdf907335 != data_digest 0x38400b00 from auth oi 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head(38269'330651 
> client.232404.0:23912666 dirty|data_digest|omap_digest s 4194304 uv 307138 dd 
> 38400b00 od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:15:15.310584 osd.124 osd.124 10.4.35.36:6810/1865422 86 : 
> cluster [ERR] 1.275 soid 
> 1:ae455519:::rbd_data.3844874b0dc51.000293f2:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:07.518970 osd.124 osd.124 10.4.35.36:6810/1865422 87 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518975 osd.124 osd.124 10.4.35.36:6810/1865422 88 : 
> cluster [ERR] 1.275 shard 124: soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head data_digest 
> 0x6555a7c9 != data_digest 0xbad822f from auth oi 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head(37917'314879 
> client.1079025.0:24564045 dirty|data_digest|omap_digest s 4194304 uv 314879 
> dd bad822f od  alloc_hint [4194304 4194304 0])
>
> 2018-07-20 12:16:07.518977 osd.124 osd.124 10.4.35.36:6810/1865422 89 : 
> cluster [ERR] 1.275 soid 
> 1:ae470eb2:::rbd_data.37c2374b0dc51.00049a4b:head: failed to pick 
> suitable auth object
>
> 2018-07-20 12:16:29.476778 osd.124 osd.124 10.4.35.36:6810/1865422 90 : 
> cluster [ERR] 1.275 shard 100: soid 
> 1:ae47e410:::rbd_data.37c2374b0dc51.00024b09:head data_digest 
> 0xa394e845 != data_digest 0xd8aa931c 

[ceph-users] 12.2.6 upgrade

2018-07-20 Thread Glen Baars
gest 0x218b7cb4 from auth oi 
1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head(37426'306744 
client.1079025.0:23363742 dirty|data_digest|omap_digest s 4194304 uv 306744 dd 
218b7cb4 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:19:59.498925 osd.124 osd.124 10.4.35.36:6810/1865422 94 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head data_digest 
0x2008cb1b != data_digest 0x218b7cb4 from auth oi 
1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head(37426'306744 
client.1079025.0:23363742 dirty|data_digest|omap_digest s 4194304 uv 306744 dd 
218b7cb4 od  alloc_hint [4194304 4194304 0])
2018-07-20 12:19:59.498927 osd.124 osd.124 10.4.35.36:6810/1865422 95 : cluster 
[ERR] 1.275 soid 1:ae4de127:::rbd_data.37c2374b0dc51.0002f6a6:head: 
failed to pick suitable auth object
2018-07-20 12:20:29.937564 osd.124 osd.124 10.4.35.36:6810/1865422 96 : cluster 
[ERR] 1.275 shard 100: soid 
1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head data_digest 
0x1b42858b != data_digest 0x69a5f3de from auth oi 
1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head(38220'328463 
client.1084539.0:403248048 dirty|data_digest|omap_digest s 4194304 uv 308146 dd 
69a5f3de od  alloc_hint [4194304 4194304 0])
2018-07-20 12:20:29.937568 osd.124 osd.124 10.4.35.36:6810/1865422 97 : cluster 
[ERR] 1.275 shard 124: soid 
1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head data_digest 
0x1b42858b != data_digest 0x69a5f3de from auth oi 
1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head(38220'328463 
client.1084539.0:403248048 dirty|data_digest|omap_digest s 4194304 uv 308146 dd 
69a5f3de od  alloc_hint [4194304 4194304 0])
2018-07-20 12:20:29.937570 osd.124 osd.124 10.4.35.36:6810/1865422 98 : cluster 
[ERR] 1.275 soid 1:ae4f1dd8:::rbd_data.7695c59bb0bc2.05bb:head: 
failed to pick suitable auth object
2018-07-20 12:21:07.463206 osd.124 osd.124 10.4.35.36:6810/1865422 99 : cluster 
[ERR] 1.275 repair 12 errors, 0 fixed
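
For anyone wanting to dig into errors like the above, a minimal sketch of how to
inspect one of the PGs that repair reported on (PG 1.275 is taken from the log
above; the pool name is only an example):

    # list PGs in a pool that currently have scrub inconsistencies
    rados list-inconsistent-pg rbd

    # show which shards disagree and whether it is a data_digest mismatch
    rados list-inconsistent-obj 1.275 --format=json-pretty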

Kind regards,
Glen Baars

From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Glen Baars
Sent: Wednesday, 18 July 2018 10:33 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] 12.2.6 upgrade

Hello Ceph Users,

We installed 12.2.6 on a single node in the cluster (new node added, 80TB 
moved). We disabled scrub/deep scrub once the issues with 12.2.6 were 
discovered.


We upgraded the one affected node to 12.2.7 today, set osd skip data digest = 
true and re-enabled the scrubs. It's a 500TB all-BlueStore cluster.
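
For reference, a minimal sketch of how that setting can be applied, both in 
ceph.conf and on running OSDs (the exact option spelling is an assumption - 
check the 12.2.7 release notes for your build):

    # ceph.conf on the OSD hosts
    [osd]
    osd skip data digest = true

    # apply to the running OSDs without a restart
    ceph tell osd.* injectargs '--osd_skip_data_digest=true'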


We are now seeing inconsistent PGs and scrub errors now that scrubbing has 
resumed.

What is the best way forward?


  1.  Upgrade all nodes to 12.2.7?
  2.  Remove the 12.2.7 node and rebuild?
Kind regards,
Glen Baars
BackOnline Manager
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.6 upgrade

2018-07-18 Thread Glen Baars
Hello Sage,

Thanks for the response.

I'm fairly new to Ceph. Are there any commands that would help confirm the 
issue?
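
For anyone reading this in the archive, a minimal sketch of commands that can 
confirm whether the scrub errors are the 12.2.6 digest issue (the log path 
below assumes the default cluster log on a monitor host):

    # summarise which PGs currently have scrub errors
    ceph health detail

    # pull the digest errors out of the cluster log
    grep 'data_digest' /var/log/ceph/ceph.log | tail -n 20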

Kind regards,
Glen Baars

T  1300 733 328
NZ +64 9280 3561
MOB +61 447 991 234


This e-mail may contain confidential and/or privileged information.If you are 
not the intended recipient (or have received this e-mail in error) please 
notify the sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-mail is strictly 
forbidden.

-Original Message-
From: Sage Weil 
Sent: Wednesday, 18 July 2018 10:38 PM
To: Glen Baars 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 12.2.6 upgrade

On Wed, 18 Jul 2018, Glen Baars wrote:
> Hello Ceph Users,
>
> We installed 12.2.6 on a single node in the cluster ( new node added,
> 80TB moved ) Disabled scrub/deepscrub once the issues with 12.2.6 were 
> discovered.
>
>
> Today we upgrade the one affected node to 12.2.7 today, set osd skip data 
> digest = true and re enabled the scrubs. It's a 500TB all bluestore cluster.
>
>
> We are now seeing inconsistent PGs and scrub errors now the scrubbing has 
> resumed.

It is likely the inconsistencies were there from the period running 12.2.6, not 
due to 12.2.7.  I would suggest continuing the upgrade.  The scrub errors will 
either go away on their own or need to wait until 12.2.8 for scrub to learn how 
to repair them for you.

Can you share the scrub error you got to confirm it is the digest issue in
12.2.6 that is to blame?

sage

> What is the best way forward?
>
>
>   1.  Upgrade all nodes to 12.2.7?
>   2.  Remove the 12.2.7 node and rebuild?
> Kind regards,
> Glen Baars
> BackOnline Manager
> This e-mail is intended solely for the benefit of the addressee(s) and any 
> other named recipient. It is confidential and may contain legally privileged 
> or confidential information. If you are not the recipient, any use, 
> distribution, disclosure or copying of this e-mail is prohibited. The 
> confidentiality and legal privilege attached to this communication is not 
> waived or lost by reason of the mistaken transmission or delivery to you. If 
> you have received this e-mail in error, please notify us immediately.
>
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.6 upgrade

2018-07-18 Thread Glen Baars
Hello Ceph Users,

We installed 12.2.6 on a single node in the cluster (new node added, 80TB 
moved). We disabled scrub/deep scrub once the issues with 12.2.6 were 
discovered.
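
As an aside, a quick sketch of how to confirm which daemons are on which 
release during a partial upgrade like this:

    # counts of mons/mgrs/osds per release across the cluster
    ceph versions

    # per-daemon detail if a single host is suspect
    ceph tell osd.* version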


We upgraded the one affected node to 12.2.7 today, set osd skip data digest = 
true and re-enabled the scrubs. It's a 500TB all-BlueStore cluster.


We are now seeing inconsistent PGs and scrub errors now that scrubbing has 
resumed.

What is the best way forward?


  1.  Upgrade all nodes to 12.2.7?
  2.  Remove the 12.2.7 node and rebuild?
Kind regards,
Glen Baars
BackOnline Manager
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] intermittent slow requests on idle ssd ceph clusters

2018-07-16 Thread Glen Baars
Hello Pavel,

I don't have all that much info ( fairly new to Ceph ) but we are facing a 
similar issue. If the cluster is fairly idle we get slow requests - if I'm 
backfilling a new node there is no slow requests. Same X540 network cards but 
ceph 12.2.5 and Ubuntu 16.04. 4.4.0 kernel. LACP with VLANs for ceph 
front/backend networks.

Not sure that it is the same issue but if you want me to do any tests - let me 
know.

Kind regards,
Glen Baars

-Original Message-
From: ceph-users  On Behalf Of Xavier Trilla
Sent: Tuesday, 17 July 2018 6:16 AM
To: Pavel Shub ; Ceph Users 
Subject: Re: [ceph-users] intermittent slow requests on idle ssd ceph clusters

Hi Pavel,

Any strange messages on dmesg, syslog, etc?

I would recommend profiling the kernel with perf and checking for the calls 
that are consuming more CPU.
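
A minimal sketch of that kind of profiling session (sampling duration and 
options are only examples):

    # sample all CPUs with call graphs for 30 seconds
    perf record -a -g -- sleep 30

    # then look for kernel/driver symbols dominating the profile
    perf report --sort=dso,symbol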

We had several problems like the one you are describing, and for example one of 
them got fixed increasing vm.min_free_kbytes to 4GB.
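
For anyone wanting to try the same, a sketch of how a setting like that can be 
applied (4GB expressed in kB - pick a value appropriate to your RAM):

    # set immediately
    sysctl -w vm.min_free_kbytes=4194304

    # persist across reboots
    echo 'vm.min_free_kbytes = 4194304' >> /etc/sysctl.conf
    sysctl -p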

Also, how is the sys usage if you run top on the machines hosting the OSDs?

Kind regards,
Xavier Trilla P.
Clouding.io

A Cloud Server with SSDs, redundant
and available in less than 30 seconds?

Try it now at Clouding.io!

-Mensaje original-
From: ceph-users  On Behalf Of Pavel Shub 
Sent: Monday, 16 July 2018 23:52
To: Ceph Users 
Subject: [ceph-users] intermittent slow requests on idle ssd ceph clusters

Hello folks,

We've been having issues with slow requests cropping up on practically idle 
Ceph clusters. From what I can tell the requests are hanging waiting for 
subops, and the OSD on the other end receives the request minutes later! In the 
example below, the op started waiting for subops at 12:09:51 and the subop was 
completed at 12:14:28.

{
"description": "osd_op(client.903117.0:569924 6.391 
6:89ed76f2:::%2fraster%2fv5%2fes%2f16%2f36320%2f24112:head [writefull 0~2072] 
snapc 0=[] ondisk+write+known_if_redirected e5777)",
"initiated_at": "2018-07-05 12:09:51.191419",
"age": 326.651167,
"duration": 276.977834,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.903117",
"client_addr": "10.20.31.234:0/1433094386",
"tid": 569924
},
"events": [
{
"time": "2018-07-05 12:09:51.191419",
"event": "initiated"
},
{
"time": "2018-07-05 12:09:51.191471",
"event": "queued_for_pg"
},
{
"time": "2018-07-05 12:09:51.191538",
"event": "reached_pg"
},
{
"time": "2018-07-05 12:09:51.191877",
"event": "started"
},
{
"time": "2018-07-05 12:09:51.192135",
"event": "waiting for subops from 11"
},
{
"time": "2018-07-05 12:09:51.192599",
"event": "op_commit"
},
{
"time": "2018-07-05 12:09:51.192616",
"event": "op_applied"
},
{
"time": "2018-07-05 12:14:28.169018",
"event": "sub_op_commit_rec from 11"
},
{
"time": "2018-07-05 12:14:28.169164",
"event": "commit_sent"
},
{
"time": "2018-07-05 12:14:28.169253",
"event": "done"
}
]
}
},
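
Output like the above can be pulled from the OSD admin socket with commands 
along these lines (the OSD id below is a placeholder):

    # ops currently in flight on the primary OSD
    ceph daemon osd.<id> dump_ops_in_flight

    # recently completed ops, with the same per-event timestamps
    ceph daemon osd.<id> dump_historic_ops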

Below is what I assume is the corresponding request on osd.11; it seems to 
receive the network request ~4 minutes later.

2018-07-05 12:14:28.058552 7fb75ee0e700 20 osd.11 5777 share_map_peer
0x562b61bca000 already has epoch 5777
2018-07-05 12:14:28.167247 7fb75de0c700 10 osd.11 5777  new session
0x562cc23f0200 con=0x562baaa0e000 addr=10.16.15.28:6805/3218
2018-07-05 12:14:28.167282 7fb75de0c700 10 osd.11 5777  session
0x562cc23f0200 osd.20 has caps osdcap[grant(*)] 'allow *'
2018-07-05 12:14:28.167291 7fb75de0c700  0 -- 10.16.16.32:6817/3808 >>
10.16.15.28:6805/3218 conn(0x562baaa0e000 :6817 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg 
accept connect_seq 20 vs existing csq=19 existing_state=STATE_STANDBY
2018-07-05 12:14:28.167322 7fb7546d6700  2 osd.11 5777 ms_handle_reset con 
0x562baaa0e000 session 0x562cc23f0200
2018-07-05 12:14:28.167546 7fb75de0c700 10 osd.11 5777  session
0x562b62195c00 osd.20 has caps osdcap[grant(*)] 'allow *'

This is an all SSD cluster with minimal load. All hardware checks return good 
values. The cluster is currently running latest ceph mimic
(13.2.0) but we have also experienced this on other versions of luminous 12.2.2 
and 12.2.5.

I'm starting to think that this is a potential network driver issue.
We're currently running on kernel 4.14.15, and when we updated to the latest 
4.17 the slow requests seemed to occur more frequently. The network cards we 
run are 10G Intel X540.
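
A sketch of the NIC-level checks that might help narrow that down (the 
interface name is only an example):

    # driver and firmware version in use (ixgbe for the X540)
    ethtool -i eth0

    # look for errors, drops and pause frames accumulating on the interface
    ethtool -S eth0 | grep -Ei 'err|drop|miss|pause'
    ip -s link show eth0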

Does anyone know how I can debug this further?

Thanks,
Pavel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] 12.2.6 CRC errors

2018-07-14 Thread Glen Baars
Thanks Uwe,

I saw that on the website.

Any idea if what I have done is correct? Do I now just wait?

Sent from my Cyanogen phone

On 14 Jul 2018 11:16 PM, Uwe Sauter  wrote:
Hi Glen,

About 16h ago there was a notice on this list with the subject "IMPORTANT: 
broken luminous 12.2.6 release in repo, do not upgrade" from Sage Weil (the 
main developer of Ceph).

Quote from this notice:

"tl;dr:  Please avoid the 12.2.6 packages that are currently present on
download.ceph.com.  We will have a 12.2.7 published ASAP (probably
Monday).

If you do not use bluestore or erasure-coded pools, none of the issues
affect you.


Details:

We built 12.2.6 and pushed it to the repos Wednesday, but as that was
happening realized there was a potentially dangerous regression in
12.2.5[1] that an upgrade might exacerbate.  While we sorted that issue
out, several people noticed the updated version in the repo and
upgraded.  That turned up two other regressions[2][3].  We have fixes for
those, but are working on an additional fix to make the damage from [3]
be transparently repaired."



Regards,

Uwe



On 14.07.2018 at 17:02, Glen Baars wrote:
> Hello Ceph users!
>
> Note to users, don't install new servers on Friday the 13th!
>
> We added a new ceph node on Friday and it has received the latest 12.2.6 
> update. I started to see CRC errors and investigated hardware issues. I have 
> since found that it is caused by the 12.2.6 release. About 80TB copied onto 
> this server.
>
> I have set noout,noscrub,nodeepscrub and repaired the affected PGs ( ceph pg 
> repair ) . This has cleared the errors.
>
> * no idea if this is a good way to fix the issue. From the bug report 
> this issue is in the deepscrub and therefore I suppose stopping it will limit 
> the issues. ***
>
> Can anyone tell me what to do? Downgrade seems that it won't fix the issue. 
> Maybe remove this node and rebuild with 12.2.5 and resync data? Wait a few 
> days for 12.2.7?
>
> Kind regards,
> Glen Baars
> This e-mail is intended solely for the benefit of the addressee(s) and any 
> other named recipient. It is confidential and may contain legally privileged 
> or confidential information. If you are not the recipient, any use, 
> distribution, disclosure or copying of this e-mail is prohibited. The 
> confidentiality and legal privilege attached to this communication is not 
> waived or lost by reason of the mistaken transmission or delivery to you. If 
> you have received this e-mail in error, please notify us immediately.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.6 CRC errors

2018-07-14 Thread Glen Baars
Hello Ceph users!

Note to users, don't install new servers on Friday the 13th!

We added a new Ceph node on Friday and it received the latest 12.2.6 update. I 
started to see CRC errors and investigated hardware issues. I have since found 
that they are caused by the 12.2.6 release. About 80TB has been copied onto 
this server.

I have set noout, noscrub and nodeep-scrub and repaired the affected PGs 
(ceph pg repair). This has cleared the errors.
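
For anyone in the same situation, a sketch of the flag and repair commands 
described above (remember to unset the flags once things settle down):

    # pause rebalancing and scrubbing while investigating
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # repair each inconsistent PG reported by 'ceph health detail'
    ceph pg repair <pgid>

    # later, re-enable
    ceph osd unset noout
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub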

*** No idea if this is a good way to fix the issue. From the bug report this 
issue is in the deep scrub and therefore I suppose stopping it will limit the 
issues. ***

Can anyone tell me what to do? A downgrade doesn't seem like it will fix the 
issue. Maybe remove this node, rebuild with 12.2.5 and resync the data? Wait a 
few days for 12.2.7?

Kind regards,
Glen Baars
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com