Re: [ceph-users] Ceph pg repair clone_missing?

2019-10-02 Thread Brad Hubbard
On Wed, Oct 2, 2019 at 9:00 PM Marc Roos wrote: > Hi Brad, > I was following the thread where you advised on this pg repair. > I ran these rados 'list-inconsistent-obj'/'rados list-inconsistent-snapset' and have output on the snapset. I tried to extrapolate your comment on the
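
For reference, the inconsistency-listing commands mentioned in this thread take a placement group id as argument. A minimal sketch (the pg id 17.36 is a placeholder, not taken from the thread):

    # list objects with inconsistent digests in a PG (placeholder pgid)
    $ rados list-inconsistent-obj 17.36 --format=json-pretty
    # list snapset inconsistencies, such as missing clones
    $ rados list-inconsistent-snapset 17.36 --format=json-pretty
    # ask the primary OSD to repair the PG once the nature of the damage is understood
    $ ceph pg repair 17.36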

Re: [ceph-users] rgw S3 lifecycle cannot keep up

2019-10-02 Thread Robin H. Johnson
On Wed, Oct 02, 2019 at 01:48:40PM +0200, Christian Pedersen wrote: > Hi Martin, > Even before adding cold storage on HDD, I had the cluster with SSD only. That also could not keep up with deleting the files. > I am nowhere near I/O exhaustion on the SSDs or even the HDDs. Please see my

Re: [ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Mike Christie
On 10/02/2019 02:15 PM, Kilian Ries wrote: > Ok, I just compared my local python files and the git commit you sent me - it really looks like I have the old files installed. All the changes are missing in my local files. > Where can I get a new ceph-iscsi-config package that has the

[ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-02 Thread Vladimir Brik
Hello, I am running a Ceph 14.2.2 cluster and a few days ago the memory consumption of our OSDs started to grow unexpectedly on all 5 nodes, after being stable for about 6 months. Node memory consumption: https://icecube.wisc.edu/~vbrik/graph.png Average OSD resident size:
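
A common first step when chasing this kind of growth is to compare an OSD's internal memory accounting with its configured target. A sketch, assuming osd.0 as an example daemon:

    # per-daemon breakdown of BlueStore/RocksDB memory pools
    $ ceph daemon osd.0 dump_mempools
    # tcmalloc heap statistics for the same daemon
    $ ceph tell osd.0 heap stats
    # the cache autotuner sizes itself toward this target (bytes)
    $ ceph config get osd.0 osd_memory_target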

Re: [ceph-users] Local Device Health PG inconsistent

2019-10-02 Thread Reed Dier
And now to come full circle. Sadly my solution was to run > $ ceph pg repair 33.0 which returned > 2019-10-02 15:38:54.499318 osd.12 (osd.12) 181 : cluster [DBG] 33.0 repair starts > 2019-10-02 15:38:55.502606 osd.12 (osd.12) 182 : cluster [ERR] 33.0 repair : stat mismatch, got

Re: [ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Kilian Ries
Ok, I just compared my local python files and the git commit you sent me - it really looks like I have the old files installed. All the changes are missing in my local files. Where can I get a new ceph-iscsi-config package that has the fix included? I have installed version:

Re: [ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Kilian Ries
Yes, I created all four LUNs with these sizes: lun0 - 5120G lun1 - 5121G lun2 - 5122G lun3 - 5123G It's always one GB more per LUN... Is there any newer ceph-iscsi-config package than the one I have installed? ceph-iscsi-config-2.6-2.6.el7.noarch Then I could try to update the package and see
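
One way to cross-check where the extra gigabyte per LUN comes from is to compare the size recorded in the RBD image header against what the gateway reports, and to confirm the installed package versions on each node. A sketch; the pool/image names are placeholders, not taken from this thread:

    # size as recorded in the RBD image header
    $ rbd info iscsi-pool/lun1
    # installed gateway packages on each iSCSI node
    $ rpm -q ceph-iscsi-config ceph-iscsi-cli tcmu-runner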

[ceph-users] MDS Stability with lots of CAPS

2019-10-02 Thread Stefan Kooman
Hi, According to [1] there are new parameters in place to make the MDS behave more stably. Quoting that blog post: "One of the more recent issues we've discovered is that an MDS with a very large cache (64+GB) will hang during certain recovery events." For all of us that are not (yet) running
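
For Nautilus-era clusters, the knobs usually discussed in this context are the MDS cache size and the cap-recall throttles. A sketch with example values only; the option names below are the ones shipped in 14.2.x, and the referenced blog post remains the authority on recommended settings:

    # upper bound for the MDS cache, in bytes (example value, not a recommendation)
    $ ceph config set mds mds_cache_memory_limit 17179869184
    # inspect the cap-recall throttles added alongside the stability work
    $ ceph config help mds_recall_max_caps
    $ ceph config help mds_recall_max_decay_rate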

Re: [ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Jason Dillaman
On Wed, Oct 2, 2019 at 9:50 AM Kilian Ries wrote: > Hi, > I'm running a Ceph Mimic cluster with 4x iSCSI gateway nodes. The cluster was set up via ceph-ansible v3.2-stable. I just checked my nodes and saw that only two of the four configured iSCSI gw nodes are working correctly. I first

[ceph-users] tcmu-runner: mismatched sizes for rbd image size

2019-10-02 Thread Kilian Ries
Hi, I'm running a Ceph Mimic cluster with 4x iSCSI gateway nodes. The cluster was set up via ceph-ansible v3.2-stable. I just checked my nodes and saw that only two of the four configured iSCSI gw nodes are working correctly. I first noticed via gwcli: ### $ gwcli -d ls Traceback (most recent
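
When gwcli fails on only some of the gateways, a usual first check is whether the gateway daemons are actually running on each node before digging into the Python traceback. A sketch; the service names are the ones used by ceph-iscsi-config/ceph-ansible deployments of that era:

    # per-node gateway daemons
    $ systemctl status tcmu-runner rbd-target-api rbd-target-gw
    # re-run the gateway CLI with debug output once the services are up
    $ gwcli -d ls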

Re: [ceph-users] rgw S3 lifecycle cannot keep up

2019-10-02 Thread Christian Pedersen
Hi Martin, Even before adding cold storage on HDD, I had the cluster with SSD only. That also could not keep up with deleting the files. I am nowhere near I/O exhaustion on the SSDs or even the HDDs. Cheers, Christian On Oct 2 2019, at 1:23 pm, Martin Verges wrote: > Hello Christian, > the

Re: [ceph-users] rgw S3 lifecycle cannot keep up

2019-10-02 Thread Martin Verges
Hello Christian, the problem is that HDDs are not capable of providing the volume of IOs required for "~4 million small files". -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h, 81247 Munich CEO:
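
As a rough order-of-magnitude check on that claim (simple arithmetic, not taken from the thread): ~4 million objects per day is only about 46 deletions per second on average, but each S3 delete also touches bucket index and head objects, so the backing devices see a multiple of that in small random IOs:

    # average object deletions per second implied by ~4 million per day
    $ echo $((4000000 / 86400))
    46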

[ceph-users] Ceph pg repair clone_missing?

2019-10-02 Thread Marc Roos
Hi Brad, I was following the thread where you advised on this pg repair. I ran these rados 'list-inconsistent-obj'/'rados list-inconsistent-snapset' and have output on the snapset. I tried to extrapolate your comment on the data/omap_digest_mismatch_info onto my situation. But I don't

[ceph-users] rgw S3 lifecycle cannot keep up

2019-10-02 Thread Christian Pedersen
Hi, Using the S3 gateway I store ~4 million small files in my cluster every day. I have a lifecycle set up to move these files to cold storage after a day and delete them after two days. The default storage is SSD-based and the cold storage is HDD. However, the rgw lifecycle process cannot keep
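
For context on where the backlog sits, lifecycle processing can be inspected and driven manually from the rgw admin side. A minimal sketch using radosgw-admin:

    # per-bucket lifecycle processing state
    $ radosgw-admin lc list
    # kick off a lifecycle pass immediately instead of waiting for the configured work window
    $ radosgw-admin lc process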

Re: [ceph-users] Have you enabled the telemetry module yet?

2019-10-02 Thread Stefan Kooman
> I created this issue: https://tracker.ceph.com/issues/42116 > Seems to be related to the 'crash' module not being enabled. > If you enable the module the problem should be gone. Now I need to check why this message is popping up. Yup, crash module enabled and error message is gone. Either
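
For reference, the fix described here is a single mgr command; a quick sketch of enabling the module and confirming it has data:

    # enable the crash module that telemetry reports on
    $ ceph mgr module enable crash
    # list any crash reports collected so far
    $ ceph crash ls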

Re: [ceph-users] Have you enabled the telemetry module yet?

2019-10-02 Thread Wido den Hollander
On 10/1/19 4:38 PM, Stefan Kooman wrote: > Quoting Wido den Hollander (w...@42on.com): >> Hi, >> The Telemetry [0] module has been in Ceph since the Mimic release and, when enabled, it sends an anonymized JSON report to >> https://telemetry.ceph.com/ every 72 hours with information about
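
Enabling the module as described amounts to a few commands; a sketch, with the 'show' step included so the JSON can be reviewed before opting in:

    # enable the telemetry mgr module
    $ ceph mgr module enable telemetry
    # preview exactly what would be sent
    $ ceph telemetry show
    # opt in; a report is then sent every 72 hours
    $ ceph telemetry on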

Re: [ceph-users] ceph-osd@n crash dumps

2019-10-02 Thread Del Monaco, Andrea
Hi Brad, Apologies for the flow of messages - the previous messages went for approval because of their length. Here you can see the requested output: https://pastebin.com/N8jG08sH Regards, Andrea Del Monaco HPC Consultant – Big Data & Security M: +31 612031174 Burgemeester

Re: [ceph-users] Have you enabled the telemetry module yet?

2019-10-02 Thread Wido den Hollander
On 10/1/19 5:11 PM, Mattia Belluco wrote: > Hi all, > Same situation here: > Ceph 13.2.6 on Ubuntu 16.04. Thanks for the feedback both! I enabled it on an Ubuntu 18.04 system with Nautilus 14.2.4. > Best > Mattia > On 10/1/19 4:38 PM, Stefan Kooman wrote: >> Quoting Wido den