[ceph-users] Re: MDS corrupt (also RADOS-level copy?)

2023-05-31 Thread Jake Grimmett
Dear All, My apologies, I forgot to state we are using Quincy 17.2.6. Thanks again, Jake. root@wilma-s1 15:22 [~]: ceph -v ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable) Dear All, we are trying to recover from what we suspect is a corrupt MDS :( and have
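
As a minimal sketch only, these are the usual first checks when MDS damage is suspected (filesystem name "cephfs" and rank 0 are illustrative; this is not the recovery procedure used in the thread):
  # ceph fs status
  # ceph health detail
  # ceph tell mds.cephfs:0 damage ls
  # cephfs-journal-tool --rank=cephfs:0 journal inspect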

[ceph-users] MDS corrupt (also RADOS-level copy?)

2023-05-31 Thread Jake Grimmett
ed to get a feeling from others about how dangerous this could be? We have a backup, but as there is 1.8PB of data, it's going to take a few weeks to restore. Any ideas gratefully received. Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular

[ceph-users] Re: cephfs and samba

2022-08-19 Thread Jake Grimmett
r in AlmaLinux 8.6, plus a recent version of Samba, together with Quincy improve performance... best regards Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. ___
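
A minimal sketch of exporting CephFS through Samba's vfs_ceph module, as discussed in this thread (share name, path and CephX user are illustrative assumptions):
  [scratch]
     path = /volumes/scratch
     vfs objects = ceph
     ceph:config_file = /etc/ceph/ceph.conf
     ceph:user_id = samba
     kernel share modes = no
     read only = no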

[ceph-users] Re: Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett
5 GiB 67 GiB 1 KiB 914 MiB 16 TiB 2.14 1.01 99 up thanks Jake On 20/07/2022 11:52, Jake Grimmett wrote: Dear All, We have just built a new cluster using Quincy 17.2.1 After copying ~25TB to the cluster (from a mimic cluster), we see 152 TB used, which is ~6x disparity. Is

[ceph-users] Quincy: cephfs "df" used 6x higher than "du"

2022-07-20 Thread Jake Grimmett
coded data pool (hdd with NVMe db/wal), and a 3x replicated default data pool (primary_fs_data - NVMe). bluestore_min_alloc_size_hdd is 4096. ceph osd pool set ec82pool compression_algorithm lz4 ceph osd pool set ec82pool compression_mode aggressive many thanks for any help Jake -- Dr Jake Grimmett
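
For reference, the settings mentioned above as they would be applied (pool name taken from the thread; note that bluestore_min_alloc_size_hdd only affects OSDs created after it is set):
  # ceph osd pool set ec82pool compression_algorithm lz4
  # ceph osd pool set ec82pool compression_mode aggressive
  # ceph config get osd bluestore_min_alloc_size_hdd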

[ceph-users] Re: Suggestion to build ceph storage

2022-06-20 Thread Jake Grimmett
ceph-users-le...@ceph.io For help, read https://www.mrc-lmb.cam.ac.uk/scicomp/ then contact unixad...@mrc-lmb.cam.ac.uk -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. Phone 01223 267019 Mobile

[ceph-users] Re: Bug with autoscale-status in 17.2.0 ?

2022-06-10 Thread Jake Grimmett
USED RAW USED %RAW USED hdd 7.0 PiB 6.9 PiB 126 TiB 126 TiB 1.75 ssd 2.7 TiB 2.7 TiB 3.2 GiB 3.2 GiB 0.12 TOTAL 7.0 PiB 6.9 PiB 126 TiB 126 TiB 1.75 --- POOLS --- POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL .mgr 21

[ceph-users] Bug with autoscale-status in 17.2.0 ?

2022-06-10 Thread Jake Grimmett
0T 0. 1.0 1024 32 off False Any ideas on what might be going on? We get a similar problem if we specify hdd as the class. best regards Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge C
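
A minimal sketch of inspecting the autoscaler output and, if necessary, disabling it per pool while the bug is investigated (pool name is illustrative):
  # ceph osd pool autoscale-status
  # ceph osd pool set ec82pool pg_autoscale_mode off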

[ceph-users] Re: Pause cluster if node crashes?

2022-02-18 Thread Jake Grimmett
-mon_osd_down_out_subtree_limit <https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit> The default is rack -- you want to set that to "host". Cheers, Dan On Fri., Feb. 18, 2022, 11:23 Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
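
A sketch of the change Dan describes, applied via the config database:
  # ceph config set mon mon_osd_down_out_subtree_limit host
  # ceph config get mon mon_osd_down_out_subtree_limit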

[ceph-users] Pause cluster if node crashes?

2022-02-18 Thread Jake Grimmett
at turning the watchdog on, giving nagios an action, etc, but I'd rather use any tools that ceph has built in. BTW, this is an Octopus cluster 15.2.15, 580 x OSDs, using EC 8+2 best regards, Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Cr
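
A manual stopgap (a sketch, not what the thread settled on) is to set the noout flag while a crashed node is being investigated, then clear it once the node is back:
  # ceph osd set noout
  # ceph osd unset noout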

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-21 Thread Jake Grimmett
ard's wizard. If for some reason you cannot or wish not to opt in, please share the reason with us. Thanks, Yaarit On Thu, Jan 20, 2022 at 6:39 AM Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote: Dear All, Is the cloud option for the diskprediction module deprecated i

[ceph-users] Disk Failure Predication cloud module?

2022-01-20 Thread Jake Grimmett
this module useful? many thanks Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users
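
A minimal sketch of the local (non-cloud) prediction path, which remains available (device IDs come from "ceph device ls"):
  # ceph mgr module enable diskprediction_local
  # ceph config set global device_failure_prediction_mode local
  # ceph device ls
  # ceph device get-health-metrics <devid>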

[ceph-users] Re: dashboard with grafana embedding in 16.2.6

2021-11-26 Thread Jake Grimmett
", so we could add a setting to customize that if required. Kind Regards, Ernesto -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. ___ ceph-users mailing list -- ce

[ceph-users] dashboard with grafana embedding in 16.2.6

2021-11-25 Thread Jake Grimmett
in grafana? The grafana install docs here: https://docs.ceph.com/en/latest/mgr/dashboard/ state: "Add Prometheus as data source to Grafana using the Grafana Web UI." If the data source is now hard-coded to "Dashboard1", can we update the docs? best regards, Jake -- Dr
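
A sketch of pointing the dashboard at an existing Grafana instance (URL is illustrative):
  # ceph dashboard set-grafana-api-url https://grafana.example.org:3000
  # ceph dashboard get-grafana-api-url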

[ceph-users] Re: Has anyone contact Data for Samsung Datacenter SSD Support ?

2021-03-16 Thread Jake Grimmett
Note: I am working from home until further notice. For help, contact unixad...@mrc-lmb.cam.ac.uk -- Dr Jake Grimmett Head Of Scientific Computi

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-30 Thread Jake Grimmett
s helpful. > > Sent from my iPad > >> On Sep 29, 2020, at 18:34, Jake Grimmett wrote: >> >> Hi Paul, >> >> I think you found the answer! >> >> When adding 100 new OSDs to the cluster, I increased both pg and pgp >> from 4096 to 16,384 >>

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-29 Thread Jake Grimmett
lue of pg target. > > Also: Looks like you've set osd_scrub_during_recovery = false, this > setting can be annoying on large erasure-coded setups on HDDs that see > long recovery times. It's better to get IO priorities right; search > mailing list for osd op queue cut off high. > > Paul -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK.
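
A sketch of the setting Paul refers to (it only takes effect once the OSDs restart); osd_scrub_during_recovery could then be returned to its default, assuming it was set in the config database:
  # ceph config set osd osd_op_queue_cut_off high
  # ceph config rm osd osd_scrub_during_recovery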

[ceph-users] Re: objects misplaced jumps up at 5%

2020-09-28 Thread Jake Grimmett
:45, Jake Grimmett wrote: > >> To show the cluster before and immediately after an "episode" >> >> *** >> >> [root@ceph7 ceph]# ceph -s >> cluster: >> id: 36ed7113-080c-49b8-80e2-4947cc4

[ceph-users] objects misplaced jumps up at 5%

2020-09-28 Thread Jake Grimmett
X to 106803'6043528 2020-09-24 14:44:38.947 7f2e569e9700 0 log_channel(cluster) log [DBG] : 5.157ds0 starting backfill to osd.533(5) from (0'0,0'0] MAX to 106803'6043528 *** any advice appreciated, Jake -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biolog

[ceph-users] Re: kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)

2020-04-29 Thread Jake Grimmett
7f3cfe5f9700 0 mds.0.cache creating system inode with ino:0x1 best regards, Jake On 29/04/2020 14:33, Jake Grimmett wrote: > Dear all, > > After enabling "allow_standby_replay" on our cluster we are getting > (lots) of identical errors on the client /var/log/messages l

[ceph-users] kernel: ceph: mdsmap_decode got incorrect state(up:standby-replay)

2020-04-29 Thread Jake Grimmett
dby_replay ? any advice appreciated, many thanks Jake Note: I am working from home until further notice. For help, contact unixad...@mrc-lmb.cam.ac.uk -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. Phone 01223 267
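
As a sketch, the feature can be toggled off and back on while testing whether the client warnings stop (filesystem name is illustrative):
  # ceph fs set cephfs allow_standby_replay false
  # ceph fs set cephfs allow_standby_replay true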

[ceph-users] Re: Help: corrupt pg

2020-03-27 Thread Jake Grimmett
e_size.count(clone)) leaving us with a pg in a very bad state... I will see if we can buy some consulting time, the alternative is several weeks of rsync. Many thanks again for your advice, it's very much appreciated, Jake On 26/03/2020 17:21, Gregory Farnum wrote: On Wed, Mar 25, 2020 at

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Jake Grimmett
5/03/2020 14:22, Eugen Block wrote: Hi, is there any chance to recover the other failing OSDs that seem to have one chunk of this PG? Do the other OSDs fail with the same error? Quoting Jake Grimmett: Dear All, We are "in a bit of a pickle"... No reply to my message (23/03/2020)
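
One possible salvage approach, as a sketch only (ids and paths are placeholders): export the PG shard from the asserting OSD with ceph-objectstore-tool while that OSD is stopped, so it can later be imported elsewhere with --op import:
  # systemctl stop ceph-osd@<id>
  # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op export --file /root/<pgid>.export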

[ceph-users] Help: corrupt pg

2020-03-25 Thread Jake Grimmett
dvice gratefully received, best regards, Jake Note: I am working from home until further notice. For help, contact unixad...@mrc-lmb.cam.ac.uk -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0

[ceph-users] OSD: FAILED ceph_assert(clone_size.count(clone))

2020-03-23 Thread Jake Grimmett
d to backup a live cephfs cluster and has 1.8PB of data, including 30 days of snapshots. We are using 8+2 EC. Any help appreciated, Jake Note: I am working from home until further notice. For help, contact unixad...@mrc-lmb.cam.ac.uk -- Dr Jake Grimmett Head Of Scientific Computing MRC Labor

[ceph-users] Re: v14.2.8 Nautilus released

2020-03-17 Thread Jake Grimmett
as possible and there > was no suggestion to use a default replicated pool and then add the EC > pool. We did it exactly the other way around :-/ > > Best > Dietmar > -- Dr Jake Grimmett Head Of Scientific Computing MRC Laboratory of Molecular Biology Francis Crick Avenue, Cam

[ceph-users] Re: Need clarification on CephFS, EC Pools, and File Layouts

2020-03-04 Thread Jake Grimmett
ystem at this time. Someday we would like > to change this but there is no timeline. > -- Dr Jake Grimmett MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge CB2 0QH, UK. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubs
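
A minimal sketch of attaching an EC pool as an additional CephFS data pool and directing a directory at it via a file layout (names are illustrative; the EC pool needs overwrites enabled):
  # ceph osd pool set ec82pool allow_ec_overwrites true
  # ceph fs add_data_pool cephfs ec82pool
  # setfattr -n ceph.dir.layout.pool -v ec82pool /mnt/cephfs/somedir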

[ceph-users] Re: Fwd: PrimaryLogPG.cc: 11550: FAILED ceph_assert(head_obc)

2020-02-11 Thread Jake Grimmett
data /dev/sdab, then activate the OSD: # ceph-volume lvm activate 443 6e252371-d158-4d16-ac31-fed8f7d0cb1f Now watching to see if the cluster recovers... best, Jake On 2/10/20 3:31 PM, Jake Grimmett wrote: > Dear All, > > Following a clunky* cluster restart, we had > > 23 "
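
For reference, the OSD id and fsid passed to "ceph-volume lvm activate" above can be looked up on the host with:
  # ceph-volume lvm list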

[ceph-users] Fwd: PrimaryLogPG.cc: 11550: FAILED ceph_assert(head_obc)

2020-02-10 Thread Jake Grimmett
's primary OSD) * thread describing the bad restart :> <https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI/#IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI> many thanks! Jake -- Dr Jake Grimmett MRC Laboratory of Molecular Biology Francis Crick Avenue, Cambridge

[ceph-users] Re: recovery_unfound

2020-02-05 Thread Jake Grimmett
ith no change. I'm leaving > things alone hoping that croit.io will update their package to 13.2.8 > soonish.  Maybe that will help kick it in the pants. > > Chad. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an ema

[ceph-users] Re: recovery_unfound

2020-02-04 Thread Jake Grimmett
[root@ceph1 ~]# ceph osd down 347 This doesn't change the output of "ceph pg 5.5c9 query", apart from updating the Started time, and ceph health still shows unfound objects. To fix this, do we need to issue a scrub (or deep scrub) so that the objects can be fo
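
A sketch of the usual follow-up for unfound objects (pgid taken from the thread; mark_unfound_lost is a last resort that reverts or discards the affected objects):
  # ceph pg 5.5c9 list_unfound
  # ceph pg deep-scrub 5.5c9
  # ceph pg 5.5c9 mark_unfound_lost revert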