Dear All,
My apologies, I forgot to state we are using Quincy 17.2.6
thanks again,
Jake
root@wilma-s1 15:22 [~]: ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
Dear All,
we are trying to recover from what we suspect is a corrupt MDS :(
and wanted to get a feeling from others about how dangerous this could be.
We have a backup, but as there is 1.8PB of data, it's going to take a
few weeks to restore
any ideas gratefully received.
Jake
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
...in AlmaLinux 8.6, plus a recent version of Samba, together with Quincy, improve performance...
best regards
Jake
thanks
Jake
On 20/07/2022 11:52, Jake Grimmett wrote:
Dear All,
We have just built a new cluster using Quincy 17.2.1
After copying ~25TB to the cluster (from a mimic cluster), we see 152 TB
used, which is a ~6x disparity.
The cluster has an erasure coded data pool (hdd with NVMe db/wal),
and a 3x replicated default data pool (primary_fs_data - NVMe)
bluestore_min_alloc_size_hdd is 4096
bluestore_min_alloc_size_hdd is 4096
ceph osd pool set ec82pool compression_algorithm lz4
ceph osd pool set ec82pool compression_mode aggressive
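For reference, a hedged sketch of confirming those settings took effect (pool name taken from the commands above; the per-pool compression columns in `ceph df detail` vary by release):

```shell
# Read back the compression settings on the EC pool:
ceph osd pool get ec82pool compression_algorithm
ceph osd pool get ec82pool compression_mode
# Recent releases show per-pool compression savings in `ceph df detail`:
ceph df detail
```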
many thanks for any help
Jake
For help, read https://www.mrc-lmb.cam.ac.uk/scicomp/
then contact unixad...@mrc-lmb.cam.ac.uk
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    7.0 PiB  6.9 PiB  126 TiB   126 TiB       1.75
ssd    2.7 TiB  2.7 TiB  3.2 GiB   3.2 GiB       0.12
TOTAL  7.0 PiB  6.9 PiB  126 TiB   126 TiB       1.75
--- POOLS ---
POOL  ID  PGS  STORED  OBJECTS  USED  %USED  MAX AVAIL
.mgr  21  [remaining pool rows truncated]
Any ideas on what might be going on?
We get a similar problem if we specify hdd as the class.
best regards
Jake
mon_osd_down_out_subtree_limit
<https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_subtree_limit>
The default is rack -- you want to set that to "host".
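A minimal sketch of applying that suggestion, assuming a release with the centralized config database (Octopus or later):

```shell
# With this set to "host", a whole failed host is NOT automatically
# marked out (avoiding a huge rebalance); single-OSD failures still are.
ceph config set mon mon_osd_down_out_subtree_limit host
# Verify the running value:
ceph config get mon mon_osd_down_out_subtree_limit
```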
Cheers, Dan
On Fri, Feb 18, 2022, 11:23 Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
at turning the watchdog on, giving nagios an action, etc,
but I'd rather use any tools that ceph has built in.
BTW, this is an Octopus cluster 15.2.15, 580 x OSDs, using EC 8+2
best regards,
Jake
ard's wizard.
If for some reason you cannot or do not wish to opt in, please share the
reason with us.
Thanks,
Yaarit
On Thu, Jan 20, 2022 at 6:39 AM Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
Dear All,
Is the cloud option for the diskprediction module deprecated? Is
this module useful?
many thanks
Jake
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
...", so we could add a setting to customize that if required.
Kind Regards,
Ernesto
in grafana?
The grafana install docs here:
https://docs.ceph.com/en/latest/mgr/dashboard/
state:
"Add Prometheus as data source to Grafana using the Grafana Web UI."
If the data source is now hard-coded to "Dashboard1", can we update the
docs?
best regards,
Jake
Note: I am working from home until further notice.
For help, contact unixad...@mrc-lmb.cam.ac.uk
> ...helpful.
>
> Sent from my iPad
>
>> On Sep 29, 2020, at 18:34, Jake Grimmett wrote:
>>
>> Hi Paul,
>>
>> I think you found the answer!
>>
>> When adding 100 new OSDs to the cluster, I increased both pg and pgp
>> from 4096 to 16,384
>>
> ...value of pg target.
>
> Also: Looks like you've set osd_scrub_during_recovery = false, this
> setting can be annoying on large erasure-coded setups on HDDs that see
> long recovery times. It's better to get IO priorities right; search the
> mailing list for "osd op queue cut off high".
>
> Paul
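The IO-priority tuning Paul alludes to is usually the `osd_op_queue_cut_off` option; a hedged sketch (option names as discussed on the list; verify behaviour for your release):

```shell
# Treat a wider band of client ops as high priority relative to
# recovery/backfill ops:
ceph config set osd osd_op_queue_cut_off high
# This interacts with the op scheduler itself; check both values:
ceph config get osd osd_op_queue
ceph config get osd osd_op_queue_cut_off
# Note: older releases need an OSD restart for the change to take effect.
```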
:45, Jake Grimmett wrote:
>
>> To show the cluster before and immediately after an "episode"
>>
>> ***
>>
>> [root@ceph7 ceph]# ceph -s
>> cluster:
>> id: 36ed7113-080c-49b8-80e2-4947cc4
...MAX to 106803'6043528
2020-09-24 14:44:38.947 7f2e569e9700 0 log_channel(cluster) log [DBG] : 5.157ds0 starting backfill to osd.533(5) from (0'0,0'0] MAX to 106803'6043528
***
any advice appreciated,
Jake
7f3cfe5f9700 0 mds.0.cache creating system inode with ino:0x1
best regards,
Jake
On 29/04/2020 14:33, Jake Grimmett wrote:
> Dear all,
>
> After enabling "allow_standby_replay" on our cluster we are getting
> (lots) of identical errors in the client /var/log/messages log
...allow_standby_replay?
any advice appreciated,
many thanks
Jake
e_size.count(clone)) leaving us with a pg in a very bad
state...
I will see if we can buy some consulting time, the alternative is
several weeks of rsync.
Many thanks again for your advice, it's very much appreciated,
Jake
On 26/03/2020 17:21, Gregory Farnum wrote:
On Wed, Mar 25, 2020 at
On 25/03/2020 14:22, Eugen Block wrote:
Hi,
is there any chance to recover the other failing OSDs that seem to
have one chunk of this PG? Do the other OSDs fail with the same error?
Quoting Jake Grimmett:
Dear All,
We are "in a bit of a pickle"...
No reply to my message (23/03/2020)
...advice gratefully received,
best regards,
Jake
d to backup a live cephfs cluster and has 1.8PB
of data, including 30 days of snapshots. We are using 8+2 EC.
Any help appreciated,
Jake
> ...as possible and there
> was no suggestion to use a default replicated pool and then add the EC
> pool. We did it exactly the other way around :-/
>
> Best
> Dietmar
>
ystem at this time. Someday we would like
> to change this but there is no timeline.
>
... --data /dev/sdab
Activate the OSD:
# ceph-volume lvm activate 443 6e252371-d158-4d16-ac31-fed8f7d0cb1f
Now watching to see if the cluster recovers...
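A sketch of how one might watch the recovery from here (standard status commands, nothing cluster-specific):

```shell
ceph -s              # overall health and recovery progress
ceph health detail   # per-PG detail while the cluster is degraded
watch -n 10 ceph -s  # poll the status every 10 seconds
```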
best,
Jake
On 2/10/20 3:31 PM, Jake Grimmett wrote:
> Dear All,
>
> Following a clunky* cluster restart, we had
>
> 23 "
...'s primary OSD)
* thread describing the bad restart:
<https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI/#IRKCDRRAH7YZEVXN5CH4JT2NH4EWYRGI>
many thanks!
Jake
ith no change. I'm leaving
> things alone hoping that croit.io will update their package to 13.2.8
> soonish. Maybe that will help kick it in the pants.
>
> Chad.
[root@ceph1 ~]# ceph osd down 347
This doesn't change the output of "ceph pg 5.5c9 query", apart from
updating the Started time, and ceph health still shows unfound objects.
To fix this, do we need to issue a scrub (or deep scrub) so that the
objects can be found?
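For what it's worth, a hedged sketch of that approach (PG id taken from the thread):

```shell
# Ask the primary to deep-scrub the PG, then re-check for unfound objects:
ceph pg deep-scrub 5.5c9
ceph pg 5.5c9 query | grep -A2 unfound
ceph health detail | grep unfound
```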