[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-24 Thread Anthony D'Atri
There was a similar discussion last year around Software Heritage’s archive project; I suggest digging up that thread. Some ideas: * Pack them into (optionally compressed) tarballs - from a quick search it sorta looks like HAR uses a similar model. Store the tarballs as RGW objects, or as RBD
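
A minimal sketch of that idea, assuming an RGW endpoint reachable over the S3 API and credentials already configured for the aws CLI (the endpoint, bucket and file names below are placeholders, not anything from the original thread):

    # pack a directory of small files into one compressed tarball
    tar -czf small-files-2022-02.tar.gz /data/small-files/

    # store the tarball as a single RGW object via the S3-compatible API
    aws --endpoint-url https://rgw.example.com s3 cp \
        small-files-2022-02.tar.gz s3://archive-bucket/small-files-2022-02.tar.gz

Retrieving an individual file then means fetching (or range-reading) the containing tarball, so it helps to keep your own index of which object holds which file.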

[ceph-users] Re: OSD Container keeps restarting after drive crash

2022-02-24 Thread Eugen Block
Hi, these are the defaults set by cephadm in Octopus and Pacific:
---snip---
[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/environment
ExecStart=/bin/bash {data_dir}/{fsid}/%i/unit.run
ExecStop=-{container_path} stop ceph-{fsid}-%i
ExecStopPost=-/bin/bash
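
If you need to inspect or adjust those limits for a cephadm-managed OSD, a systemd drop-in is one way to do it. A sketch, with the fsid, OSD id and limit value as placeholders:

    # show the unit cephadm generated, including LimitNOFILE/LimitNPROC
    systemctl cat ceph-<fsid>@osd.3.service

    # add a drop-in that overrides just the limits, then restart the daemon
    systemctl edit ceph-<fsid>@osd.3.service    # add: [Service] + LimitNOFILE=524288
    systemctl restart ceph-<fsid>@osd.3.service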

[ceph-users] Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-24 Thread Bobby
Hi, Is there any archive utility in Ceph similar to Hadoop Archive Utility (HAR)? Or, in other words, how can one archive small files in Ceph? Thanks

[ceph-users] Re: One PG stuck in active+clean+remapped

2022-02-24 Thread Erwin Lubbers
Hi Dan, That did the trick. Thanks! Regards, Erwin > On 24 Feb 2022, at 20:25, Dan van der Ster wrote: > > Hi Erwin, > > This may be one of the rare cases where the default choose_total_tries > = 50 is too low. > You can try increasing it to 75 or 100 and see if

[ceph-users] Re: One PG stuck in active+clean+remapped

2022-02-24 Thread Dan van der Ster
Hi Erwin, This may be one of the rare cases where the default choose_total_tries = 50 is too low. You can try increasing it to 75 or 100 and see if CRUSH can find 3 up OSDs. Here's the basic recipe:
# ceph osd getcrushmap -o crush.map
# crushtool -d crush.map -o crush.txt
# vi crush.txt  # and
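
For completeness, the usual rest of that recipe (a sketch, not from the original message, which is truncated here; the rule id and replica count are examples):

    # in crush.txt, raise the tunable near the top, e.g.:
    #   tunable choose_total_tries 100

    # recompile and optionally test whether all PGs now get 3 OSDs
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test --show-bad-mappings --rule 0 --num-rep 3

    # inject the new map
    ceph osd setcrushmap -i crush.new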

[ceph-users] One PG stuck in active+clean+remapped

2022-02-24 Thread Erwin Lubbers
Hi all, I have one active+clean+remapped PG on a 152 OSD Octopus (15.2.15) cluster with equally balanced OSDs (around 40% usage). The cluster has three replicas spread across three datacenters (A+B+C). All PGs are available in each datacenter (as defined in the CRUSH map), but only this one
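
For anyone hitting the same state, one way to see which PG is affected and how its up set differs from its acting set (the PG id below is just an example):

    # list PGs that are currently remapped
    ceph pg dump pgs_brief | grep remapped

    # compare the up set (what CRUSH wants) with the acting set (what serves I/O)
    ceph pg map 7.1a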

[ceph-users] Re: ceph fs snaptrim speed

2022-02-24 Thread Dan van der Ster
Hi Frank, Thanks for the feedback -- improving the docs is in everyone's best interest. This semantic of "override if non-zero" is quite common in the OSD. See https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L3388-L3451 for a few examples. So it doesn't make sense to change the way this
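
As a sanity check alongside the docs, the cluster itself can show how an individual option is described, including its default:

    # print the description, default value and runtime-updatability
    ceph config help osd_snap_trim_sleep
    ceph config help osd_snap_trim_sleep_ssd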

[ceph-users] Mon crash - abort in RocksDB

2022-02-24 Thread Chris Palmer
We have a small Pacific 16.2.7 test cluster that has been ticking over for a couple of years with no problems whatsoever. The last "event" was 14 days ago when I was testing some OSD replacement procedures - nothing remarkable. At 01:46 this morning, though, mon03 signalled an abort in the RocksDB

[ceph-users] Re: ceph os filesystem in read only

2022-02-24 Thread Eugen Block
Hi, > 1. How long will ceph continue to run before it starts complaining about this? Looks like it is fine for a few hours; ceph osd tree and ceph -s seem not to notice anything. If the OSDs don't have to log anything to disk (which can take quite some time depending on the log settings)
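
To see whether (and where) the OSDs would actually try to write logs, which is what eventually trips them up on a read-only filesystem, something like this sketch can help:

    # are the daemons logging to a file, and where?
    ceph config get osd log_to_file
    ceph config get osd log_file

    # log verbosity also affects how quickly the problem surfaces
    ceph config get osd debug_osd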

[ceph-users] Re: ceph fs snaptrim speed

2022-02-24 Thread Dan van der Ster
Hi Frank, The semantic of osd_snap_trim_sleep was copied from osd_delete_sleep. The general setting "osd_snap_trim_sleep" is used only to override the _hdd _hybrid _ssd tuned values. Here's the code to get the effective sleep value:
  if (osd_snap_trim_sleep > 0)
    return osd_snap_trim_sleep;
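
In practice that means the general option is only needed if you want to override the per-device-class defaults. A quick example of checking and setting them via the config database (2.0 is just an example value):

    # the general option defaults to 0, i.e. "use the tuned per-class values"
    ceph config get osd osd_snap_trim_sleep
    ceph config get osd osd_snap_trim_sleep_hdd
    ceph config get osd osd_snap_trim_sleep_ssd
    ceph config get osd osd_snap_trim_sleep_hybrid

    # setting the general option to a non-zero value overrides all of the above
    ceph config set osd osd_snap_trim_sleep 2.0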

[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-24 Thread Kai Stian Olstad
On Wed, Feb 23, 2022 at 12:02:53 PM, Adam Huffman wrote: > On Wed, 23 Feb 2022 at 11:25, Eugen Block wrote: > > > How exactly did you determine that there was actual WAL data on the HDDs? > > > I couldn't say exactly what it was, but 7 or so TB was in use, even with > no user data at all.
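
One way to check, per OSD, which devices BlueFS is actually using for DB and WAL, and how much it has written to each (the OSD id is an example; the perf dump has to be run on the host where that OSD runs):

    # shows bluefs_db_devices / bluefs_dedicated_wal etc. for the OSD
    ceph osd metadata 12 | grep -i bluefs

    # bytes BlueFS has placed on the db / wal / slow devices
    ceph daemon osd.12 perf dump bluefs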

[ceph-users] Re: CephFS snaptrim bug?

2022-02-24 Thread Arthur Outhenin-Chalandre
On 2/24/22 09:26, Arthur Outhenin-Chalandre wrote: > On 2/23/22 21:43, Linkriver Technology wrote: >> Could someone shed some light please? Assuming that snaptrim didn't run to >> completion, how can I manually delete objects from now-removed snapshots? I >> believe this is what the Ceph

[ceph-users] Re: CephFS snaptrim bug?

2022-02-24 Thread Dan van der Ster
See https://tracker.ceph.com/issues/54396 I don't know how to tell the OSDs to rediscover those trimmed snaps. Neha, is that possible? Cheers, Dan On Thu, Feb 24, 2022 at 9:27 AM Dan van der Ster wrote: > > Hi, > > I had a look at the code -- looks like there's a flaw in the logic: > the

[ceph-users] Re: CephFS snaptrim bug?

2022-02-24 Thread Dan van der Ster
Hi, I had a look at the code -- looks like there's a flaw in the logic: the snaptrim queue is cleared if osd_pg_max_concurrent_snap_trims = 0. I'll open a tracker and send a PR to restrict osd_pg_max_concurrent_snap_trims to >= 1. Cheers, Dan On Wed, Feb 23, 2022 at 9:44 PM Linkriver
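
Until such a restriction lands, it is worth making sure the option isn't set to 0 anywhere, for example:

    # 0 silently clears the snaptrim queue; the default is 2
    ceph config get osd osd_pg_max_concurrent_snap_trims
    ceph config set osd osd_pg_max_concurrent_snap_trims 2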

[ceph-users] Re: CephFS snaptrim bug?

2022-02-24 Thread Arthur Outhenin-Chalandre
Hi, On 2/23/22 21:43, Linkriver Technology wrote: > Could someone shed some light please? Assuming that snaptrim didn't run to > completion, how can I manually delete objects from now-removed snapshots? I > believe this is what the Ceph documentation calls a "backwards scrub" - but I > didn't

[ceph-users] Re: Cluster crash after 2B objects pool removed

2022-02-24 Thread Dan van der Ster
Hi, Basically, a deletion of any size shouldn't cause osds to crash. So please open a tracker with some example osd logs showing the crash backtraces. Cheers, Dan On Thu, Feb 24, 2022 at 6:20 AM Szabo, Istvan (Agoda) wrote: > > Hi, > > I've removed the old RGW data pool with 2B objects because
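
For collecting the backtraces for such a tracker, the crash module is usually the quickest route (the crash id is a placeholder):

    # list recent daemon crashes and dump the full backtrace of one of them
    ceph crash ls
    ceph crash info <crash-id>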