[ceph-users] Re: cephfs - max snapshot limit?

2023-05-01 Thread Venky Shankar
Hi Arnaud,

On Fri, Apr 28, 2023 at 2:16 PM MARTEL Arnaud  wrote:
>
> Hi Venky,
>
> > Also, at one point the kclient wasn't able to handle more than 400 
> > snapshots (per file system), but we have come a long way from that and that 
> > is not a constraint right now.
> Does it mean that there is no longer a limit to the number of snapshots per 
> filesystem? And, if not, do you know what the maximum number of snapshots per 
> filesystem is now?

There are no global per file system limits as such - just the per
directory (configurable) limit.
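
For reference, the per-directory cap is the MDS option mds_max_snaps_per_dir
(default 100 in recent releases). A quick sketch of inspecting and raising it,
assuming the cluster uses the centralized config database (the value below is
only an example):

  # show the current per-directory snapshot limit
  ceph config get mds mds_max_snaps_per_dir
  # raise it for all MDS daemons
  ceph config set mds mds_max_snaps_per_dir 150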

>
> Cheers,
> Arnaud
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-01 Thread Brad Hubbard
On Fri, Apr 28, 2023 at 7:21 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey
> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)

Still waiting on https://github.com/ceph/teuthology/pull/1830 to
resolve these - Approved

> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io



-- 
Cheers,
Brad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-01 Thread Laura Flores
*Smoke* and *pacific-p2p* are approved; still working through some failures
in *upgrade/octopus-x*, which I'll have ready soon.

As for rados, I have summarized the suite and passed the results to Neha.
In my eyes it looks good, but I want to leave final approval to Radek
and/or Neha.
smoke
https://pulpito.ceph.com/yuriw-2023-04-24_23:35:26-smoke-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-04-25_03:31:51-smoke-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-04-25_14:15:06-smoke-pacific-release-distro-default-smithi

Failures:
1. https://tracker.ceph.com/issues/59192
2. https://tracker.ceph.com/issues/51282

Details:
1. cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an
application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
2. pybind/mgr/mgr_util: .mgr pool may be created too early causing
spurious PG_DEGRADED warnings - Ceph - Mgr
upgrade/pacific-p2p
https://pulpito.ceph.com/yuriw-2023-04-25_14:52:56-upgrade:pacific-p2p-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-04-26_22:54:40-upgrade:pacific-p2p-pacific-release-distro-default-smithi

Failures:
1. https://tracker.ceph.com/issues/52590
2. https://tracker.ceph.com/issues/58289
3. https://tracker.ceph.com/issues/58223

Details:
1. "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
upgrade:pacific-p2p-pacific - Ceph - RGW
2. "AssertionError: wait_for_recovery: failed before timeout expired"
from down pg in pacific-p2p-pacific - Ceph - RADOS
3. failure on `sudo fuser -v /var/lib/dpkg/lock-frontend` -
Infrastructure

On Mon, May 1, 2023 at 12:53 PM Adam King  wrote:

> approved for the rados/cephadm stuff
>
> On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein 
> wrote:
>
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59542#note-1
> > Release Notes - TBD
> >
> > Seeking approvals for:
> >
> > smoke - Radek, Laura
> > rados - Radek, Laura
> >   rook - Sébastien Han
> >   cephadm - Adam K
> >   dashboard - Ernesto
> >
> > rgw - Casey
> > rbd - Ilya
> > krbd - Ilya
> > fs - Venky, Patrick
> > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > upgrade/pacific-p2p - Laura
> > powercycle - Brad (SELinux denials)
> > ceph-volume - Guillaume, Adam K
> >
> > Thx
> > YuriW
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [multisite] Resetting an empty bucket

2023-05-01 Thread Matt Benjamin
Hi Yixin,

This sounds interesting.  I kind of suspect that this feature requires some
more conceptual design support.  Like, at a high level, how a bucket's
"zone residency" might be defined and specified, and what policies might
govern changing it, not to mention, how you direct things (commands? ops?).

This might be something to bring to an RGW upstream call (the "refactoring"
meeting), Wednesdays at 11:30 EST.

Matt

On Mon, May 1, 2023 at 5:27 PM Yixin Jin  wrote:

> Hi folks,
>
> Armed with the bucket-specific sync policy feature, I found that we could move
> objects of a bucket between zones. It is a migration via sync followed by
> object removal at the source. This allows us to better utilize the available
> capacities in different clusters/zones. However, to achieve this, we need a
> way to reset an empty bucket so that it can serve as a destination for a
> migration after it has previously served as a source. ceph/rgw currently
> doesn't seem to be able to do that, so I created a feature request for it:
> https://tracker.ceph.com/issues/59593
>
> My own prototype shows that this feature is fairly simple to implement and
> works well for bucket migration.
>
> Cheers,
> Yixin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [multisite] Resetting an empty bucket

2023-05-01 Thread Yixin Jin
Hi folks,

Armed with the bucket-specific sync policy feature, I found that we could move 
objects of a bucket between zones. It is a migration via sync followed by object 
removal at the source. This allows us to better utilize the available capacities in 
different clusters/zones. However, to achieve this, we need a way to reset an 
empty bucket so that it can serve as a destination for a migration after it has 
previously served as a source. ceph/rgw currently doesn't seem to be able to do 
that, so I created a feature request for it: https://tracker.ceph.com/issues/59593

My own prototype shows that this feature is fairly simple to implement and 
works well for bucket migration.
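
For context, a rough sketch of the kind of bucket-scoped sync policy used for
such a migration (zone, bucket and group/pipe names here are placeholders, and
a zonegroup-level policy allowing the flow is assumed to already exist):

  # enable a sync group on the bucket and add a pipe from source to destination
  radosgw-admin sync group create --bucket=mybucket \
      --group-id=mybucket-migrate --status=enabled
  radosgw-admin sync group pipe create --bucket=mybucket \
      --group-id=mybucket-migrate --pipe-id=pipe1 \
      --source-zones=zone-a --dest-zones=zone-b
  # check per-bucket replication progress
  radosgw-admin bucket sync status --bucket=mybucket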

Cheers,
Yixin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery

2023-05-01 Thread wodel youchi
Thank you for the clarification.

On Mon, May 1, 2023, 20:11 Wesley Dillingham  wrote:

> Assuming size=3 and min_size=2, it will run degraded (read/write capable)
> until a third host becomes available, at which point it will backfill the
> third copy onto the third host. It will be unable to create the third copy of
> the data while no third host exists. If an additional host is lost, the data
> will become inactive+degraded (below min_size) and will be unavailable for
> use. Data will not be lost, though, assuming no further failures beyond the
> two failed hosts occur, and once the second and third hosts come back the
> data will recover. It is always best to have one host more than the size
> setting for this reason.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
>
>
> On Mon, May 1, 2023 at 11:34 AM wodel youchi 
> wrote:
>
>> Hi,
>>
>> When creating a Ceph cluster, a failure domain is defined, and by default
>> it uses host as the minimal domain; that domain can be changed to chassis,
>> rack, etc.
>>
>> My question is:
>> Suppose I have three OSD nodes, my replication is 3 and my failure domain
>> is host, which means that each copy of the data is stored on a different node.
>>
>> What happens when one node crashes: does Ceph use the remaining free space
>> on the other two to create the third copy, or will the cluster run in
>> degraded mode, like a RAID5 array that has lost a disk?
>>
>> Regards.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nearly 1 exabyte of Ceph storage

2023-05-01 Thread Yaarit Hatuka
We are very excited to announce that we have reached the 1 exabyte
milestone of community Ceph clusters via telemetry!

Thank you to everyone who has opted in!

Read more here:
https://ceph.io/en/news/blog/2023/telemetry-celebrate-1-exabyte/

https://www.linkedin.com/posts/nehaojha_ceph-telemetry-powerofopensource-activity-7057440216088805377-1qeH

https://www.linkedin.com/posts/kyle-bader-5267a030_ceph-opensource-opensourcesoftware-activity-7056847983937540097-lDEU

On Wed, Apr 12, 2023 at 6:47 AM Yaarit Hatuka  wrote:

>
>
> On Wed, Apr 12, 2023 at 12:32 PM Marc  wrote:
>
>> >
>> > We are excited to share with you the latest statistics from our Ceph
>> > public telemetry dashboards.
>>
>> :)
>>
>> > One of the things telemetry helps us to understand is version adoption
>> > rate. See, for example, the trend of Quincy deployments in the community
>> > on the public telemetry dashboard.
>> >
>>
>> What is the 'weird' drop at the 5th of February?
>>
>> It is due to issues we had with the lab, the service was a bit unstable.
>
>
>> > Ceph telemetry is on an opt-in basis. You can opt-in with:
>> > `ceph telemetry on`
>> > Learn more here.
>> >
>> > Help us cross the exabyte mark by opting in today!
>> > Learn more about the latest developments around Telemetry at the
>> > upcoming Cephalocon.
>> >
>>
>> :)
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PVE CEPH OSD heartbeat slow

2023-05-01 Thread Peter
Hi Fabian,

Thank you for your prompt response. It's crucial to understand how things work, 
and I appreciate your assistance.

After replacing the switch for our Ceph environment, we experienced three days 
of normalcy before the issue recurred this morning. I noticed that the TCP 
in/out became unstable, and TCP errors occurred simultaneously. The UDP in/out 
values were 70K and 150K, respectively, while the errors peaked at around 50K 
per second.

I reviewed the Proxmox documentation and found that it is recommended to 
separate the cluster network and storage network. Currently, we have more than 
20 Ceph nodes across five different locations, and only one location has 
experienced this issue. We are fortunate that it has not happened in other 
areas. While we plan to separate the network soon, I was wondering if there are 
any temporary solutions or configurations that could limit the UDP triggering 
and resolve the "corosync" issue.

I appreciate your help in this matter and look forward to your response.

Peter

-Original Message-
From: Fabian Grünbichler  
Sent: Wednesday, April 26, 2023 12:42 AM
To: ceph-users@ceph.io; Peter 
Subject: Re: [ceph-users] PVE CEPH OSD heartbeat slow

On April 25, 2023 9:03 pm, Peter wrote:
> Dear all,
> 
> We are experiencing issues with Ceph after deploying it via PVE, with the network 
> backed by a 10G Cisco switch with the vPC feature on. We are encountering slow 
> OSD heartbeats and have not been able to identify any network traffic issues.
> 
> Upon checking, we found that the ping is around 0.1ms, and there is 
> occasional 2% packet loss when using flood ping, but not consistently. We 
> also noticed a large number of UDP port 5405 packets and the 'corosync' 
> process utilizing a significant amount of CPU.
> 
> When running the 'ceph -s' command, we observed a slow OSD heartbeat on the 
> back and front, with the longest latency being 2250.54ms. We suspect that 
> this may be a network issue, but we are unsure of how Ceph detects such long 
> latency. Additionally, we are wondering if a 2% packet loss can significantly 
> affect Ceph's performance and even cause the OSD process to fail sometimes.
> 
> We have heard about potential issues with RocksDB 6 causing OSD process 
> failures, and we are curious about how to check the RocksDB version. 
> Furthermore, we are wondering how severe packet loss and latency 
> must be to cause OSD process crashes, and how the monitoring system 
> determines that an OSD is offline.
> 
> We would greatly appreciate any assistance or insights you could provide on 
> these matters.
> Thanks,

are you using separate (physical) links for Corosync and Ceph traffic?
if not, they will step on each other's toes and cause problems. Corosync is very 
latency sensitive.

https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_cluster_network_requirements
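
For what it's worth, the per-OSD heartbeat ping times behind the "slow OSD
heartbeat" warnings can be inspected directly; a quick sketch (osd.0 is just
an example id, and the daemon command must run on the host where that OSD
lives):

  ceph health detail | grep -i ping
  ceph daemon osd.0 dump_osd_network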

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery

2023-05-01 Thread Wesley Dillingham
Assuming size=3 and min_size=2, it will run degraded (read/write capable)
until a third host becomes available, at which point it will backfill the
third copy onto the third host. It will be unable to create the third copy of
the data while no third host exists. If an additional host is lost, the data
will become inactive+degraded (below min_size) and will be unavailable for
use. Data will not be lost, though, assuming no further failures beyond the
two failed hosts occur, and once the second and third hosts come back the
data will recover. It is always best to have one host more than the size
setting for this reason.
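
For reference, the size/min_size values assumed above can be checked and
adjusted per pool; a quick sketch (the pool name is a placeholder):

  ceph osd pool get mypool size
  ceph osd pool get mypool min_size
  # only lower min_size deliberately and temporarily, e.g. during recovery
  ceph osd pool set mypool min_size 2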

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, May 1, 2023 at 11:34 AM wodel youchi  wrote:

> Hi,
>
> When creating a Ceph cluster, a failure domain is defined, and by default
> it uses host as the minimal domain; that domain can be changed to chassis,
> rack, etc.
>
> My question is:
> Suppose I have three OSD nodes, my replication is 3 and my failure domain
> is host, which means that each copy of the data is stored on a different node.
>
> What happens when one node crashes: does Ceph use the remaining free space
> on the other two to create the third copy, or will the cluster run in
> degraded mode, like a RAID5 array that has lost a disk?
>
> Regards.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Lua - cancel request

2023-05-01 Thread Yuval Lifshitz
Vladimir and Ondřej,
Created a quincy backport PR: https://github.com/ceph/ceph/pull/51300
Hopefully it would land in the next quincy release.

Yuval

On Mon, May 1, 2023 at 7:37 PM Vladimir Sigunov 
wrote:

> Hi Yuval,
>
> Playing with Lua, I faced a similar issue.
> It would be perfect if you can backport this fix to Quincy.
>
> Thank you!
> Vladimir.
>
> -Original Message-
> *From*: Yuval Lifshitz
> *To*: Ondřej Kukla
> *Cc*: ceph-users@ceph.io
> *Subject*: [ceph-users] Re: RGW Lua - cancel request
> *Date*: Sun, 30 Apr 2023 21:00:21 +0300
>
> Hi Ondřej,
> Great to hear that you use Lua. You are right, this field has become
> writable only in reef.
> I can backport the fix to quincy, so that you can use it in the next quincy
> release (not sure when it is).
> A better option would be to allow setting the failure from lua - but this
> would be a new feature, that would probably land post reef.
>
> Yuval
>
>
> On Sun, Apr 30, 2023 at 7:49 PM Ondřej Kukla  wrote:
>
> Hello,
>
> Lately I’ve been playing with Lua scripting on top of RGW.
>
> I would like to implement request blocking based on bucket name: when there
> is a dot in a bucket name, return an error code and a message that the name
> is invalid.
>
> Here is the code I was able to come up with.
>
> if string.find(Request.HTTP.URI, '%.') then
>     Request.Response.HTTPStatusCode = 400
>     Request.Response.HTTPStatus = "InvalidBucketName"
>     Request.Response.Message = "Dots in bucket name are not allowed."
> end
>
> This works fine, but the request for creating a bucket would be processed
> and the bucket will be created. I thought about a dirty workaround with
> setting the Request.Bucket.Name to a bucket that already exists but it
> seems that this field is not writable in Quincy.
>
> Is there a way to block the request from processing?
>
> Any help is much appreciated.
>
> Kind regards,
>
> Ondrej
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-01 Thread Adam King
approved for the rados/cephadm stuff

On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey
> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Lua - cancel request

2023-05-01 Thread Vladimir Sigunov
Hi Yuval,

Playing with Lua, I faced a similar issue.
It would be perfect if you can backport this fix to Quincy.

Thank you!
Vladimir.

-Original Message-
From: Yuval Lifshitz 
To: Ondřej Kukla 
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: RGW Lua - cancel request
Date: Sun, 30 Apr 2023 21:00:21 +0300

Hi Ondřej,
Great to hear that you use Lua. You are right, this field has become
writable only in reef.
I can backport the fix to quincy, so that you can use it in the next
quincy
release (not sure when it is).
A better option would be to allow setting the failure from lua - but
this
would be a new feature, that would probably land post reef.

Yuval


On Sun, Apr 30, 2023 at 7:49 PM Ondřej Kukla  wrote:

> Hello,
> 
> Lately I’ve been playing with Lua scripting on top of RGW.
> 
> I would like to implement request blocking based on bucket name: when
> there is a dot in a bucket name, return an error code and a message that
> the name is invalid.
> 
> Here is the code I was able to come up with.
> 
> if string.find(Request.HTTP.URI, '%.') then
>     Request.Response.HTTPStatusCode = 400
>     Request.Response.HTTPStatus = "InvalidBucketName"
>     Request.Response.Message = "Dots in bucket name are not allowed."
> end
> 
> This works fine, but the request for creating a bucket would be
> processed
> and the bucket will be created. I thought about a dirty workaround
> with
> setting the Request.Bucket.Name to a bucket that already exists but
> it
> seems that this field is not writable in Quincy.
> 
> Is there a way to block the request from processing?
> 
> Any help is much appreciated.
> 
> Kind regards,
> 
> Ondrej
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-05-01 Thread Niklas Hambüchen

That one talks about resilvering, which is not the same as either ZFS scrubs 
or Ceph scrubs.


The commit I linked is titled "Sequential scrub and resilvers".

So ZFS scrubs are included.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph recovery

2023-05-01 Thread wodel youchi
Hi,

When creating a Ceph cluster, a failure domain is defined, and by default
it uses host as the minimal domain; that domain can be changed to chassis,
rack, etc.

My question is:
Suppose I have three OSD nodes, my replication is 3 and my failure domain
is host, which means that each copy of the data is stored on a different node.

What happens when one node crashes: does Ceph use the remaining free space
on the other two to create the third copy, or will the cluster run in
degraded mode, like a RAID5 array that has lost a disk?
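
For reference, the failure domain actually in effect can be read from the
CRUSH rule; a quick sketch, assuming the default replicated rule name:

  ceph osd crush rule dump replicated_rule
  # the "type" in the chooseleaf step shows the failure domain, e.g. "host"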

Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD mirroring, asking for clarification

2023-05-01 Thread wodel youchi
Hi,

When using rbd mirroring, the mirroring concerns the images only, not the
whole pool? So, we don't need to have a dedicated pool in the destination
site to be mirrored, the only obligation is that the mirrored pools must
have the same name.

In other words, We create two pools with the same name, one on the source
site the other on the destination site, we create the mirror link (one way
or two ways replication), then we choose what images to sync.

Both pools can be used simultaneously on both sites, it's the mirrored
images that cannot be used simultaneously, only promoted ones.

Is this correct?
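
For reference, a minimal sketch of the image-level workflow described above
(pool/image names are placeholders, snapshot-based mirroring is assumed, and
the peer bootstrap between the two clusters is assumed to be done already):

  # on both clusters: enable mirroring on the pool in image mode
  rbd mirror pool enable mypool image
  # on the primary site: choose which images to mirror
  rbd mirror image enable mypool/myimage snapshot
  # check replication state
  rbd mirror pool status mypool --verbose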

Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Block RGW request using Lua

2023-05-01 Thread ondrej
Hello everyone,

I've started playing with Lua scripting and would like to ask if anyone knows 
about a way to drop or close a user request in the prerequest context.

I would like to block creating buckets with dots in the name, but the use-case 
could be blocking certain operations, etc.

I was able to come up with something like this:

if string.find(Request.HTTP.URI, '%.') then
   Request.Response.HTTPStatusCode = 400
   Request.Response.HTTPStatus = "InvalidBucketName"
   Request.Response.Message = "Dots are not allowed."
end

This works fine, but the bucket is still created, which is something that I don't 
want. As a dirty workaround, I've thought about changing the bucket name here 
to an already existing bucket, but the Request.Bucket.Name = "taken" doesn't 
seem to work as the log gives me an error "attempt to index a nil value (field 
'Bucket')".

Any help is much appreciated.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-05-01 Thread Niklas Hambüchen

Hi all,


Scrubs only read data that does exist in ceph as it exists, not every sector of 
the drive, written or not.


Thanks, this does explain it.

I just discovered:

ZFS had this problem in the past:

* 
https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSNonlinearScrubs?showcomments#comments

OpenZFS solved it in 2017, using two-phase scrubs:

* https://github.com/openzfs/zfs/issues/3625
* https://github.com/openzfs/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd

Perhaps Ceph can use the same approach; I filed 
https://tracker.ceph.com/issues/59584 for it.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deep-scrub much slower than HDD speed

2023-05-01 Thread Niklas Hambüchen

Hi Marc,

thanks for your numbers, this seems to confirm the suspicions.


Oh, I get it. Interesting. I think if you expand the cluster in the
future with more disks, you will spread the load and have more IOPS, and
this will disappear.


This one I'm not sure about:
If I expand the cluster 2x, I'll also have 2x the data to scrub. So the ratio 
should be the same.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-01 Thread Loic Tortay

On 01/05/2023 11:35, Frank Schilder wrote:

Hi all,

I think we might be hitting a known problem 
(https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet, 
because we have troubles with older kclients that miss the mds restart and hold 
on to cache entries referring to the killed instance, leading to hanging jobs 
on our HPC cluster.

I have seen this issue before and there was a process in D-state that 
dead-locked itself. Usually, killing this process succeeded and resolved the 
issue. However, this time I can't find such a process.

The tracker mentions that one can delete the file/folder. I have the inode 
number, but really don't want to start a find on a 1.5PB file system. Is there 
a better way to find what path is causing the issue (ask the MDS directly, look 
at a cache dump, or similar)? Is there an alternative to deletion or MDS fail?


Hello,
If you have the inode number, you can retrieve the name with something like:
 rados getxattr -p $POOL ${ino}.00000000 parent | \
  ceph-dencoder type inode_backtrace_t import - decode dump_json | \
  jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
  awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'

Where $POOL is the "default pool" name (for files) or the metadata pool 
name (for directories) and $ino is the inode number (in hexadecimal).



Loïc.
--
|   Loïc Tortay   - IN2P3 Computing Centre |
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-01 Thread Frank Schilder
Hi all,

I think we might be hitting a known problem 
(https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet, 
because we have troubles with older kclients that miss the mds restart and hold 
on to cache entries referring to the killed instance, leading to hanging jobs 
on our HPC cluster.

I have seen this issue before and there was a process in D-state that 
dead-locked itself. Usually, killing this process succeeded and resolved the 
issue. However, this time I can't find such a process.

The tracker mentions that one can delete the file/folder. I have the inode 
number, but really don't want to start a find on a 1.5PB file system. Is there 
a better way to find what path is causing the issue (ask the MDS directly, look 
at a cache dump, or similar)? Is there an alternative to deletion or MDS fail?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-05-01 Thread Frank Schilder
I think you misunderstood Janne's reply. The main statement is at the end: Ceph 
is not designed for an "I don't care about data" use case. If you need speed 
for temporary data where you can sustain data loss, go for something simpler. 
For example, we use beegfs with great success for a burst buffer for an HPC 
cluster. It is very lightweight and will pull out all performance your drives 
can offer. In case of disaster it is easily possible to clean up. Beegfs does 
not care about lost data, such data will simply become inaccessible while 
everything else just moves on. It will not try to self-heal either. It doesn't 
even scrub data, so no competition of users with admin IO.

It's pretty much your use case. We clean it up every 6-8 weeks, and if something 
breaks we just redeploy the whole thing from scratch. Performance is great and 
it's a very simple and economical system to administrate. No need for the whole 
Ceph daemon engine with large RAM requirements and extra admin daemons.

Use Ceph for data you want to survive a nuclear blast. Don't use it for things 
it's not made for and then complain.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: mhnx 
Sent: Saturday, April 29, 2023 5:48 AM
To: Janne Johansson
Cc: Ceph Users
Subject: [ceph-users] Re: How can I use not-replicated pool (replication 1 or 
raid-0)

Hello Janne, thank you for your response.

I understand your advice, and rest assured that I've designed plenty of EC
pools and I know the mess. This is not an option because I need SPEED.

Please let me describe my hardware first so that we share the same picture.
Server: R620
Cpu: 2 x Xeon E5-2630 v2 @ 2.60GHz
Ram: 128GB - DDR3
Disk1: 20x Samsung SSD 860 2TB
Disk2: 10x Samsung SSD 870 2TB

My SSDs do not have PLP. Because of that, every Ceph write also
waits for TRIM. I want to know how much latency we are talking about,
because I'm thinking of adding PLP NVMe for WAL+DB cache to gain some
speed.
As you can see, I even try to gain from every TRIM command.
Currently I'm testing a replication 2 pool and even this speed is not
enough for my use case.
Now I'm trying to boost the deletion speed because I'm writing and
deleting files all the time and this never ends.
I'm writing this mail because replication 1 will decrease the deletion
speed, but I'm still trying to tune some MDS+OSD parameters to increase
delete speed.
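
For reference, the parameters typically involved in purge/delete throughput
are the MDS purge throttles and the OSD delete sleep; a sketch of inspecting
them only (good values depend on the release and the hardware):

  ceph config get mds mds_max_purge_ops
  ceph config get mds mds_max_purge_files
  ceph config get osd osd_delete_sleep_ssd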

Any help and idea will be great for me. Thanks.
Regards.



Janne Johansson , 12 Nis 2023 Çar, 10:10
tarihinde şunu yazdı:
>
> Den mån 10 apr. 2023 kl 22:31 skrev mhnx :
> > Hello.
> > I have a 10 node cluster. I want to create a non-replicated pool
> > (replication 1) and I want to ask some questions about it:
> >
> > Let me tell you my use case:
> > - I don't care about losing data,
> > - All of my data is JUNK and these junk files are usually between 1KB to 
> > 32MB.
> > - These files will be deleted in 5 days.
> > - Writable space and I/O speed is more important.
> > - I have high Write/Read/Delete operations, minimum 200GB a day.
>
> That is "only" 18MB/s which should easily be doable even with
> repl=2,3,4. or EC. This of course depends on speed of drives, network,
> cpus and all that, but in itself it doesn't seem too hard to achieve
> in terms of average speeds. We have EC8+3 rgw backed by some 12-14 OSD
> hosts with hdd and nvme (for wal+db) that can ingest over 1GB/s if you
> parallelize the rgw streams, so 18MB/s seems totally doable with 10
> decent machines. Even with replication.
>
> > I'm afraid that, in any failure, I won't be able to access the whole
> > cluster. Losing data is okay but I have to ignore missing files,
>
> Even with repl=1, in case of a failure, the cluster will still aim at
> fixing itself rather than ignoring currently lost data and moving on,
> so any solution that involves "forgetting" about lost data would need
> a ceph operator telling the cluster to ignore all the missing parts
> and to recreate the broken PGs. This would not be automatic.
>
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io