[ceph-users] Re: cache pressure?

2024-04-26 Thread William Edwards

Hi Erich,

Erich Weiler wrote on 2024-04-23 15:47:
So I'm trying to figure out ways to reduce the number of warnings I'm 
getting and I'm thinking about the one "client failing to respond to 
cache pressure".


Is there maybe a way to tell a client (or all clients) to reduce the 
amount of cache it uses or to release caches quickly?  Like, all the 
time?


I know the linux kernel (and maybe ceph) likes to cache everything for 
a while, and rightfully so, but I suspect in my use case it may be more 
efficient to more quickly purge the cache or to in general just cache 
way less overall...?


We have many thousands of threads all doing different things that are 
hitting our filesystem, so I suspect the caching isn't really doing me 
much good anyway due to the churn, and probably is causing more 
problems than it's helping...


We are seeing "client failing to respond to cache pressure" on a daily 
basis.


Remounting on the client usually 'fixes' the issue. Sometimes, 
remounting on all clients that have the same directory mounted is 
needed. Also, a larger MDS cache seems to help.
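
For reference, a rough sketch of the knobs involved (the values are examples, 
not tuned recommendations):

# MDS side: give the MDS more room for metadata (bytes, applies at runtime):
ceph config set mds mds_cache_memory_limit 8589934592
# Kernel-client side: dentries/inodes (and the caps pinning them) can be
# dropped by hand if you really want to release cache early:
sync && echo 2 > /proc/sys/vm/drop_caches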


As Dietmar said, VS Code may cause this. Quite funny to read, actually, 
because we've been dealing with this issue for over a year, and 
yesterday was the very first time Ceph complained about a client and we 
saw VS Code's remote stuff running. Coincidence.




-erich


With kind regards,

William Edwards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-21 Thread William Edwards

> On 21 Apr 2024 at 17:14, Anthony D'Atri wrote:
> 
> Vendor lock-in only benefits vendors.

Strictly speaking, that isn’t necessarily true. Proprietary standards and the 
like *can* enhance user experience in some cases. Making it intentionally 
difficult to migrate is another story. 

> You’ll pay outrageously for support / maint then your gear goes EOL and 
> you’re trolling eBay for parts.   
> 
> With Ceph you use commodity servers, you can swap 100% of the hardware 
> without taking downtime with servers and drives of your choice.  And you get 
> the source code so worst case you can fix or customize.  Ask me sometime 
> about my experience with a certain proprietary HW vendor.  
> 
> Longhorn , openEBS I don’t know much about.  I suspect that they don’t offer 
> the richness of Ceph and that their communities are much smaller.  
> 
> Of course we’re biased here;)
> 
>> On Apr 21, 2024, at 5:21 AM, sebci...@o2.pl wrote:
>> 
>> Hi,
>> I'm having trouble answering this question:
>> Why is Ceph better than other storage solutions?
>> 
>> I know the high-level talking points about:
>> - scalability,
>> - flexibility,
>> - distributed architecture,
>> - cost-effectiveness
>> 
>> What convinces me (though it could also be held against it) is that Ceph as a 
>> product has everything I need:
>> block storage (RBD),
>> file storage (CephFS),
>> object storage (S3, Swift),
>> and "plugins" to run NFS, NVMe over Fabrics, and NFS on object storage.
>> 
>> It also has many other features that are usually sold as options (mirroring, 
>> geo-replication, etc.) in paid solutions.
>> I'm having trouble writing it all down piece by piece.
>> I want to convince my managers that we are going in the right direction.
>> 
>> Why not something from robin.io, Pure Storage, NetApp, or Dell EMC? Or, from 
>> open source, Longhorn or OpenEBS?
>> 
>> If you have ideas, please share them.
>> 
>> Thanks,
>> S.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs mount 'stalls'

2023-09-17 Thread William Edwards


> On 17 Sep 2023 at 21:00, Marc wrote:
> 
> 
> 
> I am still on nautilus and some clients are still on centos7 which mount the 
> cephfs. These mounts stall at some point. Currently I am mounting with 
> something like this in the fstab.

Define ‘stall’.

> 
> id=cephfsclientid,client_mountpoint=/cephfs/test  /mnt/test  fuse.ceph  
> noauto,_netdev,noatime,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,x-systemd.automount,x-systemd.idle-timeout=30
>   0 0
> 
> When the mount stalls I am fixing it with a umount -l, but it would be nicer 
> of course when it would not behave like this. Can this be fixed on el7 and 
> Nautilus, like with different mount options or so? 

What’s in dmesg?

> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: backing up CephFS

2023-04-30 Thread William Edwards

Angelo Höngens wrote on 2023-04-30 15:03:

How do you guys backup CephFS? (if at all?)

I'm building 2 ceph clusters, a primary one and a backup one, and I'm
looking into CephFS as the primary store for research files. CephFS
mirroring seems a very fast and efficient way to copy data to the
backup location, and it has the benefit of the files on the backup
location being fully in a ready-to-use state instead of some binary
proprietary archive.

But I am wondering how to do 'ransomware protection' in this setup. I
can't believe I'm the only one that wants to secure my data ;)

I'm reading up on snapshots and mirroring, and that's great to protect
from user error. I could schedule snapshots on the primary cluster,
and they would automatically get synced to the backup cluster.

But a user can still delete all snapshots on the source side, right?

And you need to create a ceph user on the backup cluster, and import
that on the primary cluster. That means that if a hacker has those
credentials, he could also delete the data on the backup cluster? Or
is there some 'append-only' mode for immutability?

Another option I'm looking into is restic. Restic looks like a cool
tool, but it does not support s3 object locks yet. See the discussion
here [1]. I should be able to get immutability working with the
restic-rest backend according to the developer. But I have my worries
that running restic to sync up an 800TB filesystem with millions of
files will be... worrisome ;) Anyone using restic in production?

Thanks again for your input!


Among others, we mount CephFS's root directory on a machine, and back up 
that mount using Borg. In our experience, Borg is faster than Restic. I 
actually open-sourced the library we wrote for Borg yesterday, see: 
https://github.com/CyberfusionNL/python3-cyberfusion-borg-support
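
For the curious, the nightly job boils down to something like this (repository 
path, compression and retention below are made-up examples):

# CephFS root mounted at /mnt/cephfs on the backup host
borg create --stats --compression lz4 /backup/cephfs-repo::cephfs-{now:%Y-%m-%d} /mnt/cephfs
borg prune --keep-daily 7 --keep-weekly 4 /backup/cephfs-repo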




Angelo.



[1] https://github.com/restic/restic/issues/3195


--
With kind regards,

William Edwards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Expression of Interest in Participating in GSoC 2023 with Your Team

2023-03-16 Thread William Edwards

> On 16 Mar 2023 at 05:30, Arush Sharma wrote:
> 
> Dear Ceph Team,
> 
> I hope this email finds you well. I am writing to express my keen interest
> in participating in the Google Summer of Code (GSoC) program 2023 with your
> team.
> 
> I am a 3rd year B.tech student in Computer Science Engineering, with a
> strong passion for [specific area of interest related to the team's
> project(s)]

Oops?

> . I have experience working in C++, and I believe that I can
> contribute significantly to your project by bringing my expertise,
> enthusiasm, and commitment.
> I have been following the GSoC program, and I understand the dedication and
> hard work required to complete a project successfully. Therefore, I am
> willing to commit my time and effort to meet the expectations and
> requirements of the program. I am open to learning new technologies and
> programming languages, and I believe that this opportunity will help me
> grow both personally and professionally.
> I have reviewed the list of your team's project ideas, and I am
> particularly interested in Disk Fragmentation Simulator. I would appreciate
> it if you could provide me with any additional information or resources
> that may be helpful to better understand the project requirements and goals.
> 
> Thank you for taking the time to read my email, and I look forward to
> hearing back from you soon.
> 
> Best regards,
> 
> Arush Sharma
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Any ceph constants available?

2023-02-04 Thread William Edwards

> On 4 Feb 2023 at 00:03, Thomas Cannon wrote:
> 
> 
> Hello Ceph community.
> 
> The company that recently hired me has a 3 node ceph cluster that has been 
> running and stable. I am the new lone administrator here and do not know ceph 
> and this is my first experience with it. 
> 
> The issue was that it is/was running out of space, which is why I made a 4th 
> node and attempted to add it into the cluster. Along the way, things have 
> begun to break. The manager daemon on boreal-01 failed to boreal-02 along the 
> way and I tried to get it to fail back to boreal-01, but was unable, and 
> realized while working on it yesterday I realized that the nodes in the 
> cluster are all running different versions of the software. I suspect that 
> might be a huge part of why things aren’t working as expected. 
> 
> Boreal-01 - the host - 17.2.5:
> 
> root@boreal-01:/home/kadmin# ceph -v
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> root@boreal-01:/home/kadmin# 
> 
> Boreal-01 - the admin docker instance running on the host 17.2.1:
> 
> root@boreal-01:/home/kadmin# cephadm shell
> Inferring fsid 951fa730-0228-11ed-b1ef-f925f77b75d3
> Inferring config 
> /var/lib/ceph/951fa730-0228-11ed-b1ef-f925f77b75d3/mon.boreal-01/config
> Using ceph image with id 'e5af760fa1c1' and tag 'v17' created on 2022-06-23 
> 19:49:45 + UTC
> quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
>  
> 
> root@boreal-01:/# ceph -v
> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
> root@boreal-01:/# 
> 
> Boreal-02 - 15.2.16:
> 
> root@boreal-02:/home/kadmin# ceph -v
> ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus 
> (stable)
> root@boreal-02:/home/kadmin# 
> 
> 
> Boreal-03 - 15.2.18:
> 
> root@boreal-03:/home/kadmin# ceph -v
> ceph version 15.2.18 (f2877ae32a72fc25acadef57597f44988b805c38) octopus 
> (stable)
> root@boreal-03:/home/kadmin# 
> 
> And the host I added - Boreal-04 - 17.2.5:
> 
> root@boreal-04:/home/kadmin# ceph -v
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
> root@boreal-04:/home/kadmin# 
> 
> The cluster isn’t rebalancing data, and drives are filling up unevenly, 
> despite auto balancing being on. I can run a df and see that it isn’t 
> working. However it says it is:
> 
> root@boreal-01:/# ceph balancer status 
> {
>"active": true,
>"last_optimize_duration": "0:00:00.011905",
>"last_optimize_started": "Fri Feb  3 18:39:02 2023",
>"mode": "upmap",
>"optimize_result": "Unable to find further optimization, or pool(s) pg_num 
> is decreasing, or distribution is already perfect",
>"plans": []
> }
> root@boreal-01:/# 
> 
> root@boreal-01:/# ceph -s
>  cluster:
>id: 951fa730-0228-11ed-b1ef-f925f77b75d3
>health: HEALTH_WARN
>There are daemons running an older version of ceph
>6 nearfull osd(s)
>3 pgs not deep-scrubbed in time
>3 pgs not scrubbed in time
>4 pool(s) nearfull
>1 daemons have recently crashed
> 
>  services:
>mon: 4 daemons, quorum boreal-01,boreal-02,boreal-03,boreal-04 (age 22h)
>mgr: boreal-02.lqxcvk(active, since 19h), standbys: boreal-03.vxhpad, 
> boreal-01.ejaggu
>mds: 2/2 daemons up, 2 standby
>osd: 89 osds: 89 up (since 5d), 89 in (since 45h)
> 
>  data:
>volumes: 2/2 healthy
>pools:   7 pools, 549 pgs
>objects: 227.23M objects, 193 TiB
>usage:   581 TiB used, 356 TiB / 937 TiB avail
>pgs: 533 active+clean
> 16  active+clean+scrubbing+deep
> 
>  io:
>client:   55 MiB/s rd, 330 KiB/s wr, 21 op/s rd, 45 op/s wr
> 
> root@boreal-01:/# 
> 
> Part of me suspects that I exacerbated the problems by trying to monkey with 
> boreal-04 for several days, trying to get the drives inside the machine 
> turned into OSDs so that they would be used. One thing I did was attempt to 
> upgrade the code on that machine, and I could have triggered a cluster-wide 
> upgrade that failed outside of 1 and 4. With 2 and 3 not even running the 
> same major release, if I did make that mistake, I can see why instead of an 
> upgrade, things would be worse. 
> 
> According to the documentation, I should be able to upgrade the entire 
> cluster by running a single command on the admin node, but when I go to run 
> commands I get errors that even google can’t solve:
> 
> root@boreal-01:/# ceph orch host ls
> Error ENOENT: Module not found
> root@boreal-01:/# 
> 
> Consequently, I have very little faith that running commands to upgrade 
> everything so that it’s all running the same code will work. I think each 
> host could be upgraded and fix things, but do not feel confident doing so and 
> risking our data.
> 
> Hopefully that gives a better idea of the problems I am facing. 
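
For what it's worth, "Error ENOENT: Module not found" from ceph orch usually 
means the orchestrator/cephadm mgr module isn't enabled (or loadable) on the 
currently active mgr; a first check might look like this, assuming cephadm was 
the intended backend:

ceph mgr module ls | grep -i cephadm   # is the module present and enabled?
ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph orch status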

[ceph-users] Re: Ceph filesystem

2022-12-20 Thread William Edwards

> On 20 Dec 2022 at 08:39, akshay sharma wrote:
> 
> Ceph fs Authorize cephfs client.user /.
> Sudo mount -t vm:6789,vm2:6789:/ /mnt/cephfs -o name=user, secret=***
> 
> Now, I'm able to copy files from the same machine. Basically, copying a file
> from home to /mnt/cephfs works, but copying from a remote machine to
> /mnt/cephfs using SFTP or SCP is not working.
> 
> Are we missing something here?

Yes, an explanation of ‘not working’.

> 
>> On Tue, Dec 20, 2022, 7:56 AM Xiubo Li  wrote:
>> 
>> 
>>> On 19/12/2022 21:19, akshay sharma wrote:
>>> Hi All,
>>> 
>>> I have three Virtual machines with a dedicated disk for ceph, ceph
>> cluster
>>> is up as shown below
>>> 
>>> user@ubuntu:~/ceph-deploy$ sudo ceph status
>>> 
>>>   cluster:
>>> 
>>> id: 06a014a8-d166-4add-a21d-24ed52dce5c0
>>> 
>>> health: HEALTH_WARN
>>> 
>>> mons are allowing insecure global_id reclaim
>>> 
>>> clock skew detected on mon.ubuntu36, mon.ubuntu68
>>> 
>>> 
>>> 
>>>   services:
>>> 
>>> mon: 3 daemons, quorum ubuntu35,ubuntu36,ubuntu68 (age 10m)
>>> 
>>> mgr: ubuntu68(active, since 4m)
>>> 
>>> mds: 1/1 daemons up
>>> 
>>> osd: 3 osds: 3 up (since 5m), 3 in (since 5m)
>>> 
>>> 
>>> 
>>>   data:
>>> 
>>> volumes: 1/1 healthy
>>> 
>>> pools:   3 pools, 41 pgs
>>> 
>>> objects: 22 objects, 2.3 KiB
>>> 
>>> usage:   16 MiB used, 150 GiB / 150 GiB avail
>>> 
>>> pgs: 41 active+clean
>>> 
>>> 
>>> 
>>>   progress:
>>> 
>>> 
>>> 
>>> Note: deployed ceph cluster using ceph-deploy utility ..version 2.1.0
>>> 
>>> 
>>> 
>>> Of those three virtual machines, two machines are also being used as clients,
>>> using the Ceph POSIX filesystem to store data in the cluster.
>>> 
>>> 
>>> 
>>> followed following commands.
>>> 
>>> 
>>> 
>>> I ran the command below on the main machine, where all commands are run and
>>> ceph-deploy is installed.
>>> 
>>> 
>>> sudo ceph auth get-or-create client.user mon 'allow r' mds 'allow r, allow
>>> rw path=/home/cephfs' osd 'allow rw pool=cephfs_data' -o
>>> /etc/ceph/ceph.client.user.keyring
>>> 
>> As Robert mentioned the 'path=' here should be the relative path from
>> the root of the cephfs instead of your local fs.
>> 
>> 
>>> Ran these two on the client.
>>> 
>>> sudo mkdir /mnt/mycephfs $ sudo mount -t ceph
>>> ubuntu1:6789,ubuntu2:6789,ubuntu3:6789:/ /mnt/mycephfs -o
>>> name=user,secret=AQBxnDFdS5atIxAAV0rL9klnSxwy6EFpR/EFbg==
>>> 
>> And you just created the mds auth caps for "/home/cephfs" path, but you
>> were mounting the '/' path with that caps.
>> 
>> Thanks
>> 
>> - Xiubo
>> 
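
To make that concrete, a sketch of caps whose path matches what is actually 
mounted (the directory, pool and secret file below are placeholders):

# path= is relative to the CephFS root, so the directory must exist inside CephFS:
ceph auth get-or-create client.user \
    mon 'allow r' \
    mds 'allow r, allow rw path=/cephfs' \
    osd 'allow rw pool=cephfs_data' \
    -o /etc/ceph/ceph.client.user.keyring
# ...and the client then mounts that same subtree, not '/':
mount -t ceph ubuntu1:6789,ubuntu2:6789,ubuntu3:6789:/cephfs /mnt/mycephfs \
    -o name=user,secretfile=/etc/ceph/user.secret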
>>> 
>>> After this, when we try to write to the mount path /mnt/mycephfs,
>>> it gives permission denied.
>>> 
>>> 
>>> How can we resolve this?
>>> 
>>> 
>>> I tried disabling cephx, but still ceph-deploy mon create-initial is failing
>>> as key mon not found?
>> 
>> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread William Edwards

> On 12 Dec 2022 at 22:47, Sascha Lucas wrote:
> 
> Hi Greg,
> 
>> On Mon, 12 Dec 2022, Gregory Farnum wrote:
>> 
>> On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas  wrote:
> 
>>> A follow-up of [2] also mentioned having random meta-data corruption: "We
>>> have 4 clusters (all running same version) and have experienced meta-data
>>> corruption on the majority of them at some time or the other"
>> 
>> 
>> Jewel (and upgrading from that version) was much less stable than Luminous
>> (when we declared the filesystem “awesome” and said the Ceph upstream
>> considered it production-ready), and things have generally gotten better
>> with every release since then.
> 
> I see. The cited corruption belongs to older releases...
> 
>>> [3] tells me, that metadata damage can happen either from data loss (which
>>> I'm convinced not to have), or from software bugs. The later would be
>>> worth fixing. Is there a way to find the root cause?
>> 
>> 
>> Yes, we’d very much like to understand this. What versions of the server
>> and kernel client are you using? What platform stack — I see it looks like
>> you are using CephFS through the volumes interface? The simplest
>> possibility I can think of here is that you are running with a bad kernel
>> and it used async ops poorly, maybe? But I don’t remember other spontaneous
>> corruptions of this type anytime recent.
> 
> Ceph "servers" like MONs, OSDs, MDSs etc. are all 17.2.5/cephadm/podman. The 
> filesystem kernel clients are co-located on the same hosts running the 
> "servers".

Isn’t that discouraged?

> For some other reason OS is still RHEL 8.5 (yes with community ceph). Kernel 
> is 4.18.0-348.el8.x86_64 from release media. Just one filesystem kernel 
> client is at 4.18.0-348.23.1.el8_5.x86_64 from EOL of 8.5.
> 
> Are there known issues with these kernel versions?
> 
>> Have you run a normal forward scrub (which is non-disruptive) to check if
>> there are other issues?
> 
> So far I haven't dared, but will do so tomorrow.
> 
> Thanks, Sascha.
> 
> [2] https://www.spinics.net/lists/ceph-users/msg53202.html
> [3] 
> https://docs.ceph.com/en/quincy/cephfs/disaster-recovery/#metadata-damage-and-repair
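
For reference, the forward scrub Greg suggested is non-disruptive and is just 
(the file system name is a placeholder):

ceph tell mds.<fs_name>:0 scrub start / recursive
ceph tell mds.<fs_name>:0 scrub status
ceph tell mds.<fs_name>:0 damage ls     # anything the scrub flags ends up here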

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2022-12-07 Thread William Edwards

> On 7 Dec 2022 at 11:59, Stefan Kooman wrote:
> 
> On 5/13/22 09:38, Xiubo Li wrote:
>>> On 5/12/22 12:06 AM, Stefan Kooman wrote:
>>> Hi List,
>>> 
>>> We have quite a few linux kernel clients for CephFS. One of our customers 
>>> has been running mainline kernels (CentOS 7 elrepo) for the past two years. 
>>> They started out with 3.x kernels (default CentOS 7), but upgraded to 
>>> mainline when those kernels would frequently generate MDS warnings like 
>>> "failing to respond to capability release". That worked fine until 5.14 
>>> kernel. 5.14 and up would use a lot of CPU and *way* more bandwidth on 
>>> CephFS than older kernels (order of magnitude). After the MDS was upgraded 
>>> from Nautilus to Octopus that behavior is gone (comparable CPU / bandwidth 
>>> usage as older kernels). However, the newer kernels are now the ones that 
>>> give "failing to respond to capability release", and worse, clients get 
>>> evicted (unresponsive as far as the MDS is concerned). Even the latest 5.17 
>>> kernels have that. No difference is observed between using messenger v1 or 
>>> v2. MDS version is 15.2.16.
>>> Surprisingly the latest stable kernels from CentOS 7 work flawlessly now. 
>>> Although that is good news, newer operating systems come with newer kernels.
>>> 
>>> Does anyone else observe the same behavior with newish kernel clients?
>> There have some known bugs, which have been fixed or under fixing recently, 
>> even in the mainline and, not sure whether are they related. Such as 
>> [1][2][3][4]. More detail please see ceph-client repo testing branch [5].
> 
> None of the issues you mentioned were related. We gained some more experience 
> with newer kernel clients, specifically on Ubuntu Focal / Jammy (5.15). 
> Performance issues seem to arise in certain workloads, specifically 
> load-balanced Apache shared web hosting clusters with CephFS. We have tested 
> linux kernel clients from 5.8 up to and including 6.0 with a production 
> workload and the short summary is:
> 
> < 5.13, everything works fine
> 5.13 and up is giving issues

I see this issue on 6.0.0 as well.

> 
> We tested the 5.13-rc1 as well, and already that kernel is giving issues. So 
> something has changed in 5.13 that results in performance regression in 
> certain workloads. And I wonder if it has something to do with the changes 
> related to fscache that have, and are, happening in the kernel. These web 
> servers might access the same directories / files concurrently.
> 
> Note: we have quite a few 5.15 kernel clients not doing any (load-balanced) 
> web based workload (container clusters on CephFS) that don't have any 
> performance issue running these kernels.
> 
> Issue: poor CephFS performance
> Symptom / result: excessive CephFS network usage (order of magnitude higher 
> than for older kernels not having this issue), within a minute there are a 
> bunch of slow web service processes, claiming loads of virtual memory, that 
> result in heavy swap usage and basically rendering the node unusable slow.
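
One way to compare kernels on the same workload is the per-client view on the 
MDS plus the client-side debugfs counters; a sketch (the MDS name is a 
placeholder, and debugfs must be mounted on the client):

ceph tell mds.<name> session ls      # per-client caps, liveness, recall state
cat /sys/kernel/debug/ceph/*/mdsc    # in-flight MDS requests on this client
cat /sys/kernel/debug/ceph/*/caps    # caps currently held by this client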
> 
> Other users that replied to this thread experienced similar symptoms. It is 
> reproducible on both CentOS (EPEL mainline kernels) as well as on Ubuntu (hwe 
> as well as default release kernel).
> 
> MDS version used: 15.2.16 (with a backported patch from 15.2.17) (single 
> active / standby-replay)
> 
> Does this ring a bell?
> 
> Gr. Stefan
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Does Ceph support presigned url (like s3) for uploading?

2022-10-28 Thread William Edwards

Szabo, Istvan (Agoda) wrote on 2022-10-28 09:55:

Hi,

I found this tracker issue from a long time back,
https://tracker.ceph.com/issues/23470, which I guess in some way shows that
it is possible, but I haven't really found any documentation in Ceph on how
to do it properly.
This is how it works with minio:
https://min.io/docs/minio/linux/integrations/presigned-put-upload-via-browser.html
I'm looking for this in ceph.


We use presigned POST URLs on Ceph without issues.
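
As an illustration, a presigned GET against RGW can be produced with the plain 
AWS CLI (endpoint, bucket and object below are placeholders); presigned PUT/POST 
for uploads is typically generated with an SDK call instead, e.g. boto3's 
generate_presigned_url / generate_presigned_post, since the CLI only signs 
downloads:

aws --endpoint-url https://rgw.example.com s3 presign s3://mybucket/some-object --expires-in 3600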



Thank you




--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs ha mount expectations

2022-10-26 Thread William Edwards

> On 26 Oct 2022 at 10:11, mj wrote:
> 
> Hi!
> 
> We have read https://docs.ceph.com/en/latest/man/8/mount.ceph, and would like 
> to see our expectations confirmed (or denied) here. :-)
> 
> Suppose we build a three-node cluster, three monitors, three MDSs, etc, in 
> order to export a cephfs to multiple client nodes.
> 
> On the (RHEL8) clients (web application servers) fstab, we will mount the 
> cephfs like:
> 
>> cehp1,ceph2,ceph3:/ /mnt/ha-pool/ ceph 
>> name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 2
> 
> We expect that the RHEL clients will then be able to use (read/write) a 
> shared /mnt/ha-pool directory simultaneously.
> 
> Our question: how HA can we expect this setup to be? Looking for some 
> practical experience here.
> 
> Specific: Can we reboot any of the three involved ceph servers without the 
> clients noticing anything? Or will there be certain timeouts involved, during 
> which /mnt/ha-pool/ will appear unresposive, and *after* a timeout the client 
> switches monitor node, and /mnt/ha-pool/ will respond again?

Monitor failovers don’t cause a noticeable disruption IIRC.

MDS failovers do. The MDS needs to replay. You can minimise the effect with 
mds_standby_replay.
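
For completeness, standby-replay is enabled per file system (the name is a 
placeholder) and needs a spare MDS daemon to take the role:

ceph fs set <fs_name> allow_standby_replay true
ceph fs status <fs_name>    # one daemon should now show up as standby-replay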

> 
> Of course we hope the answer is: in such a setup, cephfs clients should not 
> notice a reboot at all. :-)
> 
> All the best!
> 
> MJ

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2022-09-26 Thread William Edwards

Stefan Kooman wrote on 2022-05-11 18:06:

Hi List,

We have quite a few linux kernel clients for CephFS. One of our
customers has been running mainline kernels (CentOS 7 elrepo) for the
past two years. They started out with 3.x kernels (default CentOS 7),
but upgraded to mainline when those kernels would frequently generate
MDS warnings like "failing to respond to capability release". That
worked fine until 5.14 kernel. 5.14 and up would use a lot of CPU and
*way* more bandwidth on CephFS than older kernels (order of
magnitude). After the MDS was upgraded from Nautilus to Octopus that
behavior is gone (comparable CPU / bandwidth usage as older kernels).
However, the newer kernels are now the ones that give "failing to
respond to capability release", and worse, clients get evicted
(unresponsive as far as the MDS is concerned). Even the latest 5.17
kernels have that. No difference is observed between using messenger
v1 or v2. MDS version is 15.2.16.
Surprisingly the latest stable kernels from CentOS 7 work flawlessly
now. Although that is good news, newer operating systems come with
newer kernels.

Does anyone else observe the same behavior with newish kernel clients?


Yes.

I upgraded some CephFS clients from kernel 5.10.0 to 5.18.0. Ever since, 
I've experienced these issues on these clients:


- On the busiest client, ceph-msgr reads 3 - 6 Gb/s from disk. With 
5.10.0, this rarely exceeds 200 K/s.

- Clients more often don't respond to capability release.

The cluster is running Nautilus (14.2.22).



Gr. Stefan



--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Benefits of dockerized ceph?

2022-08-24 Thread William Edwards

> On 24 Aug 2022 at 22:08, Boris wrote:
> 
> Hi,
> I was just asked if we can switch to dockerized ceph, because it is easier to 
> update. 
> 
> Last time I tried to use ceph orch I failed really hard to get the rgw daemon 
> running as I would like to (IP/port/zonegroup and so on). 
> Also I never really felt comfortable running production workload in docker. 
> 
> Now I wanted to ask the ML: are there good reasons to run ceph in docker, 
> other than „update is easier and is decoupled from OS packages“?

There was a very long discussion about this on the mailing list not too long 
ago…

> Cheers
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 and RBD backup

2022-05-16 Thread William Edwards

> On 16 May 2022 at 13:41, Sanjeev Jha wrote:
> 
> Hi,
> 
> Could someone please let me know how to take S3 and RBD backups from the Ceph 
> side, and whether it is possible to take backups from the client/user side?
> 
> Which tool should I use for the backup?

It depends.

> 
> Best regards,
> Sanjeev Kumar Jha

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to keep a backup of a bucket

2022-03-31 Thread William Edwards

Szabo, Istvan (Agoda) wrote on 2022-03-31 08:44:

Hi,


Hi,



I have some critical data in a couple of buckets that I'd like to keep
safe somehow, but I don't see any kind of snapshot solution in Ceph
for the object gateway.


Some work seems to have been done in this area at one point: 
https://tracker.ceph.com/projects/ceph/wiki/Rgw_-_Snapshots



How do you (if you do) back up RGW buckets or objects? What is the
best way to keep some kind of cold copy of the data in case Ceph crashes?


Related: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SHY7OY24E4YI3WSQT4RP7QICYWKUM3PF/


Personally, I've a daily cron that loops through my buckets, and `rclone 
sync`s them: https://rclone.org/commands/rclone_sync/
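
The cron job boils down to something like this (the remote name and target path 
are examples; S3 bucket names cannot contain spaces, so the loop is safe):

#!/bin/sh
# nightly: sync every bucket on the "rgw" rclone remote to local disk
for bucket in $(rclone lsd rgw: | awk '{print $NF}'); do
    rclone sync "rgw:${bucket}" "/backup/buckets/${bucket}"
done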




Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---





--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr : ModuleNotFoundError: No module named 'requests'

2022-02-19 Thread William Edwards

> On 19 Feb 2022 at 13:49, Florent B. wrote:
> 
> Hi,
> 
> On a fresh Debian Bullseye installation running Ceph Octopus (15.2.15), new 
> mgr daemons can't start telemetry & dashboard modules because of missing 
> "requests" Python module.
> 
>   2022-02-19T12:31:50.884+ 7f30fdaaa040 -1 mgr[py] Traceback (most
>   recent call last):
>  File "/usr/share/ceph/mgr/dashboard/__init__.py", line 49, in
>   
>from .module import Module, StandbyModule  # noqa: F401
>  File "/usr/share/ceph/mgr/dashboard/module.py", line 38, in 
>from .grafana import push_local_dashboards
>  File "/usr/share/ceph/mgr/dashboard/grafana.py", line 8, in 
>import requests
>   ModuleNotFoundError: No module named 'requests'
> 
>   2022-02-19T12:31:50.884+ 7f30fdaaa040 -1 mgr[py] Class not found
>   in module 'dashboard'
>   2022-02-19T12:31:50.884+ 7f30fdaaa040 -1 mgr[py] Error loading
>   module 'dashboard': (2) No such file or directory
>   2022-02-19T12:31:54.524+ 7f30fdaaa040 -1 mgr[py] Module not
>   found: 'telemetry'
>   2022-02-19T12:31:54.524+ 7f30fdaaa040 -1 mgr[py] Traceback (most
>   recent call last):
>  File "/usr/share/ceph/mgr/telemetry/__init__.py", line 1, in 
>from .module import Module
>  File "/usr/share/ceph/mgr/telemetry/module.py", line 12, in 
>import requests
>   ModuleNotFoundError: No module named 'requests'
> 
> 
> But requests module is installed :
> 
> # echo "import requests; r = requests.get('https://ceph.com/en'); 
> print(r.status_code)"  | python
> 200
> 
> # echo "import requests; r = requests.get('https://ceph.com/en'); 
> print(r.status_code)"  | python3
> 200
> 
> # echo "import requests; r = requests.get('https://ceph.com/en'); 
> print(r.status_code)"  | python3.9
> 200

Maybe it’s in a venv?
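
A quick way to check where that import actually comes from, and whether it is 
the system package that the packaged mgr modules would see (a sketch):

python3 -c 'import requests, sys; print(requests.__file__); print(sys.executable)'
dpkg -S "$(python3 -c 'import requests; print(requests.__file__)')"   # Debian package, or a pip/venv install?
apt install python3-requests   # if it turns out to come from pip/venv only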

> 
> 
> What is my problem? I don't have this problem on old Buster servers running 
> 15.2.14...
> 
> Thanks
> 
> Florent

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Something akin to FSIMAGE in ceph

2022-02-14 Thread William Edwards

> On 15 Feb 2022 at 02:19, Robert Gallop wrote:
> 
> Had the question posed to me and couldn’t find an immediate answer.
> 
> Is there any way we can query the MDS or some other component in the ceph
> stack that would give essentially immediate access to all file names
> contained in ceph?
> 
> in HDFS we have the ability to pull the fsimage from the name nodes and
> perform query like operations to find a file, lets say we wanted to see all
> *log4j*.jar files that existed in HDFS, we could run this query and have
> 20k results in a couple seconds.
> 
> Right now with ceph, we are only using cephfs, kernel client mounts, so the
> only “normal” way to do this is to use find, or ls, or whatever normal
> tools could go looking for this jar across the various mount points.

Can you mount / and use mlocate?
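
I.e. something along these lines (paths are examples; it needs the mlocate 
tools on whichever box has the full tree mounted):

updatedb -U /mnt/cephfs -o /var/lib/mlocate/cephfs.db   # index file names under the CephFS root
locate -d /var/lib/mlocate/cephfs.db '*log4j*.jar'      # then queries are near-instant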

> 
> So thought I’d ask if anyone had some tricks that could be used to
> basically ask the MDS or component that would know:  Show me the path of
> every file ending in .jar that contains the letters/numbers log4j in its
> name…
> 
> Thanks!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using ceph.conf for CephFS kernel client with Nautilus cluster

2022-02-03 Thread William Edwards

Hi,

Jeff Layton wrote on 2022-02-03 15:36:

On Thu, 2022-02-03 at 15:26 +0100, William Edwards wrote:

Hi,

Jeff Layton wrote on 2022-02-03 14:45:
> On Thu, 2022-02-03 at 12:01 +0100, William Edwards wrote:
> > Hi,
> >
> > I need to set options from
> > https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ . I assume
> > these should be placed in the 'client' section in ceph.conf.
> >
> > The documentation for Nautilus says that ceph.conf should be placed
> > when
> > FUSE is used, see:
> > https://docs.ceph.com/en/nautilus/cephfs/mount-prerequisites/ .
> > However,
> > ceph.conf is not mentioned on
> > https://docs.ceph.com/en/nautilus/cephfs/fstab/#kernel-driver .
> > Therefore, the clients don't currently have an /etc/ceph/ceph.conf.
> >
> > In contrast, the documentation for Pacific says that there **must** be
> > a
> > ceph.conf in any case: https://docs.ceph.com/en/latest/cephfs/mount
> > -prerequisites/#general-pre-requisite-for-mounting-cephfs
> >
> > Newer Ceph versions contain the command 'ceph config
> > generate-minimal-conf'. I can deduce from the command's code what
> > ceph.conf on the client should look like:
> > https://github.com/ceph/ceph/blob/master/src/mon/ConfigMonitor.cc#L423
> >
> > L428: [global]
> > L429: fsid
> > L430 - L448: mon_host (not sure what 'is_legacy' and 'size() == 1'
> > entail; I guess I'll see)
> > L449: newline
> > L450 - L458: This is deduced from
> > 
https://github.com/ceph/ceph/blob/a67d1cf2a7a4031609a5d37baa01ffdfef80e993/src/mon/ConfigMap.cc#L98
> > . get_minimal_conf only adds options with the flags FLAG_NO_MON_UPDATE
> > or FLAG_MINIMAL_CONF, but I don't see any 'set_flags' statements in
> > master; so I'm not sure which options have those flags.
> >
> > So the resulting config would contain the global section with 'fsid'
> > and
> > 'mon_host', my custom options in 'client', and possibly 'keyring'.
> >
> > Questions:
> >
> > - Is it acceptable to use a ceph.conf on the kernel client when using
> > a
> > Nautilus cluster? It can be specified as the 'conf' mount option, but
> > as
> > the documentation barely mentions it for kernel clients, I'm not 100%
> > sure.
> > - Is my evaluation of the 'minimal' config correct?
> > - Which options have the FLAG_NO_MON_UPDATE and FLAG_MINIMAL_CONF
> > flags?
> > / Where are flags set?
> >
> > The cluster is running Ceph 14.2.22. The clients are running Ceph
> > 12.2.11. All clients use the kernel client.
> >
>
> The in-kernel client itself does not pay any attention to ceph.conf.
> The
> mount helper program (mount.ceph) will look at that ceph configs and
> keyrings to search for mon addresses and secrets for mounting if you
> don't provide them in the device string and mount options.

Are you saying that the options from
https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ won't take
effect when using the kernel client?



Yes. Those are ignored by the kernel client.


Thanks. I was hoping to set 'client cache size'. Is there any other way 
to set it when using the kernel client? I doubt switching to FUSE will 
help in solving the performance issue I'm trying to tackle (which is 
what I want to set 'client cache size' for :-) ).


--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using ceph.conf for CephFS kernel client with Nautilus cluster

2022-02-03 Thread William Edwards

Hi,

Jeff Layton wrote on 2022-02-03 14:45:

On Thu, 2022-02-03 at 12:01 +0100, William Edwards wrote:

Hi,

I need to set options from
https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ . I assume
these should be placed in the 'client' section in ceph.conf.

The documentation for Nautilus says that ceph.conf should be placed 
when

FUSE is used, see:
https://docs.ceph.com/en/nautilus/cephfs/mount-prerequisites/ . 
However,

ceph.conf is not mentioned on
https://docs.ceph.com/en/nautilus/cephfs/fstab/#kernel-driver .
Therefore, the clients don't currently have an /etc/ceph/ceph.conf.

In contrast, the documentation for Pacific says that there **must** be 
a

ceph.conf in any case: https://docs.ceph.com/en/latest/cephfs/mount
-prerequisites/#general-pre-requisite-for-mounting-cephfs

Newer Ceph versions contain the command 'ceph config
generate-minimal-conf'. I can deduce from the command's code what
ceph.conf on the client should look like:
https://github.com/ceph/ceph/blob/master/src/mon/ConfigMonitor.cc#L423

L428: [global]
L429: fsid
L430 - L448: mon_host (not sure what 'is_legacy' and 'size() == 1'
entail; I guess I'll see)
L449: newline
L450 - L458: This is deduced from
https://github.com/ceph/ceph/blob/a67d1cf2a7a4031609a5d37baa01ffdfef80e993/src/mon/ConfigMap.cc#L98
. get_minimal_conf only adds options with the flags FLAG_NO_MON_UPDATE
or FLAG_MINIMAL_CONF, but I don't see any 'set_flags' statements in
master; so I'm not sure which options have those flags.

So the resulting config would contain the global section with 'fsid' 
and

'mon_host', my custom options in 'client', and possibly 'keyring'.

Questions:

- Is it acceptable to use a ceph.conf on the kernel client when using 
a
Nautilus cluster? It can be specified as the 'conf' mount option, but 
as

the documentation barely mentions it for kernel clients, I'm not 100%
sure.
- Is my evaluation of the 'minimal' config correct?
- Which options have the FLAG_NO_MON_UPDATE and FLAG_MINIMAL_CONF 
flags?

/ Where are flags set?

The cluster is running Ceph 14.2.22. The clients are running Ceph
12.2.11. All clients use the kernel client.



The in-kernel client itself does not pay any attention to ceph.conf. 
The

mount helper program (mount.ceph) will look at that ceph configs and
keyrings to search for mon addresses and secrets for mounting if you
don't provide them in the device string and mount options.


Are you saying that the options from 
https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ won't take 
effect when using the kernel client?


--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using ceph.conf for CephFS kernel client with Nautilus cluster

2022-02-03 Thread William Edwards

Hi,

Konstantin Shalygin wrote on 2022-02-03 12:09:

Hi,


On 3 Feb 2022, at 14:01, William Edwards 
wrote:
- Is it acceptable to use a ceph.conf on the kernel client when
using a Nautilus cluster?


If you use kernel client you don't need ceph.conf


That's what the documentation implies, but...



Just setup fstab like this (this is example for msgr2 cluster only),
for example for CentOS Stream kernel:

172.16.16.2:3300,172.16.16.3:3300,172.16.16.4:3300:/folder /srv/folder
ceph
name=client_name,secret=,dirstat,ms_mode=prefer-crc,_netdev


... the options I want to set from 
https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ aren't 
listed as possible mount options at 
https://docs.ceph.com/en/nautilus/man/8/mount.ceph/#options . 'conf' is.




Gold luck,
k


--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using ceph.conf for CephFS kernel client with Nautilus cluster

2022-02-03 Thread William Edwards

Hi,

I need to set options from 
https://docs.ceph.com/en/nautilus/cephfs/client-config-ref/ . I assume 
these should be placed in the 'client' section in ceph.conf.


The documentation for Nautilus says that ceph.conf should be placed when 
FUSE is used, see: 
https://docs.ceph.com/en/nautilus/cephfs/mount-prerequisites/ . However, 
ceph.conf is not mentioned on 
https://docs.ceph.com/en/nautilus/cephfs/fstab/#kernel-driver . 
Therefore, the clients don't currently have an /etc/ceph/ceph.conf.


In contrast, the documentation for Pacific says that there **must** be a 
ceph.conf in any case: https://docs.ceph.com/en/latest/cephfs/mount 
-prerequisites/#general-pre-requisite-for-mounting-cephfs


Newer Ceph versions contain the command 'ceph config 
generate-minimal-conf'. I can deduce from the command's code what 
ceph.conf on the client should look like: 
https://github.com/ceph/ceph/blob/master/src/mon/ConfigMonitor.cc#L423


L428: [global]
L429: fsid
L430 - L448: mon_host (not sure what 'is_legacy' and 'size() == 1' 
entail; I guess I'll see)

L449: newline
L450 - L458: This is deduced from 
https://github.com/ceph/ceph/blob/a67d1cf2a7a4031609a5d37baa01ffdfef80e993/src/mon/ConfigMap.cc#L98 
. get_minimal_conf only adds options with the flags FLAG_NO_MON_UPDATE 
or FLAG_MINIMAL_CONF, but I don't see any 'set_flags' statements in 
master; so I'm not sure which options have those flags.


So the resulting config would contain the global section with 'fsid' and 
'mon_host', my custom options in 'client', and possibly 'keyring'.
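
In other words, something like this (the fsid and monitor addresses are 
obviously placeholders):

[global]
fsid = 01234567-89ab-cdef-0123-456789abcdef
mon_host = [v2:192.0.2.1:3300/0,v1:192.0.2.1:6789/0] [v2:192.0.2.2:3300/0,v1:192.0.2.2:6789/0]

[client]
# custom options from the client config reference would go here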


Questions:

- Is it acceptable to use a ceph.conf on the kernel client when using a 
Nautilus cluster? It can be specified as the 'conf' mount option, but as 
the documentation barely mentions it for kernel clients, I'm not 100% 
sure.

- Is my evaluation of the 'minimal' config correct?
- Which options have the FLAG_NO_MON_UPDATE and FLAG_MINIMAL_CONF flags? 
/ Where are flags set?


The cluster is running Ceph 14.2.22. The clients are running Ceph 
12.2.11. All clients use the kernel client.


--
With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Vitastor, a fast Ceph-like block storage for VMs

2020-09-23 Thread William Edwards
I love how it’s not possible to delete inodes yet. Data loss would be a thing 
of the past!

Jokes aside, interesting project.

Sent from mobile

> On 23 Sep 2020 at 00:45, vita...@yourcmc.ru wrote:
> 
> Hi!
> 
> After almost a year of development in my spare time I present my own 
> software-defined block storage system: Vitastor - https://vitastor.io
> 
> I designed it similar to Ceph in many ways, it also has Pools, PGs, OSDs, 
> different coding schemes, rebalancing and so on. However it's much simpler 
> and much faster. In a test cluster with SATA SSDs it achieved Q1T1 latency of 
> 0.14ms which is especially great compared to Ceph RBD's 1ms for writes and 
> 0.57ms for reads. In an "iops saturation" parallel load benchmark it reached 
> 895k read / 162k write iops, compared to Ceph's 480k / 100k on the same 
> hardware, but the most interesting part was CPU usage: Ceph OSDs were using 
> 40 CPU cores out of 64 on each node and Vitastor was only using 4.
> 
> Of course it's an early pre-release which means that, for example, it lacks 
> snapshot support and other useful features. However the base is finished - it 
> works and runs QEMU VMs. I like the design and I plan to develop it further.
> 
> There are more details in the README file which currently opens from the 
> domain https://vitastor.io
> 
> Sorry if it was a bit off-topic, I just thought it could be interesting for 
> you :)
> 
> -- 
> With best regards,
>  Vitaliy Filippov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Speeding up reconnection

2020-08-31 Thread William Edwards
I replaced the VMs taking care of routing between clients and MDSes with physical 
machines. The problems below are solved. It seems to have been related to issues 
with the virtual NIC. It seemed to work well with E1000 instead of VirtIO...


With kind regards,

William Edwards

- Original Message -
From: William Edwards (wedwa...@cyberfusion.nl)
Date: 08/11/20 11:38
To: ceph-users@ceph.io
Subject: Speeding up reconnection


Hello,

When connection is lost between kernel client, a few things happen:

1.
Caps become stale:

Aug 11 11:08:14 admin-cap kernel: [308405.227718] ceph: mds0 caps stale

2.
MDS evicts client for being unresponsive:

MDS log: 2020-08-11 11:12:08.923 7fd1f45ae700  0 log_channel(cluster) log [WRN] 
: evicting unresponsive client admin-cap.cf.ha.cyberfusion.cloud:DB0001-cap 
(144786749), after 300.978 seconds
Client log: Aug 11 11:12:11 admin-cap kernel: [308643.051006] ceph: mds0 hung

3.
Socket is closed:

Aug 11 11:22:57 admin-cap kernel: [309289.192705] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state OPEN)

I am not sure whether the kernel client or MDS closes the connection. I think 
the kernel client does so, because nothing is logged at the MDS side at 11:22:57

4.
Connection is reset by MDS:

MDS log: 2020-08-11 11:22:58.831 7fd1f9e49700  0 --1- 
[v2:[fdb7:b01e:7b8e:0:10:10:10:1]:6800/3619156441,v1:[fdb7:b01e:7b8e:0:10:10:10:1]:6849/3619156441]
 >> v1:[fc00:b6d:cfc:951::7]:0/133007863 conn(0x55bfaf1c2880 0x55c16cb47000 
:6849 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending 
RESETSESSION
Client log: Aug 11 11:22:58 admin-cap kernel: [309290.058222] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 connection reset

5.
Kernel client reconnects:

Aug 11 11:22:58 admin-cap kernel: [309290.058972] ceph: mds0 closed our session
Aug 11 11:22:58 admin-cap kernel: [309290.058973] ceph: mds0 reconnect start
Aug 11 11:22:58 admin-cap kernel: [309290.069979] ceph: mds0 reconnect denied
Aug 11 11:22:58 admin-cap kernel: [309290.069996] ceph: dropping file locks for 
6a23d9dd 1099625041446
Aug 11 11:22:58 admin-cap kernel: [309290.071135] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state NEGOTIATING)

Question:

As you can see, there's 10 minutes between losing the connection and the 
reconnection attempt (11:12:08 - 11:22:58). I could not find any settings 
related to the period after which reconnection is attempted. I would like to 
change this value from 10 minutes to something like 1 minute. I also tried 
searching the Ceph docs for the string '600' (10 minutes), but did not find 
anything useful.

Hope someone can help.

Environment details:

Client kernel: 4.19.0-10-amd64
Ceph version: ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) 
nautilus (stable)

With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD Node Maintenance Question

2020-08-15 Thread William Edwards
Do you mean I/O stopped on your VMs?

Sent from mobile

> On 15 Aug 2020 at 17:48, Matt Dunavant wrote:
> 
> Hi all, 
> 
> We just completed maintenance on an OSD node and we ran into an issue where 
> all data seemed to stop flowing while the node was down. We couldn't connect 
> to any of our VMs during that time. I was under the impression that by 
> setting the 'noout' flag, you would not get the rebalance of the data but you 
> would update the pointers to use the 2nd and 3rd copies of the data. Is that 
> not correct, and what is the proper workflow for taking down an OSD node for 
> maintenance? 
> 
> Thanks,
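
For reference, the usual pattern for planned work on a single OSD node is 
roughly this (a sketch; it assumes a package-based install and replicated pools 
with enough remaining copies to satisfy min_size, so client I/O can continue):

ceph osd set noout                 # don't rebalance while the node is down
systemctl stop ceph-osd.target     # on the node being serviced
# ... maintenance / reboot ...
ceph -s                            # wait for PGs to return to active+clean
ceph osd unset noout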

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Speeding up reconnection

2020-08-11 Thread William Edwards


> Hi,

> you can change the MDS setting to be less strict [1]:

> According to [1] the default is 300 seconds to be evicted. Maybe give  
> the less strict option a try?

Thanks for your reply. I already set mds_session_blacklist_on_timeout to false. 
This seems to have helped somewhat, but still, most of the time, the kernel 
client 'hangs'.
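
For reference, the knobs involved on the MDS side, roughly (the file system 
name and the value are placeholders, not recommendations):

ceph config set mds mds_session_blacklist_on_timeout false    # what I already set
ceph fs get <fs_name> | grep session         # session_timeout / session_autoclose in effect
ceph fs set <fs_name> session_autoclose 60   # eviction of unresponsive clients; default 300 s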

> Regards,
> Eugen



Quoting William Edwards:

> Hello,
>
> When connection is lost between kernel client, a few things happen:
>
> 1.
> Caps become stale:
>
> Aug 11 11:08:14 admin-cap kernel: [308405.227718] ceph: mds0 caps stale
>
> 2.
> MDS evicts client for being unresponsive:
>
> MDS log: 2020-08-11 11:12:08.923 7fd1f45ae700  0  
> log_channel(cluster) log [WRN] : evicting unresponsive client  
> admin-cap.cf.ha.cyberfusion.cloud:DB0001-cap (144786749), after  
> 300.978 seconds
> Client log: Aug 11 11:12:11 admin-cap kernel: [308643.051006] ceph: mds0 hung
>
> 3.
> Socket is closed:
>
> Aug 11 11:22:57 admin-cap kernel: [309289.192705] libceph: mds0  
> [fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state OPEN)
>
> I am not sure whether the kernel client or MDS closes the  
> connection. I think the kernel client does so, because nothing is  
> logged at the MDS side at 11:22:57
>
> 4.
> Connection is reset by MDS:
>
> MDS log: 2020-08-11 11:22:58.831 7fd1f9e49700  0 --1-  
> [v2:[fdb7:b01e:7b8e:0:10:10:10:1]:6800/3619156441,v1:[fdb7:b01e:7b8e:0:10:10:10:1]:6849/3619156441]
>  >> v1:[fc00:b6d:cfc:951::7]:0/133007863 conn(0x55bfaf1c2880 0x55c16cb47000 
> :6849 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
> l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending  
> RESETSESSION
> Client log: Aug 11 11:22:58 admin-cap kernel: [309290.058222]  
> libceph: mds0 [fdb7:b01e:7b8e:0:10:10:10:1]:6849 connection reset
>
> 5.
> Kernel client reconnects:
>
> Aug 11 11:22:58 admin-cap kernel: [309290.058972] ceph: mds0 closed  
> our session
> Aug 11 11:22:58 admin-cap kernel: [309290.058973] ceph: mds0 reconnect start
> Aug 11 11:22:58 admin-cap kernel: [309290.069979] ceph: mds0 reconnect denied
> Aug 11 11:22:58 admin-cap kernel: [309290.069996] ceph: dropping  
> file locks for 6a23d9dd 1099625041446
> Aug 11 11:22:58 admin-cap kernel: [309290.071135] libceph: mds0  
> [fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state  
> NEGOTIATING)
>
> Question:
>
> As you can see, there's 10 minutes between losing the connection and  
> the reconnection attempt (11:12:08 - 11:22:58). I could not find any  
> settings related to the period after which reconnection is  
> attempted. I would like to change this value from 10 minutes to  
> something like 1 minute. I also tried searching the Ceph docs for  
> the string '600' (10 minutes), but did not find anything useful.
>
> Hope someone can help.
>
> Environment details:
>
> Client kernel: 4.19.0-10-amd64
> Ceph version: ceph version 14.2.9  
> (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable)
>
>
> With kind regards,
>
> William Edwards
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Speeding up reconnection

2020-08-11 Thread William Edwards

Hello,

When connection is lost between kernel client, a few things happen:

1.
Caps become stale:

Aug 11 11:08:14 admin-cap kernel: [308405.227718] ceph: mds0 caps stale

2.
MDS evicts client for being unresponsive:

MDS log: 2020-08-11 11:12:08.923 7fd1f45ae700  0 log_channel(cluster) log [WRN] 
: evicting unresponsive client admin-cap.cf.ha.cyberfusion.cloud:DB0001-cap 
(144786749), after 300.978 seconds
Client log: Aug 11 11:12:11 admin-cap kernel: [308643.051006] ceph: mds0 hung

3.
Socket is closed:

Aug 11 11:22:57 admin-cap kernel: [309289.192705] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state OPEN)

I am not sure whether the kernel client or MDS closes the connection. I think 
the kernel client does so, because nothing is logged at the MDS side at 11:22:57

4.
Connection is reset by MDS:

MDS log: 2020-08-11 11:22:58.831 7fd1f9e49700  0 --1- 
[v2:[fdb7:b01e:7b8e:0:10:10:10:1]:6800/3619156441,v1:[fdb7:b01e:7b8e:0:10:10:10:1]:6849/3619156441]
 >> v1:[fc00:b6d:cfc:951::7]:0/133007863 conn(0x55bfaf1c2880 0x55c16cb47000 
:6849 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 
l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending 
RESETSESSION
Client log: Aug 11 11:22:58 admin-cap kernel: [309290.058222] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 connection reset

5.
Kernel client reconnects:

Aug 11 11:22:58 admin-cap kernel: [309290.058972] ceph: mds0 closed our session
Aug 11 11:22:58 admin-cap kernel: [309290.058973] ceph: mds0 reconnect start
Aug 11 11:22:58 admin-cap kernel: [309290.069979] ceph: mds0 reconnect denied
Aug 11 11:22:58 admin-cap kernel: [309290.069996] ceph: dropping file locks for 
6a23d9dd 1099625041446
Aug 11 11:22:58 admin-cap kernel: [309290.071135] libceph: mds0 
[fdb7:b01e:7b8e:0:10:10:10:1]:6849 socket closed (con state NEGOTIATING)

Question:

As you can see, there's 10 minutes between losing the connection and the 
reconnection attempt (11:12:08 - 11:22:58). I could not find any settings 
related to the period after which reconnection is attempted. I would like to 
change this value from 10 minutes to something like 1 minute. I also tried 
searching the Ceph docs for the string '600' (10 minutes), but did not find 
anything useful.

Hope someone can help.

Environment details:

Client kernel: 4.19.0-10-amd64
Ceph version: ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) 
nautilus (stable)


With kind regards,

William Edwards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io