[ceph-users] Re: v15.2.0 Octopus released

2020-04-02 Thread kefu chai
On Sat, Mar 28, 2020 at 1:29 AM Mazzystr wrote: > > What about the missing dependencies for octopus on el8? (looking at you > ceph-mgr!) hi Mazzystr, regarding the dependencies of ceph-mgr, you could probably enable the copr repo[0] to install them before they are included in EPEL8.
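
For reference, a minimal sketch of enabling a Copr repository on el8 and installing from it; the Copr project name is a placeholder for the repo referenced as [0]:

    dnf install -y dnf-plugins-core
    # substitute the actual Copr project referenced as [0] above
    dnf copr enable <owner>/<project>
    # ceph-mgr and its python3 dependencies should now resolve
    dnf install -y ceph-mgr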

[ceph-users] Re: RGW Multi-site Issue

2020-04-02 Thread Zhenshi Zhou
I created two new clusters and successfully deployed a multisite setup. However, I get the error "failed to commit period: (2202) Unknown error 2202" when I commit the period on the secondary zone while the master zone has data in it. I'm not sure whether multisite can only be deployed between two brand-new zones? Zhenshi
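
For reference, a minimal sketch of the usual secondary-zone sequence; endpoints, zonegroup/zone names and the system user's keys are placeholders:

    radosgw-admin realm pull  --url=http://<master-rgw>:8080 --access-key=<key> --secret=<secret>
    radosgw-admin period pull --url=http://<master-rgw>:8080 --access-key=<key> --secret=<secret>
    radosgw-admin zone create --rgw-zonegroup=<zonegroup> --rgw-zone=<secondary-zone> \
        --endpoints=http://<secondary-rgw>:8080 --access-key=<key> --secret=<secret>
    radosgw-admin period update --commit

Committing the period on a non-master zone forwards it to the master, so the secondary needs reachable master endpoints and valid system-user credentials at that point.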

[ceph-users] Upgrading from Mimic to Nautilus

2020-04-02 Thread Paul Choi
Hi, How is the experience of upgrading from Mimic to Nautilus? Anything to watch out for? And is it a smooth transition for the ceph-fuse clients on Mimic when interacting with a Nautilus MDS? I'm reading the docs on https://docs.ceph.com/docs/nautilus/install/upgrading-ceph/ and it looks pretty

[ceph-users] Re: Poor Windows performance on ceph RBD.

2020-04-02 Thread Olivier AUDRY
hello, I have not set up Windows VMs on KVM in years, but back then, for good I/O performance on a Windows VM under KVM, the virtio driver had to be installed. https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso oau On Thursday, April 2, 2020 at 15:28 +, Frank
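
For reference, a minimal sketch of attaching an RBD-backed disk on the virtio bus together with the virtio-win ISO so the Windows installer can load the driver; pool, image and ISO paths are placeholders:

    # Windows guest with a virtio disk backed by RBD, plus driver and install ISOs
    qemu-system-x86_64 -enable-kvm -m 8192 -smp 4 \
        -drive file=rbd:rbd/win2019,format=raw,if=virtio,cache=writeback \
        -drive file=/var/lib/libvirt/images/virtio-win.iso,media=cdrom \
        -drive file=/var/lib/libvirt/images/win-install.iso,media=cdrom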

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Jack
I do compress: root@backup2:~# ceph daemon osd.0 config show | grep bluestore_compression "bluestore_compression_algorithm": "snappy", "bluestore_compression_max_blob_size": "0", "bluestore_compression_max_blob_size_hdd": "524288", "bluestore_compression_max_blob_size_ssd":

[ceph-users] Re: v15.2.0 Octopus released

2020-04-02 Thread Dietmar Rieder
On 2020-04-02 16:40, konstantin.ilya...@mediascope.net wrote: > I have done it. > I am not sure, if i didn’t miss something, but i upgraded test cluster from > CentOs7.7.1908+Ceph14.2.8 to Debian10.3+Ceph15.2.0. > > Preparations: > - 6 nodes with OS CentOs7.7.1908, Ceph14.2.8: > -

[ceph-users] Poor Windows performance on ceph RBD.

2020-04-02 Thread Frank Schilder
Dear all, maybe someone can give me a pointer here. We are running OpenNebula with ceph RBD as a back-end store. We have a pool of spinning disks to create large low-demand data disks, mainly for backups and other cold storage. Everything is fine when using linux VMs. However, Windows VMs

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Igor Fedotov
So this OSD has 32M of shared blobs and fsck loads them all into memory while processing, hence the RAM consumption. I'm afraid there is no simple way to fix that; I'll create a ticket though. And a side question: 1) Do you use erasure coding and/or compression for the rbd pool? These stats

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Jack
Here it is On 4/2/20 3:48 PM, Igor Fedotov wrote: > And may I have the output for: > > ceph daemon osd.N calc_objectstore_db_histogram > > This will collect some stats on record types in OSD's DB. > > > On 4/2/2020 4:13 PM, Jack wrote: >> (fsck / quick-fix, same story) >> >> On 4/2/20 3:12

[ceph-users] Re: v15.2.0 Octopus released

2020-04-02 Thread konstantin . ilyasov
I have done it. I am not sure if I missed something, but I upgraded a test cluster from CentOS 7.7.1908 + Ceph 14.2.8 to Debian 10.3 + Ceph 15.2.0. Preparations: - 6 nodes with OS CentOS 7.7.1908, Ceph 14.2.8: - cephtest01,cephtest02,cephtest03: mon+mgr+mds+rgw - cephtest04,cephtest05,cephtest06:

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Jack
Correct $ On 4/2/20 3:17 PM, Igor Fedotov wrote: > And high memory usage is present for quick-fix after conversion as well, > isn't it? > > The same tens of GBs? > > > On 4/2/2020 4:13 PM, Jack wrote: >> (fsck / quick-fix, same story) >> >> On 4/2/20 3:12 PM, Jack wrote: >>> Hi, >>> >>> A

[ceph-users] Re: librados : handle_auth_bad_method server allowed_methods [2] but i only support [2,1]

2020-04-02 Thread Yoann Moulin
> I have a Nautilus (14.2.8) cluster and I'd like to give access to a pool with > librados to a user. > > Here what I have > >> # ceph osd pool ls detail | grep user1 >> pool 5 'user1' replicated size 3 min_size 2 crush_rule 0 object_hash >> rjenkins pg_num 256 pgp_num 256 autoscale_mode

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Igor Fedotov
And may I have the output for: ceph daemon osd.N calc_objectstore_db_histogram This will collect some stats on record types in OSD's DB. On 4/2/2020 4:13 PM, Jack wrote: (fsck / quick-fix, same story) On 4/2/20 3:12 PM, Jack wrote: Hi, A simple fsck eats the same amount of memory Cluster

[ceph-users] Re: *****SPAM***** Re: Logging remove duplicate time

2020-04-02 Thread Marc Roos
>> How to get rid of this logging?? >> >> Mar 31 13:40:03 c01 ceph-mgr: 2020-03-31 13:40:03.521 7f554edc8700 0 >> log_channel(cluster) log [DBG] : pgmap v672067: 384 pgs: 384 >> active+clean; > > >Why? Why not? > >> >> I already have the time logged, I do not need it a second

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-02 Thread aoanla
So, the recovery stalled a few more OSDs in, but looking at the disks with OSDs marked down, I noticed that, despite systemctl reporting that the OSD processes were all *up*, several of them had not written to their logs since they rotated. Suspecting that these OSDs were stalled, I've started
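
For reference, a rough sketch of spot-checking whether an OSD that systemd reports as up is actually responsive, and restarting it if not (osd.12 is just an example id):

    systemctl status ceph-osd@12
    ceph daemon osd.12 status      # hangs or errors out if the daemon is wedged
    systemctl restart ceph-osd@12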

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Igor Fedotov
And high memory usage is present for quick-fix after conversion as well, isn't it? The same tens of GBs? On 4/2/2020 4:13 PM, Jack wrote: (fsck / quick-fix, same story) On 4/2/20 3:12 PM, Jack wrote: Hi, A simple fsck eats the same amount of memory Cluster usage: rbd with a bit of rgw

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Jack
(fsck / quick-fix, same story) On 4/2/20 3:12 PM, Jack wrote: > Hi, > > A simple fsck eats the same amount of memory > > Cluster usage: rbd with a bit of rgw > > Here is the ceph df detail > All OSDs are single rusty devices > > On 4/2/20 2:19 PM, Igor Fedotov wrote: >> Hi Jack, >> >> could

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Jack
Hi, A simple fsck eats the same amount of memory Cluster usage: rbd with a bit of rgw Here is the ceph df detail All OSDs are single rusty devices On 4/2/20 2:19 PM, Igor Fedotov wrote: > Hi Jack, > > could you please try the following - stop one of already converted OSDs > and do a

[ceph-users] Re: [Octopus] Beware the on-disk conversion

2020-04-02 Thread Igor Fedotov
Hi Jack, could you please try the following - stop one of the already converted OSDs and do a quick-fix/fsck/repair against it using ceph-bluestore-tool: ceph-bluestore-tool --path <osd path> --command quick-fix|fsck|repair Does it cause similar memory usage? You can stop experimenting if quick-fix
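
For reference, a concrete sketch of that test against osd.0, assuming the default data path /var/lib/ceph/osd/ceph-<id>:

    systemctl stop ceph-osd@0
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --command fsck
    # or: --command quick-fix / --command repair
    systemctl start ceph-osd@0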

[ceph-users] Re: LARGE_OMAP_OBJECTS 1 large omap objects

2020-04-02 Thread Dietmar Rieder
On 2020-04-02 12:24, Paul Emmerich wrote: > Safe to ignore/increase the warning threshold. You are seeing this > because the warning level was reduced to 200k from 2M recently. > > The file will be sharded in a newer version which will clean this up > Thanks Paul, would that "newer version" be

[ceph-users] Re: Netplan bonding configuration

2020-04-02 Thread James, GleSYS
Hi, It’s possible this isn’t supported; I haven’t been able to find a definitive answer yet on whether this configuration is supported or not. Our switches don’t support MLAG, so I’m unable to create an LACP bond from the server to two switches. Regards, James. > On 1 Apr 2020, at 12:41,

[ceph-users] librados : handle_auth_bad_method server allowed_methods [2] but i only support [2,1]

2020-04-02 Thread Yoann Moulin
Hello, I have a Nautilus (14.2.8) cluster and I'd like to give a user access to a pool via librados. Here is what I have > # ceph osd pool ls detail | grep user1 > pool 5 'user1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins > pg_num 256 pgp_num 256 autoscale_mode warn
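
For reference, a minimal sketch of creating a client keyring restricted to that pool and testing it with the rados CLI; the cap strings are an assumption, adjust as needed:

    ceph auth get-or-create client.user1 mon 'allow r' osd 'allow rw pool=user1' \
        -o /etc/ceph/ceph.client.user1.keyring
    rados --id user1 --keyring /etc/ceph/ceph.client.user1.keyring -p user1 ls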

[ceph-users] Re: LARGE_OMAP_OBJECTS 1 large omap objects

2020-04-02 Thread Paul Emmerich
Safe to ignore/increase the warning threshold. You are seeing this because the warning level was reduced to 200k from 2M recently. The file will be sharded in a newer version which will clean this up Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at
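
For reference, a minimal sketch of raising the warning threshold; 300000 is just an example value:

    ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 300000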

[ceph-users] Re: Multiple OSDs down, and won't come up (possibly related to other Nautilus issues)

2020-04-02 Thread aoanla
Update a day later: the cluster is *very slowly* recovering, it looks like: we're now at 113 OSDs down (improved from 140 OSDs down when everything broke) - but it took a day before anything changed here, and it looks like we're recovering at a rate of about 1-2 OSDs per hour... So I'm not

[ceph-users] LARGE_OMAP_OBJECTS 1 large omap objects

2020-04-02 Thread Dietmar Rieder
Hi, I'm trying to understand the "LARGE_OMAP_OBJECTS 1 large omap objects" warning for our cephfs metadata pool. It seems that pg 5.26 has a large omap object with > 200k keys [WRN] : Large omap object found. Object: 5:654134d2:::mds0_openfiles.0:head PG: 5.4b2c82a6 (5.26) Key count: 286083
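
For reference, a rough sketch of checking the key count of that object directly; the metadata pool name is a placeholder:

    rados -p <cephfs-metadata-pool> listomapkeys mds0_openfiles.0 | wc -l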

[ceph-users] Re: luminous: osd continue down because of the hearbeattimeout

2020-04-02 Thread Martin Verges
Hello, check your network, maybe link flapping or something else. Your port shows a high dropped count as well. Use some tool like smokeping to detect loss within your network, or if you are using croit, use the network loss detection feature within the statistics view. -- Martin Verges Managing
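
For reference, a quick sketch of reading per-interface drop/error counters on the affected hosts; the interface name is a placeholder:

    ip -s link show <iface>
    ethtool -S <iface> | grep -Ei 'drop|err'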

[ceph-users] Re: Replace OSD node without remapping PGs

2020-04-02 Thread Eugen Block
Yeah, I should have mentioned the swap-bucket option. We couldn't use that because we actually didn't swap anything but moved the old hosts to a different root, and we keep them for erasure coding pools. Quoting Anthony D'Atri: The strategy that Nghia described is inefficient for moving
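
For reference, a minimal sketch of both variants, with placeholder bucket names:

    # swap a new host into the old host's CRUSH position, avoiding a wider remap
    ceph osd crush swap-bucket <old-host> <new-host>
    # or, as done here: move the old host under a different CRUSH root
    ceph osd crush move <old-host> root=<other-root>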

[ceph-users] RGW Multi-site Issue

2020-04-02 Thread Zhenshi Zhou
Hi, I am new to rgw and am trying to deploy a multisite cluster in order to sync data from one cluster to another. My source zone is the default zone in the default zonegroup, structured as below: realm: big-realm |
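
For reference, a minimal sketch of the usual master-side setup for such a realm; zonegroup/zone names and endpoints are placeholders ("big-realm" is taken from the post):

    radosgw-admin realm create --rgw-realm=big-realm --default
    radosgw-admin zonegroup create --rgw-zonegroup=<zonegroup> \
        --endpoints=http://<master-rgw>:8080 --master --default
    radosgw-admin zone create --rgw-zonegroup=<zonegroup> --rgw-zone=<master-zone> \
        --endpoints=http://<master-rgw>:8080 --master --default
    radosgw-admin period update --commit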