[ceph-users] Re: Rebalancing after modifying CRUSH map

2020-06-09 Thread Brett Randall
Thanks Janne, this is fantastic to know. Brett --- original message --- On June 9, 2020, 4:21 PM GMT+10 icepic...@gmail.com wrote: On Tue, 9 Jun 2020 at 07:43, Brett Randall wrote: >> Hi all >> We are looking at implementing Ceph/CephFS for a project. Over time, we may >> wish to add

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-09 Thread Frank Schilder
Looks like the answer to your other thread is taking its time. Is it a possible option for you to - copy all readable files using this PG to some other storage, - remove or clean up the broken PG, and - copy the files back in? This might lead to a healthy cluster. I don't know a proper procedure

[ceph-users] Octopus OSDs dropping out of cluster: _check_auth_rotating possible clock skew, rotating keys expired way too early

2020-06-09 Thread Wido den Hollander
Hi, On a recently deployed Octopus (15.2.2) cluster (240 OSDs) we are seeing OSDs randomly drop out of the cluster. Usually it's 2 to 4 OSDs spread out over different nodes. Each node has 16 OSDs and not all the failing OSDs are on the same node. The OSDs are marked as down and all they keep
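Not part of the original message, but since the "_check_auth_rotating ... rotating keys expired way too early" error points at time synchronisation, a minimal first diagnostic pass might look like this (assuming chrony or systemd-timesyncd is in use; host names are placeholders):

    # the monitors' view of time synchronisation
    ceph time-sync-status

    # which OSDs are currently marked down
    ceph osd tree down

    # on each affected OSD host, verify the clock is actually being disciplined
    chronyc tracking        # or: timedatectl status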

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-09 Thread Francois Legrand
Hi, Actually I left the mds managing the damaged filesystem as it is, because the files can be read (despite the warnings and errors). I thus restarted the rsyncs to transfer everything to the new filesystem (onto different PGs, because it's a different cephfs with different pools) but

[ceph-users] Re: RadosGW latency on chunked uploads

2020-06-09 Thread Robin H. Johnson
On Tue, Jun 09, 2020 at 09:07:49PM +0300, Tadas wrote: > Hello, > we see latencies of roughly 75-100 ms when doing 600 chunked PUTs, > versus 40-45 ms when doing 1k non-chunked PUTs. > (The achievable PUT rate also drops with chunked PUTs.) > > We tried civetweb and beast. Nothing changes. How close is your test running

[ceph-users] Re: RadosGW latency on chunked uploads

2020-06-09 Thread Tadas
Hello, we see latencies of roughly 75-100 ms when doing 600 chunked PUTs, versus 40-45 ms when doing 1k non-chunked PUTs. (The achievable PUT rate also drops with chunked PUTs.) We tried civetweb and beast. Nothing changes. -Original Message- From: Robin H. Johnson Sent: Tuesday, June 9, 2020 8:51 PM To:

[ceph-users] Re: RadosGW latency on chunked uploads

2020-06-09 Thread Robin H. Johnson
On Tue, Jun 09, 2020 at 12:59:10PM +0300, Tadas wrote: > Hello, > > I have strange issues with radosgw: > When trying to PUT an object with “transfer-encoding: chunked”, I see high > request latencies. > When trying to PUT the same object as non-chunked, latency is much lower, > and also

[ceph-users] Maximum size of data in crush_choose_firstn Ceph CRUSH source code

2020-06-09 Thread Bobby
Hi all, I have a question regarding a function called *crush_choose_firstn* in the Ceph source code, namely in *mapper.c*. This function has the following pointer parameters: - const struct crush_map *map, - struct crush_work *work, - const struct crush_bucket *bucket, - int *out, - const __u32 *weight, -
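Not from the original post, but for experimenting with the code path that crush_choose_firstn implements without instrumenting mapper.c, crushtool can replay the mapping logic against a compiled CRUSH map from user space (file names here are placeholders):

    # extract the cluster's CRUSH map and exercise the mapping code
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --num-rep 3 --show-mappings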

[ceph-users] IO500 Revised Call For Submissions Mid-2020 List

2020-06-09 Thread committee
New Deadline: 13 July 2020 AoE NOTE: Given the short timeframe from the original announcement and complexities with some of the changes for this list, the deadline has been pushed out to give the community more time to participate. The BoF announcing the winners will be online 23 July 2020.

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-09 Thread Zhenshi Zhou
It reported an error when I first promoted the non-primary image, but the command executed successfully after a while, without '--force'. The error: "rbd: error promoting image to primary 2020-06-09 19:56:30.662 7f27e17fa700 -1 librbd::mirror::PromoteRequest: 0x558fa971fd20 handle_get_info: image is
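For reference, a sketch of the usual demote-then-promote sequence (pool and image names are placeholders); --force is only needed when the old primary cannot be demoted first:

    # on the old primary cluster (if still reachable):
    rbd mirror image demote mypool/myimage

    # on the secondary cluster:
    rbd mirror image promote mypool/myimage
    # or, if the old primary is gone:
    rbd mirror image promote --force mypool/myimage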

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-09 Thread Jason Dillaman
On Tue, Jun 9, 2020 at 7:26 AM Zhenshi Zhou wrote: > > I did promote the non-primary image; otherwise I couldn't have disabled the image mirror. OK, that means that 100% of the data was properly transferred, since it needs to replay previous events before it can get to the demotion event, replay that, so that

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-09 Thread Zhenshi Zhou
I did promote the non-primary image; otherwise I couldn't have disabled the image mirror. On Tue, Jun 9, 2020 at 7:19 PM, Jason Dillaman wrote: > On Mon, Jun 8, 2020 at 11:42 PM Zhenshi Zhou wrote: > > > > I have just done a test on rbd-mirror, following these steps: > > 1. deploy two new clusters, clusterA and clusterB > > 2.

[ceph-users] Re: rbd-mirror with snapshot, not doing any actual data sync

2020-06-09 Thread Jason Dillaman
On Mon, Jun 8, 2020 at 6:18 PM Hans van den Bogert wrote: > > Rather unsatisfactory to not know where it really went wrong, but after completely > removing all traces of peer settings and auth keys, I redid the peer > bootstrap and this did result in a working sync. > > My initial mirror config
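For anyone following along, a minimal sketch of the Octopus peer-bootstrap flow that was redone here (site and pool names are placeholders):

    # on the primary site:
    rbd mirror pool peer bootstrap create --site-name site-a mypool > token

    # on the secondary site, after copying the token file over:
    rbd mirror pool peer bootstrap import --site-name site-b mypool token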

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-09 Thread Jason Dillaman
On Mon, Jun 8, 2020 at 11:42 PM Zhenshi Zhou wrote: > > I have just done a test on rbd-mirror, following these steps: > 1. deploy two new clusters, clusterA and clusterB > 2. configure one-way replication from clusterA to clusterB with rbd-mirror > 3. write data to rbd_blk on clusterA once every 5
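Not part of the quoted mail, but step 2 of such a test would typically look something like this for journal-based mirroring. Only the image name rbd_blk comes from the thread; the pool name 'rbd' and the service instance name are assumptions:

    # on clusterA: enable per-image mirroring on the pool, journaling on the image
    rbd mirror pool enable rbd image
    rbd feature enable rbd/rbd_blk journaling
    rbd mirror image enable rbd/rbd_blk

    # on clusterB: run the rbd-mirror daemon to pull changes (one-way)
    systemctl start ceph-rbd-mirror@admin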

[ceph-users] Re: Rebalancing after modifying CRUSH map

2020-06-09 Thread Frank Schilder
This is done automatically. Every time the crush map changes, objects get moved around. Therefore, a typical procedure is (see the command sketch below):
- make sure ceph is HEALTH_OK
- ceph osd set noout
- ceph osd set norebalance
- edit crush map
- wait for peering to finish, all PGs must be active+clean
- lots of PGs will
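Spelled out as commands, this might look like the following; the decompile/edit/recompile cycle is one common way to do the "edit crush map" step:

    ceph health                          # must be HEALTH_OK before starting
    ceph osd set noout
    ceph osd set norebalance

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ... edit crushmap.txt ...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # wait for peering to finish; PGs must be active+clean (many remapped)
    ceph osd unset norebalance           # let the backfill run
    ceph osd unset noout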

[ceph-users] RadosGW latency on chunked uploads

2020-06-09 Thread Tadas
Hello, I have strange issues with radosgw: When trying to PUT an object with “transfer-encoding: chunked”, I see high request latencies. When trying to PUT the same object as non-chunked, latency is much lower and request/s performance is better. Has anyone had the same issue?
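To make the two cases concrete, they can be reproduced roughly like this with curl (endpoint and object names are made up, and a real RGW would additionally need S3 request signing, e.g. via s3cmd or awscli):

    # non-chunked: Content-Length is sent up front
    curl -T object.bin http://rgw.example.com/bucket/object

    # chunked: the header forces Transfer-Encoding: chunked, size unknown to the server
    curl -T object.bin -H "Transfer-Encoding: chunked" http://rgw.example.com/bucket/object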

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-09 Thread Zhenshi Zhou
What's more, I configured "rbd_journal_max_payload_bytes = 8388608" on clusterA and "rbd_mirror_journal_max_fetch_bytes = 33554432" on clusterB as well, restarting the monitors of clusterA and the rbd-mirror daemon on clusterB. Nothing changed; the target rbd still has 11 minutes less data than that of
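For reference, a sketch of setting those two options via the config database rather than ceph.conf; rbd_journal_max_payload_bytes applies to the librbd clients writing the journal, rbd_mirror_journal_max_fetch_bytes to the rbd-mirror daemon:

    # on clusterA (librbd clients writing the journal):
    ceph config set client rbd_journal_max_payload_bytes 8388608

    # on clusterB (the rbd-mirror daemon):
    ceph config set client rbd_mirror_journal_max_fetch_bytes 33554432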

[ceph-users] Re: poor cephFS performance on Nautilus 14.2.9 deployed by ceph_ansible

2020-06-09 Thread Marc Roos
Hi Derrick, I am not sure what this 200-300MB/s on hdd is, but it is probably not really relevant. I test native disk performance before using disks with ceph, with this fio script. It is a bit lengthy because I want to have data for possible future use cases.
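The script itself isn't included in this digest preview, but a minimal fio invocation in the same spirit might be (destructive to the target device, so /dev/sdX is deliberately a placeholder):

    fio --name=rawtest --filename=/dev/sdX --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based \
        --group_reporting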

[ceph-users] Re: Rebalancing after modifying CRUSH map

2020-06-09 Thread Janne Johansson
On Tue, 9 Jun 2020 at 07:43, Brett Randall wrote: > Hi all > We are looking at implementing Ceph/CephFS for a project. Over time, we > may wish to add additional replicas to our cluster. If we modify a CRUSH > map, is there a way of then requesting Ceph to re-evaluate the placement of > objects
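Worth noting (not part of the quoted mail): raising the replica count itself is a one-liner per pool, and the resulting data movement is handled the same way as after a CRUSH map change; the pool name here is a placeholder:

    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2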