Re: [ceph-users] cluster is not stable

2019-03-12 Thread huang jun
Can you get the value of the osd_beacon_report_interval option? The default is 300; you can set it to 60. You could also turn on debug_ms=1 and debug_mon=10 to get more information. Zhenshi Zhou wrote on Wed, Mar 13, 2019 at 1:20 PM: > > Hi, > > The servers are connected to the same switch. > I can ping from any one of the servers to
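A minimal sketch of how those settings can be checked and changed at runtime (osd.0 and the wildcard targets are placeholders; injected values last only until restart unless also set in ceph.conf):
$ ceph daemon osd.0 config get osd_beacon_report_interval
$ ceph tell osd.* injectargs '--osd_beacon_report_interval 60'
$ ceph tell mon.* injectargs '--debug_ms 1 --debug_mon 10'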

Re: [ceph-users] S3 data on specific storage systems

2019-03-12 Thread Konstantin Shalygin
I have a cluster with SSD and HDD storage. I wonder how to configure S3 buckets to use the HDD storage backend only. Do I need to create pools on this particular storage and define radosgw placement with those, or is there a better or easier way to achieve this? Just assign your "crush hdd rule" to you
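A rough sketch of what that looks like, assuming a Luminous+ cluster with device classes and the default zone's bucket data pool name:
$ ceph osd crush rule create-replicated rgw-hdd default host hdd
$ ceph osd pool set default.rgw.buckets.data crush_rule rgw-hdd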

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Zhenshi Zhou
Hi, The servers are connected to the same switch. I can ping from any one of the servers to the others without packet loss, and the average round-trip time is under 0.1 ms. Thanks. Ashley Merrick wrote on Wed, Mar 13, 2019 at 12:06 PM: > Can you ping all your OSD servers from all your mons, and ping your m

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Ashley Merrick
Can you ping all your OSD servers from all your mons, and ping your mons from all your OSD servers? I've seen this where a route wasn't working in one direction, so it made OSDs flap when they used that mon to check availability. On Wed, 13 Mar 2019 at 11:50 AM, Zhenshi Zhou wrote: > After checking
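A quick way to test reachability in both directions is a small loop run once from each mon and once from each OSD node (host names are placeholders):
$ for h in ceph-mon1 ceph-mon2 ceph-mon3 ceph-osd1 ceph-osd2; do
    ping -c 3 -q "$h" > /dev/null && echo "$h ok" || echo "$h FAILED"
  done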

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Zhenshi Zhou
After checking the network and syslog/dmesg, I don't think it's a network or hardware issue. Now some OSDs are being marked down every 15 minutes. Here is ceph.log: 2019-03-13 11:06:26.290701 mon.ceph-mon1 mon.0 10.39.0.34:6789/0 6756 : cluster [INF] Cluster is now healthy 2019-03-13 11:21:21.

[ceph-users] RBD Mirror Image Resync

2019-03-12 Thread Vikas Rana
Hi there, We are replicating an RBD image from the primary to the DR site using RBD mirroring. The primary was running 10.2.10; the DR site is Luminous. We promoted the DR copy to test the failover, and everything checked out good. Now we are trying to restart the replication and we did the demote
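For reference, the usual way to turn a promoted DR copy back into a replica is to demote it and request a resync from the current primary; the pool/image name below is a placeholder:
# on the DR cluster
$ rbd mirror image demote rbd/test-image
$ rbd mirror image resync rbd/test-image
$ rbd mirror image status rbd/test-image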

Re: [ceph-users] S3 data on specific storage systems

2019-03-12 Thread Paul Emmerich
One pool per storage class is enough, you can share the metadata pools across different placement policies. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Tue, Mar
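A sketch of adding an HDD-backed placement target in the default zonegroup/zone while reusing the existing index/metadata pools (the placement id and data pool name are made up; the period commit only matters for multisite setups):
$ radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id hdd-placement
$ radosgw-admin zone placement add --rgw-zone default --placement-id hdd-placement \
    --data-pool default.rgw.hdd.buckets.data \
    --index-pool default.rgw.buckets.index \
    --data-extra-pool default.rgw.buckets.non-ec
$ radosgw-admin period update --commit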

Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread Paul Emmerich
On Tue, Mar 12, 2019 at 8:56 PM David C wrote: > > Out of curiosity, are you guys re-exporting the fs to clients over something > like nfs or running applications directly on the OSD nodes? Kernel NFS + kernel CephFS can fall apart and deadlock itself in exciting ways... nfs-ganesha is so much

Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread Hector Martin
Both, in my case (same host; both local services and the NFS export use the CephFS mount). I use the in-kernel NFS server (not nfs-ganesha). On 13/03/2019 04:55, David C wrote: > Out of curiosity, are you guys re-exporting the fs to clients over > something like nfs or running applications directly o
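Roughly what that setup looks like with the kernel clients (mon address, client network, and fsid value are placeholders; the fsid= export option is needed because CephFS has no backing block device):
$ sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# /etc/exports
/mnt/cephfs  192.168.0.0/24(rw,sync,no_root_squash,fsid=100)
$ sudo exportfs -ra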

Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread David C
Out of curiosity, are you guys re-exporting the fs to clients over something like nfs or running applications directly on the OSD nodes? On Tue, 12 Mar 2019, 18:28 Paul Emmerich, wrote: > Mounting kernel CephFS on an OSD node works fine with recent kernels > (4.14+) and enough RAM in the servers

[ceph-users] S3 data on specific storage systems

2019-03-12 Thread Yannick.Martin
Dear Ceph users, I have a cluster with SSD and HDD storage. I wonder how to configure S3 buckets to use the HDD storage backend only. Do I need to create pools on this particular storage and define radosgw placement with those, or is there a better or easier way to achieve this? Regards,

Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread Paul Emmerich
Mounting kernel CephFS on an OSD node works fine with recent kernels (4.14+) and enough RAM in the servers. We did encounter problems with older kernels though Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 Mün

Re: [ceph-users] Safe to remove objects from default.rgw.meta ?

2019-03-12 Thread Dan van der Ster
Answering my own question (getting help from Pavan), I see that all the details are in this PR: https://github.com/ceph/ceph/pull/11051 So, the zone was updated to set metadata_heap: "" with $ radosgw-admin zone get --rgw-zone=default > zone.json [edit zone.json] $ radosgw-admin zone set --rgw-zo
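The full sequence, as far as I can tell from that PR, is along these lines (default zone assumed):
$ radosgw-admin zone get --rgw-zone=default > zone.json
# edit zone.json and set "metadata_heap": ""
$ radosgw-admin zone set --rgw-zone=default --infile zone.json
$ radosgw-admin zone get --rgw-zone=default | jq .metadata_heap   # should now be empty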

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
I bet you'd see better memstore results with my vector based object implementation instead of bufferlists. Where can I find it? Nick Fisk noticed the same thing you did.  One interesting observation he made was that disabling CPU C/P states helped bluestore immensely in the iodepth=1 case. T
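For reference, one way to pin C/P states on a test box (cpupower comes from the kernel-tools/linux-tools package; this is a benchmarking tweak, not a general recommendation):
$ sudo cpupower frequency-set -g performance
$ sudo cpupower idle-set -D 0    # disable idle states with latency above 0 us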

[ceph-users] Safe to remove objects from default.rgw.meta ?

2019-03-12 Thread Dan van der Ster
Hi all, We have an S3 cluster with >10 million objects in default.rgw.meta. # radosgw-admin zone get | jq .metadata_heap "default.rgw.meta" In these old tickets I realized that this setting is obsolete, and those objects are probably useless: http://tracker.ceph.com/issues/17256 http://tra

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread Mark Nelson
On 3/12/19 8:40 AM, vita...@yourcmc.ru wrote: One way or another we can only have a single thread sending writes to rocksdb.  A lot of the prior optimization work on the write side was to get as much processing out of the kv_sync_thread as possible. That's still a worthwhile goal as it's typical

Re: [ceph-users] Ceph block storage - block.db useless? [solved]

2019-03-12 Thread Benjamin Zapiec
Yeah, thank you xD, you just answered another thread where I asked about the kv_sync thread. Consider this done; I know what to do now. Thank you. On 12.03.19 at 14:43, Mark Nelson wrote: > Our default of 4x 256MB WAL buffers is arguably already too big. On one > hand we are making these buffe

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Mark Nelson
Our default of 4x 256MB WAL buffers is arguably already too big. On one hand, we are making these buffers large to hopefully avoid short-lived data going into the DB (pglog writes). I.e., if a pglog write comes in and later a tombstone invalidating it comes in, we really want those to land in the s
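Those buffer settings live in bluestore_rocksdb_options; a quick way to see what an OSD is actually running with (osd.0 is a placeholder):
$ ceph daemon osd.0 config get bluestore_rocksdb_options | grep -o 'max_write_buffer_number=[0-9]*'
$ ceph daemon osd.0 config get bluestore_rocksdb_options | grep -o 'write_buffer_size=[0-9]*'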

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Sorry, I mean L2. On 12.03.19 at 14:25, Benjamin Zapiec wrote: > May I configure the size of the WAL to increase block.db usage? > For example, if I configure 20GB, would I get a usage of about 48GB on L3? > > Or should I stay with the Ceph defaults? > Is there a maximal WAL size that makes sense? > >

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
One way or another we can only have a single thread sending writes to rocksdb.  A lot of the prior optimization work on the write side was to get as much processing out of the kv_sync_thread as possible.  That's still a worthwhile goal as it's typically what bottlenecks with high amounts of concur

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread Mark Nelson
On 3/12/19 7:31 AM, vita...@yourcmc.ru wrote: Decreasing the min_alloc size isn't always a win, but it can be in some cases. Originally bluestore_min_alloc_size_ssd was set to 4096, but we increased it to 16384 because at the time our metadata path was slow and increasing it resulted in a pretty s
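For anyone wanting to experiment: the option only takes effect when the OSD is created (mkfs), so it has to be in ceph.conf before the OSD is built; the current value can be checked over the admin socket (a sketch, osd.0 is a placeholder):
# ceph.conf, [osd] section, set before creating the OSD
bluestore_min_alloc_size_ssd = 4096
$ ceph daemon osd.0 config get bluestore_min_alloc_size_ssd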

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
May I configure the size of the WAL to increase block.db usage? For example, if I configure 20GB, would I get a usage of about 48GB on L3? Or should I stay with the Ceph defaults? Is there a maximal WAL size that makes sense?

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Mark Nelson
On 3/12/19 7:24 AM, Benjamin Zapiec wrote: Hello, I was wondering why my Ceph block.db is nearly empty, and I started to investigate. The recommendation from Ceph is that block.db should be at least 4% of the size of block. So my OSD configuration looks like this: wal.db - not explicitly spec

Re: [ceph-users] Chasing slow ops in mimic

2019-03-12 Thread Alex Litvak
I looked further into historic slow ops (thanks to some other posts on the list) and I am confused a bit with the following event { "description": "osd_repop(client.85322.0:86478552 7.1b e502/466 7:d8d149b7:::rbd_data.ff7e3d1b58ba.0316:head v 502'10665506)",
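For anyone else digging into these, the events come out of the OSD admin socket; osd.7 below is a placeholder, and dump_historic_ops is the fallback if dump_historic_slow_ops isn't available on your version:
$ ceph daemon osd.7 dump_historic_slow_ops
$ ceph daemon osd.7 dump_historic_ops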

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread vitalif
The amount of metadata depends on the amount of data. But RocksDB only puts metadata on the fast storage when it thinks all the metadata on the same level of the DB is going to fit there. So all sizes except 4, 30, and 286 GB are useless.

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Okay, so I think I don't understand how Ceph's RocksDB decides whether to place data on block.db or not. So the amount of data in block.db depends on the WAL size? I thought it depended on the objects saved to the storage. In this case, say we have a 1GB file, would it have a size of 10GB in

Re: [ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread vitalif
block.db is very unlikely to ever grow to 250GB with a 6TB data device. However, there seems to be a funny "issue" with all block.db sizes except 4, 30, and 286 GB being useless, because RocksDB puts the data on the fast storage only if it thinks the whole LSM level will fit there. Ceph's Rock
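A rough derivation of those numbers, assuming the default BlueStore RocksDB settings (4 WAL buffers x 256 MB, max_bytes_for_level_base = 256 MB, level multiplier = 10):
#   WAL          4 x 256 MB  ~= 1 GB
#   L0 + L1      ~256 MB
#   L2           ~2.56 GB    -> ~4 GB total
#   L3           ~25.6 GB    -> ~30 GB total
#   L4           ~256 GB     -> ~286 GB total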

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
Decreasing the min_alloc size isn't always a win, but it can be in some cases. Originally bluestore_min_alloc_size_ssd was set to 4096, but we increased it to 16384 because at the time our metadata path was slow and increasing it resulted in a pretty significant performance win (along with increasin

[ceph-users] Ceph block storage - block.db useless?

2019-03-12 Thread Benjamin Zapiec
Hello, I was wondering why my Ceph block.db is nearly empty, and I started to investigate. The recommendation from Ceph is that block.db should be at least 4% of the size of block. So my OSD configuration looks like this: wal.db - not explicitly specified block.db - 250GB of SSD storage block
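For reference, a block.db of that size would typically be set up at OSD creation time with something like the following (device paths are placeholders):
$ ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1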

Re: [ceph-users] How to attach permission policy to user?

2019-03-12 Thread Pritha Srivastava
What exact error are you seeing after adding admin caps? I tried the following steps on master and they worked fine: (TESTER1 is adding a user policy to TESTER) 1. radosgw-admin --uid TESTER --display-name "TestUser" --access_key TESTER --secret test123 user create 2. radosgw-admin --uid TESTER1 -
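For completeness, the admin caps referred to above would be added with something like the line below; I believe the cap name for the IAM user-policy calls is "user-policy", but treat that as an assumption:
$ radosgw-admin caps add --uid=TESTER1 --caps="user-policy=*"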

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Zhenshi Zhou
Hi Kevin, I'm sure firewalld is disabled on each host. Well, the network is not a problem. The servers are connected to the same switch and the connection is good even when the OSDs are marked as down. There was no interruption or delay. I restarted the leader monitor daemon and it seems to return to

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Kevin Olbrich
Are you sure that firewalld is stopped and disabled? This looks exactly like what happened when I missed one host in a test cluster. Kevin. On Tue, Mar 12, 2019 at 09:31, Zhenshi Zhou wrote: > Hi, > > I deployed a ceph cluster with good performance. But the logs > indicate that the cluster is not as st
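A quick sanity check on each host (works on firewalld-based distros; the iptables dump catches rules left behind even when firewalld is off):
$ systemctl is-active firewalld
$ firewall-cmd --state
$ sudo iptables -L -n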

Re: [ceph-users] How To Scale Ceph for Large Numbers of Clients?

2019-03-12 Thread Stefan Kooman
Quoting Zack Brenton (z...@imposium.com): > Types of devices: > We run our Ceph pods on 3 AWS i3.2xlarge nodes. We're running 3 OSDs, 3 > Mons, and 2 MDS pods (1 active, 1 standby-replay). Currently, each pod runs > with the following resources: > - osds: 2 CPU, 6Gi RAM, 1.7Ti NVMe disk > - mds: 3

[ceph-users] rbd_recovery_tool not working on Luminous 12.2.11

2019-03-12 Thread Mateusz Skała
Hi, I have a problem starting two of my OSDs, which fail with this error: osd.19 pg_epoch: 8887 pg[1.2b5(unlocked)] enter Initial 0> 2019-03-01 09:41:30.259485 7f303486be00 -1 /build/ceph-12.2.11/src/osd/PGLog.h: In function 'static void PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t, c
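If it helps others hitting this: with the OSD stopped, the PG log that read_log_and_missing is choking on can be inspected with ceph-objectstore-tool (data path and pgid below are taken from the error above; treat the exact invocation as a sketch):
$ systemctl stop ceph-osd@19
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 --op log --pgid 1.2b5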

Re: [ceph-users] mount cephfs on ceph servers

2019-03-12 Thread Hector Martin
It's worth noting that most containerized deployments can effectively limit RAM for containers (cgroups), and the kernel has limits on how many dirty pages it can keep around. In particular, /proc/sys/vm/dirty_ratio (default: 20) means at most 20% of your total RAM can be dirty FS pages. If yo
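For reference, the knobs mentioned above can be inspected and tightened like this (10 and 5 are example values, not a recommendation):
$ sysctl vm.dirty_ratio vm.dirty_background_ratio
$ sudo sysctl -w vm.dirty_ratio=10 vm.dirty_background_ratio=5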

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Zhenshi Zhou
Yep, I think it may be a network issue as well. I'll check the connections. Thanks Eugen :) Eugen Block wrote on Tue, Mar 12, 2019 at 4:35 PM: > Hi, > > my first guess would be a network issue. Double-check your connections > and make sure the network setup works as expected. Check syslogs, > dmesg, switches et

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Eugen Block
Hi, my first guess would be a network issue. Double-check your connections and make sure the network setup works as expected. Check syslogs, dmesg, switches, etc. for hints that a network interruption may have occurred. Regards, Eugen. Quoting Zhenshi Zhou: Hi, I deployed a ceph clus

[ceph-users] cluster is not stable

2019-03-12 Thread Zhenshi Zhou
Hi, I deployed a ceph cluster with good performance. But the logs indicate that the cluster is not as stable as I think it should be. The log shows the monitors marking some OSDs as down periodically: [image: image.png] I didn't find any useful information in the OSD logs. ceph version 13.2.4 mimic (stable

[ceph-users] Intel D3-S4610 performance

2019-03-12 Thread Kai Wembacher
Hi everyone, I have an Intel D3-S4610 SSD with 1.92 TB here for testing, and I'm getting some pretty bad numbers when running the fio benchmark suggested by Sébastien Han (http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/): Intel D3-S4610 1.92 TB
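For comparison, the fio invocation from that blog post is, as I recall, along these lines (the device path is a placeholder; note it writes to the raw device and destroys data on it):
$ fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
      --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test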