[ceph-users] Stale PG data loss

2018-08-09 Thread Surya Bala
Hi folks, I was trying to test the following case: having a pool with a replication count of 1, if one OSD goes down, then the PGs mapped to that OSD become stale. If a hardware failure happens, the data on that OSD is lost, so parts of some files are lost. How can I find which files
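One way to answer that question - a sketch only, assuming the pool is a CephFS data pool named cephfs_data mounted at /mnt/cephfs and relying on the usual <inode-hex>.<block-hex> object naming - is to map each file's first object to its PG and flag files whose PG is one of the stale ones:

    # PG 1.2f stands in for a PG reported stale/lost; adjust to the real IDs.
    LOST_PG="1.2f"

    find /mnt/cephfs -type f | while read -r f; do
        ino=$(stat -c '%i' "$f")
        obj=$(printf '%x.00000000' "$ino")   # first RADOS object backing the file
        if ceph osd map cephfs_data "$obj" | grep -q "(${LOST_PG})"; then
            echo "possibly affected: $f"
        fi
    done

Large files span many <inode>.<block> objects, so a thorough check would test every block a file covers, not just the first; and with size=1 this only identifies the damage, it cannot recover it.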

[ceph-users] Applicability and migration path

2018-08-09 Thread Matthew Pounsett
I'm looking for some high level information about the usefulness of ceph to a particular use case and, assuming it's considered a good choice, whether the migration path I have in mind has any particular gotchas that I should be on the look out for. The current situation is that I've inherited

Re: [ceph-users] pg count question

2018-08-09 Thread Subhachandra Chandra
I have used the calculator at https://ceph.com/pgcalc/ which looks at relative sizes of pools and makes a suggestion. Subhachandra On Thu, Aug 9, 2018 at 1:11 PM, Satish Patel wrote: > Thanks Subhachandra, > > That is good point but how do i calculate that PG based on size? > > On Thu, Aug 9,

Re: [ceph-users] pg count question

2018-08-09 Thread Uwe Sauter
Given your formula, you would have 512 PGs in total. Instead of dividing that evenly, you could also do 128 PGs for pool-1 and 384 PGs for pool-2, which gives you 1/4 and 3/4 of total PGs. This might not be 100% optimal for the pools but keeps the calculated total PG count and the 100 PG/OSD target.
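The arithmetic behind those numbers, spelled out (the OSD and replica counts below are illustrative, since they are not in the excerpt; the formula is the usual pgcalc rule of thumb):

    total PGs ≈ (OSDs × target PGs per OSD) / replica count, rounded to a power of two
              ≈ (15 × 100) / 3 ≈ 500  →  512
    pool-1: 512 × 1/4 = 128 PGs
    pool-2: 512 × 3/4 = 384 PGs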

Re: [ceph-users] pg count question

2018-08-09 Thread Satish Patel
Thanks Subhachandra, That is a good point, but how do I calculate the PG count based on size? On Thu, Aug 9, 2018 at 1:42 PM, Subhachandra Chandra wrote: > If pool1 is going to be much smaller than pool2, you may want more PGs in > pool2 for better distribution of data. > > > > > On Wed, Aug 8, 2018 at

[ceph-users] Snapshot costs (was: Re: RBD image "lightweight snapshots")

2018-08-09 Thread Jack
On 08/09/2018 03:01 PM, Piotr Dałek wrote: > This introduces one big issue: it enforces a COW snapshot on the image, > meaning that original image access latencies and consumed space > increase. "Lightweight" snapshots would remove these inefficiencies - > no COW performance or storage overhead. Do

Re: [ceph-users] Ceph logging into graylog

2018-08-09 Thread Rudenko Aleksandr
Hi, All our settings for this: mon cluster log to graylog = true mon cluster log to graylog host = {graylog-server-hostname} On 9 Aug 2018, at 19:33, Roman Steinhart <ro...@aternos.org> wrote: Hi all, I'm trying to set up ceph logging into graylog. For that I've set the following
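Collected into a ceph.conf fragment (a sketch built from the options quoted above; the hostname is a placeholder and the port line simply restates the documented default of 12201):

    [mon]
        mon cluster log to graylog = true
        mon cluster log to graylog host = graylog.example.com
        mon cluster log to graylog port = 12201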

Re: [ceph-users] pg count question

2018-08-09 Thread Subhachandra Chandra
If pool1 is going to be much smaller than pool2, you may want more PGs in pool2 for better distribution of data. On Wed, Aug 8, 2018 at 12:40 AM, Sébastien VIGNERON <sebastien.vigne...@criann.fr> wrote: > The formula seems correct for a 100 pg/OSD target. > > > > On 8 Aug 2018 at 04:21,

[ceph-users] cephfs - restore files

2018-08-09 Thread Erik Schwalbe
Hi, Unfortunately, I deleted a few files and would like to restore them. For ext4 I would use photorec, but that does not seem to work for CephFS. Is it possible to restore deleted files stored in CephFS? Thanks and regards, Erik

[ceph-users] cephmetrics without ansible

2018-08-09 Thread Steven Vacaroaia
Hi, I would be very grateful if any of you could share your experience/knowledge of using cephmetrics without Ansible. I have deployed my cluster using ceph-deploy on CentOS 7. I have Grafana, Graphite and collectd installed and running / collecting data. Building dashboards and queries is very

[ceph-users] Ceph logging into graylog

2018-08-09 Thread Roman Steinhart
Hi all, I'm trying to set up ceph logging into graylog. For that I've set the following options in ceph.conf: log_to_graylog = true err_to_graylog = true log_to_graylog_host = graylog.service.consul log_to_graylog_port = 12201 mon_cluster_log_to_graylog = true mon_cluster_log_to_graylog_host =

[ceph-users] osd.X down, but it is still running on Luminous

2018-08-09 Thread Rudenko Aleksandr
Hi, guys. After upgrading to Luminous I see: "Monitor daemon marked osd.xx down, but it is still running". This happens 3-5 times a day on different OSDs. I have spent a lot of time debugging but haven't found the problem. The network works perfectly. CPU, network and disk utilization are low. Memory is
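Some diagnostics that are generally useful for this symptom (a sketch, not something from this thread; osd.12 is a placeholder):

    # OSDs that consider the failure spurious log this line:
    grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log

    # Look for slow or stuck ops around the time the OSD was failed:
    ceph daemon osd.12 dump_historic_ops

    # Settings that govern how quickly an OSD is marked down:
    ceph daemon osd.12 config show | grep -E 'osd_heartbeat_grace|mon_osd_min_down_reporters'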

Re: [ceph-users] ceph-mgr dashboard behind reverse proxy

2018-08-09 Thread Bastiaan Visser
This will work: backend ceph01 option httpchk GET / http-check expect status 200 server mgr01 *.*.*.*:7000 check server mgr02 *.*.*.*:7000 check server mgr03 *.*.*.*:7000 check Regards, Bastiaan - Original Message - From: "Marc Schöchlin" To:
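The same snippet laid out as it would appear in haproxy.cfg (the addresses were redacted in the original; 192.0.2.x below is purely a placeholder):

    backend ceph01
        option httpchk GET /
        http-check expect status 200
        server mgr01 192.0.2.11:7000 check
        server mgr02 192.0.2.12:7000 check
        server mgr03 192.0.2.13:7000 check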

Re: [ceph-users] cephfs kernel client hangs

2018-08-09 Thread Burkhard Linke
Hi, On 08/09/2018 03:21 PM, Yan, Zheng wrote: try 'mount -f', recent kernels should handle 'mount -f' pretty well On Wed, Aug 8, 2018 at 10:46 PM Zhenshi Zhou wrote: Hi, Is there any other way except rebooting the server when the client hangs? If the server is in a production environment, I

Re: [ceph-users] cephfs kernel client hangs

2018-08-09 Thread Yan, Zheng
try 'mount -f', recent kernels should handle 'mount -f' pretty well On Wed, Aug 8, 2018 at 10:46 PM Zhenshi Zhou wrote: > > Hi, > Is there any other way except rebooting the server when the client hangs? > If the server is in a production environment, I can't restart it every time. > > Webert de
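For reference, the force/lazy unmount sequence usually meant in this situation (a sketch; the excerpt does not say whether 'mount -f' refers to this, and the mount point and remount options are placeholders):

    umount -f /mnt/cephfs    # force-detach the hung CephFS mount
    umount -l /mnt/cephfs    # lazy unmount if the forced one still blocks
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret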

Re: [ceph-users] RBD image "lightweight snapshots"

2018-08-09 Thread Sage Weil
On Thu, 9 Aug 2018, Piotr Dałek wrote: > Hello, > > At OVH we're heavily utilizing snapshots for our backup system. We think > there's an interesting optimization opportunity regarding snapshots I'd like > to discuss here. > > The idea is to introduce a concept of "lightweight" snapshots -

[ceph-users] RBD image "lightweight snapshots"

2018-08-09 Thread Piotr Dałek
Hello, At OVH we're heavily utilizing snapshots for our backup system. We think there's an interesting optimization opportunity regarding snapshots I'd like to discuss here. The idea is to introduce a concept of "lightweight" snapshots - such a snapshot would not contain data but only the

[ceph-users] OSD failed, rocksdb: Corruption: missing start of fragmented record

2018-08-09 Thread shrey chauhan
Hi, My OSD failed 2018-08-09 16:31:11.848457 7f49951ddd80 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2859] Recovered

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-09 Thread Thode Jocelyn
Hi Magnus, Yes, this is a workaround for the problem. However, this means that if you want your rbd-mirroring daemon to be HA, you will need to create 2+ more machines in your infrastructure instead of being able to colocate it on the same machines as your MDS, MGR and MON. Best Regards

[ceph-users] Can't create snapshots on images, mimic, newest patches, CentOS 7

2018-08-09 Thread Kasper, Alexander
Hello, I have a small ceph cluster running with CentOS 7.5. Ceph release is: ceph --version ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable) I have one pool with 13 images. All images have the same features enabled and are used by an OpenShift Origin 3.9 client. For

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-09 Thread Magnus Grönlund
Hi Jocelyn, I'm in the process of setting up rbd-mirroring myself and stumbled on the same problem. But I think that the "trick" here is to _not_ colocate the RBD-mirror daemon with any other part of the cluster(s); it should be run on a separate host. That way you can change the CLUSTER_NAME
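The host-level knob this thread revolves around, as I understand it on CentOS (Debian-based systems use /etc/default/ceph; the cluster name 'remote' is illustrative):

    # /etc/sysconfig/ceph on the dedicated rbd-mirror host
    CLUSTER=remote
    # The systemd units read $CLUSTER, so every Ceph daemon on this host will
    # look for /etc/ceph/remote.conf and remote.*.keyring - which is why a
    # colocated MON/MGR/MDS would be dragged onto the same cluster name.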

Re: [ceph-users] OSD had suicide timed out

2018-08-09 Thread Josef Zelenka
The only reason that I could think of is some kind of a network issue, even though different clusters run on the same switch with the same settings and we don't register any issues there. One thing I recall - one of my colleagues was testing something out on this cluster and after he

Re: [ceph-users] cephfs kernel client hangs

2018-08-09 Thread Jake Grimmett
Hi John, thanks for the advice, it's greatly appreciated. We have 45 x 8 TB OSDs and 128 GB RAM per node; this is 35% of the recommended quantity, so our OOM problems are predictable. I'll increase the RAM on one node to 256 GB and see if this handles OSD fault conditions without the bluestore RAM
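For context, the 35% figure is consistent with the old ~1 GB of RAM per TB of OSD rule of thumb (the rule itself is an assumption here, not stated in the excerpt):

    45 OSDs × 8 TB = 360 TB per node  →  ~360 GB RAM recommended
    128 GB / 360 GB ≈ 0.36, i.e. roughly 35%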