Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-04 Thread ceph
Hello On 4 October 2018 02:38:35 MESZ, solarflow99 wrote: >I use the same configuration you have, and I plan on using bluestore. >My >SSDs are only 240GB and it worked with filestore all this time, I >suspect >bluestore should be fine too. > > >On Wed, Oct 3, 2018 at 4:25 AM Massimo

Re: [ceph-users] CephFS performance.

2018-10-04 Thread Patrick Donnelly
On Thu, Oct 4, 2018 at 2:10 AM Ronny Aasen wrote: > in rbd there is a fancy striping solution, by using --stripe-unit and > --stripe-count. This would get more spindles running; perhaps consider > using rbd instead of cephfs if it fits the workload. CephFS also supports custom striping via
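For reference, both striping mechanisms mentioned above look roughly like this (untested sketch; pool, image, and mount-point names are placeholders):

```shell
# RBD: stripe an image across more objects at create time
rbd create mypool/myimage --size 100G --stripe-unit 65536 --stripe-count 8

# CephFS: striping is set per file or directory via layout xattrs;
# set it on an empty directory so new files underneath inherit it
setfattr -n ceph.dir.layout.stripe_count -v 8 /mnt/cephfs/scratch
setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /mnt/cephfs/scratch
getfattr -n ceph.dir.layout /mnt/cephfs/scratch
```

Note that a CephFS layout can only be changed on an empty directory or a zero-length file; existing data keeps its old layout.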

[ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-04 Thread Stefan Kooman
Dear list, Today we hit our first Ceph MDS issue. Out of the blue the active MDS stopped working: mon.mon1 [WRN] daemon mds.mds1 is not responding, replacing it as rank 0 with standby daemon mds.mds2. Logging of ceph-mds1: 2018-10-04 10:50:08.524745 7fdd516bf700 1 mds.mds1 asok_command:

[ceph-users] Ceph version upgrade with Juju

2018-10-04 Thread Fabio Abreu
Hi Cephers, I have a small question about migrating from the Jewel version in a MAAS/Juju implementation scenario. Does anyone have experience with this in a production environment? I am asking because we are mapping all the challenges of this scenario. Thanks and best regards, Fabio Abreu

[ceph-users] Ceph 13.2.2 on Ubuntu 18.04 arm64

2018-10-04 Thread Rob Raymakers
Hi, I'm trying to get Ceph 13.2.2 running on Ubuntu 18.04 arm64, specifically a Rock64 to build a mini cluster. But I can't figure out how to build Ceph 13.2.2 from GitHub, as there is no Ceph 13.2.2 package available yet. I tried to figure it out with the instructions on GitHub, but with no success.
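The usual source-build flow for a release tag is roughly the following (a sketch, not arm64-specific; expect a long compile and heavy memory use on a small board):

```shell
# clone the exact release tag together with its submodules
git clone --branch v13.2.2 https://github.com/ceph/ceph.git
cd ceph
git submodule update --init --recursive

# install build dependencies, generate the build tree, and compile
./install-deps.sh
./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
cd build && make -j4
```

On a 4GB board, a lower `-j` value (or added swap) may be needed to avoid the compiler being killed by the OOM killer.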

Re: [ceph-users] Erasure coding with more chunks than servers

2018-10-04 Thread Paul Emmerich
Yes, you can use a crush rule with two steps: take default chooseleaf indep 5 emit take default chooseleaf indep 2 emit You'll have to adjust it when adding a server, so it's not a great solution. I'm not sure if there's a way to do it without hardcoding the number of servers (I don't think
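The two-step rule described above can be added by round-tripping the CRUSH map through crushtool (a sketch; the rule id and name are placeholders, and note the second pass may pick hosts already used by the first, which is the point when placing 7 chunks on 5 hosts):

```shell
# dump and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add a rule like the following to crushmap.txt:
#   rule ec_5_plus_2 {
#       id 99
#       type erasure
#       min_size 7
#       max_size 7
#       step take default
#       step chooseleaf indep 5 type host
#       step emit
#       step take default
#       step chooseleaf indep 2 type host
#       step emit
#   }

# recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

`crushtool --test -i crushmap.new --rule 99 --num-rep 7 --show-mappings` can be used to sanity-check the placements before injecting.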

[ceph-users] Erasure coding with more chunks than servers

2018-10-04 Thread Vladimir Brik
Hello I have a 5-server cluster and I am wondering if it's possible to create a pool that uses a k=5 m=2 erasure code. In my experiments, I ended up with pools whose pgs are stuck in the creating+incomplete state even when I created the erasure code profile with --crush-failure-domain=osd. Assuming that

Re: [ceph-users] Resolving Large omap objects in RGW index pool

2018-10-04 Thread Chris Sarginson
Hi, Thanks for the response - I am still unsure as to what will happen to the "marker" reference in the bucket metadata, as this is the object that is being detected as Large. Will the bucket generate a new "marker" reference in the bucket metadata? I've been reading this page to try and get a

Re: [ceph-users] Unfound object on erasure when recovering

2018-10-04 Thread Jan Pekař - Imatic
I thought that putting the disk "in" would solve the problem, but it didn't. The problem is still there, but the cluster doesn't see the object as unfound, so it reports an IO error. This is what I got when using the rados get command - error getting erasure_3_1/11eec49.: (5) Input/output error So maybe the problem appeared
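When the cluster does flag objects as unfound, the standard inspection and last-resort commands look like this (pgid is a placeholder; on erasure-coded pools `revert` is not supported, only `delete`, and both lose data, so use them only once no OSD can still supply the object):

```shell
# see which PGs report unfound objects, then list them
ceph health detail
ceph pg 11.0 list_unfound

# last resort: 'revert' rolls back to a prior version (replicated pools),
# 'delete' makes the cluster forget the object entirely
ceph pg 11.0 mark_unfound_lost revert
# or: ceph pg 11.0 mark_unfound_lost delete
```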

Re: [ceph-users] hardware heterogeneous in same pool

2018-10-04 Thread Brett Chancellor
You could also set osd_crush_initial_weight = 0. New OSDs will automatically come up with a 0 weight and you won't have to race the clock. -Brett On Thu, Oct 4, 2018 at 3:50 AM Janne Johansson wrote: > > > On Thu 4 Oct 2018 at 00:09, Bruno Carvalho wrote: > >> Hi Cephers, I would like to
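A sketch of that approach (osd id and weights are examples; the final weight conventionally matches the device's capacity in TiB):

```shell
# ceph.conf on the OSD hosts, before creating the new OSDs:
#   [osd]
#   osd_crush_initial_weight = 0

# the new OSD comes up with weight 0 and receives no data; once you are
# ready, bring it in gradually or in one step
ceph osd crush reweight osd.42 0.5
ceph osd crush reweight osd.42 3.64
```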

[ceph-users] Cluster broken and ODSs crash with failed assertion in PGLog::merge_log

2018-10-04 Thread Jonas Jelten
Hello! Unfortunately, our single-node "cluster" with 11 OSDs is broken because some OSDs crash when they start peering. I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2). The problem was induced when RAM filled up and OSD processes then crashed because of memory allocation failures. No

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Jason Dillaman
On Thu, Oct 4, 2018 at 11:15 AM Vikas Rana wrote: > > Bummer. > > Our OSD is on 10G private network and MON is on 1G public network. I believe > this is reference architecture mentioned everywhere to separate MON and OSD. > > I believe the requirement for rbd-mirror for the secondary site MON to

Re: [ceph-users] Mimic upgrade 13.2.1 > 13.2.2 monmap changed

2018-10-04 Thread Paul Emmerich
You can manually extract, edit, and inject the mon map to fix it. In this case you probably need to: 1. check what exactly is going on, inspect the mon map of all mons 2. maybe the IP addresses changed or something? See if you can fix it somehow without editing the monmap 3. adjust the
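The extract/edit/inject cycle looks roughly like this (mon name and address are taken from the original report as examples; do this with the cluster's mons down or one at a time):

```shell
# stop the daemon, pull its current monmap, and inspect it
systemctl stop ceph-mon@ceph01
ceph-mon -i ceph01 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap

# replace a stale or wrong entry
monmaptool --rm ceph01 /tmp/monmap
monmaptool --add ceph01 192.168.200.197:6789 /tmp/monmap

# inject the fixed map and restart
ceph-mon -i ceph01 --inject-monmap /tmp/monmap
systemctl start ceph-mon@ceph01
```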

Re: [ceph-users] Resolving Large omap objects in RGW index pool

2018-10-04 Thread Konstantin Shalygin
Hi, Ceph version: Luminous 12.2.7 Following upgrading to Luminous from Jewel we have been stuck with a cluster in HEALTH_WARN state that is complaining about large omap objects. These all seem to be located in our .rgw.buckets.index pool. We've disabled auto resharding on bucket indexes due to

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Vikas Rana
Bummer. Our OSDs are on a 10G private network and the MONs are on a 1G public network. I believe this is the reference architecture mentioned everywhere to separate MON and OSD traffic. I believe the requirement for rbd-mirror that the secondary site be able to reach the private OSD IPs on the primary was never mentioned

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Jason Dillaman
On Thu, Oct 4, 2018 at 10:27 AM Vikas Rana wrote: > > On the primary site, we have OSDs running on 192.168.4.x addresses. > > Similarly, on the secondary site, we have OSDs running on 192.168.4.x addresses. > 192.168.3.x is the old MON network on both sites, which was non-routable. > So we renamed the mon on

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Vikas Rana
On the primary site, we have OSDs running on 192.168.4.x addresses. Similarly, on the secondary site, we have OSDs running on 192.168.4.x addresses. 192.168.3.x is the old MON network on both sites, which was non-routable. So we renamed the mon on the primary site to 165.x.x and the mon on the secondary site to 165.x.y.

[ceph-users] Resolving Large omap objects in RGW index pool

2018-10-04 Thread Chris Sarginson
Hi, Ceph version: Luminous 12.2.7 Following upgrading to Luminous from Jewel we have been stuck with a cluster in HEALTH_WARN state that is complaining about large omap objects. These all seem to be located in our .rgw.buckets.index pool. We've disabled auto resharding on bucket indexes due to
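With auto resharding disabled, oversized bucket indexes can be resharded manually; a sketch (bucket name and shard count are examples - aim for well under 100k index entries per shard):

```shell
# find buckets whose index shard object counts exceed the recommended limit
radosgw-admin bucket limit check

# reshard one bucket's index to more shards; this creates a new bucket
# instance (a new "marker") and repoints the bucket metadata at it
radosgw-admin bucket reshard --bucket=big-bucket --num-shards=128
```

Cleanup of the leftover old index instances only became convenient with `radosgw-admin reshard stale-instances list/rm` in 12.2.11 and later, so on 12.2.7 the old large omap objects may need to be identified and removed by hand.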

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Jason Dillaman
On Thu, Oct 4, 2018 at 10:10 AM Vikas Rana wrote: > > Thanks Jason for the great suggestions. > > but somehow rbd mirror status is not working from secondary to primary. Here's > the status from both sides. The cluster name is ceph on the primary side and cephdr > on the remote site. mirrordr is the user on the DR

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread Vikas Rana
Thanks Jason for the great suggestions. But somehow rbd mirror status is not working from secondary to primary. Here's the status from both sides. The cluster name is ceph on the primary side and cephdr on the remote site. mirrordr is the user on the DR side and mirrorprod is on the primary prod side. # rbd mirror pool

Re: [ceph-users] RBD Mirror Question

2018-10-04 Thread ceph
Hello Vikas, Could you please tell us which commands you used to set up rbd-mirror? It would be great if you could provide a short howto :) Thanks in advance - Mehmet On 2 October 2018 22:47:08 MESZ, Vikas Rana wrote: >Hi, > >We have a CEPH 3 node cluster at the primary site. We created a
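A short howto sketch for pool-mode mirroring, using the cluster and user names from this thread (ceph/cephdr, mirrorprod/mirrordr; the pool name `rbd` is an example, and this assumes the peer keyrings and conf files are already in place on each side):

```shell
# enable pool-level mirroring on both clusters
rbd --cluster ceph   mirror pool enable rbd pool
rbd --cluster cephdr mirror pool enable rbd pool

# register each cluster as a peer of the other
rbd --cluster cephdr mirror pool peer add rbd client.mirrorprod@ceph
rbd --cluster ceph   mirror pool peer add rbd client.mirrordr@cephdr

# run the rbd-mirror daemon on the DR site, then check replication
systemctl enable --now ceph-rbd-mirror@admin
rbd --cluster cephdr mirror pool status rbd --verbose
```

In pool mode only images with the journaling feature are replicated, so existing images also need `rbd feature enable <pool>/<image> journaling` on the primary.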

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-10-04 Thread Webert de Souza Lima
Hi, bringing this up again to ask one more question: what would be the best recommended locking strategy for dovecot against cephfs? This is a balanced setup using independent director instances, but all dovecot instances on each node share the same storage system (cephfs). Regards, Webert Lima

[ceph-users] Mimic upgrade 13.2.1 > 13.2.2 monmap changed

2018-10-04 Thread Nino Bosteels
Hello list, I'm having a serious issue, since my ceph cluster has become unresponsive. I was upgrading my cluster (3 servers, 3 monitors) from 13.2.1 to 13.2.2, which shouldn't be a problem. Though on reboot my first host reported: starting mon.ceph01 rank -1 at 192.168.200.197:6789/0

[ceph-users] Mimic 13.2.2 SCST or ceph-iscsi ?

2018-10-04 Thread Steven Vacaroaia
Hi, Which implementation of iSCSI is recommended for Mimic 13.2.2, and why? Is multipathing supported by both in a VMware environment? Anyone willing to share performance details? Many thanks Steven

[ceph-users] deep scrub error caused by missing object

2018-10-04 Thread Roman Steinhart
Hi all, for some weeks we have had a small problem with one of the PGs on our ceph cluster. Every time pg 2.10d is deep scrubbed it fails because of this: 2018-08-06 19:36:28.080707 osd.14 osd.14 *.*.*.110:6809/3935 133 : cluster [ERR] 2.10d scrub stat mismatch, got 397/398 objects, 0/0
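A scrub stat mismatch like this is usually repairable in place; a sketch using the pgid from the report:

```shell
# reproduce the error, then ask the primary OSD to repair the PG
ceph pg deep-scrub 2.10d
ceph -w                 # watch for the scrub error to reappear
ceph pg repair 2.10d
```

If the mismatch comes back after every deep scrub, the underlying cause (e.g. a deleted or missing object) still needs to be tracked down on the OSDs holding the PG.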

[ceph-users] bcache, dm-cache support

2018-10-04 Thread Maged Mokhtar
Hello all, Do bcache and dm-cache work well with Ceph? Is one recommended over the other? Are there any issues? There are a few posts in this list about them, but I could not determine whether they are ready for mainstream use or not. Appreciate any clarifications. /Maged

Re: [ceph-users] Mimic offline problem

2018-10-04 Thread Goktug Yildirim
These are the ceph-objectstore-tool logs for OSD.0: https://paste.ubuntu.com/p/jNwf4DC46H/ There is something wrong, but we are not sure whether we are using the tool incorrectly or something is wrong with the OSD. > On 4 Oct 2018, at 06:17, Sage Weil wrote: > > On Thu, 4 Oct 2018, Goktug Yildirim wrote: >>

Re: [ceph-users] CephFS performance.

2018-10-04 Thread Ronny Aasen
On 10/4/18 7:04 AM, jes...@krogh.cc wrote: Hi All. First, thanks for the good discussion and the strong answers I've gotten so far. The current cluster setup is 4 x 10 x 12TB 7.2K RPM drives, all with 10GbitE and metadata on rotating drives - 3x replication - 256GB memory in OSD hosts and 32+

Re: [ceph-users] Best handling network maintenance

2018-10-04 Thread Paul Emmerich
Mons are also on a 30s timeout. Even a short loss of quorum isn't noticeable for ongoing IO. Paul > On 04.10.2018 at 11:03, Martin Palma wrote: > > Also a monitor election? That is our biggest fear, since the monitor > nodes will not see each other for that timespan... >> On Thu, Oct 4, 2018

Re: [ceph-users] Best handling network maintenance

2018-10-04 Thread Martin Palma
Also a monitor election? That is our biggest fear, since the monitor nodes will not see each other for that timespan... On Thu, Oct 4, 2018 at 10:21 AM Paul Emmerich wrote: > > 10 seconds is far below any relevant timeout values (generally 20-30 > seconds); so you will be fine without any

Re: [ceph-users] Best handling network maintenance

2018-10-04 Thread Paul Emmerich
10 seconds is far below any relevant timeout values (generally 20-30 seconds); so you will be fine without any special configuration. Paul On 04.10.2018 at 09:38, Konstantin Shalygin wrote: >> How can we best handle this scenario to have minimal or no >> impact on Ceph? >> >> We

Re: [ceph-users] hardware heterogeneous in same pool

2018-10-04 Thread Janne Johansson
On Thu 4 Oct 2018 at 00:09, Bruno Carvalho wrote: > Hi Cephers, I would like to know how you are growing the cluster. > Using dissimilar hardware in the same pool, or creating a pool for each > different hardware group. > What problems would I have using different hardware (CPU, >
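On Luminous and later, CRUSH device classes make mixed-media clusters manageable without maintaining separate trees; a sketch (rule and pool names are examples):

```shell
# OSDs report a device class (hdd/ssd/nvme) automatically
ceph osd crush class ls
ceph osd tree

# one replicated rule per class keeps dissimilar media in separate pools
ceph osd crush rule create-replicated rep-hdd default host hdd
ceph osd crush rule create-replicated rep-ssd default host ssd
ceph osd pool set mypool crush_rule rep-ssd
```

Mixed CPU/RAM per host matters less, since pool performance tracks the slowest OSDs serving it, not the host specs.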

Re: [ceph-users] Best handling network maintenance

2018-10-04 Thread Konstantin Shalygin
How can we best handle this scenario to have minimal or no impact on Ceph? We plan to set "noout", "nobackfill", "norebalance", "noscrub", "nodeep-scrub" - are there any other suggestions? ceph osd set noout ceph osd pause k
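A sketch of the full maintenance sequence, including unsetting the flags afterwards (`pause` freezes all client IO, so for a ~10 second outage it is optional):

```shell
# before the link goes down: stop data movement and optionally freeze IO
ceph osd set noout
ceph osd set norebalance
ceph osd set nobackfill
ceph osd pause          # sets pauserd + pausewr

# after the link is back and PGs are active+clean again
ceph osd unpause
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset noout
```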

[ceph-users] Best handling network maintenance

2018-10-04 Thread Martin Palma
Hi all, our Ceph cluster is distributed across two datacenters. Due to network maintenance, the link between the two datacenters will be down for ca. 8 - 10 seconds. During this time the public network of Ceph between the two DCs will also be down. How can we best handle this scenario to have