Re: [ceph-users] bluestore OSD did not start at system-boot

2018-04-05 Thread Nico Schottelius
Hey Ansgar, we have a similar "problem": in our case all servers are wiped on reboot, as they boot their operating system from the network into initramfs. While the OS configuration is done with cdist [0], we consider ceph osds more dynamic data and just re-initialise all osds on boot using the
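(For reference, a minimal sketch of bringing existing bluestore OSDs back up at boot, assuming ceph-volume/LVM-managed OSDs; this is only an illustration, not necessarily the exact mechanism described above:)

    # scan the LVM metadata on this host and start every OSD found
    ceph-volume lvm activate --all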

Re: [ceph-users] Stuck in creating+activating

2018-03-17 Thread Nico Schottelius
idodh.nl/2018/01/placement-groups-with-ceph-luminous-stay-in-activating-state/ > > 2018-03-17 12:15 GMT+03:00 Nico Schottelius <nico.schottel...@ungleich.ch>: > >> >> Good morning, >> >> some days ago we created a new pool with 512 pgs, and originally 5 os
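(The linked post attributes the stuck "activating" state to the Luminous per-OSD PG limit; a hedged sketch of checking and relaxing it — option names are assumed from that era, the monitor name is a placeholder, and a daemon restart may be needed for the change to fully apply:)

    # effective limit on a monitor (Luminous 12.2.1+)
    ceph daemon mon.server3 config get mon_max_pg_per_osd
    # raise it in ceph.conf [global] and restart the mons, e.g.:
    #   mon_max_pg_per_osd = 400
    ceph tell osd.\* injectargs '--osd_max_pg_per_osd_hard_ratio 4'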

[ceph-users] Stuck in creating+activating

2018-03-17 Thread Nico Schottelius
Good morning, some days ago we created a new pool with 512 pgs, and originally 5 osds. We use the device class "ssd" and a crush rule that maps all data for the pool "ssd" to the ssd device class osds. While creating, one of the ssds failed and we are left with 4 osds: [10:00:22]

Re: [ceph-users] Ceph iSCSI is a prank?

2018-02-28 Thread Nico Schottelius
Max, I understand your frustration. However, last time I checked, ceph was open source. Some of you might not remember, but one major reason why open source is great is that YOU CAN DO your own modifications. If you need a change like iSCSI support and it isn't there, it is probably best, if

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-23 Thread Nico Schottelius
A very interesting question, and I would add a follow-up question: is there an easy way to add an external DB/WAL device to an existing OSD? I suspect it is something along the lines of: - stop the osd - create a link in ...ceph/osd/ceph-XX/block.db to the target device - (maybe run some
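(Newer ceph releases ship a ceph-bluestore-tool subcommand that does roughly this; a hedged sketch with placeholder OSD id and device, which may not be available on older Luminous builds:)

    systemctl stop ceph-osd@42
    ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-42 \
        --dev-target /dev/nvme0n1p1
    systemctl start ceph-osd@42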

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
It seems your monitor capabilities are different to mine: root@server3:/opt/ungleich-tools# ceph -k /var/lib/ceph/mon/ceph-server3/keyring -n mon. auth list 2018-02-16 20:34:59.257529 7fe0d5c6b700 0 librados: mon. authentication error (13) Permission denied [errno 13] error connecting to the

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
] error connecting to the cluster ... which kind of makes sense, as the mon. key does not have capabilities for it. Then again, I wonder how monitors actually talk to each other... Michel Raabe <rmic...@devnu11.net> writes: > On 02/16/18 @ 18:21, Nico Schottelius wrote: >> on a

[ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
Hello, on a test cluster I issued a few seconds ago: ceph auth caps client.admin mgr 'allow *' instead of what I really wanted to do ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \ mds allow Now any access to the cluster using client.admin correctly results in
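(The usual recovery attempt is to re-run ceph auth caps while authenticating with the mon. key on a monitor host instead of the now-crippled client.admin; a sketch using the keyring path from the follow-up message — as noted there, it only works if the mon. key is allowed to run auth commands:)

    ceph -k /var/lib/ceph/mon/ceph-server3/keyring -n mon. \
        auth caps client.admin \
        mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'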

[ceph-users] Is there a "set pool readonly" command?

2018-02-11 Thread Nico Schottelius
Hello, we have one pool in which about 10 disks failed last week (fortunately mostly sequentially), and which now has some pgs that are only left on one disk. Is there a command to set one pool into "read-only" mode or even "recovery io-only" mode, so that the only thing ceph is doing is
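(As far as I know there is no per-pool read-only switch in this release; the closest, cluster-wide, approximation is the pause flag, which stops client reads and writes while recovery I/O continues — a hedged sketch:)

    ceph osd set pause      # sets pauserd,pausewr: client I/O stops, recovery keeps running
    # ... wait for the degraded PGs to recover ...
    ceph osd unset pause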

[ceph-users] ceph-disk vs. ceph-volume: both error prone

2018-02-09 Thread Nico Schottelius
Dear list, for a few days we have been dissecting ceph-disk and ceph-volume to find out what the appropriate way of creating partitions for ceph is. For years I have found ceph-disk (and especially ceph-deploy) very error prone, and we at ungleich are considering rewriting both into a
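(For comparison, the ceph-volume path that avoids hand-crafted GPT partitions entirely; a minimal sketch assuming LVM-backed bluestore and a placeholder device name:)

    # prepare and activate in one step; ceph-volume creates the LVs itself
    ceph-volume lvm create --bluestore --data /dev/sdb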

[ceph-users] Inactive PGs rebuild is not priorized

2018-02-03 Thread Nico Schottelius
Good morning, after another disk failure we currently have 7 inactive pgs [1], which are stalling IO from the affected VMs. It seems that ceph, when rebuilding, does not focus on repairing the inactive PGs first, which surprised us quite a lot: it does not repair the inactive ones first, but mixes
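(Luminous added commands to push specific PGs to the front of the recovery queue; a hedged sketch with placeholder PG ids:)

    ceph pg dump_stuck inactive          # list the PGs that are currently inactive
    ceph pg force-recovery 2.1a 2.3f     # recover these before everything else
    ceph pg force-backfill 2.1a 2.3f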

Re: [ceph-users] [Best practise] Adding new data center

2018-01-29 Thread Nico Schottelius
Hey Wido, > [...] > Like I said, latency, latency, latency. That's what matters. Bandwidth > usually isn't a real problem. I imagined that. > What latency do you have with a 8k ping between hosts? As the link will be set up this week, I cannot tell yet. However, we currently have on a 65km

[ceph-users] [Best practise] Adding new data center

2018-01-29 Thread Nico Schottelius
Good evening list, we are soon expanding our data center [0] to a new location [1]. We are mainly offering VPS / VM Hosting, so rbd is our main interest. We have a low latency 10 Gbit/s link between our other location [2] and we are wondering, what is the best practise for expanding. Naturally

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
Hey Burkhard, we did actually restart osd.61, which led to the current status. Best, Nico Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> writes: > > On 01/23/2018 08:54 AM, Nico Schottelius wrote: >> Good morning, >> >> the osd.61 actually just

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
00", > "last_deep_scrub": "0'0", > "last_deep_scrub_stamp": "0.00", > "last_clean_scrub_stamp": "0.00", > "log_size": 0, >

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
ngth": "8" }, { "start": "10", "length": "2" } ], "history": { "epoch_created": 913

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
in the long run we ended up with full data integrity again. > > On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius < > nico.schottel...@ungleich.ch> wrote: > >> >> Hey David, >> >> thanks for the fast answer. All our pools are running with size=3, >> min_s

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
the cluster will > likely find all of the objects by the time it's done backfilling. With > only losing 2 disks, I wouldn't worry about the missing objects not > becoming found unless you're pool size=2. > > On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius < > nico.schottel...@unglei

[ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius
Hello, we added about 7 new disks yesterday/today and our cluster became very slow. While the rebalancing took place, 2 of the 7 new added disks died. Our cluster is still recovering, however we spotted that there are a lot of unfound objects. We lost osd.63 and osd.64, which seem not to be
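(A hedged sketch of inspecting unfound objects before deciding on anything destructive; the PG id is a placeholder and mark_unfound_lost is a last resort that discards data:)

    ceph health detail | grep unfound
    ceph pg 2.1a list_unfound     # which objects, and which OSDs might still hold them
    ceph pg 2.1a query            # peering state, including "might_have_unfound"
    # only if the objects are truly gone on every replica:
    # ceph pg 2.1a mark_unfound_lost revert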

[ceph-users] Adding Monitor ceph freeze, monitor 100% cpu usage

2018-01-06 Thread Nico Schottelius
Hello, our problems with ceph monitors continue in version 12.2.2: Adding a specific monitor causes all monitors to hang and not respond to ceph -s or similar anymore. Interestingly when this monitor is on (mon.server2), the other two monitors (mon.server3, mon.server5) randomly begin to

Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Nico Schottelius
Hello, we are running everything IPv6-only. You just need to set up the MTU on your devices (nics, switches) correctly; nothing ceph- or IPv6-specific is required. If you are using SLAAC (like we do), you can also announce the MTU via RA. Best, Nico Jack writes: > Or
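(A minimal sketch of the host side; interface name, MTU and address are examples, and with SLAAC the MTU can instead be announced by the router, e.g. via radvd's AdvLinkMTU option:)

    # jumbo frames on the storage NIC; the switch ports must allow at least this MTU
    ip link set dev eth0 mtu 9000
    # quick sanity check with a large ping, as suggested later in the thread
    ping6 -c 3 -s 8192 2a0a:e5c0::1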

Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-18 Thread Nico Schottelius
Hey Joao, thanks for the pointer! Do you have a timeline for the release of v12.2.2? Best, Nico -- Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-18 Thread Nico Schottelius
uck however - synchronizing is > progressing, albeit slowly. > > Can you please share the logs of the other monitors, especially of > those crashing? > > -Joao > > On 10/18/2017 06:58 AM, Nico Schottelius wrote: >> >> Hello everyone, >> >> is there any sol

Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-17 Thread Nico Schottelius
Hello everyone, is there any solution in sight for this problem? Currently our cluster is stuck with a 2-monitor configuration, as every time we restart the one on server2, it crashes after some minutes (and in between the cluster is stuck). Should we consider downgrading to kraken to fix that

Re: [ceph-users] [MONITOR SEGFAULT] Luminous cluster stuck when adding monitor

2017-10-09 Thread Nico Schottelius
Good morning Joao, thanks for your feedback! We do actually have three managers running:

  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            1/3 mons down, quorum server5,server3

  services:
    mon: 3 daemons, quorum server5,server3, out of quorum:

Re: [ceph-users] [CLUSTER STUCK] Luminous cluster stuck when adding monitor

2017-10-08 Thread Nico Schottelius
and now comes the not so funny part: restarting the monitor makes the cluster hang again. I will post another debug log in the next hours, now from the monitor on server2. Nico Schottelius <nico.schottel...@ungleich.ch> writes: > Not sure if I mentioned before: adding a new monitor

Re: [ceph-users] [CLUSTER STUCK] Luminous cluster stuck when adding monitor

2017-10-08 Thread Nico Schottelius
uot;: "server2", "addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0", "public_addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0" }, { "rank": 3, "name&q

Re: [ceph-users] Luminous cluster stuck when adding monitor

2017-10-07 Thread Nico Schottelius
monitors that was solely related to a > switch's MTU being too small. > > Maybe that could be the case? If not, I'll take a look at the logs as > soon as possible. > > -Joao > >> >> On Wed, Oct 4, 2017 at 1:04 PM Nico Schottelius >> <nico.schottel...@unglei

Re: [ceph-users] Luminous cluster stuck when adding monitor

2017-10-04 Thread Nico Schottelius
have ntpd running). We are running everything on IPv6, but this should not be a problem, should it? Best, Nico Nico Schottelius <nico.schottel...@ungleich.ch> writes: > Hello Gregory, > > the logfile I produced has already debug mon = 20 set: > > [21:03:51] server1:~#

Re: [ceph-users] Luminous cluster stuck when adding monitor

2017-10-04 Thread Nico Schottelius
igurable. > > On Wed, Oct 4, 2017 at 4:09 AM Nico Schottelius < > nico.schottel...@ungleich.ch> wrote: > >> >> Good morning, >> >> we have recently upgraded our kraken cluster to luminous and since then >> noticed an odd behaviour: we cannot add a m

[ceph-users] Luminous cluster stuck when adding monitor

2017-10-04 Thread Nico Schottelius
Good morning, we have recently upgraded our kraken cluster to luminous and since then noticed an odd behaviour: we cannot add a monitor anymore. As soon as we start a new monitor (server2), ceph -s and ceph -w start to hang. The situation became worse, since one of our staff stopped an
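(When ceph -s hangs, the monitors' admin sockets still answer locally; a hedged sketch of the usual first diagnostics — the monitor name is taken from the thread:)

    # state, election epoch and quorum view as seen by this monitor
    ceph daemon mon.server2 mon_status
    # raise monitor debugging while reproducing the hang
    ceph daemon mon.server2 config set debug_mon 20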

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-12 Thread Nico Schottelius
.ceph.com/issues/21353 >> [2] >> http://docs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication >> >> On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius >> <nico.schottel...@ungleich.ch> wrote: >>> >>> That indeed

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
s/master/rbd/rbd-openstack/#setup-ceph-client-authentication > > On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius > <nico.schottel...@ungleich.ch> wrote: >> >> That indeed worked! Thanks a lot! >> >> The remaining question from my side: did we do anything wrong

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
That indeed worked! Thanks a lot! The remaining question from my side: did we do anything wrong in the upgrade process and if not, should it be documented somewhere how to setup the permissions correctly on upgrade? Or should the documentation on the side of the cloud infrastructure software be
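(The referenced fix boils down to switching the client caps to the Luminous "profile rbd" form; a hedged example with placeholder client and pool names:)

    ceph auth caps client.libvirt \
        mon 'profile rbd' \
        osd 'profile rbd pool=one'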

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
ng else stands out. That >> "Exec format error" isn't actually an issue -- but now that I know >> about it, we can prevent it from happening in the future [1] >> >> [1] http://tracker.ceph.com/issues/21360 >> >> On Mon, Sep 11, 2017 at 4:32 PM, Nic

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
://www.nico.schottelius.org/ceph.client.libvirt.41670.log.bz2 I wonder if anyone sees the real reason for the I/O errors in the log? Best, Nico > Mykola Golub <mgo...@mirantis.com> writes: > >> On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote: >>> >>> Just

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
017-09-10 08:23, Nico Schottelius wrote: >> >> Good morning, >> >> yesterday we had an unpleasant surprise that I would like to discuss: >> >> Many (not all!) of our VMs were suddenly >> dying (qemu process exiting) and when trying to restart them, inside

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-11 Thread Nico Schottelius
> Regards, > Lionel > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Nico Schottelius >> Sent: Sunday, 10 September 2017 14:23 >> To: ceph-users <ceph-us...@ceph.com> >> Cc: kamila.souck...@u

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
configuration settings. > > On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius > <nico.schottel...@ungleich.ch> wrote: >> >> Hello Jason, >> >> I think there is a slight misunderstanding: >> There is only one *VM*, not one OSD left that we did not start

Re: [ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
is using librbd instead of a mapped krbd block device, > correct? If that is the case, can you add "debug-rbd=20" and "debug > objecter=20" to your ceph.conf and boot up your last remaining broken > OSD? > > On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius > &
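(What the suggested debugging looks like in ceph.conf on the hypervisor; a sketch, with the log path chosen as an example that matches the log file mentioned later in the thread:)

    [client]
        debug rbd = 20
        debug objecter = 20
        log file = /var/log/ceph/ceph.$name.$pid.log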

[ceph-users] RBD I/O errors with QEMU [luminous upgrade/osd change]

2017-09-10 Thread Nico Schottelius
Good morning, yesterday we had an unpleasant surprise that I would like to discuss: Many (not all!) of our VMs were suddenly dying (qemu process exiting) and when trying to restart them, inside the qemu process we saw i/o errors on the disks and the OS was not able to start (i.e. stopped in

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Nico Schottelius
Lionel, Christian, we have exactly the same trouble as Christian, namely Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]: We still don't know what caused this specific error... and ...there is currently no way to make ceph forget about the data of this pg and create it as

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Nico Schottelius
about your deployment: ceph version, kernel versions, OS, filesystem btrfs/xfs. Thx Jiri - Reply message - From: Nico Schottelius nico-eph-us...@schottelius.org To: ceph-users@lists.ceph.com Subject: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster

Re: [ceph-users] Hanging VMs with Qemu + RBD

2015-01-07 Thread Nico Schottelius
Hello Achim, good to hear someone else running this setup. We have changed the number of backfills using ceph tell osd.\* injectargs '--osd-max-backfills 1' and it seems to mostly avoid issues when rebalancing. One unsolved problem we have is machines kernel panic'ing when
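(The recovery knobs that are usually tuned together with osd-max-backfills; a hedged sketch, values purely illustrative:)

    ceph tell osd.\* injectargs \
        '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'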

[ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-07 Thread Nico Schottelius
. To tell the truth, I guess that will result in the end of our ceph project (running for 9 months already). Regards, Christian On 29.12.2014 15:59, Nico Schottelius wrote: Hey Christian, Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]: [incomplete PG / RBD hanging, osd

[ceph-users] Weights: Hosts vs. OSDs

2014-12-30 Thread Nico Schottelius
Good evening, for some time we have the problem that ceph stores too much data on a host with small disks. Originally we used weight 1 = 1 TB, but we reduced the weight for this particular host further to keep it somehow alive. Our setup currently consists of 3 hosts: wein: 6x 136G (fest
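(With the weight 1 = 1 TB convention, the per-OSD crush weights would be set roughly like this; the OSD id is a placeholder:)

    ceph osd crush reweight osd.12 0.136    # 136G disk; a 1 TB disk would be 1.0
    ceph osd tree                           # verify the host buckets sum to their real capacity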

Re: [ceph-users] Weights: Hosts vs. OSDs

2014-12-30 Thread Nico Schottelius
Hey Lindsay, Lindsay Mathieson [Wed, Dec 31, 2014 at 06:23:10AM +1000]: On Tue, 30 Dec 2014 05:07:31 PM Nico Schottelius wrote: While writing this I noted that the relation / factor is exactly 5.5 times wrong, so I *guess* that ceph treats all hosts with the same weight (even though

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Nico Schottelius
Hey Christian, Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]: [incomplete PG / RBD hanging, osd lost also not helping] that is very interesting to hear, because we had a similar situation with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg directories to allow OSDs

Re: [ceph-users] HEALTH_WARN 29 pgs degraded; 29 pgs stuck degraded; 133 pgs stuck unclean; 29 pgs stuck undersized;

2014-12-27 Thread Nico Schottelius
Hey Jiri, also raise the pgp_num (pg != pgp - it's easy to overlook). Cheers, Nico Jiri Kanicky [Sun, Dec 28, 2014 at 01:52:39AM +1100]: Hi, I just built my CEPH cluster but am having problems with the health of the cluster. Here are a few details: - I followed the ceph documentation. - I
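(Concretely — pool name and counts are placeholders; pgp_num must be raised to match pg_num before data is actually rebalanced onto the new PGs:)

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256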

Re: [ceph-users] Running instances on ceph with openstack

2014-12-23 Thread Nico Schottelius
Hello Ali Shah, we are running VMs using Opennebula with ceph as the backend. So far with varying results: From time to time VMs are freezing, probably panic'ing when the load is too high on the ceph storage due to rebalance work. We are experimenting with --osd-max-backfills 1, but it hasn't

Re: [ceph-users] Behaviour of a cluster with full OSD(s)

2014-12-23 Thread Nico Schottelius
Max, List, Max Power [Tue, Dec 23, 2014 at 12:34:54PM +0100]: [...Recovering from full osd ...] Normally the osd process quits then and I cannot restart it (even after setting the replicas back). The only possibility is to manually delete complete PG folders after exploring them with 'pg
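(Two non-destructive steps that are usually tried before deleting PG directories by hand; a hedged sketch — the OSD id is a placeholder and the full-ratio command changed between releases, e.g. Luminous uses 'ceph osd set-full-ratio':)

    ceph osd reweight 7 0.85        # temporarily push data away from the full OSD
    ceph pg set_full_ratio 0.97     # pre-Luminous; nudge the threshold up so the OSD can start and drain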

Re: [ceph-users] ceph-deploy state of documentation [was: OSD JOURNAL not associated - ceph-disk list ?]

2014-12-22 Thread Nico Schottelius
ceph-deploy, but it does use ceph-disk. Whenever I have problems with the ceph-disk command, I first go look at the cookbook to see how it's doing things. On Sun, Dec 21, 2014 at 10:37 AM, Nico Schottelius nico-ceph-us...@schottelius.org wrote: Hello list, I am a bit wondering about

[ceph-users] ceph-deploy state of documentation [was: OSD JOURNAL not associated - ceph-disk list ?]

2014-12-21 Thread Nico Schottelius
Hello list, I have been wondering a bit about ceph-deploy and the development of ceph: I see that many people in the community are pushing towards the use of ceph-deploy, likely to ease the use of ceph. However, I have run into issues multiple times using ceph-deploy, when it failed or incorrectly set up

[ceph-users] Hanging VMs with Qemu + RBD

2014-12-19 Thread Nico Schottelius
Hello, another issue we have experienced with qemu VMs (qemu 2.0.0) with ceph-0.80 on Ubuntu 14.04, managed by opennebula 4.10.1: the VMs are completely frozen when rebalancing takes place; they do not even respond to ping anymore. Looking at the qemu processes, they are in state Sl. Is this a