Hello,
we have had some trouble with OSDs running full,
even after rebalancing. So at 100% usage and the ceph-osds not starting
anymore, we decided to delete some PG directories, after which
rebalancing finished.
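For the archives, a few commands that help to spot full OSDs before it
gets that far; a minimal sketch, assuming hammer or newer and an example
OSD id:

    ceph health detail
    ceph osd df tree
    ceph osd reweight 12 0.8   # temporarily shift data off a nearly full OSD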
However, after this we have the situation that one PG is not
becoming clean anymore.
We t
Hello,
another issue we have experienced with qemu VMs
(qemu 2.0.0) with ceph-0.80 on Ubuntu 14.04
managed by OpenNebula 4.10.1:
The VMs are completely frozen when rebalancing takes place;
they do not even respond to ping anymore.
Looking at the qemu processes they are in state "Sl".
Is this a
Hello list,
I have been wondering a bit about "ceph-deploy" and the development of ceph: I
see that many people in the community are pushing towards the use of
ceph-deploy, likely to ease the use of ceph.
However, I have run into issues multiple times using ceph-deploy, when
it failed or incorrectly set up p
fork of the Ceph cookbook. The
> Ceph cookbook doesn't use ceph-deploy, but it does use ceph-disk. Whenever
> I have problems with the ceph-disk command, I first go look at the cookbook
> to see how it's doing things.
>
>
>
> On Sun, Dec 21, 2014 at 10:37 AM,
Hello Ali Shah,
we are running VMs using OpenNebula with ceph as the backend. So far
with varying results: from time to time VMs are freezing, probably
panicking when the load on the ceph storage is too high due to rebalance
work.
We are experimenting with --osd-max-backfills 1, but it hasn't sol
Max, List,
Max Power [Tue, Dec 23, 2014 at 12:34:54PM +0100]:
> [...Recovering from full osd ...]
>
> Normally
> the osd process quits then and I cannot restart it (even after setting the
> replicas back). The only possibility is to manually delete complete PG folders
> after exploring them with
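A sketch of an alternative that sometimes avoids deleting PG directories:
temporarily raising the full ratio so the OSDs can start and rebalance
(pre-luminous syntax; use with care and lower it again afterwards):

    ceph pg set_full_ratio 0.97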
Hey Jiri,
also raise the pgp_num (pg != pgp - it's easy to overlook).
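Something along these lines; pool name and target count are examples:

    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512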
Cheers,
Nico
Jiri Kanicky [Sun, Dec 28, 2014 at 01:52:39AM +1100]:
> Hi,
>
> I just built my CEPH cluster but am having problems with the health of
> the cluster.
>
> Here are few details:
> - I followed the ceph documentation.
Hey Christian,
Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
> [incomplete PG / RBD hanging, osd lost also not helping]
that is very interesting to hear, because we had a similar situation
with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
directories to allow OSDs
Good evening,
for some time we have had the problem that ceph stores too much data on
a host with small disks. Originally we used weight 1 = 1 TB, but
we have reduced the weight for this particular host further to keep it
somehow alive.
Our setup currently consists of 3 hosts:
wein: 6x 136G (fest dis
Hey Lindsay,
Lindsay Mathieson [Wed, Dec 31, 2014 at 06:23:10AM +1000]:
> On Tue, 30 Dec 2014 05:07:31 PM Nico Schottelius wrote:
> > While writing this I noted that the relation / factor is exactly 5.5 times
> > wrong, so I *guess* that ceph treats all hosts with the same weight (
r to create a new pool along the
> old one to at least enable our clients to send data to ceph again.
>
> To tell the truth, I guess that will result in the end of our ceph
> project (running for already 9 months).
>
> Regards,
> Christian
>
> On 29.12.2014 at 15:59,
Hello Achim,
good to hear from someone else running this setup. We have changed the number
of backfills using
ceph tell osd.\* injectargs '--osd-max-backfills 1'
and it mostly seems to help with issues during rebalancing.
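To verify the injected value actually took effect, the admin socket can
be queried on each OSD host; a sketch, assuming osd.0 runs locally:

    ceph daemon osd.0 config get osd_max_backfills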
One unsolved problem we have is machines kernel panicking when I/O
t know why you were getting kernel panics. It's probably advisable to
> stick to the most recent mainline kernel when using kRBD.
>
> Cheers, Dan
>
> On 7 Jan 2015 20:45, Nico Schottelius wrote:
> Good evening,
>
> we also tried to rescue data *from* our old
Lionel, Christian,
we have exactly the same trouble as Christian,
namely
Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
> We still don't know what caused this specific error...
and
> ...there is currently no way to make ceph forget about the data of this pg
> and create it as
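For the archives: pre-luminous releases do have a command that is meant
to do roughly that, although it reportedly often stalls in "creating"
while OSDs still hold data for the PG; a sketch, the pgid is an example:

    ceph pg force_create_pg 2.5f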
more info
> about your deployment: ceph version, kernel versions, OS, filesystem
> btrfs/xfs.
>
> Thx Jiri
>
> ----- Reply message -----
> From: "Nico Schottelius"
> To:
> Subject: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete =
>
Good morning,
yesterday we had an unpleasant surprise that I would like to discuss:
Many (not all!) of our VMs were suddenly
dying (qemu process exiting) and when trying to restart them, inside the
qemu process we saw I/O errors on the disks and the OS was not able to
start (i.e. stopped in init
f a mapped krbd block device,
> correct? If that is the case, can you add "debug-rbd=20" and "debug
> objecter=20" to your ceph.conf and boot up your last remaining broken
> OSD?
>
> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
> wrote:
>>
>>
ttings.
>
> On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius
> wrote:
>>
>> Hello Jason,
>>
>> I think there is a slight misunderstanding:
>> There is only one *VM*, not one OSD left that we did not start.
>>
>> Or does librbd also read ceph.conf
> Regards,
> Lionel
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Nico Schottelius
>> Sent: Sunday, 10 September 2017 14:23
>> To: ceph-users
>> Cc: kamila.souck...@ungleich.ch
>>
Sarunas,
may I ask when this happened?
And did you move OSDs or mons after that export/import procedure?
I really wonder what the reason for this behaviour is, and whether we
are likely to experience it again.
Best,
Nico
Sarunas Burdulis writes:
> On 2017-09-10 08:23, Nico Schottel
Hey Mykola,
thanks for the hint, I will test this in a few hours when I'm back on a
regular Internet connection!
Best,
Nico
Mykola Golub writes:
> On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote:
>>
>> Just tried and there is not much more log in ceph -w
Log at
http://www.nico.schottelius.org/ceph.client.libvirt.41670.log.bz2
I wonder if anyone sees the real reason for the I/O errors in the log?
Best,
Nico
> Mykola Golub writes:
>
>> On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote:
>>>
>>> Just tried and there is
format error" isn't actually an issue -- but now that I know
>> about it, we can prevent it from happening in the future [1]
>>
>> [1] http://tracker.ceph.com/issues/21360
>>
>> On Mon, Sep 11, 2017 at 4:32 PM, Nico Schottelius
>> wrote:
>>
looks like your "client.libvirt" user lacks the permission to
> blacklist a dead client that had previously acquired the exclusive
> lock and failed to release it.
>
> Can you provide the results from "ceph auth get client.libvirt"? I
> suspect it only has 'cap
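For reference, the fix amounts to granting the rbd profile caps, which
include the blacklist permission (see the rbd/openstack documentation);
a sketch, the pool name is an example:

    ceph auth get client.libvirt
    ceph auth caps client.libvirt mon 'profile rbd' osd 'profile rbd pool=one'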
That indeed worked! Thanks a lot!
The remaining question from my side: did we do anything wrong in the
upgrade process, and if not, should it be documented somewhere how to
set up the permissions correctly on upgrade?
Or should the documentation on the side of the cloud infrastructure
software be
penstack/#setup-ceph-client-authentication
>
> On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius
> wrote:
>>
>> That indeed worked! Thanks a lot!
>>
>> The remaining question from my side: did we do anything wrong in the
>> upgrade process and if not, should it be documen
ocs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication
>>
>> On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius
>> wrote:
>>>
>>> That indeed worked! Thanks a lot!
>>>
>>> The remaining question from my side: did we d
Good morning,
we have recently upgraded our kraken cluster to luminous and have since
noticed an odd behaviour: we cannot add a monitor anymore.
As soon as we start a new monitor (server2), ceph -s and ceph -w start to hang.
The situation became worse when one of our staff stopped an existing
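When ceph -s hangs, the admin socket on the monitor host usually still
answers; a minimal sketch to inspect the stuck mon, assuming default
socket paths:

    ceph daemon mon.server2 mon_status
    ceph daemon mon.server2 quorum_status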
lient
> connections because it's been out of quorum for too long, which is the
> correct behavior in general. I'd imagine that you've got clients trying to
> connect to the new monitor instead of the ones already in the quorum and
> not passing around correctly; this is a
have
ntpd running).
We are running everything on IPv6, but this should not be a problem,
should it?
Best,
Nico
Nico Schottelius writes:
> Hello Gregory,
>
> the logfile I produced has already debug mon = 20 set:
>
> [21:03:51] server1:~# grep "debug mon" /etc/ceph/c
ith the monitors that was solely related to a
> switch's MTU being too small.
>
> Maybe that could be the case? If not, I'll take a look at the logs as
> soon as possible.
>
> -Joao
>
>>
>> On Wed, Oct 4, 2017 at 1:04 PM Nico Schottelius
>> mailto:
        "name": "server2",
        "addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0",
        "public_addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0"
    },
    {
        "rank": 3,
        "name"
and now comes
the not-so-funny part: restarting the monitor makes the cluster hang again.
I will post another debug log in the next few hours, this time from the monitor on
server2.
Nico Schottelius writes:
> Not sure if I mentioned before: adding a new monitor also puts the whole
> cluster into
Good morning Joao,
thanks for your feedback! We do actually have three managers running:
  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            1/3 mons down, quorum server5,server3

  services:
    mon: 3 daemons, quorum server5,server3, out of quorum: s
Hello everyone,
is there any solution in sight for this problem? Currently our cluster
is stuck in a 2-monitor configuration, as every time we restart the one on
server2, it crashes after some minutes (and in the meantime the cluster is stuck).
Should we consider downgrading to kraken to fix that probl
nizing is
> progressing, albeit slowly.
>
> Can you please share the logs of the other monitors, especially of
> those crashing?
>
> -Joao
>
> On 10/18/2017 06:58 AM, Nico Schottelius wrote:
>>
>> Hello everyone,
>>
>> is there any solutio
Hey Joao,
thanks for the pointer! Do you have a timeline for the release of
v12.2.2?
Best,
Nico
--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
Hello,
we are running everything IPv6-only. You just need to set up the MTU on
your devices (NICs, switches) correctly; nothing ceph- or IPv6-specific is
required.
If you are using SLAAC (like we do), you can also announce the MTU via
RA.
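A sketch of the relevant radvd.conf fragment, assuming radvd, a 9000
byte MTU and an example prefix:

    interface eth0 {
        AdvSendAdvert on;
        AdvLinkMTU 9000;
        prefix 2a0a:e5c0::/64 {
        };
    };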
Best,
Nico
Jack writes:
> Or maybe you reach that ipv4
Hello,
our problems with ceph monitors continue in version 12.2.2:
adding a specific monitor causes all monitors to hang and not respond to
ceph -s or similar anymore.
Interestingly, when this monitor is up (mon.server2), the other two
monitors (mon.server3, mon.server5) randomly begin to consum
Hello,
we added about 7 new disks yesterday/today and our cluster became very
slow. While the rebalancing took place, 2 of the 7 newly added disks
died.
Our cluster is still recovering; however, we spotted that there are a lot
of unfound objects.
We lost osd.63 and osd.64, which seem not to be inv
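The usual commands to inspect unfound objects, a sketch (the pgid is an
example):

    ceph health detail            # lists PGs with unfound objects
    ceph pg 2.5 list_missing      # shows which objects are unfound
    # last resort, only after recovery cannot find them anywhere:
    ceph pg 2.5 mark_unfound_lost revert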
find all of the objects by the time it's done backfilling. With
> only losing 2 disks, I wouldn't worry about the missing objects not
> becoming found unless your pool size=2.
>
> On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
> nico.schottel...@ungleich.ch&g
th full data integrity again.
>
> On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Hey David,
>>
>> thanks for the fast answer. All our pools are running with size=3,
>> min_size=2 and the two disks were
rrors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap&q
"last_epoch_clean": 0,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "0'0",
> "last_scrub_stamp": "0.00"
Hey Burkhard,
we did actually restart osd.61, which led to the current status.
Best,
Nico
Burkhard Linke writes:

> On 01/23/2018 08:54 AM, Nico Schottelius wrote:
>> Good morning,
>>
>> the osd.61 actually just crashed and the disk is still intact. However,
>>
Good evening list,
we are soon expanding our data center [0] to a new location [1].
We are mainly offering VPS / VM hosting, so rbd is our main interest.
We have a low-latency 10 Gbit/s link to our other location [2], and
we are wondering what the best practice for expanding is.
Naturally
Hey Wido,
> [...]
> Like I said, latency, latency, latency. That's what matters. Bandwidth
> usually isn't a real problem.
I imagined that.
> What latency do you have with a 8k ping between hosts?
As the link will only be set up this week, I cannot tell yet.
However, currently we have on a 65km lin
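For the archives, the 8k ping boils down to (standard iputils; host is a
placeholder):

    ping6 -c 10 -s 8192 <remote-host>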
Good morning,
after another disk failure, we currently have 7 inactive PGs [1], which
are stalling IO from the affected VMs.
It seems that ceph, when rebuilding, does not focus on repairing
the inactive PGs first, which surprised us quite a lot:
it does not repair the inactive ones first, but mixes i
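Since luminous there is at least a way to bump the priority of the PGs
one cares about; a sketch, the pgids are examples:

    ceph pg dump_stuck inactive
    ceph pg force-recovery 2.17 2.2a
    ceph pg force-backfill 2.17 2.2a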
Dear list,
for a few days we have been dissecting ceph-disk and ceph-volume to find
out what the appropriate way of creating partitions for ceph is.
For years already I have found ceph-disk (and especially ceph-deploy) very
error prone, and we at ungleich are considering rewriting both into a
ceph-block-do
Hello,
we have one pool in which about 10 disks failed last week (fortunately
mostly sequentially), and which now has some PGs that are left on only
one disk.
Is there a command to set one pool into "read-only" mode, or even
"recovery-io-only" mode, so that the only thing it is doing is
recover
Hello,
on a test cluster I issued a few seconds ago:
ceph auth caps client.admin mgr 'allow *'
instead of what I really wanted to do:
ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \
mds 'allow *'
Now any access to the cluster using client.admin correctly results in
cl
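Note for the archives: "ceph auth caps" replaces the entire cap set
instead of merging it, which is exactly how such a lockout happens. The
complete command would have been:

    ceph auth caps client.admin mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'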
3) Permission denied
[errno 13] error connecting to the cluster
... which kind of makes sense, as the mon. key does not have
capabilities for it. Then again, I wonder how monitors actually talk to
each other...
Michel Raabe writes:
> On 02/16/18 @ 18:21, Nico Schottelius wrote:
>> on a
It seems your monitor capabilities are different to mine:
root@server3:/opt/ungleich-tools# ceph -k
/var/lib/ceph/mon/ceph-server3/keyring -n mon. auth list
2018-02-16 20:34:59.257529 7fe0d5c6b700 0 librados: mon. authentication error
(13) Permission denied
[errno 13] error connecting to the c
A very interesting question, and I would add a follow-up question:
Is there an easy way to add external DB/WAL devices to an existing
OSD?
I suspect that it might be something along the lines of:
- stop the osd
- create a link in ...ceph/osd/ceph-XX/block.db to the target device
- (maybe run some
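Newer releases ship a ceph-bluestore-tool subcommand for exactly this; a
sketch, untested here, with example OSD id and device (check
ceph-bluestore-tool --help on your version):

    systemctl stop ceph-osd@12
    ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-12 --dev-target /dev/nvme0n1p1
    systemctl start ceph-osd@12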
Max,
I understand your frustration.
However, last time I checked, ceph was open source.
Some of you might not remember, but one major reason why open source is
great is that YOU CAN DO your own modifications.
If you need a change like iSCSI support and it isn't there,
it is probably best if yo
Good morning,
some days ago we created a new pool with 512 PGs and originally 5 OSDs.
We use the device class "ssd" and a crush rule that maps all data for
the pool "ssd" to the ssd device class OSDs.
While creating it, one of the SSDs failed and we are left with 4 OSDs:
[10:00:22] server2.place6
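For context: if the pool has size=3, 512 PGs on 4 OSDs is far above the
luminous default mon_max_pg_per_osd of 200, which leaves PGs stuck
activating. A sketch of the workaround described in the article linked
below; the value is an example, raise it with care:

    ceph tell mon.\* injectargs '--mon_max_pg_per_osd 400'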
018/01/placement-groups-with-ceph-luminous-stay-in-activating-state/
>
> 2018-03-17 12:15 GMT+03:00 Nico Schottelius :
>
>>
>> Good morning,
>>
>> some days ago we created a new pool with 512 pgs, and originally 5 osds.
>> We use the device class "
Hey Ansgar,
we have a similar "problem": in our case all servers are wiped on
reboot, as they boot their operating system from the network into an
initramfs.
While the OS configuration is done with cdist [0], we consider ceph osds
more dynamic data and just re-initialise all osds on boot using the
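A minimal sketch of such a re-initialisation with ceph-volume based OSDs
(luminous; assuming the OSD LVs survive the wipe):

    ceph-volume lvm activate --all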