Hey Ansgar,
we have a similar "problem": in our case all servers are wiped on
reboot, as they boot their operating system from the network into
initramfs.
While the OS configuration is done with cdist [0], we consider ceph osds
more dynamic data and just re-initialise all osds on boot using the
idodh.nl/2018/01/placement-groups-with-ceph-luminous-stay-in-activating-state/
>
> 2018-03-17 12:15 GMT+03:00 Nico Schottelius <nico.schottel...@ungleich.ch>:
>
>>
>> Good morning,
>>
>> some days ago we created a new pool with 512 pgs, and originally 5 os
Good morning,
some days ago we created a new pool with 512 pgs, and originally 5 osds.
We use the device class "ssd" and a crush rule that maps all data for
the pool "ssd" to the osds with the ssd device class.
While creating the pool, one of the ssds failed, and we are left with 4 osds:
[10:00:22]
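As an aside, a device-class rule and pool like the ones described can be created along these lines; a hedged sketch, not our exact commands (rule name, root, pool name, and pg count are assumptions):

```shell
# Sketch: a replicated rule restricted to the 'ssd' device class,
# and a 512-pg pool that uses it (names are placeholders).
ceph osd crush rule create-replicated ssd default host ssd
ceph osd pool create ssd 512 512 replicated ssd
```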
Max,
I understand your frustration.
However, last time I checked, ceph was open source.
Some of you might not remember, but one major reason why open source is
great is that YOU CAN DO your own modifications.
If you need a change like iSCSI support and it isn't there,
it is probably best if
A very interesting question, and I would add the follow-up question:
is there an easy way to add an external DB/WAL device to an existing
OSD?
I suspect that it might be something along the lines of:
- stop osd
- create a link in ...ceph/osd/ceph-XX/block.db to the target device
- (maybe run some
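For what it's worth, later ceph releases grew a dedicated ceph-bluestore-tool command for exactly this; a hedged, untested sketch (the osd id and target device are placeholders, and the osd must be stopped first):

```shell
# Sketch: attach a new external DB device to an existing BlueStore OSD.
# Requires a release that ships 'bluefs-bdev-new-db'; id and device
# path are assumptions.
systemctl stop ceph-osd@12
ceph-bluestore-tool bluefs-bdev-new-db \
    --path /var/lib/ceph/osd/ceph-12 \
    --dev-target /dev/nvme0n1p1
systemctl start ceph-osd@12
```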
It seems your monitor capabilities are different to mine:
root@server3:/opt/ungleich-tools# ceph -k
/var/lib/ceph/mon/ceph-server3/keyring -n mon. auth list
2018-02-16 20:34:59.257529 7fe0d5c6b700 0 librados: mon. authentication error
(13) Permission denied
[errno 13] error connecting to the cluster
... which kind of makes sense, as the mon. key does not have
capabilities for it. Then again, I wonder how monitors actually talk to
each other...
Michel Raabe <rmic...@devnu11.net> writes:
> On 02/16/18 @ 18:21, Nico Schottelius wrote:
>> on a
Hello,
on a test cluster I issued a few seconds ago:
ceph auth caps client.admin mgr 'allow *'
instead of what I really wanted to do:
ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \
mds allow
Now any access to the cluster using client.admin correctly results in
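In case it helps others: a hedged sketch of one way back in, assuming the mon. entity in the monitor's on-disk keyring carries 'allow *' mon caps (if it does not, add that caps line to the keyring file and restart the monitor first; the keyring path below is a placeholder):

```shell
# Sketch, not a tested procedure: restore client.admin caps by
# authenticating as mon. with the monitor's own keyring.
ceph -n mon. -k /var/lib/ceph/mon/ceph-server3/keyring \
    auth caps client.admin \
    mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'
```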
Hello,
we have one pool in which about 10 disks failed last week (fortunately
mostly sequentially), and which now has some pgs that are only left on
one disk.
Is there a command to set one pool into "read-only" mode, or even
"recovery io-only" mode, so that the only thing ceph is doing is
Dear list,
for a few days we have been dissecting ceph-disk and ceph-volume to find
out what the appropriate way of creating partitions for ceph is.
For years I have found ceph-disk (and especially ceph-deploy) very
error prone, and we at ungleich are considering rewriting both into a
Good morning,
after another disk failure, we currently have 7 inactive pgs [1], which
are stalling IO from the affected VMs.
It seems that ceph, when rebuilding, does not focus on repairing
the inactive PGs first, which surprised us quite a lot:
it does not repair the inactive ones first, but mixes
Hey Wido,
> [...]
> Like I said, latency, latency, latency. That's what matters. Bandwidth
> usually isn't a real problem.
I imagined that.
> What latency do you have with a 8k ping between hosts?
As the link will be set up this week, I cannot tell yet.
However, currently we have on a 65km
Good evening list,
we are soon expanding our data center [0] to a new location [1].
We are mainly offering VPS / VM Hosting, so rbd is our main interest.
We have a low latency 10 Gbit/s link between our other location [2] and
we are wondering what the best practice for expanding is.
Naturally
Hey Burkhard,
we did actually restart osd.61, which led to the current status.
Best,
Nico
Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> writes:
> On 01/23/2018 08:54 AM, Nico Schottelius wrote:
>> Good morning,
>>
>> the osd.61 actually just
00",
> "last_deep_scrub": "0'0",
> "last_deep_scrub_stamp": "0.00",
> "last_clean_scrub_stamp": "0.00",
> "log_size": 0,
>
"length": "8"
},
{
"start": "10",
"length": "2"
}
],
"history": {
"epoch_created": 913
in the long run we ended up with full data integrity again.
>
> On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Hey David,
>>
>> thanks for the fast answer. All our pools are running with size=3,
>> min_s
the cluster will
> likely find all of the objects by the time it's done backfilling. With
> only losing 2 disks, I wouldn't worry about the missing objects not
> becoming found unless you're pool size=2.
>
> On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
> nico.schottel...@unglei
Hello,
we added about 7 new disks yesterday/today and our cluster became very
slow. While the rebalancing took place, 2 of the 7 newly added disks
died.
Our cluster is still recovering, however we spotted that there are a lot
of unfound objects.
We lost osd.63 and osd.64, which seem not to be
Hello,
our problems with ceph monitors continue in version 12.2.2:
Adding a specific monitor causes all monitors to hang and not respond to
ceph -s or similar anymore.
Interestingly, while this monitor (mon.server2) is up, the other two
monitors (mon.server3, mon.server5) randomly begin to
Hello,
we are running everything IPv6 only. You just need to set up the MTU on
your devices (nics, switches) correctly; nothing ceph- or IPv6-specific
is required.
If you are using SLAAC (like we do), you can also announce the MTU via
RA.
Best,
Nico
Jack writes:
> Or
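For illustration, announcing the MTU via RA boils down to a single radvd option; a minimal, untested config fragment (interface name, prefix, and MTU value are assumptions):

```conf
# /etc/radvd.conf (sketch): announce the link MTU to SLAAC clients
interface eth0
{
    AdvSendAdvert on;
    AdvLinkMTU 9000;
    prefix 2a0a:e5c0::/64
    {
        AdvAutonomous on;
    };
};
```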
Hey Joao,
thanks for the pointer! Do you have a timeline for the release of
v12.2.2?
Best,
Nico
--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
___
ceph-users mailing list
ceph-users@lists.ceph.com
uck however - synchronizing is
> progressing, albeit slowly.
>
> Can you please share the logs of the other monitors, especially of
> those crashing?
>
> -Joao
>
> On 10/18/2017 06:58 AM, Nico Schottelius wrote:
>>
>> Hello everyone,
>>
>> is there any sol
Hello everyone,
is there any solution in sight for this problem? Currently our cluster
is stuck in a 2-monitor configuration, as every time we restart the one on
server2, it crashes after some minutes (and in the meantime the cluster is stuck).
Should we consider downgrading to kraken to fix that
Good morning Joao,
thanks for your feedback! We do actually have three managers running:
cluster:
id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
health: HEALTH_WARN
1/3 mons down, quorum server5,server3
services:
mon: 3 daemons, quorum server5,server3, out of quorum:
and now comes
the not-so-funny part: restarting the monitor makes the cluster hang again.
I will post another debug log in the next hours, now from the monitor on
server2.
Nico Schottelius <nico.schottel...@ungleich.ch> writes:
> Not sure if I mentioned before: adding a new monitor
"name": "server2",
"addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0",
"public_addr": "[2a0a:e5c0::92e2:baff:fe4e:6614]:6789/0"
},
{
"rank": 3,
"name"
monitors that was solely related to a
> switch's MTU being too small.
>
> Maybe that could be the case? If not, I'll take a look at the logs as
> soon as possible.
>
> -Joao
>
>>
>> On Wed, Oct 4, 2017 at 1:04 PM Nico Schottelius
>> <nico.schottel...@unglei
have
ntpd running).
We are running everything on IPv6, but this should not be a problem,
should it?
Best,
Nico
Nico Schottelius <nico.schottel...@ungleich.ch> writes:
> Hello Gregory,
>
> the logfile I produced has already debug mon = 20 set:
>
> [21:03:51] server1:~#
igurable.
>
> On Wed, Oct 4, 2017 at 4:09 AM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Good morning,
>>
>> we have recently upgraded our kraken cluster to luminous and since then
>> noticed an odd behaviour: we cannot add a m
Good morning,
we have recently upgraded our kraken cluster to luminous and since then
noticed an odd behaviour: we cannot add a monitor anymore.
As soon as we start a new monitor (server2), ceph -s and ceph -w start to hang.
The situation became worse, since one of our staff stopped an
.ceph.com/issues/21353
>> [2]
>> http://docs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication
>>
>> On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius
>> <nico.schottel...@ungleich.ch> wrote:
>>>
>>> That indeed
s/master/rbd/rbd-openstack/#setup-ceph-client-authentication
>
> On Mon, Sep 11, 2017 at 5:16 PM, Nico Schottelius
> <nico.schottel...@ungleich.ch> wrote:
>>
>> That indeed worked! Thanks a lot!
>>
>> The remaining question from my side: did we do anything wrong
That indeed worked! Thanks a lot!
The remaining question from my side: did we do anything wrong in the
upgrade process, and if not, should it be documented somewhere how to
set up the permissions correctly on upgrade?
Or should the documentation on the side of the cloud infrastructure
software be
ng else stands out. That
>> "Exec format error" isn't actually an issue -- but now that I know
>> about it, we can prevent it from happening in the future [1]
>>
>> [1] http://tracker.ceph.com/issues/21360
>>
>> On Mon, Sep 11, 2017 at 4:32 PM, Nic
://www.nico.schottelius.org/ceph.client.libvirt.41670.log.bz2
I wonder if anyone sees the real reason for the I/O errors in the log.
Best,
Nico
> Mykola Golub <mgo...@mirantis.com> writes:
>
>> On Sun, Sep 10, 2017 at 03:56:21PM +0200, Nico Schottelius wrote:
>>>
>>> Just
017-09-10 08:23, Nico Schottelius wrote:
>>
>> Good morning,
>>
>> yesterday we had an unpleasant surprise that I would like to discuss:
>>
>> Many (not all!) of our VMs were suddenly
>> dying (qemu process exiting) and when trying to restart them, inside
> Regards,
> Lionel
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Nico Schottelius
>> Sent: dimanche 10 septembre 2017 14:23
>> To: ceph-users <ceph-us...@ceph.com>
>> Cc: kamila.souck...@u
configuration settings.
>
> On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius
> <nico.schottel...@ungleich.ch> wrote:
>>
>> Hello Jason,
>>
>> I think there is a slight misunderstanding:
>> There is only one *VM*, not one OSD left that we did not start
is using librbd instead of a mapped krbd block device,
> correct? If that is the case, can you add "debug-rbd=20" and "debug
> objecter=20" to your ceph.conf and boot up your last remaining broken
> OSD?
>
> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
> &
Good morning,
yesterday we had an unpleasant surprise that I would like to discuss:
Many (not all!) of our VMs were suddenly
dying (qemu process exiting), and when we tried to restart them, we saw
i/o errors on the disks inside the qemu process, and the OS was not able
to start (i.e. stopped in
Lionel, Christian,
we have exactly the same trouble as Christian,
namely
Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
We still don't know what caused this specific error...
and
...there is currently no way to make ceph forget about the data of this pg
and create it as
about your deployment: ceph version, kernel versions, OS, filesystem
btrfs/xfs.
Thx Jiri
- Reply message -
From: Nico Schottelius nico-eph-us...@schottelius.org
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete =
Cluster
Hello Achim,
good to hear someone else running this setup. We have changed the number
of backfills using
ceph tell osd.\* injectargs '--osd-max-backfills 1'
and it seems to mostly resolve the issues we saw when rebalancing.
One unsolved problem we have is machines kernel panic'ing, when
.
To tell the truth, I guess that will result in the end of our ceph
project (running already for 9 months).
Regards,
Christian
Am 29.12.2014 15:59, schrieb Nico Schottelius:
Hey Christian,
Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
[incomplete PG / RBD hanging, osd
Good evening,
for some time we have the problem that ceph stores too much data on
a host with small disks. Originally we used weight 1 = 1 TB, but
we reduced the weight for this particular host further to keep it
somehow alive.
Our setup currently consists of 3 hosts:
wein: 6x 136G (fest
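As a sketch of the weighting scheme mentioned above (1.0 per TB, so a 136 GB disk gets roughly 0.136); the osd ids are placeholders, not our actual layout:

```shell
# Sketch: CRUSH weight proportional to capacity, 1.0 per TB.
ceph osd crush reweight osd.0 0.136
ceph osd tree          # verify the effective weights
```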
Hey Lindsay,
Lindsay Mathieson [Wed, Dec 31, 2014 at 06:23:10AM +1000]:
On Tue, 30 Dec 2014 05:07:31 PM Nico Schottelius wrote:
While writing this I noted that the relation / factor is exactly 5.5 times
wrong, so I *guess* that ceph treats all hosts with the same weight (even
though
Hey Christian,
Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
[incomplete PG / RBD hanging, osd lost also not helping]
that is very interesting to hear, because we had a similar situation
with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg
directories to allow OSDs
Hey Jiri,
also raise the pgp_num (pg != pgp - it's easy to overlook).
Cheers,
Nico
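Concretely, the pg/pgp pair is raised in two steps; a hedged sketch (the pool name and pg count are assumptions):

```shell
# Sketch: raise pg_num first, then pgp_num to match, so the new
# placement groups are actually used for placement.
ceph osd pool set rbd pg_num 256
ceph osd pool set rbd pgp_num 256
ceph -s    # watch health return to OK once remapping finishes
```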
Jiri Kanicky [Sun, Dec 28, 2014 at 01:52:39AM +1100]:
Hi,
I just build my CEPH cluster but having problems with the health of
the cluster.
Here are few details:
- I followed the ceph documentation.
- I
Hello Ali Shah,
we are running VMs using Opennebula with ceph as the backend. So far
with varying results: From time to time VMs are freezing, probably
panic'ing when the load is too high on the ceph storage due to rebalance
work.
We are experimenting with --osd-max-backfills 1, but it hasn't
Max, List,
Max Power [Tue, Dec 23, 2014 at 12:34:54PM +0100]:
[...Recovering from full osd ...]
Normally
the osd process quits then and I cannot restart it (even after setting the
replicas back). The only possibility is to manually delete complete PG folders
after exploring them with 'pg
ceph-deploy, but it does use ceph-disk. Whenever
I have problems with the ceph-disk command, I first go look at the cookbook
to see how it's doing things.
On Sun, Dec 21, 2014 at 10:37 AM, Nico Schottelius
nico-ceph-us...@schottelius.org wrote:
Hello list,
I am a bit wondering about
Hello list,
I am a bit wondering about ceph-deploy and the development of ceph: I
see that many people in the community are pushing towards the use of
ceph-deploy, likely to ease use of ceph.
However, I have run into issues with ceph-deploy multiple times, when
it failed or incorrectly set up
Hello,
another issue we have experienced with qemu VMs
(qemu 2.0.0) with ceph-0.80 on Ubuntu 14.04
managed by opennebula 4.10.1:
The VMs are completely frozen when rebalancing takes place,
they do not even respond to ping anymore.
Looking at the qemu processes they are in state Sl.
Is this a