Hi Caspar,
I've been trying the mgr balancer for a couple weeks now and can share
some experience.
Currently there are two modes implemented: upmap and crush-compat.
Upmap requires all clients to be running luminous -- it uses this new
pg-upmap mechanism to precisely move PGs one by one to a
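For reference, turning the upmap balancer on is a short sequence. This is a sketch of the admin commands, not from the mail above; only run it after confirming every client actually reports luminous features:

```shell
# Sketch: enable the mgr balancer in upmap mode (Luminous v12.2.2+).
ceph features                                     # check client feature bits first
ceph osd set-require-min-compat-client luminous   # pg-upmap needs luminous clients
ceph mgr module enable balancer
ceph balancer mode upmap
ceph balancer on
ceph balancer status
```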
Hi Folks,
I'm just trying to get the prometheus plugin up and running, but as soon
as I browse /metrics I get:
500 Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.
Traceback (most recent call last):
File
On Friday 16 February 2018 00:21:34 CET Gregory Farnum wrote:
> If mon.0 is not connected to the cluster, the monitor version report won’t
> update — how could it?
>
> So you need to figure out why that’s not working. A monitor that’s running
> but isn’t part of the active set is not good.
Hi,
After watching Sage's talk at LinuxConfAU about making distributed storage
easy he mentioned the Balancer Manager module. After enabling this module,
PGs should get balanced automagically around the cluster.
The module was added in Ceph Luminous v12.2.2
Since I couldn't find much
Hi,
Radosgw decided to reshard a bucket with 25 million objects from 256 to 512
shards.
Resharding took about 1 hour, during this time all buckets on the cluster had a
huge performance drop.
"GET" requests for small objects (on other buckets) took multiple seconds.
Are there any
i just try to get the prometheus plugin up and running
Use the module from master.
From this commit it should work with 12.2.2; just wget it and replace
the stock module.
https://github.com/ceph/ceph/blob/d431de74def1b8889ad568ab99436362833d063e/src/pybind/mgr/prometheus/module.py
k
On 02/16/2018 07:16 AM, Kai Wagner wrote:
> yes there are plans to add management functionality to the dashboard as
> well. As soon as we've covered all the existing functionality to create
> the initial PR we'll start with the management stuff. The big benefit
> here is that we can profit from what
On Fri, Feb 16, 2018 at 09:27:08AM +0100, Ansgar Jazdzewski wrote:
Hi Folks,
i just try to get the prometheus plugin up and running but as soon as i
browse /metrics i got:
500 Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.
On Fri, Feb 16, 2018 at 07:06:21PM -0600, Graham Allan wrote:
[snip great debugging]
This seems similar to two open issues, could be either of them depending
on how old that bucket is.
http://tracker.ceph.com/issues/22756
http://tracker.ceph.com/issues/22928
- I have a mitigation posted to
I thought I'd follow up on this just in case anyone else experiences
similar issues. We ended up increasing the tcmalloc thread cache size and
saw a huge improvement in latency. This got us out of the woods because we
were finally in a state where performance was good enough that it was no
longer
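For anyone wanting to try the same, the tcmalloc thread cache is raised via an environment variable that the OSD unit files pick up. A config sketch; the 128 MB value is an assumption, tune it for your workload:

```shell
# /etc/sysconfig/ceph (RHEL/CentOS) or /etc/default/ceph (Debian/Ubuntu).
# Raise tcmalloc's thread cache from the 32 MB default, then restart the OSDs.
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
```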
On Wed, Feb 14, 2018 at 3:20 AM Maged Mokhtar wrote:
> Hi,
>
> You need to set the min_size to 2 in crush rule.
>
> The exact location and replication flow when a client writes data depends
> on the object name and num of pgs. the crush rule determines which osds
> will
Sorry to be posting a second mystery at the same time - though this
feels unconnected to my other one.
We had a user complain that they can't list the contents of one of their
buckets (they can access certain objects within the bucket).
I started by running a simple command to get data on
Thanks, I will have a look into it.
--
Ansgar
2018-02-16 10:10 GMT+01:00 Konstantin Shalygin :
>> i just try to get the prometheus plugin up and running
>
>
>
> Use module from master.
>
> From this commit it should work with 12.2.2; just wget it and replace
> the stock module.
>
>
Hello all,
I have a VM system with libvirt/KVM on top of a ceph storage system
I can't take an external snapshot (disk + RAM) of a running VM. The
option View->Snapshots is disabled in the virt-manager application. On
another VM on the same hypervisor, one that runs on local storage, I can take
Dear list, hello Jason,
you may have seen my message on the Ceph mailing list about RBD pool
migration - it's a common subject that pools were created in a
sub-optimal fashion, e.g. pg_num is not (yet) reducible, so we're
looking into means to "clone" an RBD pool into a new pool within
hi,
While I added the "class" to all my OSDs, the ceph-mgr crashed :-( but
the prometheus plugin works now
for i in {1..9}; do ceph osd crush set-device-class hdd osd.$i; done
Thanks,
Ansgar
2018-02-16 10:12 GMT+01:00 Jan Fajerski :
> On Fri, Feb 16, 2018 at 09:27:08AM
Thanks David.
> purging the objects and bypassing the GC is definitely the way to go
Cool.
> What rebalancing do you expect to see during this operation that you're
> trying to avoid
I think I just have a poor understanding or wasn't thinking very hard :) I
suppose the question really was
We upgraded the cluster from jewel to luminous,
but the restart of the ceph-osd service did not add the 'CLASS
(hdd/ssd)' on its own if it did not exist, so I had to add it myself to
make it work.
Should this somehow be mentioned in the upgrade process?
Like:
for all OSDs, make sure that the "class" is set
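A quick way to audit this after the upgrade, as a sketch: `hdd` below is an assumption, substitute `ssd` where appropriate, and only set a class on OSDs that actually lack one:

```shell
ceph osd crush class ls   # device classes currently defined
ceph osd tree             # a blank CLASS column means that OSD has no class
# set the class by hand where it is missing (hdd assumed here):
for i in $(ceph osd ls); do ceph osd crush set-device-class hdd osd.$i; done
```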
Colleagues, please tell me: who uses Puppet for deploying Ceph in
production?
And also, where can I get the Puppet modules for Ceph?
Александр Пивушков
___
ceph-users mailing list
ceph-users@lists.ceph.com
On Fri, Feb 16, 2018 at 5:36 AM, Jens-U. Mozdzen wrote:
> Dear list, hello Jason,
>
> you may have seen my message on the Ceph mailing list about RBD pool
> migration - it's a common subject that pools were created in a sub-optimum
> fashion and i. e. pgnum is (not yet)
I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.
I have created a number of cephfs file systems; the only one I've
started working on is the one named jenkins, which lives in
fs_jenkins_data and
On Fri, Feb 16, 2018 at 8:08 AM, Jason Dillaman wrote:
> On Fri, Feb 16, 2018 at 5:36 AM, Jens-U. Mozdzen wrote:
>> Dear list, hello Jason,
>>
>> you may have seen my message on the Ceph mailing list about RBD pool
>> migration - it's a common subject that
2018-02-16 10:16 GMT+01:00 Dan van der Ster :
> Hi Caspar,
>
> I've been trying the mgr balancer for a couple weeks now and can share
> some experience.
>
> Currently there are two modes implemented: upmap and crush-compat.
>
> Upmap requires all clients to be running
On Fri, Feb 16, 2018 at 8:27 AM, Ansgar Jazdzewski
wrote:
> Hi Folks,
>
> i just try to get the prometheus plugin up and running but as soon as i
> browse /metrics i got:
>
> 500 Internal Server Error
> The server encountered an unexpected condition which prevented it
Hi,
For every CONSECUTIVE run of the "ceph tell osd.x bench" command
I get different and MUCH worse results.
Is this expected?
The OSD was created with the following command (/dev/sda is an
Enterprise-class SSD):
ceph-deploy osd create --zap-disk --bluestore osd01:sdc --block-db
/dev/sda
Hi Jason,
... also forgot to mention "rbd export --export-format 2" / "rbd
import --export-format 2" that will also deeply export/import all
snapshots associated with an image and that feature is available in
the Luminous release.
thanks for that information, this could be very valuable for
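As a sketch of that deep export/import, the two commands can be piped directly between pools on Luminous; pool and image names here are placeholders:

```shell
# Copy an RBD image including all of its snapshots (Luminous+).
rbd export --export-format 2 oldpool/myimage - \
  | rbd import --export-format 2 - newpool/myimage
```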
On Fri, Feb 16, 2018 at 11:20 AM, Eugen Block wrote:
> Hi Jason,
>
>> ... also forgot to mention "rbd export --export-format 2" / "rbd
>> import --export-format 2" that will also deeply export/import all
>> snapshots associated with an image and that feature is available in
>> the
On Friday 16 February 2018 13:33:32 CET David Turner wrote:
> Can you send us a `ceph status` and `ceph health detail`? Something is
> still weird. Also can you query the running daemon for its version instead
> of asking the cluster? You should also be able to find it in the logs when
> it
On 02/16/2018 12:39 AM, F21 wrote:
I am uploading parts to RGW using the S3 multipart upload functionality.
I tried uploading a part sized at 500 KB and received an EntityTooSmall
error from the server. I am assuming that it expects each part to have
a minimum size of 5MB like S3.
I found
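For reference, client-side chunking only has to keep every part except the last at or above the 5 MiB floor. A minimal sketch; the constant mirrors S3's documented minimum, which RGW appears to enforce as well:

```python
MIN_PART = 5 * 1024 * 1024  # S3 minimum size for every part except the last

def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024):
    """Return (offset, length) tuples for a multipart upload.

    Any part_size >= MIN_PART keeps all non-final parts legal;
    only the last part may be smaller than 5 MiB.
    """
    if part_size < MIN_PART:
        raise ValueError("part_size must be at least 5 MiB")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((offset, length))
        offset += length
    return parts

print(plan_parts(17 * 1024 * 1024))  # three parts: 8 MiB, 8 MiB, 1 MiB
```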
Can you send us a `ceph status` and `ceph health detail`? Something is
still weird. Also can you query the running daemon for its version instead
of asking the cluster? You should also be able to find it in the logs when
it starts.
On Fri, Feb 16, 2018, 4:24 AM Mark Schouten
Hi there,
I have a Ceph cluster version 12.2.2 on CentOS 7.
It is a testing cluster that I set up 2 weeks ago.
After some days, I saw that one of the three mons had stopped (out of
quorum) and I can't start it anymore.
I checked the mon service log and the output shows this error:
"""
Well I decided to try the increase in PGs to 4096 and that seems to have caused
some issues:
2018-02-16 12:38:35.798911 mon.carf-ceph-osd01 [ERR] overall HEALTH_ERR
61802168/241154376 objects misplaced (25.628%); Reduced data availability: 2081
pgs inactive, 322 pgs peering; Degraded data
Hi.
I want to run my Ceph cluster in a 2 datacenter/room setup with pool
size/replica 3.
But I can't manage to define the ruleset correctly - or at least I
am unsure whether it is correct.
I have the following setup of my Ceph cluster:
> ID CLASS WEIGHT TYPE NAME
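For a size-3 pool across two datacenters, a commonly used pattern is to pick both datacenters and then two hosts in each, letting CRUSH keep the first three candidates (a 2+1 split). A sketch, assuming `datacenter` buckets exist in the map and the rule name and id are free:

```
rule replicated_two_dc {
    id 1
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}
```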
Hi.
during the reorganization of my cluster I removed some OSDs. Obviously
something went wrong for 2 of them, osd.19 and osd.20.
If I get my current CRUSH map, decompile and edit it, I see 2
orphaned/stale entries for the former OSDs:
> device 16 osd.16 class hdd
> device 17 osd.17 class hdd
Thanks David,
I have set the nobackfill, norecover, noscrub, and nodeep-scrub options at this
point and the backfills have stopped. I’ll also stop the backups from pushing
into ceph for now.
I don’t want to make things worse, so I'm asking for some more guidance now.
1) In looking at a PG
What does the cluster deployment look like? Usually this happens when
you’re sharing disks with the OS, or have co-located file journals or
something.
On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen <
flemming.frand...@stibosystems.com> wrote:
> I'm trying out cephfs and I'm in the process of
On 02/16/18 @ 18:59, Nico Schottelius wrote:
> Saw that, too, however it does not work:
>
> root@server3:/var/lib/ceph/mon/ceph-server3# ceph -n mon. --keyring keyring
> auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *'
> 2018-02-16 17:23:38.154282 7f7e257e3700 0 librados: mon.
On 02/16/2018 12:31 PM, Graham Allan wrote:
If I set debug rgw=1 and debug ms=1 before running the "object stat"
command, it seems to stall in a loop of trying communicate with osds for
pool 96, which is .rgw.control
10.32.16.93:0/2689814946 --> 10.31.0.68:6818/8969 --
Your problem might have been creating too many PGs at once. I generally
increase pg_num and pgp_num by no more than 256 at a time, making sure
that all PGs are created, peered, and healthy (other than backfilling)
before the next increase.
To help you get back to a healthy state, let's start off by getting all of
your
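The stepwise increase described above can be scripted. A sketch; the pool name and target are placeholders, and the health grep is only a crude "settled" check:

```shell
POOL=mypool TARGET=4096 STEP=256
CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
while [ "$CUR" -lt "$TARGET" ]; do
  NEXT=$((CUR + STEP)); [ "$NEXT" -gt "$TARGET" ] && NEXT=$TARGET
  ceph osd pool set "$POOL" pg_num "$NEXT"
  ceph osd pool set "$POOL" pgp_num "$NEXT"
  # wait until no PGs are creating/peering before the next step
  while ceph health detail | grep -Eq 'creating|peering'; do sleep 10; done
  CUR=$NEXT
done
```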
On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani
wrote:
> Hi there,
>
> I have a Ceph cluster version 12.2.2 on CentOS 7.
>
> It is a testing cluster and I have set it up 2 weeks ago.
> after some days, I see that one of the three mons has stopped(out of
> quorum) and I
Hi Peter,
On 02/15/18 @ 19:44, Jan Peters wrote:
> I want to evaluate ceph with bluestore, so I need some hardware/configure
> advices from you.
>
> My Setup should be:
>
> 3 Nodes Cluster, on each with:
>
> - Intel Gold Processor SP 5118, 12 core / 2.30Ghz
> - 64GB RAM
> - 6 x 7,2k, 4 TB
I have a question about block.db and block.wal.
How big should they be?
Relative to the data drive size or the SSD size?
Thanks Joe
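There is no single official answer in the Luminous-era docs; a commonly cited rule of thumb is block.db at roughly 4% of the data device, with block.wal in the ~1-2 GB range. A sketch of the arithmetic; the 4% figure is an assumption and workload-dependent (RGW-heavy clusters may need more):

```python
GiB = 2**30

def suggested_db_bytes(data_bytes: int, pct: float = 0.04) -> int:
    """Rule-of-thumb block.db size: ~4% of the data device."""
    return int(data_bytes * pct)

# e.g. a 4 TB (4 * 10**12 byte) spinner:
db = suggested_db_bytes(4 * 10**12)
print(round(db / GiB, 1))  # ~149 GiB
```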
>>> Michel Raabe 2/16/2018 9:12 AM >>>
Hi Peter,
On 02/15/18 @ 19:44, Jan Peters wrote:
> I want to evaluate ceph with bluestore, so I need some
Hello,
on a test cluster I issued a few seconds ago:
ceph auth caps client.admin mgr 'allow *'
instead of what I really wanted to do
ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \
mds allow
Now any access to the cluster using client.admin correctly results in
Saw that, too, however it does not work:
root@server3:/var/lib/ceph/mon/ceph-server3# ceph -n mon. --keyring keyring
auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *'
2018-02-16 17:23:38.154282 7f7e257e3700 0 librados: mon. authentication error
(13) Permission denied
[errno
Hi,
thank you.
The network setup is like this:
2 x 10 GBit LACP for public
2 x 10 GBit LACP for clusternetwork
1 x 1 GBit for management
Yes Joe, the sizing for block.db and block.wal would be interesting!
Is there other advice for SSDs, like the blog from Sébastien Han?:
On 02/16/18 @ 18:21, Nico Schottelius wrote:
> on a test cluster I issued a few seconds ago:
>
> ceph auth caps client.admin mgr 'allow *'
>
> instead of what I really wanted to do
>
> ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \
> mds allow
>
> Now any access
On 16.02.2018 06:20, Laszlo Budai wrote:
Hi,
I've just started up the dashboard component of the ceph mgr. It looks
OK, but from what can be seen, and what I was able to find in the
docs, the dashboard is just for monitoring. Is there any plugin that
allows management of the ceph resources
On 02/15/2018 05:33 PM, Gregory Farnum wrote:
On Thu, Feb 15, 2018 at 3:10 PM Graham Allan wrote:
A lot more in xattrs which I won't paste, though the keys are:
> root@cephmon1:~# ssh ceph03 find
What is the output of `ceph osd stat`? My guess is that they are still
considered to be part of the cluster and going through the process of
removing OSDs from your cluster is what you need to do. In particular
`ceph osd rm 19`.
On Fri, Feb 16, 2018 at 2:31 PM Karsten Becker
Thanks for your reply
Do you mean that the problem is with the disk I use for WAL and DB?
On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum wrote:
>
> On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani
> wrote:
>
>> Hi there,
>>
>> I have a Ceph
The questions I definitely know the answer to first, and then we'll
continue from there. If an OSD is blocking peering but is online, when you
mark it as down in the cluster it receives a message in its log saying it
was wrongly marked down and tells the mons it is online. That gets it to
stop
That sounds like a good next step. Start with OSDs involved in the longest
blocked requests. Wait a couple minutes after the osd marks itself back up
and continue through them. Hopefully things will start clearing up so that
you don't need to mark all of them down. There is usually only a
The disk that the monitor is on...there isn't anything for you to configure
about a monitor WAL though so I'm not sure how that enters into it?
On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani
wrote:
> Thanks for your reply
>
> Do you mean, that's the problem with the
It seems your monitor capabilities are different to mine:
root@server3:/opt/ungleich-tools# ceph -k
/var/lib/ceph/mon/ceph-server3/keyring -n mon. auth list
2018-02-16 20:34:59.257529 7fe0d5c6b700 0 librados: mon. authentication error
(13) Permission denied
[errno 13] error connecting to the
Hi David.
So far everything else is fine.
> 46 osds: 46 up, 46 in; 1344 remapped pgs
And the rm gives:
> root@kong[/0]:~ # ceph osd rm 19
> osd.19 does not exist.
> root@kong[/0]:~ # ceph osd rm 20
> osd.20 does not exist.
The "devices" do NOT show up in "ceph osd tree" or "ceph osd df"...
Thanks David,
Taking the list of all OSDs that are stuck reports that a little over 50% of
all OSDs are in this condition. There isn’t any discernible pattern that I can
find and they are spread across the three servers. All of the OSDs are online
as far as the service is concerned.
I have
Here is what I did - bash history:
> 1897 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd down osd.$n; done
> 1920 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd out osd.$n; done
> 1921 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd down osd.$n;
First you stop the service, then make sure they're down, out, crush remove,
auth del, and finally osd rm. You had it almost in the right order, but
you were down and outing them before you stopped them. That would allow
them to mark themselves back up and in. The down and out commands don't
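Spelled out per OSD id, the order described above looks like this. A sketch; `$n` is the OSD number, and the systemd unit name may differ on non-systemd setups:

```shell
n=19
systemctl stop ceph-osd@$n   # stop first, so the daemon can't mark itself back up/in
ceph osd down osd.$n
ceph osd out osd.$n
ceph osd crush remove osd.$n
ceph auth del osd.$n
ceph osd rm osd.$n           # finally remove the id itself
```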
On Fri, Feb 16, 2018 at 12:17 PM Graham Allan wrote:
> On 02/16/2018 12:31 PM, Graham Allan wrote:
> >
> > If I set debug rgw=1 and debug ms=1 before running the "object stat"
> > command, it seems to stall in a loop of trying communicate with osds for
> > pool 96, which is
I checked the disk that the monitor is on with smartctl; it didn't return
any errors and it doesn't have any Current_Pending_Sector.
Do you recommend any disk checks to make sure that this disk has a problem,
so that I can send the report to the provider for replacing the disk?
On Sat, Feb 17, 2018
On Thu, Feb 15, 2018 at 10:28 AM Cary wrote:
> Hello,
>
> I have enabled debugging on my MONs and OSDs to help troubleshoot
> these signature check failures. I was watching ods.4's log and saw
> these errors when the signature check failure happened.
>
> 2018-02-15