Hi all,
I have set up a Ceph cluster in my lab recently. As far as I understand, the
configuration should be okay: 4 OSDs across 3 nodes, 3 replicas. But a couple of
PGs are stuck in state "active+undersized+degraded". I think this is a fairly
common issue; could anyone help me out?
Here is the
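For questions like this, a typical first triage looks something like the
following (standard Ceph CLI commands; the PG id below is a placeholder, not
one from this cluster):

```shell
# Overall health and the specific PGs affected
ceph health detail
ceph pg dump_stuck undersized

# Check how OSDs map to hosts: with size 3 and a per-host failure
# domain, CRUSH must find three distinct hosts for every PG, so an
# uneven OSD/host layout can leave PGs undersized
ceph osd tree

# Detailed state of one affected PG (placeholder id)
ceph pg 1.2f query
```

The `ceph osd tree` output in particular usually shows whether the replica
count can actually be satisfied by the current host/OSD layout.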
Hey Igor, the patch that you pointed to worked for me.
Thanks again.
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR
As part of the repair operation, Ceph runs a deep-scrub on the PG. If the PG
showed active+clean after the repair and deep-scrub finished, then the next
run of a scrub on the PG shouldn't change the PG status at all.
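For reference, the repair/scrub cycle described above can be driven manually
like this (the PG id is a placeholder):

```shell
# Trigger a repair; a deep-scrub of the PG runs as part of it
ceph pg repair 1.2f

# Watch the PG state until it returns to active+clean
ceph pg 1.2f query | grep '"state"'

# A subsequent scrub or deep-scrub should then leave the status unchanged
ceph pg scrub 1.2f
ceph pg deep-scrub 1.2f
```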
On Wed, Jun 6, 2018 at 8:57 PM Adrian wrote:
> Update to this.
>
> The affected pg
We originally used pacemaker to move a VIP between our RGWs, but ultimately
decided to go with an LB in front of them. With an LB you can utilize both
RGWs while they're up, but the LB will shy away from either if they're down
until the check starts succeeding for that host again. We do have 2
Hi Igor,
Great! Thanks for the quick response.
Will try the fix and let you know how it goes.
-Raj
From: ceph-users On Behalf Of Igor Fedotov
Sent: 20 June 2018 21:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] issues with ceph nautilus version
Hi Raju,
This is a bug in new
Hi Raju,
This is a bug in new BlueStore's bitmap allocator.
This PR will most probably fix that:
https://github.com/ceph/ceph/pull/22610
Also you may try to switch bluestore and bluefs allocators
(bluestore_allocator and bluefs_allocator parameters respectively) to
stupid and restart
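A minimal sketch of that allocator switch in ceph.conf, using the parameter
names given above (a workaround until the fix lands, not the fix itself):

```ini
# ceph.conf on the OSD hosts -- switch both allocators to "stupid"
[osd]
bluestore_allocator = stupid
bluefs_allocator = stupid
```

Followed by restarting the OSDs on each host, e.g. with
`systemctl restart ceph-osd.target`.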
I've also seen something similar with Luminous once, on broken OSDs
reporting nonsense stats that overflowed some variables, reporting 1000% full.
In my case it was BlueStore OSDs running in very small VMs.
Paul
2018-06-20 17:41 GMT+02:00 Raju Rangoju :
> Hi,
>
>
>
> Recently I have
Yeah, your tunables are ancient. Probably wouldn't have happened with
modern ones.
If this was my cluster I would probably update the clients and update that
(caution: lots of data movement!),
but I know how annoying it can be to chase down everyone who runs ancient
clients.
For comparison, this
Hi Paul,
ah, right, "ceph pg dump | grep remapped", that's what I was looking
for. I added the output and the result of the pg query at the end of
https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
> But my guess here is that you are running a CRUSH rule to distribute across
Hey all,
Has anyone done, or is working on, a way to do S3 (radosgw) failover?
I am trying to work out a way to have 2 radosgw servers with a VIP, so
when one server goes down it will fail over to the other.
I am trying this with CTDB, but while testing, the upload can fail and then
carry on, or just hang and
Hi,
Recently I have upgraded my ceph cluster to version 14.0.0 - nautilus (dev) from
ceph version 13.0.1. After this, I noticed some weird data usage numbers on the
cluster.
Here are the issues I'm seeing...
1. The data usage reported is much more than what is available
usage: 16 EiB
And BTW, if you can't make it to this event we're in the early days of
planning a dedicated Ceph + OpenStack Days at CERN around May/June
2019.
More news on that later...
-- Dan @ CERN
On Tue, Jun 19, 2018 at 10:23 PM Leonardo Vaz wrote:
>
> Hey Cephers,
>
> We will join our friends from
Hi,
have a look at "ceph pg dump" to see which ones are stuck in remapped.
But my guess here is that you are running a CRUSH rule to distribute across
3 racks
and you only have 3 racks in total.
CRUSH will sometimes fail to find a mapping in this scenario. There are a
few parameters
that you can
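One of the knobs usually mentioned for the "3 racks, 3 replicas" case is
`choose_total_tries`, which gives CRUSH more attempts to find a valid mapping.
A hedged sketch of editing it (the value 100 is illustrative, and file names
are placeholders):

```shell
# Export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# In crushmap.txt, raise the retry budget, e.g.:
#   tunable choose_total_tries 100
# then recompile and inject the new map
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```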
Hi Leo,
On 06/20/2018 01:47 AM, Leonardo Vaz wrote:
> We created the following etherpad to organize the calendar for the
> future Ceph Tech Talks.
>
> For the Ceph Tech Talk of June 28th our fellow George Mihaiescu will
> tell us how Ceph is being used on cancer research at OICR (Ontario
>
Hi Cephers,
Due to the July 4th holiday in the US we are postponing the Ceph Developer
Monthly meeting to July 11th.
Kindest regards,
Leo
--
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
Hi Brad,
Yes, but it doesn't show much:
ceph pg 18.2 query
Error EPERM: problem getting command descriptions from pg.18.2
Cheers
- Original Message -
> From: "Brad Hubbard"
> To: "andrei"
> Cc: "ceph-users"
> Sent: Wednesday, 20 June, 2018 00:02:07
> Subject: Re: [ceph-users]
Can you post the full output of "ceph -s", "ceph health detail", and
"ceph osd df tree"?
Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
Paul
2018-06-20 15:25 GMT+02:00 Oliver Schulz :
> Dear all,
>
> we (somewhat) recently extended our Ceph cluster,
> and updated it to
On Wed, Jun 20, 2018 at 7:27 AM, Bernhard Dick wrote:
> Hi,
>
> I'm experimenting with CEPH and have seen that ceph-deploy and ceph-ansible
> have the EPEL repositories as requirement, when installing CEPH on CENTOS
> hosts. Due to the nature of the EPEL repos this might cause trouble (i.e.
>
Dear all,
we (somewhat) recently extended our Ceph cluster,
and updated it to Luminous. By now, the fill level
on some OSDs is quite high again, so I'd like to
re-balance via "OSD reweight".
I'm running into the following problem, however:
No matter what I do (reweight a little, or a lot,
or
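For context, the reweight knobs referred to above look like this (the OSD id,
weight, and thresholds are placeholders, not recommendations):

```shell
# Lower the reweight of a single over-full OSD (placeholder id and value);
# reweight is the 0.0-1.0 override, distinct from the CRUSH weight
ceph osd reweight 12 0.9

# Or let Ceph pick candidates automatically: reweight OSDs above 120%
# of average utilization, changing each by at most 0.05, max 4 OSDs
ceph osd reweight-by-utilization 120 0.05 4
```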
Thanks for the response. I was also hoping to be able to debug better
once we got onto Mimic. We just finished that upgrade yesterday and
cephfs-journal-tool does find a corruption in the purge queue though
our MDS continues to startup and the filesystem appears to be
functional as usual.
How
adding back in the list :)
-- Forwarded message -
From: Luis Periquito
Date: Wed, Jun 20, 2018 at 1:54 PM
Subject: Re: [ceph-users] Planning all flash cluster
To:
On Wed, Jun 20, 2018 at 1:35 PM Nick A wrote:
>
> Thank you, I was under the impression that 4GB RAM per 1TB was
Hi,
It sounds like the .rgw.bucket.index pool has grown maybe due to some
problem with dynamic bucket resharding.
I wonder if the (stale/old/unused) bucket indexes need to be purged
using something like the below:
radosgw-admin bi purge --bucket= --bucket-id=
Not sure how you would find the
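A hedged sketch of hunting for stale bucket index instances (the bucket name
is a placeholder; double-check that an instance id really is stale before
purging anything):

```shell
# List all bucket instance metadata entries
radosgw-admin metadata list bucket.instance

# Compare against the instance id the live bucket currently uses
radosgw-admin bucket stats --bucket=mybucket

# Purge an old index only once you are sure the instance id is stale
radosgw-admin bi purge --bucket=mybucket --bucket-id=<stale-instance-id>
```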
Adding more nodes from the beginning would probably be a good idea.
On Wed, Jun 20, 2018 at 12:58 PM Nick A wrote:
>
> Hello Everyone,
>
> We're planning a small cluster on a budget, and I'd like to request any
> feedback or tips.
>
> 3x Dell R720XD with:
> 2x Xeon E5-2680v2 or very similar
The
Another great thing about lots of small servers vs. few big servers is that
you can use erasure coding.
You can save a lot of money by using erasure coding, but performance will
have to be evaluated
for your use case.
I'm working with several clusters that are 8-12 servers with 6-10 SSDs each
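For illustration, setting up an erasure-coded pool on a cluster like that
might look as follows (profile name, k/m values and PG count are placeholder
choices to evaluate, not recommendations):

```shell
# k=4 data + m=2 coding chunks needs at least 6 hosts to keep
# "host" as the failure domain
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host

# Create a pool using that profile (PG counts are illustrative)
ceph osd pool create ecpool 128 128 erasure ec42
```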
This is true, but misses the point that the OP is talking about old
hardware already - you're not going to save much money by removing a
second-hand CPU from a system.
On Wed, 20 Jun 2018 at 22:10, Wido den Hollander wrote:
>
>
> On 06/20/2018 02:00 PM, Robert Sander wrote:
> > On 20.06.2018
On 06/20/2018 02:00 PM, Robert Sander wrote:
> On 20.06.2018 13:58, Nick A wrote:
>
>> We'll probably add another 2 OSD drives per month per node until full
>> (24 SSD's per node), at which point, more nodes.
>
> I would add more nodes earlier to achieve better overall performance.
Exactly.
* More small servers give better performance than a few big servers; maybe
twice the number of servers with half the disks, CPUs and RAM.
* 2x 10 Gbit is usually enough, especially with more servers; that will
rarely be the bottleneck (unless you have extreme bandwidth requirements).
* maybe save
On 20.06.2018 13:58, Nick A wrote:
> We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes.
I would add more nodes earlier to achieve better overall performance.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str.
Hello Everyone,
We're planning a small cluster on a budget, and I'd like to request any
feedback or tips.
3x Dell R720XD with:
2x Xeon E5-2680v2 or very similar
96GB RAM
2x Samsung SM863 240GB boot/OS drives
4x Samsung SM863 960GB OSD drives
Dual 40/56Gbit Infiniband using IPoIB.
3 replica, MON
Hi all,
We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5) and after
this we decided to update our tunables configuration to the optimals, which
were previously at Firefly. During this process, we have noticed the OSDs
(bluestore) rapidly filling on the RGW index and GC pool.
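To quantify which pools are actually growing, the usual starting point is
something like (standard commands; `reshard list` only if dynamic bucket
resharding is suspected):

```shell
# Per-pool usage, including the RGW index and GC pools
ceph df detail
rados df

# Pending/active dynamic resharding operations on RGW buckets
radosgw-admin reshard list
```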
Hi,
I'm experimenting with CEPH and have seen that ceph-deploy and
ceph-ansible have the EPEL repositories as requirement, when installing
CEPH on CENTOS hosts. Due to the nature of the EPEL repos this might
cause trouble (i.e. when combining CEPH with oVirt on the same host).
When using the
Hi Wladimir,
A combination of a slow clock speed, erasure coding, a single node
and SATA spinners is probably not going to lead to a really great
evaluation. Some of the experts will chime in here with answers to
your specific questions, I'm sure, but this test really isn't ever going
to give
Dear all,
I set up a minimal 1-node Ceph cluster to evaluate its performance.
We tried to save as much as possible on the hardware, so now the box has
Asus P10S-M WS motherboard, Xeon E3-1235L v5 CPU, 64 GB DDR4 ECC RAM and
8x3TB HDDs (WD30EFRX) connected to on-board SATA ports. Also
Hi,
at the moment, we use Icinga2, check_ceph* and Telegraf with the Ceph
plugin. I'm asking what I need to have a separate host, which knows all
about the Ceph cluster health. The reason is, that each OSD node has
mostly the exact same data, which is transmitted into our database (like
InfluxDB