Hello list,
Just curious if anyone has ever seen this behavior and might have some
ideas on how to troubleshoot it.
We're seeing very high iowait in iostat across all OSDs on a single OSD
host. It's very spiky, dropping to zero and then shooting up to as high as
400 in some cases. Despite
So I'm not sure if this was the best or right way to do this, but:
Using rados, I confirmed the unfound object was in the cephfs_data pool:
# rados -p cephfs_data ls | grep 001c0ed4
Using osdmaptool, I found the PG/OSD the unfound object maps to:
# osdmaptool --test-map-object
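In case it helps anyone else, here is a minimal sketch of the full sequence I
mean (the object name and pool id below are placeholders, not real values from
my cluster):
# ceph osd getmap -o /tmp/osdmap
(dump the current osdmap to a file osdmaptool can read)
# ceph osd lspools
(note the numeric id of the cephfs_data pool)
# osdmaptool /tmp/osdmap --test-map-object <object-name> --pool <pool-id>
(prints the PG the object maps to and the acting set of OSDs)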
I'm using bcache (since around the middle of December... before that we
saw way higher await) for all 12 HDDs, cached by the 2 SSDs, and NVMe for
journals. (And some months ago I replaced all the 2TB disks with 6TB ones
and added ceph4 and ceph5.)
Here's my iostat data in ganglia (just raw per-disk await):
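In case it's useful for comparison, the raw numbers come from extended iostat
output; a sketch of what I watch (the 5-second interval is arbitrary):
# iostat -x -d 5
(watch the await / r_await / w_await and %util columns per device, depending
on your sysstat version)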
On 07/03/2017 02:36 PM, Tim Serong wrote:
> It's that time of year again, folks! Please everyone go submit talks,
> or at least plan to attend this most excellent of F/OSS conferences.
CFP closes in a bit over a week (August 6). Get onto it if you haven't
already :-)
> (I thought I might put in
Hey Cephers,
This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:
https://wiki.ceph.com/Planning
If you have work that you're doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to
The clients receive up-to-date versions of the osdmap, which includes which
OSDs are down. So yes, when an OSD is marked down in the cluster the
clients know about it. If an OSD is unreachable but isn't marked down in
the cluster, the result is blocked requests.
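If you want to see exactly what state the clients will learn from the
monitors, you can look at the up/down flags in the map yourself; a quick
sketch:
# ceph osd dump | grep '^osd\.'
(each line shows an OSD's up/down and in/out state in the current osdmap epoch)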
On Thu, Jul 27, 2017, 1:21 PM
Does the client track which OSDs are reachable? How does it behave if some
are not reachable?
For example:
Cluster network with all OSD hosts on a switch.
Public network with OSD hosts split between two switches, failure domain is
switch.
copies=3 so with a failure of the public switch, 1 copy
I had the same issue on Luminous and worked around it by disabling ceph-disk.
The OSDs can start without it.
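For reference, what I did is roughly this (a sketch; the unit instance name
depends on which device is failing on your box, I'm assuming /dev/sdb2 and
osd.0 here as examples):
# systemctl mask ceph-disk@dev-sdb2.service
(stop systemd/udev from retrying the failing activation at boot)
# systemctl enable ceph-osd@0.service
(make sure the OSD daemon itself is still enabled to start)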
On Thu, Jul 27, 2017 at 3:36 PM Oscar Segarra
wrote:
> Hi,
>
> First of all, my version:
>
> [root@vdicnode01 ~]# ceph -v
> ceph version 12.1.1
Hi,
In my environment I have 3 hosts, every host has 2 network interfaces:
public: 192.168.2.0/24
cluster: 192.168.100.0/24
The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved by
public DNS through the public interface, that means the "ping vdicnode01"
will resolve
Hi Roger,
Thanks a lot, I will try your workaround.
I have opened a bug so the devs can review it as soon as they have
availability.
http://tracker.ceph.com/issues/20807
2017-07-27 23:39 GMT+02:00 Roger Brown :
> I had the same issue on Luminous and worked around it by
The only things that are supposed to use the cluster network are the OSDs.
Not even the MONs access the cluster network. I am sure that if you need to
make this work you can find a way, but I don't know that one exists in the
standard tool set.
You might try temporarily setting the
I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in).
ceph status and ceph osd tree output can be found at:
https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12
In the osd.4 log, I see many of these:
2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120
My first suspicion would be the HBA. Are you using a RAID HBA? If so, I
suggest checking the status of your BBU/FBWC and cache policy.
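If it happens to be an LSI/Avago MegaRAID controller with MegaCli installed
(just an assumption on my part), something like this shows the battery state
and cache policy:
# MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL
(battery/FBWC charge state and whether it is degraded)
# MegaCli64 -LDGetProp -Cache -LAll -aAll
(current write-back/write-through cache policy for each logical drive)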
Hi,
First of all, my version:
[root@vdicnode01 ~]# ceph -v
ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
When I boot my ceph node (I have an all-in-one) I get the following message
in boot.log:
[FAILED] Failed to start Ceph disk activation: /dev/sdb2.
See
On Fri, Jul 28, 2017 at 6:06 AM, Jared Watts wrote:
> I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in).
> ceph status and ceph osd tree output can be found at:
>
> https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12
>
>
>
> In osd.4
An update on this.
The "attempt to access beyond end of device" messages are created due to a
kernel bug which is rectified by the following patches.
- 59d43914ed7b9625(vfs: make guard_bh_eod() more generic)
- 4db96b71e3caea(vfs: guard end of device for mpage interface)
An upgraded Red Hat
yaoning, haomai, Json
what about the "recovery what is really modified" feature? I didn't see any
update on github recently, will it be further developed?
https://github.com/ceph/ceph/pull/3837 (PG:: recovery optimazation: recovery
what is really modified)
Thanks a lot.
I could be wrong, but I think you cannot achieve this objective. If you
declare a cluster network, OSDs will route heartbeat, object replication
and recovery traffic over the cluster network. We prefer that the cluster
network is NOT reachable from the public network or the Internet for added
Sorry! I'd like to add that I want to use the cluster network for both
purposes:
ceph-deploy --username vdicceph new vdicnode01 --cluster-network
192.168.100.0/24 --public-network 192.168.100.0/24
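In ceph.conf terms, what I expect that to generate is roughly this (a sketch;
the exact layout of the generated file may differ):
[global]
public network = 192.168.100.0/24
cluster network = 192.168.100.0/24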
Thanks a lot
2017-07-28 0:29 GMT+02:00 Oscar Segarra :
> Hi,
>
> ¿Do you