18bf40).accept peer addr is really 192.168.112.192:0/1457324982 (socket is 192.168.112.192:47128/0)
Regards
Prabu
On Tue, 03 Nov 2015 12:50:40 +0530 CHRIS TAYLOR
<ctay...@eyonic.com> wrote
On 2015-11-02 10:19 pm, gjprabu wrote:
Hi Taylor,
I have checked the DNS names and all.
Regards
Prabu
On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR
<ctay...@eyonic.com> wrote
I would double-check the network configuration on the new node, including
hosts files and DNS names. Do all the host names resolve to the correct IP
addresses from all hosts?
"... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..."
Looks like the communication between subnets is a
Is there some way to manually correct this error while this bug is still
awaiting review? I have one PG that is stuck inconsistent with the same
error. I already created a new RBD image and migrated the data to it.
The original RBD image was "rb.0.ac3386.238e1f29". The new image is
I have one placement group that is stuck inconsistent.
$ ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 8.e82 is active+clean+inconsistent, acting [15,43]
1 scrub errors
I tried to run "ceph pg repair 8.e82" but it will not repair it. In the
OSD log with debugging
The cluster version is 0.94.3.
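For anyone in the same state, a minimal sketch of how to dig further, using the PG id and primary OSD (15) from the health output above; the log path assumes a default installation:

$ ceph pg deep-scrub 8.e82                  # re-run the deep scrub that flagged the error
$ ceph pg 8.e82 query                       # inspect the acting set and recovery state
$ grep ERR /var/log/ceph/ceph-osd.15.log    # scrub errors are logged on the primary OSD
$ ceph pg repair 8.e82                      # retry the repair once the cause is clear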
On 2015-10-17 2:25 am, Chris Taylor wrote:
> I have one placement group that is stuck inconsistent.
>
> $ ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> pg 8.e82 is active+clean+inconsistent, acting [15,43]
> 1 sc
I have an OSD that went down while the cluster was recovering from
another OSD being reweighted. The cluster appears to be stuck in
recovery since the number of degraded and misplaced objects is not
decreasing.
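A quick way to confirm whether recovery is actually stalled rather than just slow (a sketch; the PG states and counters come from your own cluster):

$ ceph -s                       # watch the degraded/misplaced counters over a few minutes
$ ceph health detail            # lists the PGs that are not healthy
$ ceph pg dump_stuck unclean    # PGs that have not returned to active+clean
$ ceph osd tree                 # confirm which OSDs are down or out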
It is a three node cluster in production and the pool size is 2. Ceph
version
54158'13420",
"created": 250794,
"last_epoch_clean": 256847,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "254220'13929",
"last_scrub_stamp": "2015-10-09 20:18:15.856071",
"
osd/osd_types.cc: 4076: FAILED assert(clone_size.count(clone))
Thanks,
Chris
On 9/3/15 2:22 AM, Gregory Farnum wrote:
On Thu, Sep 3, 2015 at 7:48 AM, Chris Taylor <ctay...@eyonic.com> wrote:
I removed the latest OSD that was respawning (osd.23) and now I am having the
same problem with osd
On 09/03/2015 02:22 AM, Gregory Farnum wrote:
On Thu, Sep 3, 2015 at 7:48 AM, Chris Taylor <ctay...@eyonic.com> wrote:
I removed the latest OSD that was respawning (osd.23) and now I am having the
same problem with osd.30. It looks like they both have pg 3.f9 in common. I
tried "ceph pg
]
["3.f9",{"oid":"rb.0.8c2990.238e1f29.8cc0","key":"","snapid":9198,"hash":###,"max":0,"pool":3,"namespace":"","max":0}]
["3.f9",{"oid":"rb.0.8c
please send me your ceph.conf? I tried to set "mon addr" but
it looks like it was ignored all the time.
Regards - Willi
On 07.09.15 at 20:47, Chris Taylor wrote:
My monitors are only connected to the public network, not the cluster
network. Only the OSDs are connected to the cluster network.
e.
David
On 9/5/15 3:24 PM, Chris Taylor wrote:
# ceph-dencoder type SnapSet import /tmp/snap.out decode dump_json
{
    "snap_context": {
        "seq": 9197,
        "snaps": [
            9197
        ]
    },
    "head_exists": 1,
    "clones":
ool::WorkThread::entry()+0x10) [0xbb5300]
10: (()+0x8182) [0x7fbd4113c182]
11: (clone()+0x6d) [0x7fbd3f6a747d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
On 08/28/2015 11:55 AM, Chris Taylor wrote:
Fellow Ceph Users,
I have 3 OSD node
f9/rb.0.8c2990.238e1f29.00008cc0/23ed//3
-1> 2015-09-03 20:11:52.472032 7fdc0d42c700 -1 log_channel(cluster)
log [ERR] : be_compare_scrubmaps: 3.f9 shard 30 missing
c55800f9/rb.0.8c2990.238e1f29.8cc0/23ed//3
0> 2015-09-03 20:11:52.475693 7fdc0d42c700 -1 osd/osd_types.
My monitors are only connected to the public network, not the cluster
network. Only the OSDs are connected to the cluster network.
Take a look at the diagram here:
http://ceph.com/docs/master/rados/configuration/network-config-ref/
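For reference, a minimal sketch of the relevant ceph.conf options; the subnets, hostname and address below are placeholders for your own setup:

[global]
    public network  = 192.168.112.0/24    # MONs, clients and OSDs
    cluster network = 192.168.113.0/24    # OSD replication/heartbeat traffic only

[mon.a]
    host     = mon-a
    mon addr = 192.168.112.10:6789        # monitors bind to the public network only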
-Chris
On 09/07/2015 03:15 AM, Willi Fehler wrote:
Hi,
any
Fellow Ceph Users,
I have 3 OSD nodes and 3 MONs on separate servers. Our storage was near
full on some OSDs so we added additional drives, almost doubling our
space. Since then we are getting OSDs that are respawning. We added
additional RAM to the OSD nodes, from 12G to 24G. It started with
Hi Oliver,
Have you tried tuning some of the cluster settings to fix the IO errors
in the VMs?
We found some of the same issues when reweighting, backfilling and
removing large snapshots. By minimizing the number of concurrent
backfills and prioritizing client IO we can now add/remove OSDs
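The kind of throttling meant here looks roughly like this; the values are illustrative and can be injected at runtime:

$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
$ ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-client-op-priority 63'
# one backfill/recovery op per OSD at a time, with client IO kept at a higher
# priority than recovery (63 is the default client op priority)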
We are using a single pool for all our RBD images.
You could create different pools based on performance and replication needs.
Say one with all SSDs and one with SATA. Then put your RBD images in the
appropriate pool.
Each host is also using the same user. You could use a different user for
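As a sketch of both ideas (pool names, PG counts and the client name are made up; steering a pool to SSDs vs SATA also needs a matching CRUSH rule):

$ ceph osd pool create rbd-ssd 128
$ ceph osd pool create rbd-sata 128
$ ceph auth get-or-create client.host1 mon 'allow r' osd 'allow rwx pool=rbd-ssd'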
Maybe a long shot, but have you checked OSD memory usage? Are the OSD
hosts low on RAM and swapping to disk?
I am not familiar with your issue, but thought that might cause it.
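A quick way to check on each OSD host (the OSD id is a placeholder):

$ free -h                               # look for low free memory and swap in use
$ ceph daemon osd.0 dump_historic_ops   # run on the host carrying the OSD; shows recent slow ops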
Chris
On 2016-11-14 3:29 pm, Brad Hubbard wrote:
> Have you looked for clues in the output of dump_historic_ops
Kevin,
After changing the pool size to 3, make sure the min_size is set to 1 to
allow 2 of the 3 hosts to be offline.
http://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values
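For example, assuming the pool is named rbd:

$ ceph osd pool set rbd size 3
$ ceph osd pool set rbd min_size 1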
How many MONs do you have and are they on the same OSD hosts? If you
have 3 MONs running on
I often see on this list that peering issues are related to networking
and MTU sizes. Perhaps the HP 5400s or the managed switches did not
have jumbo frames enabled?
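One quick way to check (interface and host names are placeholders); a ping payload of 8972 bytes plus headers fills a 9000-byte frame, so it only succeeds if jumbo frames work end-to-end:

$ ip link show eth0 | grep mtu    # confirm the MTU on every node
$ ping -M do -s 8972 other-node   # -M do forbids fragmentation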
Hope that helps you determine the issue in case you want to move the
nodes back to the other location.
Chris
On 2016-10-11
On 2017-11-20 3:39 am, Matteo Dacrema wrote:
Yes I mean the existing Cluster.
SSDs are on a fully separate pool.
The cluster is not busy during recovery and deep scrubs, but I think it’s
better to limit replication in some way when switching to replica 3.
My question is to understand if I need to
On 2018-06-19 12:17 pm, Frank de Bot (lists) wrote:
Frank (lists) wrote:
Hi,
On a small cluster (3 nodes) I frequently have slow requests. When
dumping the inflight ops from the hanging OSD, it seems it doesn't get
a 'response' for one of the subops. The events always look like:
I've
> On Nov 9, 2018, at 1:38 PM, Gregory Farnum wrote:
>
>> On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman
>> wrote:
>> Hi all,
>>
>> On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some
>> snapshots:
>>
>> [root@osd001 ~]# ceph -s
>>cluster:
>> id:
It seems like since the change to the 9-month cadence it has been bumpy
for the Debian-based installs. Changing to a 12-month cadence sounds
like a good idea. Perhaps some Debian maintainers can suggest a good
month for them to get the packages in time for their release cycle.
On
Full disclosure - I have not created an erasure code pool yet!
I have been wanting to do the same thing that you are attempting and
have these links saved. I believe this is what you are looking for.
This link is for decompiling the CRUSH rules and recompiling:
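In outline, the decompile/edit/recompile cycle looks roughly like this (file names are arbitrary):

$ ceph osd getcrushmap -o crushmap.bin         # export the compiled CRUSH map
$ crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
$ crushtool -c crushmap.txt -o crushmap.new    # recompile after editing the rules
$ ceph osd setcrushmap -i crushmap.new         # inject the new map into the cluster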