Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-03 Thread Chris Taylor
18bf40).accept peer addr is really 192.168.112.192:0/1457324982 (socket is 192.168.112.192:47128/0) Regards Prabu On Tue, 03 Nov 2015 12:50:40 +0530 CHRIS TAYLOR <ctay...@eyonic.com> wrote On 2015-11-02 10:19 pm, gjprabu wrote: Hi Taylor, I have checked DNS name and all

Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread Chris Taylor
. Regards Prabu On Tue, 03 Nov 2015 11:20:07 +0530 CHRIS TAYLOR <ctay...@eyonic.com> wrote I would double check the network configuration on the new node. Including hosts files and DNS names. Do all the host names resolve to the correct IP addresses from all

Re: [ceph-users] ceph new osd addition and client disconnected

2015-11-02 Thread Chris Taylor
I would double check the network configuration on the new node. Including hosts files and DNS names. Do all the host names resolve to the correct IP addresses from all hosts? "... 192.168.112.231:6800/49908 >> 192.168.113.42:0/599324131 ..." Looks like the communication between subnets is a
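
A quick resolution check, run from every node against every other node, might look like the sketch below (the host names are placeholders; the address is taken from the log line above):

    # Forward lookups: should return the expected addresses on every host.
    # getent honors /etc/nsswitch.conf, so it exercises both /etc/hosts and DNS.
    getent hosts ceph-node1 ceph-node2 ceph-node3

    # Reverse lookup of the new node's address should return its expected name.
    getent hosts 192.168.112.231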

[ceph-users] Help with Bug #12738: scrub bogus results when missing a clone

2015-10-21 Thread Chris Taylor
Is there some way to manually correct this error while this bug is still awaiting review? I have one PG that is stuck inconsistent with the same error. I already created a new RBD image and migrated the data to it. The original RBD image was "rb.0.ac3386.238e1f29". The new image is

[ceph-users] deep-scrub error: missing clones

2015-10-17 Thread Chris Taylor
I have one placement group that is stuck inconsistent. $ ceph health detail HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 8.e82 is active+clean+inconsistent, acting [15,43] 1 scrub errors I tried to run "ceph pg repair 8.e82" but it will not repair it. In the OSD log with debugging

Re: [ceph-users] deep-scrub error: missing clones

2015-10-17 Thread Chris Taylor
The cluster version is 0.94.3. On 2015-10-17 2:25 am, Chris Taylor wrote: > I have one placement group that is stuck inconsistent. > > $ ceph health detail > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors > pg 8.e82 is active+clean+inconsistent, acting [15,43] > 1 sc

[ceph-users] OSD will not start

2015-10-11 Thread Chris Taylor
I have an OSD that went down while the cluster was recovering from another OSD being reweighted. The cluster appears to be stuck in recovery since the number of degraded and misplaced objects is not decreasing. It is a three-node cluster in production and the pool size is 2. Ceph version

Re: [ceph-users] OSD will not start

2015-10-11 Thread Chris Taylor
54158'13420", "created": 250794, "last_epoch_clean": 256847, "parent": "0.0", "parent_split_bits": 0, "last_scrub": "254220'13929", "last_scrub_stamp": "2015-10-09 20:18:15.856071", "

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-03 Thread Chris Taylor
osd/osd_types.cc: 4076: FAILED assert(clone_size.count(clone)) Thanks, Chris On 9/3/15 2:22 AM, Gregory Farnum wrote: On Thu, Sep 3, 2015 at 7:48 AM, Chris Taylor <ctay...@eyonic.com> wrote: I removed the latest OSD that was respawning (osd.23) and now I am having the same problem with osd

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-03 Thread Chris Taylor
On 09/03/2015 02:22 AM, Gregory Farnum wrote: On Thu, Sep 3, 2015 at 7:48 AM, Chris Taylor <ctay...@eyonic.com> wrote: I removed the latest OSD that was respawning (osd.23) and now I am having the same problem with osd.30. It looks like they both have pg 3.f9 in common. I tried "ceph pg

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-05 Thread Chris Taylor
] ["3.f9",{"oid":"rb.0.8c2990.238e1f29.8cc0","key":"","snapid":9198,"hash":###,"max":0,"pool":3,"namespace":"","max":0}] ["3.f9",{"oid":"rb.0.8c

Re: [ceph-users] Ceph monitor ip address issue

2015-09-08 Thread Chris Taylor
please send me your ceph.conf? I tried to set "mon addr" but it looks like it was ignored every time. Regards - Willi On 07.09.15 at 20:47, Chris Taylor wrote: My monitors are only connected to the public network, not the cluster network. Only the OSDs are connected to t
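
As a rough sketch (monitor names and addresses invented), per-monitor sections in ceph.conf usually look like the block below; note that once a monitor has been created its address lives in the monmap, so editing "mon addr" alone can appear to be ignored:

    [global]
        mon initial members = mon1, mon2, mon3
        mon host = 192.168.1.11, 192.168.1.12, 192.168.1.13

    [mon.mon1]
        host = mon1
        mon addr = 192.168.1.11:6789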

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-08 Thread Chris Taylor
e. David On 9/5/15 3:24 PM, Chris Taylor wrote: # ceph-dencoder type SnapSet import /tmp/snap.out decode dump_json { "snap_context": { "seq": 9197, "snaps": [ 9197 ] }, "head_exists": 1, "clones":

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-03 Thread Chris Taylor
ool::WorkThread::entry()+0x10) [0xbb5300] 10: (()+0x8182) [0x7fbd4113c182] 11: (clone()+0x6d) [0x7fbd3f6a747d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. On 08/28/2015 11:55 AM, Chris Taylor wrote: Fellow Ceph Users, I have 3 OSD node

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-03 Thread Chris Taylor
f9/rb.0.8c2990.238e1f29.00008cc0/23ed//3 -1> 2015-09-03 20:11:52.472032 7fdc0d42c700 -1 log_channel(cluster) log [ERR] : be_compare_scrubmaps: 3.f9 shard 30 missing c55800f9/rb.0.8c2990.238e1f29.8cc0/23ed//3 0> 2015-09-03 20:11:52.475693 7fdc0d42c700 -1 osd/osd_types.

Re: [ceph-users] Ceph monitor ip address issue

2015-09-07 Thread Chris Taylor
My monitors are only connected to the public network, not the cluster network. Only the OSDs are connected to the cluster network. Take a look at the diagram here: http://ceph.com/docs/master/rados/configuration/network-config-ref/ -Chris On 09/07/2015 03:15 AM, Willi Fehler wrote: Hi, any
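
For reference, the split described above is expressed in ceph.conf roughly like this (the subnets are placeholders); monitors and clients use the public network, while the cluster network carries only OSD replication and heartbeat traffic:

    [global]
        public network  = 192.168.1.0/24
        cluster network = 10.10.10.0/24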

[ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-08-28 Thread Chris Taylor
Fellow Ceph Users, I have 3 OSD nodes and 3 MONs on separate servers. Our storage was near full on some OSDs, so we added additional drives, almost doubling our space. Since then we are getting OSDs that are respawning. We added additional RAM to the OSD nodes, from 12G to 24G. It started with

Re: [ceph-users] adding cache tier in productive hammer environment

2016-04-07 Thread Chris Taylor
Hi Oliver, Have you tried tuning some of the cluster settings to fix the IO errors in the VMs? We found some of the same issues when reweighting, backfilling and removing large snapshots. By minimizing the number of concurrent backfills and prioritizing client IO we can now add/remove OSDs
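
The kind of throttling meant here is sketched below with Hammer-era option names; the values are illustrative starting points, not recommendations:

    # Apply at runtime to every OSD:
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-client-op-priority 63'

    # Make the same settings persistent in ceph.conf under [osd]:
    # osd max backfills = 1
    # osd recovery max active = 1
    # osd recovery op priority = 1
    # osd client op priority = 63

Lower backfill and recovery concurrency makes rebalancing slower but leaves more IOPS for client traffic, which is the trade-off being described.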

Re: [ceph-users] pools per hypervisor?

2016-09-12 Thread Chris Taylor
We are using a single pool for all our RBD images. You could create different pools based on performance and replication needs. Say one with all SSDs and one with SATA. Then put your RBD images in the appropriate pool. Each host is also using the same user. You could use a different user for
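
A minimal sketch of that layout (pool names, PG counts, and caps are made up; steering each pool to SSD or SATA devices would additionally need its own CRUSH rule, not shown here):

    ceph osd pool create rbd-ssd 512
    ceph osd pool create rbd-sata 1024

    # A per-hypervisor user restricted to one pool, so each host can only
    # reach the images it is supposed to see:
    ceph auth get-or-create client.hv01 mon 'allow r' \
        osd 'allow rwx pool=rbd-ssd' -o /etc/ceph/ceph.client.hv01.keyring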

Re: [ceph-users] ceph cluster having blocke requests very frequently

2016-11-14 Thread Chris Taylor
Maybe a long shot, but have you checked OSD memory usage? Are the OSD hosts low on RAM and swapping to disk? I am not familiar with your issue, but thought that might cause it. Chris On 2016-11-14 3:29 pm, Brad Hubbard wrote: > Have you looked for clues in the output of dump_historic_ops
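
A quick way to check for this on each OSD host, for example:

    # Overall memory and swap usage:
    free -m

    # Resident memory of the ceph-osd processes, largest first:
    ps -eo pid,rss,comm --sort=-rss | grep ceph-osd | head

    # Sustained non-zero si/so columns indicate the host is swapping:
    vmstat 5 5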

Re: [ceph-users] How are replicas spread in default crush configuration?

2016-11-23 Thread Chris Taylor
Kevin, After changing the pool size to 3, make sure the min_size is set to 1 to allow 2 of the 3 hosts to be offline. http://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values How many MONs do you have and are they on the same OSD hosts? If you have 3 MONs running on
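
The commands referred to in the first sentence would be roughly the following ("rbd" stands in for the actual pool name):

    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 1

    # Verify the change:
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

With min_size set to 1 the pool keeps serving I/O as long as a single replica remains available, which is what allows two of the three hosts to be offline.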

Re: [ceph-users] New OSD Nodes, pgs haven't changed state

2016-10-11 Thread Chris Taylor
I see on this list often that peering issues are related to networking and MTU sizes. Perhaps the HP 5400s or the managed switches did not have jumbo frames enabled? Hope that helps you determine the issue in case you want to move the nodes back to the other location. Chris On 2016-10-11
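
One way to test whether jumbo frames actually pass end to end between two nodes (the interface name and address are placeholders):

    # Check the configured MTU on the storage interface:
    ip link show eth1

    # 8972 bytes of payload plus 28 bytes of ICMP/IP headers makes a 9000-byte frame.
    # With -M do the packet may not be fragmented, so this fails anywhere along the
    # path where jumbo frames are not enabled, while a plain ping still succeeds:
    ping -M do -s 8972 -c 3 192.168.10.22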

Re: [ceph-users] Switch to replica 3

2017-11-20 Thread Chris Taylor
On 2017-11-20 3:39 am, Matteo Dacrema wrote: Yes, I mean the existing cluster. SSDs are on a fully separate pool. The cluster is not busy during recovery and deep scrubs, but I think it’s better to limit replication in some way when switching to replica 3. My question is to understand if I need to

Re: [ceph-users] Frequent slow requests

2018-06-19 Thread Chris Taylor
On 2018-06-19 12:17 pm, Frank de Bot (lists) wrote: Frank (lists) wrote: Hi, On a small cluster (3 nodes) I frequently have slow requests. When dumping the inflight ops from the hanging OSD, it seems it doesn't get a 'response' for one of the subops. The events always look like: I've
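
For reference, the in-flight op dump mentioned above comes from the OSD admin socket; a sketch, with osd.12 standing in for the affected OSD:

    # Run on the host where the OSD is running:
    ceph daemon osd.12 dump_ops_in_flight

    # Recently completed slow ops, with per-event timestamps:
    ceph daemon osd.12 dump_historic_ops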

Re: [ceph-users] slow ops after cephfs snapshot removal

2018-11-09 Thread Chris Taylor
> On Nov 9, 2018, at 1:38 PM, Gregory Farnum wrote: > >> On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman >> wrote: >> Hi all, >> >> On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some >> snapshots: >> >> [root@osd001 ~]# ceph -s >>cluster: >> id:

Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Chris Taylor
It seems like since the change to the 9-month cadence it has been bumpy for the Debian-based installs. Changing to a 12-month cadence sounds like a good idea. Perhaps some Debian maintainers can suggest a good month for them to get the packages in time for their release cycle. On

Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?

2019-10-18 Thread Chris Taylor
Full disclosure - I have not created an erasure code pool yet! I have been wanting to do the same thing that you are attempting and have these links saved. I believe this is what you are looking for. This link is for decompiling the CRUSH rules and recompiling:
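
The decompile/edit/recompile cycle mentioned above looks roughly like this (file names are arbitrary); the usual change for k+m greater than the host count is to make the EC rule choose OSDs rather than hosts as the failure domain:

    # Export and decompile the current CRUSH map:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # Edit crushmap.txt, e.g. change the rule's "chooseleaf ... type host"
    # step to type osd, then recompile and load it:
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin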