Re: [ceph-users] 1 mon unable to join the quorum

2018-03-30 Thread Brad Hubbard
I'm not sure I completely understand your "test". What exactly are you
trying to achieve and what documentation are you following?

On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque wrote:
> Brad,
>
> Thanks for your answer
>
> On 30/03/2018 02:09, Brad Hubbard wrote:
>>
>> 2018-03-19 11:03:50.819493 7f842ed47640  0 mon.controller02 does not
>> exist in monmap, will attempt to join an existing cluster
>> 2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02
>> rank -1 at 172.18.8.6:6789/0 mon_data
>> /var/lib/ceph/mon/ceph-controller02 fsid
>> f37f31b1-92c5-47c8-9834-1757a677d020
>>
>> We are called 'mon.controller02' and we can not find our name in the
>> local copy of the monmap.
>>
>> 2018-03-19 11:03:52.346318 7f842735d700 10
>> mon.controller02@-1(probing) e68  ready to join, but i'm not in the
>> monmap or my addr is blank, trying to join
>>
>> Our name is not in the copy of the monmap we got from peer controller01
>> either.
>
>
> During our test we completely deleted the controller02 monitor and added
> it again.
>
> The log you have is from when controller02 was added again (so it wasn't in the
> monmap before).
>
>
>>
>> $ cat ../controller02-mon_status.log
>> [root@controller02 ~]# ceph --admin-daemon
>> /var/run/ceph/ceph-mon.controller02.asok mon_status
>> {
>>     "name": "controller02",
>>     "rank": 1,
>>     "state": "electing",
>>     "election_epoch": 32749,
>>     "quorum": [],
>>     "outside_quorum": [],
>>     "extra_probe_peers": [],
>>     "sync_provider": [],
>>     "monmap": {
>>         "epoch": 71,
>>         "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
>>         "modified": "2018-03-29 10:48:06.371157",
>>         "created": "0.00",
>>         "mons": [
>>             {
>>                 "rank": 0,
>>                 "name": "controller01",
>>                 "addr": "172.18.8.5:6789\/0"
>>             },
>>             {
>>                 "rank": 1,
>>                 "name": "controller02",
>>                 "addr": "172.18.8.6:6789\/0"
>>             },
>>             {
>>                 "rank": 2,
>>                 "name": "controller03",
>>                 "addr": "172.18.8.7:6789\/0"
>>             }
>>         ]
>>     }
>> }
>>
>> In the monmaps we are called 'controller02', not 'mon.controller02'.
>> These names need to be identical.
>>
>
> The cluster has been deployed using ceph-ansible with the servers' hostnames.
> All monitors are called mon.controller0x in the monmap and all three
> monitors have the same configuration.
>
> We see the same behavior when creating a monmap from scratch:
>
> [root@controller03 ~]# monmaptool --create --add controller01
> 172.18.8.5:6789 --add controller02 172.18.8.6:6789 --add controller03
> 172.18.8.7:6789 --fsid f37f31b1-92c5-47c8-9834-1757a677d020 --clobber
> test-monmap
> monmaptool: monmap file test-monmap
> monmaptool: set fsid to f37f31b1-92c5-47c8-9834-1757a677d020
> monmaptool: writing epoch 0 to test-monmap (3 monitors)
>
> [root@controller03 ~]# monmaptool --print test-monmap
> monmaptool: monmap file test-monmap
> epoch 0
> fsid f37f31b1-92c5-47c8-9834-1757a677d020
> last_changed 2018-03-30 14:42:18.809719
> created 2018-03-30 14:42:18.809719
> 0: 172.18.8.5:6789/0 mon.controller01
> 1: 172.18.8.6:6789/0 mon.controller02
> 2: 172.18.8.7:6789/0 mon.controller03
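
For reference, a minimal sketch of how controller02's on-disk monmap could be
compared with the one the surviving quorum is using, and replaced if they
differ. The mon id, paths and systemd unit name are assumed from the output
above, and the mon must be stopped before extracting or injecting:

systemctl stop ceph-mon@controller02
ceph-mon -i controller02 --extract-monmap /tmp/monmap.local    # what controller02 has on disk
monmaptool --print /tmp/monmap.local
ceph mon getmap -o /tmp/monmap.quorum                          # fetched from the surviving quorum
monmaptool --print /tmp/monmap.quorum
ceph-mon -i controller02 --inject-monmap /tmp/monmap.quorum    # only if the local copy is wrong
systemctl start ceph-mon@controller02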
>
>
>>
>> On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque wrote:
>>>
>>> Hi Brad,
>>>
>>> The results have been uploaded on the tracker
>>> (https://tracker.ceph.com/issues/23403)
>>>
>>> Julien
>>>
>>>
>>> On 29/03/2018 07:54, Brad Hubbard wrote:


 Can you update with the result of the following commands from all of the
 MONs?

 # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
 # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok
 quorum_status

 On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek wrote:
>
>
> Hello Ceph users,
>
> We are having a problem on a ceph cluster running Jewel: one of the mons
> left the quorum, and we have not been able to make it join again. The two
> other monitors are running just fine, but obviously we need this third one.
>
> The problem happened before Jewel, when the cluster was running Infernalis.
> We upgraded hoping that it would solve the problem, but no luck.
>
> We've validated several things: no network problem, no clock skew, same OS
> and ceph version everywhere. We've also removed the mon completely and
> recreated it. We also tried to run an additional mon on one of the OSD
> machines; this mon didn't join the quorum either.
>
> We've opened https://tracker.ceph.com/issues/23403 with logs from the 3 mons
> during a fresh startup of the problematic mon.
>
> Is there anything we could try to do to resolve this issue? We are getting
> out of ideas.

Re: [ceph-users] Backfilling on Luminous

2018-03-30 Thread Pavan Rallabhandi
Somehow I missed replying to this. The random split would be enabled for all
new PGs or for PGs that get mapped to new OSDs. For existing OSDs, one has to
run ceph-objectstore-tool’s apply-layout-settings command on each OSD while the
OSD is offline.

If you want to pre-split PGs using ‘expected_num_objects’ at the time of pool 
creation, be aware of this fix http://tracker.ceph.com/issues/22530.
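
For illustration, a rough sketch of both approaches (pool name, PG counts, CRUSH
rule name, object count and OSD id are made up; the OSD has to be stopped before
ceph-objectstore-tool touches its data path):

# pre-split at creation time: the last argument is expected_num_objects
ceph osd pool create mypool 1024 1024 replicated replicated_rule 100000000

# split the PG directories of an existing filestore OSD while it is offline
systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op apply-layout-settings --pool mypool
systemctl start ceph-osd@12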

Thanks,
-Pavan.

From: David Turner 
Date: Tuesday, March 20, 2018 at 1:50 PM
To: Pavan Rallabhandi 
Cc: ceph-users 
Subject: EXT: Re: [ceph-users] Backfilling on Luminous

@Pavan, I did not know about 'filestore split rand factor'.  That looks like it 
was added in Jewel and I must have missed it.  To disable it, would I just set 
it to 0 and restart all of the OSDs?  That isn't an option at the moment, but 
restarting the OSDs after this backfilling is done is definitely doable.
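
I.e., something along these lines under [osd] in ceph.conf (the split/merge
numbers are just placeholders; whether 0 really disables the randomization is
exactly what I'm asking):

[osd]
filestore merge threshold = 40
filestore split multiple = 8
filestore split rand factor = 0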

On Mon, Mar 19, 2018 at 5:28 PM Pavan Rallabhandi wrote:
David,

Pretty sure you must be aware of the filestore random split on existing OSD 
PGs, `filestore split rand factor`, may be you could try that too.

Thanks,
-Pavan.

From: ceph-users on behalf of David Turner
Date: Monday, March 19, 2018 at 1:36 PM
To: Caspar Smit
Cc: ceph-users
Subject: EXT: Re: [ceph-users] Backfilling on Luminous

Sorry for being away. I set all of my backfilling to VERY slow settings over 
the weekend and things have been stable, but incredibly slow (1% recovery from 
3% misplaced to 2% all weekend).  I'm back on it now and well rested.

@Caspar, SWAP isn't being used on these nodes and all of the affected OSDs have 
been filestore.

@Dan, I think you hit the nail on the head. I didn't know that logging was
added for subfolder splitting in Luminous!!! That's AMAZING. We are seeing
consistent subfolder splitting all across the cluster. The majority of the
crashed OSDs show a split starting before the crash and then a comment about it
in the crash dump. Looks like I just need to write a daemon to watch for
splitting to start and throttle recovery until it's done.

I had injected the following timeout settings, but they didn't seem to affect
anything. I may have needed to place them in ceph.conf and let the OSDs pick up
the new settings as they crashed and restarted, but I didn't really want
different settings on some OSDs in the cluster.

osd_op_thread_suicide_timeout=1200 (from 180)
osd_recovery_thread_timeout=300 (from 30)
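
(For reference, they were injected along these lines; exact syntax from memory:)

ceph tell osd.* injectargs '--osd_op_thread_suicide_timeout 1200'
ceph tell osd.* injectargs '--osd_recovery_thread_timeout 300'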

My game plan for now is to watch for splitting in the log, increase recovery 
sleep, decrease osd_recovery_max_active, and watch for splitting to finish 
before setting them back to more aggressive settings.  After this cluster is 
done backfilling I'm going to do my best to reproduce this scenario in a test 
environment and open a ticket to hopefully fix why this is happening so 
detrimentally.
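
A rough sketch of what that watcher might look like (log path, grep pattern and
throttle values are guesses; putting the aggressive settings back once splitting
finishes is left out):

#!/bin/bash
# naive watchdog: if recent OSD log lines mention a split, throttle recovery
while sleep 30; do
  if tail -qn 200 /var/log/ceph/ceph-osd.*.log | grep -qi 'split'; then
    ceph tell osd.* injectargs '--osd_recovery_sleep 0.5 --osd_recovery_max_active 1'
  fi
done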


On Fri, Mar 16, 2018 at 4:00 AM Caspar Smit wrote:
Hi David,

What about memory usage?

1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on Intel DC 
P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz, 128GB RAM.

If you upgrade to bluestore, memory usage will likely increase. At roughly 1GB
of RAM per TB of storage, 15 x 10TB ≈ 150GB of RAM needed, especially in
recovery/backfilling scenarios like these.

Kind regards,
Caspar


2018-03-15 21:53 GMT+01:00 Dan van der Ster:
Did you use perf top or iotop to try to identify where the osd is stuck?
Did you try increasing the op thread suicide timeout from 180s?

Splitting should log at the beginning and end of an op, so it should be clear 
if it's taking longer than the timeout.
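
For example (OSD id 12 and the pgrep match are assumptions, depending on how the
daemon was started):

pid=$(pgrep -a ceph-osd | awk '/--id 12([^0-9]|$)/{print $1; exit}')
perf top -p "$pid"
iotop -o -p "$pid"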

.. Dan



On Mar 15, 2018 9:23 PM, "David Turner" 
> wrote:
I am aware of the filestore splitting happening.  I manually split all of the 
subfolders a couple weeks ago on this cluster, but every time we have 
backfilling the newly moved PGs have a chance to split before the backfilling 
is done.  When that has happened in the past it causes some blocked requests 
and will flap OSDs if we don't increase the osd_heartbeat_grace, but it has 
never consistently killed the OSDs during the task.  Maybe that's new in 
Luminous due to some of the priority and timeout settings.

This problem in general seems unrelated to the subfolder splitting, though, 
since it started to happen very quickly into the backfilling process.  
Definitely before many of the recently moved PGs would have reached that point. 
 I've also confirmed that the OSDs that are dying are not just stuck on a 
process (like it 

Re: [ceph-users] rgw make container private again

2018-03-30 Thread Vladimir Prokofev
As usual, I found a solution after a while.
The metadata field is not deleted as the API docs say it should be, but it can
be changed. So I just changed it with
curl -X POST -i -H "X-Auth-Token: " -H "X-Container-Read: :*" https://endpoint.url/swift/v1/containername
and now the metadata field looks like this:
X-Container-Read: :*

Essentially this behaves the same as when there's no X-Container-Read at
all.
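
For comparison, a rough python-swiftclient equivalent (auth options omitted;
whether RGW honours the empty value is exactly what seems broken here):

swift post -r ':*' containername   # same workaround as the curl call above
swift post -r '' containername     # per the Swift docs this should clear the read ACL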

But overall this is still an issue: what should've taken 5 seconds to just
uncheck a box in the Horizon interface turned into a couple of hours of
debugging. Can anyone who uses the same version check whether this issue is
reproducible? If so, this seems ticket-worthy.


2018-03-30 17:40 GMT+03:00 Vladimir Prokofev:

> CEPH 12.2.2, RGW.
> I'm using it as an object storage endpoint for Openstack.
>
> Recently, while browsing the object storage from Horizon, I accidentally
> marked a container as public. The issue is, I can't make it private again!
> The docs state that to do so I should simply delete the X-Container-Read
> metadata, but I just can't!
>
> Examples:
> private container headers (only relevant output, from some other empty
> container):
> X-Container-Bytes-Used-Actual: 0
> X-Storage-Policy: default-placement
>
> public container headers (only relevant output):
> X-Container-Bytes-Used-Actual: 114688
> X-Container-Read: .r:*,.rlistings
> X-Storage-Policy: default-placement
>
> As you can see, there's now an X-Container-Read header.
>
>
> I've tried to make it private again with the swift client and curl, but
> without success. Here are some curl examples.
>
> Updating works!
> If I do
> curl -X POST -i -H "X-Auth-Token: " -H "X-Container-Read:
> .r:test" https://endpoint.url/swift/v1/containername
> metadata will become
> X-Container-Read: .r:test
>
> But if I do
> curl -X POST -i -H "X-Auth-Token: " -H
> "X-Remove-Container-Read: x" https://endpoint.url/swift/v1/containername
>
> nothing happens; the metadata field remains there.
>
> So is this a broken API in RGW, or am I missing something? Maybe there's
> some explicit warning that after becoming public you can't make a container
> private again?
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw make container private again

2018-03-30 Thread Vladimir Prokofev
CEPH 12.2.2, RGW.
I'm using it as an object storage endpoint for Openstack.

Recently, while browsing the object storage from Horizon, I accidentally marked
a container as public. The issue is, I can't make it private again!
The docs state that to do so I should simply delete the X-Container-Read
metadata, but I just can't!

Examples:
private container headers (only relevant output, from some other empty container):
X-Container-Bytes-Used-Actual: 0
X-Storage-Policy: default-placement

public container headers (only relevant output):
X-Container-Bytes-Used-Actual: 114688
X-Container-Read: .r:*,.rlistings
X-Storage-Policy: default-placement

As you can see, there's now an X-Container-Read header.


I've tried to make it private again with the swift client and curl, but without
success. Here are some curl examples.

Updating works!
If I do
curl -X POST -i -H "X-Auth-Token: " -H "X-Container-Read:
.r:test" https://endpoint.url/swift/v1/containername
metadata will become
X-Container-Read: .r:test

But if I do
curl -X POST -i -H "X-Auth-Token: " -H "X-Remove-Container-Read:
x" https://endpoint.url/swift/v1/containername

nothing happens; the metadata field remains there.

So is this a broken API in RGW, or am I missing something? Maybe there's
some explicit warning that after becoming public you can't make a container
private again?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 mon unable to join the quorum

2018-03-30 Thread Julien Lavesque

Brad,

Thanks for your answer

On 30/03/2018 02:09, Brad Hubbard wrote:

2018-03-19 11:03:50.819493 7f842ed47640  0 mon.controller02 does not
exist in monmap, will attempt to join an existing cluster
2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02
rank -1 at 172.18.8.6:6789/0 mon_data
/var/lib/ceph/mon/ceph-controller02 fsid
f37f31b1-92c5-47c8-9834-1757a677d020

We are called 'mon.controller02' and we can not find our name in the
local copy of the monmap.

2018-03-19 11:03:52.346318 7f842735d700 10
mon.controller02@-1(probing) e68  ready to join, but i'm not in the
monmap or my addr is blank, trying to join

Our name is not in the copy of the monmap we got from peer 
controller01 either.


During our test we completely deleted the controller02 monitor and
added it again.

The log you have is from when controller02 was added again (so it wasn't in the
monmap before).




$ cat ../controller02-mon_status.log
[root@controller02 ~]# ceph --admin-daemon
/var/run/ceph/ceph-mon.controller02.asok mon_status
{
    "name": "controller02",
    "rank": 1,
    "state": "electing",
    "election_epoch": 32749,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 71,
        "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
        "modified": "2018-03-29 10:48:06.371157",
        "created": "0.00",
        "mons": [
            {
                "rank": 0,
                "name": "controller01",
                "addr": "172.18.8.5:6789\/0"
            },
            {
                "rank": 1,
                "name": "controller02",
                "addr": "172.18.8.6:6789\/0"
            },
            {
                "rank": 2,
                "name": "controller03",
                "addr": "172.18.8.7:6789\/0"
            }
        ]
    }
}

In the monmaps we are called 'controller02', not 'mon.controller02'.
These names need to be identical.



The cluster has been deployed using ceph-ansible with the servers'
hostnames. All monitors are called mon.controller0x in the monmap and all
three monitors have the same configuration.

We see the same behavior when creating a monmap from scratch:

[root@controller03 ~]# monmaptool --create --add controller01 
172.18.8.5:6789 --add controller02 172.18.8.6:6789 --add controller03 
172.18.8.7:6789 --fsid f37f31b1-92c5-47c8-9834-1757a677d020 --clobber 
test-monmap

monmaptool: monmap file test-monmap
monmaptool: set fsid to f37f31b1-92c5-47c8-9834-1757a677d020
monmaptool: writing epoch 0 to test-monmap (3 monitors)

[root@controller03 ~]# monmaptool --print test-monmap
monmaptool: monmap file test-monmap
epoch 0
fsid f37f31b1-92c5-47c8-9834-1757a677d020
last_changed 2018-03-30 14:42:18.809719
created 2018-03-30 14:42:18.809719
0: 172.18.8.5:6789/0 mon.controller01
1: 172.18.8.6:6789/0 mon.controller02
2: 172.18.8.7:6789/0 mon.controller03



On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque wrote:

Hi Brad,

The results have been uploaded on the tracker
(https://tracker.ceph.com/issues/23403)

Julien


On 29/03/2018 07:54, Brad Hubbard wrote:


Can you update with the result of the following commands from all of the
MONs?

# ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
# ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status


On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek wrote:


Hello Ceph users,

We are having a problem on a ceph cluster running Jewel: one of the mons
left the quorum, and we have not been able to make it join again. The two
other monitors are running just fine, but obviously we need this third one.

The problem happened before Jewel, when the cluster was running Infernalis.
We upgraded hoping that it would solve the problem, but no luck.

We've validated several things: no network problem, no clock skew, same OS
and ceph version everywhere. We've also removed the mon completely and
recreated it. We also tried to run an additional mon on one of the OSD
machines; this mon didn't join the quorum either.

We've opened https://tracker.ceph.com/issues/23403 with logs from the 3 mons
during a fresh startup of the problematic mon.

Is there anything we could try to do to resolve this issue? We are getting
out of ideas.

We'd appreciate any suggestion!

Gauvain Pocentek

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com