Re: [ceph-users] clock skew

2019-04-25 Thread huang jun
mj wrote on Thu, 25 Apr 2019 at 18:34:
>
> Hi all,
>
> On our three-node cluster, we have set up chrony for time sync, and even
> though chrony reports that it is synced to NTP time, ceph occasionally
> reports time skews that can last several hours.
>
> See for example:
>
> > root@ceph2:~# ceph -v
> > ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous 
> > (stable)
> > root@ceph2:~# ceph health detail
> > HEALTH_WARN clock skew detected on mon.1
> > MON_CLOCK_SKEW clock skew detected on mon.1
> > mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 
> > 0.000591877s)
> > root@ceph2:~# chronyc tracking
> > Reference ID: 7F7F0101 ()
> > Stratum : 10
> > Ref time (UTC)  : Wed Apr 24 19:05:28 2019
> > System time : 0.00133 seconds slow of NTP time
> > Last offset : -0.00524 seconds
> > RMS offset  : 0.00524 seconds
> > Frequency   : 12.641 ppm slow
> > Residual freq   : +0.000 ppm
> > Skew: 0.000 ppm
> > Root delay  : 0.00 seconds
> > Root dispersion : 0.00 seconds
> > Update interval : 1.4 seconds
> > Leap status : Normal
> > root@ceph2:~#
>
> For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced similarly
> with NTP on the two other nodes.
>
> We don't understand this...
>
> I have now injected mon_clock_drift_allowed 0.7, so at least we have
> HEALTH_OK again. (to stop upsetting my monitoring system)
>
> But two questions:
>
> - can anyone explain why this is happening? It looks as if ceph and
> NTP/chrony disagree on just how time-synced the servers are.

I'm not familiar with chrony, but in our practice we use NTP, and it works fine.

> - how to determine the current clock skew from ceph's perspective?
> Because "ceph health detail" in case of HEALTH_OK does not show it.
> (I want to start monitoring it continuously, to see if I can find some
> sort of pattern)

You can use 'ceph time-sync-status' to get current time sync status.
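Since `ceph health detail` only shows the skew while a warning is active, one hedged sketch for continuous monitoring is to run it (or `ceph time-sync-status`) periodically and extract the per-mon skew from the detail line. The line format below is copied verbatim from the output earlier in this thread; in a real monitor it would come from the command output:

```python
import re

# Illustrative only: this detail line is copied from the `ceph health detail`
# output shown earlier in this thread.
line = "mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 0.000591877s)"

# Pull out the mon name, the reported skew, and the configured maximum.
m = re.search(r"(mon\.\S+) .*clock skew ([0-9.]+)s > max ([0-9.]+)s", line)
mon, skew, limit = m.group(1), float(m.group(2)), float(m.group(3))
print(f"{mon}: skew={skew}s limit={limit}s exceeded={skew > limit}")
# prints: mon.1: skew=0.506374s limit=0.5s exceeded=True
```

Logging these values over time should reveal any pattern in the skew.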
>
> Thanks!
>
> MJ
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun


Re: [ceph-users] clock skew

2019-04-25 Thread Janne Johansson
On Thu, 25 Apr 2019 at 13:00, huang jun wrote:

> mj wrote on Thu, 25 Apr 2019 at 18:34:
> >
> > Hi all,
> >
> > On our three-node cluster, we have set up chrony for time sync, and even
> > though chrony reports that it is synced to NTP time, ceph occasionally
> > reports time skews that can last several hours.
> >
> > But two questions:
> >
> > - can anyone explain why this is happening? It looks as if ceph and
> > NTP/chrony disagree on just how time-synced the servers are.
>
> I'm not familiar with chrony, but in our practice we use NTP, and it works
> fine.
>

What we do with ntpd (and that is probably possible with chrony also) is to
have all mons grab the time from some generic NTP servers, but also add
each other as peers, which means they sync with each other about what time
it is. Since the mons are very close to each other network-wise, this is
very stable compared to what you might get from a random time server on the
internet. It's not super important that they are right about what time it
actually is, only that they all agree with each other.
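A rough sketch of that layout in chrony terms (the `pool` and `peer` directives come from chrony.conf; the hostnames are placeholders, not taken from this thread):

```text
# /etc/chrony/chrony.conf on mon host ceph2 (sketch; hostnames are placeholders)
pool pool.ntp.org iburst         # generic upstream time servers
peer ceph1.example.local         # the other two mons as peers, so the mons
peer ceph3.example.local         # stay in agreement even if upstream wobbles
```

The same stanza (with the hostnames rotated) would go on each of the three mons.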

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] clock skew

2019-04-25 Thread John Petrini
+1 to Janne's suggestion. Also, how many time sources are you using? More
tends to be better, and by default chrony has a pretty low limit on the
number of sources if you're using a pool (3 or 4, I think?). You can adjust
it by adding maxsources to the pool line.

pool pool.ntp.org iburst maxsources 8


[ceph-users] clock skew

2019-04-25 Thread mj

Hi all,

On our three-node cluster, we have set up chrony for time sync, and even 
though chrony reports that it is synced to NTP time, ceph occasionally 
reports time skews that can last several hours.


See for example:


root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous 
(stable)
root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 
0.000591877s)
root@ceph2:~# chronyc tracking
Reference ID: 7F7F0101 ()
Stratum : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time : 0.00133 seconds slow of NTP time
Last offset : -0.00524 seconds
RMS offset  : 0.00524 seconds
Frequency   : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew: 0.000 ppm
Root delay  : 0.00 seconds
Root dispersion : 0.00 seconds
Update interval : 1.4 seconds
Leap status : Normal
root@ceph2:~# 


For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced similarly 
with NTP on the two other nodes.


We don't understand this...

I have now injected mon_clock_drift_allowed 0.7, so at least we have 
HEALTH_OK again. (to stop upsetting my monitoring system)


But two questions:

- can anyone explain why this is happening? It looks as if ceph and 
NTP/chrony disagree on just how time-synced the servers are.


- how to determine the current clock skew from ceph's perspective? 
Because "ceph health detail" in case of HEALTH_OK does not show it.
(I want to start monitoring it continuously, to see if I can find some 
sort of pattern)


Thanks!

MJ


[ceph-users] Object Gateway - Server Side Encryption

2019-04-25 Thread Francois Scheurer

Hello Amardeep

We are trying the same as you on luminous.

s3cmd --access_key xxx  --secret_key xxx  --host-bucket '%(bucket)s.s3.xxx.ch' 
--host s3.xxx.ch --signature-v2 --no-preserve --server-side-encryption \
--server-side-encryption-kms-id 
https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5 
put hello.txt3 s3://test/hello.txt3

upload: 'hello.txt3' -> 's3://test/hello.txt3'  [1 of 1]
 13 of 13   100% in    0s    14.25 B/s  done
ERROR: S3 error: 400 (InvalidArgument): Failed to retrieve the actual key, 
kms-keyid: 
https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5

openstack --os-cloud fsc-ac secret get 
https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5
+---------------+----------------------------------------------------------------------------------+
| Field         | Value                                                                            |
+---------------+----------------------------------------------------------------------------------+
| Secret href   | https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5  |
| Name          | fsc-key3                                                                         |
| Created       | 2019-04-25T14:31:52+00:00                                                        |
| Status        | ACTIVE                                                                           |
| Content types | {u'default': u'application/octet-stream'}                                        |
| Algorithm     | aes                                                                              |
| Bit length    | 256                                                                              |
| Secret type   | opaque                                                                           |
| Mode          | cbc                                                                              |
| Expiration    | 2020-01-01T00:00:00+00:00                                                        |
+---------------+----------------------------------------------------------------------------------+

We also tried using --server-side-encryption-kms-id 
ffa60094-f88b-41a4-b63f-c07a017ad2b5
or --server-side-encryption-kms-id fsc-key3 with the same error.


vim /etc/ceph/ceph.conf
rgw barbican url = https://barbican.service.xxx.ch
rgw keystone barbican user = rgwcrypt
rgw keystone barbican password = xxx
rgw keystone barbican project = service
rgw keystone barbican domain = default
rgw crypt require ssl = false

Thank you in advance for your help.



Best Regards

Francois Scheurer





Re: [ceph-users] clock skew

2019-04-25 Thread Bill Sharer
If you are just syncing to the outside pool, the three hosts may end up 
latching on to different outside servers as their definitive sources. 
You might want to make one of the three a higher-priority source for the 
other two, and possibly have only that one sync to the outside sources. 
Also, for hardware newer than about five years old, you might want to 
look at enabling the NIC clocks using LinuxPTP to keep clock jitter down 
inside your LAN. I wrote this article on the Gentoo wiki on enabling 
PTP in chrony.


https://wiki.gentoo.org/wiki/Chrony_with_hardware_timestamping
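For what it's worth, a hedged sketch of the chrony side of that (the `hwtimestamp` directive exists in chrony 3.0 and later; `eth0` is a placeholder, and the NIC and driver must support hardware timestamping):

```text
# /etc/chrony/chrony.conf (sketch; eth0 is a placeholder interface name)
hwtimestamp eth0     # enable hardware timestamping on this NIC
# or, for every interface that supports it:
# hwtimestamp *
```

See the wiki article above for the full setup, including verifying that timestamping is actually active.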

Bill Sharer


On 4/25/19 6:33 AM, mj wrote:

Hi all,

On our three-node cluster, we have set up chrony for time sync, and 
even though chrony reports that it is synced to NTP time, ceph 
occasionally reports time skews that can last several hours.


See for example:


root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) 
luminous (stable)

root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
    mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s 
(latency 0.000591877s)

root@ceph2:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time : 0.00133 seconds slow of NTP time
Last offset : -0.00524 seconds
RMS offset  : 0.00524 seconds
Frequency   : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew    : 0.000 ppm
Root delay  : 0.00 seconds
Root dispersion : 0.00 seconds
Update interval : 1.4 seconds
Leap status : Normal
root@ceph2:~# 


For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced 
similarly with NTP on the two other nodes.


We don't understand this...

I have now injected mon_clock_drift_allowed 0.7, so at least we have 
HEALTH_OK again. (to stop upsetting my monitoring system)


But two questions:

- can anyone explain why this is happening? It looks as if ceph and 
NTP/chrony disagree on just how time-synced the servers are.


- how to determine the current clock skew from ceph's perspective? 
Because "ceph health detail" in case of HEALTH_OK does not show it.
(I want to start monitoring it continuously, to see if I can find some 
sort of pattern)


Thanks!

MJ


Re: [ceph-users] Object Gateway - Server Side Encryption

2019-04-25 Thread Casey Bodley


On 4/25/19 11:33 AM, Francois Scheurer wrote:

Hello Amardeep
We are trying the same as you on luminous.
s3cmd --access_key xxx  --secret_key xxx  --host-bucket '%(bucket)s.s3.xxx.ch' 
--host s3.xxx.ch --signature-v2 --no-preserve --server-side-encryption \
--server-side-encryption-kms-id https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5 
put hello.txt3 s3://test/hello.txt3

upload: 'hello.txt3' -> 's3://test/hello.txt3'  [1 of 1]
  13 of 13   100% in    0s    14.25 B/s  done
ERROR: S3 error: 400 (InvalidArgument): Failed to retrieve the actual key, 
kms-keyid: https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5

openstack --os-cloud fsc-ac secret get 
https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5
+---------------+----------------------------------------------------------------------------------+
| Field         | Value                                                                            |
+---------------+----------------------------------------------------------------------------------+
| Secret href   | https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5  |
| Name          | fsc-key3                                                                         |
| Created       | 2019-04-25T14:31:52+00:00                                                        |
| Status        | ACTIVE                                                                           |
| Content types | {u'default': u'application/octet-stream'}                                        |
| Algorithm     | aes                                                                              |
| Bit length    | 256                                                                              |
| Secret type   | opaque                                                                           |
| Mode          | cbc                                                                              |
| Expiration    | 2020-01-01T00:00:00+00:00                                                        |
+---------------+----------------------------------------------------------------------------------+
We also tried using --server-side-encryption-kms-id 
ffa60094-f88b-41a4-b63f-c07a017ad2b5
or --server-side-encryption-kms-id fsc-key3 with the same error.


vim /etc/ceph/ceph.conf
 rgw barbican url = https://barbican.service.xxx.ch
 rgw keystone barbican user = rgwcrypt
 rgw keystone barbican password = xxx
 rgw keystone barbican project = service
 rgw keystone barbican domain = default
 rgw crypt require ssl = false
Thank you in advance for your help.



Best Regards
Francois Scheurer



I think rgw is expecting these keyids to look like 
"ffa60094-f88b-41a4-b63f-c07a017ad2b5", so it doesn't url-encode them 
when sending the request to barbican. In this case, the keyid is itself 
a url, so rgw is sending a request to 
"https://barbican.service.xxx.ch/v1/secrets/https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5". 
It's hard to tell without logs from barbican, but I suspect that it's 
trying to interpret the slashes as part of the request path, rather than 
part of the keyid.


So I would recommend using keyids of the form 
"ffa60094-f88b-41a4-b63f-c07a017ad2b5", but would also consider the lack 
of url-encoding to be a bug. I opened a ticket for this at 
http://tracker.ceph.com/issues/39488 - feel free to add more information 
there. Barbican log output showing the request/response would be helpful!
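To illustrate the encoding point (a sketch, not rgw's actual code; the Barbican URL and keyid are taken from the error message above):

```python
from urllib.parse import quote

base = "https://barbican.service.xxx.ch/v1/secrets/"
keyid = "https://barbican.service.xxx.ch/v1/secrets/ffa60094-f88b-41a4-b63f-c07a017ad2b5"

# Without encoding, the keyid's colons and slashes become extra path
# segments -- this is the malformed request URL described above:
naive = base + keyid

# With percent-encoding (safe=""), the whole keyid survives as one path
# segment, which is what a url-encoding fix in rgw would produce:
encoded = base + quote(keyid, safe="")
print(encoded)
```

This is why a plain uuid keyid like "ffa60094-f88b-41a4-b63f-c07a017ad2b5" works today: it contains no characters that need escaping.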


Casey


Re: [ceph-users] showing active config settings

2019-04-25 Thread solarflow99
It sucks that it's so hard to set and view active settings; this should be a
lot simpler, in my opinion.

On Tue, Apr 23, 2019 at 1:58 PM solarflow99  wrote:

> Thanks, but does this not work on Luminous maybe?  I am on the mon hosts
> trying this:
>
>
> # ceph config set osd osd_recovery_max_active 4
> Invalid command: unused arguments: [u'4']
> config set   :  Set a configuration option at runtime (not
> persistent)
> Error EINVAL: invalid command
>
> # ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
> admin_socket: exception getting command descriptions: [Errno 2] No such
> file or directory
>
>
> On Tue, Apr 16, 2019 at 4:04 PM Brad Hubbard  wrote:
>
>> $ ceph config set osd osd_recovery_max_active 4
>> $ ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
>> "osd_recovery_max_active": {
>> "default": 3,
>> "mon": 4,
>> "override": 4,
>> "final": 4
>> },
>>
>> On Wed, Apr 17, 2019 at 5:29 AM solarflow99 
>> wrote:
>> >
>> > I wish there was a way to query the running settings from one of the
>> MGR hosts, and it doesn't help that ansible doesn't even copy the keyring
>> to the OSD nodes so commands there wouldn't work anyway.
>> > I'm still puzzled why it doesn't show any change when I run this no
>> matter what I set it to:
>> >
>> > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
>> > osd_recovery_max_active = 3
>> >
>> > in fact it doesn't matter if I use an OSD number that doesn't exist,
>> same thing if I use ceph get
>> >
>> >
>> >
>> > On Tue, Apr 16, 2019 at 1:18 AM Brad Hubbard 
>> wrote:
>> >>
>> >> On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich 
>> wrote:
>> >> >
>> >> > This works, it just says that it *might* require a restart, but this
>> >> > particular option takes effect without a restart.
>> >>
>> >> We've already looked at changing the wording once to make it more
>> palatable.
>> >>
>> >> http://tracker.ceph.com/issues/18424
>> >>
>> >> >
>> >> > Implementation detail: this message shows up if there's no internal
>> >> > function to be called when this option changes, so it can't be sure
>> if
>> >> > the change is actually doing anything because the option might be
>> >> > cached or only read on startup. But in this case this option is read
>> >> > in the relevant path every time and no notification is required. But
>> >> > the injectargs command can't know that.
>> >>
>> >> Right on all counts. The functions are referred to as observers and
>> >> register to be notified if the value changes, hence "not observed."
>> >>
>> >> >
>> >> > Paul
>> >> >
>> >> > On Mon, Apr 15, 2019 at 11:38 PM solarflow99 
>> wrote:
>> >> > >
>> >> > > Then why doesn't this work?
>> >> > >
>> >> > > # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>> >> > > osd.0: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.1: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.2: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.3: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.4: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > >
>> >> > > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
>> >> > > osd_recovery_max_active = 3
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Apr 10, 2019 at 7:21 AM Eugen Block  wrote:
>> >> > >>
>> >> > >> > I always end up using "ceph --admin-daemon
>> >> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..."
>> to get what
>> >> > >> > is in effect now for a certain daemon.
>> >> > >> > Needs you to be on the host of the daemon of course.
>> >> > >>
>> >> > >> Me too, I just wanted to try what OP reported. And after trying
>> that,
>> >> > >> I'll keep it that way. ;-)
>> >> > >>
>> >> > >>
>> >> > >> Zitat von Janne Johansson :
>> >> > >>
>> >> > >> > On Wed, 10 Apr 2019 at 13:37, Eugen Block wrote:
>> >> > >> >
>> >> > >> >> > If you don't specify which daemon to talk to, it tells you
>> what the
>> >> > >> >> > defaults would be for a random daemon started just now using
>> the same
>> >> > >> >> > config as you have in /etc/ceph/ceph.conf.
>> >> > >> >>
>> >> > >> >> I tried that, too, but the result is not correct:
>> >> > >> >>
>> >> > >> >> host1:~ # ceph -n osd.1 --show-config | grep
>> osd_recovery_max_active
>> >> > >> >> osd_recovery_max_active = 3
>> >> > >> >>
>> >> > >> >
>> >> > >> > I always end up using "ceph --admin-daemon
>> >> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..."
>> to get what
>> >> > >> > is in effect now for a certain daemon.
>> >> > >> > Needs you to be on the host of the daemon of course.
>> >> > >> >
>> >> > >> > --
>> >> > >> > May the most significant bit of your life be positive.
>> >> > >>
>> >> > >>
>> >> > >>