Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Jake Grimmett
Hi All,

There might be a problem on Scientific Linux 7.5 too:

After upgrading directly from 12.2.5 to 13.2.1:

[root@cephr01 ~]# ceph-detect-init
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 9, in <module>
load_entry_point('ceph-detect-init==1.0.1', 'console_scripts',
'ceph-detect-init')()
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line
56, in run
print(ceph_detect_init.get(args.use_rhceph).init)
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py",
line 42, in get
release=release)
ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.:
rhel  7.5

# cat /etc/redhat-release
Scientific Linux release 7.5 (Nitrogen)

# cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.5 (Nitrogen)"
ID="rhel"
ID_LIKE="scientific centos fedora"
VERSION_ID="7.5"
PRETTY_NAME="Scientific Linux 7.5 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.5:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-de...@listserv.fnal.gov"

REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.5
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.5"
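
The os-release above is also why ceph-detect-init reports the platform as
"rhel 7.5" on this box: Scientific Linux sets ID="rhel" and VERSION_ID="7.5".
A minimal Python sketch of reading those fields (illustrative only, not the
actual ceph-detect-init code):

# Illustrative sketch only -- not the ceph-detect-init source.
# Shows what a detection tool sees on this host: ID="rhel", VERSION_ID="7.5".

def read_os_release(path="/etc/os-release"):
    """Parse the KEY="value" lines of /etc/os-release into a dict."""
    info = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, _, value = line.partition("=")
            info[key] = value.strip().strip('"')
    return info

if __name__ == "__main__":
    osr = read_os_release()
    # On the Scientific Linux 7.5 host above this prints: rhel 7.5
    print(osr.get("ID"), osr.get("VERSION_ID"))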

Applying Nathan's one-line fix to ceph_detect_init/__init__.py
works :)

# ceph-detect-init
systemd
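
Nathan's actual change is the one in https://github.com/ceph/ceph/pull/23303;
the sketch below is only a hedged illustration of the general shape of such a
fix -- map an EL7-style distro/release pair to systemd instead of raising
UnsupportedPlatform. The function and set names are invented for the example:

# Hedged illustration only; the real fix lives in ceph-detect-init
# (https://github.com/ceph/ceph/pull/23303). Names below are invented.

EL7_LIKE = {"rhel", "centos", "scientific"}

def choose_init_system(distro_id, version_id):
    """Map a distro/version pair to an init system, defaulting to
    systemd on EL7-style platforms instead of raising an error."""
    distro_id = (distro_id or "").lower().strip()
    major = (version_id or "").split(".")[0]
    if distro_id in EL7_LIKE and major.isdigit() and int(major) >= 7:
        return "systemd"
    raise RuntimeError("Platform is not supported.: %s %s" % (distro_id, version_id))

print(choose_init_system("rhel", "7.5"))   # prints: systemd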

all the best,

Jake

On 30/07/18 09:59, Kenneth Waegeman wrote:
> I'll just give it a test then :)
> 
> 
> On 30/07/18 10:54, Nathan Cutler wrote:
>>> for all others on this list, it might also be helpful to know which
>>> setups are likely affected.
>>> Does this only occur for Filestore disks, i.e. if ceph-volume has
>>> taken over managing these?
>>> Does it happen on every RHEL 7.5 system?
>>
>> It affects all OSDs managed by ceph-disk on all RHEL systems (but not
>> on CentOS), regardless of whether they are filestore or bluestore.
>>
>>> We're still on 13.2.0 here and ceph-detect-init works fine on our
>>> CentOS 7.5 systems (it just echoes "systemd").
>>> We're on Bluestore.
>>> Should we hold off on an upgrade, or are we unaffected?
>>
>> The regression does not affect CentOS - only RHEL.
>>
>> Nathan


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Kenneth Waegeman

I'll just give it a test then :)


On 30/07/18 10:54, Nathan Cutler wrote:
for all others on this list, it might also be helpful to know which 
setups are likely affected.
Does this only occur for Filestore disks, i.e. if ceph-volume has 
taken over managing these?

Does it happen on every RHEL 7.5 system?


It affects all OSDs managed by ceph-disk on all RHEL systems (but not 
on CentOS), regardless of whether they are filestore or bluestore.


We're still on 13.2.0 here and ceph-detect-init works fine on our 
CentOS 7.5 systems (it just echoes "systemd").

We're on Bluestore.
Should we hold off on an upgrade, or are we unaffected?


The regression does not affect CentOS - only RHEL.

Nathan


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Nathan Cutler

for all others on this list, it might also be helpful to know which setups are 
likely affected.
Does this only occur for Filestore disks, i.e. if ceph-volume has taken over 
managing these?
Does it happen on every RHEL 7.5 system?


It affects all OSDs managed by ceph-disk on all RHEL systems (but not on 
CentOS), regardless of whether they are filestore or bluestore.



We're still on 13.2.0 here and ceph-detect-init works fine on our CentOS 7.5 systems (it 
just echoes "systemd").
We're on Bluestore.
Should we hold off on an upgrade, or are we unaffected?


The regression does not affect CentOS - only RHEL.

Nathan


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Oliver Freyermuth
Hi together,

for all others on this list, it might also be helpful to know which setups are 
likely affected. 
Does this only occur for Filestore disks, i.e. if ceph-volume has taken over 
managing these?
Does it happen on every RHEL 7.5 system? 

We're still on 13.2.0 here and ceph-detect-init works fine on our CentOS 7.5 
systems (it just echoes "systemd"). 
We're on Bluestore. 
Should we hold off on an upgrade, or are we unaffected? 

Cheers,
Oliver

On 30.07.2018 09:50, ceph.nov...@habmalnefrage.de wrote:
> Hey Nathan.
> 
> No blaming here. I'm very thankful for this great piece (ok, sometimes more of 
> a beast ;) ) of open-source SDS and all the great work around it incl. 
> community and users... and happy the problem is identified and can be fixed 
> for others/the future as well :)
>  
> Well, yes, I can confirm the "error" you found here as well:
> 
> [root@sds20 ~]# ceph-detect-init
> Traceback (most recent call last):
>   File "/usr/bin/ceph-detect-init", line 9, in <module>
> load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 
> 'ceph-detect-init')()
>   File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, 
> in run
> print(ceph_detect_init.get(args.use_rhceph).init)
>   File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 
> 42, in get
> release=release)
> ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel  
> 7.5
> 
> 
> Sent: Sunday, 29 July 2018 at 20:33
> From: "Nathan Cutler" 
> To: ceph.nov...@habmalnefrage.de, "Vasu Kulkarni" 
> Cc: ceph-users , "Ceph Development" 
> 
> Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
>> Strange...
>> - wouldn't swear, but pretty sure v13.2.0 was working ok before
>> - so what do others say/see?
>> - no one on v13.2.1 so far (hard to believe) OR
>> - just don't have this "systemctl ceph-osd.target" problem and all just 
>> works?
>>
>> If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic 
>> (say v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, 
>> what's your Linux OS and version (I'm on RHEL 7.5 here)? :O
> 
> Best regards
>  Anton
> 
> 
> 
> Hi ceph.novice:
> 
> I'm the one to blame for this regretful incident. Today I have
> reproduced the issue in teuthology:
> 
> 2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running:
> 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
> 2018-07-29T18:20:07.796
> INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call
> last):
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> File "/bin/ceph-detect-init", line 9, in <module>
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> load_entry_point('ceph-detect-init==1.0.1', 'console_scripts',
> 'ceph-detect-init')()
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line
> 56, in run
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> print(ceph_detect_init.get(args.use_rhceph).init)
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py",
> line 42, in get
> 2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
> release=release)
> 2018-07-29T18:20:07.797
> INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform:
> Platform is not supported.: rhel 7.5
> 
> Just to be sure, can you confirm? (I.e. issue the command
> "ceph-detect-init" on your RHEL 7.5 system. Instead of saying "systemd"
> it gives an error like above?)
> 
> I'm working on a fix now at https://github.com/ceph/ceph/pull/23303
> 
> Nathan


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread ceph . novice
Hey Nathan.

No blaming here. I'm very thankful for this great piece (ok, sometimes more of a 
beast ;) ) of open-source SDS and all the great work around it incl. community 
and users... and happy the problem is identified and can be fixed for 
others/the future as well :)
 
Well, yes, I can confirm the "error" you found here as well:

[root@sds20 ~]# ceph-detect-init
Traceback (most recent call last):
  File "/usr/bin/ceph-detect-init", line 9, in <module>
load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 
'ceph-detect-init')()
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in 
run
print(ceph_detect_init.get(args.use_rhceph).init)
  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 
42, in get
release=release)
ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel  7.5


Sent: Sunday, 29 July 2018 at 20:33
From: "Nathan Cutler" 
To: ceph.nov...@habmalnefrage.de, "Vasu Kulkarni" 
Cc: ceph-users , "Ceph Development" 

Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
> Strange...
> - wouldn't swear, but pretty sure v13.2.0 was working ok before
> - so what do others say/see?
> - no one on v13.2.1 so far (hard to believe) OR
> - just don't have this "systemctl ceph-osd.target" problem and all just works?
>
> If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
> v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's 
> your Linux OS and version (I'm on RHEL 7.5 here)? :O

Best regards
 Anton



Hi ceph.novice:

I'm the one to blame for this regretful incident. Today I have
reproduced the issue in teuthology:

2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running:
'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796
INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call
last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
File "/bin/ceph-detect-init", line 9, in <module>
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
load_entry_point('ceph-detect-init==1.0.1', 'console_scripts',
'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line
56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py",
line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:
release=release)
2018-07-29T18:20:07.797
INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform:
Platform is not supported.: rhel 7.5

Just to be sure, can you confirm? (I.e. issue the command
"ceph-detect-init" on your RHEL 7.5 system. Instead of saying "systemd"
it gives an error like above?)

I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-29 Thread Nathan Cutler

Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
  - no one on v13.2.1 so far (hard to believe) OR
  - just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your 
Linux OS and version (I'm on RHEL 7.5 here)? :O


Hi ceph.novice:

I'm the one to blame for this regretful incident. Today I have 
reproduced the issue in teuthology:


2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running: 
'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796 
INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call 
last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
File "/bin/ceph-detect-init", line 9, in <module>
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 
'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 
56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", 
line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: 
release=release)
2018-07-29T18:20:07.797 
INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform: 
Platform is not supported.: rhel  7.5


Just to be sure, can you confirm? (I.e. issue the command 
"ceph-detect-init" on your RHEL 7.5 system. Instead of saying "systemd" 
it gives an error like above?)


I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan

On 07/29/2018 11:16 AM, ceph.nov...@habmalnefrage.de wrote:

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni" 
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , "Ceph 
Development" 
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM,  wrote:

Have you guys changed something with the systemctl startup of the OSDs?


I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-29 Thread ceph . novice

Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
 - no one on v13.2.1 so far (hard to believe) OR
 - just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your 
Linux OS and version (I'm on RHEL 7.5 here)? :O

 

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni" 
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , 
"Ceph Development" 
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM,  wrote:
> Have you guys changed something with the systemctl startup of the OSDs?

I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread Vasu Kulkarni
On Sat, Jul 28, 2018 at 6:02 PM,   wrote:
> Have you guys changed something with the systemctl startup of the OSDs?
I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004

>
> I've stopped and disabled all the OSDs on all my hosts via "systemctl 
> stop|disable ceph-osd.target" and rebooted all the nodes. Everything looks 
> just the same.
> Then I started all the OSD daemons one after the other via the CLI with 
> "/usr/bin/ceph-osd -f --cluster ceph --id $NR --setuser ceph --setgroup ceph 
> > /tmp/osd.${NR}.log 2>&1 & " and now everything (ok, besides the ZABBIX mgr 
> module?!?) seems to work :|
>
>
>   cluster:
> id: 2a919338-4e44-454f-bf45-e94a01c2a5e6
> health: HEALTH_WARN
> Failed to send data to Zabbix
>
>   services:
> mon: 3 daemons, quorum sds20,sds21,sds22
> mgr: sds22(active), standbys: sds20, sds21
> osd: 18 osds: 18 up, 18 in
> rgw: 4 daemons active
>
>   data:
> pools:   25 pools, 1390 pgs
> objects: 2.55 k objects, 3.4 GiB
> usage:   26 GiB used, 8.8 TiB / 8.8 TiB avail
> pgs: 1390 active+clean
>
>   io:
> client:   11 KiB/s rd, 10 op/s rd, 0 op/s wr
>
> Any hints?
>
> --
>
>
> Sent: Saturday, 28 July 2018 at 23:35
> From: ceph.nov...@habmalnefrage.de
> To: "Sage Weil" 
> Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
> Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
> Hi Sage.
>
> Sure. Any specific OSD(s) log(s)? Or just any?
>
> Sent: Saturday, 28 July 2018 at 16:49
> From: "Sage Weil" 
> To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, 
> ceph-de...@vger.kernel.org
> Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
>
> Can you include more of your osd log file?
>


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Have you guys changed something with the systemctl startup of the OSDs?

I've stopped and disabled all the OSDs on all my hosts via "systemctl 
stop|disable ceph-osd.target" and rebooted all the nodes. Everything looks just 
the same.
Then I started all the OSD daemons one after the other via the CLI with 
"/usr/bin/ceph-osd -f --cluster ceph --id $NR --setuser ceph --setgroup ceph > 
/tmp/osd.${NR}.log 2>&1 & " and now everything (ok, besides the ZABBIX mgr 
module?!?) seems to work :|


  cluster:
id: 2a919338-4e44-454f-bf45-e94a01c2a5e6
health: HEALTH_WARN
Failed to send data to Zabbix

  services:
mon: 3 daemons, quorum sds20,sds21,sds22
mgr: sds22(active), standbys: sds20, sds21
osd: 18 osds: 18 up, 18 in
rgw: 4 daemons active

  data:
pools:   25 pools, 1390 pgs
objects: 2.55 k objects, 3.4 GiB
usage:   26 GiB used, 8.8 TiB / 8.8 TiB avail
pgs: 1390 active+clean

  io:
client:   11 KiB/s rd, 10 op/s rd, 0 op/s wr

Any hints?
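
For anyone needing the same stop-gap, here is a small Python sketch of the
manual ceph-osd start described above (the OSD ID list is a placeholder --
use the IDs actually hosted on the node):

# Hedged sketch of the manual workaround above: start each local OSD
# directly with ceph-osd instead of via ceph-osd.target.
import subprocess

local_osd_ids = [0, 1, 2]   # placeholder: the OSD IDs hosted on this node

for osd_id in local_osd_ids:
    # Same command line as above: foreground daemon, output captured in /tmp.
    log = open("/tmp/osd.%d.log" % osd_id, "w")
    subprocess.Popen(
        ["/usr/bin/ceph-osd", "-f", "--cluster", "ceph",
         "--id", str(osd_id), "--setuser", "ceph", "--setgroup", "ceph"],
        stdout=log, stderr=subprocess.STDOUT)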

--
 

Sent: Saturday, 28 July 2018 at 23:35
From: ceph.nov...@habmalnefrage.de
To: "Sage Weil" 
Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
Hi Sage.

Sure. Any specific OSD(s) log(s)? Or just any?

Sent: Saturday, 28 July 2018 at 16:49
From: "Sage Weil" 
To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, 
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

Can you include more of your osd log file?
 


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Hi Sage.

Sure. Any specific OSD(s) log(s)? Or just any?

Sent: Saturday, 28 July 2018 at 16:49
From: "Sage Weil" 
To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, 
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

Can you include more of your osd log file?
 


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread Sage Weil
Can you include more of your osd log file?

On July 28, 2018 9:46:16 AM CDT, ceph.nov...@habmalnefrage.de wrote:
>Dear users and developers.
> 
>I've updated our dev-cluster from v13.2.0 to v13.2.1 yesterday and
>since then everything is badly broken.
>I've restarted all Ceph components via "systemctl" and also rebooted
>the servers SDS21 and SDS24, but nothing changed.
>
>This cluster started as Kraken, was updated to Luminous (up to v12.2.5)
>and then to Mimic.
>
>Here are some system related infos, see
>https://semestriel.framapad.org/p/DTkBspmnfU
>
>Somehow I guess this may have to do with the various "ceph-disk",
>"ceph-volume", "ceph-lvm" changes in the last months?!?
>
>Thanks & regards
> Anton
>
>--
>
> 
>
>Sent: Saturday, 28 July 2018 at 00:22
>From: "Bryan Stillwell" 
>To: "ceph-users@lists.ceph.com" 
>Subject: Re: [ceph-users] v13.2.1 Mimic released
>
>I decided to upgrade my home cluster from Luminous (v12.2.7) to Mimic
>(v13.2.1) today and ran into a couple issues:
> 
>1. When restarting the OSDs during the upgrade it seems to forget my
>upmap settings.  I had to manually return them to the way they were
>with commands like:
> 
>ceph osd pg-upmap-items 5.1 11 18 8 6 9 0
>ceph osd pg-upmap-items 5.1f 11 17
> 
>I also saw this when upgrading from v12.2.5 to v12.2.7.
> 
>2. Also after restarting the first OSD during the upgrade I saw 21
>messages like these in ceph.log:
> 
>2018-07-27 15:53:49.868552 osd.1 osd.1 10.0.0.207:6806/4029643 97 :
>cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.922365 osd.6 osd.6 10.0.0.16:6804/90400 25 :
>cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.925585 osd.6 osd.6 10.0.0.16:6804/90400 26 :
>cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.944414 osd.18 osd.18 10.0.0.15:6808/120845 8 :
>cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.944756 osd.17 osd.17 10.0.0.15:6800/120749 13 :
>cluster [WRN] failed to encode map e100467 with expected crc
> 
>Is this a sign that full OSD maps were sent out by the mons to every
>OSD like back in the hammer days?  I seem to remember that OSD maps
>should be a lot smaller now, so maybe this isn't as big of a problem as
>it was back then?
> 
>Thanks,
>Bryan
> 
>
>From: ceph-users  on behalf of Sage
>Weil 
>Date: Friday, July 27, 2018 at 1:25 PM
>To: "ceph-annou...@lists.ceph.com" ,
>"ceph-users@lists.ceph.com" ,
>"ceph-maintain...@lists.ceph.com" ,
>"ceph-de...@vger.kernel.org" 
>Subject: [ceph-users] v13.2.1 Mimic released
>
> 
>
>This is the first bugfix release of the Mimic v13.2.x long term stable
>release
>
>series. This release contains many fixes across all components of Ceph,
>
>including a few security fixes. We recommend that all users upgrade.
>
> 
>
>Notable Changes
>
>--
>
> 
>
>* CVE 2018-1128: auth: cephx authorizer subject to replay attack
>(issue#24836 http://tracker.ceph.com/issues/24836, Sage Weil)
>
>* CVE 2018-1129: auth: cephx signature check is weak (issue#24837
>http://tracker.ceph.com/issues/24837,
>Sage Weil)
>
>* CVE 2018-10861: mon: auth checks not correct for pool ops
>(issue#24838 http://tracker.ceph.com/issues/24838, Jason Dillaman)
>
> 
>
>For more details and links to various issues and pull requests, please
>
>refer to the ceph release blog at
>https://ceph.com/releases/13-2-1-mimic-released
>
> 
>
>Changelog
>
>-
>
>* bluestore:  common/hobject: improved hash calculation for hobject_t
>etc (pr#22777, Adam Kupczyk, Sage Weil)
>
>* bluestore,core: mimic: os/bluestore: don't store/use
>path_block.{db,wal} from meta (pr#22477, Sage Weil, Alfredo Deza)
>
>* bluestore: os/bluestore: backport 24319 and 24550 (issue#24550,
>issue#24502, issue#24319, issue#24581, pr#22649, Sage Weil)
>
>* bluestore: os/bluestore: fix incomplete faulty range marking when
>doing compression (pr#22910, Igor Fedotov)
>
>* bluestore: spdk: fix ceph-osd crash when activate SPDK (issue#24472,
>issue#24371, pr#22684, tone-zhang)
>
>* build/ops: build/ops: ceph.git has two different versions of dpdk in
>the source tree (issue#24942, issue#24032, pr#23070, Kefu Chai)
>
>* build/ops: build/ops: install-deps.sh fails on newest openSUSE Leap
>(issue#25065, pr#23178, Kyr Shatskyy)
>
>* build/ops: build/ops: Mimic build fails with -DWITH_RADOSGW=0
>(issue#24766, pr#22851, Dan Mick)
>
>* build/ops: cmake: enable RTTI for both debug and release RocksDB
>builds (pr#22299, Igor Fedotov)
>
>* build/ops: deb/rpm: add python-six as build-time and run-time
>dependency (issue#24885, pr#22948, Nathan Cutler, Kefu Chai)
>
>* build/ops: deb,rpm: fix block.db symlink ownership (pr#23246, Sage
>Weil)
>
>* build/ops: include: fix build with older clang (OSX target)
>(pr#23049, Christopher Blum)
>
>* build/ops: inclu

[ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Dear users and developers.
 
I've updated our dev-cluster from v13.2.0 to v13.2.1 yesterday and since then 
everything is badly broken.
I've restarted all Ceph components via "systemctl" and also rebooted the servers 
SDS21 and SDS24, but nothing changed.

This cluster started as Kraken, was updated to Luminous (up to v12.2.5) and 
then to Mimic.

Here are some system related infos, see 
https://semestriel.framapad.org/p/DTkBspmnfU

Somehow I guess this may have to do with the various "ceph-disk", 
"ceph-volume", "ceph-lvm" changes in the last months?!?

Thanks & regards
 Anton

--

 

Sent: Saturday, 28 July 2018 at 00:22
From: "Bryan Stillwell" 
To: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] v13.2.1 Mimic released

I decided to upgrade my home cluster from Luminous (v12.2.7) to Mimic (v13.2.1) 
today and ran into a couple issues:
 
1. When restarting the OSDs during the upgrade it seems to forget my upmap 
settings.  I had to manually return them to the way they were with commands 
like:
 
ceph osd pg-upmap-items 5.1 11 18 8 6 9 0
ceph osd pg-upmap-items 5.1f 11 17
 
I also saw this when upgrading from v12.2.5 to v12.2.7 (a snapshot/restore 
sketch for the upmap entries follows after this list).
 
2. Also after restarting the first OSD during the upgrade I saw 21 messages 
like these in ceph.log:
 
2018-07-27 15:53:49.868552 osd.1 osd.1 10.0.0.207:6806/4029643 97 : cluster 
[WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.922365 osd.6 osd.6 10.0.0.16:6804/90400 25 : cluster [WRN] 
failed to encode map e100467 with expected crc
2018-07-27 15:53:49.925585 osd.6 osd.6 10.0.0.16:6804/90400 26 : cluster [WRN] 
failed to encode map e100467 with expected crc
2018-07-27 15:53:49.944414 osd.18 osd.18 10.0.0.15:6808/120845 8 : cluster 
[WRN] failed to encode map e100467 with expected crc
2018-07-27 15:53:49.944756 osd.17 osd.17 10.0.0.15:6800/120749 13 : cluster 
[WRN] failed to encode map e100467 with expected crc
 
Is this a sign that full OSD maps were sent out by the mons to every OSD like 
back in the hammer days?  I seem to remember that OSD maps should be a lot 
smaller now, so maybe this isn't as big of a problem as it was back then?
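
For item 1, here is a hedged sketch of one way to snapshot the upmap entries
before such an upgrade and replay them afterwards. It assumes "ceph osd dump
--format json" exposes a "pg_upmap_items" list with per-PG from/to mappings;
verify the field names on your release before relying on it:

# Hedged sketch: save pg-upmap-items before an upgrade, re-apply afterwards.
# Assumes the JSON OSD dump contains "pg_upmap_items" entries of the form
# {"pgid": "5.1", "mappings": [{"from": 11, "to": 18}, ...]}.
import json
import subprocess

def dump_upmaps():
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    return json.loads(out).get("pg_upmap_items", [])

def restore_upmaps(entries):
    for entry in entries:
        args = ["ceph", "osd", "pg-upmap-items", entry["pgid"]]
        for m in entry["mappings"]:
            args += [str(m["from"]), str(m["to"])]
        subprocess.check_call(args)

if __name__ == "__main__":
    # Before the upgrade: save to a file (path is just an example).
    with open("/tmp/upmap-backup.json", "w") as f:
        json.dump(dump_upmaps(), f)
    # After the upgrade: restore_upmaps(json.load(open("/tmp/upmap-backup.json")))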
 
Thanks,
Bryan
 

From: ceph-users  on behalf of Sage Weil 

Date: Friday, July 27, 2018 at 1:25 PM
To: "ceph-annou...@lists.ceph.com" , 
"ceph-users@lists.ceph.com" , 
"ceph-maintain...@lists.ceph.com" , 
"ceph-de...@vger.kernel.org" 
Subject: [ceph-users] v13.2.1 Mimic released

 

This is the first bugfix release of the Mimic v13.2.x long term stable release

series. This release contains many fixes across all components of Ceph,

including a few security fixes. We recommend that all users upgrade.

 

Notable Changes

--

 

* CVE 2018-1128: auth: cephx authorizer subject to replay attack (issue#24836 
http://tracker.ceph.com/issues/24836, Sage Weil)

* CVE 2018-1129: auth: cephx signature check is weak (issue#24837 
http://tracker.ceph.com/issues/24837, 
Sage Weil)

* CVE 2018-10861: mon: auth checks not correct for pool ops (issue#24838 
http://tracker.ceph.com/issues/24838, Jason Dillaman)