[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2018-01-29 Thread Launchpad Bug Tracker
[Expired for ceph (Ubuntu) because there has been no activity for 60
days.]

** Changed in: ceph (Ubuntu)
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1724529

Title:
  ceph health output flips between OK and WARN all the time

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ceph-mon/+bug/1724529/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-11-30 Thread Billy Olsen
In response specifically to comment #4 and changing the
mon_pg_warn_max_per_osd setting...

I don't think that the mon_pg_warn_max_per_osd setting should be
something that is generally available as a config option on the ceph-mon
charm. The warning was added to Ceph as a direct response to the Ceph
devs' real-world experience, based on behavior observed during recovery
scenarios. The warning is there to indicate that you are exceeding the
recommended thresholds for acceptable recovery (for example, read
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045780.html
as a real-world, albeit extreme, example).
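
For readers wondering what the threshold actually measures: as I understand the Jewel (10.2.x) monitor code, the check divides the total number of PG replicas (each pool's pg_num times its replica size, summed over pools) by the number of "in" OSDs and warns when the result exceeds mon_pg_warn_max_per_osd (default 300). The shell sketch below reproduces only that arithmetic, assuming you read pg_num and size for each pool yourself (e.g. from "ceph osd pool ls detail"); the function names pgs_per_osd and exceeds_pg_warn are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate the "too many PGs per OSD" figure.
# Usage: pgs_per_osd NUM_IN_OSDS PG_NUM:SIZE [PG_NUM:SIZE ...]
# Assumption: each pool is passed as pg_num:replica_size.
pgs_per_osd() {
    num_in=$1
    shift
    if [ "$num_in" -le 0 ]; then
        echo "pgs_per_osd: need at least one in OSD" >&2
        return 1
    fi
    total=0
    for pool in "$@"; do
        pg_num=${pool%%:*}   # part before the colon
        size=${pool##*:}     # part after the colon
        total=$((total + pg_num * size))
    done
    # Integer division (assumed to match the monitor's integer arithmetic).
    echo $((total / num_in))
}

# Hypothetical helper: succeeds (exit 0) if the per-OSD figure would warn.
# Usage: exceeds_pg_warn PER_OSD [MAX]   (MAX defaults to 300)
exceeds_pg_warn() {
    per=$1
    max=${2:-300}
    [ "$per" -gt "$max" ]
}
```

For example, 12 in OSDs with two 3-replica pools of 2048 and 512 PGs gives (6144 + 1536) / 12 = 640, which exceeds the default 300 but not an injected 900.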

Granted, it's not at all trivial to fix the warning when it appears, due
to the inability to reduce pg_num for a pool. The resolution inevitably
involves creating a new pool and migrating data to it. Unfortunately,
the Ceph community provides no recommended way to do this, and the
various options that exist all have their drawbacks. rados cppool
doesn't copy user versions (e.g. user-issued snapshots) and doesn't work
for EC pools. Cache tiering migration may work for most use cases, but
there would need to be windows where the clients reconnect to talk to
the right pool (and possibly to free up an in-use object). Some
suggestions are available at
http://ceph.com/geen-categorie/ceph-pool-migration/.
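
To make the rados cppool route concrete, here is a hedged dry-run sketch that only prints the commands one might run; it executes nothing. It assumes a replicated (non-EC) pool, that clients are stopped for the duration, and that losing user snapshots is acceptable, per the caveats above. The function name plan_pool_migration and the "-new"/"-old" pool names are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) a cppool-based migration
# of a replicated pool into a new pool with a smaller pg_num.
# Usage: plan_pool_migration POOL NEW_PG_NUM
plan_pool_migration() {
    pool=$1
    new_pg=$2
    # The destination pool must exist before rados cppool can copy into it.
    echo "ceph osd pool create ${pool}-new ${new_pg} ${new_pg}"
    echo "rados cppool ${pool} ${pool}-new"
    # Swap names so clients that refer to the pool by name find the copy.
    echo "ceph osd pool rename ${pool} ${pool}-old"
    echo "ceph osd pool rename ${pool}-new ${pool}"
}
```

The sketch deliberately leaves out deleting the old pool, and it does not carry over pool settings such as size or the CRUSH ruleset; those would need to be matched by hand.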

Overall, when an admin is going to change this configuration setting, I
think it'd be best if they understand the implications of the
configuration and accept the possible downstream ramifications. It may
not be the OSD that gets killed when it starts gobbling up memory on the
box; it could be an innocent bystander.

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-11-30 Thread Billy Olsen
To be clear, comment #6 is aiming to explain why I do not consider this
"an operational monitoring setting".

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-11-30 Thread Drew Freiberger
Also of interest to anyone running into this: you can check a running
daemon's injected args by grepping its config for the option name:

   ceph --admin-daemon /var/run/ceph/ceph-mon* config show | grep OPTION_NAME

For example:

   ceph --admin-daemon /var/run/ceph/ceph-mon* config show | grep mon_pg_warn_max_per_osd
   "mon_pg_warn_max_per_osd": "300",

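Since the glob above can match more than one socket, a hedged sketch that queries each local mon admin socket separately with the admin socket "config get" command and prints one value per socket. It assumes the default socket directory /var/run/ceph and the ceph-mon*.asok naming, and that the output is JSON with the value as a quoted string; the function names extract_opt_value and show_mon_option are hypothetical. Run it on every mon host.

```shell
#!/bin/sh
# Hypothetical helper: read admin-socket JSON ("config show" or
# "config get" output) on stdin and print the value of option $1.
extract_opt_value() {
    sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical helper: print "SOCKET VALUE" for one option on every local
# mon admin socket. Usage: show_mon_option OPTION [SOCKET_DIR]
show_mon_option() {
    opt=$1
    dir=${2:-/var/run/ceph}
    for sock in "$dir"/ceph-mon*.asok; do
        # Skip when the glob matched nothing or the path is not a socket.
        [ -S "$sock" ] || continue
        val=$(ceph --admin-daemon "$sock" config get "$opt" | extract_opt_value "$opt")
        echo "$sock $val"
    done
}
```

Usage: show_mon_option mon_pg_warn_max_per_osd. A mon that reports 300 while its peers report an injected value is a candidate for the flapping described in this bug.
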
[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-11-30 Thread Drew Freiberger
We are seeing this same flapping on another cloud. One node had rebooted
yesterday, when the HEALTH_WARN flapping began. It runs the 10.2.7 ceph
package from the trusty/mitaka cloud archive, and that server is giving
the HEALTH_WARN about too many PGs. Another server, rebooted 7 days ago,
was not giving the warning, and the third server was still running the
10.2.3 ceph-mon from March. I had to kill ceph-mon (as
"/etc/init.d/ceph restart mon" did not work), and it is now running the
10.2.7 mon. "ceph tell osd.* version" shows some OSDs running 10.2.6 and
some running 10.2.7.
After restarting the third mon (up for 10 days; this also required the
kill command), the error is no longer flapping.
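
Mixed daemon versions like these are easy to miss in a long listing. A hedged sketch that tallies the versions reported by "ceph tell osd.* version", assuming each daemon's reply contains a "ceph version X.Y.Z" string (the surrounding output format is an assumption); the function name summarize_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read "ceph tell osd.* version" output on stdin and
# print "VERSION COUNT" for each distinct ceph version, sorted.
summarize_versions() {
    grep -o 'ceph version [0-9][0-9.]*' |
        awk '{ n[$3]++ } END { for (v in n) print v, n[v] }' |
        sort
}
```

Usage: ceph tell osd.* version | summarize_versions. More than one output line means the OSDs are not all on the same release.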

It seems the /etc/init.d/ceph script is not properly handling mon
restarts (on the ceph charm, not the ceph-mon charm) when OSDs are
present (though I haven't tested without OSDs present). It is odd to
have to kill the process with the standard signal to get it to recycle;
perhaps the restart is being blocked by an issue on the init daemon
configuration side.

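My guess is that someone ran a "ceph tell mon.*" to ignore the PG
counts, and the restarts then caused the setting to be dropped. This may
be worth re-opening against the ceph and ceph-mon charms so they expose
config options for the ceph HEALTH_WARN thresholds, or we can close this
bug and open another.

The flapping makes much more sense in the context of a "ceph tell mon.*"
having been run in the past: injected args only change the running
daemons, so a mon that restarts falls back to the configured value (300
by default) while the others keep the injected one, and "ceph health"
presumably answers differently depending on which mon handles the query.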
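
If that hypothesis holds, the mons will disagree on the value. A hedged sketch that takes one "MON VALUE" pair per line (collected, for example, with the admin-socket command above on each mon host; this input format is an assumption of the helper, not something ceph prints directly), reports any mon whose value differs from the most common one, and returns non-zero on disagreement. The function name check_mon_agreement is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "MON VALUE" lines on stdin. Prints each mon
# whose value differs from the most common value; exits 1 if any differ,
# 0 if all mons agree. Ties for "most common" are broken arbitrarily.
check_mon_agreement() {
    awk '
        { val[$1] = $2; count[$2]++ }
        END {
            best = ""; max = 0
            for (v in count) if (count[v] > max) { max = count[v]; best = v }
            rc = 0
            for (m in val) if (val[m] != best) {
                print m " has " val[m] " (majority: " best ")"
                rc = 1
            }
            exit rc
        }'
}
```

Usage: printf 'mon.a 900\nmon.b 900\nmon.c 300\n' | check_mon_agreement prints "mon.c has 300 (majority: 900)" and returns 1.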

We've got notes in a related case on another cloud to work around this
with a config-flags setting in the charm, but would love to see more of
these operational monitoring settings exposed by the charm directly
rather than relying on config-flags.

Here's the command to change the setting on the live ceph-mons:
- ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd=900'

Here's the command to configure the juju ceph charm to persist the setting:
- juju set ceph config-flags='{osd: {"mon pg warn max per osd": 900}}'

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-10-26 Thread Marian Gasparovic
While we were collecting an sosreport, it filled the disk on the ceph-mon
unit and the service had to be restarted. Since then there has been no
flapping; it is just correctly reporting HEALTH_WARN all the time.

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-10-26 Thread James Page
Hi Marian

Could you run the following command from a monitor unit:

   sudo ceph -w

and hopefully we will see what event is causing the status to toggle as
you describe.
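
Because "ceph -w" streams a lot of routine lines, a hedged sketch of a filter that prints only the lines where the reported health status changes. It assumes the streamed cluster log lines contain HEALTH_OK / HEALTH_WARN / HEALTH_ERR tokens whenever the status is reported; the function name health_transitions is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read "ceph -w" output on stdin and print a line
# only when its HEALTH_* token differs from the previous one seen.
health_transitions() {
    awk '
        match($0, /HEALTH_(OK|WARN|ERR)/) {
            state = substr($0, RSTART, RLENGTH)
            if (state != last) {
                print
                fflush()   # show transitions immediately when watching live
                last = state
            }
        }'
}
```

Usage: sudo ceph -w | health_transitions. Each printed line then marks one flip between OK and WARN, with the timestamp of the event that accompanied it.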

** Changed in: ceph (Ubuntu)
   Status: New => Incomplete

[Bug 1724529] Re: ceph health output flips between OK and WARN all the time

2017-10-26 Thread James Page
Not a charm-specific issue, so moving to the Ubuntu ceph package.

** Also affects: ceph (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: charm-ceph-mon
   Status: New => Invalid
