Re: [ceph-users] inconsistent number of pools

2019-05-29 Thread Jan Fajerski

On Tue, May 28, 2019 at 11:50:01AM -0700, Gregory Farnum wrote:

  You’re the second report I’ve seen of this, and while it’s confusing,
  you should be able to resolve it by restarting your active manager
  daemon.

Maybe this is related? http://tracker.ceph.com/issues/40011
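If restarting helps here as well, a mgr failover can be forced with something
like this (just a sketch; it assumes jq is installed):

$ ceph mgr stat                                     # shows the currently active mgr
$ ceph mgr fail $(ceph mgr stat -f json | jq -r .active_name)

or simply restart the ceph-mgr systemd unit on the host running the active daemon.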


  On Sun, May 26, 2019 at 11:52 PM Lars Täuber <[1]taeu...@bbaw.de>
  wrote:

Fri, 24 May 2019 21:41:33 +0200
Michel Raabe <[2]rmic...@devnu11.net> ==> Lars Täuber
<[3]taeu...@bbaw.de>, [4]ceph-users@lists.ceph.com :
>
> You can also try
>
> $ rados lspools
> $ ceph osd pool ls
>
> and verify that with the pgs
>
> $ ceph pg ls --format=json-pretty | jq -r '.pg_stats[].pgid' | cut -d. -f1 | uniq
>
Yes, now I know but I still get this:
$ sudo ceph -s
[…]
  data:
pools:   5 pools, 1153 pgs
[…]
and with all other means I get:
$ sudo ceph osd lspools | wc -l
3
Which is what I expect, because all other pools are removed.
But since this has no bad side effects I can live with it.
Cheers,
Lars
___
ceph-users mailing list
[5]ceph-users@lists.ceph.com
[6]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

References

  1. mailto:taeu...@bbaw.de
  2. mailto:rmic...@devnu11.net
  3. mailto:taeu...@bbaw.de
  4. mailto:ceph-users@lists.ceph.com
  5. mailto:ceph-users@lists.ceph.com
  6. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure code profiles and crush rules. Missing link...?

2019-05-22 Thread Jan Fajerski

On Wed, May 22, 2019 at 03:38:27PM +0200, Rainer Krienke wrote:

On 22.05.19 at 15:16, Dan van der Ster wrote:

Yes, this is basically what I was looking for; however, I had expected it to be
a bit more visible in the output...
Mind opening a tracker ticket on http://tracker.ceph.com/ so we can have this 
added to the non-json output of ceph osd pool ls detail?
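Until then, a hypothetical jq one-liner along the lines of Dan's example should
list the profile for every pool (the field is empty for replicated pools):

# ceph osd pool ls detail -f json | jq -r '.[] | [.pool_name, .erasure_code_profile] | @tsv'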


Rainer


Is this what you're looking for?

# ceph osd pool ls detail  -f json | jq .[0].erasure_code_profile
"jera_4plus2"

-- Dan


--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm batch OSD replacement

2019-03-19 Thread Jan Fajerski
;
> > > > Dan
> > > >
> > > > P.S:
> > > >
> > > > = osd.240 ==
> > > >
> > > >   [db]          /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > > >
> > > >       type                  db
> > > >       osd id                240
> > > >       cluster fsid          b4f463a0-c671-43a8-bd36-e40ab8d233d2
> > > >       cluster name          ceph
> > > >       osd fsid              d4d1fb15-a30a-4325-8628-706772ee4294
> > > >       db device             /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > > >       encrypted             0
> > > >       db uuid               iWWdyU-UhNu-b58z-ThSp-Bi3B-19iA-06iJIc
> > > >       cephx lockbox secret
> > > >       block uuid            u4326A-Q8bH-afPb-y7Y6-ftNf-TE1X-vjunBd
> > > >       block device          /dev/ceph-f78ff8a3-803d-4b6d-823b-260b301109ac/osd-data-9e4bf34d-1aa3-4c0a-9655-5dba52dcfcd7
> > > >       vdo                   0
> > > >       crush device class    None
> > > >       devices               /dev/sdac

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt

2019-01-23 Thread Jan Fajerski

On Wed, Jan 23, 2019 at 10:01:05AM +0100, Manuel Lausch wrote:

Hi,

that's bad news.

Around 5000 OSDs are affected by this issue. Redeploying these OSDs is not
really a solution.

Is it possible to migrate the local keys to the monitors?
I see that the OSDs with the "lockbox feature" have only one key for the
data and journal partitions, while the older OSDs have individual keys for
journal and data. Might this be a problem?

And another question:
Is it a good idea to mix ceph-disk and ceph-volume managed OSDs on one
host?
That way I could migrate only the newer OSDs to ceph-volume and deploy new
ones (after disk replacements) with ceph-volume until hopefully there is
a solution.
I might be wrong on this, since it's been a while since I played with that. But 
iirc you can't migrate a subset of ceph-disk OSDs to ceph-volume on one host.  
Once you run ceph-volume simple activate, the ceph-disk systemd units and udev 
rules will be disabled. While the remaining ceph-disk OSDs will continue to 
run, they won't come up after a reboot.
I'm sure there's a way to get them running again, but I imagine you'd rather not 
deal with that manually.
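For reference, the takeover is roughly the following, and it is all-or-nothing
per host rather than per OSD (a sketch from memory, <ID> is a placeholder):

$ ceph-volume simple scan /var/lib/ceph/osd/ceph-<ID>   # repeat per running ceph-disk OSD
$ ceph-volume simple activate --all                     # this is the step that disables the ceph-disk units/udev rules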


Regards
Manuel


On Tue, 22 Jan 2019 07:44:02 -0500
Alfredo Deza  wrote:



This is one case we didn't anticipate :/ We supported the wonky
lockbox setup and thought we wouldn't need to go further back,
although we did add support for both
plain and luks keys.

Looking through the code, it is very tightly coupled to
storing/retrieving keys from the monitors, and I don't know what
workarounds might be possible here other than throwing away the OSD
and deploying a new one (I take it this is not an option for you at
all).



Manuel Lausch

Systemadministrator
Storage Services

1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 |
76135 Karlsruhe | Germany Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de

Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 5452

Geschäftsführer: Thomas Ludwig, Jan Oetjen, Sascha Vollmer


Member of United Internet

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient of this e-mail, you are hereby
notified that saving, distribution or use of the content of this e-mail
in any way is prohibited. If you have received this e-mail in error,
please notify the sender and delete the e-mail.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd IO monitoring

2018-12-02 Thread Jan Fajerski

On Thu, Nov 29, 2018 at 11:48:35PM -0500, Michael Green wrote:

  Hello collective wisdom,

  Ceph neophyte here, running v13.2.2 (mimic).

  Question: what tools are available to monitor IO stats on RBD level?
  That is, IOPS, Throughput, IOs inflight and so on?
There is some brand new code for RBD IO monitoring. This PR 
(https://github.com/ceph/ceph/pull/25114) added RBD client-side perf counters, 
and this PR (https://github.com/ceph/ceph/pull/25358) will add those counters as 
Prometheus metrics. There is also room for an "rbd top" tool, though I haven't 
seen any code for that yet.
I'm sure Mykola (the author of both PRs) could go into more detail if needed. I 
expect this functionality to land in nautilus.
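In the meantime, a (clunky) option on mimic is to read the librbd perf counters
through a client admin socket, if you configure one for your fio/qemu client;
roughly like this (the socket path is just an example):

$ ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf schema | grep -i rbd
$ ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf dump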


  I'm testing with FIO and want to verify independently the IO load on
  each RBD image.

  --
  Michael Green
  Customer Support & Integration
  [1]gr...@e8storage.com

References

  1. mailto:gr...@e8storage.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs quota limit

2018-11-07 Thread Jan Fajerski

On Tue, Nov 06, 2018 at 08:57:48PM +0800, Zhenshi Zhou wrote:

  Hi,
  I'm wondering whether cephfs have quota limit options.
  I use kernel client and ceph version is 12.2.8.
  Thanks

CephFS has quota support, see http://docs.ceph.com/docs/luminous/cephfs/quota/.
The kernel client has only recently gained CephFS quota support (in mainline 
Linux 4.17, if I remember correctly; before that only the FUSE client supported 
it), so it depends on your distro and kernel version.
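For reference, quotas are set via virtual extended attributes on a directory,
e.g. (the values are just examples):

$ setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/somedir   # 100 GiB
$ setfattr -n ceph.quota.max_files -v 100000 /mnt/cephfs/somedir
$ getfattr -n ceph.quota.max_bytes /mnt/cephfs/somedir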



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CfP FOSDEM'19 Software Defined Storage devroom

2018-10-11 Thread Jan Fajerski

CfP for the Software Defined Storage devroom at FOSDEM 2019
(Brussels, Belgium, February 3rd).

FOSDEM is a free software event that offers open source communities a place to
meet, share ideas and collaborate.  It is renowned for being highly developer-
oriented and brings together 8000+ participants from all over the world.  It
is held in the city of Brussels (Belgium).

FOSDEM 2019 will take place during the weekend of February 2nd-3rd 2019. More
details about the event can be found at http://fosdem.org/

** Call For Participation

The Software Defined Storage devroom will go into its third round for
talks around Open Source Software Defined Storage projects, management tools
and real world deployments.

Presentation topics could include but are not limited to:

- Your work on a SDS project like Ceph, Gluster, OpenEBS or LizardFS

- Your work on or with SDS related projects like SWIFT or Container Storage
 Interface

- Management tools for SDS deployments

- Monitoring tools for SDS clusters

** Important dates:

- Nov 25th 2018:  submission deadline for talk proposals
- Dec 17th 2018:  announcement of the final schedule
- Feb  3rd 2019:  Software Defined Storage dev room

Talk proposals will be reviewed by a steering committee:
- Niels de Vos (Gluster Developer - RedHat)
- Jan Fajerski (Ceph Developer - SUSE)
- other volunteers TBA

Use the FOSDEM 'pentabarf' tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM19

- If necessary, create a Pentabarf account and activate it.
 Please reuse your account from previous years if you have
 already created it.

- In the "Person" section, provide First name, Last name
 (in the "General" tab), Email (in the "Contact" tab)
 and Bio ("Abstract" field in the "Description" tab).

- Submit a proposal by clicking on "Create event".

- Important! Select the "Software Defined Storage devroom" track
 (on the "General" tab).

- Provide the title of your talk ("Event title" in the "General" tab).

- Provide a description of the subject of the talk and the
 intended audience (in the "Abstract" field of the "Description" tab)

- Provide a rough outline of the talk or goals of the session (a short
 list of bullet points covering topics that will be discussed) in the
 "Full description" field in the "Description" tab

- Provide an expected length of your talk in the "Duration" field. Please
 count at least 10 minutes of discussion into your proposal plus allow
 5 minutes for the handover to the next presenter.
 Suggested talk length would be 20+10 and 45+15 minutes.

** Recording of talks

The FOSDEM organizers plan to have live streaming and recording fully working,
both for remote/later viewing of talks, and so that people can watch streams
in the hallways when rooms are full. This requires speakers to consent to
being recorded and streamed. If you plan to be a speaker, please understand
that by doing so you implicitly give consent for your talk to be recorded and
streamed. The recordings will be published under the same license as all
FOSDEM content (CC-BY).

Hope to hear from you soon! And please forward this announcement.

If you have any further questions, please write to the mailinglist at
storage-devr...@lists.fosdem.org and we will try to answer as soon as
possible.

Thanks!

--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster Security

2018-09-20 Thread Jan Fajerski

Hi,
if you want to isolate your HVs from Ceph's public network, a gateway would do 
that (like the iSCSI gateway). Note, however, that this will also add an extra 
network hop and a potential bottleneck, since all client traffic has to pass 
through the gateway node(s).
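Regarding one keyring per hypervisor: that is a good idea in any case, so a
compromised HV can only reach what its own key allows. A sketch (pool and client
names are made up):

$ ceph auth get-or-create client.hv01 mon 'profile rbd' osd 'profile rbd pool=vms-hv01'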


HTH,
Jan

On Wed, Sep 19, 2018 at 01:05:06PM +0200, Florian Florensa wrote:

Hello everyone,

I am currently working on the design of a Ceph cluster, and I was
asking myself some questions regarding the security of the cluster.
(The cluster should be deployed using Luminous on Ubuntu 16.04.)

Technically, we would have HVs using the block storage, but we
are in a position where we can't trust the VMs that are running; thus,
an HV can eventually get compromised. So how can we prevent a
compromised hypervisor from compromising the safety of the data on the
Ceph cluster?
Using iSCSI? Using one keyring per hypervisor? Anything else?

Regards,

Florian.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mimic + cephmetrics + prometheus - working ?

2018-09-05 Thread Jan Fajerski
I'm not the expert when it comes to cephmetrics, but I think (at least until very 
recently) cephmetrics relies on other exporters besides the mgr module and the 
node_exporter.
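If it is only the renamed node_exporter metrics (e.g. node_disk_read_time_ms vs
node_disk_read_time_seconds_total in your output below), a Prometheus recording
rule can bridge that until the dashboards are updated. A rough, untested sketch
(the rules file path is an assumption; reference it via rule_files in
prometheus.yml):

$ cat <<'EOF' > /etc/prometheus/rules/node-exporter-compat.yml
groups:
- name: node-exporter-compat
  rules:
  - record: node_disk_read_time_ms
    expr: node_disk_read_time_seconds_total * 1000
EOF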


On Mon, Aug 27, 2018 at 01:11:29PM -0400, Steven Vacaroaia wrote:

  Hi
   has anyone been able to use Mimic + cephmetrics + prometheus?
   I am struggling to make it fully functional, as it appears the data provided
   by node_exporter has different names than the ones Grafana expects.
   As a result, only certain dashboards are being populated (the Ceph-specific
   ones) while others show "no data points" (the server-specific ones).
   Any advice/suggestions/troubleshooting tips will be greatly appreciated.
  Example:
  Grafana latency by server uses
  node_disk_read_time_ms
  but node_exporter does not provide it
    curl [1]http://osd01:9100/metrics | grep node_disk_read_time
   # HELP node_disk_read_time_seconds_total The total number of milliseconds spent by all reads.
   # TYPE node_disk_read_time_seconds_total counter
   node_disk_read_time_seconds_total{device="dm-0"} 8910.801
   node_disk_read_time_seconds_total{device="sda"} 0.525
   node_disk_read_time_seconds_total{device="sdb"} 14221.732
   node_disk_read_time_seconds_total{device="sdc"} 0.465
   node_disk_read_time_seconds_total{device="sdd"} 0.46
   node_disk_read_time_seconds_total{device="sde"} 0.017
   node_disk_read_time_seconds_total{device="sdf"} 455.064
   node_disk_read_time_seconds_total{device="sr0"} 0

References

  1. http://osd01:9100/metrics



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mimic - troubleshooting prometheus

2018-09-05 Thread Jan Fajerski
The prometheus plugin currently skips histogram perf counters; their 
representation in Ceph is not compatible with Prometheus' approach (iirc).  
However, I believe most, if not all, of the perf counters should be exported as 
long-running averages. Look for metric pairs that are named some_metric_name_sum 
and some_metric_name_count.
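For example, an average write latency per OSD over the last 5 minutes can be
derived from such a pair; an untested sketch (metric names as exported by the
mimic module, the Prometheus host is an assumption):

$ curl -sG 'http://prometheus:9090/api/v1/query' \
    --data-urlencode 'query=rate(ceph_osd_op_w_latency_sum[5m]) / rate(ceph_osd_op_w_latency_count[5m])'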


HTH,
Jan

On Fri, Aug 24, 2018 at 01:47:40PM -0400, Steven Vacaroaia wrote:

  Hi,
   Any ideas/suggestions for troubleshooting prometheus?
   What logs/commands are available to find out why OSD-server-specific
   data (IOPS, disk and network data) is not scraped but cluster-specific
   data (pools, capacity, etc.) is?
   Increasing the log level for the MGR showed only the following:
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to secure Prometheus endpoints (mgr plugin and node_exporter)

2018-09-05 Thread Jan Fajerski

Hi Martin,
hope this is still useful, despite the lag.

On Fri, Jun 29, 2018 at 01:04:09PM +0200, Martin Palma wrote:

Since Prometheus uses a pull model over HTTP for collecting metrics.
What are the best practices to secure these HTTP endpoints?

- With a reverse proxy with authentication?
This is currently the recommended way to secure Prometheus traffic with TLS or 
authentication. See also 
https://prometheus.io/docs/introduction/faq/#why-don-t-the-prometheus-server-components-support-tls-or-authentication-can-i-add-those 
for more info.
However, native support for TLS and authentication has just been put on the 
roadmap in August.
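A minimal nginx snippet in front of the mgr module could look something like
this (ports, paths and the htpasswd tooling are assumptions, not a tested
config; 9283 is the default mgr prometheus port):

$ htpasswd -c /etc/nginx/.prom_htpasswd scraper
$ cat <<'EOF' > /etc/nginx/conf.d/ceph-prometheus.conf
server {
    listen 9285 ssl;
    ssl_certificate     /etc/nginx/certs/prom.crt;
    ssl_certificate_key /etc/nginx/certs/prom.key;
    location / {
        auth_basic           "ceph-mgr prometheus";
        auth_basic_user_file /etc/nginx/.prom_htpasswd;
        proxy_pass           http://127.0.0.1:9283/;
    }
}
EOF
$ systemctl reload nginx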

- Export the node_exporter only on the cluster network? (not usable
for the mgr plugin and for nodes like mons, mdss,...)
- No security at all?

Best,
Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alert conditions

2018-08-21 Thread Jan Fajerski
Fwiw I added a few things to https://pad.ceph.com/p/alert-conditions and will 
circulate this mail a bit wider.

Or maybe there is not all that much interest in alerting...

On Mon, Jul 23, 2018 at 06:10:04PM +0200, Jan Fajerski wrote:

Hi community,
the topic of alerting conditions for a Ceph cluster comes up in 
various contexts. Some folks use Prometheus or Grafana, (I believe) 
some people would like SNMP traps from Ceph, the mgr dashboard could 
provide basic alerting capabilities, and there is of course ceph -s.

Also see "Improving alerting/health checks" on ceph-devel.

Working on some Prometheus stuff, I think it would be nice to have some 
basic alerting rules in the Ceph repo. These could serve as an 
out-of-the-box default as well as an example or best practice for which 
conditions should be watched.


So I'm wondering: what does the community think? What do operators use 
as alert conditions or find alert-worthy?
I'm aware that this is very open-ended, highly dependent on the 
cluster and its workload and can range from obvious (health_err 
anyone?) to intricate conditions that are designed for a certain 
cluster. I'm wondering if we can distill some non-trivial alert 
conditions that ceph itself does not (yet) provide.


If you have any conditions fitting that description, feel free to add 
them to https://pad.ceph.com/p/alert-conditions. Otherwise looking 
forward to feedback.


jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] alert conditions

2018-07-23 Thread Jan Fajerski

Hi community,
the topic of alerting conditions for a Ceph cluster comes up in various 
contexts. Some folks use Prometheus or Grafana, (I believe) some people would 
like SNMP traps from Ceph, the mgr dashboard could provide basic alerting 
capabilities, and there is of course ceph -s.

Also see "Improving alerting/health checks" on ceph-devel.

Working on some Prometheus stuff, I think it would be nice to have some basic 
alerting rules in the Ceph repo. These could serve as an out-of-the-box default as 
well as an example or best practice for which conditions should be watched.
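To make that concrete, a minimal rules file could look something like this (a
sketch only, the threshold is made up; ceph_health_status comes from the mgr
prometheus module):

$ cat <<'EOF' > /etc/prometheus/rules/ceph-basic.yml
groups:
- name: ceph-basic
  rules:
  - alert: CephHealthError
    expr: ceph_health_status == 2
    for: 5m
    labels:
      severity: critical
    annotations:
      description: Ceph has been in HEALTH_ERR for more than 5 minutes
EOF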


So I'm wondering: what does the community think? What do operators use as alert 
conditions or find alert-worthy?
I'm aware that this is very open-ended, highly dependent on the cluster and its 
workload and can range from obvious (health_err anyone?) to intricate conditions 
that are designed for a certain cluster. I'm wondering if we can distill some 
non-trivial alert conditions that ceph itself does not (yet) provide.


If you have any conditions fitting that description, feel free to add them to 
https://pad.ceph.com/p/alert-conditions. Otherwise looking forward to feedback.


jan

--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski

On Mon, May 07, 2018 at 02:45:14PM +0200, Kurt Bauer wrote:




Jan Fajerski <mailto:jfajer...@suse.com>
7. May 2018 at 14:21
On Mon, May 07, 2018 at 02:05:59PM +0200, Kurt Bauer wrote:

 Hi Jan,
 first of all thanks for this dashboard.
 A few comments:
 -) 'vonage-status-panel' is needed, which isn't mentioned in the 
ReadMe

Yes, my bad. Will update the README

 -) Using ceph 12.2.4 the mon metric for me is apparently called
 'ceph_mon_quorum_count' not 'ceph_mon_quorum_status'

I'll also add to the readme: The dashboard is based on Ceph Mimic.

 And a question:
 Is there a way to get the Cluster IOPS with prometheus metrics? I did
 this with collectd, but can't find a suitable metric from ceph-mgr.

Yes...at least in Mimic the metrics are called ceph_osd_op[_r,_w,_rw]
Thanks, these metrics are in Luminous too. I seem unable to find some 
sort of register to see which metrics mean what. Some are quite 
obvious, but others are a mystery. Does something like that exist 
somewhere?

Not yet.
Most daemon-specific metric names (like ceph_osd_op[_r,_w,_rw]) are derived 
directly from the respective perf counter names. The plugin exports all perf 
counters with PRIO_INTERESTING or higher (iirc).

An automatically created index would certainly be feasible.


Thanks.


 Best regards,
 Kurt

 [1]Jan Fajerski
 7. May 2018 at 12:32

 Hi all,
 I'd like to request comments and feedback about a Grafana 
dashboard for

 Ceph cluster monitoring.
 [2]https://youtu.be/HJquM127wMY
 [3]https://github.com/ceph/ceph/pull/21850
 The goal is to eventually have a set of default dashboards in the Ceph
 repository that offer decent monitoring for clusters of various (maybe
 even all) sizes and applications, or at least serve as a 
starting point

 for customizations.
 --
 To unsubscribe from this list: send the line "unsubscribe ceph-devel"
 in
 the body of a message to [4]majord...@vger.kernel.org
 More majordomo info at [5]http://vger.kernel.org/majordomo-info.html

References

 1. mailto:jfajer...@suse.com
 2. https://youtu.be/HJquM127wMY
 3. https://github.com/ceph/ceph/pull/21850
 4. mailto:majord...@vger.kernel.org
 5. http://vger.kernel.org/majordomo-info.html



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Kurt Bauer <mailto:kurt.ba...@univie.ac.at>
7. May 2018 at 14:05
Hi Jan,

first of all thanks for this dashboard.
A few comments:
-) 'vonage-status-panel' is needed, which isn't mentioned in the ReadMe
-) Using ceph 12.2.4 the mon metric for me is apparently called 
'ceph_mon_quorum_count' not 'ceph_mon_quorum_status'


And a question:
Is there a way to get the Cluster IOPS with prometheus metrics? I 
did this with collectd, but can't find a suitable metric from 
ceph-mgr.


Best regards,
Kurt




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--

Kurt Bauer<kurt.ba...@univie.ac.at>
Vienna University Computer Center - ACOnet - VIX
Universitaetsstrasse 7, A-1010 Vienna, Austria, Europe
Tel: ++431 4277  - 14070 (Fax: - 814070)  KB1970-RIPE

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski

On Mon, May 07, 2018 at 02:05:59PM +0200, Kurt Bauer wrote:

  Hi Jan,
  first of all thanks for this dashboard.
  A few comments:
  -) 'vonage-status-panel' is needed, which isn't mentioned in the ReadMe

Yes, my bad. Will update the README

  -) Using ceph 12.2.4 the mon metric for me is apparently called
  'ceph_mon_quorum_count' not 'ceph_mon_quorum_status'

I'll also add to the readme: The dashboard is based on Ceph Mimic.

  And a question:
  Is there a way to get the Cluster IOPS with prometheus metrics? I did
  this with collectd, but can't find a suitable metric from ceph-mgr.

Yes...at least in Mimic the metrics are called ceph_osd_op[_r,_w,_rw]
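For cluster-wide IOPS that would be something like the following Grafana panel
queries (untested sketch):

sum(rate(ceph_osd_op_r[5m]))    # read IOPS
sum(rate(ceph_osd_op_w[5m]))    # write IOPS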

  Best regards,
  Kurt

  [1]Jan Fajerski
  7. May 2018 at 12:32

  Hi all,
  I'd like to request comments and feedback about a Grafana dashboard for
  Ceph cluster monitoring.
  [2]https://youtu.be/HJquM127wMY
  [3]https://github.com/ceph/ceph/pull/21850
  The goal is to eventually have a set of default dashboards in the Ceph
  repository that offer decent monitoring for clusters of various (maybe
  even all) sizes and applications, or at least serve as a starting point
  for customizations.
  --
  To unsubscribe from this list: send the line "unsubscribe ceph-devel"
  in
  the body of a message to [4]majord...@vger.kernel.org
  More majordomo info at  [5]http://vger.kernel.org/majordomo-info.html

References

  1. mailto:jfajer...@suse.com
  2. https://youtu.be/HJquM127wMY
  3. https://github.com/ceph/ceph/pull/21850
  4. mailto:majord...@vger.kernel.org
  5. http://vger.kernel.org/majordomo-info.html



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski

Hi all,
I'd like to request comments and feedback about a Grafana dashboard for Ceph 
cluster monitoring.


https://youtu.be/HJquM127wMY

https://github.com/ceph/ceph/pull/21850

The goal is to eventually have a set of default dashboards in the Ceph 
repository that offer decent monitoring for clusters of various (maybe even all) 
sizes and applications, or at least serve as a starting point for 
customizations.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Jan Fajerski

On Fri, Feb 16, 2018 at 09:27:08AM +0100, Ansgar Jazdzewski wrote:

Hi Folks,

I'm just trying to get the prometheus plugin up and running, but as soon as I
browse /metrics I get:

500 Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.

Traceback (most recent call last):
 File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line
670, in respond
   response.body = self.handler()
 File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py",
line 217, in __call__
   self.body = self.oldhandler(*args, **kwargs)
 File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py",
line 61, in __call__
   return self.callable(*self.args, **self.kwargs)
 File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
   metrics = global_instance().collect()
 File "/usr/lib/ceph/mgr/prometheus/module.py", line 323, in collect
   self.get_metadata_and_osd_status()
 File "/usr/lib/ceph/mgr/prometheus/module.py", line 283, in
get_metadata_and_osd_status
   dev_class['class'],
KeyError: 'class'
This error comes from the OSD metadata metric. Which version of Ceph are you 
running this with? Specifically, the CRUSH map of this cluster seems to not have 
a device class set for each OSD yet.
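If this is Luminous or newer, you can check which OSDs are missing a device
class and set one by hand; that should at least get the exporter going (class
name and OSD ids are just examples):

$ ceph osd tree                                    # the CLASS column should be set for every OSD
$ ceph osd crush set-device-class hdd osd.0 osd.1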


I assume that I have to change the mgr cephx key? But I am not 100% sure.

mgr.mgr01
  key: AQAqLIRasocnChAAbOIEMKVEWWHCbgVeEctwng==
  caps: [mds] allow *
  caps: [mon] allow profile mgr
  caps: [osd] allow *

thanks for your help,
Ansgar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-09 Thread Jan Fajerski

On Tue, Jan 02, 2018 at 04:54:55PM +, John Spray wrote:

On Tue, Jan 2, 2018 at 10:43 AM, Jan Fajerski <jfajer...@suse.com> wrote:

Hi lists,
Currently the ceph status output formats all numbers with binary unit
prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
1048576 objects.  I received a bug report from a user that printing object
counts with a base 2 multiplier is confusing (I agree) so I opened a bug and
https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some
opinions on:



- Should we print binary unit prefixes (MiB, GiB, ...) since that would be
technically correct?


I'm not a fan of the technically correct base 2 units -- they're still
relatively rarely used, and I've spent most of my life using kB to
mean 1024, not 1000.
We could start changing the "rarely used" part ;) But I can certainly live with 
keeping the old units.



- Should counters (like object counts) be formatted with a base 10
multiplier or a multiplier with base 2?


I prefer base 2 for any dimensionless quantities (or rates thereof) in
computing.  Metres and kilograms go in base 10, bytes go in base 2.

It's all very subjective and a matter of opinion of course, and my
feelings aren't particularly strong :-)
As far as I understand, the standards regarding this (IEC 60027, ISO/IEC 8, 
probably more) only talk about base 2 units for digital-data-related quantities.  
I might of course misunderstand.
What I find problematic is that other tools will (mostly?) use base 10 units 
for everything that is not data related. Say I plot the object count of Ceph in Grafana.  
It'll use base 10 multipliers for a dimensionless number. Since Grafana (and I 
imagine other tools like this) consume raw numbers, we'll end up with Grafana 
displaying a different object count than "ceph -s". Say 1.04M vs 1M. Now this is 
not terrible, but it'll get worse quickly with higher counts.
In the original tracker issue it's noted that this was reported with a cluster 
containing 7150896726 objects. The difference between Grafana and "ceph -s" was 
7150M vs 6835M.


John


My proposal would be to both use binary unit prefixes and use base 10
multipliers for counters. I think this aligns with user expectations as well
as the relevant standard(s?).

Best,
Jan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Jan Fajerski

Hi lists,
Currently the ceph status output formats all numbers with binary unit prefixes, 
i.e. 1MB equals 1048576 bytes and an object count of 1M equals 1048576 objects.  
I received a bug report from a user that printing object counts with a base 2 
multiplier is confusing (I agree) so I opened a bug and 
https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some 
opinions on:
- Should we print binary unit prefixes (MiB, GiB, ...) since that would be 
 technically correct?
- Should counters (like object counts) be formatted with a base 10 multiplier or 
 a multiplier with base 2?


My proposal would be to both use binary unit prefixes and use base 10 
multipliers for counters. I think this aligns with user expectations as well as 
the relevant standard(s?).


Best,
Jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FOSDEM Call for Participation: Software Defined Storage devroom

2017-10-12 Thread Jan Fajerski


CfP for the Software Defined Storage devroom at FOSDEM 2018
(Brussels, Belgium, February 4th).

FOSDEM is a free software event that offers open source communities a place to
meet, share ideas and collaborate.  It is renowned for being highly developer-
oriented and brings together 8000+ participants from all over the world.  It
is held in the city of Brussels (Belgium).

FOSDEM 2018 will take place during the weekend of February 3rd-4th 2018. More
details about the event can be found at http://fosdem.org/

** Call For Participation

The Software Defined Storage devroom will go into its second round for
talks around Open Source Software Defined Storage projects, management tools
and real world deployments.

Presentation topics could include but are not limited to:

- Your work on a SDS project like Ceph, GlusterFS or LizardFS

- Your work on or with SDS related projects like SWIFT or Container Storage
Interface

- Management tools for SDS deployments

- Monitoring tools for SDS clusters

** Important dates:

- 26 Nov 2017:  submission deadline for talk proposals
- 15 Dec 2017:  announcement of the final schedule
-  4 Feb 2018:  Software Defined Storage dev room

Talk proposals will be reviewed by a steering committee:
- Leonardo Vaz (Ceph Community Manager - Red Hat Inc.)
- Joao Luis (Core Ceph contributor - SUSE)
- Jan Fajerski (Ceph Developer - SUSE)

Use the FOSDEM 'pentabarf' tool to submit your proposal:
https://penta.fosdem.org/submission/FOSDEM18

- If necessary, create a Pentabarf account and activate it.
Please reuse your account from previous years if you have
already created it.

- In the "Person" section, provide First name, Last name
(in the "General" tab), Email (in the "Contact" tab)
and Bio ("Abstract" field in the "Description" tab).

- Submit a proposal by clicking on "Create event".

- Important! Select the "Software Defined Storage devroom" track
(on the "General" tab).

- Provide the title of your talk ("Event title" in the "General" tab).

- Provide a description of the subject of the talk and the
intended audience (in the "Abstract" field of the "Description" tab)

- Provide a rough outline of the talk or goals of the session (a short
list of bullet points covering topics that will be discussed) in the
"Full description" field in the "Description" tab

- Provide an expected length of your talk in the "Duration" field. Please
count at least 10 minutes of discussion into your proposal.

Suggested talk length would be 15, 20+10, 30+15, and 45+15 minutes.

** Recording of talks

The FOSDEM organizers plan to have live streaming and recording fully working,
both for remote/later viewing of talks, and so that people can watch streams
in the hallways when rooms are full. This requires speakers to consent to
being recorded and streamed. If you plan to be a speaker, please understand
that by doing so you implicitly give consent for your talk to be recorded and
streamed. The recordings will be published under the same license as all
FOSDEM content (CC-BY).

Hope to hear from you soon! And please forward this announcement.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com