Re: [ceph-users] mimic + cephmetrics + prometheus - working ?

2018-09-04 Thread Jan Fajerski
I'm not the expert when it comes to cephmetrics, but I think (at least until very
recently) cephmetrics relies on other exporters besides the mgr module and the
node_exporter.
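For the specific symptom described below (dashboards built around
node_disk_read_time_ms while newer node_exporter releases expose
node_disk_read_time_seconds_total), the panel queries can also be adapted
rather than the exporter. A minimal sketch, assuming a Prometheus server
reachable at prometheus:9090 (placeholder hostname):

# Check that the renamed metric is scraped, and convert its per-second rate
# back to the milliseconds the older dashboards expect.
curl -sG 'http://prometheus:9090/api/v1/query' \
     --data-urlencode 'query=rate(node_disk_read_time_seconds_total[5m]) * 1000'

In principle the same expression, used in the affected Grafana panels in place
of the _ms-based query, should repopulate the server-specific dashboards.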


On Mon, Aug 27, 2018 at 01:11:29PM -0400, Steven Vacaroaia wrote:

  Hi
  Has anyone been able to use Mimic + cephmetrics + prometheus?
  I am struggling to make it fully functional, as it appears the data provided
  by node_exporter has different names than the ones Grafana expects.
  As a result, only certain dashboards are being populated (the Ceph-specific
  ones) while others show "no data points" (the server-specific ones).
  Any advice/suggestions/troubleshooting tips will be greatly appreciated.
  Example:
  The Grafana "latency by server" panel uses node_disk_read_time_ms,
  but node_exporter does not provide it:
   curl [1]http://osd01:9100/metrics | grep node_disk_read_time
  # HELP node_disk_read_time_seconds_total The total number of milliseconds spent by all reads.
  # TYPE node_disk_read_time_seconds_total counter
  node_disk_read_time_seconds_total{device="dm-0"} 8910.801
  node_disk_read_time_seconds_total{device="sda"} 0.525
  node_disk_read_time_seconds_total{device="sdb"} 14221.732
  node_disk_read_time_seconds_total{device="sdc"} 0.465
  node_disk_read_time_seconds_total{device="sdd"} 0.46
  node_disk_read_time_seconds_total{device="sde"} 0.017
  node_disk_read_time_seconds_total{device="sdf"} 455.064
  node_disk_read_time_seconds_total{device="sr0"} 0

References

  1. http://osd01:9100/metrics



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mimic - troubleshooting prometheus

2018-09-04 Thread Jan Fajerski
The prometheus plugin currently skips histogram perf counters; their
representation in Ceph is not compatible with Prometheus' approach (iirc).
However, I believe most, if not all, of these perf counters should also be
exported as long-running averages. Look for metric pairs named
some_metric_name_sum and some_metric_name_count.
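As a hedged illustration (the _sum/_count pair below is what I'd expect the mgr
module to export for the read-latency counter; exact metric names may differ
per release), an average latency can be derived like this:

# Long-running average read latency per OSD from a counter pair, queried via
# the Prometheus HTTP API (server address is a placeholder).
curl -sG 'http://prometheus:9090/api/v1/query' \
     --data-urlencode 'query=rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])'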


HTH,
Jan

On Fri, Aug 24, 2018 at 01:47:40PM -0400, Steven Vacaroaia wrote:

  Hi,
  Any idea/suggestions for troubleshooting prometheus ?
  what logs /commands are available to find out why OSD servers specific
  data ( IOPS, disk and network data) is not scrapped but cluster
  specific data ( pools, capacity ..etc) is ?
  Increasing log level for MGR showed only the following
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.395 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.396 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_r_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_out_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_rw_latency_in_bytes_histogram, type
  2018-08-24 13:46:23.397 7f73d54ce700 20 mgr[prometheus] ignoring
  osd.op_w_latency_in_bytes_histogram, type



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to secure Prometheus endpoints (mgr plugin and node_exporter)

2018-09-04 Thread Jan Fajerski

Hi Martin,
hope this is still useful, despite the lag.

On Fri, Jun 29, 2018 at 01:04:09PM +0200, Martin Palma wrote:

Since Prometheus uses a pull model over HTTP for collecting metrics.
What are the best practices to secure these HTTP endpoints?

- With a reverse proxy with authentication?
This is currently the recommended way to secure Prometheus traffic with TLS or
authentication. See also
https://prometheus.io/docs/introduction/faq/#why-don-t-the-prometheus-server-components-support-tls-or-authentication-can-i-add-those
for more info.
However, native support for TLS and authentication has just been put on the
roadmap in August.
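As a rough sketch of the reverse-proxy approach (nginx is just one option; all
paths, ports, users and certificate locations below are assumptions):

# Basic auth + TLS in front of a node_exporter listening on 127.0.0.1:9100.
# (htpasswd comes from apache2-utils/httpd-tools.)
htpasswd -c /etc/nginx/.prom_htpasswd scrape_user
cat > /etc/nginx/conf.d/node_exporter_proxy.conf <<'EOF'
server {
    listen 9443 ssl;
    ssl_certificate     /etc/nginx/ssl/node.crt;
    ssl_certificate_key /etc/nginx/ssl/node.key;
    location /metrics {
        auth_basic           "prometheus";
        auth_basic_user_file /etc/nginx/.prom_htpasswd;
        proxy_pass           http://127.0.0.1:9100/metrics;
    }
}
EOF
nginx -s reload

Prometheus then scrapes the proxy port with scheme: https and a basic_auth
block in its scrape config; the same pattern works in front of the mgr
module's endpoint.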

- Export the node_exporter only on the cluster network? (not usable
for the mgr plugin and for nodes like mons, mdss,...)
- No security at all?

Best,
Martin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-09-04 Thread Jones de Andrade
Hi Eugen.

Just tried everything again here by removing the /sda4 partitions and
leaving the space free so that either salt-run proposal-populate or salt-run
state.orch ceph.stage.configure could try to find and use the free space on
the partitions: unsuccessful again. :(

Just to make things clear: are you telling me that it is completely
impossible to have a ceph "volume" on non-dedicated devices, sharing space
with, for instance, the node's swap, boot or main partition?

And so the only possible way to have a functioning ceph distributed
filesystem would be to have in each node at least one disk dedicated to the
operating system and another, independent disk dedicated to the ceph
filesystem?

That would be an awful drawback for our plans if true, but if there is no
other way, we will just have to give up. Please answer these two questions
clearly before we capitulate?  :(

Anyway, thanks a lot, once again,

Jones

On Mon, Sep 3, 2018 at 5:39 AM Eugen Block  wrote:

> Hi Jones,
>
> I still don't think creating an OSD on a partition will work. The
> reason is that SES creates an additional partition per OSD resulting
> in something like this:
>
> vdb   253:16   05G  0 disk
> ├─vdb1253:17   0  100M  0 part /var/lib/ceph/osd/ceph-1
> └─vdb2253:18   0  4,9G  0 part
>
> Even with external block.db and wal.db on additional devices you would
> still need two partitions for the OSD. I'm afraid with your setup this
> can't work.
>
> Regards,
> Eugen
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to setup Ceph OSD auto boot up on node reboot

2018-09-04 Thread Pardhiv Karri
Hi,

I created a ceph cluster manually (not using ceph-deploy). When I reboot
a node the OSDs don't come back up, because the OS doesn't know that it
needs to bring up the OSDs. I am running this on Ubuntu 16.04. Is there a
standardized way to start the ceph OSDs on node reboot?

"sudo start ceph-osd-all" isn't working well, and I don't like the idea
of "sudo start ceph-osd id=1" for each OSD in an rc file.

Need to do it for both Hammer (Ubuntu 14.04) and Luminous (Ubuntu 16.04).
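A minimal sketch for the systemd case (Luminous on Ubuntu 16.04); the unit
names assume the stock ceph-osd@.service shipped with the packages, and the
upstart-based Hammer/14.04 setup is not covered here:

# Enable the umbrella target plus one unit per OSD id so they start on boot.
systemctl enable ceph.target
systemctl enable ceph-osd@1     # repeat for each OSD id on this node
systemctl start ceph-osd@1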

--
Thanks,
Pardhiv Karri
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS small files overhead

2018-09-04 Thread Sergey Malinin
You need to re-deploy OSDs for bluestore_min_alloc_size to take effect.
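A rough sketch of what that redeploy looks like for a ceph-volume/BlueStore
OSD; the device path and the 4 KiB value are placeholders:

# bluestore_min_alloc_size_* is only read at OSD creation time, so set it in
# ceph.conf first, e.g. under [osd]:
#   bluestore_min_alloc_size_hdd = 4096
# then destroy and recreate the OSD:
ceph-volume lvm zap --destroy /dev/sdX
ceph-volume lvm create --bluestore --data /dev/sdX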

> On 4.09.2018, at 18:31, andrew w goussakovski  wrote:
> 
> Hello
> 
> We are trying to use cephfs as storage for web graphics, such as
> thumbnails and so on.
> Is there any way to reduse overhead on storage? On test cluster we have
> 1 fs, 2 pools (meta and data) with replica size = 2
> 
> objects: 1.02 M objects, 1.1 GiB
> usage:   144 GiB used, 27 GiB / 172 GiB avail
> 
> So we have (144/2)/1.1*100%=6500% overhead.
> 
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
> (stable)
> osd storage - bluestore (changing bluestore_min_alloc_size makes no
> visible effect)
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS small files overhead

2018-09-04 Thread Paul Emmerich
You could probably cut the overhead in half with the inline data
feature: 
http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data
However, that is an experimental feature.
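A hedged example of turning it on for a filesystem named "cephfs" (the name is
a placeholder, and depending on the release an extra confirmation flag may be
required; see the linked document):

# Enable the experimental inline data feature; data of sufficiently small files
# is then stored in the inode (metadata pool) instead of a full data-pool object.
ceph fs set cephfs inline_data true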

CephFS is unfortunately not very good at storing lots of small files
in a storage-efficient manner :(

Paul

2018-09-04 17:31 GMT+02:00 andrew w goussakovski :
> Hello
>
> We are trying to use cephfs as storage for web graphics, such as
> thumbnails and so on.
> Is there any way to reduse overhead on storage? On test cluster we have
> 1 fs, 2 pools (meta and data) with replica size = 2
>
> objects: 1.02 M objects, 1.1 GiB
> usage:   144 GiB used, 27 GiB / 172 GiB avail
>
> So we have (144/2)/1.1*100%=6500% overhead.
>
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
> (stable)
> osd storage - bluestore (changing bluestore_min_alloc_size makes no
> visible effect)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v12.2.8 Luminous released

2018-09-04 Thread Abhishek Lekshmanan

We're glad to announce the next point release in the Luminous v12.2.X
stable release series. This release contains a range of bugfixes and
stability improvements across all the components of ceph. For detailed
release notes with links to tracker issues and pull requests, refer to
the blog post at http://ceph.com/releases/v12-2-8-released/

Upgrade Notes from previous luminous releases
---------------------------------------------

When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats from
12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please read
the notes at 
https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6

For the cluster that installed the broken 12.2.6 release, 12.2.7 fixed the
regression and introduced a workaround option `osd distrust data digest = true`,
but 12.2.7 clusters still generated health warnings like ::

  [ERR] 11.288 shard 207: soid
  11:1155c332:::rbd_data.207dce238e1f29.0527:head data_digest
  0xc8997a5b != data_digest 0x2ca15853


12.2.8 improves the deep scrub code to automatically repair these
inconsistencies. Once the entire cluster has been upgraded and then fully deep
scrubbed, and all such inconsistencies are resolved; it will be safe to disable
the `osd distrust data digest = true` workaround option.
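For reference, a sketch of dropping the workaround at that point (option name
as given above; remember to remove it from ceph.conf as well so it does not
return on restart):

# Only once every daemon runs 12.2.8 and a full deep scrub has completed.
ceph tell osd.* injectargs '--osd_distrust_data_digest=false'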

Changelog
---------
* bluestore: set correctly shard for existed Collection (issue#24761, pr#22860, 
Jianpeng Ma)
* build/ops: Boost system library is no longer required to compile and link 
example librados program (issue#25054, pr#23202, Nathan Cutler)
* build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664, 
pr#22848, Sage Weil, David Zafman)
* build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064, 
pr#23179, Kyr Shatskyy)
* build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437, pr#22864, 
Dan Mick)
* build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, 
pr#22844, Ilya Dryomov)
* build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van der 
Ster)
* cephfs-journal-tool: Fix purging when importing an zero-length journal 
(issue#24239, pr#22980, yupeng chen, zhongyan gu)
* cephfs: MDSMonitor: uncommitted state exposed to clients/mdss (issue#23768, 
pr#23013, Patrick Donnelly)
* ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
* ceph-volume add a __release__ string, to help version-conditional calls 
(issue#25170, pr#23331, Alfredo Deza)
* ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784, 
issue#24957, pr#23350, Andrew Schoen)
* ceph-volume: do not use stdin in luminous (issue#25173, issue#23260, 
pr#23367, Alfredo Deza)
* ceph-volume enable the ceph-osd during lvm activation (issue#24152, pr#23394, 
Dan van der Ster, Alfredo Deza)
* ceph-volume expand on the LVM API to create multiple LVs at different sizes 
(issue#24020, pr#23395, Alfredo Deza)
* ceph-volume lvm.activate conditional mon-config on prime-osd-dir 
(issue#25216, pr#23397, Alfredo Deza)
* ceph-volume lvm.batch remove non-existent sys_api property (issue#34310, 
pr#23811, Alfredo Deza)
* ceph-volume lvm.listing only include devices if they exist (issue#24952, 
pr#23150, Alfredo Deza)
* ceph-volume: process.call with stdin in Python 3 fix (issue#24993, pr#23238, 
Alfredo Deza)
* ceph-volume: PVolumes.get() should return one PV when using name or uuid 
(issue#24784, pr#23329, Andrew Schoen)
* ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374, Andrew 
Schoen)
* ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311, 
pr#23813, Alfredo Deza)
* ceph-volume tests/functional run lvm list after OSD provisioning 
(issue#24961, pr#23147, Alfredo Deza)
* ceph-volume: unmount lvs correctly before zapping (issue#24796, pr#23128, 
Andrew Schoen)
* ceph-volume: update batch documentation to explain filestore strategies 
(issue#34309, pr#23825, Alfredo Deza)
* change default filestore_merge_threshold to -10 (issue#24686, pr#22814, 
Douglas Fuller)
* client: add inst to asok status output (issue#24724, pr#23107, Patrick 
Donnelly)
* client: fixup parallel calls to ceph_ll_lookup_inode() in NFS FASL 
(issue#22683, pr#23012, huanwen ren)
* client: increase verbosity level for log messages in helper methods 
(issue#21014, pr#23014, Rishabh Dave)
* client:  update inode fields according to issued caps (issue#24269, pr#22783, 
"Yan, Zheng")
* common: Abort in OSDMap::decode() during 
qa/standalone/erasure-code/test-erasure-eio.sh (issue#23492, pr#23025, Sage 
Weil)
* common/DecayCounter: set last_decay to current time when decoding decay 
counter (issue#24440, pr#22779, Zhi Zhang)
* doc: ceph-bluestore-tool manpage not getting rendered correctly (issue#24800, 
pr#23177, Nathan Cutler)
* filestore: add pgid in filestore pg dir split log message (issue#24878, 
pr#23454, Vikhyat Umrao)
* let "ceph status" use base 10 when printing numbers not sizes (issue#22095, 
pr#22680, Jan Fajerski, Kefu Chai)
* librados: fix

Re: [ceph-users] Luminous RGW errors at start

2018-09-04 Thread Robert Stanford
 This was the issue: RGW could not create the pool because it would have
exceeded the new (Luminous) limit on PGs per OSD.
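For anyone hitting the same thing, a hedged way to confirm and work around it
(the 300 value is purely illustrative; reducing pool PG counts or the
pool-creation defaults is the cleaner fix):

# See how many PGs already map to each OSD and which pools exist.
ceph osd df
ceph osd pool ls detail
# Temporarily raise the per-OSD PG limit so RGW can create its pools
# (may also need to be set in ceph.conf on the mons to persist).
ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'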

On Tue, Sep 4, 2018 at 10:35 AM David Turner  wrote:

> I was confused what could be causing this until Janne's email.  I think
> they're correct that the cluster is preventing pool creation due to too
> many PGs per OSD.  Double check how many PGs you have in each pool and what
> your defaults are for that.
>
> On Mon, Sep 3, 2018 at 7:19 AM Janne Johansson 
> wrote:
>
>> Did you change the default pg_num or pgp_num so the pools that did show
>> up made it go past the mon_max_pg_per_osd ?
>>
>>
>> Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford <
>> rstanford8...@gmail.com>:
>>
>>>
>>>  I installed a new Luminous cluster.  Everything is fine so far.  Then I
>>> tried to start RGW and got this error:
>>>
>>> 2018-08-31 15:15:41.998048 7fc350271e80  0 rgw_init_ioctx ERROR:
>>> librados::Rados::pool_create returned (34) Numerical result out of range
>>> (this can be due to a pool or placement group misconfiguration, e.g. pg_num
>>> < pgp_num or mon_max_pg_per_osd exceeded)
>>> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage
>>> provider (RADOS)
>>>
>>>  I notice that the only pools that exist are the data and index RGW
>>> pools (no user or log pools like on Jewel).  What is causing this?
>>>
>>>  Thank you
>>>  R
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> May the most significant bit of your life be positive.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous new OSD being over filled

2018-09-04 Thread David Turner
Instead of manually weighting the OSDs, you can use the mgr module to
slowly add the OSDs and balance your cluster at the same time.  I believe
you can control the module by telling it a maximum percent of misplaced
objects, or other similar metrics, to control adding in the OSD, while also
preventing your cluster from being poorly balanced.
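A minimal sketch of that approach (Luminous mgr balancer; the max_misplaced key
name and location are from memory and may differ by release, and upmap mode
requires all clients to be Luminous or newer):

# Turn on the balancer and throttle how much data may be misplaced at once.
ceph mgr module enable balancer
ceph config-key set mgr/balancer/max_misplaced 0.01
ceph balancer mode upmap
ceph balancer on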

On Mon, Sep 3, 2018 at 12:08 PM David C  wrote:

> Hi Marc
>
> I like that approach although I think I'd go in smaller weight increments.
>
> Still a bit confused by the behaviour I'm seeing, it looks like I've got
> things weighted correctly. Redhat's docs recommend doing an OSD at a time
> and I'm sure that's how I've done it on other clusters in the past although
> they would have been running older versions.
>
> Thanks,
>
> On Mon, Sep 3, 2018 at 1:45 PM Marc Roos  wrote:
>
>>
>>
>> I am adding a node like this, I think it is more efficient, because in
>> your case you will have data being moved within the added node (between
>> the newly added osd's there). So far no problems with this.
>>
>> Maybe limit your
>> ceph tell osd.* injectargs --osd_max_backfills=X
>> Because pg's being moved are taking space until the move is completed.
>>
>> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
>> sudo -u ceph ceph osd crush reweight osd.24 1
>> sudo -u ceph ceph osd crush reweight osd.25 1
>> sudo -u ceph ceph osd crush reweight osd.26 1
>> sudo -u ceph ceph osd crush reweight osd.27 1
>> sudo -u ceph ceph osd crush reweight osd.28 1
>> sudo -u ceph ceph osd crush reweight osd.29 1
>>
>> And then after recovery
>>
>> sudo -u ceph ceph osd crush reweight osd.23 2
>> sudo -u ceph ceph osd crush reweight osd.24 2
>> sudo -u ceph ceph osd crush reweight osd.25 2
>> sudo -u ceph ceph osd crush reweight osd.26 2
>> sudo -u ceph ceph osd crush reweight osd.27 2
>> sudo -u ceph ceph osd crush reweight osd.28 2
>> sudo -u ceph ceph osd crush reweight osd.29 2
>>
>> Etc etc
>>
>>
>> -Original Message-
>> From: David C [mailto:dcsysengin...@gmail.com]
>> Sent: maandag 3 september 2018 14:34
>> To: ceph-users
>> Subject: [ceph-users] Luminous new OSD being over filled
>>
>> Hi all
>>
>>
>> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
>> time. I've only added one so far but it's getting too full.
>>
>> The drive is the same size (4TB) as all others in the cluster, all OSDs
>> have crush weight of 3.63689. Average usage on the drives is 81.70%
>>
>>
>> With the new OSD I start with a crush weight 0 and steadily increase.
>> It's currently crush weight 3.0 and is 94.78% full. If I increase to
>> 3.63689 it's going to hit too full.
>>
>>
>> It's been a while since I've added a host to an existing cluster. Any
>> idea why the drive is getting too full? Do I just have to leave this one
>> with a lower crush weight and then continue adding the drives and then
>> eventually even out the crush weights?
>>
>> Thanks
>> David
>>
>>
>>
>>
>>
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous RGW errors at start

2018-09-04 Thread David Turner
I was confused what could be causing this until Janne's email.  I think
they're correct that the cluster is preventing pool creation due to too
many PGs per OSD.  Double check how many PGs you have in each pool and what
your defaults are for that.

On Mon, Sep 3, 2018 at 7:19 AM Janne Johansson  wrote:

> Did you change the default pg_num or pgp_num so the pools that did show up
> made it go past the mon_max_pg_per_osd ?
>
>
> Den fre 31 aug. 2018 kl 17:20 skrev Robert Stanford <
> rstanford8...@gmail.com>:
>
>>
>>  I installed a new Luminous cluster.  Everything is fine so far.  Then I
>> tried to start RGW and got this error:
>>
>> 2018-08-31 15:15:41.998048 7fc350271e80  0 rgw_init_ioctx ERROR:
>> librados::Rados::pool_create returned (34) Numerical result out of range
>> (this can be due to a pool or placement group misconfiguration, e.g. pg_num
>> < pgp_num or mon_max_pg_per_osd exceeded)
>> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage provider
>> (RADOS)
>>
>>  I notice that the only pools that exist are the data and index RGW pools
>> (no user or log pools like on Jewel).  What is causing this?
>>
>>  Thank you
>>  R
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Luminous - journal setting

2018-09-04 Thread David Turner
Are you planning on using bluestore or filestore?  The settings for
filestore haven't changed.  If you're planning to use bluestore there is a
lot of documentation in the ceph docs as well as a wide history of
questions like this on the ML.
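If it is BlueStore, the rough equivalent of the old SSD journal is placing the
RocksDB DB/WAL on the fast device; a sketch with placeholder device paths:

# Data on the HDD, DB (and implicitly the WAL) on an SSD/NVMe partition.
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1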

On Mon, Sep 3, 2018 at 5:24 AM M Ranga Swami Reddy 
wrote:

> Hi  - I am using the Ceph Luminous release. here what are the OSD
> journal settings needed for OSD?
> NOTE: I used SSDs for journal till Jewel release.
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS small files overhead

2018-09-04 Thread andrew w goussakovski
Hello

We are trying to use cephfs as storage for web graphics, such as
thumbnails and so on.
Is there any way to reduce the storage overhead? On a test cluster we have
1 fs, 2 pools (meta and data) with replica size = 2:

    objects: 1.02 M objects, 1.1 GiB
    usage:   144 GiB used, 27 GiB / 172 GiB avail

So we have (144/2)/1.1*100%=6500% overhead.

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
(stable)
osd storage - bluestore (changing bluestore_min_alloc_size makes no
visible effect)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] data_extra_pool for RGW Luminous still needed?

2018-09-04 Thread Casey Bodley




On 09/03/2018 10:07 PM, Nhat Ngo wrote:


Hi all,


I am new to Ceph and we are setting up a new RadosGW and Ceph storage 
cluster on Luminous. We are using only EC for our `buckets.data` pool 
at the moment.



However, I just read the Red Hat Ceph Object Gateway for Production
article and it mentions that an extra, duplicated `buckets.non-ec` pool is
needed for multipart uploads, because multipart upload parts must be
stored without EC. EC only applies to whole objects, not
partial uploads. Does this still hold true for Luminous?



The data layout document on Ceph does not make any mention of non-ec pool:

http://docs.ceph.com/docs/luminous/radosgw/layout/


Thanks,

*Nhat Ngo* | DevOps Engineer

Cloud Research Team, University of Melbourne, 3010, VIC
*Email: *nhat.n...@unimelb.edu.au



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Nhat,

The data extra pool is still necessary for multipart uploads, yes. This 
extra non-ec pool is only used for the 'multipart metadata' object that 
tracks which parts have been written, though - the object data for each 
part is still written to the normal data pool, so it can take advantage 
of erasure coding.
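In case it helps, a hedged sketch of wiring that up by hand (pool name, PG
counts and placement id are placeholders, and the period commit is only
relevant in a realm/multisite setup):

# Replicated extra pool for multipart metadata, next to the EC data pool.
ceph osd pool create default.rgw.buckets.non-ec 32 32 replicated
radosgw-admin zone placement modify --rgw-zone=default \
    --placement-id=default-placement \
    --data-extra-pool=default.rgw.buckets.non-ec
radosgw-admin period update --commit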


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Degraded data redundancy: NUM pgs undersized

2018-09-04 Thread Jörg Kastning

Hello Lothar,

Thanks for your reply.

Am 04.09.2018 um 11:20 schrieb Lothar Gesslein:

By pure chance 15 pgs are now actually replicated to all 3 osds, so they
have enough copies (clean). But the placement is "wrong", it would like
to move the data to different osds (remapped) if possible.


That seems to be correct. I've added a third bucket of type datacenter
and moved one host bucket so that each datacenter has one host with one
osd. The PGs were rebalanced (if that is the correct term) and the status
changed to HEALTH_OK with all PGs active+clean.


Now I moved the host in dc2 to another datacenter and removed dc2 from
the CRUSH map. Now I have all PGs active+clean+remapped. So now your
next statement applies:



It replicated to 2 osds in the initial placement but wasn't able to find
a suitable third osd. Then by increasing pgp_num it recalculated the
placement, again selected two osds and moved the data there. It won't
remove the data from the "wrong" osd until it has a new place for it, so
you end up with three copies, but remapped pgs.


Ok, I think I got this.



  3. What's wrong here and what do I have to do to get the cluster back
to active+clean, again?


I guess you want to have "two copies in dc1, one copy in dc2"?

If you stay with only 3 osds that is the only way to distribute 3
objects anyway, so you don't need any crush rule.

What your crush rule is currently expressing is

"in the default root, select n buckets (where n is the pool size, 3 in
this case) of type datacenter, select one leaf (meaning osd) in each
datacenter". You only have 2 datacenter buckets, so that will only ever
select 2 osds.


If your cluster is going to grow to at least 2 osds in each dc, you can
go with

http://cephnotes.ksperis.com/blog/2017/01/23/crushmap-for-2-dc/

I would translate this crush rule as

"in the default root, select 2 buckets of type datacenter, select n-1
(where n is the pool size, so here 3-1 = 2) leafs in each datacenter"

You will need at least two osds in each dc for this, because it is
random (with respect to the weights) in which dc the 2 copies will be
placed and which gets the remaining copy.


I don't get why I need to have at least two osds in each dc, because I
thought that with only three osds it is implicitly clear where the two
copies have to be written.


In case I have two osds in each dc, I would never know on which side the
two copies of my three replicas are.


Let's try an example to check whether my understanding of the matter is
correct or not:


I have two datacenters, dcA and dcB, with two osds in each dc. Due to the
random placement, two copies of object A are written to dcA and one to dcB.
For the next object B, two copies are written to dcB and one to dcA.


In case I have two osds in dcA and only one in dcB, the two copies of an
object are written to dcA every time and only one copy to dcB.


Did I get it right?

Best regards,
Joerg




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-04 Thread Alfredo Deza
On Sun, Sep 2, 2018 at 3:01 PM, David Wahler  wrote:
> On Sun, Sep 2, 2018 at 1:31 PM Alfredo Deza  wrote:
>>
>> On Sun, Sep 2, 2018 at 12:00 PM, David Wahler  wrote:
>> > Ah, ceph-volume.log pointed out the actual problem:
>> >
>> > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
>> > or an existing device is needed
>>
>> That is odd, is it possible that the error log wasn't the one that
>> matched what you saw on ceph-deploy's end?
>>
>> Usually ceph-deploy will just receive whatever ceph-volume produced.
>
> I tried again, running ceph-volume directly this time, just to see if
> I had mixed anything up. It looks like ceph-deploy is correctly
> reporting the output of ceph-volume. The problem is that ceph-volume
> only writes the relevant error message to the log file, and not to its
> stdout/stderr.
>
> Console output:
>
> rock64@rockpro64-1:~/my-cluster$ sudo ceph-volume --cluster ceph lvm
> create --bluestore --data /dev/storage/foobar
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> e7dd6d45-b556-461c-bad1-83d98a5a1afa
> --> Was unable to complete a new OSD, will rollback changes
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1
> --yes-i-really-mean-it
>  stderr: no valid command found; 10 closest matches:
> [...etc...]
>
> ceph-volume.log:
>
> [2018-09-02 18:49:21,415][ceph_volume.main][INFO  ] Running command:
> ceph-volume --cluster ceph lvm create --bluestore --data
> /dev/storage/foobar
> [2018-09-02 18:49:21,423][ceph_volume.process][INFO  ] Running
> command: /usr/bin/ceph-authtool --gen-print-key
> [2018-09-02 18:49:26,664][ceph_volume.process][INFO  ] stdout
> AQCxMIxb+SezJRAAGAP/HHtHLVbciSQnZ/c/qw==
> [2018-09-02 18:49:26,668][ceph_volume.process][INFO  ] Running
> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> e7dd6d45-b556-461c-bad1-83d98a5a1afa
> [2018-09-02 18:49:27,685][ceph_volume.process][INFO  ] stdout 1
> [2018-09-02 18:49:27,686][ceph_volume.process][INFO  ] Running
> command: /bin/lsblk --nodeps -P -o
> NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL
> /dev/storage/foobar
> [2018-09-02 18:49:27,707][ceph_volume.process][INFO  ] stdout
> NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE=""
> MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G"
> STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw"
> ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" SCHED=""
> TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0"
> PKNAME="" PARTLABEL=""
> [2018-09-02 18:49:27,708][ceph_volume.process][INFO  ] Running
> command: /bin/lsblk --nodeps -P -o
> NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL
> /dev/storage/foobar
> [2018-09-02 18:49:27,720][ceph_volume.process][INFO  ] stdout
> NAME="storage-foobar" KNAME="dm-1" MAJ:MIN="253:1" FSTYPE=""
> MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="100G"
> STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw"
> ALIGNMENT="0" PHY-SEC="4096" LOG-SEC="512" ROTA="1" SCHED=""
> TYPE="lvm" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0"
> PKNAME="" PARTLABEL=""
> [2018-09-02 18:49:27,720][ceph_volume.devices.lvm.prepare][ERROR ] lvm
> prepare was unable to complete
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
> line 216, in safe_prepare
> self.prepare(args)
>   File "/usr/lib/python2.7/dist-packages/ceph_volume/decorators.py",
> line 16, in is_root
> return func(*a, **kw)
>   File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
> line 283, in prepare
> block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
>   File "/usr/lib/python2.7/dist-packages/ceph_volume/devices/lvm/prepare.py",
> line 206, in prepare_device
> raise RuntimeError(' '.join(error))
> RuntimeError: Cannot use device (/dev/storage/foobar). A vg/lv path or
> an existing device is needed
> [2018-09-02 18:49:27,722][ceph_volume.devices.lvm.prepare][INFO  ]
> will rollback OSD ID creation
> [2018-09-02 18:49:27,723][ceph_volume.process][INFO  ] Running
> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.1
> --yes-i-really-mean-it
> [2018-09-02 18:49:28,425][ceph_volume.process][INFO  ] stderr no valid
> command found; 10 closest matches:
> [...etc...]

This is a bug. Thanks for di

Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-09-04 Thread Alfredo Deza
On Tue, Sep 4, 2018 at 3:59 AM, Wolfgang Lendl
 wrote:
> is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering
> from high frequent osd crashes.
> my hopes are with 12.2.9 - but hope wasn't always my best strategy

12.2.8 just went out. I think that Adam or Radoslaw might have some
time to check those logs now.
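In the meantime, the verbosity bump suggested further down can be applied at
runtime while reproducing a crash, for example (debug levels are illustrative):

# Raise logging on the crashing OSD, then watch /var/log/ceph/ceph-osd.2.log.
ceph tell osd.2 injectargs '--debug_osd=20 --debug_bluestore=20 --debug_ms=1'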

>
> br
> wolfgang
>
> On 2018-08-30 19:18, Alfredo Deza wrote:
>> On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
>>  wrote:
>>> Hi Alfredo,
>>>
>>>
>>> caught some logs:
>>> https://pastebin.com/b3URiA7p
>> That looks like there is an issue with bluestore. Maybe Radoslaw or
>> Adam might know a bit more.
>>
>>
>>> br
>>> wolfgang
>>>
>>> On 2018-08-29 15:51, Alfredo Deza wrote:
 On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
  wrote:
> Hi,
>
> after upgrading my ceph clusters from 12.2.5 to 12.2.7  I'm experiencing 
> random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not 
> affected.
> I destroyed and recreated some of the SSD OSDs which seemed to help.
>
> this happens on centos 7.5 (different kernels tested)
>
> /var/log/messages:
> Aug 29 10:24:08  ceph-osd: *** Caught signal (Segmentation fault) **
> Aug 29 10:24:08  ceph-osd: in thread 7f8a8e69e700 
> thread_name:bstore_kv_final
> Aug 29 10:24:08  kernel: traps: bstore_kv_final[187470] general 
> protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in 
> libtcmalloc.so.4.4.5[7f8a997a8000+46000]
> Aug 29 10:24:08  systemd: ceph-osd@2.service: main process exited, 
> code=killed, status=11/SEGV
> Aug 29 10:24:08  systemd: Unit ceph-osd@2.service entered failed state.
> Aug 29 10:24:08  systemd: ceph-osd@2.service failed.
> Aug 29 10:24:28  systemd: ceph-osd@2.service holdoff time over, 
> scheduling restart.
> Aug 29 10:24:28  systemd: Starting Ceph object storage daemon osd.2...
> Aug 29 10:24:28  systemd: Started Ceph object storage daemon osd.2.
> Aug 29 10:24:28  ceph-osd: starting osd.2 at - osd_data 
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> Aug 29 10:24:35  ceph-osd: *** Caught signal (Segmentation fault) **
> Aug 29 10:24:35  ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
> Aug 29 10:24:35  kernel: traps: tp_osd_tp[186933] general protection 
> ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in 
> libtcmalloc.so.4.4.5[7f5f430cd000+46000]
> Aug 29 10:24:35  systemd: ceph-osd@0.service: main process exited, 
> code=killed, status=11/SEGV
> Aug 29 10:24:35  systemd: Unit ceph-osd@0.service entered failed state.
> Aug 29 10:24:35  systemd: ceph-osd@0.service failed
 These systemd messages aren't usually helpful, try poking around
 /var/log/ceph/ for the output on that one OSD.

 If those logs aren't useful either, try bumping up the verbosity (see
 http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
 )
> did I hit a known issue?
> any suggestions are highly appreciated
>
>
> br
> wolfgang
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>>> --
>>> Wolfgang Lendl
>>> IT Systems & Communications
>>> Medizinische Universität Wien
>>> Spitalgasse 23 / BT 88 /Ebene 00
>>> A-1090 Wien
>>> Tel: +43 1 40160-21231
>>> Fax: +43 1 40160-921200
>>>
>>>
>
> --
> Wolfgang Lendl
> IT Systems & Communications
> Medizinische Universität Wien
> Spitalgasse 23 / BT 88 /Ebene 00
> A-1090 Wien
> Tel: +43 1 40160-21231
> Fax: +43 1 40160-921200
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-04 Thread John Spray
It's mds_beacon_grace.  Set that on the monitor to control the
replacement of laggy MDS daemons, and usually also set it to the same
value on the MDS daemon as it's used there for the daemon to hold off
on certain tasks if it hasn't seen a mon beacon recently.
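A hedged example of setting it (the 60s value is purely illustrative; the
ceph.conf entry makes it persistent, the injectargs call applies it to running
mons):

# In ceph.conf, e.g. under [global]:
#   mds_beacon_grace = 60
# and at runtime on the monitors:
ceph tell mon.* injectargs '--mds_beacon_grace=60'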

John
On Mon, Sep 3, 2018 at 9:26 AM William Lawton  wrote:
>
> Which configuration option determines the MDS timeout period?
>
>
>
> William Lawton
>
>
>
> From: Gregory Farnum 
> Sent: Thursday, August 30, 2018 5:46 PM
> To: William Lawton 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] MDS does not always failover to hot standby on 
> reboot
>
>
>
> Yes, this is a consequence of co-locating the MDS and monitors — if the MDS 
> reports to its co-located monitor and both fail, the monitor cluster has to 
> go through its own failure detection and then wait for a full MDS timeout 
> period after that before it marks the MDS down. :(
>
>
>
> We might conceivably be able to optimize for this, but there's not a general 
> solution. If you need to co-locate, one thing that would make it better 
> without being a lot of work is trying to have the MDS connect to one of the 
> monitors on a different host. You can do that by just restricting the list of 
> monitors you feed it in the ceph.conf, although it's not a guarantee that 
> will *prevent* it from connecting to its own monitor if there are failures or 
> reconnects after first startup.
>
> -Greg
>
> On Thu, Aug 30, 2018 at 8:38 AM William Lawton  
> wrote:
>
> Hi.
>
>
>
> We have a 5 node Ceph cluster (refer to ceph -s output at bottom of email). 
> During resiliency tests we have an occasional problem when we reboot the 
> active MDS instance and a MON instance together i.e.  dub-sitv-ceph-02 and 
> dub-sitv-ceph-04. We expect the MDS to failover to the standby instance 
> dub-sitv-ceph-01 which is in standby-replay mode, and 80% of the time it does 
> with no problems. However, 20% of the time it doesn’t and the MDS_ALL_DOWN 
> health check is not cleared until 30 seconds later when the rebooted 
> dub-sitv-ceph-02 and dub-sitv-ceph-04 instances come back up.
>
>
>
> When the MDS successfully fails over to the standby we see in the ceph.log 
> the following:
>
>
>
> 2018-08-25 00:30:02.231811 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 50 : 
> cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
>
> 2018-08-25 00:30:02.237389 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 52 : 
> cluster [INF] Standby daemon mds.dub-sitv-ceph-01 assigned to filesystem 
> cephfs as rank 0
>
> 2018-08-25 00:30:02.237528 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 54 : 
> cluster [INF] Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is 
> offline)
>
>
>
> When the active MDS role does not failover to the standby the MDS_ALL_DOWN 
> check is not cleared until after the rebooted instances have come back up 
> e.g.:
>
>
>
> 2018-08-25 03:30:02.936554 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 55 : 
> cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
>
> 2018-08-25 03:30:04.235703 mon.dub-sitv-ceph-05 mon.2 10.18.186.208:6789/0 
> 226 : cluster [INF] mon.dub-sitv-ceph-05 calling monitor election
>
> 2018-08-25 03:30:04.238672 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 56 : 
> cluster [INF] mon.dub-sitv-ceph-03 calling monitor election
>
> 2018-08-25 03:30:09.242595 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 57 : 
> cluster [INF] mon.dub-sitv-ceph-03 is new leader, mons 
> dub-sitv-ceph-03,dub-sitv-ceph-05 in quorum (ranks 0,2)
>
> 2018-08-25 03:30:09.252804 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 62 : 
> cluster [WRN] Health check failed: 1/3 mons down, quorum 
> dub-sitv-ceph-03,dub-sitv-ceph-05 (MON_DOWN)
>
> 2018-08-25 03:30:09.258693 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 63 : 
> cluster [WRN] overall HEALTH_WARN 2 osds down; 2 hosts (2 osds) down; 1/3 
> mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05
>
> 2018-08-25 03:30:10.254162 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 64 : 
> cluster [WRN] Health check failed: Reduced data availability: 2 pgs inactive, 
> 115 pgs peering (PG_AVAILABILITY)
>
> 2018-08-25 03:30:12.429145 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 66 : 
> cluster [WRN] Health check failed: Degraded data redundancy: 712/2504 objects 
> degraded (28.435%), 86 pgs degraded (PG_DEGRADED)
>
> 2018-08-25 03:30:16.137408 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 67 : 
> cluster [WRN] Health check update: Reduced data availability: 1 pg inactive, 
> 69 pgs peering (PG_AVAILABILITY)
>
> 2018-08-25 03:30:17.193322 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 68 : 
> cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data 
> availability: 1 pg inactive, 69 pgs peering)
>
> 2018-08-25 03:30:18.432043 mon.dub-sitv-ceph-03 mon.0 10.18.53.32:6789/0 69 : 
> cluster [WRN] Health check update: Degraded data redundancy: 1286/2572 
> objects degraded (50.000%), 166 pgs degraded (PG_DEGRADED)
>
> 2018

Re: [ceph-users] Degraded data redundancy: NUM pgs undersized

2018-09-04 Thread Lothar Gesslein
On 09/04/2018 09:47 AM, Jörg Kastning wrote:
> My questions are:
> 
>  1. What does active+undersized actually mean? I did not find anything
> about it in the documentation on docs.ceph.com.

http://docs.ceph.com/docs/master/rados/operations/pg-states/

active
Ceph will process requests to the placement group.

undersized
The placement group has fewer copies than the configured pool
replication level.


Your crush map/rules and osds do not allow all pgs to be placed on three
"independent" osds, so pgs have fewer copies than configured.

>  2. Why are only 15 PGs were getting remapped after I've corrected the
> mistake with the wrong pgp_num value?

By pure chance 15 pgs are now actually replicated to all 3 osds, so they
have enough copies (clean). But the placement is "wrong", it would like
to move the data to different osds (remapped) if possible.

It replicated to 2 osds in the initial placement but wasn't able to find
a suitable third osd. Then by increasing pgp_num it recalculated the
placement, again selected two osds and moved the data there. It won't
remove the data from the "wrong" osd until it has a new place for it, so
you end up with three copies, but remapped pgs.

>  3. What's wrong here and what do I have to do to get the cluster back
> to active+clean, again?

I guess you want to have "two copies in dc1, one copy in dc2"?

If you stay with only 3 osds that is the only way to distribute 3
objects anyway, so you don't need any crush rule.

What your crush rule is currently expressing is

"in the default root, select n buckets (where n is the pool size, 3 in
this case) of type datacenter, select one leaf (meaning osd) in each
datacenter". You only have 2 datacenter buckets, so that will only ever
select 2 osds.


If your cluster is going to grow to at least 2 osds in each dc, you can
go with

http://cephnotes.ksperis.com/blog/2017/01/23/crushmap-for-2-dc/

I would translate this crush rule as

"in the default root, select 2 buckets of type datacenter, select n-1
(where n is the pool size, so here 3-1 = 2) leafs in each datacenter"

You will need at least two osds in each dc for this, because it is
random (with respect to the weights) in which dc the 2 copies will be
placed and which gets the remaining copy.
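For completeness, a sketch of the usual way to get such a rule into the
cluster (file names are placeholders; the rule body itself would be the one
from the linked post):

# Pull, decompile, edit, recompile and inject the CRUSH map.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... add or adjust the rule in crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new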


Best regards,
Lothar


-- 
Lothar Gesslein
Linux Consultant
Mail: gessl...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd_journal_aio=false and performance

2018-09-04 Thread Rudenko Aleksandr
Hi, guys.

I ran a few tests and I see that performance is better with
osd_journal_aio=false for LV journals.

Setup:
2 servers x 4 OSD (SATA HDD + journal on SSD LV)
12.2.5, filestore

  cluster:
id: ce305aae-4c56-41ec-be54-529b05eb45ed
health: HEALTH_OK

  services:
mon: 2 daemons, quorum a,b
mgr: a(active), standbys: b
osd: 8 osds: 8 up, 8 in

  data:
pools:   1 pools, 512 pgs
objects: 0 objects, 0 bytes
usage:   904 MB used, 11440 GB / 11441 GB avail
pgs: 512 active+clean


0 objects before each test.

I used rados bench from two servers in parallel:

2x of:
rados bench -p test 30 write -b 1M -t 32

I ran each test four times.

file-journal on XFS FS, dio=1, aio=0:

Average IOPS:   102.5
Average Latency(s):   0.30

LV-journal, dio=1, aio=1:

Average IOPS:   96.5
Average Latency(s):   32.5

LV-journal, dio=1, aio=0:

Average IOPS:   104
Average Latency(s):   0.30


Is it safe to disable aio on LV journals? Does it make sense?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-09-04 Thread Wolfgang Lendl
Is downgrading from 12.2.7 to 12.2.5 an option? I'm still suffering
from frequent OSD crashes.
My hopes are with 12.2.9, but hope wasn't always my best strategy.

br
wolfgang

On 2018-08-30 19:18, Alfredo Deza wrote:
> On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
>  wrote:
>> Hi Alfredo,
>>
>>
>> caught some logs:
>> https://pastebin.com/b3URiA7p
> That looks like there is an issue with bluestore. Maybe Radoslaw or
> Adam might know a bit more.
>
>
>> br
>> wolfgang
>>
>> On 2018-08-29 15:51, Alfredo Deza wrote:
>>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
>>>  wrote:
 Hi,

 after upgrading my ceph clusters from 12.2.5 to 12.2.7  I'm experiencing 
 random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not 
 affected.
 I destroyed and recreated some of the SSD OSDs which seemed to help.

 this happens on centos 7.5 (different kernels tested)

 /var/log/messages:
 Aug 29 10:24:08  ceph-osd: *** Caught signal (Segmentation fault) **
 Aug 29 10:24:08  ceph-osd: in thread 7f8a8e69e700 
 thread_name:bstore_kv_final
 Aug 29 10:24:08  kernel: traps: bstore_kv_final[187470] general protection 
 ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in 
 libtcmalloc.so.4.4.5[7f8a997a8000+46000]
 Aug 29 10:24:08  systemd: ceph-osd@2.service: main process exited, 
 code=killed, status=11/SEGV
 Aug 29 10:24:08  systemd: Unit ceph-osd@2.service entered failed state.
 Aug 29 10:24:08  systemd: ceph-osd@2.service failed.
 Aug 29 10:24:28  systemd: ceph-osd@2.service holdoff time over, scheduling 
 restart.
 Aug 29 10:24:28  systemd: Starting Ceph object storage daemon osd.2...
 Aug 29 10:24:28  systemd: Started Ceph object storage daemon osd.2.
 Aug 29 10:24:28  ceph-osd: starting osd.2 at - osd_data 
 /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
 Aug 29 10:24:35  ceph-osd: *** Caught signal (Segmentation fault) **
 Aug 29 10:24:35  ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
 Aug 29 10:24:35  kernel: traps: tp_osd_tp[186933] general protection 
 ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in 
 libtcmalloc.so.4.4.5[7f5f430cd000+46000]
 Aug 29 10:24:35  systemd: ceph-osd@0.service: main process exited, 
 code=killed, status=11/SEGV
 Aug 29 10:24:35  systemd: Unit ceph-osd@0.service entered failed state.
 Aug 29 10:24:35  systemd: ceph-osd@0.service failed
>>> These systemd messages aren't usually helpful, try poking around
>>> /var/log/ceph/ for the output on that one OSD.
>>>
>>> If those logs aren't useful either, try bumping up the verbosity (see
>>> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
>>> )
 did I hit a known issue?
 any suggestions are highly appreciated


 br
 wolfgang



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>> --
>> Wolfgang Lendl
>> IT Systems & Communications
>> Medizinische Universität Wien
>> Spitalgasse 23 / BT 88 /Ebene 00
>> A-1090 Wien
>> Tel: +43 1 40160-21231
>> Fax: +43 1 40160-921200
>>
>>

-- 
Wolfgang Lendl
IT Systems & Communications
Medizinische Universität Wien
Spitalgasse 23 / BT 88 /Ebene 00
A-1090 Wien
Tel: +43 1 40160-21231
Fax: +43 1 40160-921200




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Degraded data redundancy: NUM pgs undersized

2018-09-04 Thread Jörg Kastning

Good morning folks,

As a newbie to Ceph, yesterday was the first time I configured my
CRUSH map, added a CRUSH rule and created my first pool using this rule.


Since then I get the status HEALTH_WARN with the following output:

~~~
$ sudo ceph status
  cluster:
id: 47c108bd-db66-4197-96df-cadde9e9eb45
health: HEALTH_WARN
Degraded data redundancy: 128 pgs undersized
1 pools have pg_num > pgp_num

  services:
mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
osd: 3 osds: 3 up, 3 in

  data:
pools:   1 pools, 128 pgs
objects: 0 objects, 0 bytes
usage:   3088 MB used, 3068 GB / 3071 GB avail
pgs: 128 active+undersized
~~~

The pool was created running `sudo ceph osd pool create joergsfirstpool 
128 replicated replicate_datacenter`.


I figured out that I had forgotten to set the value for the key pgp_num
accordingly. So I've done that by running `sudo ceph osd pool set
joergsfirstpool pgp_num 128`. As you can see in the following output,
15 PGs were remapped but 113 still remain active+undersized.


~~~
$ sudo ceph status
  cluster:
id: 47c108bd-db66-4197-96df-cadde9e9eb45
health: HEALTH_WARN
Degraded data redundancy: 113 pgs undersized

  services:
mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
osd: 3 osds: 3 up, 3 in; 15 remapped pgs

  data:
pools:   1 pools, 128 pgs
objects: 0 objects, 0 bytes
usage:   3089 MB used, 3068 GB / 3071 GB avail
pgs: 113 active+undersized
 15  active+clean+remapped
~~~

My questions are:

 1. What does active+undersized actually mean? I did not find anything 
about it in the documentation on docs.ceph.com.


 2. Why are only 15 PGs were getting remapped after I've corrected the 
mistake with the wrong pgp_num value?


 3. What's wrong here and what do I have to do to get the cluster back 
to active+clean, again?


For further information you could find my current CRUSH map below:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ccp-tcnm01 {
id -5   # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.1 weight 1.000
}
host ccp-tcnm03 {
id -7   # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.2 weight 1.000
}
datacenter dc1 {
id -9   # do not change unnecessarily
id -12 class hdd# do not change unnecessarily
# weight 2.000
alg straw2
hash 0  # rjenkins1
item ccp-tcnm01 weight 1.000
item ccp-tcnm03 weight 1.000
}
host ccp-tcnm02 {
id -3   # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item osd.0 weight 1.000
}
datacenter dc3 {
id -10  # do not change unnecessarily
id -11 class hdd# do not change unnecessarily
# weight 1.000
alg straw2
hash 0  # rjenkins1
item ccp-tcnm02 weight 1.000
}
root default {
id -1   # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 3.000
alg straw2
hash 0  # rjenkins1
item dc1 weight 2.000
item dc3 weight 1.000
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicate_datacenter {
id 1
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type datacenter
step emit
}

# end crush map

Best regards,
Joerg



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com