[ceph-users] Expected IO in luminous Ceph Cluster

2019-06-06 Thread Stolte, Felix
Hello folks,

we are running a ceph cluster on Luminous consisting of 21 OSD nodes, each with 9
8TB SATA drives and 3 Intel 3700 SSDs for Bluestore WAL and DB (1:3 ratio). OSD
nodes have 10Gb for the public and cluster networks. The cluster has been running
stable for over a year. We hadn't taken a closer look at IO until one of our
customers started to complain about a VM we migrated from VMware with NetApp
storage to our OpenStack cloud with Ceph storage. He sent us a sysbench report
from the machine, which I could reproduce on other VMs as well as on a mounted
RBD on physical hardware:

sysbench --file-fsync-freq=1 --threads=16 fileio --file-total-size=1G 
--file-test-mode=rndrw --file-rw-ratio=2 run
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time

Extra file open flags: 0
128 files, 8MiB each
1GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 2.00
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test

File operations:
reads/s:  36.36
writes/s: 18.18
fsyncs/s: 2318.59

Throughput:
read, MiB/s:  0.57
written, MiB/s:   0.28

General statistics:
total time:  10.0071s
total number of events:  23755

Latency (ms):
 min:  0.01
 avg:  6.74
 max:   1112.58
 95th percentile: 26.68
 sum: 160022.67

Threads fairness:
events (avg/stddev):   1484.6875/52.59
execution time (avg/stddev):   10.0014/0.00

Are these numbers reasonable for a cluster of our size?

Best regards
Felix
IT-Services
Telefon 02461 61-9243
E-Mail: f.sto...@fz-juelich.de
-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-
-
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Sakirnth Nagarasa
Hello,

Our ceph version is ceph nautilus (14.2.1).
We create periodically snapshots from an rbd image (50 TB). In order to
restore some data, we have cloned a snapshot.
To delete the snapshot we ran: rbd rm ${POOLNAME}/${IMAGE}

But it took very long to delete the image; after half an hour it had only
1% progress. We thought it couldn't be right, because the creation of the clone
was pretty fast.
So we interrupted (SIGINT) the delete command. After doing some research
we found out it's the normal deletion behavior.

The problem is that ceph does not recognize the image anymore. Even
though it is listed in rbd list we can't remove it.

rbd rm ${POOLNAME}/${IMAGE}
rbd: error opening image ${IMAGE}: (2) No such file or directory

Now, how can we get rid of the image correctly?

Thanks
Sakirnth Nagarasa
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Xiaoxi Chen
We go with the upstream releases and are mostly on Nautilus now, probably
among the most aggressive of the serious production users (i.e. tens of PB+).

I will vote for November, for several reasons:

 1.  Q4 is holiday season and production rollouts are usually blocked,
especially storage-related changes, which gives the team more time
to prepare, test, and LnP the new release, as well as catch up with
new features.

 2.  Q4/Q1 is usually the planning season; having the upstream release
out and tested, so that the readiness of new features is known,
greatly helps when planning next year's features/offerings.

 3.  Users have the whole year to migrate their
provisioning/monitoring/deployment/remediation systems to the new version,
and have enough time to fix and stabilize the surrounding systems before
the next holiday season.

A release in February or March would put Q4 right in the middle of the
cycle, and a lot of changes would land at the last minute (month), in
which case few things can be tested or forecast based on the state of
the art in Q4.

-Xiaoxi

Linh Vu  于2019年6月6日周四 上午8:32写道:
>
> I think a 12-month cycle is much better from the cluster operations 
> perspective. I also like March as a release month as well.
> 
> From: ceph-users  on behalf of Sage Weil 
> 
> Sent: Thursday, 6 June 2019 1:57 AM
> To: ceph-us...@ceph.com; ceph-de...@vger.kernel.org; d...@ceph.io
> Subject: [ceph-users] Changing the release cadence
>
> Hi everyone,
>
> Since luminous, we have had the following release cadence and policy:
>  - release every 9 months
>  - maintain backports for the last two releases
>  - enable upgrades to move ahead either 1 or 2 releases
>(e.g., luminous -> mimic or nautilus; mimic -> nautilus or octopus; ...)
>
> This has mostly worked out well, except that the mimic release received
> less attention than we wanted due to the fact that multiple downstream
> Ceph products (from Red Hat and SUSE) decided to base their next release
> on nautilus.  Even though upstream every release is an "LTS" release, as a
> practical matter mimic got less attention than luminous or nautilus.
>
> We've had several requests/proposals to shift to a 12 month cadence. This
> has several advantages:
>
>  - Stable/conservative clusters only have to be upgraded every 2 years
>(instead of every 18 months)
>  - Yearly releases are more likely to intersect with downstream
>distribution release (e.g., Debian).  In the past there have been
>problems where the Ceph releases included in consecutive releases of a
>distro weren't easily upgradeable.
>  - Vendors that make downstream Ceph distributions/products tend to
>release yearly.  Aligning with those vendors means they are more likely
>to productize *every* Ceph release.  This will help make every Ceph
>release an "LTS" release (not just in name but also in terms of
>maintenance attention).
>
> So far the balance of opinion seems to favor a shift to a 12 month
> cycle[1], especially among developers, so it seems pretty likely we'll
> make that shift.  (If you do have strong concerns about such a move, now
> is the time to raise them.)
>
> That brings us to an important decision: what time of year should we
> release?  Once we pick the timing, we'll be releasing at that time *every
> year* for each release (barring another schedule shift, which we want to
> avoid), so let's choose carefully!
>
> A few options:
>
>  - November: If we release Octopus 9 months from the Nautilus release
>(planned for Feb, released in Mar) then we'd target this November.  We
>could shift to a 12-month cadence after that.
>  - February: That's 12 months from the Nautilus target.
>  - March: That's 12 months from when Nautilus was *actually* released.
>
> November is nice in the sense that we'd wrap things up before the
> holidays.  It's less good in that users may not be inclined to install the
> new release when many developers will be less available in December.
>
> February kind of sucked in that the scramble to get the last few things
> done happened during the holidays.  OTOH, we should be doing what we can
> to avoid such scrambles, so that might not be something we should factor
> in.  March may be a bit more balanced, with a solid 3 months before when
> people are productive, and 3 months after before they disappear on holiday
> to address any post-release issues.
>
> People tend to be somewhat less available over the summer months due to
> holidays etc, so an early or late summer release might also be less than
> ideal.
>
> Thoughts?  If we can narrow it down to a few options maybe we could do a
> poll to gauge user preferences.
>
> Thanks!
> sage
>
>
> [1] 
> https://protect-au.mimecast.com/s/N1l6CROAEns1RN1Zu9Jwts?domain=twitter.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users 

[ceph-users] OSD hanging on 12.2.12 by message worker

2019-06-06 Thread Max Vernimmen
HI,


We are running VM images on ceph using RBD. We are seeing a problem where
one of our VMs runs into trouble due to IO not completing. iostat on the
VM shows IO remaining in the queue, and disk utilisation for ceph-based
vdisks is 100%.


Upon investigation the problem seems to be with the message worker
(msgr-worker-0) thread for one OSD. Restarting the OSD process fixes the
problem: the IO gets completed and the VM is unfrozen, life continues
without a problem. Until it happens again. Recurrence is between 30 minutes
and over 24 hours. The VMs affected are always in the same pool, but it's
not always the same VM that is affected. The problem occurs with different
VMs on different hypervisors. The problem occurs on different ceph nodes
and with different OSDs.


When the problem occurs, we see on the network a sudden jump from
<100mbit/sec to >4gbit/sec of continuous traffic. This traffic is between
the hypervisor and one OSD, in these cases always one of our HDD OSDs. The
traffic is not visible within the VM, only on the hypervisor.


If the client is rebooted,  the problem is gone. If the OSD  is restarted,
the problem is gone.

This is happening several times per day after we made several changes at
the same time:

   - add physical ram to the ceph nodes
   - move from fixed 'bluestore cache size hdd|ssd' and 'bluestore cache kv
   max' to 'bluestore cache autotune = 1' and 'osd memory target =
   20401094656'.
   - update ceph from 12.2.8 to 12.2.11
   - update clients from 12.2.8 to 12.2.11

We have since upgraded the ceph nodes to 12.2.12 but it did not help to fix
this problem.


My request is that someone takes a look at our findings below and can give
some insight into whether this is a bug, a misconfiguration or perhaps some
idea of where to take a closer look.


our setup is:

8 identical nodes, each with 4 HDDs (8TB, 7k rpm) and 6 SSDs (4TB). There
are a number of pools that, using crush rules, map to either the HDDs or
the SSDs. The pool that always has this problem is called 'prod_slow' and
goes to the HDDs.


I tracked down the osd by looking at the client port the client receives
most traffic from (all 4gbps is read traffic, outgoing from ceph, incoming
to client).


root@ceph-03:~# netstat -tlpn|grep 6804

tcp   0   0 10.60.8.11:6804   0.0.0.0:*   LISTEN   3741/ceph-osd
tcp   0   0 10.60.6.11:6804   0.0.0.0:*   LISTEN   3730/ceph-osd


root@ceph-03:~# ps uafx|grep 3730

ceph      3730 44.3  6.3 20214604 16848524 ?  Ssl  Jun05 524:14
/usr/bin/ceph-osd -f --cluster ceph --id 23 --setuser ceph --setgroup ceph


root@ceph-03:~# ps -L -p3730

  PID   LWP TTY          TIME CMD
 3730  3730 ?        00:00:05 ceph-osd
 3730  3778 ?        00:00:00 log
 3730  3791 ?        05:19:49 msgr-worker-0
 3730  3802 ?        00:01:18 msgr-worker-1
 3730  3810 ?        00:01:25 msgr-worker-2
 3730  3842 ?        00:00:00 service
 3730  3845 ?        00:00:00 admin_socket
 3730  4015 ?        00:00:00 ceph-osd
 3730  4017 ?        00:00:00 safe_timer
 3730  4018 ?        00:00:03 safe_timer
 3730  4019 ?        00:00:00 safe_timer
 3730  4020 ?        00:00:00 safe_timer
 3730  4021 ?        00:00:14 bstore_aio
 3730  4023 ?        00:00:05 bstore_aio
 3730  4280 ?        00:00:32 rocksdb:bg0
 3730  4634 ?        00:00:00 dfin
 3730  4635 ?        00:00:12 finisher
 3730  4636 ?        00:00:51 bstore_kv_sync
 3730  4637 ?        00:00:12 bstore_kv_final
 3730  4638 ?        00:00:27 bstore_mempool
 3730  5803 ?        00:03:08 ms_dispatch
 3730  5804 ?        00:00:00 ms_local
 3730  5805 ?        00:00:00 ms_dispatch
 3730  5806 ?        00:00:00 ms_local
 3730  5807 ?        00:00:00 ms_dispatch
 3730  5808 ?        00:00:00 ms_local
 3730  5809 ?        00:00:00 ms_dispatch
 3730  5810 ?        00:00:00 ms_local
 3730  5811 ?        00:00:00 ms_dispatch
 3730  5812 ?        00:00:00 ms_local
 3730  5813 ?        00:00:00 ms_dispatch
 3730  5814 ?        00:00:00 ms_local
 3730  5815 ?        00:00:00 ms_dispatch
 3730  5816 ?        00:00:00 ms_local
 3730  5817 ?        00:00:00 safe_timer
 3730  5818 ?        00:00:00 fn_anonymous
 3730  5819 ?        00:00:02 safe_timer
 3730  5820 ?        00:00:00 tp_peering
 3730  5821 ?        00:00:00 tp_peering
 3730  5822 ?        00:00:00 fn_anonymous
 3730  5823 ?        00:00:00 fn_anonymous
 3730  5824 ?        00:00:00 safe_timer
 3730  5825 ?        00:00:00 safe_timer
 3730  5826 ?        00:00:00 safe_timer
 3730  5827 ?        00:00:00 safe_timer
 3730  5828 ?        00:00:00 osd_srv_agent
 3730  5829 ?        00:01:15 tp_osd_tp
 3730  5830 ?        00:01:27 tp_osd_tp
 3730  5831 ?        00:01:40 tp_osd_tp
 3730  5832 ?        00:00:49 

[ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-06 Thread BASSAGET Cédric
Hello,

I see messages related to REQUEST_SLOW a few times per day.

here's my ceph -s  :

root@ceph-pa2-1:/etc/ceph# ceph -s
  cluster:
id: 72d94815-f057-4127-8914-448dfd25f5bc
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-pa2-1,ceph-pa2-2,ceph-pa2-3
mgr: ceph-pa2-3(active), standbys: ceph-pa2-1, ceph-pa2-2
osd: 6 osds: 6 up, 6 in

  data:
pools:   1 pools, 256 pgs
objects: 408.79k objects, 1.49TiB
usage:   4.44TiB used, 37.5TiB / 41.9TiB avail
pgs: 256 active+clean

  io:
client:   8.00KiB/s rd, 17.2MiB/s wr, 1op/s rd, 546op/s wr


Running ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
luminous (stable)

I've checked:
- all my network stack: OK (2*10G LAG)
- memory usage: OK (256G on each host, about 2% used per osd)
- cpu usage: OK (Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz)
- disk status: OK (SAMSUNG   AREA7680S5xnNTRI  3P04 => Samsung DC series)

I heard on IRC that it can be related to the Samsung PM / SM series.

Is anybody here facing the same problem? What can I do to solve it?
Regards,
Cédric
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove ceph-mgr from a node

2019-06-06 Thread Vandeir Eduardo
Just for the record, in case someone gets into this thread: the problem
of ceph-mgr being started on a host other than the active mgr one
was because the python-routes package was missing. In the log, these were
the error messages displayed:

2019-06-05 11:04:48.800 7fed60097700 -1 log_channel(cluster) log [ERR]
: Unhandled exception from module 'dashboard' while running on
mgr.cephback2: No module named routes
2019-06-05 11:04:48.800 7fed60097700 -1 dashboard.serve:
2019-06-05 11:04:48.800 7fed60097700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/module.py", line 323, in serve
mapper, parent_urls = generate_routes(self.url_prefix)
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line
336, in generate_routes
mapper = cherrypy.dispatch.RoutesDispatcher()
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py",
line 515, in __init__
import routes
ImportError: No module named routes

An "apt install python-routes" on ceph-mgr nodes resolves the problem.

I'm using Ubuntu 18.04 on the ceph hosts and installed the ceph-mgr nodes using
"ceph-deploy mgr create hostname". Seems like a package dependency is
missing.
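
A quick way to confirm whether the module is present on a mgr host (a hedged
check, matching the Python 2 dashboard shown in the traceback above):

  python2 -c "import routes" && echo "routes is installed"
  apt install python-routes    # only needed if the import above fails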

On Wed, Jun 5, 2019 at 2:10 PM Vandeir Eduardo
 wrote:
>
> I am trying to resolve some kind of inconsistency.
>
> My ceph -s:
>   services:
> mon: 1 daemons, quorum cephback2 (age 22h)
> mgr: cephback2(active, since 28m), standbys: cephback1
> osd: 6 osds: 6 up (since 22h), 6 in (since 24h); 125 remapped pgs
>
> But when I do
>
> ceph mgr module enable dashboard
>
> It starts ceph-mgr listening on port 8443 in cephback1, instead of cephback2
>
> See:
> root@cephback1:/etc/ceph# lsof -i -P -n|grep ceph-mgr|grep LISTEN
> ceph-mgr  6832ceph   27u  IPv6  54536  0t0  TCP *:8443 
> (LISTEN)
>
> root@cephback2:/etc/ceph# lsof -i -P -n|grep ceph-mgr|grep LISTEN
> ceph-mgr  78871ceph   25u  IPv4 939321  0t0  TCP *:6812 
> (LISTEN)
> ceph-mgr  78871ceph   26u  IPv4 939335  0t0  TCP *:6813 
> (LISTEN)
>
> Shouldn't ceph-mgr, listening on port 8443, be started at cephback2,
> the active one?
>
> Output of 'ceph mgr services'
> root@cephback1:/etc/ceph# ceph mgr services
> {
> "dashboard": "https://cephback2.xxx.xx:8443/;
> }
>
> If I try to access https://cephback1.xxx.xx:8443, it redirects the browser to
> https://cephback2.xxx.xx:8443, which obviously doesn't work.
>
> Seems like there is some kind of inconsistency between the active
> ceph-mgr node and where the dashboard is to be started...
>
> On Wed, Jun 5, 2019 at 11:47 AM Marc Roos  wrote:
> >
> >
> > What is wrong with?
> >
> > service ceph-mgr@c stop
> > systemctl disable ceph-mgr@c
> >
> >
> > -Original Message-
> > From: Vandeir Eduardo [mailto:vandeir.edua...@gmail.com]
> > Sent: woensdag 5 juni 2019 16:44
> > To: ceph-users
> > Subject: [ceph-users] How to remove ceph-mgr from a node
> >
> > Hi guys,
> >
> > sorry, but I'm not finding in documentation how to remove ceph-mgr from
> > a node. Is it possible?
> >
> > Thanks.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD hanging on 12.2.12 by message worker

2019-06-06 Thread Stefan Kooman
Quoting Max Vernimmen (vernim...@textkernel.nl):
> 
> This is happening several times per day after we made several changes at
> the same time:
> 
>- add physical ram to the ceph nodes
>- move from fixed 'bluestore cache size hdd|ssd' and 'bluestore cache kv
>max' to 'bluestore cache autotune = 1' and 'osd memory target =
>20401094656'.
>- update ceph from 12.2.8 to 12.2.11
>- update clients from 12.2.8 to 12.2.11
> 
> We have since upgraded the ceph nodes to 12.2.12 but it did not help to fix
> this problem.

Have you tried the new bitmap allocator for the OSDs already (available
since 12.2.12):

[osd]

# MEMORY ALLOCATOR
bluestore_allocator = bitmap
bluefs_allocator = bitmap

The issues you are reporting sound like an issue many of us have seen on
luminous and mimic clusters, which has been identified as being caused by the
"stupid allocator" memory allocator.

Gr. Stefan


-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-06-06 Thread Florian Engelmann

On 5/28/19 at 5:37 PM, Casey Bodley wrote:


On 5/28/19 11:17 AM, Scheurer François wrote:

Hi Casey


I greatly appreciate your quick and helpful answer :-)


It's unlikely that we'll do that, but if we do it would be announced 
with a long deprecation period and migration strategy.

Fine, just the answer we wanted to hear ;-)



However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.

sse-kms is working great, no issues or gaps with it.
We already use it in our OpenStack (Rocky) with Barbican and
ceph/radosgw (Luminous).


But we have customers that want encryption by default, something like 
SSE-S3 (cf. below).

Do you know if there are plans to implement something similar?
I would love to see support for sse-s3. We've talked about building 
something around vault (which I think is what minio does?), but so far 
nobody has taken it up as a project.


What about accepting an empty HTTP header "x-amz-server-side-encryption" or
"x-amz-server-side-encryption: AES256" if

rgw crypt default encryption key =

is enabled? Even if this RadosGW "default encryption key" feature is not
implemented the same way SSE-S3 is, the data is still encrypted with
AES256. This would improve compatibility with the S3 API and client
tools like s3cmd and awscli.
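
For illustration, clients can already request this explicitly today with
standard tool options (a hedged example; "mybucket" is a hypothetical bucket
name):

  aws s3 cp ./file s3://mybucket/file --sse AES256
  s3cmd put --server-side-encryption ./file s3://mybucket/file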





Using dm-crypt would cost too much time for the conversion (72x 8TB 
SATA disks...) .
And dm-crypt is also storing its key on the monitors (cf. 
https://www.spinics.net/lists/ceph-users/msg52402.html).



Best Regards
Francois Scheurer

Amazon SSE-S3 description:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html 

Protecting Data Using Server-Side Encryption with Amazon S3-Managed 
Encryption Keys (SSE-S3)
Server-side encryption protects data at rest. Amazon S3 encrypts each 
object with a unique key. As an additional safeguard, it encrypts the 
key itself with a master key that it rotates regularly. Amazon S3 
server-side encryption uses one of the strongest block ciphers 
available, 256-bit Advanced Encryption Standard (AES-256), to encrypt 
your data.


https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTencryption.html 


The following is an example of the request body for setting SSE-S3:

<ServerSideEncryptionConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
   <Rule>
      <ApplyServerSideEncryptionByDefault>
         <SSEAlgorithm>AES256</SSEAlgorithm>
      </ApplyServerSideEncryptionByDefault>
   </Rule>
</ServerSideEncryptionConfiguration>


From: Casey Bodley 
Sent: Tuesday, May 28, 2019 3:55 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with
those keys would no longer be accessible. It's unlikely that we'll do
that, but if we do it would be announced with a long deprecation period
and migration strategy.


However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.


Casey


[1]
https://ceph.com/community/new-mimic-centralized-configuration-management/ 



[2]
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database 




On 5/28/19 6:39 AM, Scheurer François wrote:

Dear Casey, Dear Ceph Users

The following is written in the radosgw documentation
(http://docs.ceph.com/docs/luminous/radosgw/encryption/):

rgw crypt default encryption key = 4YSmvJtBv0aZ7geVgAsdpRnLBEwWSWlMIGnRS8a9TSA=

   Important: This mode is for diagnostic purposes only! The ceph
   configuration file is not a secure method for storing encryption keys.
   Keys that are accidentally exposed in this way should be considered
   compromised.




Is the warning only about the key exposure risk or does it mean also
that the feature could be removed in future?


There is also another similar parameter "rgw crypt s3 kms encryption
keys" (cf. usage example in
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030679.html). 

 




Both parameters are still interesting (provided the ceph.conf is
encrypted) but we want to be sure that they will not be dropped in 
future.





Best Regards

Francois


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-06 Thread Marc Roos

I am also thinking of moving the wal/db of the sata hdd's to ssd. Did
you do tests before and after this change, and do you know what the
difference in iops is? And is the advantage more or less when your sata
hdd's are slower?
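
For anyone wanting to quantify this, a simple before/after comparison can be
done with single-threaded 4K writes, which exercise the WAL/DB path; a minimal
sketch, assuming a throwaway test pool named 'rbd':

  rados bench -p rbd 30 write -b 4096 -t 1 --no-cleanup
  rados bench -p rbd 30 rand -t 1
  rados -p rbd cleanup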


-Original Message-
From: Stolte, Felix [mailto:f.sto...@fz-juelich.de] 
Sent: donderdag 6 juni 2019 10:47
To: ceph-users
Subject: [ceph-users] Expected IO in luminous Ceph Cluster

Hello folks,

we are running a ceph cluster on Luminous consisting of 21 OSD nodes, each
with 9 8TB SATA drives and 3 Intel 3700 SSDs for Bluestore WAL and DB
(1:3 ratio). OSD nodes have 10Gb for the public and cluster networks. The
cluster has been running stable for over a year. We hadn't taken a closer
look at IO until one of our customers started to complain about a VM we
migrated from VMware with NetApp storage to our OpenStack cloud with Ceph
storage. He sent us a sysbench report from the machine, which I could 
reproduce on other VMs as well as on a mounted RBD on physical hardware:

sysbench --file-fsync-freq=1 --threads=16 fileio --file-total-size=1G 
--file-test-mode=rndrw --file-rw-ratio=2 run sysbench 1.0.11 (using 
system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Initializing random number generator from current time

Extra file open flags: 0
128 files, 8MiB each
1GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 2.00 Periodic FSYNC 
enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test

File operations:
reads/s:  36.36
writes/s: 18.18
fsyncs/s: 2318.59

Throughput:
read, MiB/s:  0.57
written, MiB/s:   0.28

General statistics:
total time:  10.0071s
total number of events:  23755

Latency (ms):
 min:  0.01
 avg:  6.74
 max:   1112.58
 95th percentile: 26.68
 sum: 160022.67

Threads fairness:
events (avg/stddev):   1484.6875/52.59
execution time (avg/stddev):   10.0014/0.00

Are these numbers reasonable for a cluster of our size?

Best regards
Felix
IT-Services
Telefon 02461 61-9243
E-Mail: f.sto...@fz-juelich.de

-

-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), 
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. 
Dr. Sebastian M. Schmidt

-

-
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrading from luminous to nautilus using CentOS storage repos

2019-06-06 Thread Drew Weaver
Hello,

I built a tiny test cluster with Luminous using the CentOS storage repos.

I saw that they now have a nautilus repo as well but I can't find much 
information on upgrading from one to the other.

Does it make sense to continue using the CentOS storage repos or should I just 
switch to the official ceph repos and is there a way to swap between them 
without rebuilding a cluster from scratch?

Thank you,
-Drew
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 5:07 AM Sakirnth Nagarasa
 wrote:
>
> Hello,
>
> Our ceph version is ceph nautilus (14.2.1).
> We create periodically snapshots from an rbd image (50 TB). In order to
> restore some data, we have cloned a snapshot.
> To delete the snapshot we ran: rbd rm ${POOLNAME}/${IMAGE}
>
> But it took very long to delete the image; after half an hour it had only
> 1% progress. We thought it couldn't be right, because the creation of the clone
> was pretty fast.
> So we interrupted (SIGINT) the delete command. After doing some research
> we found out it's the normal deletion behavior.
>
> The problem is that ceph does not recognize the image anymore. Even
> though it is listed in rbd list we can't remove it.
>
> rbd rm ${POOLNAME}/${IMAGE}
> rbd: error opening image ${IMAGE}: (2) No such file or directory
>
> Now, how can we get rid of the image correctly?

Starting in Nautilus, we now first temporarily move an image to the
RBD trash when it's requested to be deleted. Interrupting that
operation should leave it in the trash, but "rbd rm" should have still
worked. Can you run "rbd trash ls --all --long" and see if your image
is listed?
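
If it does show up there, the leftover entry can usually be removed by id
rather than by name (a sketch; <image-id> is the id column from the trash
listing, not the image name):

  rbd trash ls --all --long ${POOLNAME}
  rbd trash rm ${POOLNAME}/<image-id>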

> Thanks
> Sakirnth Nagarasa
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Dietmar Rieder
+1
Operator's view: a 12-month cycle is definitely better than 9. March seems
to be a reasonable compromise.

Best
  Dietmar

On 6/6/19 2:31 AM, Linh Vu wrote:
> I think a 12-month cycle is much better from the cluster operations
> perspective. I also like March as a release month as well. 
> 
> *From:* ceph-users  on behalf of Sage
> Weil 
> *Sent:* Thursday, 6 June 2019 1:57 AM
> *To:* ceph-us...@ceph.com; ceph-de...@vger.kernel.org; d...@ceph.io
> *Subject:* [ceph-users] Changing the release cadence
>  
> Hi everyone,
> 
> Since luminous, we have had the following release cadence and policy:
>  - release every 9 months
>  - maintain backports for the last two releases
>  - enable upgrades to move ahead either 1 or 2 releases
>    (e.g., luminous -> mimic or nautilus; mimic -> nautilus or octopus; ...)
> 
> This has mostly worked out well, except that the mimic release received
> less attention than we wanted due to the fact that multiple downstream
> Ceph products (from Red Hat and SUSE) decided to base their next release
> on nautilus.  Even though upstream every release is an "LTS" release, as a
> practical matter mimic got less attention than luminous or nautilus.
> 
> We've had several requests/proposals to shift to a 12 month cadence. This
> has several advantages:
> 
>  - Stable/conservative clusters only have to be upgraded every 2 years
>    (instead of every 18 months)
>  - Yearly releases are more likely to intersect with downstream
>    distribution release (e.g., Debian).  In the past there have been
>    problems where the Ceph releases included in consecutive releases of a
>    distro weren't easily upgradeable.
>  - Vendors that make downstream Ceph distributions/products tend to
>    release yearly.  Aligning with those vendors means they are more likely
>    to productize *every* Ceph release.  This will help make every Ceph
>    release an "LTS" release (not just in name but also in terms of
>    maintenance attention).
> 
> So far the balance of opinion seems to favor a shift to a 12 month
> cycle[1], especially among developers, so it seems pretty likely we'll
> make that shift.  (If you do have strong concerns about such a move, now
> is the time to raise them.)
> 
> That brings us to an important decision: what time of year should we
> release?  Once we pick the timing, we'll be releasing at that time *every
> year* for each release (barring another schedule shift, which we want to
> avoid), so let's choose carefully!
> 
> A few options:
> 
>  - November: If we release Octopus 9 months from the Nautilus release
>    (planned for Feb, released in Mar) then we'd target this November.  We
>    could shift to a 12-month cadence after that.
>  - February: That's 12 months from the Nautilus target.
>  - March: That's 12 months from when Nautilus was *actually* released.
> 
> November is nice in the sense that we'd wrap things up before the
> holidays.  It's less good in that users may not be inclined to install the
> new release when many developers will be less available in December.
> 
> February kind of sucked in that the scramble to get the last few things
> done happened during the holidays.  OTOH, we should be doing what we can
> to avoid such scrambles, so that might not be something we should factor
> in.  March may be a bit more balanced, with a solid 3 months before when
> people are productive, and 3 months after before they disappear on holiday
> to address any post-release issues.
> 
> People tend to be somewhat less available over the summer months due to
> holidays etc, so an early or late summer release might also be less than
> ideal.
> 
> Thoughts?  If we can narrow it down to a few options maybe we could do a
> poll to gauge user preferences.
> 
> Thanks!
> sage
> 
> 
> [1]
> https://protect-au.mimecast.com/s/N1l6CROAEns1RN1Zu9Jwts?domain=twitter.com
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Daniel Baumann
On 6/6/19 9:26 AM, Xiaoxi Chen wrote:
> I will vote for November for several reasons:

[...]

as an academic institution we're aligned to August through July (the school
year) instead of January through December (the calendar year), so all your
reasons (thanks!) are valid for us, just shifted by 6 months; hence Q1 is
ideal for us.

however, given that academic institutions are the minority, I'm
convinced now that November is the better choice for everyone.

Regards,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Sakirnth Nagarasa
On 6/6/19 3:46 PM, Jason Dillaman wrote:
> Can you run "rbd trash ls --all --long" and see if your image
> is listed?

No, it is not listed.

I did run:
rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}

Cheers,
Sakirnth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Alfredo Rezinovsky
https://ceph.com/geen-categorie/ceph-manually-repair-object/

is a little outdated.

After stopping the OSD and flushing the journal, I don't have any clue how
to move the object (easy in filestore).

I have this in my osd log.

2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log [ERR] :
10.c5 shard 2 soid 10:a39e2c78:::183f81f.0001:head : candidate had
a read error

How can I fix it?

-- 
Alfrenovsky
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa
 wrote:
>
> On 6/6/19 3:46 PM, Jason Dillaman wrote:
> > Can you run "rbd trash ls --all --long" and see if your image
> > is listed?
>
> No, it is not listed.
>
> I did run:
> rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}
>
> Cheers,
> Sakirnth

Is it listed under "rbd ls ${POOLNAME_FROM_IMAGE}"?
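
If it still shows up in "rbd ls" but can't be opened, one way to see what
metadata is left behind is to look at the pool's standard RBD bookkeeping
objects directly (a hedged diagnostic sketch only):

  rados -p ${POOLNAME_FROM_IMAGE} listomapvals rbd_directory   # name <-> id mapping used by 'rbd ls'
  rados -p ${POOLNAME_FROM_IMAGE} ls | grep rbd_header         # per-image header objects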

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-06 Thread jesper
> Hi,
>
> El 5/6/19 a las 16:53, vita...@yourcmc.ru escribió:
>>> Ok, average network latency from VM to OSD's ~0.4ms.
>>
>> It's rather bad, you can improve the latency by 0.3ms just by
>> upgrading the network.
>>
>>> Single threaded performance ~500-600 IOPS - or average latency of 1.6ms
>>> Is that comparable to what other are seeing?
>>
>> Good "reference" numbers are 0.5ms for reads (~2000 iops) and 1ms for
>> writes (~1000 iops).
>>
>> I confirm that the most powerful thing to do is disabling CPU
>> powersave (governor=performance + cpupower -D 0). You usually get 2x
>> single thread iops at once.
>
> We have a small cluster with 4 OSD host, each with 1 SSD INTEL
> SSDSC2KB019T8 (D3-S4510 1.8T), connected with a 10G network (shared with
> VMs, not a busy cluster). Volumes are replica 3:
>
> Network latency from one node to the other 3:
> 10 packets transmitted, 10 received, 0% packet loss, time 9166ms
> rtt min/avg/max/mdev = 0.042/0.064/0.088/0.013 ms
>
> 10 packets transmitted, 10 received, 0% packet loss, time 9190ms
> rtt min/avg/max/mdev = 0.047/0.072/0.110/0.017 ms
>
> 10 packets transmitted, 10 received, 0% packet loss, time 9219ms
> rtt min/avg/max/mdev = 0.061/0.078/0.099/0.011 ms

What NIC / switching components are in play here? I simply cannot get
latencies this far down.

Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-06 Thread Jorge Garcia
The mds has a load of 0.00, and the IO stats basically say "nothing is
going on".


On 6/5/19 5:33 PM, Yan, Zheng wrote:

On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia  wrote:

We have been testing a new installation of ceph (mimic 13.2.2) mostly
using cephfs (for now). The current test is just setting up a filesystem
for backups of our other filesystems. After rsyncing data for a few
days, we started getting this from ceph -s:

health: HEALTH_WARN
  1 MDSs report slow metadata IOs
  1 MDSs behind on trimming

I have been googling for solutions and reading the docs and the
ceph-users list, but I haven't found a way to get rid of these messages
and get back to HEALTH_OK. Some of the things I have tried (from
suggestions around the internet):

- Increasing the amount of RAM on the MDS server (Currently 192 GB)
- Increasing mds_log_max_segments (Currently 256)
- Increasing mds_cache_memory_limit

The message still reports a HEALTH_WARN. Currently, the filesystem is
idle, no I/O happening. Not sure what to try next. Any suggestions?


maybe mds is trimming its log. please check if mds' cpu usage and
whole cluster's IO stats.


Thanks in advance!

Jorge

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Tarek Zegar

Look here
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent


A read error is typically a disk issue. The doc is not clear on how to
resolve that.
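
For a candidate read error on a replicated pool, the usual sequence is roughly
this (a sketch, reusing the PG id from the log quoted below; repair rewrites
the bad copy from an authoritative replica):

  rados list-inconsistent-obj 10.c5 --format=json-pretty
  ceph pg repair 10.c5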






From:   Alfredo Rezinovsky 
To: Ceph Users 
Date:   06/06/2019 10:58 AM
Subject:[EXTERNAL] [ceph-users] Fix scrub error in bluestore.
Sent by:"ceph-users" 



https://ceph.com/geen-categorie/ceph-manually-repair-object/

is a little outdated.

After stopping the OSD and flushing the journal, I don't have any clue how
to move the object (easy in filestore).

I have this in my osd log.

2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log [ERR] :
10.c5 shard 2 soid 10:a39e2c78:::183f81f.0001:head : candidate had
a read error

How can I fix it?

--
Alfrenovsky___
ceph-users mailing list
ceph-users@lists.ceph.com
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.ceph.com_listinfo.cgi_ceph-2Dusers-2Dceph.com=DwICAg=jf_iaSHvJObTbx-siA1ZOg=3V1n-r1W__Mu-wEAwzq7jDpopOSMrfRfomn1f5bgT28=352TJwgu0vnFCTdMhAtPjFy3LjdYBfTkgOCdE2HTktQ=M9UCn5VB0zy165xxF7Ip1o4HxjQZMz6QvEXcDYwZIaI=



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] dashboard returns 401 on successful auth

2019-06-06 Thread Drew Weaver
Hello,

I was able to get Nautilus running on my cluster.

When I try to login to dashboard with the user I created if I enter the correct 
credentials in the log I see:

2019-06-06 12:51:43.738 7f373ec9b700  1 mgr[dashboard] 
[:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] 
/api/settings/alertmanager-api-host
2019-06-06 12:51:43.741 7f373ec9b700  1 mgr[dashboard] 
[:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] /api/health/minimal
2019-06-06 12:51:43.745 7f373ec9b700  1 mgr[dashboard] 
[:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] /api/health/minimal
2019-06-06 12:51:43.755 7f373dc99700  1 mgr[dashboard] 
[:::192.168.105.1:56111] [GET] [401] [0.001s] [271B] /api/feature_toggles

And in the browser nothing happens.

Any ideas?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Oliver Freyermuth

Hi Alfredo,

you may want to check the SMART data for the disk.
I also had such a case recently (see 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/035117.html for 
the thread),
and the disk had one unreadable sector which was pending reallocation.

Triggering "ceph pg repair" for the problematic placement group made the OSD 
rewrite the problematic sector and allowed the disk to reallocate this unreadable sector.
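
For the SMART check, something along these lines is usually enough (a sketch,
with /dev/sdX standing in for the OSD's data disk):

  smartctl -a /dev/sdX | egrep -i 'pending|reallocated|uncorrect'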

Cheers,
Oliver

Am 06.06.19 um 18:45 schrieb Tarek Zegar:

Look here
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent

A read error is typically a disk issue. The doc is not clear on how to resolve
that.





From: Alfredo Rezinovsky 
To: Ceph Users 
Date: 06/06/2019 10:58 AM
Subject: [EXTERNAL] [ceph-users] Fix scrub error in bluestore.
Sent by: "ceph-users" 

--



https://ceph.com/geen-categorie/ceph-manually-repair-object/

is a little outdated.

After stopping the OSD and flushing the journal, I don't have any clue how to
move the object (easy in filestore).

I have this in my osd log.

2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log [ERR] : 10.c5 
shard 2 soid 10:a39e2c78:::183f81f.0001:head : candidate had a read 
error

How can I fix it?

--
Alfrenovsky___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] dashboard returns 401 on successful auth

2019-06-06 Thread Nathan Fish
I have filed this bug:
https://tracker.ceph.com/issues/40051

On Thu, Jun 6, 2019 at 12:52 PM Drew Weaver  wrote:
>
> Hello,
>
>
>
> I was able to get Nautilus running on my cluster.
>
>
>
> When I try to login to dashboard with the user I created if I enter the 
> correct credentials in the log I see:
>
>
>
> 2019-06-06 12:51:43.738 7f373ec9b700  1 mgr[dashboard] 
> [:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] 
> /api/settings/alertmanager-api-host
>
> 2019-06-06 12:51:43.741 7f373ec9b700  1 mgr[dashboard] 
> [:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] /api/health/minimal
>
> 2019-06-06 12:51:43.745 7f373ec9b700  1 mgr[dashboard] 
> [:::192.168.105.1:56110] [GET] [401] [0.002s] [271B] /api/health/minimal
>
> 2019-06-06 12:51:43.755 7f373dc99700  1 mgr[dashboard] 
> [:::192.168.105.1:56111] [GET] [401] [0.001s] [271B] /api/feature_toggles
>
>
>
> And in the browser nothing happens.
>
>
>
> Any ideas?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] typical snapmapper size

2019-06-06 Thread Sage Weil
Hello RBD users,

Would you mind running this command on a random OSD on your RBD-oriented 
cluster?

ceph-objectstore-tool \
 --data-path /var/lib/ceph/osd/ceph-NNN \
 
'["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]'
 \
 list-omap | wc -l

...and share the number of lines along with the overall size and 
utilization % of the OSD?  The OSD needs to be stopped, then run that 
command, then start it up again.

I'm trying to gauge how much snapmapper metadata there is in a "typical" 
RBD environment.  If you have some sense of whether your users make 
relatively heavy or light use of snapshots, that would be helpful too!
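
(A minimal wrapper for the stop/run/start dance, as a sketch assuming
systemd-managed OSDs and a hypothetical OSD id of 123:)

  ceph osd set noout                   # avoid rebalancing while the OSD is down
  systemctl stop ceph-osd@123
  # ...run the ceph-objectstore-tool command above with --data-path /var/lib/ceph/osd/ceph-123...
  systemctl start ceph-osd@123
  ceph osd unset noout
  ceph osd df | awk '$1 == 123'        # size and %USE for that OSD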

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] obj_size_info_mismatch error handling

2019-06-06 Thread Reed Dier
Sadly I never discovered anything more.

It ended up clearing up on its own, which was disconcerting, but I resigned
myself to not making things worse in an attempt to make them better.

I assume someone touched the file in CephFS, which triggered the metadata to be 
updated, and everyone was able to reach consensus.

Wish I had more for you.

Reed

> On Jun 3, 2019, at 7:43 AM, Dan van der Ster  wrote:
> 
> Hi Reed and Brad,
> 
> Did you ever learn more about this problem?
> We currently have a few inconsistencies arriving with the same env
> (cephfs, v13.2.5) and symptoms.
> 
> PG Repair doesn't fix the inconsistency, nor does Brad's omap
> workaround earlier in the thread.
> In our case, we can fix by cp'ing the file to a new inode, deleting
> the inconsistent file, then scrubbing the PG.
> 
> -- Dan
> 
> 
> On Fri, May 3, 2019 at 3:18 PM Reed Dier  wrote:
>> 
>> Just to follow up for the sake of the mailing list,
>> 
>> I had not had a chance to attempt your steps yet, but things appear to have 
>> worked themselves out on their own.
>> 
>> Both scrub errors cleared without intervention, and I'm not sure if it is 
>> the results of that object getting touched in CephFS that triggered the 
>> update of the size info, or if something else was able to clear it.
>> 
>> Didn't see anything relating to the clearing in mon, mgr, or osd logs.
>> 
>> So, not entirely sure what fixed it, but it is resolved on its own.
>> 
>> Thanks,
>> 
>> Reed
>> 
>> On Apr 30, 2019, at 8:01 PM, Brad Hubbard  wrote:
>> 
>> On Wed, May 1, 2019 at 10:54 AM Brad Hubbard  wrote:
>> 
>> 
>> Which size is correct?
>> 
>> 
>> Sorry, accidental discharge =D
>> 
>> If the object info size is *incorrect* try forcing a write to the OI
>> with something like the following.
>> 
>> 1. rados -p [name_of_pool_17] setomapval 10008536718.
>> temporary-key anything
>> 2. ceph pg deep-scrub 17.2b9
>> 3. Wait for the scrub to finish
>> 4. rados -p [name_of_pool_2] rmomapkey 10008536718. temporary-key
>> 
>> If the object info size is *correct* you could try just doing a rados
>> get followed by a rados put of the object to see if the size is
>> updated correctly.
>> 
>> It's more likely the object info size is wrong IMHO.
>> 
>> 
>> On Tue, Apr 30, 2019 at 1:06 AM Reed Dier  wrote:
>> 
>> 
>> Hi list,
>> 
>> Woke up this morning to two PG's reporting scrub errors, in a way that I 
>> haven't seen before.
>> 
>> $ ceph versions
>> {
>>   "mon": {
>>   "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic 
>> (stable)": 3
>>   },
>>   "mgr": {
>>   "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic 
>> (stable)": 3
>>   },
>>   "osd": {
>>   "ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
>> (stable)": 156
>>   },
>>   "mds": {
>>   "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic 
>> (stable)": 2
>>   },
>>   "overall": {
>>   "ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
>> (stable)": 156,
>>   "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic 
>> (stable)": 8
>>   }
>> }
>> 
>> 
>> OSD_SCRUB_ERRORS 8 scrub errors
>> PG_DAMAGED Possible data damage: 2 pgs inconsistent
>>   pg 17.72 is active+clean+inconsistent, acting [3,7,153]
>>   pg 17.2b9 is active+clean+inconsistent, acting [19,7,16]
>> 
>> 
>> Here is what $rados list-inconsistent-obj 17.2b9 --format=json-pretty yields:
>> 
>> {
>>   "epoch": 134582,
>>   "inconsistents": [
>>   {
>>   "object": {
>>   "name": "10008536718.",
>>   "nspace": "",
>>   "locator": "",
>>   "snap": "head",
>>   "version": 0
>>   },
>>   "errors": [],
>>   "union_shard_errors": [
>>   "obj_size_info_mismatch"
>>   ],
>>   "shards": [
>>   {
>>   "osd": 7,
>>   "primary": false,
>>   "errors": [
>>   "obj_size_info_mismatch"
>>   ],
>>   "size": 5883,
>>   "object_info": {
>>   "oid": {
>>   "oid": "10008536718.",
>>   "key": "",
>>   "snapid": -2,
>>   "hash": 1752643257,
>>   "max": 0,
>>   "pool": 17,
>>   "namespace": ""
>>   },
>>   "version": "134599'448331",
>>   "prior_version": "134599'448330",
>>   "last_reqid": "client.1580931080.0:671854",
>>   "user_version": 448331,
>>   "size": 3505,
>>   "mtime": "2019-04-28 15:32:20.003519",
>>   "local_mtime": "2019-04-28 15:32:25.991015",
>>   "lost": 0,
>>   

[ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-06 Thread Tarek Zegar

For testing purposes I set a bunch of OSDs to 0 weight, which correctly
forces Ceph not to use those OSDs. I took enough out such that the UP set
only had the pool min size number of OSDs (i.e. 2 OSDs).

Two questions:
1. Why doesn't the acting set eventually match the UP set and simply point
to [6,5] only?
2. Why are none of the PGs marked as undersized and degraded? The data is
only hosted on 2 OSDs rather than pool size (3); I would expect an undersized
warning and a degraded state for PGs with data.

Example PG:
PG 1.4d active+clean+remapped  UP= [6,5] Acting = [6,5,4]

OSD Tree:
ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
-1   0.08817 root default
-3   0.02939 host hostosd1
 0   hdd 0.00980 osd.0 up  1.0 1.0
 3   hdd 0.00980 osd.3 up  1.0 1.0
 6   hdd 0.00980 osd.6 up  1.0 1.0
-5   0.02939 host hostosd2
 1   hdd 0.00980 osd.1 up0 1.0
 4   hdd 0.00980 osd.4 up0 1.0
 7   hdd 0.00980 osd.7 up0 1.0
-7   0.02939 host hostosd3
 2   hdd 0.00980 osd.2 up  1.0 1.0
 5   hdd 0.00980 osd.5 up  1.0 1.0
 8   hdd 0.00980 osd.8 up0 1.0
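
For reference, the full peering state behind the active+clean+remapped status
(including the up vs. acting sets and the recovery/backfill state) can be
inspected with, e.g.:

  ceph pg 1.4d query | less    # see the "up", "acting" and "recovery_state" sections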



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-06 Thread Jorge Garcia
Ok, I finally got the cluster back to HEALTH_OK. Since rebooting the
whole cluster didn't fix the problem, I did:


  ceph osd set noscrub
  ceph osd set nodeep-scrub

That made the "slow metadata IOs" and "behind on trimming" warnings go 
away, replaced by "noscrub, nodeep-scrub flag(s) set". When all the pgs 
were active+clean, I did:


  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

An now the cluster is back to HEALTH_OK.

Now to figure out what is causing the problem in the first place...

Jorge

On 6/5/19 5:33 PM, Yan, Zheng wrote:

On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia  wrote:

We have been testing a new installation of ceph (mimic 13.2.2) mostly
using cephfs (for now). The current test is just setting up a filesystem
for backups of our other filesystems. After rsyncing data for a few
days, we started getting this from ceph -s:

health: HEALTH_WARN
  1 MDSs report slow metadata IOs
  1 MDSs behind on trimming

I have been googling for solutions and reading the docs and the
ceph-users list, but I haven't found a way to get rid of these messages
and get back to HEALTH_OK. Some of the things I have tried (from
suggestions around the internet):

- Increasing the amount of RAM on the MDS server (Currently 192 GB)
- Increasing mds_log_max_segments (Currently 256)
- Increasing mds_cache_memory_limit

The message still reports a HEALTH_WARN. Currently, the filesystem is
idle, no I/O happening. Not sure what to try next. Any suggestions?


maybe mds is trimming its log. please check if mds' cpu usage and
whole cluster's IO stats.


Thanks in advance!

Jorge

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] typical snapmapper size

2019-06-06 Thread Shawn Iverson
17838

ID CLASS WEIGHT   REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
24   hdd  1.0  1.0  419GiB  185GiB  234GiB 44.06 1.46  85

light snapshot use


On Thu, Jun 6, 2019 at 2:00 PM Sage Weil  wrote:

> Hello RBD users,
>
> Would you mind running this command on a random OSD on your RBD-oriented
> cluster?
>
> ceph-objectstore-tool \
>  --data-path /var/lib/ceph/osd/ceph-NNN \
>  
> '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]'
> \
>  list-omap | wc -l
>
> ...and share the number of lines along with the overall size and
> utilization % of the OSD?  The OSD needs to be stopped, then run that
> command, then start it up again.
>
> I'm trying to guage how much snapmapper metadata there is in a "typical"
> RBD environment.  If you have some sense of whether your users make
> relatively heavy or light use of snapshots, that would be helpful too!
>
> Thanks!
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com