Re: [ceph-users] problem w libvirt version 4.5 and 12.2.7

2019-01-03 Thread Tomasz Płaza

Konstantin,

Thanks for the reply. I've managed to unravel it partially. Somehow (I did not 
look into the srpm) starting from this version libvirt calculates the 
real allocation if the fast-diff feature is present on an image. Running "rbd 
object-map rebuild" on every image helped (I do not know why it was needed 
- it is a new cluster running ceph version 12.2.7).
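
For reference, I rebuilt the object maps with a simple loop along these lines 
(the pool name is just a placeholder):

    for img in $(rbd ls <pool>); do rbd object-map rebuild <pool>/"$img"; done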


Now the only problem is a 25T image on which "virsh vol-info" takes 13s 
(rbd du takes 1s), down from a few minutes before, so the questions remain:


- why did it happen,

- how can this be monitored/foreseen,

- how can virsh vol-info be improved if rbd du takes less time to execute?
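
For reference, this is roughly how I compare the two (pool and image names are 
placeholders; the libvirt pool is "ceph" as above):

    time rbd du <pool>/<image>
    time virsh vol-info --pool ceph <image>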


On 03.01.2019 at 13:51, Konstantin Shalygin wrote:



After updating to CentOS 7.6, libvirt was updated from 3.9 to 4.5.
Executing "virsh vol-list ceph --details" makes libvirtd use 300% CPU
for 2 minutes to show the volumes on rbd. A quick peek at tcpdump shows it
accessing rbd_data.*, which the previous version of libvirtd did not need.
Ceph version is 12.2.7.

Any help will be appreciated.
There is nothing special in libvirt 4.5; I upgraded my hypervisors to
this version and they still work flawlessly.




k



Re: [ceph-users] upgrade from jewel 10.2.10 to 10.2.11 broke anonymous swift

2019-01-03 Thread Johan Guldmyr


> 
> Does anybody have a suggestion of what I could try to troubleshoot this?

Upgrading to Luminous also "solves the issue". I'll look into that :)

// Johan


Re: [ceph-users] Help Ceph Cluster Down

2019-01-03 Thread Arun POONIA
Hi Chris,

Indeed, that's what happened. I didn't set the noout flag either, and I
zapped the disks on the new server every time. In my cluster status, fre201 is the
only new server.

Current status after enabling the 3 OSDs on host fre201:

[root@fre201 ~]# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       70.92137 root default
 -2        5.45549     host fre101
  0   hdd  1.81850         osd.0       up  1.0      1.0
  1   hdd  1.81850         osd.1       up  1.0      1.0
  2   hdd  1.81850         osd.2       up  1.0      1.0
 -9        5.45549     host fre103
  3   hdd  1.81850         osd.3       up  1.0      1.0
  4   hdd  1.81850         osd.4       up  1.0      1.0
  5   hdd  1.81850         osd.5       up  1.0      1.0
 -3        5.45549     host fre105
  6   hdd  1.81850         osd.6       up  1.0      1.0
  7   hdd  1.81850         osd.7       up  1.0      1.0
  8   hdd  1.81850         osd.8       up  1.0      1.0
 -4        5.45549     host fre107
  9   hdd  1.81850         osd.9       up  1.0      1.0
 10   hdd  1.81850         osd.10      up  1.0      1.0
 11   hdd  1.81850         osd.11      up  1.0      1.0
 -5        5.45549     host fre109
 12   hdd  1.81850         osd.12      up  1.0      1.0
 13   hdd  1.81850         osd.13      up  1.0      1.0
 14   hdd  1.81850         osd.14      up  1.0      1.0
 -6        5.45549     host fre111
 15   hdd  1.81850         osd.15      up  1.0      1.0
 16   hdd  1.81850         osd.16      up  1.0      1.0
 17   hdd  1.81850         osd.17      up  0.7      1.0
 -7        5.45549     host fre113
 18   hdd  1.81850         osd.18      up  1.0      1.0
 19   hdd  1.81850         osd.19      up  1.0      1.0
 20   hdd  1.81850         osd.20      up  1.0      1.0
 -8        5.45549     host fre115
 21   hdd  1.81850         osd.21      up  1.0      1.0
 22   hdd  1.81850         osd.22      up  1.0      1.0
 23   hdd  1.81850         osd.23      up  1.0      1.0
-10        5.45549     host fre117
 24   hdd  1.81850         osd.24      up  1.0      1.0
 25   hdd  1.81850         osd.25      up  1.0      1.0
 26   hdd  1.81850         osd.26      up  1.0      1.0
-11        5.45549     host fre119
 27   hdd  1.81850         osd.27      up  1.0      1.0
 28   hdd  1.81850         osd.28      up  1.0      1.0
 29   hdd  1.81850         osd.29      up  1.0      1.0
-12        5.45549     host fre121
 30   hdd  1.81850         osd.30      up  1.0      1.0
 31   hdd  1.81850         osd.31      up  1.0      1.0
 32   hdd  1.81850         osd.32      up  1.0      1.0
-13        5.45549     host fre123
 33   hdd  1.81850         osd.33      up  1.0      1.0
 34   hdd  1.81850         osd.34      up  1.0      1.0
 35   hdd  1.81850         osd.35      up  1.0      1.0
-27        5.45549     host fre201
 36   hdd  1.81850         osd.36      up  1.0      1.0
 37   hdd  1.81850         osd.37      up  1.0      1.0
 38   hdd  1.81850         osd.38      up  1.0      1.0
[root@fre201 ~]#
[root@fre201 ~]# ceph -s
  cluster:
id: adb9ad8e-f458-4124-bf58-7963a8d1391f
health: HEALTH_ERR
3 pools have many more objects per pg than average
585791/12391450 objects misplaced (4.727%)
2 scrub errors
2374 PGs pending on creation
Reduced data availability: 6578 pgs inactive, 2025 pgs down, 74
pgs peering, 1234 pgs stale
Possible data damage: 2 pgs inconsistent
Degraded data redundancy: 64969/12391450 objects degraded
(0.524%), 616 pgs degraded, 20 pgs undersized
96242 slow requests are blocked > 32 sec
228 stuck requests are blocked > 4096 sec
too many PGs per OSD (2768 > max 200)

  services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
osd: 39 osds: 39 up, 39 in; 96 remapped pgs
rgw: 1 daemon active

  data:
pools:   18 pools, 54656 pgs
objects: 6050k objects, 10942 GB
usage:   21900 GB used, 50721 GB / 72622 GB avail
pgs: 0.002% pgs unknown
 12.050% pgs not active
 64969/12391450 objects degraded (0.524%)
 585791/12391450 objects misplaced (4.727%)
 47489 active+clean
 3670  activating
 1098  stale+down
 923   down
 575   activating+degraded
 563   stale+active+clean
 105   stale+activating
 78activating+remapped
 72peering
 25stale+activating+degraded
 23stale+activating+remapped
 9 stale+active+undersized
 6 stale+activating+undersized+degraded+remapped
 5 stale+active+undersized+degraded
 4 

[ceph-users] cephfs : rsync backup create cache pressure on clients, filling caps

2019-01-03 Thread Alexandre DERUMIER
Hi,

I'm currently backing up cephfs through a dedicated client mounting the 
whole filesystem at the root.
Other clients mount parts of the filesystem (kernel cephfs clients).


I have around 22 million inodes.

Before the backup, I have around 5M caps loaded by clients:

#ceph daemonperf mds.x.x

---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs|ino  dn  |hcr  hcs  hsr |sess|actv rd   wr   rdwr|purg|
11800   22M 5.3M   00 |  600 |  2  120k 130 | 22M  22M|118    00 |167 |  0200 |  0



When the backup is running, reading all the files, the caps increase to the max 
(and even a little bit more).

# ceph daemonperf mds.x.x
---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs|ino  dn  |hcr  hcs  hsr |sess|actv rd   wr   rdwr|purg|
15500   20M  22M   00 |  600 |  2  120k 129 | 20M  20M|155    00 |167 |  0000 |  0

Then the MDS tries to recall caps from the other clients, and I'm getting messages like:
2019-01-04 01:13:11.173768 cluster [WRN] Health check failed: 1 clients failing 
to respond to cache pressure (MDS_CLIENT_RECALL)
2019-01-04 02:00:00.73 cluster [WRN] overall HEALTH_WARN 1 clients failing 
to respond to cache pressure
2019-01-04 03:00:00.69 cluster [WRN] overall HEALTH_WARN 1 clients failing 
to respond to cache pressure



Doing a simple
echo 2 | tee /proc/sys/vm/drop_caches
on the backup server frees the caps again:

# ceph daemonperf x
---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req  rlat fwd  inos caps exi  imi |stry recy recd|subm evts segs|ino  dn  |hcr  hcs  hsr |sess|actv rd   wr   rdwr|purg|
11600   22M 4.8M   00 |  400 |  1  117k 131 | 22M  22M|116    10 |167 |  0200 |  0
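
For now I could simply chain the cache drop after the rsync run, something 
like the following (paths are placeholders), but that feels more like a 
workaround than a fix:

    rsync -a /mnt/cephfs/ /backup/cephfs/ && echo 2 > /proc/sys/vm/drop_caches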




Some questions here :

ceph side
-
Is it possible to set up some kind of priority between clients, to force 
caps to be retrieved from a specific client?
Is it possible to limit the number of caps for a client?


client side 
---
I have tried to use vm.vfs_cache_pressure=4 to reclaim inode entries 
faster, but the server has 128GB of RAM.
Is it possible to limit the number of inodes in the cache on Linux?
Is it possible to tune something on the ceph mount point?


Regards,

Alexandre


Re: [ceph-users] Help Ceph Cluster Down

2019-01-03 Thread Chris
If you added OSDs and then deleted them repeatedly without waiting for 
replication to finish as the cluster attempted to re-balance across them, 
it's highly likely that you are permanently missing PGs (especially if the 
disks were zapped each time).


If those 3 down OSDs can be revived there is a (small) chance that you can 
right the ship, but 1400pg/OSD is pretty extreme.  I'm surprised the 
cluster even let you do that - this sounds like a data loss event.



Bring back the 3 OSDs and see what those 2 inconsistent PGs look like with 
ceph pg query.
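
Something along these lines should show them (the PG id is a placeholder; take 
the real ones from ceph health detail):

    ceph health detail | grep inconsistent
    ceph pg <pgid> query
    rados list-inconsistent-obj <pgid> --format=json-pretty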


On January 3, 2019 21:59:38 Arun POONIA  wrote:

Hi,

Recently I tried adding a new node (OSD) to the ceph cluster using the ceph-deploy 
tool. Since I was experimenting with the tool, I ended up deleting the OSDs 
on the new server a couple of times.


Now that the ceph OSDs are running on the new server, cluster PGs seem to be 
inactive (10-15%) and they are not recovering or rebalancing. Not sure what 
to do. I tried shutting down the OSDs on the new server.


Status:
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
bind the UNIX domain socket to 
'/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) 
No such file or directory

 cluster:
   id: adb9ad8e-f458-4124-bf58-7963a8d1391f
   health: HEALTH_ERR
   3 pools have many more objects per pg than average
   373907/12391198 objects misplaced (3.018%)
   2 scrub errors
   9677 PGs pending on creation
   Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 
   2717 pgs stale

   Possible data damage: 2 pgs inconsistent
   Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 
   pgs degraded, 1297 pgs undersized

   52486 slow requests are blocked > 32 sec
   9287 stuck requests are blocked > 4096 sec
   too many PGs per OSD (2968 > max 200)

 services:
   mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
   mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
   osd: 39 osds: 36 up, 36 in; 51 remapped pgs
   rgw: 1 daemon active

 data:
   pools:   18 pools, 54656 pgs
   objects: 6050k objects, 10941 GB
   usage:   21727 GB used, 45308 GB / 67035 GB avail
   pgs: 13.073% pgs not active
178350/12391198 objects degraded (1.439%)
373907/12391198 objects misplaced (3.018%)
46177 active+clean
5054  down
1173  stale+down
1084  stale+active+undersized
547   activating
201   stale+active+undersized+degraded
158   stale+activating
96activating+degraded
46stale+active+clean
42activating+remapped
34stale+activating+degraded
23stale+activating+remapped
6 stale+activating+undersized+degraded+remapped
6 activating+undersized+degraded+remapped
2 activating+degraded+remapped
2 active+clean+inconsistent
1 stale+activating+degraded+remapped
1 stale+active+clean+remapped
1 stale+remapped
1 down+remapped
1 remapped+peering

 io:
   client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia



[ceph-users] Help Ceph Cluster Down

2019-01-03 Thread Arun POONIA
Hi,

Recently I tried adding a new node (OSD) to the ceph cluster using the ceph-deploy
tool. Since I was experimenting with the tool, I ended up deleting the OSDs
on the new server a couple of times.

Now that the ceph OSDs are running on the new server, cluster PGs seem to be
inactive (10-15%) and they are not recovering or rebalancing. Not sure what
to do. I tried shutting down the OSDs on the new server.

Status:
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
bind the UNIX domain socket to
'/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
No such file or directory
  cluster:
id: adb9ad8e-f458-4124-bf58-7963a8d1391f
health: HEALTH_ERR
3 pools have many more objects per pg than average
373907/12391198 objects misplaced (3.018%)
2 scrub errors
9677 PGs pending on creation
Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1
pg peering, 2717 pgs stale
Possible data damage: 2 pgs inconsistent
Degraded data redundancy: 178350/12391198 objects degraded
(1.439%), 346 pgs degraded, 1297 pgs undersized
52486 slow requests are blocked > 32 sec
9287 stuck requests are blocked > 4096 sec
too many PGs per OSD (2968 > max 200)

  services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
osd: 39 osds: 36 up, 36 in; 51 remapped pgs
rgw: 1 daemon active

  data:
pools:   18 pools, 54656 pgs
objects: 6050k objects, 10941 GB
usage:   21727 GB used, 45308 GB / 67035 GB avail
pgs: 13.073% pgs not active
 178350/12391198 objects degraded (1.439%)
 373907/12391198 objects misplaced (3.018%)
 46177 active+clean
 5054  down
 1173  stale+down
 1084  stale+active+undersized
 547   activating
 201   stale+active+undersized+degraded
 158   stale+activating
 96activating+degraded
 46stale+active+clean
 42activating+remapped
 34stale+activating+degraded
 23stale+activating+remapped
 6 stale+activating+undersized+degraded+remapped
 6 activating+undersized+degraded+remapped
 2 activating+degraded+remapped
 2 active+clean+inconsistent
 1 stale+activating+degraded+remapped
 1 stale+active+clean+remapped
 1 stale+remapped
 1 down+remapped
 1 remapped+peering

  io:
client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
-- 
Arun Poonia


Re: [ceph-users] Compacting omap data

2019-01-03 Thread Brad Hubbard
Nautilus will make this easier.

https://github.com/ceph/ceph/pull/18096

On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell  wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous 
> (12.2.8), we had a problem where OSDs would frequently get restarted while 
> deep-scrubbing.
>
> After digging into it I found that a number of the OSDs had very large omap 
> directories (50GiB+).  I believe these were OSDs that had previous held PGs 
> that were part of the .rgw.buckets.index pool which I have recently moved to 
> all SSDs, however, it seems like the data remained on the HDDs.
>
> I was able to reduce the data usage on most of the OSDs (from ~50 GiB to < 
> 200 MiB!) by compacting the omap dbs offline by setting 
> 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf, but that 
> didn't work on the newer OSDs which use rocksdb.  On those I had to do an 
> online compaction using a command like:
>
> $ ceph tell osd.510 compact
>
> That worked, but today when I tried doing that on some of the SSD-based OSDs 
> which are backing .rgw.buckets.index I started getting slow requests and the 
> compaction ultimately failed with this error:
>
> $ ceph tell osd.1720 compact
> osd.1720: Error ENXIO: osd down
>
> When I tried it again it succeeded:
>
> $ ceph tell osd.1720 compact
> osd.1720: compacted omap in 420.999 seconds
>
> The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB which was nice, 
> but I don't believe that'll get any smaller until I start splitting the PGs 
> in the .rgw.buckets.index pool to better distribute that pool across the 
> SSD-based OSDs.
>
> The first question I have is what is the option to do an offline compaction 
> of rocksdb so I don't impact our customers while compacting the rest of the 
> SSD-based OSDs?
>
> The next question is if there's a way to configure Ceph to automatically 
> compact the omap dbs in the background in a way that doesn't affect user 
> experience?
>
> Finally, I was able to figure out that the omap directories were getting 
> large because we're using filestore on this cluster, but how could someone 
> determine this when using BlueStore?
>
> Thanks,
> Bryan
>



-- 
Cheers,
Brad


Re: [ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread Yan, Zheng
On Fri, Jan 4, 2019 at 1:53 AM David C  wrote:
>
> Hi All
>
> Luminous 12.2.12
> Single MDS
> Replicated pools
>
> A 'df' on a CephFS kernel client used to show me the usable space (i.e the 
> raw space with the replication overhead applied). This was when I just had a 
> single cephfs data pool.
>
> After adding a second pool to the mds and using file layouts to map a 
> directory to that pool, a df is now showing the raw space. It's not the end 
> of the world but was handy to see the usable space.
>
> I'm fairly sure the change was me adding the second pool although I'm not 99% 
> sure.
>
> I'm seeing this behavior on the latest Centos 7.6 kernel and a 4.14 kernel, 
> is this expected?
>

Yes, it's expected. See this commit:

commit 06d74376c8af32f5b8d777a943aa4dc99165088b
Author: Douglas Fuller 
Date:   Wed Aug 16 10:19:27 2017 -0400

ceph: more accurate statfs

Improve accuracy of statfs reporting for Ceph filesystems comprising
exactly one data pool. In this case, the Ceph monitor can now report
the space usage for the single data pool instead of the global data
for the entire Ceph cluster. Include support for this message in
mon_client and leverage it in ceph/super.


> Thanks,
> David


Re: [ceph-users] Omap issues - metadata creating too many

2019-01-03 Thread J. Eric Ivancich
If you can wait a few weeks until the next release of luminous there
will be tooling to do this safely. Abhishek Lekshmanan of SUSE
contributed the PR. It adds some sub-commands to radosgw-admin:

radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm

If you do it manually you should proceed with extreme caution as you
could do some damage that you might not be able to recover from.

Eric

On 1/3/19 11:31 AM, Bryan Stillwell wrote:
> Josef,
> 
> I've noticed that when dynamic resharding is on it'll reshard some of
> our bucket indices daily (sometimes more).  This causes a lot of wasted
> space in the .rgw.buckets.index pool which might be what you are seeing.
> 
> You can get a listing of all the bucket instances in your cluster with
> this command:
> 
> radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort
> 
> Give that a try and see if you see the same problem.  It seems that once
> you remove the old bucket instances the omap dbs don't reduce in size
> until you compact them.
> 
> Bryan
> 
> From: Josef Zelenka 
> Date: Thursday, January 3, 2019 at 3:49 AM
> To: "J. Eric Ivancich" 
> Cc: "ceph-users@lists.ceph.com" , Bryan Stillwell 
> Subject: Re: [ceph-users] Omap issues - metadata creating too many
> 
> Hi, I had the default - so it was on (according to the ceph kb). I turned it
> off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had
> the same issue (he reported it yesterday) - I tried his tips about
> compacting, but it doesn't do anything; however, I have to add to his
> last point that this happens even with bluestore. Is there anything we can
> do to clean up the omap manually?
> 
> Josef
> 
> On 18/12/2018 23:19, J. Eric Ivancich wrote:
> On 12/17/18 9:18 AM, Josef Zelenka wrote:
> Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on
> Ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three
> nodes have an additional SSD I added to have more space to rebalance the
> metadata). Currently, the cluster is used mainly as radosgw storage,
> with 28TB of data in total, replication 2x for both the metadata and data
> pools (a cephfs instance is running alongside there, but I don't think
> it's the perpetrator - this likely happened before we had it). All
> pools aside from the data pool of the cephfs and the data pool of the
> radosgw are located on the SSDs. Now, the interesting thing - at random
> times, the metadata OSDs fill up their entire capacity with OMAP data
> and go to r/o mode and we have no other option currently than deleting
> and re-creating them. The fillup comes at a random time; it doesn't seem
> to be triggered by anything and it isn't caused by some data influx. It
> seems like some kind of a bug to me to be honest, but I'm not certain -
> has anyone else seen this behavior with their radosgw? Thanks a lot
> 
> Hi Josef,
> 
> Do you have rgw_dynamic_resharding turned on? Try turning it off and see
> if the behavior continues.
> 
> One theory is that dynamic resharding is triggered and possibly not
> completing. This could add a lot of data to omap for the incomplete
> bucket index shards. After a delay it tries resharding again, possibly
> failing again, and adding more data to the omap. This continues.
> 
> If this is the ultimate issue we have some commits on the upstream
> luminous branch that are designed to address this set of issues.
> 
> But we should first see if this is the cause.
> 
> Eric



Re: [ceph-users] Mimic 13.2.3?

2019-01-03 Thread Alex Litvak
It is true for all distros.  It isn't the first time this has happened either. I 
think it is a bit dangerous.
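
One way to avoid picking the packages up by accident until release notes are 
out (assuming apt) is to put them on hold, e.g.:

    apt-mark hold ceph ceph-common ceph-osd ceph-mon ceph-mds ceph-mgr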


On 1/3/19 12:25 AM, Ashley Merrick wrote:
I have just run an apt update and noticed there are some Ceph 
packages now available for update on my Mimic cluster / Ubuntu.


I have yet to install these, but it looks like we have the next point 
release of Ceph Mimic, but I'm not able to see any release notes or 
official comms yet?..




[ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread David C
Hi All

Luminous 12.2.12
Single MDS
Replicated pools

A 'df' on a CephFS kernel client used to show me the usable space (i.e the
raw space with the replication overhead applied). This was when I just had
a single cephfs data pool.

After adding a second pool to the mds and using file layouts to map a
directory to that pool, a df is now showing the raw space. It's not the end
of the world but was handy to see the usable space.

I'm fairly sure the change was me adding the second pool although I'm not
99% sure.

I'm seeing this behavior on the latest Centos 7.6 kernel and a 4.14 kernel,
is this expected?

Thanks,
David


Re: [ceph-users] Omap issues - metadata creating too many

2019-01-03 Thread Bryan Stillwell
Josef,

I've noticed that when dynamic resharding is on it'll reshard some of our 
bucket indices daily (sometimes more).  This causes a lot of wasted space in 
the .rgw.buckets.index pool which might be what you are seeing.

You can get a listing of all the bucket instances in your cluster with this 
command:

radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort

Give that a try and see if you see the same problem.  It seems that once you 
remove the old bucket instances the omap dbs don't reduce in size until you 
compact them.
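
The compaction itself can be triggered per OSD, e.g. something like:

    ceph tell osd.<id> compact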

Bryan

From: Josef Zelenka 
Date: Thursday, January 3, 2019 at 3:49 AM
To: "J. Eric Ivancich" 
Cc: "ceph-users@lists.ceph.com" , Bryan Stillwell 

Subject: Re: [ceph-users] Omap issues - metadata creating too many

Hi, I had the default - so it was on (according to the ceph kb). I turned it
off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had
the same issue (he reported it yesterday) - I tried his tips about
compacting, but it doesn't do anything; however, I have to add to his
last point that this happens even with bluestore. Is there anything we can
do to clean up the omap manually?

Josef

On 18/12/2018 23:19, J. Eric Ivancich wrote:
On 12/17/18 9:18 AM, Josef Zelenka wrote:
Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on
Ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three
nodes have an additional SSD I added to have more space to rebalance the
metadata). Currently, the cluster is used mainly as radosgw storage,
with 28TB of data in total, replication 2x for both the metadata and data
pools (a cephfs instance is running alongside there, but I don't think
it's the perpetrator - this likely happened before we had it). All
pools aside from the data pool of the cephfs and the data pool of the
radosgw are located on the SSDs. Now, the interesting thing - at random
times, the metadata OSDs fill up their entire capacity with OMAP data
and go to r/o mode and we have no other option currently than deleting
and re-creating them. The fillup comes at a random time; it doesn't seem
to be triggered by anything and it isn't caused by some data influx. It
seems like some kind of a bug to me to be honest, but I'm not certain -
has anyone else seen this behavior with their radosgw? Thanks a lot
Hi Josef,

Do you have rgw_dynamic_resharding turned on? Try turning it off and see
if the behavior continues.

One theory is that dynamic resharding is triggered and possibly not
completing. This could add a lot of data to omap for the incomplete
bucket index shards. After a delay it tries resharding again, possibly
failing again, and adding more data to the omap. This continues.

If this is the ultimate issue we have some commits on the upstream
luminous branch that are designed to address this set of issues.

But we should first see if this is the cause.

Eric



Re: [ceph-users] Help with setting device-class rule on pool without causing data to move

2019-01-03 Thread David C
Thanks, Sage! That did the trick.

Wido, seems like an interesting approach but I wasn't brave enough to
attempt it!

Eric, I suppose this does the same thing that the crushtool reclassify
feature does?

Thank you both for your suggestions.

For posterity:

-  I grabbed some 14.0.1 packages, extracted crushtool
and libceph-common.so.1
- Ran 'crushtool -i cm --reclassify --reclassify-root default hdd -o
cm_reclassified'
- Compared the maps with:

crushtool -i cm --compare cm_reclassified

That suggested I would get an acceptable amount of data reshuffling, which I
expected. I didn't use --set-subtree-class as I'd already added SSD drives
to the cluster.
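
For completeness, the full cycle is roughly the usual extract/check/inject 
sequence (file names are arbitrary):

    ceph osd getcrushmap -o cm
    crushtool -i cm --reclassify --reclassify-root default hdd -o cm_reclassified
    crushtool -i cm --compare cm_reclassified
    ceph osd setcrushmap -i cm_reclassified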

My ultimate goal was to migrate the cephfs_metadata pool onto SSD drives
while leaving the cephfs_data pool on the HDD drives. The device classes
feature made that really trivial, I just created an intermediary rule which
would use both HDD and SDD hosts (I didn't have any mixed devices in
hosts), set the Metadata pool to use the new rule, waited for recovery and
then set the Metadata pool to use an SSD-only rule. Not sure if that
intermediary stage was strictly necessary, I was concerned about inactive
PGs.
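
In terms of commands, that was roughly the following (rule and pool names are 
illustrative):

    # intermediary rule spanning both device classes (no class restriction)
    ceph osd crush rule create-replicated replicated-any default host
    # final SSD-only rule
    ceph osd crush rule create-replicated replicated-ssd default host ssd
    # move the metadata pool in two steps, waiting for recovery in between
    ceph osd pool set cephfs_metadata crush_rule replicated-any
    ceph osd pool set cephfs_metadata crush_rule replicated-ssd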

Thanks,
David

On Mon, Dec 31, 2018 at 6:06 PM Eric Goirand  wrote:

> Hi David,
>
> CERN has provided a python script to swap the correct bucket IDs
> (default <-> hdd), you can find it here :
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py
>
> The principle is the following :
> - extract the CRUSH map
> - run the script on it => it creates a new CRUSH file.
> - edit the CRUSH map and modify the rule associated with the pool(s) you
> want to associate with HDD OSDs only like :
> => step take default WITH step take default class hdd
>
> Then recompile and reinject the new CRUSH map and voilà !
>
> Your cluster should be using only the HDD OSDs without rebalancing (or a
> very small amount).
>
> In case you have forgotten something, just reapply the former CRUSH map
> and start again.
>
> Cheers and Happy new year 2019.
>
> Eric
>
>
>
> On Sun, Dec 30, 2018, 21:16 David C  wrote:
>
>> Hi All
>>
>> I'm trying to set the existing pools in a Luminous cluster to use the hdd
>> device-class but without moving data around. If I just create a new rule
>> using the hdd class and set my pools to use that new rule it will cause a
>> huge amount of data movement even though the pgs are all already on HDDs.
>>
>> There is a thread on ceph-large [1] which appears to have the solution
>> but I can't get my head around what I need to do. I'm not too clear on
>> which IDs I need to swap. Could someone give me some pointers on this
>> please?
>>
>> [1]
>> http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
>>
>>
>>


[ceph-users] upgrade from jewel 10.2.10 to 10.2.11 broke anonymous swift

2019-01-03 Thread Johan Guldmyr
Hello, 

This is with RDO CentOS7, keystone and swift_account_in_url. The CEPH cluster 
runs luminous.

curl 'https://object.example.org/swift/v1/AUTH_12345qhexvalue/test20_segments'

this lists the contents of the public bucket (the Read ACL is .r:* according to 
swift stat test20_segments) with 10.2.10, but with 10.2.11 it says 
"NoSuchBucket".

I've tried to look through the new running settings in ceph --show-config but 
nothing screams "fix anonymous swift".

http://tracker.ceph.com/issues/22259 from the release notes seems related, but 
it says that it would fix anonymous access? I'm a bit confused.

Authenticated swift (downloading an object in a private bucket, for example via 
horizon) and an s3cmd get of a private file both seem to work nicely in 
10.2.10 and 10.2.11.

Does anybody have a suggestion of what I could try to troubleshoot this?

// Johan Guldmyr
Systems Specialist
CSC - IT Center for Science
http://www.csc.fi


Re: [ceph-users] problem w libvirt version 4.5 and 12.2.7

2019-01-03 Thread Konstantin Shalygin

After updating to CentOS 7.6, libvirt was updated from 3.9 to 4.5.
Executing "virsh vol-list ceph --details" makes libvirtd use 300% CPU
for 2 minutes to show the volumes on rbd. A quick peek at tcpdump shows it
accessing rbd_data.*, which the previous version of libvirtd did not need.
Ceph version is 12.2.7.

Any help will be appreciated.
There is nothing special in libvirt 4.5; I upgraded my hypervisors to
this version and they still work flawlessly.




k



Re: [ceph-users] cephfs kernel client instability

2019-01-03 Thread Andras Pataki
I wonder if anyone could offer any insight on the issue below, regarding 
the CentOS 7.6 kernel cephfs client connecting to a Luminous cluster.  I 
have since tried a much newer 4.19.13 kernel, which did not show the 
same issue (but unfortunately for various reasons unrelated to ceph, we 
can't go to such a new kernel).


Am I reading it right that somehow the monitor thinks this kernel is old 
and needs to prepare special maps in some older format for it, and that 
takes too long and the kernel just gives up, or perhaps has some other 
communication protocol error?  It seems like one of these mon 
communication sessions only lasts half a second.  Then it reconnects to 
another mon, and gets the same result, etc.  Any way around this?


Andras


On 12/26/18 7:55 PM, Andras Pataki wrote:
We've been using ceph-fuse with a pretty good stability record 
(against the Luminous 12.2.8 back end).  Unfortunately ceph-fuse has 
extremely poor small file performance (understandably), so we've been 
testing the kernel client.  The latest RedHat kernel 
3.10.0-957.1.3.el7.x86_64 seems to work pretty well, as long as the 
cluster is running in a completely clean state.  However, it seems 
that as soon as there is something happening to the cluster, the 
kernel client crashes pretty badly.


Today's example: I've reweighted some OSDs to balance the disk usage a 
bit (set nobackfill, reweight the OSDs, check the new hypothetical 
space usage, then unset nobackfill).   As soon as the reweighting 
procedure started, the kernel client went into an infinite loop trying 
to unsuccessfully connect to mons:


Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session 
lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
established

Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
established

Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
established

Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session 
established

Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session 
lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
established

Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
established

Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
established

Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session 
established

Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session 
lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
established

Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session 
lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
established

Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session 
lost, hunting for new mon

... etc ...

seemingly never recovering.  The cluster is healthy, all other clients 
are successfully doing I/O:


[root@cephmon00 ceph]# ceph -s
  cluster:
    id: d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
    noout flag(s) set
    1 backfillfull osd(s)
    4 pool(s) backfillfull
    239119058/12419244975 objects misplaced (1.925%)

  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up  {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
 flags noout

  

Re: [ceph-users] Omap issues - metadata creating too many

2019-01-03 Thread Josef Zelenka
Hi, I had the default - so it was on (according to the ceph kb). I turned it
off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had
the same issue (he reported it yesterday) - I tried his tips about
compacting, but it doesn't do anything; however, I have to add to his
last point that this happens even with bluestore. Is there anything we can
do to clean up the omap manually?


Josef

On 18/12/2018 23:19, J. Eric Ivancich wrote:

On 12/17/18 9:18 AM, Josef Zelenka wrote:

Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on
Ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three
nodes have an additional SSD I added to have more space to rebalance the
metadata). Currently, the cluster is used mainly as radosgw storage,
with 28TB of data in total, replication 2x for both the metadata and data
pools (a cephfs instance is running alongside there, but I don't think
it's the perpetrator - this likely happened before we had it). All
pools aside from the data pool of the cephfs and the data pool of the
radosgw are located on the SSDs. Now, the interesting thing - at random
times, the metadata OSDs fill up their entire capacity with OMAP data
and go to r/o mode and we have no other option currently than deleting
and re-creating them. The fillup comes at a random time; it doesn't seem
to be triggered by anything and it isn't caused by some data influx. It
seems like some kind of a bug to me to be honest, but I'm not certain -
has anyone else seen this behavior with their radosgw? Thanks a lot

Hi Josef,

Do you have rgw_dynamic_resharding turned on? Try turning it off and see
if the behavior continues.

One theory is that dynamic resharding is triggered and possibly not
completing. This could add a lot of data to omap for the incomplete
bucket index shards. After a delay it tries resharding again, possibly
failing again, and adding more data to the omap. This continues.

If this is the ultimate issue we have some commits on the upstream
luminous branch that are designed to address this set of issues.

But we should first see if this is the cause.

Eric
