[ceph-users] EC Pool Disk Performance Toshiba vs Seagate

2018-12-12 Thread Ashley Merrick
I have a Mimic BlueStore EC RBD pool running at 8+2; this is currently
running across 4 nodes.

3 nodes are running Toshiba disks while one node is running Seagate disks
(same size, spindle speed, enterprise class, etc.). I have noticed a huge
difference in IOWAIT and disk latency between the two sets of disks, which
can also be seen in ceph osd perf during read and write operations.

Speaking to my host (server provider), they benchmarked the two disks
before approving them for use in this type of server, and they actually saw
higher performance from the Toshiba disks during their tests.

They did however state their tests were at higher / larger block sizes. I
imagine that with Ceph using EC 8+2 the block sizes / requests are quite small?

Is there anything I can do? Would changing the RBD object size & stripe unit to
something bigger than the default help? Would this make the data sent to the disks
arrive in larger chunks at once, compared to lots of smaller blocks?
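
For illustration, this is the sort of thing I had in mind (a sketch only - the
pool and image names are made up, and the values are just examples):

# 8M objects instead of the default 4M, striped in 1M units across 4 objects
rbd create rbd-meta/test-image --size 100G --data-pool ec-rbd-pool \
    --object-size 8M --stripe-unit 1M --stripe-count 4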

If anyone else has any advice I'm open to trying.

P.S. I have already disabled the disk cache on all disks, as it was
causing high write latency across the board.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds lost very frequently

2018-12-12 Thread Yan, Zheng
On Thu, Dec 13, 2018 at 2:55 AM Sang, Oliver  wrote:
>
> We are using luminous, we have seven ceph nodes and setup them all as MDS.
>
> Recently the MDS lost very frequently, and when there is only one MDS left, 
> the cephfs just degraded to unusable.
>
>
>
> Checked the mds log in one ceph node, I found below
>
> >
>
> /build/ceph-12.2.8/src/mds/Locker.cc: 5076: FAILED assert(lock->get_state() 
> == LOCK_PRE_SCAN)
>
>
>
> ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous 
> (stable)
>
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x564400e50e42]
>
> 2: (Locker::file_recover(ScatterLock*)+0x208) [0x564400c6ae18]
>
> 3: (MDCache::start_files_to_recover()+0xb3) [0x564400b98af3]
>
> 4: (MDSRank::clientreplay_start()+0x1f7) [0x564400ae04c7]
>
> 5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x25c0) 
> [0x564400aefd40]
>
> 6: (MDSDaemon::handle_mds_map(MMDSMap*)+0x154d) [0x564400ace3bd]
>
> 7: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x564400ad1273]
>
> 8: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x564400ad15a3]
>
> 9: (DispatchQueue::entry()+0xeda) [0x5644011a547a]
>
> 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564400ee3fcd]
>
> 11: (()+0x7494) [0x7f7a2b106494]
>
> 12: (clone()+0x3f) [0x7f7a2a17eaff]
>
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> <
>
>
>
> The full log is also attached. Could you please help us? Thanks!
>
>

Please try the patch below if you can compile Ceph from source.  If you
can't compile Ceph, or the issue still happens, please set debug_mds =
10 for the standby MDS (change debug_mds back to 0 after the MDS becomes active).
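
For example (just a sketch - substitute the name of your standby mds):

ceph daemon mds.<name> config set debug_mds 10    # on the standby's host, via admin socket

or persistently in ceph.conf on that host:

[mds]
    debug_mds = 10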

Regards
Yan, Zheng

diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index 1e8b024b8a..d1150578f1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -1454,8 +1454,8 @@ void MDSRank::rejoin_done()
 void MDSRank::clientreplay_start()
 {
   dout(1) << "clientreplay_start" << dendl;
-  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   mdcache->start_files_to_recover();
+  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   queue_one_replay();
 }

@@ -1487,8 +1487,8 @@ void MDSRank::active_start()

   mdcache->clean_open_file_lists();
   mdcache->export_remaining_imported_caps();
-  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
   mdcache->start_files_to_recover();
+  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters

   mdcache->reissue_all_caps();
   mdcache->activate_stray_manager();



>
> BR
>
> Oliver
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-12 Thread Michael Green
Hello collective wisdom,

ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable) 
here.

I have a working cluster here consisting of 3 monitor hosts,  64 OSD processes 
across 4 osd hosts, plus 2 MDSs, plus 2 MGRs. All of that is consumed by 10 
client nodes.

Every host in the cluster, including clients, is:
RHEL 7.5
Mellanox OFED 4.4-2.0.7.0
RoCE NICs are either MCX416A-CCAT or MCX414A-CCAT @ 50Gbit/sec
The NICs are all mlx5_0 port 1

rping and ib_send_bw work fine both ways between any two nodes in the cluster.

Full configuration of the cluster is pasted below, but the RDMA-related parameters
are configured as follows:


ms_public_type = async+rdma
ms_cluster = async+rdma
# Exclude clients for now 
ms_type = async+posix

ms_async_rdma_device_name = mlx5_0
ms_async_rdma_polling_us = 0
ms_async_rdma_port_num=1

When I try to start the MON, it immediately fails as below. Has anybody seen this,
or could you give any pointers on what/where to look next?


--ceph-mon.rio.log--begin--
2018-12-12 22:35:30.011 7f515dc39140  0 set uid:gid to 167:167 (ceph:ceph)
2018-12-12 22:35:30.011 7f515dc39140  0 ceph version 13.2.2 
(02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process ceph-mon, 
pid 2129843
2018-12-12 22:35:30.011 7f515dc39140  0 pidfile_write: ignore empty --pid-file
2018-12-12 22:35:30.036 7f515dc39140  0 load: jerasure load: lrc load: isa
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option compression = 
kNoCompression
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
level_compaction_dynamic_level_bytes = true
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option write_buffer_size = 
33554432
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option compression = 
kNoCompression
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option 
level_compaction_dynamic_level_bytes = true
2018-12-12 22:35:30.036 7f515dc39140  0  set rocksdb option write_buffer_size = 
33554432
2018-12-12 22:35:30.147 7f51442ed700  2 Event(0x55d927e95700 nevent=5000 
time_id=1).set_owner idx=1 owner=139987012998912
2018-12-12 22:35:30.147 7f51442ed700 10 stack operator() starting
2018-12-12 22:35:30.147 7f5143aec700  2 Event(0x55d927e95200 nevent=5000 
time_id=1).set_owner idx=0 owner=139987004606208
2018-12-12 22:35:30.147 7f5144aee700  2 Event(0x55d927e95c00 nevent=5000 
time_id=1).set_owner idx=2 owner=139987021391616
2018-12-12 22:35:30.147 7f5143aec700 10 stack operator() starting
2018-12-12 22:35:30.147 7f5144aee700 10 stack operator() starting
2018-12-12 22:35:30.147 7f515dc39140  0 starting mon.rio rank 0 at public addr 
192.168.1.58:6789/0 at bind addr 192.168.1.58:6789/0 mon_data 
/var/lib/ceph/mon/ceph-rio fsid 376540c8-a362-41cc-9a58-9c8ceca0e4ee
2018-12-12 22:35:30.147 7f515dc39140 10 -- - bind bind 192.168.1.58:6789/0
2018-12-12 22:35:30.147 7f515dc39140 10 -- - bind Network Stack is not ready 
for bind yet - postponed
2018-12-12 22:35:30.147 7f515dc39140  0 starting mon.rio rank 0 at 
192.168.1.58:6789/0 mon_data /var/lib/ceph/mon/ceph-rio fsid 
376540c8-a362-41cc-9a58-9c8ceca0e4ee
2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).mds e84 new map
2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).mds e84 print_map
e84
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
legacy client fscid: -1

No filesystems configured
Standby daemons:

5906437:192.168.1.152:6800/1077205146 'prince' mds.-1.0 up:standby seq 2
6284118:192.168.1.59:6800/1266235911 'salvador' mds.-1.0 up:standby seq 
2

2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).osd e25894 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).osd e25894 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).osd e25894 crush 
map has features 1009089991638532096, adjusting msgr requires
2018-12-12 22:35:30.148 7f515dc39140  0 mon.rio@-1(probing).osd e25894 crush 
map has features 288514051259236352, adjusting msgr requires
2018-12-12 22:35:30.149 7f515dc39140 10 -- - create_connect 
192.168.1.88:6800/1638, creating connection and registering
2018-12-12 22:35:30.149 7f515dc39140 10 -- - >> 192.168.1.88:6800/1638 
conn(0x55d9281fbe00 :-1 s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
2018-12-12 22:35:30.149 7f515dc39140 10 -- - get_connection mgr.5894115 
192.168.1.88:6800/1638 new 0x55d9281fbe00
2018-12-12 22:35:30.150 7f515dc39140  1 -- - --> 192.168.1.88:6800/1638 -- 
mgropen(unknown.rio) v3 -- 0x55d92844e000 con 0
2018-12-12 22:35:30.151 7f515dc39140  1 -- - start start
2018-12-12 22:35:30.151 7f515dc39140  1 -- - start start
2018-12-12 22:35:30.151 7f515dc39140 10 -- - ready -

Re: [ceph-users] ERR scrub mismatch

2018-12-12 Thread Marco Aroldi
Hello,
Do you see the cause of the logged errors?
I can't find any documentation about that, so I'm stuck.
I really need some help.
Thanks everybody

Marco

Il giorno ven 7 dic 2018, 17:30 Marco Aroldi  ha
scritto:

> Thanks Greg,
> Yes, I'm using CephFS and RGW (mainly CephFS)
> The files are still accessible and users doesn't report any problem.
> Here is the output of ceph -s
>
> ceph -s
>   cluster:
> id: 
> health: HEALTH_OK
>
>   services:
> mon: 5 daemons, quorum
> ceph-mon01,ceph-mon02,ceph-mon03,ceph-mon04,ceph-mon05
> mgr: ceph-mon04(active), standbys: ceph-mon02, ceph-mon05, ceph-mon03,
> ceph-mon01
> mds: cephfs01-1/1/1 up  {0=ceph-mds03=up:active}, 3 up:standby
> osd: 4 osds: 4 up, 4 in
> rgw: 4 daemons active
>
>   data:
> pools:   15 pools, 224 pgs
> objects: 1.54M objects, 4.01TiB
> usage:   8.03TiB used, 64.7TiB / 72.8TiB avail
> pgs: 224 active+clean
>
> ceph versions
> {
> "mon": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 5
> },
> "mgr": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 5
> },
> "osd": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 4
> },
> "mds": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 1
> },
> "rgw": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 4
> },
> "overall": {
> "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0)
> luminous (stable)": 19
> }
> }
>
> Thanks for looking into it.
> Marco
>
> Il giorno gio 6 dic 2018 alle ore 23:18 Gregory Farnum 
> ha scritto:
>
>> Well, it looks like you have different data in the MDSMap across your
>> monitors. That's not good on its face, but maybe there are extenuating
>> circumstances. Do you actually use CephFS, or just RBD/RGW? What's the
>> full output of "ceph -s"?
>> -Greg
>>
>> On Thu, Dec 6, 2018 at 1:39 PM Marco Aroldi 
>> wrote:
>> >
>> > Sorry about this, I hate "to bump" a thread, but...
>> > Anyone has faced this situation?
>> > There is a procedure to follow?
>> >
>> > Thanks
>> > Marco
>> >
>> > Il giorno gio 8 nov 2018, 10:54 Marco Aroldi 
>> ha scritto:
>> >>
>> >> Hello,
>> >> Since upgrade from Jewel to Luminous 12.2.8, in the logs are reported
>> some errors related to "scrub mismatch", every day at the same time.
>> >> I have 5 mon (from mon.0 to mon.4) and I need help to indentify and
>> recover from this problem.
>> >>
>> >> This is the log:
>> >> 2018-11-07 15:13:53.808128 [ERR]  mon.4 ScrubResult(keys
>> {logm=46,mds_health=29,mds_metadata=1,mdsmap=24} crc
>> {logm=1239992787,mds_health=3182263811,mds_metadata=3704185590,mdsmap=1114086003})
>> >> 2018-11-07 15:13:53.808095 [ERR]  mon.0 ScrubResult(keys
>> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
>> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
>> >> 2018-11-07 15:13:53.808061 [ERR]  scrub mismatch
>> >> 2018-11-07 15:13:53.808026 [ERR]  mon.3 ScrubResult(keys
>> {logm=46,mds_health=31,mds_metadata=1,mdsmap=22} crc
>> {logm=1239992787,mds_health=807938287,mds_metadata=3704185590,mdsmap=662277977})
>> >> 2018-11-07 15:13:53.807970 [ERR]  mon.0 ScrubResult(keys
>> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
>> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
>> >> 2018-11-07 15:13:53.807939 [ERR]  scrub mismatch
>> >> 2018-11-07 15:13:53.807916 [ERR]  mon.2 ScrubResult(keys
>> {logm=46,mds_health=31,mds_metadata=1,mdsmap=22} crc
>> {logm=1239992787,mds_health=807938287,mds_metadata=3704185590,mdsmap=662277977})
>> >> 2018-11-07 15:13:53.807882 [ERR]  mon.0 ScrubResult(keys
>> {logm=46,mds_health=30,mds_metadata=1,mdsmap=23} crc
>> {logm=1239992787,mds_health=1194056063,mds_metadata=3704185590,mdsmap=3259702002})
>> >> 2018-11-07 15:13:53.807844 [ERR]  scrub mismatch
>> >>
>> >> Any help will be appreciated
>> >> Thanks
>> >> Marco
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Decommissioning cluster - rebalance questions

2018-12-12 Thread Dyweni - Ceph-Users

Safest to just 'ceph osd crush reweight osd.X 0' and let rebalancing finish.

Then 'ceph osd out X' and shut down / remove the OSD drive.
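
For example, for a single osd (id 12 here purely as an illustration):

ceph osd crush reweight osd.12 0
# wait until all pgs are active+clean again
ceph osd out 12
systemctl stop ceph-osd@12      # or: service ceph stop osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12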



On 2018-12-04 03:15, Jarek wrote:

On Mon, 03 Dec 2018 16:41:36 +0100
si...@turka.nl wrote:


Hi,

Currently I am decommissioning an old cluster.

For example, I want to remove OSD Server X with all its OSD's.

I am following these steps for all OSD's of Server X:
- ceph osd out 
- Wait for rebalance (active+clean)
- On OSD: service ceph stop osd.

Once the steps above are performed, the following steps should be
performed:
- ceph osd crush remove osd.
- ceph auth del osd.
- ceph osd rm 


What I don't get is, when I perform 'ceph osd out ' the cluster
is rebalancing, but when I perform 'ceph osd crush remove osd.'
it again starts to rebalance. Why does this happen? The cluster
should be already balanced after out'ed the osd. I didn't expect
another rebalance with removing the OSD from the CRUSH map.


'ceph osd out' doesn't change the host weight in crush map, 'ceph
osd crush remove' does.
Instead of 'ceph osd out' use 'ceph osd crush reweight'.

--
Regards
Jarosław Mociak - Nettelekom GK Sp. z o.o.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Why does "df" against a mounted cephfs report (vastly) different free space?

2018-12-12 Thread David Young
Hi all,

I have a cluster used exclusively for CephFS (an EC "media" pool, and a standard
metadata pool for the CephFS).

"ceph -s" shows me:

---
  data:
pools:   2 pools, 260 pgs
objects: 37.18 M objects, 141 TiB
usage:   177 TiB used, 114 TiB / 291 TiB avail
pgs: 260 active+clean
---

But 'df' against the mounted cephfs shows me:

---
root@node1:~# df | grep ceph
Filesystem   1K-blocks UsedAvailable Use% Mounted on
10.20.30.1:6789:/ 151264890880 151116939264147951616 100% /ceph

root@node1:~# df -h | grep ceph
Filesystem Size  Used Avail Use% Mounted on
10.20.30.1:6789:/  141T  141T  142G 100% /ceph
root@node1:~#
---

And "rados df" shows me:

---
root@node1:~# rados df
POOL_NAME  USED  OBJECTS CLONESCOPIES MISSING_ON_PRIMARY UNFOUND 
DEGRADEDRD_OPS  RD   WR_OPS  WR
cephfs_metadata 173 MiB27239  0 54478  0   0
0   1102765 9.8 GiB  8810925  43 GiB
media   141 TiB 37152647  0 185763235  0   0
0 110377842 120 TiB 74835385 183 TiB

total_objects37179886
total_used   177 TiB
total_avail  114 TiB
total_space  291 TiB
root@node1:~#
---

The amount used that df reports seems accurate (141TB at 4+1 EC), but the
amount of remaining space is baffling me. Have I hit a limitation due to the
number of PGs I created, or is the remaining free space just being misreported by
df/cephfs?
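
For completeness, the per-pool numbers I'm comparing against come from:

ceph df detail

which breaks USED and MAX AVAIL down per pool, rather than the cluster-wide
totals shown by 'ceph -s' above.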

Thanks!
D
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] size of inc_osdmap vs osdmap

2018-12-12 Thread Sergey Dolgov
Those are the sizes in the file system. I use filestore as a backend.

On Wed, Dec 12, 2018, 22:53 Gregory Farnum  wrote:

> Hmm that does seem odd. How are you looking at those sizes?
>
> On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov  wrote:
>
>> Greq, for example for our cluster ~1000 osd:
>>
>> size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860,
>> modified 2018-12-12 04:00:17.661731)
>> size osdmap.1357882__0_F7FE772D__none = 363KB
>> size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861,
>> modified 2018-12-12 04:00:27.385702)
>> size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB
>>
>> difference between epoch 1357881 and 1357883: crush weight one osd was
>> increased by 0.01 so we get 5 new pg_temp in osdmap.1357883 but size
>> inc_osdmap so huge
>>
>> On Thu, Dec 6, 2018 at 06:20, Gregory Farnum  wrote:
>> >
>> > On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov  wrote:
>> >>
>> >> Hi guys
>> >>
>> >> I faced strange behavior of crushmap change. When I change crush
>> >> weight osd I sometimes get  increment osdmap(1.2MB) which size is
>> >> significantly bigger than size of osdmap(0.4MB)
>> >
>> >
>> > This is probably because when CRUSH changes, the new primary OSDs for a
>> PG will tend to set a "pg temp" value (in the OSDMap) that temporarily
>> reassigns it to the old acting set, so the data can be accessed while the
>> new OSDs get backfilled. Depending on the size of your cluster, the number
>> of PGs on it, and the size of the CRUSH change, this can easily be larger
>> than the rest of the map because it is data with size linear in the number
>> of PGs affected, instead of being more normally proportional to the number
>> of OSDs.
>> > -Greg
>> >
>> >>
>> >> I use luminois 12.2.8. Cluster was installed a long ago, I suppose
>> >> that initially it was firefly
>> >> How can I view content of increment osdmap or can you give me opinion
>> >> on this problem. I think that spikes of traffic tight after change of
>> >> crushmap relates to this crushmap behavior
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Best regards, Sergey Dolgov
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] size of inc_osdmap vs osdmap

2018-12-12 Thread Gregory Farnum
Hmm that does seem odd. How are you looking at those sizes?

On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov  wrote:

> Greq, for example for our cluster ~1000 osd:
>
> size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860,
> modified 2018-12-12 04:00:17.661731)
> size osdmap.1357882__0_F7FE772D__none = 363KB
> size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861,
> modified 2018-12-12 04:00:27.385702)
> size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB
>
> difference between epoch 1357881 and 1357883: crush weight one osd was
> increased by 0.01 so we get 5 new pg_temp in osdmap.1357883 but size
> inc_osdmap so huge
>
> On Thu, Dec 6, 2018 at 06:20, Gregory Farnum  wrote:
> >
> > On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov  wrote:
> >>
> >> Hi guys
> >>
> >> I faced strange behavior of crushmap change. When I change crush
> >> weight osd I sometimes get  increment osdmap(1.2MB) which size is
> >> significantly bigger than size of osdmap(0.4MB)
> >
> >
> > This is probably because when CRUSH changes, the new primary OSDs for a
> PG will tend to set a "pg temp" value (in the OSDMap) that temporarily
> reassigns it to the old acting set, so the data can be accessed while the
> new OSDs get backfilled. Depending on the size of your cluster, the number
> of PGs on it, and the size of the CRUSH change, this can easily be larger
> than the rest of the map because it is data with size linear in the number
> of PGs affected, instead of being more normally proportional to the number
> of OSDs.
> > -Greg
> >
> >>
> >> I use luminois 12.2.8. Cluster was installed a long ago, I suppose
> >> that initially it was firefly
> >> How can I view content of increment osdmap or can you give me opinion
> >> on this problem. I think that spikes of traffic tight after change of
> >> crushmap relates to this crushmap behavior
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Best regards, Sergey Dolgov
>
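
(Side note on the question above about how to view the contents of an
incremental osdmap - one way, assuming you still have the files named
earlier, is something like:

ceph osd getmap 1357883 -o /tmp/osdmap.1357883
osdmaptool --print /tmp/osdmap.1357883
ceph-dencoder type OSDMap::Incremental import <inc_osdmap file> decode dump_json

though I haven't double-checked the exact ceph-dencoder invocation.)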
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting DR copy as Read-Only

2018-12-12 Thread Vikas Rana
When I promoted the DR image, I could mount it fine:
root@vtier-node1:~# rbd mirror image promote testm-pool/test01 --force
Image promoted to primary
root@vtier-node1:~#

root@vtier-node1:~# mount /dev/nbd0 /mnt
mount: block device /dev/nbd0 is write-protected, mounting read-only


On Wed, Dec 12, 2018 at 1:08 PM Vikas Rana  wrote:

> To give more output. This is XFS FS.
>
> root@vtier-node1:~# rbd-nbd --read-only map testm-pool/test01
> 2018-12-12 13:04:56.674818 7f1c56e29dc0 -1 asok(0x560b19b3bdf0)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
> bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17)
> File exists
> /dev/nbd0
> root@vtier-node1:~#
> root@vtier-node1:~# mount /dev/nbd0 /mnt
> mount: block device /dev/nbd0 is write-protected, mounting read-only
> mount: /dev/nbd0: can't read superblock
> root@vtier-node1:~# mount -ro,norecovery /dev/nbd0 /mnt
> mount: /dev/nbd0: can't read superblock
> root@vtier-node1:~# mount -o ro,norecovery /dev/nbd0 /mnt
> mount: /dev/nbd0: can't read superblock
> root@vtier-node1:~# fdisk -l /dev/nbd0
> root@vtier-node1:~#
>
>
> Thanks,
> -Vikas
>
> On Wed, Dec 12, 2018 at 10:44 AM Vikas Rana  wrote:
>
>> Hi,
>>
>> We are using Luminous and copying a 100TB RBD image to DR site using RBD
>> Mirror.
>>
>> Everything seems to works fine.
>>
>> The question is, can we mount the DR copy as Read-Only? We can do it on
>> Netapp and we are trying to figure out if somehow we can mount it RO on DR
>> site, then we can do backups at DR site.
>>
>> When i tried it mount it via, RBD-NBD, it complains about super-block.
>> When I promoted the DR copy, the same image works fine. So data is there
>> but cant be mounted.
>>
>> We also tried taking snap but it complains that it can't take snap of a
>> Read-only copy.
>>
>> Any suggestion or pointer will be greatly appreciated.
>>
>> Thanks,
>> -Vikas
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting DR copy as Read-Only

2018-12-12 Thread Vikas Rana
To give more output. This is XFS FS.

root@vtier-node1:~# rbd-nbd --read-only map testm-pool/test01
2018-12-12 13:04:56.674818 7f1c56e29dc0 -1 asok(0x560b19b3bdf0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17)
File exists
/dev/nbd0
root@vtier-node1:~#
root@vtier-node1:~# mount /dev/nbd0 /mnt
mount: block device /dev/nbd0 is write-protected, mounting read-only
mount: /dev/nbd0: can't read superblock
root@vtier-node1:~# mount -ro,norecovery /dev/nbd0 /mnt
mount: /dev/nbd0: can't read superblock
root@vtier-node1:~# mount -o ro,norecovery /dev/nbd0 /mnt
mount: /dev/nbd0: can't read superblock
root@vtier-node1:~# fdisk -l /dev/nbd0
root@vtier-node1:~#


Thanks,
-Vikas

On Wed, Dec 12, 2018 at 10:44 AM Vikas Rana  wrote:

> Hi,
>
> We are using Luminous and copying a 100TB RBD image to DR site using RBD
> Mirror.
>
> Everything seems to works fine.
>
> The question is, can we mount the DR copy as Read-Only? We can do it on
> Netapp and we are trying to figure out if somehow we can mount it RO on DR
> site, then we can do backups at DR site.
>
> When i tried it mount it via, RBD-NBD, it complains about super-block.
> When I promoted the DR copy, the same image works fine. So data is there
> but cant be mounted.
>
> We also tried taking snap but it complains that it can't take snap of a
> Read-only copy.
>
> Any suggestion or pointer will be greatly appreciated.
>
> Thanks,
> -Vikas
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] НА: ceph pg backfill_toofull

2018-12-12 Thread Joachim Kraftmayer


In such a situation, we noticed a performance drop (caused by the 
filesystem) and soon had no free inodes left.


___

Clyso GmbH



On 12.12.2018 at 09:24, Klimenko, Roman wrote:


​Ok, I'll try these params. thx!


*From:* Maged Mokhtar 
*Sent:* 12 December 2018 10:51
*To:* Klimenko, Roman; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] ceph pg backfill_toofull


There are 2 relevant params:

mon_osd_full_ratio      0.95

osd_backfill_full_ratio 0.85

You are probably hitting them both.
As a short-term / temporary fix you may increase these values and maybe 
adjust weights on the osds if you have to.
However you really need to fix this by adding more osds to your 
cluster, else it will happen again and again. Also when planning for 
required storage capacity, you should plan for when 1 or 2 hosts fail and 
their pgs are distributed on the remaining nodes, else you will hit the 
same issue.
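
As an illustration only (a sketch - pick values that make sense for your 
cluster and revert them once you have added capacity), on a pre-Luminous 
cluster like this Hammer one the ratios can be raised at runtime with:

ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
ceph pg set_full_ratio 0.96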


/Maged




On 12/12/2018 07:52, Klimenko, Roman wrote:


Hi everyone. Yesterday i found that on our overcrowded Hammer ceph 
cluster (83% used in HDD pool) several osds were in danger zone - 
near 95%.


I reweighted them, and after several moments I got pgs stuck in 
backfill_toofull.


After that, I reapplied reweight to osds - no luck.

Currently, all reweights are equal 1.0, and ceph do nothing - no 
rebalance and recovering.


How I can make ceph recover these pgs?

ceph -s

     health HEALTH_WARN
            47 pgs backfill_toofull
            47 pgs stuck unclean
            recovery 16/9422472 objects degraded (0.000%)
            recovery 365332/9422472 objects misplaced (3.877%)
            7 near full osd(s)

ceph osd df tree
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL %USE VAR  TYPE NAME
-1 30.65996        - 37970G 29370G 8599G 77.35 1.00 root default
-6 18.65996        - 20100G 16681G 3419G 82.99 1.07     region HDD
-3  6.09000        -  6700G  5539G 1160G 82.68 1.07         host ceph03.HDD
 1  1.0  1.0  1116G   841G  274G 75.39 0.97             osd.1
 5  1.0  1.0  1116G   916G  200G 82.07 1.06             osd.5
 3  1.0  1.0  1116G   939G  177G 84.14 1.09             osd.3
 8  1.09000  1.0  1116G   952G  164G 85.29 1.10             osd.8
 7  1.0  1.0  1116G   972G  143G 87.11 1.13             osd.7
11  1.0  1.0  1116G   916G  200G 82.08 1.06             osd.11
-4  6.16998        -  6700G  5612G 1088G 83.76 1.08         host ceph02.HDD
14  1.09000  1.0  1116G   950G  165G 85.16 1.10             osd.14
13  0.8  1.0  1116G   949G  167G 85.03 1.10             osd.13
16  1.09000  1.0  1116G   921G  195G 82.50 1.07             osd.16
17  1.0  1.0  1116G   899G  216G 80.59 1.04             osd.17
10  1.09000  1.0  1116G   952G  164G 85.28 1.10             osd.10
15  1.0  1.0  1116G   938G  178G 84.02 1.09             osd.15
-2  6.39998        -  6700G  5529G 1170G 82.53 1.07         host ceph01.HDD
12  1.09000  1.0  1116G   953G  163G 85.39 1.10             osd.12
 9  0.95000  1.0  1116G   939G  177G 84.14 1.09             osd.9
 2  1.09000  1.0  1116G   911G  204G 81.64 1.06             osd.2
 0  1.09000  1.0  1116G   951G  165G 85.22 1.10             osd.0
 6  1.09000  1.0  1116G   917G  199G 82.12 1.06             osd.6
 4  1.09000  1.0  1116G   856G  260G 76.67 0.99             osd.4




​




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Maged Mokhtar
CEO PetaSAN
4 Emad El Deen Kamel
Cairo 11371, Egypt
www.petasan.org
+201006979931
skype: maged.mokhtar

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous v12.2.10 released

2018-12-12 Thread David Galloway
Hey Dan,

Thanks for bringing this to our attention.  Looks like it did get left
out.  I just pushed the package and added a step to the release process
to make sure packages don't get skipped again like that.

- David

On 12/12/2018 11:03 AM, Dan van der Ster wrote:
> Hey Abhishek,
> 
> We just noticed that the debuginfo is missing for 12.2.10:
> http://download.ceph.com/rpm-luminous/el7/x86_64/ceph-debuginfo-12.2.10-0.el7.x86_64.rpm
> 
> Did something break in the publishing?
> 
> Cheers, Dan
> 
> On Tue, Nov 27, 2018 at 3:50 PM Abhishek Lekshmanan  wrote:
>>
>>
>> We're happy to announce the tenth bug fix release of the Luminous
>> v12.2.x long term stable release series. The previous release, v12.2.9,
>> introduced the PG hard-limit patches which were found to cause an issue
>> in certain upgrade scenarios, and this release was expedited to revert
>> those patches. If you already successfully upgraded to v12.2.9, you
>> should **not** upgrade to v12.2.10, but rather **wait** for a release in
>> which http://tracker.ceph.com/issues/36686 is addressed. All other users
>> are encouraged to upgrade to this release.
>>
>> Notable Changes
>> ---
>>
>> * This release reverts the PG hard-limit patches added in v12.2.9 in which,
>>   a partial upgrade during a recovery/backfill, can cause the osds on the
>>   previous version, to fail with assert(trim_to <= info.last_complete). The
>>   workaround for users is to upgrade and restart all OSDs to a version with 
>> the
>>   pg hard limit, or only upgrade when all PGs are active+clean.
>>
>>   See also: http://tracker.ceph.com/issues/36686
>>
>>   As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
>>   upgrade to v12.2.10 until the linked tracker issue has been fixed.
>>
>> * The bluestore_cache_* options are no longer needed. They are replaced
>>   by osd_memory_target, defaulting to 4GB. BlueStore will expand
>>   and contract its cache to attempt to stay within this
>>   limit. Users upgrading should note this is a higher default
>>   than the previous bluestore_cache_size of 1GB, so OSDs using
>>   BlueStore will use more memory by default.
>>
>>   For more details, see BlueStore docs[1]
>>
>>
>> For the complete release notes with changelog, please check out the
>> release blog entry at:
>> http://ceph.com/releases/v12-2-10-luminous-released
>>
>> Getting ceph:
>> 
>> * Git at git://github.com/ceph/ceph.git
>> * Tarball at http://download.ceph.com/tarballs/ceph-12.2.10.tar.gz
>> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
>> * Release git sha1: 177915764b752804194937482a39e95e0ca3de94
>>
>>
>> [1]: 
>> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#cache-size
>>
>> --
>> Abhishek Lekshmanan
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> HRB 21284 (AG Nürnberg)
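
(As an aside for anyone reading the quoted notes above: osd_memory_target is
set in bytes and can be tuned per OSD in ceph.conf, for example - value purely
illustrative:

[osd]
osd_memory_target = 2147483648    # 2 GiB instead of the 4 GiB default
)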
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous v12.2.10 released

2018-12-12 Thread Dan van der Ster
Hey Abhishek,

We just noticed that the debuginfo is missing for 12.2.10:
http://download.ceph.com/rpm-luminous/el7/x86_64/ceph-debuginfo-12.2.10-0.el7.x86_64.rpm

Did something break in the publishing?

Cheers, Dan

On Tue, Nov 27, 2018 at 3:50 PM Abhishek Lekshmanan  wrote:
>
>
> We're happy to announce the tenth bug fix release of the Luminous
> v12.2.x long term stable release series. The previous release, v12.2.9,
> introduced the PG hard-limit patches which were found to cause an issue
> in certain upgrade scenarios, and this release was expedited to revert
> those patches. If you already successfully upgraded to v12.2.9, you
> should **not** upgrade to v12.2.10, but rather **wait** for a release in
> which http://tracker.ceph.com/issues/36686 is addressed. All other users
> are encouraged to upgrade to this release.
>
> Notable Changes
> ---
>
> * This release reverts the PG hard-limit patches added in v12.2.9 in which,
>   a partial upgrade during a recovery/backfill, can cause the osds on the
>   previous version, to fail with assert(trim_to <= info.last_complete). The
>   workaround for users is to upgrade and restart all OSDs to a version with 
> the
>   pg hard limit, or only upgrade when all PGs are active+clean.
>
>   See also: http://tracker.ceph.com/issues/36686
>
>   As mentioned above if you've successfully upgraded to v12.2.9 DO NOT
>   upgrade to v12.2.10 until the linked tracker issue has been fixed.
>
> * The bluestore_cache_* options are no longer needed. They are replaced
>   by osd_memory_target, defaulting to 4GB. BlueStore will expand
>   and contract its cache to attempt to stay within this
>   limit. Users upgrading should note this is a higher default
>   than the previous bluestore_cache_size of 1GB, so OSDs using
>   BlueStore will use more memory by default.
>
>   For more details, see BlueStore docs[1]
>
>
> For the complete release notes with changelog, please check out the
> release blog entry at:
> http://ceph.com/releases/v12-2-10-luminous-released
>
> Getting ceph:
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-12.2.10.tar.gz
> * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: 177915764b752804194937482a39e95e0ca3de94
>
>
> [1]: 
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#cache-size
>
> --
> Abhishek Lekshmanan
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to troubleshoot rsync to cephfs via nfs-ganesha stalling

2018-12-12 Thread Daniel Gryniewicz
Okay, this all looks fine, and it's extremely unlikely that a text file 
will have holes in it (I thought holes, because rsync handles holes, but 
wget would just copy zeros instead).


Is this reproducible?  If so, can you turn up Ganesha logging and post a 
log file somewhere?


Daniel

On 12/12/2018 04:56 AM, Marc Roos wrote:
  
Hi Daniel, thanks for looking at this.


These are the mount options
  type nfs4
(rw,nodev,relatime,vers=4,intr,local_lock=none,retrans=2,proto=tcp,rsize
=8192,wsize=8192,hard,namlen=255,sec=sys)

I have overwritten the original files, so I cannot examine if they had
holes. To be honest I don't even know how to query the file, to identify
holes.

These are the contents of the files, just plain text.
[@os0 CentOS7-x86_64]# cat CentOS_BuildTag
20181125-1500
[@os0 CentOS7-x86_64]# cat .discinfo
1543162572.807980
7.6
x86_64



-Original Message-
From: Daniel Gryniewicz [mailto:d...@redhat.com]
Sent: 10 December 2018 15:54
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to troubleshoot rsync to cephfs via
nfs-ganesha stalling

This isn't something I've seen before.  rsync generally works fine, even
over cephfs.  More inline.

On 12/09/2018 09:42 AM, Marc Roos wrote:



This rsync command fails and makes the local nfs unavailable (Have to
stop nfs-ganesha, kill all rsync processes on the client and then
start
nfs-ganesha)

rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
--exclude "isos"
anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/
/localpath/CentOS7-x86_64/

When I do individual rsyncs on the subfolders

-rw-r--r-- 1 nobody 500   14 Nov 25 17:01 CentOS_BuildTag
-rw-r--r-- 1 nobody 500   29 Nov 25 17:16 .discinfo
drwxr-xr-x 3 nobody 500 8.3M Nov 25 17:20 EFI
-rw-rw-r-- 1 nobody 500  227 Aug 30  2017 EULA
-rw-rw-r-- 1 nobody 500  18K Dec  9  2015 GPL drwxr-xr-x 3 nobody 500
572M Nov 25 17:21 images drwxr-xr-x 2 nobody 500  57M Dec  9 14:11
isolinux drwxr-xr-x 2 nobody 500 433M Nov 25 17:20 LiveOS drwxrwxr-x 2



nobody 500 9.5G Nov 25 16:58 Packages drwxrwxr-x 2 nobody 500  29M Dec
  

9 13:53 repodata
-rw-rw-r-- 1 nobody 500 1.7K Dec  9  2015 RPM-GPG-KEY-CentOS-7
-rw-rw-r-- 1 nobody 500 1.7K Dec  9  2015 RPM-GPG-KEY-CentOS-Testing-7
-rw-r--r-- 1 nobody 500  354 Nov 25 17:21 .treeinfo

These rsyncs are all going fine.

rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
--exclude "isos"
anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/Packages/
/localpath/CentOS7-x86_64/Packages/
rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
--exclude "isos"
anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/repodata/
/localpath/CentOS7-x86_64/repodata/
rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
--exclude "isos"
anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/LiveOS/
/localpath/CentOS7-x86_64/LiveOS/

Except when I try to rsync the file CentOS_BuildTag then everything
stalls. Leaving such files
-rw--- 1 500 500 0 Dec  9 14:26 .CentOS_BuildTag.2igwc5
-rw--- 1 500 500 0 Dec  9 14:28 .CentOS_BuildTag.tkiwc5


So something is failing on the write, it seems.  These are the temporary
files made by rsync, and they're empty, so the initial write seems to
have failed.


I can resolf this by doing a wget and moving the file to the location
wget


'http://mirror.ams1.nl.leaseweb.net/centos/7/os/x86_64/CentOS_BuildTag'

mv CentOS_BuildTag /localpath/CentOS7-x86_64/

I had also problems with .discinfo and when I ls this directory on
cephfs mount it takes a long time to produce output.

When I do the full rsync to the cephfs mount it completes without
errors, when I then later do the sync on the nfs mount it completes
also (nothing being copied)


This confirms that it's not metadata related, as this second successful
rsync is purely metadata.


Anybody know what I should do to resolv this? Is this a typical
ganesha issue or is this cephfs corruption, that make ganesha stall?


Writes in Ganesha are pretty much passthrough, modulo some metadata
tracking.  This means that a write hang is likely to be somewhere
between Ganesha and CephFS.  However, this is a single, small file, so I
don't see how it could hang, especially when wget can copy the file
correctly.  Maybe there's something about the structure of the file?
Does it have holes in it, for example?

Also, can you send the mount options for the NFS mount?

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mounting DR copy as Read-Only

2018-12-12 Thread Wido den Hollander



On 12/12/18 4:44 PM, Vikas Rana wrote:
> Hi,
> 
> We are using Luminous and copying a 100TB RBD image to DR site using RBD
> Mirror.
> 
> Everything seems to works fine.
> 
> The question is, can we mount the DR copy as Read-Only? We can do it on
> Netapp and we are trying to figure out if somehow we can mount it RO on
> DR site, then we can do backups at DR site.
> 
> When i tried it mount it via, RBD-NBD, it complains about super-block.
> When I promoted the DR copy, the same image works fine. So data is there
> but cant be mounted.
> 

What filesystem is being used here? EXT4?

Try mounting it with:

$ mount -o ro,noload /dev/ /tmp/myrbd

Otherwise the FS will try to load the journal and that fails.

Wido

> We also tried taking snap but it complains that it can't take snap of a
> Read-only copy.
> 
> Any suggestion or pointer will be greatly appreciated.
> 
> Thanks,
> -Vikas
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mounting DR copy as Read-Only

2018-12-12 Thread Vikas Rana
Hi,

We are using Luminous and copying a 100TB RBD image to DR site using RBD
Mirror.

Everything seems to works fine.

The question is, can we mount the DR copy as Read-Only? We can do it on
Netapp and we are trying to figure out if somehow we can mount it RO on DR
site, then we can do backups at DR site.

When I tried to mount it via RBD-NBD, it complained about the superblock. When
I promoted the DR copy, the same image worked fine. So the data is there but
can't be mounted.

We also tried taking a snapshot, but it complains that it can't take a snapshot
of a read-only copy.

Any suggestion or pointer will be greatly appreciated.

Thanks,
-Vikas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying an Active/Active NFS Cluster over CephFS

2018-12-12 Thread David C
Hi Jeff

Many thanks for this! Looking forward to testing it out.

Could you elaborate a bit on why Nautilus is recommended for this set-up
please. Would attempting this with a Luminous cluster be a non-starter?



On Wed, 12 Dec 2018, 12:16 Jeff Layton  wrote:

> (Sorry for the duplicate email to ganesha lists, but I wanted to widen
> it to include the ceph lists)
>
> In response to some cries for help over IRC, I wrote up this blog post
> the other day, which discusses how to set up parallel serving over
> CephFS:
>
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> Feel free to comment if you have questions. We may be want to eventually
> turn this into a document in the ganesha or ceph trees as well.
>
> Cheers!
> --
> Jeff Layton 
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-12 Thread Alfredo Deza
On Tue, Dec 11, 2018 at 7:28 PM Tyler Bishop
 wrote:
>
> Now I'm just trying to figure out how to create filestore in Luminous.
> I've read every doc and tried every flag but I keep ending up with
> either a data LV of 100% on the VG or a bunch fo random errors for
> unsupported flags...

An LV with 100% of the VG sounds like it tried to deploy bluestore.
ceph-deploy will try to behave like that unless LVs are created by
hand.

A newer option would be to try the `ceph-volume lvm batch` command on
your server (not yet supported by ceph-deploy) to create all the
vgs/lvs needed, including detection of HDDs and SSDs so that the
journals are sent to an SSD if one is present:

ceph-volume lvm batch --filestore /dev/sda /dev/sdb /dev/sdc

This would create 3 OSDs, one for each spinning drive (assuming these are
spinners), and colocate the journal on the device itself. To put the journal
on a separate device, a solid-state device would need to be added, for example:

ceph-volume lvm batch --filestore /dev/sda /dev/sdb /dev/sdc /dev/nvme0n1

Would create 3 OSDs again, but would put 3 journals on nvme0n1


>
> # ceph-disk prepare --filestore --fs-type xfs --data-dev /dev/sdb1
> --journal-dev /dev/sdb2 --osd-id 3
> usage: ceph-disk [-h] [-v] [--log-stdout] [--prepend-to-path PATH]
>  [--statedir PATH] [--sysconfdir PATH] [--setuser USER]
>  [--setgroup GROUP]
>
>
> {prepare,activate,activate-lockbox,activate-block,activate-journal,activate-all,list,suppress-activate,unsuppress-activate,deactivate,destroy,zap,trigger,fix}
>  ...
> ceph-disk: error: unrecognized arguments: /dev/sdb1
> On Tue, Dec 11, 2018 at 7:22 PM Christian Balzer  wrote:
> >
> >
> > Hello,
> >
> > On Tue, 11 Dec 2018 23:22:40 +0300 Igor Fedotov wrote:
> >
> > > Hi Tyler,
> > >
> > > I suspect you have BlueStore DB/WAL at these drives as well, don't you?
> > >
> > > Then perhaps you have performance issues with f[data]sync requests which
> > > DB/WAL invoke pretty frequently.
> > >
> > Since he explicitly mentioned using these SSDs with filestore AND the
> > journals on the same SSD I'd expect a similar impact aka piss-poor
> > performance in his existing setup (the 300 other OSDs).
> >
> > Unless of course some bluestore is significantly more sync happy than the
> > filestore journal and/or other bluestore particulars (reduced caching
> > space, not caching in some situations) are rearing their ugly heads.
> >
> > Christian
> >
> > > See the following links for details:
> > >
> > > https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/
> > >
> > > https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > >
> > > The latter link shows pretty poor numbers for M500DC drives.
> > >
> > >
> > > Thanks,
> > >
> > > Igor
> > >
> > >
> > > On 12/11/2018 4:58 AM, Tyler Bishop wrote:
> > >
> > > > Older Crucial/Micron M500/M600
> > > > _
> > > >
> > > > *Tyler Bishop*
> > > > EST 2007
> > > >
> > > >
> > > > O:513-299-7108 x1000
> > > > M:513-646-5809
> > > > http://BeyondHosting.net 
> > > >
> > > >
> > > > This email is intended only for the recipient(s) above and/or
> > > > otherwise authorized personnel. The information contained herein and
> > > > attached is confidential and the property of Beyond Hosting. Any
> > > > unauthorized copying, forwarding, printing, and/or disclosing
> > > > any information related to this email is prohibited. If you received
> > > > this message in error, please contact the sender and destroy all
> > > > copies of this email and any attachment(s).
> > > >
> > > >
> > > > On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer  > > > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
> > > >
> > > > > I don't think thats my issue here because I don't see any IO to
> > > > justify the
> > > > > latency.  Unless the IO is minimal and its ceph issuing a bunch
> > > > of discards
> > > > > to the ssd and its causing it to slow down while doing that.
> > > > >
> > > >
> > > > What does atop have to say?
> > > >
> > > > Discards/Trims are usually visible in it, this is during a fstrim 
> > > > of a
> > > > RAID1 / :
> > > > ---
> > > > DSK |  sdb  | busy 81% |  read   0 | write  8587
> > > > | MBw/s 2323.4 |  avio 0.47 ms |
> > > > DSK |  sda  | busy 70% |  read   2 | write  8587
> > > > | MBw/s 2323.4 |  avio 0.41 ms |
> > > > ---
> > > >
> > > > The numbers tend to be a lot higher than what the actual interface 
> > > > is
> > > > capable of, clearly the SSD is reporting its internal activity.
> > > >
> > > > In any case, it should give a good insight of what is going on
> > > > activity
> > > > wise.
> > > > Also for posterity and curiosity, what kind of SSDs?
>

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-12 Thread Alfredo Deza
On Tue, Dec 11, 2018 at 8:16 PM Mark Kirkwood
 wrote:
>
> Looks like the 'delaylog' option for xfs is the problem - no longer supported 
> in later kernels. See 
> https://github.com/torvalds/linux/commit/444a702231412e82fb1c09679adc159301e9242c
>
> Offhand I'm not sure where that option is being added (whether ceph-deploy or 
> ceph-volume), but you could just do surgery on whichever one is adding it...

The default flags that ceph-volume uses for mounting XFS are:

rw,noatime,inode64

These can be overridden by a ceph.conf entry:

osd_mount_options_xfs=rw,noatime,inode64
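
So, as a sketch, if you still want the extra flags from the failing mount
shown earlier, an override without the removed delaylog option should work,
e.g.:

osd_mount_options_xfs = rw,noatime,inode64,noquota,logbsize=256k,logbufs=8,allocsize=4M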
>
> regards
>
> Mark
>
>
> On 12/12/18 1:33 PM, Tyler Bishop wrote:
>>
>>
>>> [osci-1001][DEBUG ] Running command: mount -t xfs -o 
>>> "rw,noatime,noquota,logbsize=256k,logbufs=8,inode64,allocsize=4M,delaylog" 
>>> /dev/ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb/osd-data-81522145-e31b-4325-83fd-6cfefc1b761f
>>>  /var/lib/ceph/osd/ceph-1
>>>
>>> [osci-1001][DEBUG ]  stderr: mount: unsupported option format: 
>>> "rw,noatime,noquota,logbsize=256k,logbufs=8,inode64,allocsize=4M,delaylog"
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] size of inc_osdmap vs osdmap

2018-12-12 Thread Sergey Dolgov
Greq, for example for our cluster ~1000 osd:

size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860,
modified 2018-12-12 04:00:17.661731)
size osdmap.1357882__0_F7FE772D__none = 363KB
size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861,
modified 2018-12-12 04:00:27.385702)
size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB

difference between epoch 1357881 and 1357883: crush weight one osd was
increased by 0.01 so we get 5 new pg_temp in osdmap.1357883 but size
inc_osdmap so huge

On Thu, Dec 6, 2018 at 06:20, Gregory Farnum  wrote:
>
> On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov  wrote:
>>
>> Hi guys
>>
>> I faced strange behavior of crushmap change. When I change crush
>> weight osd I sometimes get  increment osdmap(1.2MB) which size is
>> significantly bigger than size of osdmap(0.4MB)
>
>
> This is probably because when CRUSH changes, the new primary OSDs for a PG 
> will tend to set a "pg temp" value (in the OSDMap) that temporarily reassigns 
> it to the old acting set, so the data can be accessed while the new OSDs get 
> backfilled. Depending on the size of your cluster, the number of PGs on it, 
> and the size of the CRUSH change, this can easily be larger than the rest of 
> the map because it is data with size linear in the number of PGs affected, 
> instead of being more normally proportional to the number of OSDs.
> -Greg
>
>>
>> I use luminois 12.2.8. Cluster was installed a long ago, I suppose
>> that initially it was firefly
>> How can I view content of increment osdmap or can you give me opinion
>> on this problem. I think that spikes of traffic tight after change of
>> crushmap relates to this crushmap behavior
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best regards, Sergey Dolgov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] yet another deep-scrub performance topic

2018-12-12 Thread Vladimir Prokofev
Thank you all for your input.
My best guess at the moment is that deep-scrub performs as it should, and
the issue is that it just has no limits on its performance, so it uses all
the OSD time it can. Even if it has a lower priority than client IO, it can
still fill the disk queue and effectively bottleneck the whole operation.
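
For anyone interested, these are the knobs I'm considering to put some limits
on it (values illustrative only, not tested yet):

[osd]
osd_max_scrubs = 1
osd_scrub_sleep = 0.1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7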

By the way, my installation is 12.2.2, upgraded from the 10.2 release. I use
BlueStore OSDs with block.db/WAL on a separate SSD device (one per 4-6
spinners); the spinners are a mix of 7200 RPM SATA and 15k RPM SAS, and we're
in the process of switching to all 15k.

> find ways to fix that (seperate block.db SSD's for instance might help)
I already have those, but for the sake of argument - how would it help even
in theory? If I'm not mistaken, block.db is related to metadata only, while
deep-scrub operates on the data and has to perform reads on your actual
data OSDs. Deep-scrub has very little to do with metadata; that is what
ordinary scrub is there for. The best I can imagine is that the few reads on
metadata that deep-scrub does would go to the separate device and lower the IO
load on the actual data drive, but that's a tiny droplet in the ocean of IOs it
will still have to perform, so the impact would be negligible.

For now I have unset nodeep-scrub and noscrub on the cluster, and
set nodeep-scrub true for my spinner-based pools. I will wait some time to
see if I get any slow requests or other performance issues without
deep-scrub.
Meanwhile I would be interested to see your performance metrics on spinner
OSDs while deep-scrub is running. Does it consume all available OSD time or
not?

On Tue, Dec 11, 2018 at 15:24, Janne Johansson  wrote:

> On Tue, Dec 11, 2018 at 12:54, Caspar Smit  wrote:
> >
> > On a Luminous 12.2.7 cluster these are the defaults:
> > ceph daemon osd.x config show
>
> thank you very much.
>
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to troubleshoot rsync to cephfs via nfs-ganesha stalling

2018-12-12 Thread Marc Roos
 
Hi Daniel, thanks for looking at this. 

These are the mount options
 type nfs4 
(rw,nodev,relatime,vers=4,intr,local_lock=none,retrans=2,proto=tcp,rsize
=8192,wsize=8192,hard,namlen=255,sec=sys)

I have overwritten the original files, so I cannot examine whether they had 
holes. To be honest I don't even know how to query a file to identify 
holes.
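
(For next time, I guess comparing the allocated blocks to the file size, or
listing the extents, would show it, e.g.:

stat -c '%n size=%s blocks=%b blocksize=%B' CentOS_BuildTag
filefrag -v CentOS_BuildTag

if blocks * blocksize is much smaller than the size, the file is sparse.)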

These are the contents of the files, just plain text.
[@os0 CentOS7-x86_64]# cat CentOS_BuildTag
20181125-1500
[@os0 CentOS7-x86_64]# cat .discinfo
1543162572.807980
7.6
x86_64



-Original Message-
From: Daniel Gryniewicz [mailto:d...@redhat.com] 
Sent: 10 December 2018 15:54
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to troubleshoot rsync to cephfs via 
nfs-ganesha stalling

This isn't something I've seen before.  rsync generally works fine, even 
over cephfs.  More inline.

On 12/09/2018 09:42 AM, Marc Roos wrote:
> 
> 
> This rsync command fails and makes the local nfs unavailable (Have to 
> stop nfs-ganesha, kill all rsync processes on the client and then 
> start
> nfs-ganesha)
> 
> rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
> --exclude "isos"
> anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/
> /localpath/CentOS7-x86_64/
> 
> When I do individual rsyncs on the subfolders
> 
> -rw-r--r-- 1 nobody 500   14 Nov 25 17:01 CentOS_BuildTag
> -rw-r--r-- 1 nobody 500   29 Nov 25 17:16 .discinfo
> drwxr-xr-x 3 nobody 500 8.3M Nov 25 17:20 EFI
> -rw-rw-r-- 1 nobody 500  227 Aug 30  2017 EULA
> -rw-rw-r-- 1 nobody 500  18K Dec  9  2015 GPL drwxr-xr-x 3 nobody 500 
> 572M Nov 25 17:21 images drwxr-xr-x 2 nobody 500  57M Dec  9 14:11 
> isolinux drwxr-xr-x 2 nobody 500 433M Nov 25 17:20 LiveOS drwxrwxr-x 2 

> nobody 500 9.5G Nov 25 16:58 Packages drwxrwxr-x 2 nobody 500  29M Dec 
 
> 9 13:53 repodata
> -rw-rw-r-- 1 nobody 500 1.7K Dec  9  2015 RPM-GPG-KEY-CentOS-7
> -rw-rw-r-- 1 nobody 500 1.7K Dec  9  2015 RPM-GPG-KEY-CentOS-Testing-7
> -rw-r--r-- 1 nobody 500  354 Nov 25 17:21 .treeinfo
> 
> These rsyncs are all going fine.
> 
> rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
> --exclude "isos"
> anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/Packages/
> /localpath/CentOS7-x86_64/Packages/
> rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
> --exclude "isos"
> anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/repodata/
> /localpath/CentOS7-x86_64/repodata/
> rsync -rlptDvSHP --delete  --exclude config.repo --exclude "local*"
> --exclude "isos"
> anonym...@mirror.ams1.nl.leaseweb.net::centos/7/os/x86_64/LiveOS/
> /localpath/CentOS7-x86_64/LiveOS/
> 
> Except when I try to rsync the file CentOS_BuildTag then everything 
> stalls. Leaving such files
> -rw--- 1 500 500 0 Dec  9 14:26 .CentOS_BuildTag.2igwc5
> -rw--- 1 500 500 0 Dec  9 14:28 .CentOS_BuildTag.tkiwc5

So something is failing on the write, it seems.  These are the temporary 
files made by rsync, and they're empty, so the initial write seems to 
have failed.

> I can resolf this by doing a wget and moving the file to the location 
> wget 
> 
'http://mirror.ams1.nl.leaseweb.net/centos/7/os/x86_64/CentOS_BuildTag'
> mv CentOS_BuildTag /localpath/CentOS7-x86_64/
> 
> I had also problems with .discinfo and when I ls this directory on 
> cephfs mount it takes a long time to produce output.
> 
> When I do the full rsync to the cephfs mount it completes without 
> errors, when I then later do the sync on the nfs mount it completes 
> also (nothing being copied)

This confirms that it's not metadata related, as this second successful 
rsync is purely metadata.

> Anybody know what I should do to resolv this? Is this a typical 
> ganesha issue or is this cephfs corruption, that make ganesha stall?

Writes in Ganesha are pretty much passthrough, modulo some metadata 
tracking.  This means that a write hang is likely to be somewhere 
between Ganesha and CephFS.  However, this is a single, small file, so I 
don't see how it could hang, especially when wget can copy the file 
correctly.  Maybe there's something about the structure of the file? 
Does it have holes in it, for example?
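
(A quick way to check that — a sketch only; the path below is just the file 
from the thread, and a file with far fewer allocated blocks than its apparent 
size is sparse:)

# compare apparent size with allocated blocks; blocks*blocksize much smaller than size suggests holes
stat -c 'size=%s blocks=%b blocksize=%B' /localpath/CentOS7-x86_64/CentOS_BuildTag
# or list the extents explicitly
filefrag -v /localpath/CentOS7-x86_64/CentOS_BuildTag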

Also, can you send the mount options for the NFS mount?
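
(For reference, either of these shows the effective options of an NFS mount — 
a generic sketch, nothing specific to this setup assumed:)

nfsstat -m
# or
findmnt -t nfs,nfs4 -o TARGET,SOURCE,OPTIONS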

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] move directories in cephfs

2018-12-12 Thread Zhenshi Zhou
Hi

Thanks for the explanation.

I did a test few moments ago.  Everything goes just like what I expect.

Thanks for your helps :)

Konstantin Shalygin  wrote on Wed, Dec 12, 2018 at 4:57 PM:

> Hi
>
> That means the 'mv' operation can be done if src and dst
> are in the same pool, and the client should have the same permissions
> on both src and dst.
>
> Do I have the right understanding?
>
> Yes.
>
>
>
> k
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] civetweb segfaults

2018-12-12 Thread Leon Robinson
That did the trick. We had it set to 0 just on the swift rgw definitions, 
although it was set on the other rgw services; I'm guessing someone must have 
thought there was a different precedence in play in the past.
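
(For anyone checking their own gateways, a quick sketch of how to read the 
value a running radosgw is actually using — the daemon name 
client.rgw.gateway1 is only an example and must match your admin socket:)

# on the radosgw host, query the effective setting over the admin socket
ceph daemon client.rgw.gateway1 config get rgw_gc_max_objs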

On Tue, 2018-12-11 at 11:41 -0500, Casey Bodley wrote:

Hi Leon,

Are you running with a non-default value of rgw_gc_max_objs? I was able
to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I
can't think of any other way to get a 'Floating point exception' here.


On 12/11/18 10:31 AM, Leon Robinson wrote:

Hello, I have found a surefire way to bring down our swift gateways.

First, upload a bunch of large files split into segments, e.g.:

for i in {1..100}; do swift upload test_container -S 10485760 CentOS-7-x86_64-GenericCloud.qcow2 --object-name CentOS-7-x86_64-GenericCloud.qcow2-$i; done

This creates 100 objects in test_container and 1000 or so objects in test_container_segments.

Then delete them, preferably in a ludicrous manner:

for i in $(swift list test_container); do swift delete test_container $i; done

What results is:


 -13> 2018-12-11 15:17:57.627655 7fc128b49700  1 -- 172.28.196.121:0/464072497 <== osd.480 172.26.212.6:6802/2058882 1  osd_op_reply(11 .dir.default.1083413551.2.7 [call,call] v1423252'7548804 uv7548804 ondisk = 0) v8  213+0+0 (3895049453 0 0) 0x55c98f45e9c0 con 0x55c98f4d7800
 -12> 2018-12-11 15:17:57.627827 7fc0e3ffe700  1 -- 172.28.196.121:0/464072497 --> 172.26.221.7:6816/2366816 -- osd_op(unknown.0.0:12 14.110b 14:d08c26b8:::default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10%2f1532606905.440697%2f938016768%2f10485760%2f0037:head [cmpxattr user.rgw.idtag (25) op 1 mode 1,call rgw.obj_remove] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f4603c0 con 0
 -11> 2018-12-11 15:17:57.628582 7fc128348700  5 -- 172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=540 cs=1 l=1). rx osd.87 seq 2 0x55c98f4603c0 osd_op_reply(340 obj_delete_at_hint.55 [call] v1423252'9217746 uv9217746 ondisk = 0) v8
 -10> 2018-12-11 15:17:57.628604 7fc128348700  1 -- 172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 2  osd_op_reply(340 obj_delete_at_hint.55 [call] v1423252'9217746 uv9217746 ondisk = 0) v8  173+0+0 (3971813511 0 0) 0x55c98f4603c0 con 0x55c98f0eb000
  -9> 2018-12-11 15:17:57.628760 7fc1017f9700  1 -- 172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- osd_op(unknown.0.0:341 13.4f 13:f3db1134:::obj_delete_at_hint.55:head [call timeindex.list] snapc 0=[] ondisk+read+known_if_redirected e1423252) v8 -- 0x55c98f45fa00 con 0
  -8> 2018-12-11 15:17:57.629306 7fc128348700  5 -- 172.28.196.121:0/157062182 >> 172.26.225.9:6828/2257653 conn(0x55c98f0eb000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=540 cs=1 l=1). rx osd.87 seq 3 0x55c98f45fa00 osd_op_reply(341 obj_delete_at_hint.55 [call] v0'0 uv9217746 ondisk = 0) v8
  -7> 2018-12-11 15:17:57.629326 7fc128348700  1 -- 172.28.196.121:0/157062182 <== osd.87 172.26.225.9:6828/2257653 3  osd_op_reply(341 obj_delete_at_hint.55 [call] v0'0 uv9217746 ondisk = 0) v8  173+0+15 (3272189389 0 2149983739) 0x55c98f45fa00 con 0x55c98f0eb000
  -6> 2018-12-11 15:17:57.629398 7fc128348700  5 -- 172.28.196.121:0/464072497 >> 172.26.221.7:6816/2366816 conn(0x55c98f4d6000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=181 cs=1 l=1). rx osd.58 seq 2 0x55c98f45fa00 osd_op_reply(12 default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/0037 [cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) v8
  -5> 2018-12-11 15:17:57.629418 7fc128348700  1 -- 172.28.196.121:0/464072497 <== osd.58 172.26.221.7:6816/2366816 2  osd_op_reply(12 default.1083413551.2_CentOS-7-x86_64-GenericCloud.qcow2-10/1532606905.440697/938016768/10485760/0037 [cmpxattr (25) op 1 mode 1,call] v1423252'743755 uv743755 ondisk = 0) v8  290+0+0 (3763879162 0 0) 0x55c98f45fa00 con 0x55c98f4d6000
  -4> 2018-12-11 15:17:57.629458 7fc1017f9700  1 -- 172.28.196.121:0/157062182 --> 172.26.225.9:6828/2257653 -- osd_op(unknown.0.0:342 13.4f 13:f3db1134:::obj_delete_at_hint.55:head [call lock.unlock] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f45fd40 con 0
  -3> 2018-12-11 15:17:57.629603 7fc0e3ffe700  1 -- 172.28.196.121:0/464072497 --> 172.26.212.6:6802/2058882 -- osd_op(unknown.0.0:13 15.1e0 15:079bdcbb:::.dir.default.1083413551.2.7:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e1423252) v8 -- 0x55c98f460700 con 0
  -2> 2018-12-11 15:17:57.631312 7fc128b49700  5 -- 172.28.196.121:0/464072497 >> 172.26.212.6:6802/2058882 conn(0x55c98f4d7800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=202 cs=1 l=1).

Re: [ceph-users] move directories in cephfs

2018-12-12 Thread Konstantin Shalygin

Hi

That means the 'mv' operation can be done if src and dst
are in the same pool, and the client should have the same permissions
on both src and dst.

Do I have the right understanding?


Yes.
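
(Just to illustrate the permission side — a hedged sketch only, with made-up 
client and directory names and assuming the filesystem is called cephfs; on 
Luminous, `ceph fs authorize` can grant one client rw caps on both paths:)

# grant client.mover rw on both the source and destination directories
ceph fs authorize cephfs client.mover /dir_src rw /dir_dst rw
# per the thread: src and dst should also live in the same data pool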



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE: ceph pg backfill_toofull

2018-12-12 Thread Klimenko, Roman
Ok, I'll try these params. Thx!


From: Maged Mokhtar 
Sent: 12 December 2018 10:51
To: Klimenko, Roman; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph pg backfill_toofull



There are 2 relevant params:

mon_osd_full_ratio 0.95
osd_backfill_full_ratio 0.85

You are probably hitting them both. As a short-term/temporary fix you may 
increase these values and maybe adjust weights on OSDs if you have to.
However, you really need to fix this by adding more OSDs to your cluster, else 
it will happen again and again. Also, when planning for required storage 
capacity, you should plan for when 1 or 2 hosts fail and their pgs get 
distributed on the remaining nodes, else you will hit the same issue.
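
(For example — a rough sketch of the temporary bump only; 0.90 is just an 
illustrative value, so choose it carefully and revert once capacity has been 
added:)

# runtime-only change, not persisted across OSD restarts
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
# verify on one OSD (run on the node hosting osd.0)
ceph daemon osd.0 config get osd_backfill_full_ratio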

/Maged




On 12/12/2018 07:52, Klimenko, Roman wrote:

Hi everyone. Yesterday I found that on our overcrowded Hammer ceph cluster (83% 
used in the HDD pool) several osds were in the danger zone - near 95%.

I reweighted them, and after several moments I got pgs stuck in 
backfill_toofull.

After that, I reapplied the reweight to the osds - no luck.

Currently, all reweights are equal to 1.0, and ceph does nothing - no rebalance 
or recovery.

How can I make ceph recover these pgs?

ceph -s

 health HEALTH_WARN
47 pgs backfill_toofull
47 pgs stuck unclean
recovery 16/9422472 objects degraded (0.000%)
recovery 365332/9422472 objects misplaced (3.877%)
7 near full osd(s)

ceph osd df tree
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL %USE  VAR  TYPE NAME
-1 30.65996- 37970G 29370G 8599G 77.35 1.00 root default
-6 18.65996- 20100G 16681G 3419G 82.99 1.07 region HDD
-3  6.09000-  6700G  5539G 1160G 82.68 1.07 host ceph03.HDD
 1  1.0  1.0  1116G   841G  274G 75.39 0.97 osd.1
 5  1.0  1.0  1116G   916G  200G 82.07 1.06 osd.5
 3  1.0  1.0  1116G   939G  177G 84.14 1.09 osd.3
 8  1.09000  1.0  1116G   952G  164G 85.29 1.10 osd.8
 7  1.0  1.0  1116G   972G  143G 87.11 1.13 osd.7
11  1.0  1.0  1116G   916G  200G 82.08 1.06 osd.11
-4  6.16998-  6700G  5612G 1088G 83.76 1.08 host ceph02.HDD
14  1.09000  1.0  1116G   950G  165G 85.16 1.10 osd.14
13  0.8  1.0  1116G   949G  167G 85.03 1.10 osd.13
16  1.09000  1.0  1116G   921G  195G 82.50 1.07 osd.16
17  1.0  1.0  1116G   899G  216G 80.59 1.04 osd.17
10  1.09000  1.0  1116G   952G  164G 85.28 1.10 osd.10
15  1.0  1.0  1116G   938G  178G 84.02 1.09 osd.15
-2  6.39998-  6700G  5529G 1170G 82.53 1.07 host ceph01.HDD
12  1.09000  1.0  1116G   953G  163G 85.39 1.10 osd.12
 9  0.95000  1.0  1116G   939G  177G 84.14 1.09 osd.9
 2  1.09000  1.0  1116G   911G  204G 81.64 1.06 osd.2
 0  1.09000  1.0  1116G   951G  165G 85.22 1.10 osd.0
 6  1.09000  1.0  1116G   917G  199G 82.12 1.06 osd.6
 4  1.09000  1.0  1116G   856G  260G 76.67 0.99 osd.4









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Maged Mokhtar
CEO PetaSAN
4 Emad El Deen Kamel
Cairo 11371, Egypt
www.petasan.org
+201006979931
skype: maged.mokhtar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds lost very frequently

2018-12-12 Thread Sang, Oliver
We are using luminous; we have seven ceph nodes and have set them all up as MDS.
Recently the MDS has been lost very frequently, and when there is only one MDS left, the 
cephfs just degrades to unusable.

Checked the mds log in one ceph node, I found below
>
/build/ceph-12.2.8/src/mds/Locker.cc: 5076: FAILED assert(lock->get_state() == 
LOCK_PRE_SCAN)

ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) 
[0x564400e50e42]
2: (Locker::file_recover(ScatterLock*)+0x208) [0x564400c6ae18]
3: (MDCache::start_files_to_recover()+0xb3) [0x564400b98af3]
4: (MDSRank::clientreplay_start()+0x1f7) [0x564400ae04c7]
5: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x25c0) 
[0x564400aefd40]
6: (MDSDaemon::handle_mds_map(MMDSMap*)+0x154d) [0x564400ace3bd]
7: (MDSDaemon::handle_core_message(Message*)+0x7f3) [0x564400ad1273]
8: (MDSDaemon::ms_dispatch(Message*)+0x1c3) [0x564400ad15a3]
9: (DispatchQueue::entry()+0xeda) [0x5644011a547a]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564400ee3fcd]
11: (()+0x7494) [0x7f7a2b106494]
12: (clone()+0x3f) [0x7f7a2a17eaff]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
<

The full log is also attached. Could you please help us? Thanks!

BR
Oliver

<>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com