Re: [ceph-users] Slow OPS

2019-03-21 Thread Brad Hubbard
A repop is (mostly) a sub-operation sent from a primary OSD to its replicas.

That op only shows a duration of 1.3 seconds and the delay you
mentioned previously was under a second. Do you see larger delays? Are
they always between "sub_op_committed" and "commit_sent"?

What is your workload and how heavily utilised is your
cluster/network? How hard are the underlying disks working?
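
If it helps, a rough way to look at both from the OSD side (a sketch; the OSD
id is a placeholder and the sampling interval is arbitrary):

  # completed slow ops with per-event timings, and ops currently blocked
  ceph daemon osd.<id> dump_historic_ops
  ceph daemon osd.<id> dump_ops_in_flight

  # how hard the underlying disks on that host are working
  iostat -x 5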

On Thu, Mar 21, 2019 at 4:11 PM Glen Baars  wrote:
>
> Hello Brad,
>
> It doesn't seem to be a set of OSDs, the cluster has 160ish OSDs over 9 hosts.
>
> I also seem to get a lot of these ops that don't show a client.
>
> "description": "osd_repop(client.14349712.0:4866968 15.36 
> e30675/22264 15:6dd17247:::rbd_data.2359ef6b8b4567.0042766
> a:head v 30675'5522366)",
> "initiated_at": "2019-03-21 16:51:56.862447",
> "age": 376.527241,
> "duration": 1.331278,
>
> Kind regards,
> Glen Baars
>
> -Original Message-
> From: Brad Hubbard 
> Sent: Thursday, 21 March 2019 1:43 PM
> To: Glen Baars 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Slow OPS
>
> Actually, the lag is between "sub_op_committed" and "commit_sent". Is there 
> any pattern to these slow requests? Do they involve the same osd, or set of 
> osds?
>
> On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard  wrote:
> >
> > On Thu, Mar 21, 2019 at 3:20 PM Glen Baars  
> > wrote:
> > >
> > > Thanks for that - we seem to be experiencing the wait in this section of 
> > > the ops.
> > >
> > > {
> > > "time": "2019-03-21 14:12:42.830191",
> > > "event": "sub_op_committed"
> > > },
> > > {
> > > "time": "2019-03-21 14:12:43.699872",
> > > "event": "commit_sent"
> > > },
> > >
> > > Does anyone know what that section is waiting for?
> >
> > Hi Glen,
> >
> > These are documented, to some extent, here.
> >
> > http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting
> > -osd/
> >
> > It looks like it may be taking a long time to communicate the commit
> > message back to the client? Are these slow ops always the same client?
> >
> > >
> > > Kind regards,
> > > Glen Baars
> > >
> > > -Original Message-
> > > From: Brad Hubbard 
> > > Sent: Thursday, 21 March 2019 8:23 AM
> > > To: Glen Baars 
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Slow OPS
> > >
> > > On Thu, Mar 21, 2019 at 12:11 AM Glen Baars  
> > > wrote:
> > > >
> > > > Hello Ceph Users,
> > > >
> > > >
> > > >
> > > > Does anyone know what the flag point ‘Started’ is? Is that ceph osd 
> > > > daemon waiting on the disk subsystem?
> > >
> > > This is set by "mark_started()" and is roughly set when the pg starts 
> > > processing the op. Might want to capture dump_historic_ops output after 
> > > the op completes.
> > >
> > > >
> > > >
> > > >
> > > > Ceph 13.2.4 on centos 7.5
> > > >
> > > >
> > > >
> > > > "description": "osd_op(client.1411875.0:422573570
> > > > 5.18ds0
> > > > 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read
> > > >
> > > > 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected
> > > > e30622)",
> > > >
> > > > "initiated_at": "2019-03-21 01:04:40.598438",
> > > >
> > > > "age": 11.340626,
> > > >
> > > > "duration": 11.342846,
> > > >
> > > > "type_data": {
> > > >
> > > > "flag_point": "started",
> > > >
> > > > "client_info": {
> > > >
> > > > "client": "client.1411875",
> > > >
> > > > "client_addr": "10.4.37.45:0/627562602",
> > > >
> > > > "tid": 422573570
> > > >
> > > > },
> > > >
> > > > "events": [
> > > >
> > > > {
> > > >
> > > > "time": "2019-03-21 01:04:40.598438",
> > > >
> > > > "event": "initiated"
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > > "time": "2019-03-21 01:04:40.598438",
> > > >
> > > > "event": "header_read"
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > > "time": "2019-03-21 01:04:40.598439",
> > > >
> > > > "event": "throttled"
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > > "time": "2019-03-21 01:04:40.598450",
> > > >
> > > > "event": "all_read"
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > > "time": "2019-03-21 01:04:40.598499",
> > > >
> > > > "event": "dispatched"
> > > >
> > > > },
> > > >
> > > > {
> > > >
> > > >  

Re: [ceph-users] Access cephfs from second public network

2019-03-21 Thread Paul Emmerich
Clients also need to talk directly to the OSDs and MDS servers, not just the mons, so proxying only the mon port will not be enough.
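
A quick way to see which addresses your clients must be able to reach (a
sketch using standard ceph CLI commands, run from any admin node):

  ceph mon dump                   # monitor addresses
  ceph osd dump | grep 'osd\.'    # OSD addresses on the public network
  ceph fs dump                    # MDS addresses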


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Mar 21, 2019 at 1:02 PM Andres Rojas Guerrero  wrote:
>
>
> Hi all, we have deployed a Ceph cluster configured with two public networks:
>
> [global]
> cluster network = 10.100.188.0/23
> fsid = 88f62260-b8de-499f-b6fe-5eb66a967083
> mon host = 10.100.190.9,10.100.190.10,10.100.190.11
> mon initial members = mon1,mon2,mon3
> osd_pool_default_pg_num = 4096
> public network = 10.100.190.0/23,10.100.40.0/21
>
> Our problem is that we need to access cephfs from clients on the second
> public network. For this we have deployed a haproxy system in transparent
> mode so that clients on the second network can reach the mon (ceph-mon
> process, TCP port 6789) running on the first public network
> (10.100.190.0/23). In the haproxy configuration we have a frontend on the
> second public network and the backend on the mon network:
>
> frontend cephfs_mon
>
> timeout client  600
> mode tcp
> bind 10.100.47.207:6789 transparent
>
> default_backend ceph1_mon
>
> backend ceph1_mon
>
> timeout connect 5000
> source 0.0.0.0 usesrc clientip
> server mon1 10.100.190.9:6789 check
>
>
> Then we try to mount a cephfs from the client in the second public
> network but we have a timeout:
>
>
> mount -t ceph 10.100.47.207:6789:/ /mnt/cephfs -o
> name=cephfs,secret=AQBOJ5JcXFJAIxAAs4+CBliifhBAD927K9Qaig==
>
> mount: mount 10.100.47.207:6789:/ on /mnt/cephfs failed: Expired
> connection time
>
> I see the traffic back and forth from the client-haproxy-mon system.
>
> Otherwise, if the client is on the first public network, we have no
> problem accessing the cephfs resource.
>
> Does anybody have experience with this situation?
>
> Thank you very much.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: CEPH ISCSI LIO multipath change delay

2019-03-21 Thread Mike Christie
On 03/21/2019 11:27 AM, Maged Mokhtar wrote:
> 
> Though I do not recommend changing it, if there is a need to lower
> fast_io_fail_tmo, then the osd_heartbeat_interval + osd_heartbeat_grace sum
> needs to be lowered as well; their default sum is 25 sec, which I assume is
> why fast_io_fail_tmo is set to that value. You would want your higher-layer
> timeouts to be equal to or larger than those of the layers below.

Yeah, fast_io_fail_tmo is set to 25 to make sure the target has detected
the initiator has marked the path as down and has done its cleanup.

If you set that multipath timer lower you have to set the target side
nops lower. When I can get those userspace flush patches merged
upstream, then we do not have to rely on the kernel based nops to flush
things and we can set fast_io_fail as low as we want (ignoring the ceph
side timeouts I mean).
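
For anyone following along, a minimal sketch of where that timer lives on the
initiator side (only the fast_io_fail_tmo line is shown; the rest of the
device stanza should be copied from the iscsi-initiator-linux docs linked
below, not from this excerpt):

  # check the value currently in effect
  multipathd show config | grep fast_io_fail

  # /etc/multipath.conf
  devices {
          device {
                  vendor            "LIO-ORG"
                  fast_io_fail_tmo  25
                  # remaining attributes omitted -- see the linked docs
          }
  }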


> 
> /Maged
> 
> 
> On 21/03/2019 17:07, Jason Dillaman wrote:
>> It's just the design of the iSCSI protocol. Sure, you can lower the
>> timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
>> false-positive failovers.
>>
>> [1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/
>>
>> On Thu, Mar 21, 2019 at 10:46 AM li jerry  wrote:
>>> Hi Maged
>>>
>>> thank you for your reply.
>>>
>>> To exclude the osd_heartbeat_interval and osd_heartbeat_grace
>>> factors, I cleared the current lio configuration, redeployed two
> >>> CentOS 7 nodes (not in any ceph role), and deployed rbd-target-api,
> >>> rbd-target-gw and tcmu-runner on them.
>>>
>>> And do the following test
>>> 1. centos7 client mounts iscsi lun
>>> 2, write data to iscsi lun through dd
>>> 3. Close the target node that is active. (forced power off)
>>>
>>> [18:33:48 ] active target node power off
>>> [18:33:57] centos7 client found iscsi target interrupted
>>> [18:34:23] centos7 client converts to another target node
>>>
>>>
>>> The whole process lasted for 35 seconds, and ceph was always healthy
>>> during the test.
>>>
> >>> This failover time is too long for production use. Is there anything
> >>> more I can optimize?
>>>
>>>
>>> Below is the centos7 client log [messages]
>>> 
>>>
>>> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout
>>> of 5 secs expired, recv timeout 5, last rx 4409486146, last ping
>>> 4409491148, now 4409496160
>>> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected
>>> conn error (1022)
>>> Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI
>>> connection 4:0 error (1022 - Invalid or unknown error code) state (3)
>>> Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery
>>> timed out after 25 secs
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED
>>> Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB:
>>> Write(10) 2a 00 00 23 fd 00 00 00 80 00
>>> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O
>>> error, dev sda, sector 2358528
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> request
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O
>>> to offline device
>>> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing
>>> 

Re: [ceph-users] objects degraded higher than 100%

2019-03-21 Thread Simon Ironside

This behaviour is still an issue in mimic 13.2.5 and nautilus 14.2.0.
I've logged https://tracker.ceph.com/issues/38841 for this. Apologies if 
this has already been done.


Simon

On 06/03/2019 20:17, Simon Ironside wrote:
Yes, as I said that bug is marked resolved. It's also marked as only 
affecting jewel and luminous.

I'm pointing out that it's still an issue today in mimic 13.2.4.

Simon

On 06/03/2019 16:04, Darius Kasparavičius wrote:

For some reason I didn't notice that number.

But it's most likely you are hitting this or similar bug: 
https://tracker.ceph.com/issues/21803




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: CEPH ISCSI LIO multipath change delay

2019-03-21 Thread Maged Mokhtar


Though I do not recommend changing it, if there is a need to lower
fast_io_fail_tmo, then the osd_heartbeat_interval + osd_heartbeat_grace sum
needs to be lowered as well; their default sum is 25 sec, which I assume is
why fast_io_fail_tmo is set to that value. You would want your higher-layer
timeouts to be equal to or larger than those of the layers below.
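
If you do go down that path, the values currently in effect can be checked
per OSD, e.g. (a sketch, assuming the default admin socket and an arbitrary
OSD id):

  ceph daemon osd.0 config show | grep -E 'osd_heartbeat_(interval|grace)'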


/Maged


On 21/03/2019 17:07, Jason Dillaman wrote:

It's just the design of the iSCSI protocol. Sure, you can lower the
timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
false-positive failovers.

[1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/

On Thu, Mar 21, 2019 at 10:46 AM li jerry  wrote:

Hi Maged

thank you for your reply.

To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I
cleared the current LIO configuration, redeployed two CentOS 7 nodes (not in
any ceph role), and deployed rbd-target-api, rbd-target-gw and tcmu-runner on them.

And do the following test
1. centos7 client mounts iscsi lun
2, write data to iscsi lun through dd
3. Close the target node that is active. (forced power off)

[18:33:48 ] active target node power off
[18:33:57] centos7 client found iscsi target interrupted
[18:34:23] centos7 client converts to another target node


The whole process lasted for 35 seconds, and ceph was always healthy during the 
test.

This failover time is too long for production use. Is there anything more I
can optimize?


Below is the centos7 client log [messages]


Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs 
expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 
4409496160
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error 
(1022)
Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 
error (1022 - Invalid or unknown error code) state (3)
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out 
after 25 secs
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 
00 00 23 fd 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev 
sda, sector 2358528
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 

Re: [ceph-users] ceph-volume lvm batch OSD replacement

2019-03-21 Thread Dan van der Ster
On Thu, Mar 21, 2019 at 12:14 PM Eugen Block  wrote:
>
> Hi Dan,
>
> I don't know about keeping the osd-id but I just partially recreated
> your scenario. I wiped one OSD and recreated it. You are trying to
> re-use the existing block.db-LV with the device path (--block.db
> /dev/vg-name/lv-name) instead the lv notation (--block.db
> vg-name/lv-name):
>
> > # ceph-volume lvm create --data /dev/sdq --block.db
> > /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > --osd-id 240
>
> This fails in my test, too. But if I use the LV notation it works:
>
> ceph-2:~ # ceph-volume lvm create --data /dev/sda --block.db
> ceph-journals/journal-osd3
> [...]
> Running command: /bin/systemctl enable --runtime ceph-osd@3
> Running command: /bin/systemctl start ceph-osd@3
> --> ceph-volume lvm activate successful for osd ID: 3
> --> ceph-volume lvm create successful for: /dev/sda
>

Yes that's it! Worked for me too.

Thanks!

Dan


> This is a Nautilus test cluster, but I remember having this on a
> Luminous cluster, too. I hope this helps.
>
> Regards,
> Eugen
>
>
> Zitat von Dan van der Ster :
>
> > On Tue, Mar 19, 2019 at 12:25 PM Dan van der Ster  
> > wrote:
> >>
> >> On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza  wrote:
> >> >
> >> > On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza  wrote:
> >> > >
> >> > > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster
> >>  wrote:
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > We've just hit our first OSD replacement on a host created with
> >> > > > `ceph-volume lvm batch` with mixed hdds+ssds.
> >> > > >
> >> > > > The hdd /dev/sdq was prepared like this:
> >> > > ># ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
> >> > > >
> >> > > > Then /dev/sdq failed and was then zapped like this:
> >> > > >   # ceph-volume lvm zap /dev/sdq --destroy
> >> > > >
> >> > > > The zap removed the pv/vg/lv from sdq, but left behind the db on
> >> > > > /dev/sdac (see P.S.)
> >> > >
> >> > > That is correct behavior for the zap command used.
> >> > >
> >> > > >
> >> > > > Now we're replaced /dev/sdq and we're wondering how to proceed. We 
> >> > > > see
> >> > > > two options:
> >> > > >   1. reuse the existing db lv from osd.240 (Though the osd fsid will
> >> > > > change when we re-create, right?)
> >> > >
> >> > > This is possible but you are right that in the current state, the FSID
> >> > > and other cluster data exist in the LV metadata. To reuse this LV for
> >> > > a new (replaced) OSD
> >> > > then you would need to zap the LV *without* the --destroy flag, which
> >> > > would clear all metadata on the LV and do a wipefs. The command would
> >> > > need the full path to
> >> > > the LV associated with osd.240, something like:
> >> > >
> >> > > ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
> >> > >
> >> > > >   2. remove the db lv from sdac then run
> >> > > > # ceph-volume lvm batch /dev/sdq /dev/sdac
> >> > > >  which should do the correct thing.
> >> > >
> >> > > This would also work if the db lv is fully removed with --destroy
> >> > >
> >> > > >
> >> > > > This is all v12.2.11 btw.
> >> > > > If (2) is the prefered approached, then it looks like a bug that the
> >> > > > db lv was not destroyed by lvm zap --destroy.
> >> > >
> >> > > Since /dev/sdq was passed in to zap, just that one device was removed,
> >> > > so this is working as expected.
> >> > >
> >> > > Alternatively, zap has the ability to destroy or zap LVs associated
> >> > > with an OSD ID. I think this is not released yet for Luminous but
> >> > > should be in the next release (which seems to be what you want)
> >> >
> >> > Seems like 12.2.11 was released with the ability to zap by OSD ID. You
> >> > can also zap by OSD FSID, both way will zap (and optionally destroy if
> >> > using --destroy)
> >> > all LVs associated with the OSD.
> >> >
> >> > Full examples on this can be found here:
> >> >
> >> > http://docs.ceph.com/docs/luminous/ceph-volume/lvm/zap/#removing-devices
> >> >
> >> >
> >>
> >> Ohh that's an improvement! (Our goal is outsourcing the failure
> >> handling to non-ceph experts, so this will help simplify things.)
> >>
> >> In our example, the operator needs to know the osd id, then can do:
> >>
> >> 1. ceph-volume lvm zap --destroy --osd-id 240 (wipes sdq and removes
> >> the lvm from sdac for osd.240)
> >> 2. replace the hdd
> >> 3. ceph-volume lvm batch /dev/sdq /dev/sdac --osd-ids 240
> >>
> >> But I just remembered that the --osd-ids flag hasn't been backported
> >> to luminous, so we can't yet do that. I guess we'll follow the first
> >> (1) procedure to re-use the existing db lv.
> >
> > Hmm... re-using the db lv didn't work.
> >
> > We zapped it (see https://pastebin.com/N6PwpbYu) then got this error
> > when trying to create:
> >
> > # ceph-volume lvm create --data /dev/sdq --block.db
> > /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > --osd-id 240
> > Running command: 

Re: [ceph-users] cephfs manila snapshots best practices

2019-03-21 Thread Tom Barron

On 21/03/19 16:15 +0100, Dan van der Ster wrote:

On Thu, Mar 21, 2019 at 1:50 PM Tom Barron  wrote:


On 20/03/19 16:33 +0100, Dan van der Ster wrote:
>Hi all,
>
>We're currently upgrading our cephfs (managed by OpenStack Manila)
>clusters to Mimic, and want to start enabling snapshots of the file
>shares.
>There are different ways to approach this, and I hope someone can
>share their experiences with:
>
>1. Do you give users the 's' flag in their cap, so that they can
>create snapshots themselves? We're currently planning *not* to do this
>-- we'll create snapshots for the users.
>2. We want to create periodic snaps for all cephfs volumes. I can see
>pros/cons to creating the snapshots in /volumes/.snap or in
>/volumes/_nogroup//.snap. Any experience there? Or maybe even
>just an fs-wide snap in /.snap is the best approach ?
>3. I found this simple cephfs-snap script which should do the job:
>http://images.45drives.com/ceph/cephfs/cephfs-snap  Does anyone have a
>different recommendation?
>
>Thanks!
>
>Dan
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Dan,

Manila of course provides users with self-service file share snapshot
capability with quota control of the snapshots.  I'm sure you are
aware of this but just wanted to get it on record in this thread.

Snapshots are not enabled by default for cephfs native or cephfs with
nfs in Manila because cephfs snapshots were experimental when the
cephfs driver was added and we maintain backwards compatibility in
the Manila configuration.  To enable, one sets:

   cephfs_enable_snapshots = True

in the configuration stanza for cephfsnative or cephfsnfs back end.

Also, the ``share_type`` referenced when creating shares (either
explicitly or the default one) needs to have the snapshot_support
capability enabled -- e.g. the cloud admin would (one time) issue a
command like the following:

  $ manila type-key  set snapshot_support=True

With this approach either the user or the administrator can create
snapshots of file shares.

Dan, I expect you have your reasons for choosing to control snapshots
via a script that calls cephfs-snap directly rather than using Manila
-- and of course that's fine -- but if you'd share them it will help
us Manila developers consider whether there are use cases that we are
not currently addressing that we should consider.



Hi Tom, Thanks for the detailed response.
The majority of our users are coming from ZFS/NFS Filers, where
they've gotten used to zfs-auto-snapshots, which we create for them
periodically with some retention. So accidental deletions or
overwrites are never a problem because they can quickly access
yesterday's files.
So our initial idea was to replicate this with CephFS/Manila.
I hadn't thought of using the Manila managed snapshots for these
auto-snaps -- it is indeed another option. Have you already considered
Manila-managed auto-snapshots?


I've added this topic to our etherpad list [1] for the upcoming PTG.  
Today one could make a script that interacts with the manila API to 
create snaps periodically but they would be done one-by-one (even if 
in parallel) rather than atomically for the whole file system or at
the root of all the users' shares.
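
To make the contrast concrete, a minimal sketch (the share id, snapshot names
and mount point are placeholders, not taken from any real deployment):

  # per-share, via the Manila API -- one call per share
  manila snapshot-create <share-id> --name auto-20190321

  # fs-wide, directly on CephFS -- one atomic snapshot at the chosen root
  mkdir /mnt/cephfs/volumes/.snap/auto-20190321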


Please feel free to adjust the way I've framed the issue in that
etherpad so that it is suitable.




Otherwise, I wonder if CephFS would work well with both the fs-wide
auto-snaps *and* user-managed Manila snapshots. Has anyone tried such
a thing?


I haven't and would be interested in hearing as well.  Manila also 
supports NetApp and ZFS back ends so I'll ask more generally as well 
as for CephFS.


-- Tom

[1] https://etherpad.openstack.org/p/manila-denver-train-ptg-planning


Thanks!

dan




Thanks,

-- Tom Barron



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs manila snapshots best practices

2019-03-21 Thread Dan van der Ster
On Thu, Mar 21, 2019 at 1:50 PM Tom Barron  wrote:
>
> On 20/03/19 16:33 +0100, Dan van der Ster wrote:
> >Hi all,
> >
> >We're currently upgrading our cephfs (managed by OpenStack Manila)
> >clusters to Mimic, and want to start enabling snapshots of the file
> >shares.
> >There are different ways to approach this, and I hope someone can
> >share their experiences with:
> >
> >1. Do you give users the 's' flag in their cap, so that they can
> >create snapshots themselves? We're currently planning *not* to do this
> >-- we'll create snapshots for the users.
> >2. We want to create periodic snaps for all cephfs volumes. I can see
> >pros/cons to creating the snapshots in /volumes/.snap or in
> >/volumes/_nogroup//.snap. Any experience there? Or maybe even
> >just an fs-wide snap in /.snap is the best approach ?
> >3. I found this simple cephfs-snap script which should do the job:
> >http://images.45drives.com/ceph/cephfs/cephfs-snap  Does anyone have a
> >different recommendation?
> >
> >Thanks!
> >
> >Dan
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> Dan,
>
> Manila of course provides users with self-service file share snapshot
> capability with quota control of the snapshots.  I'm sure you are
> aware of this but just wanted to get it on record in this thread.
>
> Snapshots are not enabled by default for cephfs native or cephfs with
> nfs in Manila because cephfs snapshots were experimental when the
> cephfs driver was added and we maintain backwards compatibility in
> the Manila configuration.  To enable, one sets:
>
>cephfs_enable_snapshots = True
>
> in the configuration stanza for cephfsnative or cephfsnfs back end.
>
> Also, the ``share_type`` referenced when creating shares (either
> explicitly or the default one) needs to have the snapshot_support
> capability enabled -- e.g. the cloud admin would (one time) issue a
> command like the following:
>
>   $ manila type-key  set snapshot_support=True
>
> With this approach either the user or the administrator can create
> snapshots of file shares.
>
> Dan, I expect you have your reasons for choosing to control snapshots
> via a script that calls cephfs-snap directly rather than using Manila
> -- and of course that's fine -- but if you'd share them it will help
> us Manila developers consider whether there are use cases that we are
> not currently addressing that we should consider.
>

Hi Tom, Thanks for the detailed response.
The majority of our users are coming from ZFS/NFS Filers, where
they've gotten used to zfs-auto-snapshots, which we create for them
periodically with some retention. So accidental deletions or
overwrites are never a problem because they can quickly access
yesterday's files.
So our initial idea was to replicate this with CephFS/Manila.
I hadn't thought of using the Manila managed snapshots for these
auto-snaps -- it is indeed another option. Have you already considered
Manila-managed auto-snapshots?

Otherwise, I wonder if CephFS would work well with both the fs-wide
auto-snaps *and* user-managed Manila snapshots. Has anyone tried such
a thing?

Thanks!

dan



> Thanks,
>
> -- Tom Barron
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re: CEPH ISCSI LIO multipath change delay

2019-03-21 Thread Jason Dillaman
It's just the design of the iSCSI protocol. Sure, you can lower the
timeouts (see "fast_io_fail_tmo" [1]) but you will just end up w/ more
false-positive failovers.

[1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-linux/

On Thu, Mar 21, 2019 at 10:46 AM li jerry  wrote:
>
> Hi Maged
>
> thank you for your reply.
>
> To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I 
> cleared the current LIO configuration, redeployed two CentOS 7 nodes (not in
> any ceph role), and deployed rbd-target-api, rbd-target-gw and tcmu-runner on them.
>
> And do the following test
> 1. centos7 client mounts iscsi lun
> 2, write data to iscsi lun through dd
> 3. Close the target node that is active. (forced power off)
>
> [18:33:48 ] active target node power off
> [18:33:57] centos7 client found iscsi target interrupted
> [18:34:23] centos7 client converts to another target node
>
>
> The whole process lasted for 35 seconds, and ceph was always healthy during 
> the test.
>
> This failover time is too long for production use. Is there anything more I
> can optimize?
>
>
> Below is the centos7 client log [messages]
> 
>
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 
> secs expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 
> 4409496160
> Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error 
> (1022)
> Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 
> 4:0 error (1022 - Invalid or unknown error code) state (3)
> Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed 
> out after 25 secs
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: 
> hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 
> 00 00 23 fd 00 00 00 80 00
> Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev 
> sda, sector 2358528
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline device
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
> Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to 
> offline 

[ceph-users] Re: CEPH ISCSI LIO multipath change delay

2019-03-21 Thread li jerry
Hi Maged

thank you for your reply.

To exclude the osd_heartbeat_interval and osd_heartbeat_grace factors, I
cleared the current LIO configuration, redeployed two CentOS 7 nodes (not in any
ceph role), and deployed rbd-target-api, rbd-target-gw and tcmu-runner on them.

Then I ran the following test:
1. The CentOS 7 client mounts the iSCSI LUN.
2. Write data to the iSCSI LUN with dd.
3. Power off the currently active target node (forced power off).

[18:33:48] active target node powered off
[18:33:57] CentOS 7 client detects the iSCSI target interruption
[18:34:23] CentOS 7 client fails over to the other target node


The whole process lasted for 35 seconds, and ceph was always healthy during the 
test.

This failover time is too long for production use. Is there anything more I
can optimize?


Below is the centos7 client log [messages]


Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: ping timeout of 5 secs 
expired, recv timeout 5, last rx 4409486146, last ping 4409491148, now 
4409496160
Mar 21 18:33:57 CEPH-client01test kernel: connection4:0: detected conn error 
(1022)
Mar 21 18:33:57 CEPH-client01test iscsid: Kernel reported iSCSI connection 4:0 
error (1022 - Invalid or unknown error code) state (3)
Mar 21 18:34:22 CEPH-client01test kernel: session4: session recovery timed out 
after 25 secs
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] CDB: Write(10) 2a 
00 00 23 fd 00 00 00 80 00
Mar 21 18:34:22 CEPH-client01test kernel: blk_update_request: I/O error, dev 
sda, sector 2358528
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: rejecting I/O to offline 
device
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: [sda] killing request
Mar 21 18:34:22 CEPH-client01test kernel: sd 5:0:0:0: 

Re: [ceph-users] OMAP size on disk

2019-03-21 Thread Brent Kennedy
They released Luminous 12.2.11 and now my large objects are starting to
count down (after running the rm command suggested in the release notes).
Seems dynamic sharding will clean up after itself now too!  So case closed!
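
For reference, I believe the release-note commands in question are the stale
bucket-instance cleanup that landed in 12.2.11 (a sketch; check the release
notes for the exact invocation):

  radosgw-admin reshard stale-instances list
  radosgw-admin reshard stale-instances rm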

-Brent

-Original Message-
From: ceph-users  On Behalf Of Brent
Kennedy
Sent: Thursday, October 11, 2018 2:47 PM
To: 'Matt Benjamin' 
Cc: 'Ceph Users' 
Subject: Re: [ceph-users] OMAP size on disk

Does anyone have a good blog entry or explanation of bucket sharding
requirements/commands?  Plus perhaps a howto?  

I upgraded our cluster to Luminous and now I have a warning about 5 large
objects.  The official blog says that sharding is turned on by default, but
we upgraded, so I can't quite tell whether our existing buckets had sharding
turned on during the upgrade or whether that is something I need to do after
(the blog doesn't state that).  Also, when I looked into the sharding
commands, they wanted a shard size; if it's automated, why would I need to
provide that?  Not to mention I don't know what value to start with...

I found this:  https://tracker.ceph.com/issues/24457 which talks about the
issue, and comment #14 says he worked through it, but the information seems
outside of my googlefu.

-Brent

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Matt Benjamin
Sent: Tuesday, October 9, 2018 7:28 AM
To: Luis Periquito 
Cc: Ceph Users 
Subject: Re: [ceph-users] OMAP size on disk

Hi Luis,

There are currently open issues with space reclamation after dynamic bucket
index resharding, esp. http://tracker.ceph.com/issues/34307

Changes are being worked on to address this, and to permit administratively
reclaiming space.

Matt

On Tue, Oct 9, 2018 at 5:50 AM, Luis Periquito  wrote:
> Hi all,
>
> I have several clusters, all running Luminous (12.2.7) proving S3 
> interface. All of them have enabled dynamic resharding and is working.
>
> One of the newer clusters is starting to give warnings on the used 
> space for the OMAP directory. The default.rgw.buckets.index pool is 
> replicated with 3x copies of the data.
>
> I created a new crush ruleset to only use a few well known SSDs, and 
> the OMAP directory size changed as expected: if I set the OSD as out 
> and them tell to compact, the size of the OMAP will shrink. If I set 
> the OSD as in the OMAP will grow to its previous state. And while the 
> backfill is going we get loads of key recoveries.
>
> Total physical space for OMAP in the OSDs that have them is ~1TB, so 
> given a 3x replica ~330G before replication.
>
> The data size for the default.rgw.buckets.data is just under 300G.
> There is one bucket who has ~1.7M objects and 22 shards.
>
> After deleting that bucket the size of the database didn't change - 
> even after running gc process and telling the OSD to compact its 
> database.
>
> This is not happening in older clusters, i.e created with hammer.
> Could this be a bug?
>
> I looked at getting all the OMAP keys and sizes
> (https://ceph.com/geen-categorie/get-omap-keyvalue-size/) and they add 
> up to close the value I expected them to take, looking at the physical 
> storage.
>
> Any ideas where to look next?
>
> thanks for all the help.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs manila snapshots best practices

2019-03-21 Thread Tom Barron

On 20/03/19 16:33 +0100, Dan van der Ster wrote:

Hi all,

We're currently upgrading our cephfs (managed by OpenStack Manila)
clusters to Mimic, and want to start enabling snapshots of the file
shares.
There are different ways to approach this, and I hope someone can
share their experiences with:

1. Do you give users the 's' flag in their cap, so that they can
create snapshots themselves? We're currently planning *not* to do this
-- we'll create snapshots for the users.
2. We want to create periodic snaps for all cephfs volumes. I can see
pros/cons to creating the snapshots in /volumes/.snap or in
/volumes/_nogroup//.snap. Any experience there? Or maybe even
just an fs-wide snap in /.snap is the best approach ?
3. I found this simple cephfs-snap script which should do the job:
http://images.45drives.com/ceph/cephfs/cephfs-snap  Does anyone have a
different recommendation?

Thanks!

Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Dan,

Manila of course provides users with self-service file share snapshot 
capability with quota control of the snapshots.  I'm sure you are 
aware of this but just wanted to get it on record in this thread.


Snapshots are not enabled by default for cephfs native or cephfs with
nfs in Manila because cephfs snapshots were experimental when the
cephfs driver was added and we maintain backwards compatibility in
the Manila configuration.  To enable, one sets:

  cephfs_enable_snapshots = True

in the configuration stanza for cephfsnative or cephfsnfs back end.

Also, the ``share_type`` referenced when creating shares (either 
explicitly or the default one) needs to have the snapshot_support 
capability enabled -- e.g. the cloud admin would (one time) issue a 
command like the following:


 $ manila type-key  set snapshot_support=True

With this approach either the user or the administrator can create 
snapshots of file shares.


Dan, I expect you have your reasons for choosing to control snapshots 
via a script that calls cephfs-snap directly rather than using Manila 
-- and of course that's fine -- but if you'd share them it will help 
us Manila developers consider whether there are use cases that we are 
not currently addressing that we should consider.


Thanks,

-- Tom Barron


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Access cephfs from second public network

2019-03-21 Thread Andres Rojas Guerrero


Hi all, we have deployed a Ceph cluster configured with two public networks:

[global]
cluster network = 10.100.188.0/23
fsid = 88f62260-b8de-499f-b6fe-5eb66a967083
mon host = 10.100.190.9,10.100.190.10,10.100.190.11
mon initial members = mon1,mon2,mon3
osd_pool_default_pg_num = 4096
public network = 10.100.190.0/23,10.100.40.0/21

Our problem is that we need to access cephfs from clients on the second
public network. For this we have deployed a haproxy system in transparent
mode so that clients on the second network can reach the mon (ceph-mon
process, TCP port 6789) running on the first public network
(10.100.190.0/23). In the haproxy configuration we have a frontend on the
second public network and the backend on the mon network:

frontend cephfs_mon

timeout client  600
mode tcp
bind 10.100.47.207:6789 transparent

default_backend ceph1_mon

backend ceph1_mon

timeout connect 5000
source 0.0.0.0 usesrc clientip
server mon1 10.100.190.9:6789 check


Then we try to mount a cephfs from the client in the second public
network but we have a timeout:


mount -t ceph 10.100.47.207:6789:/ /mnt/cephfs -o
name=cephfs,secret=AQBOJ5JcXFJAIxAAs4+CBliifhBAD927K9Qaig==

mount: mount 10.100.47.207:6789:/ on /mnt/cephfs failed: Expired
connection time

I see the traffic back and forth from the client-haproxy-mon system.

Otherwise, if the client is on the first public network, we have no
problem accessing the cephfs resource.

Does anybody have experience with this situation?

Thank you very much.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm batch OSD replacement

2019-03-21 Thread Eugen Block

Hi Dan,

I don't know about keeping the osd-id, but I just partially recreated
your scenario. I wiped one OSD and recreated it. You are trying to
re-use the existing block.db LV with the device path (--block.db
/dev/vg-name/lv-name) instead of the LV notation (--block.db
vg-name/lv-name):



# ceph-volume lvm create --data /dev/sdq --block.db
/dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
--osd-id 240


This fails in my test, too. But if I use the LV notation it works:

ceph-2:~ # ceph-volume lvm create --data /dev/sda --block.db  
ceph-journals/journal-osd3

[...]
Running command: /bin/systemctl enable --runtime ceph-osd@3
Running command: /bin/systemctl start ceph-osd@3
--> ceph-volume lvm activate successful for osd ID: 3
--> ceph-volume lvm create successful for: /dev/sda

This is a Nautilus test cluster, but I remember having this on a  
Luminous cluster, too. I hope this helps.
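
In short, the difference is just the format of the --block.db argument (the
VG/LV names below are placeholders, and --osd-id applies to Dan's replacement
case):

  # device path -- rejected in this scenario:
  ceph-volume lvm create --data /dev/sdq --block.db /dev/<vg>/<db-lv> --osd-id 240

  # vg/lv notation -- accepted:
  ceph-volume lvm create --data /dev/sdq --block.db <vg>/<db-lv> --osd-id 240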


Regards,
Eugen


Zitat von Dan van der Ster :


On Tue, Mar 19, 2019 at 12:25 PM Dan van der Ster  wrote:


On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza  wrote:
>
> On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza  wrote:
> >
> > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster  
 wrote:

> > >
> > > Hi all,
> > >
> > > We've just hit our first OSD replacement on a host created with
> > > `ceph-volume lvm batch` with mixed hdds+ssds.
> > >
> > > The hdd /dev/sdq was prepared like this:
> > ># ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
> > >
> > > Then /dev/sdq failed and was then zapped like this:
> > >   # ceph-volume lvm zap /dev/sdq --destroy
> > >
> > > The zap removed the pv/vg/lv from sdq, but left behind the db on
> > > /dev/sdac (see P.S.)
> >
> > That is correct behavior for the zap command used.
> >
> > >
> > > Now we're replaced /dev/sdq and we're wondering how to proceed. We see
> > > two options:
> > >   1. reuse the existing db lv from osd.240 (Though the osd fsid will
> > > change when we re-create, right?)
> >
> > This is possible but you are right that in the current state, the FSID
> > and other cluster data exist in the LV metadata. To reuse this LV for
> > a new (replaced) OSD
> > then you would need to zap the LV *without* the --destroy flag, which
> > would clear all metadata on the LV and do a wipefs. The command would
> > need the full path to
> > the LV associated with osd.240, something like:
> >
> > ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
> >
> > >   2. remove the db lv from sdac then run
> > > # ceph-volume lvm batch /dev/sdq /dev/sdac
> > >  which should do the correct thing.
> >
> > This would also work if the db lv is fully removed with --destroy
> >
> > >
> > > This is all v12.2.11 btw.
> > > If (2) is the prefered approached, then it looks like a bug that the
> > > db lv was not destroyed by lvm zap --destroy.
> >
> > Since /dev/sdq was passed in to zap, just that one device was removed,
> > so this is working as expected.
> >
> > Alternatively, zap has the ability to destroy or zap LVs associated
> > with an OSD ID. I think this is not released yet for Luminous but
> > should be in the next release (which seems to be what you want)
>
> Seems like 12.2.11 was released with the ability to zap by OSD ID. You
> can also zap by OSD FSID, both way will zap (and optionally destroy if
> using --destroy)
> all LVs associated with the OSD.
>
> Full examples on this can be found here:
>
> http://docs.ceph.com/docs/luminous/ceph-volume/lvm/zap/#removing-devices
>
>

Ohh that's an improvement! (Our goal is outsourcing the failure
handling to non-ceph experts, so this will help simplify things.)

In our example, the operator needs to know the osd id, then can do:

1. ceph-volume lvm zap --destroy --osd-id 240 (wipes sdq and removes
the lvm from sdac for osd.240)
2. replace the hdd
3. ceph-volume lvm batch /dev/sdq /dev/sdac --osd-ids 240

But I just remembered that the --osd-ids flag hasn't been backported
to luminous, so we can't yet do that. I guess we'll follow the first
(1) procedure to re-use the existing db lv.


Hmm... re-using the db lv didn't work.

We zapped it (see https://pastebin.com/N6PwpbYu) then got this error
when trying to create:

# ceph-volume lvm create --data /dev/sdq --block.db
/dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
--osd-id 240
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
9f63b457-37e0-4e33-971e-c0fc24658b65 240
Running command: vgcreate --force --yes
ceph-8ef05e54-8909-49f8-951d-0f9d37aeba45 /dev/sdq
 stdout: Physical volume "/dev/sdq" successfully created.
 stdout: Volume group "ceph-8ef05e54-8909-49f8-951d-0f9d37aeba45"
successfully created
Running command: 

Re: [ceph-users] v14.2.0 Nautilus released

2019-03-21 Thread John Hearns
Martin, my thanks to Croit for making this repository available.
I have been building Ceph from source on Ubuntu Cosmic for the last few
days.
It is much more convenient to use a repo.

On Thu, 21 Mar 2019 at 09:32, Martin Verges  wrote:

> Hello,
>
> we strongly believe it would be good for Ceph to have the packages
> directly on the official Debian mirrors, but for everyone out there
> having trouble with Ceph on Debian we are glad to help.
> If Ceph is not available on Debian, it might affect a lot of other
> software, for example Proxmox.
>
> You can find Ceph Nautilus 14.2.0 for Debian 10 Buster on our public
> mirror.
>
> $ curl https://mirror.croit.io/keys/release.asc | apt-key add -
> $ echo 'deb https://mirror.croit.io/debian-nautilus/ buster main' >>
> /etc/apt/sources.list.d/croit-ceph.list
>
> If we can help to get the packages on the official mirrors, please
> feel free to contact us!
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> Am Mi., 20. März 2019 um 20:49 Uhr schrieb Ronny Aasen
> :
> >
> >
> > with Debian buster frozen, If there are issues with ceph on debian that
> > would best be fixed in debian, now is the last chance to get anything
> > into buster before the next release.
> >
> > it is also important to get mimic and luminous packages built for
> > Buster. Since you want to avoid a situation where you have to upgrade
> > both the OS and ceph at the same time.
> >
> > kind regards
> > Ronny Aasen
> >
> >
> >
> > On 20.03.2019 07:09, Alfredo Deza wrote:
> > > There aren't any Debian packages built for this release because we
> > > haven't updated the infrastructure to build (and test) Debian packages
> > > yet.
> > >
> > > On Tue, Mar 19, 2019 at 10:24 AM Sean Purdy 
> wrote:
> > >> Hi,
> > >>
> > >>
> > >> Will debian packages be released?  I don't see them in the nautilus
> repo.  I thought that Nautilus was going to be debian-friendly, unlike
> Mimic.
> > >>
> > >>
> > >> Sean
> > >>
> > >> On Tue, 19 Mar 2019 14:58:41 +0100
> > >> Abhishek Lekshmanan  wrote:
> > >>
> > >>> We're glad to announce the first release of Nautilus v14.2.0 stable
> > >>> series. There have been a lot of changes across components from the
> > >>> previous Ceph releases, and we advise everyone to go through the
> release
> > >>> and upgrade notes carefully.
> > >> ___
> > >> ceph-users mailing list
> > >> ceph-users@lists.ceph.com
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v14.2.0 Nautilus released

2019-03-21 Thread Martin Verges
Hello,

we strongly believe it would be good for Ceph to have the packages
directly on the official Debian mirrors, but for everyone out there
having trouble with Ceph on Debian we are glad to help.
If Ceph is not available on Debian, it might affect a lot of other
software, for example Proxmox.

You can find Ceph Nautilus 14.2.0 for Debian 10 Buster on our public mirror.

$ curl https://mirror.croit.io/keys/release.asc | apt-key add -
$ echo 'deb https://mirror.croit.io/debian-nautilus/ buster main' >>
/etc/apt/sources.list.d/croit-ceph.list
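
After that, a regular update and install should be all that is needed
(a minimal sketch, assuming the standard 'ceph' and 'ceph-common'
package names):

$ apt update
$ apt install ceph ceph-common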

If we can help to get the packages on the official mirrors, please
feel free to contact us!

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Wed, 20 Mar 2019 at 20:49, Ronny Aasen  wrote:
>
>
> With Debian Buster frozen, if there are issues with Ceph on Debian that
> would best be fixed in Debian, now is the last chance to get anything
> into Buster before the next release.
>
> It is also important to get Mimic and Luminous packages built for
> Buster, since you want to avoid a situation where you have to upgrade
> both the OS and Ceph at the same time.
>
> kind regards
> Ronny Aasen
>
>
>
> On 20.03.2019 07:09, Alfredo Deza wrote:
> > There aren't any Debian packages built for this release because we
> > haven't updated the infrastructure to build (and test) Debian packages
> > yet.
> >
> > On Tue, Mar 19, 2019 at 10:24 AM Sean Purdy  
> > wrote:
> >> Hi,
> >>
> >>
> >> Will Debian packages be released?  I don't see them in the Nautilus repo.  
> >> I thought that Nautilus was going to be Debian-friendly, unlike Mimic.
> >>
> >>
> >> Sean
> >>
> >> On Tue, 19 Mar 2019 14:58:41 +0100
> >> Abhishek Lekshmanan  wrote:
> >>
> >>> We're glad to announce the first release of Nautilus v14.2.0 stable
> >>> series. There have been a lot of changes across components from the
> >>> previous Ceph releases, and we advise everyone to go through the release
> >>> and upgrade notes carefully.
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] When to use a separate RocksDB SSD

2019-03-21 Thread Glen Baars
Hello Ceph,

What is the best way to find out how RocksDB is currently performing? I 
need to build a business case for NVMe devices for RocksDB.
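
For context, the closest I can get today is the per-OSD perf counters
(a rough sketch only; osd.0 is just an example ID, admin-socket access
on the OSD host is assumed, and counter names can differ between
releases):

$ ceph daemon osd.0 perf dump bluefs    # db_used_bytes vs db_total_bytes;
                                        # slow_used_bytes > 0 means the DB
                                        # has spilled onto the slow device
$ ceph daemon osd.0 perf dump rocksdb   # submit/get latencies, compactions
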
Kind regards,
Glen Baars
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: effects of using hard links

2019-03-21 Thread Dan van der Ster
On Thu, Mar 21, 2019 at 8:51 AM Gregory Farnum  wrote:
>
> On Wed, Mar 20, 2019 at 6:06 PM Dan van der Ster  wrote:
>>
>> On Tue, Mar 19, 2019 at 9:43 AM Erwin Bogaard  
>> wrote:
>> >
>> > Hi,
>> >
>> >
>> >
>> > For a number of applications we use, there is a lot of file duplication. 
>> > This wastes precious storage space, which I would like to avoid.
>> >
>> > When using a local disk, I can use a hard link to let all duplicate files 
>> > point to the same inode (use “rdfind”, for example).
>> >
>> >
>> >
>> > As there isn’t any deduplication in Ceph(FS) I’m wondering if I can use 
>> > hard links on CephFS in the same way as I use for ‘regular’ file systems 
>> > like ext4 and xfs.
>> >
>> > 1. Is it advisable to use hard links on CephFS? (It isn’t in the ‘best 
>> > practices’: http://docs.ceph.com/docs/master/cephfs/app-best-practices/)
>> >
>> > 2. Is there any performance (dis)advantage?
>> >
>> > 3. When using hard links, is there an actual space savings, or is there 
>> > some trickery happening?
>> >
>> > 4. Are there any issues (other than the regular hard link ‘gotcha’s’) I 
>> > need to keep in mind combining hard links with CephFS?
>>
>> The only issue we've seen is if you hardlink b to a, then rm a, then
>> never stat b, the inode is added to the "stray" directory. By default
>> there is a limit of 1 million stray entries -- so if you accumulate
>> files in this state eventually users will be unable to rm any files,
>> until you stat the `b` files.
>
>
> Eek. Do you know if we have any tickets about that issue? It's easy to see 
> how that happens but definitely isn't a good user experience!

I'm not aware of a ticket -- I had thought it was just a fact of life
with hardlinks and cephfs.
After hitting this issue in prod, we found the explanation here in
this old thread (with your useful post ;) ):

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013621.html

Our immediate workaround was to increase mds bal fragment size max
(e.g. to 20).
In our env we now monitor num_strays in case these get out of control again.
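
For reference, roughly what we run (a sketch only; the MDS name is a
placeholder, the limit value is purely illustrative, and `ceph config
set` needs Mimic or newer, otherwise use injectargs or ceph.conf):

# count entries currently sitting in the MDS stray directories
$ ceph daemon mds.<name> perf dump mds_cache | grep num_strays

# raise the per-fragment limit if strays keep piling up
$ ceph config set mds mds_bal_fragment_size_max 200000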

BTW, now thinking about this more... isn't directory fragmentation
supposed to let the stray dir grow to unlimited shards? (on our side
it seems limited to 10 shards). Maybe this is just some configuration
issue on our side?

-- dan



> -Greg
>
>>
>>
>> -- dan
>>
>>
>>
>> >
>> >
>> >
>> > Thanks
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Recovery Settings

2019-03-21 Thread Marc Roos



Beware before you start using InfluxDB. I have an 80GB db, and I regret 
using it. I now have to move to storing the data in Graphite. Collectd 
also has a plugin for that. 
InfluxDB cannot downsample properly when tags are involved, I think 
(still waiting for a response to this [0]).
What I have understood is that with downsampled data you have to select 
a different source. That means changing / adapting your metrics. I think 
this is handled better in Graphite. 

Then I have had numerous strange things happen with InfluxDB. The 
logging format changes out of the blue, and I have the impression they 
do not even have a proper release strategy. It is difficult or 
impossible to do simple arithmetic between the results of queries. When 
I filed an issue asking for the home/end keys to work on the console and 
an option to escape out of the influx shell, they even replied that it 
works on macOS. As if anyone is ever going to host an InfluxDB 
production environment on macOS.
All in all, the development team there does not give a professional 
impression. Totally the opposite of what you will find here at Ceph. 
Maybe it is because of this trendy 'go' language they use.
The people at Timescale did a much better job of using Postgres as a 
backend.

So if you only want to get things working quickly without hassle, and 
see if it is working, use InfluxDB. Otherwise use Graphite; I cannot 
recommend Graphite from experience yet, I still have to look at it ;)



[0] 
https://community.influxdata.com/t/how-does-grouping-work-does-it-work/7936/2

-Original Message-
From: Brent Kennedy 
Sent: 21 March 2019 02:21
To: 'Reed Dier'
Cc: 'ceph-users'
Subject: Re: [ceph-users] SSD Recovery Settings

Lots of good info there, thank you!  I tend to get options fatigue when 
trying to pick out a new system.  This should help narrow that focus 
greatly.  

 

-Brent

 

From: Reed Dier 
Sent: Wednesday, March 20, 2019 12:48 PM
To: Brent Kennedy 
Cc: ceph-users 
Subject: Re: [ceph-users] SSD Recovery Settings

 

Grafana is the web frontend for creating the graphs.

InfluxDB holds the time series data that Grafana pulls from.

To collect data, I am using collectd daemons running on each ceph node 
(mon, mds, osd), as this was my initial way of ingesting metrics.

I am also now using the influx plugin in ceph-mgr to have ceph-mgr 
directly report statistics to InfluxDB.
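
Enabling that plugin is only a couple of commands (a sketch; the
hostname and database values are placeholders, and on Luminous the
settings go through `ceph config-key set` instead of `ceph config set`):

$ ceph mgr module enable influx
$ ceph config set mgr mgr/influx/hostname influx.example.com   # placeholder
$ ceph config set mgr mgr/influx/database ceph                 # placeholder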

 

I know two other popular methods of collecting data are Telegraf and 
Prometheus, both of which have ceph-mgr plugins as well.

InfluxData also has a Grafana-like graphing front end, Chronograf, 
which some prefer to Grafana.
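
If you lean towards Prometheus instead, the mgr side is about as small
as it gets (a sketch; 9283 is the module's default listen port as far
as I recall):

$ ceph mgr module enable prometheus
# then point a Prometheus scrape job at http://<mgr-host>:9283/metrics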

 

Hopefully that's enough to get you headed in the right direction.

I would recommend not going down the CollectD path, as the project 
doesn't move as quickly as Telegraf and Prometheus, and the majority of 
the metrics I am pulling from these days are provided by the ceph-mgr 
plugin.

 

Hope that helps,

Reed





On Mar 20, 2019, at 11:30 AM, Brent Kennedy  
wrote:

 

Reed:  If you don't mind me asking, what was the graphing tool you 
had in the post?  I am using the ceph health web panel right now but it 
doesn't go that deep.

 

Regards,

Brent

 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: effects of using hard links

2019-03-21 Thread Gregory Farnum
On Wed, Mar 20, 2019 at 6:06 PM Dan van der Ster  wrote:

> On Tue, Mar 19, 2019 at 9:43 AM Erwin Bogaard 
> wrote:
> >
> > Hi,
> >
> >
> >
> > For a number of applications we use, there is a lot of file duplication.
> This wastes precious storage space, which I would like to avoid.
> >
> > When using a local disk, I can use a hard link to let all duplicate
> files point to the same inode (use “rdfind”, for example).
> >
> >
> >
> > As there isn’t any deduplication in Ceph(FS) I’m wondering if I can use
> hard links on CephFS in the same way as I use for ‘regular’ file systems
> like ext4 and xfs.
> >
> > 1. Is it advisable to use hard links on CephFS? (It isn’t in the ‘best
> practices’: http://docs.ceph.com/docs/master/cephfs/app-best-practices/)
> >
> > 2. Is there any performance (dis)advantage?
> >
> > 3. When using hard links, is there an actual space savings, or is there
> some trickery happening?
> >
> > 4. Are there any issues (other than the regular hard link ‘gotcha’s’) I
> need to keep in mind combining hard links with CephFS?
>
> The only issue we've seen is if you hardlink b to a, then rm a, then
> never stat b, the inode is added to the "stray" directory. By default
> there is a limit of 1 million stray entries -- so if you accumulate
> files in this state eventually users will be unable to rm any files,
> until you stat the `b` files.
>

Eek. Do you know if we have any tickets about that issue? It's easy to see
how that happens but definitely isn't a good user experience!
-Greg


>
> -- dan
>
>
>
> >
> >
> >
> > Thanks
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow OPS

2019-03-21 Thread Glen Baars
Hello Brad,

It doesn't seem to be a set of OSDs, the cluster has 160ish OSDs over 9 hosts.

I seem to get a lot of these ops also that don't show a client.

"description": "osd_repop(client.14349712.0:4866968 15.36 
e30675/22264 15:6dd17247:::rbd_data.2359ef6b8b4567.0042766
a:head v 30675'5522366)",
"initiated_at": "2019-03-21 16:51:56.862447",
"age": 376.527241,
"duration": 1.331278,

Kind regards,
Glen Baars

-Original Message-
From: Brad Hubbard 
Sent: Thursday, 21 March 2019 1:43 PM
To: Glen Baars 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow OPS

Actually, the lag is between "sub_op_committed" and "commit_sent". Is there any 
pattern to these slow requests? Do they involve the same osd, or set of osds?

On Thu, Mar 21, 2019 at 3:37 PM Brad Hubbard  wrote:
>
> On Thu, Mar 21, 2019 at 3:20 PM Glen Baars  
> wrote:
> >
> > Thanks for that - we seem to be experiencing the wait in this section of 
> > the ops.
> >
> > {
> > "time": "2019-03-21 14:12:42.830191",
> > "event": "sub_op_committed"
> > },
> > {
> > "time": "2019-03-21 14:12:43.699872",
> > "event": "commit_sent"
> > },
> >
> > Does anyone know what that section is waiting for?
>
> Hi Glen,
>
> These are documented, to some extent, here.
>
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting
> -osd/
>
> It looks like it may be taking a long time to communicate the commit
> message back to the client? Are these slow ops always the same client?
>
> >
> > Kind regards,
> > Glen Baars
> >
> > -Original Message-
> > From: Brad Hubbard 
> > Sent: Thursday, 21 March 2019 8:23 AM
> > To: Glen Baars 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Slow OPS
> >
> > On Thu, Mar 21, 2019 at 12:11 AM Glen Baars  
> > wrote:
> > >
> > > Hello Ceph Users,
> > >
> > >
> > >
> > > Does anyone know what the flag point ‘Started’ is? Is that ceph osd 
> > > daemon waiting on the disk subsystem?
> >
> > This is set by "mark_started()" and is roughly set when the pg starts 
> > processing the op. Might want to capture dump_historic_ops output after the 
> > op completes.
> >
> > >
> > >
> > >
> > > Ceph 13.2.4 on centos 7.5
> > >
> > >
> > >
> > > "description": "osd_op(client.1411875.0:422573570
> > > 5.18ds0
> > > 5:b1ed18e5:::rbd_data.6.cf7f46b8b4567.0046e41a:head [read
> > >
> > > 1703936~16384] snapc 0=[] ondisk+read+known_if_redirected
> > > e30622)",
> > >
> > > "initiated_at": "2019-03-21 01:04:40.598438",
> > >
> > > "age": 11.340626,
> > >
> > > "duration": 11.342846,
> > >
> > > "type_data": {
> > >
> > > "flag_point": "started",
> > >
> > > "client_info": {
> > >
> > > "client": "client.1411875",
> > >
> > > "client_addr": "10.4.37.45:0/627562602",
> > >
> > > "tid": 422573570
> > >
> > > },
> > >
> > > "events": [
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598438",
> > >
> > > "event": "initiated"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598438",
> > >
> > > "event": "header_read"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598439",
> > >
> > > "event": "throttled"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598450",
> > >
> > > "event": "all_read"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598499",
> > >
> > > "event": "dispatched"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598504",
> > >
> > > "event": "queued_for_pg"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598883",
> > >
> > > "event": "reached_pg"
> > >
> > > },
> > >
> > > {
> > >
> > > "time": "2019-03-21 01:04:40.598905",
> > >
> > > "event": "started"
> > >
> > > }
> > >
> > > ]
> > >
> > > }
> > >
> > > }
> > >
> > > ],
> > >
> > >
> > >
> > > Glen
> > >
> > > This e-mail is intended solely for the benefit of