Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-14 Thread Marc Schöchlin
Hello Jason, many thanks for your informative response! On 11.05.2018 at 17:02, Jason Dillaman wrote: > I cannot speak for Xen, but in general IO to a block device will hit > the pagecache unless the IO operation is flagged as direct (e.g. > O_DIRECT) to bypass the pagecache and directly send

[ceph-users] RBD Cache and rbd-nbd

2018-05-10 Thread Marc Schöchlin
Hello list, I map ~30 RBDs per XenServer host using rbd-nbd to run virtual machines on these devices. I have the following questions: * Is it possible to use the RBD cache with rbd-nbd? I assume that this is true, but the
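For context, a minimal sketch of the librbd cache options from the luminous rbd-config-ref page (the values shown are the documented defaults, not a tuning recommendation); since rbd-nbd is a librbd client, these would apply per rbd-nbd process:

    [client]
    rbd cache = true
    rbd cache size = 33554432                  # 32 MiB cache per librbd client
    rbd cache max dirty = 25165824             # 24 MiB dirty limit (0 = write-through)
    rbd cache writethrough until flush = true  # stay in write-through until the first flush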

Re: [ceph-users] RBD Cache and rbd-nbd

2018-05-11 Thread Marc Schöchlin
Hello Jason, thanks for your response. On 10.05.2018 at 21:18, Jason Dillaman wrote: >> If I configure caches as described at >> http://docs.ceph.com/docs/luminous/rbd/rbd-config-ref/, are there dedicated >> caches per rbd-nbd/krbd device or is there only a single cache area? > The librbd

Re: [ceph-users] Ceph - Xen accessing RBDs through libvirt

2018-05-22 Thread Marc Schöchlin
Hello thg, in the last weeks we spent some time improving RBDSR, an RBD storage repository for XenServer. RBDSR can use RBDs via fuse, krbd and rbd-nbd. Our improvements are based on https://github.com/rposudnevskiy/RBDSR/tree/v2.0 and are currently published at

Re: [ceph-users] iSCSI rookies questions

2018-06-13 Thread Marc Schöchlin
Hi Max, just a side note: we are using a fork of RBDSR (https://github.com/vico-research-and-consulting/RBDSR) to connect XenServer 7.2 Community to RBDs directly using rbd-nbd. After a bit of hacking this works pretty well: direct RBD creation from the storage repo, live migration between

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Marc Schöchlin
Paul > > > 2018-06-29 17:28 GMT+02:00 Marc Schöchlin <m...@256bit.org>: > > Hi Gregory, > > thanks for the link - very interesting talk. > You mentioned the following settings in your talk, but I was not > able to find some documentatio

Re: [ceph-users] Ceph snapshots

2018-06-27 Thread Marc Schöchlin
Hello list, I currently hold 3 snapshots per RBD image for my virtual systems. What I miss in the current documentation: * details about the implementation of snapshots o implementation details o which scenarios create high overhead per snapshot o what causes the really
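For reference, the basic snapshot operations being discussed; the pool and image names below are made up for illustration:

    rbd snap create rbd_hdd/vm-disk-1@before-upgrade
    rbd snap ls rbd_hdd/vm-disk-1
    rbd snap rm rbd_hdd/vm-disk-1@before-upgrade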

[ceph-users] unable to perform a "rbd-nbd map" without foreground flag

2018-04-26 Thread Marc Schöchlin
Hello list, this bug is filed as: https://tracker.ceph.com/issues/23891 From my point of view this is a bug; probably others are also experiencing this problem and can provide additional details. I would like to map an RBD using rbd-nbd. Without adding the foreground flag it is not possible to map
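A sketch of the reported behaviour (image spec and client id are assumptions): with the foreground flag the mapping works, without it rbd-nbd fails to detach into the background on the affected setup:

    rbd-nbd -d map rbd_hdd/test-image --id nfs    # foreground, logs to stderr: works
    rbd-nbd map rbd_hdd/test-image --id nfs       # normal background mapping: fails here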

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-26 Thread Marc Schöchlin
> graph on this to figure out where the user time is being spent? > > On Wed, Apr 25, 2018 at 11:25 AM, Marc Schöchlin <m...@256bit.org> wrote: >> Hello Jason, >> >> according to this, latency between client and osd should not be the problem: >> (the high amou
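A sketch of how such a flame graph could be produced with perf and the FlameGraph scripts (pool name and script paths are assumptions):

    perf record -F 99 -g -- rbd ls -l xen-sr-pool
    perf script > out.perf
    ./stackcollapse-perf.pl out.perf > out.folded
    ./flamegraph.pl out.folded > rbd-ls-l.svg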

Re: [ceph-users] ceph-mgr dashboard behind reverse proxy

2018-08-07 Thread Marc Schöchlin
Hi, On 04.08.2018 at 09:04, Tobias Florek wrote: > I want to set up the dashboard behind a reverse proxy. How do >>> people determine which ceph-mgr is active? Is there any simple and >>> elegant solution? >> You can use haproxy. It supports periodic checks for the availability >> of the
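A minimal haproxy sketch of that approach (hostnames and the luminous default dashboard port 7000 are assumptions). Standby mgrs answer with a redirect, so expecting status 200 keeps only the active mgr in service:

    frontend ceph_dashboard
        bind *:7000
        mode http
        default_backend ceph_mgr_dashboard

    backend ceph_mgr_dashboard
        mode http
        option httpchk GET /
        http-check expect status 200
        server mgr1 ceph-mgr1:7000 check
        server mgr2 ceph-mgr2:7000 check
        server mgr3 ceph-mgr3:7000 check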

Re: [ceph-users] Slow requests from bluestore osds

2018-09-06 Thread Marc Schöchlin
at might >> be causing this? >> >> -Brett >> >> On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin <m...@256bit.org> wrote: >> >>     Hi, >> >>     we are also experiencing this type of behavior for some

[ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-11 Thread Marc Schöchlin
Hello ceph-users and ceph-devel list, we went into production with our new shiny luminous (12.2.5) cluster. This cluster runs SSD and HDD based OSD pools. To ensure the service quality of the cluster and to have a baseline for client latency optimization (i.e. in the area of deepscrub optimization)
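Two simple ways to sample per-OSD latency for such a baseline (the jq filter reflects the usual perf-dump counter layout and is an assumption; the daemon command has to run on the OSD's host):

    ceph osd perf                     # commit/apply latency per OSD
    ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_w_latency}'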

Re: [ceph-users] KPIs for Ceph/OSD client latency / deepscrub latency overhead

2018-07-12 Thread Marc Schöchlin
eir > format. The median OSD is a good indicator and so is the slowest OSD. > > Paul > > 2018-07-11 17:50 GMT+02:00 Marc Schöchlin <m...@256bit.org>: > > Hello ceph-users and ceph-devel list, > > we went into production with our new shiny luminou

[ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
Hello list, we are trying to integrate a storage repository in XenServer (I also describe the problem in an issue in the ceph bug tracker: https://tracker.ceph.com/issues/23853). Summary: the slowness is a real pain for us, because it prevents the Xen storage repository from working efficiently.

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
2   real    0m18.562s user    0m12.513s sys    0m0.793s I also attached a JSON dump of my pool structure. Regards Marc On 25.04.2018 at 14:46, Piotr Dałek wrote: > On 18-04-25 02:29 PM, Marc Schöchlin wrote: >> Hello list, >> >> we are trying to integrate

Re: [ceph-users] Integrating XEN Server : Long query time for "rbd ls -l" queries

2018-04-25 Thread Marc Schöchlin
Also, I have to ask, but how often are you expecting to scrape the > images from pool? The long directory list involves opening each image > in the pool (which involves numerous round-trips to the OSDs) plus > iterating through each snapshot (which also involves round-trips). > >
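That difference is easy to see by timing both variants (pool name is an assumption):

    time rbd ls xen-sr-pool        # one listing of the pool directory
    time rbd ls -l xen-sr-pool     # opens every image and iterates its snapshots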

[ceph-users] Slow requests from bluestore osds

2018-09-03 Thread Marc Schöchlin
Hi, we have also been experiencing this type of behavior for some weeks on our not so performance-critical HDD pools. We haven't spent much time on this problem because there are currently more important tasks - but here are a few details: Running the following loop results in the following
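For context, the commands commonly used to inspect such slow requests (the OSD id is a placeholder; this is not the loop referenced above):

    ceph health detail | grep -i slow
    ceph daemon osd.12 dump_ops_in_flight
    ceph daemon osd.12 dump_historic_ops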

Re: [ceph-users] Huge latency spikes

2018-12-31 Thread Marc Schöchlin
Hi, our Dell servers contain "PERC H730P Mini" RAID controllers with 2 GB of battery-backed cache memory. All of our ceph OSD disks (typically 12 * 8GB spinners or 16 * 1-2 TB SSDs per node) are used directly, without using the RAID functionality. We deactivated the cache of the controller for the

[ceph-users] Ceph Dashboard Rewrite

2019-01-08 Thread Marc Schöchlin
Hello ceph-users, we are using ceph luminous 12.2.10. We run 3 mgrs - if I access the dashboard on a non-active mgr I get a location redirect to the hostname. Because this is not an FQDN, I cannot access the dashboard in a convenient way, because my workstation does not append the datacenter
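For reference, the per-mgr binding options from the luminous dashboard documentation (the mgr name and address are assumptions); they control where each dashboard instance listens, while the redirect itself is usually hidden behind a reverse proxy as in the haproxy thread above:

    ceph config-key set mgr/dashboard/mgr-a/server_addr 10.10.0.1
    ceph config-key set mgr/dashboard/server_port 7000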

[ceph-users] Ceph rbd.ko compatibility

2019-01-27 Thread Marc Schöchlin
Hello ceph-users, we are using a low number of rbd.ko clients with our luminous cluster. Where can I get information about the following questions: * Which features and which cluster compatibility are provided by the rbd.ko module of my system? (/sys/module/rbd/**, "modinfo rbd" does not seem to
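A sketch for answering the first question (the image name is an assumption; the sysfs file exists on kernels from roughly 4.11 onwards):

    cat /sys/bus/rbd/supported_features            # hex bitmask of features rbd.ko supports
    rbd info rbd_hdd/test-image | grep features    # features enabled on the image
    rbd feature disable rbd_hdd/test-image object-map fast-diff deep-flatten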

Re: [ceph-users] Slow requests from bluestore osds

2019-01-28 Thread Marc Schöchlin
(an enhancement is in progress to get more IOPS) What can I do to decrease the impact of snaptrims to prevent slow requests (e.g. reduce "osd max trimming pgs" to "1")? Regards Marc Schöchlin On 03.09.18 at 10:13, Marc Schöchlin wrote: > Hi, > > we are also experiencing t
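A sketch of throttling snaptrim at runtime (the values are illustrative; osd_snap_trim_sleep is an additional knob not mentioned above):

    ceph tell 'osd.*' injectargs '--osd_max_trimming_pgs 1'
    ceph tell 'osd.*' injectargs '--osd_snap_trim_sleep 0.5'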

Re: [ceph-users] Slow requests from bluestore osds

2019-05-12 Thread Marc Schöchlin
New memtable created with log file: #422511. Immutable memtables: 0. Any hints on how to find more details about the origin of this problem? How can we solve it? Regards Marc On 28.01.19 at 22:27, Marc Schöchlin wrote: > Hello cephers, > > as described - we also have the slow reques

Re: [ceph-users] Slow requests from bluestore osds

2019-05-13 Thread Marc Schöchlin
ome seconds (SSD) to minutes (HDD) and > perform a compaction of the OMAP database. > > Regards, > > > > > -----Original message----- > From: ceph-users On behalf of Marc Schöchlin > Sent: Monday, 13 May 2019 6:59 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users

Re: [ceph-users] performance in a small cluster

2019-05-25 Thread Marc Schöchlin
Hello Robert, the following tool probably provides deeper insight into what is happening on your OSDs: https://github.com/scoopex/ceph/blob/master/src/tools/histogram_dump.py https://github.com/ceph/ceph/pull/28244
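The script is built around the OSD admin-socket histogram counters; the underlying command can also be run directly (the OSD id is a placeholder, and the script's own invocation may differ):

    ceph daemon osd.0 perf histogram dump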

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-20 Thread Marc Schöchlin
os Fernandez: > Hi Marc, > > Try to compact the OSD with slow requests: > > ceph tell osd.[ID] compact > > This will make the OSD offline for some seconds (SSD) to minutes (HDD) and > perform a compaction of the OMAP database. > > Regards, > > > > > -Mensaje o
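A sketch of applying that suggestion to all OSDs in sequence (each OSD briefly stalls while compacting, so best run during a quiet period):

    for id in $(ceph osd ls); do
        ceph tell osd.$id compact
    done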

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Marc Schöchlin
Hello Jason, On 20.05.19 at 23:49, Jason Dillaman wrote: > On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote: >> Hello cephers, >> >> we have a few systems which utilize an rbd-nbd map/mount to get access to an >> RBD volume. >> (This problem seems to be relat

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-13 Thread Marc Schöchlin
Hello Jason, thanks for your response. See my inline comments. On 31.07.19 at 14:43, Jason Dillaman wrote: > On Wed, Jul 31, 2019 at 6:20 AM Marc Schöchlin wrote: > > > The problem does not seem to be related to kernel releases, filesystem types or > the ceph and network setup.

[ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Marc Schöchlin
Hello Mike, see my inline comments. On 14.08.19 at 02:09, Mike Christie wrote: >>> - >>> Previous tests crashed in a reproducible manner with "-P 1" (single io >>> gzip/gunzip) after a few minutes up to 45 minutes. >>> >>> Overview of my tests: >>> >>> - SUCCESSFUL: kernel 4.15, ceph

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-15 Thread Marc Schöchlin
Hello Mike, On 15.08.19 at 19:57, Mike Christie wrote: > >> Don't waste your time. I found a way to replicate it now. >> > > Just a quick update. > > Looks like we are trying to allocate memory in the IO path in a way that > can swing back on us, so we can end up locking up. You are probably not

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-12 Thread Marc Schöchlin
Hello Jason, yesterday I started rbd-nbd in foreground mode to see if there is any additional information. root@int-nfs-001:/etc/ceph# rbd-nbd map rbd_hdd/int-nfs-001_srv-ceph -d --id nfs 2019-09-11 13:07:41.444534 77fe1040  0 ceph version 12.2.12

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-13 Thread Marc Schöchlin
Hello Jason, On 12.09.19 at 16:56, Jason Dillaman wrote: > On Thu, Sep 12, 2019 at 3:31 AM Marc Schöchlin wrote: > > What's that, have we seen that before? ("Numerical argument out of domain") > It's the error that rbd-nbd prints when the kernel prematurely closes >

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
Hello Mike, On 03.09.19 at 04:41, Mike Christie wrote: > On 09/02/2019 06:20 AM, Marc Schöchlin wrote: >> Hello Mike, >> >> i am having a quick look at this on vacation because my coworker >> reports daily and continuous crashes ;-) >> Any updates here (i am

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-10 Thread Marc Schöchlin
019]  __schedule+0x2bd/0x850 [Tue Sep 10 14:46:51 2019]  ? try_to_del_timer_sync+0x53/0x80 [Tue Sep 10 14:46:51 2019]  schedule+0x2c/0x70 [Tue Sep 10 14:46:51 2019]  xfs_log_force+0x15f/0x2e0 [xfs] [Tue Sep 10 14:46:51 2019]  ? wake_up_q+0x80/0x80 [Tue Sep 10 14:46:51 2019]  xfsaild+0x17b/0x800 [

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-31 Thread Marc Schöchlin
and 12.2.5? From my point of view (without in-depth knowledge of rbd-nbd/librbd) my assumption is that this problem might be caused by rbd-nbd code and not by librbd. The probability that a bug like this survives undiscovered in librbd for such a long time seems low to me :-) Regards Ma

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-29 Thread Marc Schöchlin
Hello Jason, I updated the ticket https://tracker.ceph.com/issues/40822 On 24.07.19 at 19:20, Jason Dillaman wrote: > On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin wrote: >> >> Testing with a 10.2.5 librbd/rbd-nbd is currently not that easy for me, >> because the

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hi Mike, On 22.07.19 at 16:48, Mike Christie wrote: > On 07/22/2019 06:00 AM, Marc Schöchlin wrote: >>> With older kernels no timeout would be set for each command by default, >>> so if you were not running that tool then you would not see the nbd >>> disconnect+

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hi Mike, On 22.07.19 at 17:01, Mike Christie wrote: > On 07/19/2019 02:42 AM, Marc Schöchlin wrote: >> We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, >> kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel >> 4.15.0-45, ubu

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
On 23.07.19 at 07:28, Marc Schöchlin wrote: > > Okay, I already experimented with high timeouts (i.e. 600 seconds). As I can > remember, this led to a pretty unusable system if I put high amounts of IO on > the EC volume. > This system also runs a krbd volume which satur
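For reference, a timeout like the 600 seconds mentioned above would be set at map time (the image spec is an assumption):

    rbd-nbd map --timeout 600 rbd_ec/test-volume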

[ceph-users] reproducible rbd-nbd crashes

2019-07-18 Thread Marc Schöchlin
Hello cephers, rbd-nbd crashes in a reproducible way here. I created the following bug report: https://tracker.ceph.com/issues/40822 Do you also experience this problem? Do you have suggestions for in-depth debug data collection? I invoke the following command on a freshly mapped rbd and

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-19 Thread Marc Schöchlin
Hello Jason, On 18.07.19 at 20:10, Jason Dillaman wrote: > On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin wrote: >> Hello cephers, >> >> rbd-nbd crashes in a reproducible way here. > I don't see a crash report in the log below. Is it really crashing or > is it shutt

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, On 24.07.19 at 00:40, Jason Dillaman wrote: > >> Sure, which kernel do you prefer? > You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen > environment. Can you use a matching kernel version? That's true, the virtual machines of our Xen environments completely run on

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, On 23.07.19 at 14:41, Jason Dillaman wrote: > Can you please test a consistent Ceph release w/ a known working > kernel release? It sounds like you have changed two variables, so it's > hard to know which one is broken. We need *you* to isolate what > specific Ceph or kernel release

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-24 Thread Marc Schöchlin
On 24.07.19 at 07:55, Marc Schöchlin wrote: > Hi Jason, > > On 24.07.19 at 00:40, Jason Dillaman wrote: >>> Sure, which kernel do you prefer? >> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen >> environment. Can you use a matching kernel version

Re: [ceph-users] reproducible rbd-nbd crashes

2019-07-22 Thread Marc Schöchlin
Hello Mike, I added inline comments. On 19.07.19 at 22:20, Mike Christie wrote: > >> We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, >> kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel >> 4.15.0-45, ubuntu 16.04) - we never experienced

Re: [ceph-users] reproducible rbd-nbd crashes

2019-09-20 Thread Marc Schöchlin
rent state in correcting this problem? Can we support you by running tests with custom kernel or rbd-nbd builds? Regards Marc On 13.09.19 at 14:15, Marc Schöchlin wrote: >>> Nevertheless I will try EXT4 on another system. > I converted the filesystem to an ext4 filesy