Re: [ceph-users] slow ops for mon slowly increasing

2019-09-20 Thread Kevin Olbrich
OK, looks like clock skew is the problem. I thought this was caused by the reboot, but it did not fix itself after a few minutes (mon3 was 6 seconds ahead). After forcing a time sync from the same server, it seems to be solved now. Kevin Am Fr., 20. Sept. 2019 um 07:33 Uhr schrieb Kevin Olbrich
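A minimal sketch of checking and fixing mon clock skew as described above, assuming chronyd is the NTP client on the mon hosts (the commands are illustrative, not taken from the thread):

    # Check whether the monitors agree on the time (run from any admin node)
    ceph time-sync-status
    ceph health detail | grep -i clock

    # On the skewed mon host, force an immediate clock step with chrony
    chronyc makestep
    chronyc sources -v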

[ceph-users] slow ops for mon slowly increasing

2019-09-19 Thread Kevin Olbrich
Hi! Today some OSDs went down, a temporary problem that was solved easily. The mimic cluster is working and all OSDs are complete, all active+clean. Completely new for me is this: > 25 slow ops, oldest one blocked for 219 sec, mon.mon03 has slow ops The cluster itself looks fine, monitoring for

Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
Am Di., 28. Mai 2019 um 10:20 Uhr schrieb Wido den Hollander : > > > On 5/28/19 10:04 AM, Kevin Olbrich wrote: > > Hi Wido, > > > > thanks for your reply! > > > > For CentOS 7, this means I can switch over to the "rpm-nautilus/el7" > > repos

Re: [ceph-users] QEMU/KVM client compatibility

2019-05-28 Thread Kevin Olbrich
Hollander : > > > On 5/28/19 7:52 AM, Kevin Olbrich wrote: > > Hi! > > > > How can I determine which client compatibility level (luminous, mimic, > > nautilus, etc.) is supported in Qemu/KVM? > > Does it depend on the version of ceph packages on the s

[ceph-users] QEMU/KVM client compatibility

2019-05-27 Thread Kevin Olbrich
Hi! How can I determine which client compatibility level (luminous, mimic, nautilus, etc.) is supported in Qemu/KVM? Does it depend on the version of the ceph packages on the system? Or do I need a recent version of Qemu/KVM? Which component defines which client level will be supported? Thank you very

Re: [ceph-users] cluster is not stable

2019-03-12 Thread Kevin Olbrich
Are you sure that firewalld is stopped and disabled? This looks exactly like what I saw when I missed one host in a test cluster. Kevin Am Di., 12. März 2019 um 09:31 Uhr schrieb Zhenshi Zhou : > Hi, > > I deployed a ceph cluster with good performance. But the logs > indicate that the cluster is not as
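A short sketch of the firewalld check suggested above: either stop the service on every node, or keep the firewall and open the Ceph services explicitly (assuming a firewalld-based distribution):

    # Verify firewalld really is off on every node
    systemctl status firewalld
    systemctl disable --now firewalld

    # Alternatively, keep the firewall and open the Ceph services instead
    firewall-cmd --permanent --add-service=ceph --add-service=ceph-mon
    firewall-cmd --reload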

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-26 Thread Kevin Olbrich
2019 um 07:34 Uhr schrieb Konstantin Shalygin : > > On 1/5/19 4:17 PM, Kevin Olbrich wrote: > > root@adminnode:~# ceph osd tree > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > > -1 30.82903 root default > > -16 30.82903

Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-26 Thread Kevin Olbrich
Am Sa., 26. Jan. 2019 um 13:43 Uhr schrieb Götz Reinicke : > > Hi, > > I have a fileserver which mounts a 4TB rbd, which is ext4 formatted. > > I grew that rbd and ext4, starting with a 2TB rbd, this way: > > rbd resize testpool/disk01 --size 4194304 > > resize2fs /dev/rbd0 > > Today I wanted to
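The quoted message outlines the usual online-grow sequence; assembled here as a sketch (pool, image and device names taken from the quote, the size check added for illustration):

    # Grow the RBD image to 4 TiB (--size is given in MiB)
    rbd resize testpool/disk01 --size 4194304

    # Confirm the mapped block device has picked up the new size
    rbd info testpool/disk01 | grep size
    blockdev --getsize64 /dev/rbd0

    # Grow the ext4 filesystem online to fill the device
    resize2fs /dev/rbd0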

Re: [ceph-users] pgs stuck in creating+peering state

2019-01-17 Thread Kevin Olbrich
Are you sure no service like firewalld is running? Did you check that all machines have the same MTU and that jumbo frames are enabled if needed? I had this problem when I first started with ceph and forgot to disable firewalld. Replication worked perfectly fine, but the OSD was kicked out every few
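A quick sketch of the MTU check mentioned above (the host name and the 9000-byte MTU are assumptions):

    # MTU must match on all cluster-facing interfaces
    ip link show | grep mtu

    # Verify jumbo frames really pass end-to-end:
    # 8972 bytes payload = 9000-byte MTU minus 28 bytes IP/ICMP header
    ping -M do -s 8972 -c 3 <other-osd-host>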

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
ards > > On Tue, Jan 8, 2019 at 11:28 AM Kevin Olbrich wrote: >> >> You use replication 3 failure-domain host. >> OSD 2 and 4 are full, that's why your pool is also full. >> You need to add two disks to pf-us1-dfs3 or swap one from the larger >> nodes to this one.

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
You use replication 3 with failure-domain host. OSDs 2 and 4 are full, and that's why your pool is also full. You need to add two disks to pf-us1-dfs3 or swap one from the larger nodes to this one. Kevin Am Di., 8. Jan. 2019 um 15:20 Uhr schrieb Rodrigo Embeita : > > Hi Yoann, thanks for your response. >

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Kevin Olbrich
Looks like the same problem as mine: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032054.html The reported free space is a cluster-wide total, while the usable space is limited by the fullest OSD. Please check your (re-)weights. Kevin Am Di., 8. Jan. 2019 um 14:32 Uhr schrieb Rodrigo Embeita : > >
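A sketch of the weight check suggested above (the OSD id and reweight value are placeholders):

    # Per-OSD utilisation; the fullest OSD limits the usable pool capacity
    ceph osd df tree

    # One-off manual reweight of an overfull OSD (value between 0 and 1)
    ceph osd reweight <osd-id> 0.95

    # Or let Ceph derive reweights from current utilisation
    ceph osd test-reweight-by-utilization
    ceph osd reweight-by-utilization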

Re: [ceph-users] Balancer=on with crush-compat mode

2019-01-05 Thread Kevin Olbrich
If I understand the balancer correctly, it balances PGs, not data. This worked perfectly fine in your case. I prefer a PG count of ~100 per OSD; you are at 30. Maybe it would help to bump the PGs. Kevin Am Sa., 5. Jan. 2019 um 14:39 Uhr schrieb Marc Roos : > > > I have straw2, balancer=on,
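A sketch of bumping the PG count as suggested (pool name and target value are placeholders; on these releases pg_num cannot be decreased again, so increase in moderate steps):

    ceph osd pool get <pool> pg_num

    # Raise pg_num and pgp_num in steps, waiting for HEALTH_OK in between
    ceph osd pool set <pool> pg_num 512
    ceph osd pool set <pool> pgp_num 512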

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-05 Thread Kevin Olbrich
5:12 Uhr schrieb Konstantin Shalygin : > > On 1/5/19 1:51 AM, Kevin Olbrich wrote: > > PS: Could behttp://tracker.ceph.com/issues/36361 > > There is one HDD OSD that is out (which will not be replaced because > > the SSD pool will get the images and the hdd pool will be deleted

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
667 activating+degraded > 159 stale+activating > 116 down > 77 activating+remapped > 34 stale+activating+degraded > 21 stale+activating+remapped > 9 stale+active+undersiz

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
Hard Disk Iops on new server which are very low compared to > existing cluster server. > > Indeed this is a critical cluster but I don't have expertise to make it > flawless. > > Thanks > Arun > > On Fri, Jan 4, 2019 at 11:35 AM Kevin Olbrich wrote: >> >>

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Kevin Olbrich
If you really created and destroyed OSDs before the cluster healed itself, that data will be permanently lost (not found / inactive). Also, your PG count is so heavily oversized that the peering calculation will most likely break, because such a configuration was never tested. If this is a critical cluster, I would

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
PS: Could be http://tracker.ceph.com/issues/36361 There is one HDD OSD that is out (which will not be replaced because the SSD pool will get the images and the hdd pool will be deleted). Kevin Am Fr., 4. Jan. 2019 um 19:46 Uhr schrieb Kevin Olbrich : > > Hi! > > I did what you wrote

Re: [ceph-users] Usage of devices in SSD pool vary very much

2019-01-04 Thread Kevin Olbrich
Hi! I did what you wrote but my MGRs started to crash again: root@adminnode:~# ceph -s cluster: id: 086d9f80-6249-4594-92d0-e31b6a9c health: HEALTH_WARN no active mgr 105498/6277782 objects misplaced (1.680%) services: mon: 3 daemons, quorum

[ceph-users] TCP qdisc + congestion control / BBR

2019-01-02 Thread Kevin Olbrich
Hi! I wonder if changing qdisc and congestion_control (for example fq with Google BBR) on Ceph servers / clients has positive effects during high load. Google BBR: https://cloud.google.com/blog/products/gcp/tcp-bbr-congestion-control-comes-to-gcp-your-internet-just-got-faster I am running a lot
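For reference, enabling fq plus BBR is a two-line sysctl change (requires kernel 4.9 or newer; the values below are the standard ones, not taken from the thread):

    # /etc/sysctl.d/90-tcp-bbr.conf
    net.core.default_qdisc = fq
    net.ipv4.tcp_congestion_control = bbr

    # Apply and verify
    sysctl --system
    sysctl net.ipv4.tcp_congestion_control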

[ceph-users] Usage of devices in SSD pool vary very much

2019-01-02 Thread Kevin Olbrich
Hi! On a medium-sized cluster with device classes, I am experiencing a problem with the SSD pool: root@adminnode:~# ceph osd df | grep ssd ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 2 ssd 0.43700 1.0 447GiB 254GiB 193GiB 56.77 1.28 50 3 ssd 0.43700 1.0

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
> > Assuming everything is on LVM including the root filesystem, only moving > > the boot partition will have to be done outside of LVM. > > Since the OP mentioned MS Exchange, I assume the VM is running windows. > You can do the same LVM-like trick in Windows Server via Disk Manager > though; add

[ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
Hi! Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The server has access to both local and cluster storage; I only need to live-migrate the storage, not

Re: [ceph-users] Packages for debian in Ceph repo

2018-11-15 Thread Kevin Olbrich
I now had the time to test and after installing this package, uploads to rbd are working perfectly. Thank you very much for sharing this! Kevin Am Mi., 7. Nov. 2018 um 15:36 Uhr schrieb Kevin Olbrich : > Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard < > nhuill...@do

Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Kevin Olbrich
I read the whole thread and it looks like the write cache should always be disabled, as in the worst case the performance stays the same(?). This is based on this discussion. I will test some WD4002FYYZ drives, which don't mention "media cache". Kevin Am Di., 13. Nov. 2018 um 09:27 Uhr schrieb Виталий
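A sketch of checking and disabling the volatile write cache as discussed (the device name is a placeholder; the setting is not persistent on every drive, so a udev rule or boot script may be needed):

    # Show the current write-cache setting
    hdparm -W /dev/sdX
    smartctl -g wcache /dev/sdX

    # Disable the volatile write cache
    hdparm -W 0 /dev/sdX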

Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
; -- Dan > > On Mon, Nov 12, 2018 at 3:01 PM Kevin Olbrich wrote: > > > > Hi! > > > > ZFS won't play nice on ceph. Best would be to mount CephFS directly with > the ceph-fuse driver on the endpoint. > > If you definitely want to put a storage gateway between the

Re: [ceph-users] Ceph or Gluster for implementing big NAS

2018-11-12 Thread Kevin Olbrich
Hi! ZFS won't play nice on ceph. Best would be to mount CephFS directly with the ceph-fuse driver on the endpoint. If you definitely want to put a storage gateway between the data and the compute nodes, then go with nfs-ganesha which can export CephFS directly without local ("proxy") mount. I
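A minimal sketch of an nfs-ganesha export using the CephFS FSAL, so no local kernel mount is needed (export id, path and pseudo path are placeholders):

    EXPORT {
        Export_Id = 1;
        Path = "/";              # path inside CephFS
        Pseudo = "/cephfs";      # NFSv4 pseudo path seen by clients
        Access_Type = RW;
        Squash = No_Root_Squash;
        Protocols = 4;
        Transports = TCP;

        FSAL {
            Name = CEPH;         # CephFS FSAL instead of a local mount
        }
    }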

Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Kevin Olbrich
Am Mi., 7. Nov. 2018 um 16:40 Uhr schrieb Gregory Farnum : > On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside > wrote: > >> >> >> On 07/11/2018 10:59, Konstantin Shalygin wrote: >> >> I wonder if there is any release announcement for ceph 12.2.9 that I >> missed. >> >> I just found the new packages

Re: [ceph-users] Packages for debian in Ceph repo

2018-11-07 Thread Kevin Olbrich
Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard < nhuill...@dolomede.fr>: > > > It lists rbd but still fails with the exact same error. > > I stumbled upon the exact same error, and since there was no answer > anywhere, I figured it was a very simple problem: don't forget to > install
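The package name is cut off in the archive; on Debian the rbd block driver for qemu-img ships separately from qemu-utils, most likely in qemu-block-extra. This is an assumption, not confirmed by the truncated message:

    # Assumption: the missing package is qemu-block-extra (Debian Stretch)
    apt-get install qemu-block-extra

    # qemu-img should now list rbd among its supported formats
    qemu-img --help | grep -w rbd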

Re: [ceph-users] ceph-deploy osd creation failed with multipath and dmcrypt

2018-11-06 Thread Kevin Olbrich
I ran into the same problem. I had to create a GPT table for each disk, create a first partition spanning the full disk and then feed these to ceph-volume (it should be similar for ceph-deploy). Also, I am not sure you can combine fs-type btrfs with bluestore (afaik that option is for filestore). Kevin Am Di., 6. Nov.
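A sketch of the partitioning workaround described above (the device name is an example; multipath devices may name the partition differently):

    # Wipe the disk, write a GPT label and create one partition
    # spanning the whole disk
    sgdisk --zap-all /dev/sdb
    sgdisk -n 1:0:0 /dev/sdb

    # Hand the partition to ceph-volume as a bluestore data device
    ceph-volume lvm create --bluestore --data /dev/sdb1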

Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
_cdrom host_device http https iscsi iser luks nbd null-aio > null-co parallels qcow qcow2 qed quorum raw rbd replication sheepdog > throttle vdi vhdx vmdk vpc vvfat zeroinit > On Tue, Oct 30, 2018 at 12:08 PM Kevin Olbrich wrote: > >> Is it possible to use qemu-img with rbd su

Re: [ceph-users] Packages for debian in Ceph repo

2018-10-30 Thread Kevin Olbrich
Is it possible to use qemu-img with rbd support on Debian Stretch? I am on Luminous and am trying to get my image build server to load images into a ceph pool. root@buildserver:~# qemu-img convert -p -O raw /target/test-vm.qcow2 > rbd:rbd_vms_ssd_01/test_vm > qemu-img: Unknown protocol 'rbd' Kevin

[ceph-users] Command to check last change to rbd image?

2018-10-28 Thread Kevin Olbrich
Hi! Is there an easy way to check when an image was last modified? I want to make sure that the images I want to clean up have not been used for a long time. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] nfs-ganesha version in Ceph repos

2018-10-09 Thread Kevin Olbrich
I had a similar problem: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/029698.html But even the recent 2.6.x releases were not working well for me (many, many segfaults). I am on the master branch (2.7.x) and that works well with fewer crashes. Cluster is 13.2.1/.2 with

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
kub > > pon., 8 paź 2018, 19:32 użytkownik Alfredo Deza > napisał: > >> On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich wrote: >> > >> > Hi! >> > >> > Yes, thank you. At least on one node this works, the other node just >> freezes but this mi

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
> Wido > > On 10/08/2018 12:01 PM, Kevin Olbrich wrote: > > Hi! > > > > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? > > Before I migrated from filestore with simple-mode to bluestore with lvm, > > I was able to find the raw disk with "df&

[ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Kevin Olbrich
Hi! Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? Before I migrated from filestore with simple-mode to bluestore with lvm, I was able to find the raw disk with "df". Now, I need to go from LVM LV to PV to disk every time I need to check/smartctl a disk. Kevin
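Two ways to resolve an OSD id to its raw device on a bluestore/LVM deployment (OSD id 12 is a placeholder):

    # Lists every OSD with its LV, PV and underlying /dev device
    ceph-volume lvm list

    # Or ask the cluster for the devices the OSD itself reported
    ceph osd metadata 12 | grep -E '"devices"|bdev_partition_path'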

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-08 Thread Kevin Olbrich
> setting debug bluefs to 20/20. And if that's not enough log try also > setting debug osd, debug bluestore, and debug bdev to 20/20. > > > > Paul > Am Mi., 3. Okt. 2018 um 13:48 Uhr schrieb Kevin Olbrich : > > > > The disks were deployed with ceph-deploy /
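The debug levels quoted above translate into a ceph.conf stanza like this on the affected OSD host (restart the failing OSD afterwards to capture the replay attempt in the log):

    [osd]
    debug bluefs    = 20/20
    debug bluestore = 20/20
    debug bdev      = 20/20
    debug osd       = 20/20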

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
umes? > > On 10/3/2018 11:22 AM, Kevin Olbrich wrote: > > Small addition: the failing disks are in the same host. > This is a two-host, failure-domain OSD cluster. > > > Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich : > >> Hi! >> >> Yesterda

Re: [ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Small addition: the failing disks are in the same host. This is a two-host, failure-domain OSD cluster. Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich : > Hi! > > Yesterday one of our (non-priority) clusters failed when 3 OSDs went down > (EC 8+2) together. > *This is st

[ceph-users] After 13.2.2 upgrade: bluefs mount failed to replay log: (5) Input/output error

2018-10-03 Thread Kevin Olbrich
Hi! Yesterday one of our (non-priority) clusters failed when 3 OSDs went down (EC 8+2) together. *This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two hours before.* They failed exactly at the same moment, rendering the cluster unusable (CephFS). We are using CentOS 7 with latest

Re: [ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
, is there a better way? Kevin Am So., 23. Sep. 2018 um 18:08 Uhr schrieb Paul Emmerich : > > The usual trick for clients not supporting this natively is the option > "rbd_default_data_pool" in ceph.conf which should also work here. > > > Paul > Am So., 23. Sep. 2018 um
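A sketch of the rbd_default_data_pool trick mentioned above; pool and image names are placeholders:

    # /etc/ceph/ceph.conf on the client that runs qemu-img
    [client]
    rbd default data pool = ec_data_pool

With this set, a plain "qemu-img convert -p -O raw image.vmdk rbd:replicated_pool/image01" keeps the image header in the replicated pool while the data objects land in the EC pool.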

[ceph-users] data-pool option for qemu-img / ec pool

2018-09-23 Thread Kevin Olbrich
Hi! Is it possible to set data-pool for ec-pools on qemu-img? For repl-pools I used "qemu-img convert" to convert from e.g. vmdk to raw and write to rbd/ceph directly. The rbd utility is able to do this for raw or empty images but without convert (converting 800G and writing it again would now

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
For example, if you have a hierarchy like root --> host1, host2, host3 > --> nvme/ssd/sata OSDs, then you'll actually have 3 trees: > > root~ssd -> host1~ssd, host2~ssd ... > root~sata -> host~sata, ... > > > Paul > > 2018-09-20 14:54 GMT+02:00 Kevin Olbrich : > &g

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
To answer my own question: ceph osd crush tree --show-shadow Sorry for the noise... Am Do., 20. Sep. 2018 um 14:54 Uhr schrieb Kevin Olbrich : > Hi! > > Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host. > I also have replication rules to distinguish between

[ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Hi! Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host. I also have replication rules to distinguish between HDD and SSD (and failure-domain set to rack) which are mapped to pools. What happens if I add a heterogeneous host with 1x SSD and 1x NVMe (where NVMe will be a new
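The reply above points to the shadow trees; a sketch of how per-class rules keep a new NVMe class separate from the existing HDD/SSD pools (rule and pool names are placeholders):

    # Shadow trees per device class
    ceph osd crush tree --show-shadow

    # One replicated rule per class, failure domain rack as in the setup above
    ceph osd crush rule create-replicated ssd-rule default rack ssd
    ceph osd crush rule create-replicated nvme-rule default rack nvme
    ceph osd pool set mypool crush_rule nvme-rule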

[ceph-users] (no subject)

2018-09-18 Thread Kevin Olbrich
Hi! is the compressible hint / incompressible hint supported on qemu+kvm? http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/ If not, only aggressive would work in this case for rbd, right? Kind regards Kevin ___ ceph-users
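If the hints are not passed through, forcing compression on the pool is the usual fallback; a sketch (pool name and algorithm are placeholders):

    # Compress regardless of client hints on a bluestore-backed pool
    ceph osd pool set rbd_vms compression_mode aggressive
    ceph osd pool set rbd_vms compression_algorithm snappy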

[ceph-users] nfs-ganesha FSAL CephFS: nfs_health :DBUS :WARN :Health status is unhealthy

2018-09-10 Thread Kevin Olbrich
Hi! Today one of our nfs-ganesha gateways experienced an outage and has since crashed every time the client behind it tries to access the data. This is a Ceph Mimic cluster with nfs-ganesha from the ceph repos: nfs-ganesha-2.6.2-0.1.el7.x86_64 nfs-ganesha-ceph-2.6.2-0.1.el7.x86_64 There were fixes for

[ceph-users] SPDK/DPDK with Intel P3700 NVMe pool

2018-08-30 Thread Kevin Olbrich
Hi! During our move from filestore to bluestore, we removed several Intel P3700 NVMe from the nodes. Is someone running a SPDK/DPDK NVMe-only EC pool? Is it working well? The docs are very short about the setup:

[ceph-users] HDD-only CephFS cluster with EC and without SSD/NVMe

2018-08-22 Thread Kevin Olbrich
Hi! I am in the process of moving a local ("large", 24x1TB) ZFS RAIDZ2 to CephFS. This storage is used for backup images (large sequential reads and writes). To save space and have a RAIDZ2 (RAID6)-like setup, I am planning the following profile: ceph osd erasure-code-profile set myprofile \
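The profile command is truncated in the archive; a hedged sketch of a RAIDZ2-like profile with hypothetical k/m values and a host failure domain:

    # k=8, m=2 survives two failed hosts, similar to RAIDZ2 (values are examples)
    ceph osd erasure-code-profile set myprofile \
        k=8 m=2 \
        crush-failure-domain=host \
        crush-device-class=hdd

    # EC data pool for CephFS; overwrites must be enabled for CephFS/RBD on EC
    ceph osd pool create cephfs_data_ec 256 256 erasure myprofile
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true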

Re: [ceph-users] Running 12.2.5 without problems, should I upgrade to 12.2.7 or wait for 12.2.8?

2018-08-10 Thread Kevin Olbrich
Am Fr., 10. Aug. 2018 um 19:29 Uhr schrieb : > > > Am 30. Juli 2018 09:51:23 MESZ schrieb Micha Krause : > >Hi, > > Hi Micha, > > > > >I'm Running 12.2.5 and I have no Problems at the moment. > > > >However my servers reporting daily that they want to upgrade to 12.2.7, > >is this save or should

Re: [ceph-users] v12.2.7 Luminous released

2018-07-19 Thread Kevin Olbrich
Hi, on upgrade from 12.2.4 to 12.2.5 the balancer module broke (the mgr crashed minutes after the service started). The only solution was to disable the balancer (the service has been running fine since). Is this fixed in 12.2.7? I was unable to locate the bug in the bug tracker. Kevin 2018-07-17 18:28 GMT+02:00

Re: [ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
PS: It's luminous 12.2.5! Mit freundlichen Grüßen / best regards, Kevin Olbrich. 2018-07-14 15:19 GMT+02:00 Kevin Olbrich : > Hi, > > why do I see activating followed by peering during OSD add (refill)? > I did not change pg(p)_num. > > Is this normal? From my other clust

[ceph-users] Periodically activating / peering on OSD add

2018-07-14 Thread Kevin Olbrich
Hi, why do I see activating followed by peering during OSD add (refill)? I did not change pg(p)_num. Is this normal? From my other clusters, I don't think that happened... Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] Bluestore and number of devices

2018-07-13 Thread Kevin Olbrich
You can keep the same layout as before. Most people place the DB and WAL combined in one partition (similar to the journal on filestore). Kevin 2018-07-13 12:37 GMT+02:00 Robert Stanford : > > I'm using filestore now, with 4 data devices per journal device. > > I'm confused by this: "BlueStore manages
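A sketch of that layout with ceph-volume (device names are examples); when only --block.db is given, the WAL is placed on the DB device automatically:

    ceph-volume lvm create --bluestore \
        --data /dev/sdb \
        --block.db /dev/nvme0n1p1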

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Kevin Olbrich
Sounds a little bit like the problem I had on OSDs: [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html> *Kevin Olbrich* - [ceph-users] Blocked requests activating+remapped afterextendi

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
2018-07-10 14:37 GMT+02:00 Jason Dillaman : > On Tue, Jul 10, 2018 at 2:37 AM Kevin Olbrich wrote: > >> 2018-07-10 0:35 GMT+02:00 Jason Dillaman : >> >>> Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least >>> present on the c

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-10 Thread Kevin Olbrich
ocal when there is an ULA-prefix available. The address is available on brX on this client node. - Kevin > On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich wrote: > >> 2018-07-09 21:25 GMT+02:00 Jason Dillaman : >> >>> BTW -- are you running Ceph on a one-node computer? I t

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
locks and IPv6 addresses >> since it is failing to parse the address as valid. Perhaps it's barfing on >> the "%eth0" scope id suffix within the address. >> >> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich wrote: >> >>> Hi! >>> >>> I

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
pe id suffix within the address. > > On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich wrote: > >> Hi! >> >> I tried to convert an qcow2 file to rbd and set the wrong pool. >> Immediately I stopped the transfer but the image is stuck locked: >> >> Previusly whe

[ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
Hi! I tried to convert a qcow2 file to rbd and set the wrong pool. I immediately stopped the transfer but the image is stuck locked: Previously when that happened, I was able to remove the image after 30 secs. [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02 There is 1 exclusive

[ceph-users] GFS2 as RBD on ceph?

2018-06-12 Thread Kevin Olbrich
Hi! *Is it safe to run GFS2 on ceph as RBD and mount it to approx. 3 to 5 VMs?* The idea is to consolidate 3 webservers which are located behind proxies. The old infrastructure is not HA or capable of load balancing. I would like to set up a webserver, clone the image and mount the GFS2 disk as

Re: [ceph-users] Adding cluster network to running cluster

2018-06-07 Thread Kevin Olbrich
Really? I always thought that splitting off the replication network was best practice. Keeping everything in the same IPv6 network is much easier. Thank you. Kevin 2018-06-07 10:44 GMT+02:00 Wido den Hollander : > > > On 06/07/2018 09:46 AM, Kevin Olbrich wrote: > > Hi! > >
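For reference, the split would look like this in ceph.conf (IPv6 documentation prefixes used as placeholders); leaving cluster network unset keeps all traffic on the public network:

    [global]
    public network  = 2001:db8:1::/64    # clients, mons, mds, osds
    cluster network = 2001:db8:2::/64    # OSD replication and heartbeats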

Re: [ceph-users] Blocked requests activating+remapped afterextendingpg(p)_num

2018-05-17 Thread Kevin Olbrich
never do min_size 1. > > > Paul > > > 2018-05-17 15:48 GMT+02:00 Kevin Olbrich <k...@sv01.de>: > >> I was able to obtain another NVMe to get the HDDs in node1004 into the >> cluster. >> The number of disks (all 1TB) is now balanced between racks, sti

Re: [ceph-users] Blocked requests activating+remapped afterextendingpg(p)_num

2018-05-17 Thread Kevin Olbrich
but why are they failing to proceed to active+clean or active+remapped? Kind regards, Kevin 2018-05-17 14:05 GMT+02:00 Kevin Olbrich <k...@sv01.de>: > Ok, I just waited some time but I still got some "activating" issues: > > data: > pools: 2 pools, 1536 pgs

Re: [ceph-users] Blocked requests activating+remapped afterextendingpg(p)_num

2018-05-17 Thread Kevin Olbrich
is is just temporary. Calculated PGs per OSD is 200. I searched the net and the bugtracker but most posts suggest osd_max_pg_per_osd_hard_ratio = 32 to fix this issue but this time, I got more stuck PGs. Any more hints? Kind regards. Kevin 2018-05-17 13:37 GMT+02:00 Kevin Olbrich <k...@sv

Re: [ceph-users] Blocked requests activating+remapped afterextendingpg(p)_num

2018-05-17 Thread Kevin Olbrich
PS: The cluster currently is size 2; I used PGCalc on the Ceph website which, by default, will place 200 PGs on each OSD. I read about the protection in the docs and later realised that I should have placed only 100 PGs. 2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>: > Hi! > >

Re: [ceph-users] Blocked requests activating+remapped afterextendingpg(p)_num

2018-05-17 Thread Kevin Olbrich
t; > > On 05/17/2018 01:09 PM, Kevin Olbrich wrote: > >> Hi! >> >> Today I added some new OSDs (nearly doubled) to my luminous cluster. >> I then changed pg(p)_num from 256 to 1024 for that pool because it was >> complaining about to few PGs. (I no

[ceph-users] Blocked requests activating+remapped after extending pg(p)_num

2018-05-17 Thread Kevin Olbrich
Hi! Today I added some new OSDs (nearly doubled) to my luminous cluster. I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I have since noticed that this should have been done in smaller steps.) This is the current status: health: HEALTH_ERR

[ceph-users] read_fsid unparsable uuid

2018-04-26 Thread Kevin Olbrich
Hi! Yesterday I deployed 3x SSDs as OSDs fine, but today I get this error when deploying an HDD with separated WAL/DB: stderr: 2018-04-26 11:58:19.531966 7fe57e5f5e00 -1 bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid Command: ceph-deploy --overwrite-conf osd create --dmcrypt

Re: [ceph-users] Where to place Block-DB?

2018-04-26 Thread Kevin Olbrich
cluster. > > On Thu, Apr 26, 2018 at 12:58 PM, Kevin Olbrich <k...@sv01.de> wrote: > > Hi! > > > > On a small cluster I have an Intel P3700 as the journaling device for 4 > > HDDs. > > While using filestore, I used it as journal. > > > > On

[ceph-users] Where to place Block-DB?

2018-04-26 Thread Kevin Olbrich
Hi! On a small cluster I have an Intel P3700 as the journaling device for 4 HDDs. While using filestore, I used it as journal. On bluestore, is it safe to move both Block-DB and WAL to this journal NVMe? Easy maintenance is first priority (on filestore we just had to flush and replace the SSD).

[ceph-users] Backup LUKS/Dmcrypt keys

2018-04-25 Thread Kevin Olbrich
Hi, how can I backup the dmcrypt keys on luminous? The folder under /etc/ceph does not exist anymore. Kind regards Kevin ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
I found a fix: It is *mandatory* to set the public network to the same network the mons use. Skipping this while the mon has another network interface writes garbage into the monmap. - Kevin 2018-02-23 11:38 GMT+01:00 Kevin Olbrich <k...@sv01.de>: > I always see this: > >

Re: [ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
"0.0.0.0:0/2", [mon01][DEBUG ] "rank": 2 [mon01][DEBUG ] } [mon01][DEBUG ] ] DNS is working fine and the hostnames are also listed in /etc/hosts. I already purged the mon but still the same problem. - Kevin 2018-02-23 10:26 GMT+01:00 Kevin Olbri

[ceph-users] Some monitors have still not reached quorum

2018-02-23 Thread Kevin Olbrich
Hi! On a new cluster, I get the following error. All 3x mons are connected to the same switch and ping between them works (firewalls disabled). Mon nodes are Ubuntu 16.04 LTS on Ceph Luminous. [ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum: [ceph_deploy.mon][ERROR ] mon03

Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-08 Thread Kevin Olbrich
2018-02-08 11:20 GMT+01:00 Martin Emrich : > I have a machine here mounting a Ceph RBD from luminous 12.2.2 locally, > running linux-generic-hwe-16.04 (4.13.0-32-generic). > > Works fine, except that it does not support the latest features: I had to > disable

Re: [ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-07 Thread Kevin Olbrich
Would be interested as well. - Kevin 2018-02-04 19:00 GMT+01:00 Yoann Moulin : > Hello, > > What is the best kernel for Luminous on Ubuntu 16.04 ? > > Is linux-image-virtual-lts-xenial still the best one ? Or > linux-virtual-hwe-16.04 will offer some improvement ? > >

Re: [ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
1 - 2 were not added, they are (this disk has only two partitions). Should I open a bug? Kind regards, Kevin 2018-02-04 19:05 GMT+01:00 Kevin Olbrich <k...@sv01.de>: > I also noticed there are no folders under /var/lib/ceph/osd/ ... > > > Mit freundlichen Grüßen / best regards,

Re: [ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
I also noticed there are no folders under /var/lib/ceph/osd/ ... Mit freundlichen Grüßen / best regards, Kevin Olbrich. 2018-02-04 19:01 GMT+01:00 Kevin Olbrich <k...@sv01.de>: > Hi! > > Currently I try to re-deploy a cluster from filestore to bluestore. > I zapped all dis

[ceph-users] _read_bdev_label failed to open

2018-02-04 Thread Kevin Olbrich
Hi! Currently I try to re-deploy a cluster from filestore to bluestore. I zapped all disks (multiple times) but I fail adding a disk array: Prepare: > ceph-deploy --overwrite-conf osd prepare --bluestore --block-wal /dev/sdb > --block-db /dev/sdb osd01.cloud.example.local:/dev/mapper/mpatha

Re: [ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
2018-02-02 12:44 GMT+01:00 Richard Hesketh <richard.hesk...@rd.bbc.co.uk>: > On 02/02/18 08:33, Kevin Olbrich wrote: > > Hi! > > > > I am planning a new Flash-based cluster. In the past we used SAMSUNG > PM863a 480G as journal drives in our HDD cluster. > >

[ceph-users] RFC Bluestore-Cluster of SAMSUNG PM863a

2018-02-02 Thread Kevin Olbrich
Hi! I am planning a new Flash-based cluster. In the past we used SAMSUNG PM863a 480G as journal drives in our HDD cluster. After a lot of tests with luminous and bluestore on HDD clusters, we plan to re-deploy our whole RBD pool (OpenNebula cloud) using these disks. As far as I understand, it

[ceph-users] Adding a new node to a small cluster (size = 2)

2017-05-31 Thread Kevin Olbrich
Hi! A customer is running a small two-node ceph cluster with 14 disks each. He has min_size 1 and size 2 and it is only used for backups. If we add a third member with 14 identical disks and keep size = 2, replicas should be distributed evenly, right? Or is an uneven count of hosts inadvisable

Re: [ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
e until entry in "df" and reboot fixed it. Then OSDs were failing again. Cause: IPv6 DAD on bond-interface. Disabled via sysctl. Reboot and voila, cluster immediately online. Kind regards, Kevin. 2017-05-16 16:59 GMT+02:00 Kevin Olbrich <k...@sv01.de>: > HI! > > Currently I a
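The DAD fix mentioned above is a one-line sysctl (the interface name is an example):

    # /etc/sysctl.d/90-disable-dad.conf
    net.ipv6.conf.bond0.accept_dad = 0

    # Apply without a reboot
    sysctl --system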

[ceph-users] Failed to start Ceph disk activation: /dev/dm-18

2017-05-16 Thread Kevin Olbrich
HI! Currently I am deploying a small cluster with two nodes. I installed ceph jewel on all nodes and made a basic deployment. After "ceph osd create..." I am now getting "Failed to start Ceph disk activation: /dev/dm-18" on boot. All 28 OSDs were never active. This server has a 14 disk JBOD with

Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-22 Thread Kevin Olbrich
he > > original message. > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > &g

[ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-08 Thread Kevin Olbrich
the OSDs out one by one, or all at once with the norefill and norecovery flags set? If the latter is the case, which other flags should be set? Thanks! Kind regards, Kevin Olbrich. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com
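A sketch of draining one OSD at a time on a jewel-era cluster (the OSD id is a placeholder); waiting for active+clean between OSDs keeps redundancy intact:

    ceph osd out 7
    ceph -s                      # wait until recovery finishes

    # Once the OSD is empty, remove it completely
    systemctl stop ceph-osd@7
    ceph osd crush remove osd.7
    ceph auth del osd.7
    ceph osd rm osd.7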

[ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph for RBD + OpenStack

2017-01-10 Thread Kevin Olbrich
. I hope this helps all Ceph users who are interested in the idea of running Ceph on ZFS. Kind regards, Kevin Olbrich. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] What happens if all replica OSDs journals are broken?

2016-12-14 Thread Kevin Olbrich
2016-12-14 2:37 GMT+01:00 Christian Balzer <ch...@gol.com>: > > Hello, > Hi! > > On Wed, 14 Dec 2016 00:06:14 +0100 Kevin Olbrich wrote: > > > Ok, thanks for your explanation! > > I read those warnings about size 2 + min_size 1 (we are using ZFS as

Re: [ceph-users] What happens if all replica OSDs journals are broken?

2016-12-13 Thread Kevin Olbrich
Ok, thanks for your explanation! I read those warnings about size 2 + min_size 1 (we are using ZFS as RAID6, called raidz2) as OSDs. Time to raise replication! Kevin 2016-12-13 0:00 GMT+01:00 Christian Balzer <ch...@gol.com>: > On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:

[ceph-users] What happens if all replica OSDs journals are broken?

2016-12-12 Thread Kevin Olbrich
Hi, just in case: What happens when all replica journal SSDs are broken at once? The PGs most likely will be stuck inactive but as I read, the journals just need to be replaced (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/). Does this also work in this case? Kind

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Kevin Olbrich
this is safe regardless of full outage. Mit freundlichen Grüßen / best regards, Kevin Olbrich. 2016-12-07 21:10 GMT+01:00 Wido den Hollander <w...@42on.com>: > > > Op 7 december 2016 om 21:04 schreef "Will.Boege" <will.bo...@target.com > >: > > > > >

Re: [ceph-users] Deploying new OSDs in parallel or one after another

2016-11-28 Thread Kevin Olbrich
I need to note that I already have 5 hosts with one OSD each. Mit freundlichen Grüßen / best regards, Kevin Olbrich. 2016-11-28 10:02 GMT+01:00 Kevin Olbrich <k...@sv01.de>: > Hi! > > I want to deploy two nodes with 4 OSDs each. I already prepared OSDs and > only need to act

[ceph-users] Deploying new OSDs in parallel or one after another

2016-11-28 Thread Kevin Olbrich
Hi! I want to deploy two nodes with 4 OSDs each. I already prepared OSDs and only need to activate them. What is better? One by one or all at once? Kind regards, Kevin. ___ ceph-users mailing list ceph-users@lists.ceph.com

[ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread Kevin Olbrich
of them run remote services (terminal). My question is: Are 80 VMs hosted on 53 disks (mostly 7.2k SATA) too much? We sometimes experience lags where nearly all servers suffer from "blocked IO > 32" seconds. What are your experiences? Mit freundlichen Grüßen / best regards,

Re: [ceph-users] degraded objects after osd add

2016-11-23 Thread Kevin Olbrich
regards, Kevin Olbrich. > > Original Message > Subject: Re: [ceph-users] degraded objects after osd add (17-Nov-2016 9:14) > From:Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> > To: c...@dolphin-it.de > > Hi, > >

[ceph-users] How are replicas spread in default crush configuration?

2016-11-23 Thread Kevin Olbrich
st 4x OSDs (and setting size to 3). I want to make sure we can resist two offline hosts (in terms of hardware). Is my assumption correct? Mit freundlichen Grüßen / best regards, Kevin Olbrich. ___ ceph-users mailing list ceph-users@lists.ceph.com http://list