[ceph-users] OSD failed, won't come up

2018-07-19 Thread shrey chauhan
Hi all, I am facing a major issue where my OSD is down and not coming up after a reboot. These are the last OSD log lines: 2018-07-20 10:43:00.701904 7f02f1b53d80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1532063580701900, "job": 1, "event": "recovery_finished"} 2018-07-20 10:43:00.735978 7f02f1b53d80
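
A minimal triage sketch for an OSD that dies right after the RocksDB recovery messages (the OSD id 12 and the default data path are assumptions):
$ systemctl status ceph-osd@12
$ journalctl -u ceph-osd@12 --since "1 hour ago"
$ ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12   # BlueStore consistency check, run while the OSD is stopped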

Re: [ceph-users] RDMA question for ceph

2018-07-19 Thread Will Zhao
Hi John: Thanks for your reply. Yes, the following are the details. ibdev2netdev: mlx4_0 port 1 ==> ib0 (Down) mlx4_0 port 2 ==> ib1 (Up) sh show-gids.sh: DEV PORT INDEX GID IPv4 VER DEV --- - ---

Re: [ceph-users] Omap warning in 12.2.6

2018-07-19 Thread Brad Hubbard
Search the cluster log for 'Large omap object found' for more details. On Fri, Jul 20, 2018 at 5:13 AM, Brent Kennedy wrote: > I just upgraded our cluster to 12.2.6 and now I see this warning about 1 > large omap object. I looked and it seems this warning was just added in > 12.2.6. I found a
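
A sketch of that search (log path and osd id are assumptions; on most installs the cluster log is /var/log/ceph/ceph.log on the monitors):
$ ceph health detail
$ grep 'Large omap object found' /var/log/ceph/ceph.log
$ grep 'Large omap object' /var/log/ceph/ceph-osd.7.log   # the reporting OSD's log names the offending object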

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-19 Thread Brad Hubbard
I've updated the tracker. On Thu, Jul 19, 2018 at 7:51 PM, Robert Sander wrote: > On 19.07.2018 11:15, Ronny Aasen wrote: > >> Did you upgrade from 12.2.5 or 12.2.6 ? > > Yes. > >> sounds like you hit the reason for the 12.2.7 release >> >> read :

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Willem Jan Withagen
On 19/07/2018 10:53, Simon Ironside wrote: On 19/07/18 07:59, Dietmar Rieder wrote: We have P840ar controllers with battery backed cache in our OSD nodes and configured an individual RAID-0 for each OSD (ceph luminous + bluestore). We have not seen any problems with this setup so far and

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Satish Patel
Thanks for the massive detail. So what options do I have? Can I disable the RAID controller, run the system without RAID, and use software RAID for the OS? Does that make sense? Sent from my iPhone > On Jul 19, 2018, at 6:33 AM, Willem Jan Withagen wrote: > >> On 19/07/2018 10:53, Simon Ironside

Re: [ceph-users] v12.2.7 Luminous released

2018-07-19 Thread Kevin Olbrich
Hi, on upgrade from 12.2.4 to 12.2.5 the balancer module broke (the mgr crashes minutes after the service starts). The only solution was to disable the balancer (the service has been running fine since). Is this fixed in 12.2.7? I was unable to locate the bug in the bug tracker. Kevin 2018-07-17 18:28 GMT+02:00

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Willem Jan Withagen
On 19/07/2018 13:28, Satish Patel wrote: Thanks for the massive detail. So what options do I have? Can I disable the RAID controller, run the system without RAID, and use software RAID for the OS? Not sure what kind of RAID controller you have. I seem to recall an HP thingy? And those I don't trust

[ceph-users] Force cephfs delayed deletion

2018-07-19 Thread Alexander Ryabov
Hello, I see that free space is not released after files are removed on CephFS. I'm using Luminous with replica=3, without any snapshots etc., and with default settings. From the client side: $ du -sh /mnt/logs/ 4.1G /mnt/logs/ $ df -h /mnt/logs/ Filesystem Size Used Avail Use% Mounted on

[ceph-users] Increase tcmalloc thread cache bytes - still recommended?

2018-07-19 Thread Robert Stanford
It seems that the Ceph community no longer recommends changing to jemalloc. However, this page also recommends doing what's in this email's subject: https://ceph.com/geen-categorie/the-ceph-and-tcmalloc-performance-story/ Is it still recommended to increase the tcmalloc thread cache bytes, or is

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Dietmar Rieder
On 07/19/2018 04:44 AM, Satish Patel wrote: > If I have 8 OSD drives in a server on a P410i RAID controller (HP), and I > want to make this server an OSD node, how should I > configure RAID? > > 1. Put all drives in RAID-0? > 2. Put each individual HDD in RAID-0 and create 8 individual RAID-0
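
For option 2 on HP Smart Array controllers, a hedged sketch using the ssacli/hpssacli tool (the controller slot and drive bay IDs below are hypothetical and need to match your own 'ctrl all show config' output):
$ ssacli ctrl all show config
$ ssacli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
$ ssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
(repeat once per disk so each of the 8 HDDs becomes its own single-disk RAID-0 logical drive)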

[ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-19 Thread Robert Sander
Hi, just a quick warning: We currently see active+clean+inconsistent PGs on two clusters after upgrading to 12.2.7. I created http://tracker.ceph.com/issues/24994 Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 /

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Simon Ironside
On 19/07/18 07:59, Dietmar Rieder wrote: We have P840ar controllers with battery backed cache in our OSD nodes and configured an individual RAID-0 for each OSD (ceph luminous + bluestore). We have not seen any problems with this setup so far and performance is great at least for our workload.

Re: [ceph-users] Crush Rules with multiple Device Classes

2018-07-19 Thread Oliver Freyermuth
On 19.07.2018 at 08:43, Linh Vu wrote: > Since the new NVMes are meant to replace the existing SSDs, why don't you > assign class "ssd" to the new NVMe OSDs? That way you don't need to change > the existing OSDs nor the existing crush rule. And the new NVMe OSDs won't > lose any performance,

Re: [ceph-users] Crush Rules with multiple Device Classes

2018-07-19 Thread Linh Vu
Since the new NVMes are meant to replace the existing SSDs, why don't you assign class "ssd" to the new NVMe OSDs? That way you don't need to change the existing OSDs nor the existing crush rule. And the new NVMe OSDs won't lose any performance, "ssd" or "nvme" is just a name. When you deploy
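
A sketch of that reassignment (osd.42 stands in for a new NVMe OSD; the class assigned at creation has to be removed before a new one can be set):
$ ceph osd crush rm-device-class osd.42
$ ceph osd crush set-device-class ssd osd.42
$ ceph osd crush tree --show-shadow   # verify the OSD now appears under the ssd shadow hierarchy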

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-19 Thread Ronny Aasen
On 19 July 2018 10:37, Robert Sander wrote: Hi, just a quick warning: We currently see active+clean+inconsistent PGs on two clusters after upgrading to 12.2.7. I created http://tracker.ceph.com/issues/24994 Regards Did you upgrade from 12.2.5 or 12.2.6? Sounds like you hit the reason for

Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Marco Gaiarin
Hi! Troy Ablan, on that day you wrote... > Even worse, the P410i doesn't appear to support a pass-thru (JBOD/HBA) > mode, so your only sane option for using this card is to create RAID-0s. I confirm. Even worse, the P410i can define a maximum of 2 'arrays' (even a fake array composed of one disk

Re: [ceph-users] Fwd: MDS memory usage is very high

2018-07-19 Thread Daniel Carrasco
Hello, finally I had to remove CephFS and use plain NFS, because the MDS daemon started to use a lot of memory and was unstable. After rebooting one node because it started to swap (the cluster should be able to survive without a node), the cluster went down because one of the other MDSs started to use

Re: [ceph-users] Fwd: MDS memory usage is very high

2018-07-19 Thread Daniel Carrasco
Hello again, it is still early to say that it is working fine now, but it looks like the MDS memory is now under 20% of RAM and most of the time between 6-9%. Maybe it was a mistake in the configuration. As a note, I've changed this client config: [global] ... bluestore_cache_size_ssd = 805306360
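
For reference, a hedged sketch of the MDS-side setting that usually matters here (the 8 GiB value is only an example; bluestore_cache_size_ssd is an [osd] option and does not limit MDS memory):
[mds]
mds_cache_memory_limit = 8589934592   # 8 GiB; the Luminous default is 1 GiB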

Re: [ceph-users] krbd vs librbd performance with qemu

2018-07-19 Thread Nikola Ciprich
> > opts="--randrepeat=1 --ioengine=rbd --direct=1 --numjobs=${numjobs} > > --gtod_reduce=1 --name=test --pool=${pool} --rbdname=${vol} --invalidate=0 > > --bs=4k --iodepth=64 --time_based --runtime=$time --group_reporting" > > > > So that "--numjobs" parameter is what I was referring to when I
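
Filled out with placeholder values, the full fio invocation being discussed would look roughly like this (the --rw mode and --clientname are assumptions, since they are cut off above):
$ fio --randrepeat=1 --ioengine=rbd --direct=1 --numjobs=4 --gtod_reduce=1 \
      --name=test --clientname=admin --pool=rbd --rbdname=testvol --invalidate=0 \
      --bs=4k --iodepth=64 --rw=randwrite --time_based --runtime=60 --group_reporting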

[ceph-users] RDMA question for ceph

2018-07-19 Thread Will Zhao
Hi all: Has anyone successfully set up Ceph with RDMA over IB? Following these instructions: (https://community.mellanox.com/docs/DOC-2721) (https://community.mellanox.com/docs/DOC-2693) (http://hwchiu.com/2017-05-03-ceph-with-rdma.html) I'm trying to configure Ceph with the RDMA feature
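
A hedged sketch of the ceph.conf fragment those guides converge on (values follow the ibdev2netdev output quoted in the reply above, where only port 2 / ib1 is up):
[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_port_num = 2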

Re: [ceph-users] Crush Rules with multiple Device Classes

2018-07-19 Thread Oliver Freyermuth
On 19.07.2018 at 05:57, Konstantin Shalygin wrote: >> Now my first question is: >> 1) Is there a way to specify "take default class (ssd or nvme)"? >> Then we could just do this for the migration period, and at some point >> remove "ssd". >> >> If multi-device-class in a crush rule is not
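
For comparison, a per-class replicated rule is a one-liner once every device has a class (rule and pool names are placeholders):
$ ceph osd crush rule create-replicated fast-ssd default host ssd
$ ceph osd pool set mypool crush_rule fast-ssd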

Re: [ceph-users] active+clean+inconsistent PGs after upgrade to 12.2.7

2018-07-19 Thread Robert Sander
On 19.07.2018 11:15, Ronny Aasen wrote: > Did you upgrade from 12.2.5 or 12.2.6 ? Yes. > sounds like you hit the reason for the 12.2.7 release > > read : https://ceph.com/releases/12-2-7-luminous-released/ > > there should come features in 12.2.8 that can deal with the "objects are > in sync

Re: [ceph-users] Converting to BlueStore, and external journal devices

2018-07-19 Thread Eugen Block
Sounds like the typical configuration is just RocksDB on the SSD, and both data and WAL on the OSD disk? Not quite, WAL will be on the fastest available device. If you have NVMe, SSD and HDD, your command should look something like this: ceph-volume lvm create --bluestore --data /dev/$HDD
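
Completed with hypothetical device names, the three-tier command being sketched would look roughly like:
$ ceph-volume lvm create --bluestore --data /dev/sdc \
      --block.db /dev/sdb1 --block.wal /dev/nvme0n1p1
(data on the HDD, RocksDB on the SSD partition, WAL on the NVMe partition)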

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
Yes, I'd love to go with Optanes ... you think 480 GB will be fine for WAL+DB for 15x12TB, long term? I only hesitate because I've seen recommendations of "10 GB DB per 1 TB HDD" several times. How much total HDD capacity do you have per Optane 900P 480GB? Cheers, Oliver On 18.07.2018 10:23,
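
For scale: the "10 GB of DB per 1 TB of HDD" rule of thumb applied to 15 x 12 TB = 180 TB of HDD would call for roughly 1.8 TB of DB space, while a single 480 GB Optane shared by 15 OSDs gives about 32 GB per OSD, i.e. roughly 2.7 GB per TB.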

[ceph-users] Lost TB for Object storage

2018-07-19 Thread CUZA Frédéric
Hi guys, we are running a Ceph Luminous 12.2.6 cluster. The cluster is used both for RBD storage and Ceph Object Storage and has about 742 TB of raw space. We have an application that pushes snapshots of our VMs through RGW. All seems to be fine, except that we have a discrepancy between what the S3

[ceph-users] Converting to BlueStore, and external journal devices

2018-07-19 Thread Robert Stanford
I am following the steps here: http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/ The final step is: ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID I notice this command doesn't specify a device to use as the journal. Is it implied that BlueStore will use

Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)

2018-07-19 Thread Troy Ablan
>> >> I'm on IRC (as MooingLemur) if more real-time communication would help :) > > Sure, I'll try to contact you there. In the meantime could you open up > a tracker showing the crash stack trace above and a brief description > of the current situation and the events leading up to it? Could

Re: [ceph-users] Converting to BlueStore, and external journal devices

2018-07-19 Thread Eugen Block
Hi, if you have SSDs for RocksDB, you should provide that in the command (--block.db $DEV), otherwise Ceph will use the one provided disk for all data and RocksDB/WAL. Before you create that OSD you probably should check out the help page for that command, maybe there are more options you

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
Thanks! On 18.07.2018 10:23, Linh Vu wrote: I think the P4600 should be fine, although 2TB is probably way overkill for 15 OSDs. Our older nodes use the P3700 400GB for 16 OSDs. I have yet to see the WAL and DB getting filled up at 2GB/10GB each. Our newer nodes use the Intel Optane 900P

[ceph-users] [RBD]Replace block device cluster

2018-07-19 Thread Nino Bosteels
We're looking to replace our existing RBD cluster, which makes and stores our backups. At the moment we've got one machine running BackupPC, where the RBD is mounted, and 8 Ceph nodes. The idea is to gain speed and/or pay less (or pay the same for more speed). We're unsure whether to get SSDs into the mix. Have I

Re: [ceph-users] Force cephfs delayed deletion

2018-07-19 Thread John Spray
On Thu, Jul 19, 2018 at 1:58 PM Alexander Ryabov wrote: > Hello, > > I see that free space is not released after files are removed on CephFS. > > I'm using Luminous with replica=3 without any snapshots etc and with > default settings. > > > From client side: > $ du -sh /mnt/logs/ > 4.1G

Re: [ceph-users] Converting to BlueStore, and external journal devices

2018-07-19 Thread Robert Stanford
Thank you. Sounds like the typical configuration is just RocksDB on the SSD, and both data and WAL on the OSD disk? On Thu, Jul 19, 2018 at 9:00 AM, Eugen Block wrote: > Hi, > > if you have SSDs for RocksDB, you should provide that in the command > (--block.db $DEV), otherwise Ceph will use

Re: [ceph-users] Need advice on Ceph design

2018-07-19 Thread Satish Patel
I am following your blog, which is awesome! Based on your explanation, this is what I am thinking: I have hardware and some consumer-grade SSDs in stock, so I will build my cluster using those and keep journal+data on the same SSD. After that I will run some load tests to see how it is performing and
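
A minimal load-test sketch with rados bench (pool name, PG count and runtimes are placeholders):
$ ceph osd pool create bench 64 64
$ rados bench -p bench 60 write -b 4M -t 16 --no-cleanup
$ rados bench -p bench 60 seq -t 16
$ rados -p bench cleanup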

Re: [ceph-users] Force cephfs delayed deletion

2018-07-19 Thread Alexander Ryabov
>Also, since I see this is a log directory, check that you don't have some >processes that are holding their log files open even after they're unlinked. Thank you very much - that was the case. lsof /mnt/logs | grep deleted After dealing with these, space was reclaimed in about 2-3min.
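
For reference, a sketch of that check (+L1 restricts lsof to files with a link count below one, i.e. deleted but still held open):
$ lsof +L1 /mnt/logs
$ lsof /mnt/logs | grep deleted
(then restart or signal whichever processes still hold the old log files open)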

Re: [ceph-users] Migrating EC pool to device-class crush rules

2018-07-19 Thread Graham Allan
On 7/18/2018 10:27 PM, Konstantin Shalygin wrote: So mostly I want to confirm that it is safe to change the crush rule for the EC pool. Changing crush rules for a replicated or EC pool is safe. One thing is, when I migrated from multiroot to device classes I recreated the EC pools and cloned

Re: [ceph-users] Increase tcmalloc thread cache bytes - still recommended?

2018-07-19 Thread Gregory Farnum
I don't think that's a default recommendation — Ceph is doing more configuration of tcmalloc these days, tcmalloc has resolved a lot of bugs, and that was only ever a thing that mattered for SSD-backed OSDs anyway. -Greg On Thu, Jul 19, 2018 at 5:50 AM Robert Stanford wrote: > > It seems that

[ceph-users] Omap warning in 12.2.6

2018-07-19 Thread Brent Kennedy
I just upgraded our cluster to 12.2.6 and now I see this warning about 1 large omap object. I looked and it seems this warning was just added in 12.2.6. I found a few discussions on what it was but not much information on addressing it properly. Our cluster uses rgw exclusively with just a few
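
Since this cluster is RGW-only, a hedged sketch for checking whether a bucket index is the large object and resharding it (bucket name and shard count are placeholders):
$ radosgw-admin bucket limit check
$ radosgw-admin reshard add --bucket=mybucket --num-shards=64
$ radosgw-admin reshard process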

Re: [ceph-users] Omap warning in 12.2.6

2018-07-19 Thread Brady Deetz
12.2.6 has a regression. See "v12.2.7 Luminous released" and all of the related disaster posts. Also in the release notes for .7 is a bug disclosure for 12.2.5 that affects rgw users pretty badly during upgrade. You might take a look there. On Thu, Jul 19, 2018 at 2:13 PM Brent Kennedy wrote: >

Re: [ceph-users] Increase tcmalloc thread cache bytes - still recommended?

2018-07-19 Thread Mark Nelson
I believe that the standard mechanisms for launching OSDs already set the thread cache higher than the default. It's possible we might be able to relax that now, as the async messenger doesn't thrash the cache as badly as the simple messenger did. I suspect there's probably still some value to
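
For reference, the knob in question is an environment variable read by the OSD unit files; a sketch (the 256 MB value is only an example, and the packaged default has been 128 MB for some time):
# /etc/sysconfig/ceph on RHEL/CentOS, /etc/default/ceph on Debian/Ubuntu
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456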

[ceph-users] design question - NVME + NLSAS, SSD or SSD + NLSAS

2018-07-19 Thread Steven Vacaroaia
Hi, I would appreciate any advice (with arguments, if possible) regarding the best design approach considering the facts below - budget is set to XX amount - goal is to get as much performance/capacity as possible using XX - 4 to 6 servers, DELL R620/R630 with 8 disk slots, 64 GB RAM and 8 cores