Re: [ceph-users] [RFC] New S3 Benchmark

2019-08-15 Thread Mark Nelson
Ha, I hadn't thought to check if that existed, which was probably pretty short-sighted on my part. :) I suppose the good news is that it might be a good candidate for comparison, and once I've implemented an S3 client endpoint in CBT I should be able to target both pretty easily. Mark On
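For context, targeting an S3 endpoint from a benchmark harness mostly comes down to timing PUT/GET round trips against RGW. A minimal sketch, assuming boto3 and placeholder endpoint, credentials and bucket names (this is not CBT code):

    import time
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:8000",   # hypothetical RGW endpoint
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    bucket = "bench-bucket"
    payload = b"x" * (4 * 1024 * 1024)                # 4 MiB objects
    s3.create_bucket(Bucket=bucket)

    put_lat, get_lat = [], []
    for i in range(100):
        t0 = time.monotonic()
        s3.put_object(Bucket=bucket, Key=f"obj-{i}", Body=payload)
        put_lat.append(time.monotonic() - t0)

        t0 = time.monotonic()
        s3.get_object(Bucket=bucket, Key=f"obj-{i}")["Body"].read()
        get_lat.append(time.monotonic() - t0)

    print(f"avg PUT {sum(put_lat)/len(put_lat)*1000:.1f} ms, "
          f"avg GET {sum(get_lat)/len(get_lat)*1000:.1f} ms")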

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-15 Thread Marc Schöchlin
Hello Mike, On 15.08.19 at 19:57, Mike Christie wrote: > >> Don't waste your time. I found a way to replicate it now. >> > > Just a quick update. > > Looks like we are trying to allocate memory in the IO path in a way that > can swing back on us, so we can end up locking up. You are probably not
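The principle behind that (not the actual rbd-nbd or kernel fix): anything that allocates memory while servicing IO can be pulled into reclaim, and reclaim may try to write dirty pages back through the very device being served, so the request never completes. A hypothetical userspace analogue is to preallocate a bounded buffer pool before serving IO so the hot path never allocates:

    import queue

    BUF_SIZE = 4 * 1024 * 1024        # 4 MiB buffers
    POOL_DEPTH = 16                   # bounded pool, allocated before IO starts

    pool = queue.Queue()
    for _ in range(POOL_DEPTH):
        pool.put(bytearray(BUF_SIZE)) # all allocation happens up front

    def handle_io(do_io):
        """Run one request with a pooled buffer; blocks instead of allocating."""
        buf = pool.get()
        try:
            return do_io(buf)         # hypothetical callback that fills/drains buf
        finally:
            pool.put(buf)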

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-15 Thread Mike Christie
On 08/14/2019 06:55 PM, Mike Christie wrote: > On 08/14/2019 02:09 PM, Mike Christie wrote: >> On 08/14/2019 07:35 AM, Marc Schöchlin wrote: > 3. I wonder if we are hitting a bug with PF_MEMALLOC Ilya hit with krbd. > He removed that code from the krbd. I will ping him on that. >>> >>>

Re: [ceph-users] New Cluster Failing to Start

2019-08-15 Thread solarflow99
You are using Nautilus, right? Did you use ansible to deploy it? On Wed, Aug 14, 2019, 10:31 AM wrote: > All; > > We're working to deploy our first production Ceph cluster, and we've run > into a snag. > > The MONs start, but the "cluster" doesn't appear to come up. Ceph -s > never returns. >
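When "ceph -s" hangs like that, the admin socket still answers without quorum, so the usual first check is to ask each mon for its mon_status directly on its own host. A rough sketch (mon IDs are placeholders):

    import json
    import subprocess

    MON_IDS = ["a", "b", "c"]   # replace with your mon IDs; run on each mon host

    for mon in MON_IDS:
        result = subprocess.run(
            ["ceph", "daemon", f"mon.{mon}", "mon_status"],
            capture_output=True, text=True, timeout=10,
        )
        if result.returncode != 0:
            print(f"mon.{mon}: admin socket not answering: {result.stderr.strip()}")
            continue
        status = json.loads(result.stdout)
        print(f"mon.{mon}: state={status.get('state')} quorum={status.get('quorum')}")

A mon stuck in "probing" or "electing" with an empty quorum list usually points at networking or monmap problems rather than the daemons themselves.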

[ceph-users] pgs inconsistent

2019-08-15 Thread huxia...@horebdata.cn
Dear folks, I had a Ceph cluster with replication 2, 3 nodes, each node with 3 OSDs, on Luminous 12.2.12. Some days ago I had one OSD down (the disk is still fine) due to a rocksdb crash. I tried to restart that OSD but failed. So I tried to rebalance but encountered PGs
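Not a fix for the crashed OSD, but for inconsistent PGs the usual inspection path is to list them and dump the per-object errors before repairing; with size=2 there is no majority vote, so it is worth knowing which copy is bad before running a repair. A rough sketch (pool name is a placeholder):

    import json
    import subprocess

    def sh(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    pool = "rbd"                                   # placeholder pool name
    for pgid in json.loads(sh("rados", "list-inconsistent-pg", pool)):
        report = json.loads(sh("rados", "list-inconsistent-obj", pgid, "--format=json"))
        for obj in report.get("inconsistents", []):
            print(pgid, obj["object"]["name"], obj.get("errors"))
        # sh("ceph", "pg", "repair", pgid)   # only after identifying the bad copy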

Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-15 Thread Jeff Layton
On Thu, 2019-08-15 at 16:45 +0900, Hector Martin wrote: > On 15/08/2019 03.40, Jeff Layton wrote: > > On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote: > > > Jeff, the oops seems to be a NULL dereference in ceph_lock_message(). > > > Please take a look. > > > > > > > (sorry for duplicate

Re: [ceph-users] WAL/DB size

2019-08-15 Thread Виталий Филиппов
30 GB already includes the WAL, see http://yourcmc.ru/wiki/Ceph_performance#About_block.db_sizing On August 15, 2019, 1:15:58 GMT+03:00, Anthony D'Atri wrote: >Good points in both posts, but I think there’s still some unclarity. > >Absolutely let’s talk about DB and WAL together. By “bluestore goes

Re: [ceph-users] WAL/DB size

2019-08-15 Thread Mark Nelson
Hi Folks, The basic idea behind the WAL is that every DB write transaction is first written into an in-memory buffer and to a region on disk. RocksDB is typically set up to have multiple WAL buffers, and when one or more fills up, it will start flushing the data to L0 while new writes
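A toy model of that double-buffering, with the buffer size and count set to the commonly cited bluestore RocksDB defaults (4 x 256 MiB; treat those numbers as assumptions). It shows why the steady-state on-disk WAL footprint stays around 1 GiB regardless of how much you write:

    WRITE_BUFFER_SIZE = 256 * 1024**2     # assumed default write_buffer_size
    MAX_WRITE_BUFFERS = 4                 # assumed default max_write_buffer_number

    active = 0          # bytes in the memtable currently taking writes
    sealed = []         # filled memtables whose WAL data still awaits flush to L0
    flushed = 0         # bytes already compacted into L0 (WAL space reclaimable)

    def write(nbytes):
        global active, flushed
        active += nbytes
        if active >= WRITE_BUFFER_SIZE:        # buffer full: seal it, open a new one
            sealed.append(active)
            active = 0
            if len(sealed) == MAX_WRITE_BUFFERS:   # all buffers full: flush to L0
                flushed += sum(sealed)             # (real RocksDB starts flushing
                sealed.clear()                     #  earlier, in the background)

    for _ in range(20_000):
        write(512 * 1024)                      # 512 KiB metadata transactions

    print("steady-state on-disk WAL is bounded by",
          MAX_WRITE_BUFFERS * WRITE_BUFFER_SIZE // 1024**2, "MiB")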

Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-15 Thread Hector Martin
On 15/08/2019 03.40, Jeff Layton wrote: On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote: Jeff, the oops seems to be a NULL dereference in ceph_lock_message(). Please take a look. (sorry for duplicate mail -- the other one ended up in moderation) Thanks Ilya, That function is pretty

Re: [ceph-users] ceph device list empty

2019-08-15 Thread Eugen Block
Hi, are the OSD nodes on Nautilus already? We upgraded from Luminous to Nautilus recently and the commands return valid output, except for those OSDs that haven't been upgraded yet. Zitat von Gary Molenkamp : I've had no luck in tracing this down.  I've tried setting debugging and log
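A quick way to cross-check that before digging further is to compare the per-daemon versions with what the device registry has actually picked up; both commands exist in Nautilus, though the exact JSON fields may vary by release. A sketch:

    import json
    import subprocess

    def ceph_json(*args):
        out = subprocess.run(("ceph",) + args + ("--format", "json"),
                             capture_output=True, text=True, check=True)
        return json.loads(out.stdout)

    print(ceph_json("versions"))          # per-daemon-type version breakdown
    devices = ceph_json("device", "ls")
    print(len(devices), "devices registered with the mgr")

If the OSD column still shows Luminous daemons, an empty "ceph device ls" is expected until they are upgraded and restarted.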

Re: [ceph-users] WAL/DB size

2019-08-15 Thread Janne Johansson
On Thu, Aug 15, 2019 at 00:16, Anthony D'Atri wrote: > Good points in both posts, but I think there’s still some unclarity. > ... > We’ve seen good explanations on the list of why only specific DB sizes, > say 30GB, are actually used _for the DB_. > If the WAL goes along with the DB,
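For what it's worth, the "only specific sizes" observation falls out of the level sizing: a RocksDB level is only placed on the fast device if it fits there entirely. A back-of-the-envelope calculation, assuming the commonly cited defaults (L1 base of 256 MiB, 10x multiplier per level, roughly 1 GiB of WAL as discussed above):

    BASE_MIB = 256      # assumed default max_bytes_for_level_base (L1 target)
    MULT = 10           # assumed default max_bytes_for_level_multiplier
    WAL_MIB = 1024      # ~4 x 256 MiB write buffers (assumed default)

    total = WAL_MIB
    for level in range(1, 5):
        size = BASE_MIB * MULT ** (level - 1)
        total += size
        print(f"L{level}: {size/1024:7.2f} GiB -> fast device needed for "
              f"WAL..L{level}: ~{total/1024:.1f} GiB")

That prints roughly 1.2, 3.8, 28.8 and 278.8 GiB, which is where the frequently quoted ~4 GB / ~30 GB / ~300 GB break points come from: anything between those sizes holds no additional complete level and so goes unused by the DB.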