Hello,
today I got an MDS respawn with the following message:
2017-07-11 07:07:55.397645 7ffb7a1d7700 1 mds.b handle_mds_map i (
10.0.1.2:6822/28190) dne in the mdsmap, respawning myself
It happened 3 times within 5 minutes. After that, the MDS took 50 minutes to
recover.
I can't find what exactly
day was not the case.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Tue, Jul 11, 2017 at 11:36 AM, John Spray wrote:
> On Tue, Jul 11, 2017 at 3:23 PM, Webert de Souza Lima
> wrote:
> > Hello,
> >
> > today I got an MDS respawn wi
Thank you for all your efforts, Patrick.
Congratulations and good luck, Leo :)
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Thu, Jul 20, 2017 at 4:21 PM, Patrick McGarry
wrote:
> Hey cephers,
>
> As most of you know, my last day as the Ceph community lead is next
Hi Anish, in case you're still interested, we're using cephfs in production
since jewel 10.2.1.
I have a few similar clusters with some small setup variations. They're
not that big, but they're under heavy workload.
- 15~20 x 6TB HDD OSDs (5 per node), ~4 x 480GB SSD OSDs (2 per node, set
for cache
age Cluster.
>
> thanks,
> Anish
>
>
>
>
> ------
> On Tuesday, August 1, 2017, 9:55:39 AM PDT, Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>
> Hi Anish, in case you're still interested, we're using cephfs in
> production since
Hi,
I recently had an MDS outage because the MDS suicided due to "dne in the mds
map".
I've asked about it here before and I know that happens because the monitors took
this MDS out of the MDS map even though it was alive.
Weird thing, there were no network-related issues happening at the time,
which i
stem state was when this happened? Is there a time of day it is more
> likely to happen (expect to find a Cron at that time)?
>
> On Wed, Aug 9, 2017, 8:37 AM Webert de Souza Lima
> wrote:
>
>> Hi,
>>
>> I recently had an MDS outage because the MDS suicided due to &q
:6801/267422746) dne in the mdsmap, respawning
myself
2017-08-19 06:36:11.412808 7f929c788700 1 mds.bhs1-mail03-ds02 respawn
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Wed, Aug 9, 2017 at 10:53 AM, Webert de Souza Lima wrote:
> Hi David,
>
> thanks
eon).paxos(paxos recovering c
8317388..8318132) lease_timeout -- calling new election
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Mon, Aug 21, 2017 at 10:34 AM, Webert de Souza Lima <
webert.b...@gmail.com> wrote:
> I really need some help thr
Hello,
not an expert here but I think the answer is something like:
radosgw-admin orphans find --pool=_DATA_POOL_ --job-id=_JOB_ID_
radosgw-admin orphans finish --job-id=_JOB_ID_
_JOB_ID_ being anything.
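For example, a rough sketch (the pool name and job id below are made up; a
default-zone data pool is often called default.rgw.buckets.data, but check your
zone config first):
radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans-scan-1
radosgw-admin orphans finish --job-id=orphans-scan-1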
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Thu
the moment it just outputs a lot of text
> saying something like putting $num in orphans.$jobid.$shardnum and listing
> objects that are not orphans?
>
> Regards,
> Andreas
>
> On 28 Sep 2017 15:10, "Webert de Souza Lima"
> wrote:
>
> Hello,
>
> not
Hey Christian,
On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
> wrote:
>
>> I'm pretty sure the orphan find command does exactly just that -
>> finding orphans. I remember some emails on the dev list where Yehuda
>> said he wasn't 100% comfortable of automating the delete just yet.
>> So the purp
This looks like something wrong with the crush rule.
What's the size, min_size and crush_rule of this pool?
ceph osd pool get POOLNAME size
ceph osd pool get POOLNAME min_size
ceph osd pool get POOLNAME crush_ruleset
How is the crush rule?
ceph osd crush rule dump
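If the pool turns out to reference a rule that can't be satisfied, one possible
fix is to point it at a working rule (a sketch only, jewel-era syntax; rule id 0
is just an example):
ceph osd pool set POOLNAME crush_ruleset 0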
Regards,
Webert Lima
DevO
Hi,
I have a cephfs cluster as follows:
1 15x HDD data pool (primary cephfs data pool)
1 2x SSD data pool (linked to a specific dir via xattrs)
1 2x SSD metadata pool
1 2x SSD cache tier pool
The cache tier pool consists of 2 hosts, with one SSD OSD on each host, with
size=2 replicated by host.
L
That sounds like it. Thanks David.
I wonder if that behavior of ignoring the OSD full_ratio is intentional.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Wed, Oct 11, 2017 at 12:26 PM, David Turner
wrote:
> The full ratio is based on the max bytes. if yo
Hi Bryan.
I hope that solved it for you.
Another thing you can do in situations like this is to set the full_ratio
higher so you can work on the problem. Always set it back to a safe value
after the issue is solved.
*ceph pg set_full_ratio 0.98*
Regards,
Webert Lima
DevOps Engineer at MAV Te
When you unmount the device, is the error raised still the same?
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Mon, Oct 23, 2017 at 4:46 AM, Wido den Hollander wrote:
>
> > Op 22 oktober 2017 om 18:45 schreef Sean Sullivan :
> >
> >
> > On freshly insta
I have had many cases of corrupt objects in my radosgw cluster. Until now I
have looked at it as a software (my software) bug, still unresolved, though the
incidence has lowered a lot.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Dec
Hello,
On Wed, Dec 13, 2017 at 7:36 AM, Stefan Kooman wrote:
> Hi,
>
> Is there a way to ask Ceph which OSD_ID
> would be next up?
If I may suggest, "ceph osd create" allocates and returns an OSD ID. So you
could take it by doing:
ID=$(ceph osd create)
then remove it with
ceph osd rm $ID
Cool
On Wed, Dec 13, 2017 at 11:04 AM, Stefan Kooman wrote:
> So, a "ceph osd ls" should give us a list, and we will pick the smallest
> available number as the new osd id to use. We will make a check in the
> (ansible) deployment code to see Ceph will indeed use that number.
>
> Thanks,
>
> Gr
On Wed, Dec 13, 2017 at 11:51 AM, Stefan Kooman wrote:
> If we want to remove the OSD (for whatever reason) and release the ID then
> we
> will use "ceph osd purge* osd.$ID" ... which basically does what you
> suggest (ceph auth del osd.$OSD_ID, crush remove osd.$OSD_ID, ceph osd
> rm osd.$OSD_ID
I have experienced delayed free in used space before, in Jewel, but that
just stopped happening with no intervention.
Back then, unmounting all clients' filesystems would make it free the space rapidly.
I don't know if that's related.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte
Hi,
I've been looking at ceph mds perf counters and I saw that one of my clusters
was hugely different from the others in the number of caps:
rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry purg | segs evts subm
0 3.0M 5.1M | 0 0 595 | 30440 | 0 0 13k 0
Hi Patrick,
On Thu, Dec 14, 2017 at 7:52 PM, Patrick Donnelly
wrote:
>
> It's likely you're a victim of a kernel backport that removed a dentry
> invalidation mechanism for FUSE mounts. The result is that ceph-fuse
> can't trim dentries.
>
even though I'm not using FUSE? I'm using the kernel mount
Hello, Mr. Yan
On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng wrote:
>
> The client hold so many capabilities because kernel keeps lots of
> inodes in its cache. Kernel does not trim inodes by itself if it has
> no memory pressure. It seems you have set mds_cache_size config to a
> large value.
Thanks
On Fri, Dec 15, 2017 at 10:46 AM, Yan, Zheng wrote:
> recent
> version kernel client and ceph-fuse should trim their cache
> aggressively when mds recovers.
>
So the bug (not sure if I can call it a bug) is already fixed in newer
kernels? Can I just update the kernel and expect this to be
So,
On Fri, Dec 15, 2017 at 10:58 AM, Yan, Zheng wrote:
>
> 300k is already quite a lot. Opening them requires a long time. Does your
> mail server really open so many files?
Yes, probably. It's a commercial solution. A few thousand domains, tens
of thousands of users and god knows how many mailb
id" : "admin"
},
"replay_requests" : 0
},
Still 1.4M caps used.
Is upgrading the client kernel enough?
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Fri, Dec 15, 2017 at 11:16 AM, Webert de Souz
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Dec 21, 2017 at 11:55 AM, Yan, Zheng wrote:
> On Thu, Dec 21, 2017 at 7:33 PM, Webert de Souza Lima
> wrote:
> > I have upgraded the kernel on a client node (one that has close-to-zero
> >
On Fri, Dec 22, 2017 at 3:20 AM, Yan, Zheng wrote:
> idle client shouldn't hold so many caps.
>
I'll try to make it reproducible for you to test.
> yes. For now, it's better to run "echo 3 >/proc/sys/vm/drop_caches"
> after the cronjob finishes
Thanks. I'll adopt that for now.
Regards,
Webert L
It depends on how you use it. For me, it runs fine on the OSD hosts but the
MDS server consumes loads of RAM, so be aware of that.
If the system load average goes too high due to OSD disk utilization, the
MDS server might run into trouble too, as a delayed response from the host
could cause the MDS t
On Thu, Dec 21, 2017 at 12:52 PM, shadow_lin wrote:
>
> After 18:00 suddenly the write throughput dropped and the osd latency
> increased. TCmalloc started reclaiming page heap freelist much more
> frequently. All of this happened very fast and every osd had the identical
> pattern.
>
Could that be c
Try to kick out (evict) that cephfs client from the MDS node, see
http://docs.ceph.com/docs/master/cephfs/eviction/
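Roughly something like this (the MDS name and client id are placeholders; list
the sessions first to find the right id):
ceph daemon mds.<name> session ls
ceph tell mds.0 client evict id=<client_id>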
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jan 10, 2018 at 12:59 AM, Mark Schouten wrote:
> Hi,
>
> While up
On Wed, Jan 10, 2018 at 12:44 PM, Mark Schouten wrote:
> > Thanks, that's a good suggestion. Just one question, will this affect
> RBD-
> > access from the same (client)host?
I'm sorry that this didn't help. No, it does not affect RBD clients, as the MDS
is related only to cephfs.
Regards,
Webert
Good to know. I don't think this should trigger HEALTH_ERR though, but
HEALTH_WARN makes sense.
It makes sense to keep the backfillfull_ratio greater than the nearfull_ratio,
as one might need backfilling to avoid an OSD getting full during reweight
operations.
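For reference, on Luminous the current thresholds can be checked and adjusted
like this (the values below are only examples, not a recommendation):
ceph osd dump | grep full_ratio
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90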
Regards,
Webert Lima
DevOps Engineer at MAV Te
Hello,
I'm running a nearly out-of-service radosgw (very slow to write new objects)
and I suspect it's because ceph df is showing 100% usage in some pools,
though I don't know where that information comes from.
Pools:
#~ ceph osd pool ls detail -> http://termbin.com/lsd0
Crush Rules (important is
Also, there is no quota set for the pools.
Here is "ceph osd pool get xxx all": http://termbin.com/ix0n
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
Sorry I forgot, this is a ceph jewel 10.2.10
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Jan 18, 2018 at 8:05 PM, David Turner wrote:
> `ceph osd df` is a good command for you to see what's going on. Compare
> the osd numbers with `ceph osd tree`.
>
>
>>
>> On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima <
>>
With the help of robbat2 and llua on the IRC channel I was able to solve this
situation by taking down the 2-OSD-only hosts.
After crush reweighting OSDs 8 and 23 from host mia1-master-fe02 to 0, ceph
df showed the expected storage capacity usage (about 70%).
With this in mind, those guys have told me
available space.
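For reference, the reweighting mentioned above is just the standard crush
reweight command, something like (a sketch, not the exact invocation used):
ceph osd crush reweight osd.8 0
ceph osd crush reweight osd.23 0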
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Jan 18, 2018 at 8:21 PM, Webert de Souza Lima wrote:
> With the help of robbat2 and llua on IRC channel I was able to solve this
> situation by taking down the 2-OS
Hi,
On Fri, Jan 19, 2018 at 8:31 PM, zhangbingyin
wrote:
> 'MAX AVAIL' in the 'ceph df' output represents the amount of data that can
> be used before the first OSD becomes full, and not the sum of all free
> space across a set of OSDs.
>
Thank you very much. I figured this out by the end of t
Hi,
after running a cephfs on my ceph cluster I got stuck with the following
health status:
# ceph status
cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5
health HEALTH_WARN
128 pgs degraded
128 pgs stuck unclean
128 pgs undersized
recovery 24/4
Also, I instructed all unclean PGs to repair and nothing happened. I did it
like this:
~# for pg in `ceph pg dump_stuck unclean 2>&1 | grep -Po
'[0-9]+\.[A-Za-z0-9]+'`; do ceph pg repair $pg; done
On Tue, Nov 15, 2016 at 9:58 AM Webert de Souza Lima
wrote:
> Hi,
>
> af
Hey John.
Just to be sure; by "deleting the pools" you mean the *cephfs_metadata* and
*cephfs_metadata* pools, right?
Does it have any impact over radosgw? Thanks.
On Tue, Nov 15, 2016 at 10:10 AM John Spray wrote:
> On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima
>
I'm sorry, I meant *cephfs_data* and *cephfs_metadata*
On Tue, Nov 15, 2016 at 10:15 AM Webert de Souza Lima
wrote:
> Hey John.
>
> Just to be sure; by "deleting the pools" you mean the *cephfs_metadata*
> and *cephfs_metadata* pools, right?
> Does it have a
John Spray wrote:
> On Tue, Nov 15, 2016 at 12:14 PM, Webert de Souza Lima
> wrote:
> > Hey John.
> >
> > Just to be sure; by "deleting the pools" you mean the cephfs_metadata and
> > cephfs_metadata pools, right?
> > Does it have any impact over rado
data
master.rgw.meta
master.rgw.buckets.non-ec
rbd
cephfs_metadata
cephfs_data
*# ceph osd pool stats*
https://paste.debian.net/895840/
On Tue, Nov 15, 2016 at 10:33 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:
> Hi,
>
>
> On 11/15/2016 01:27 PM, Webert de
the id of the hdd crush rule.
On Tue, Nov 15, 2016 at 11:09 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:
> Hi,
>
> On 11/15/2016 01:55 PM, Webert de Souza Lima wrote:
>
> sure, as requested:
>
> *cephfs* was created using the following
I removed cephfs and its pools, created everything again using the default
crush ruleset, which is for the HDD, and now ceph health is OK.
I appreciate your help. Thank you very much.
On Tue, Nov 15, 2016 at 11:48 AM Webert de Souza Lima
wrote:
> Right, thank you.
>
> On this particula
Hi,
I have many clusters running cephfs, and in the last 45 days or so, 2 of
them started giving me the following message in *ceph health:*
*mds0: Client dc1-mx02-fe02:guest failing to respond to capability release*
When this happens, cephfs stops responding. It will only get back
after I *restar
I'm sorry, by server, I meant cluster.
On one cluster the rate of files created and read is about 5 per second.
On another cluster it's from 25 to 30 files created and read per second.
On Wed, Nov 16, 2016 at 2:03 PM Webert de Souza Lima
wrote:
> Hello John.
>
> I'
t. I'm not sure what would happen here.
On Wed, Nov 16, 2016 at 1:42 PM John Spray wrote:
> On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
> wrote:
> > hi,
> >
> > I have many clusters running cephfs, and in the last 45 days or so, 2 of
> > them star
Is it possible to count open file descriptors in cephfs only?
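One rough client-side approximation I can think of (an assumption on my part,
not something tested here; the mount point is hypothetical) would be to count
open files on the cephfs mount:
lsof /mnt/cephfs | wc -l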
On Wed, Nov 16, 2016 at 2:12 PM Webert de Souza Lima
wrote:
> I'm sorry, by server, I meant cluster.
> On one cluster the rate of files created and read is about 5 per second.
> On another cluster it's from 25 to 3
Bluestore doesn't have a journal like filestore does, but there is the
WAL (Write-Ahead Log), which looks like a journal but works differently.
You can (or must, depending on your needs) have SSDs to serve this WAL (and
for RocksDB).
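A minimal sketch of what that looks like at OSD creation time on Luminous with
ceph-volume (the device names here are hypothetical):
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1 --block.wal /dev/sdb2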
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*
Cheers!
Thanks for all the backports and fixes.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jul 11, 2018 at 1:46 PM Abhishek Lekshmanan
wrote:
>
> We're glad to announce v10.2.11 release of the Jewel stable release
> series.
The pool deletion might have triggered a lot of IO operations on the disks
and the processes might be too busy to respond to heartbeats, so the mons mark
them as down due to no response.
Check also the OSD logs to see if they are actually crashing and
restarting, and disk IO usage (e.g. iostat).
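A quick sketch of what I mean (the log path and interval are just examples):
iostat -x 5
grep -i heartbeat_check /var/log/ceph/ceph-osd.*.log | tail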
Rega
> Regards,
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souza Lima
> *Sent:* 31 July 2018 16:25
> *To:* ceph-users
> *Subject:* Re: [ceph-users] Whole cluster flapping
>
>
>
> The pool deletion might have triggered a lot of IO operations on the
Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Tue, Aug 7, 2018 at 10:47 AM CUZA Frédéric wrote:
> Pool is already deleted and no longer present in stats.
>
>
>
> Regards,
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souz
Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM:
> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou wrote:
> this can cause memory deadlock. you should avoid doing this
>
> > Yan, Zheng wrote on Tue, Aug 7, 2018 at 19:12:
> >>
> >> did you mount cephfs on the same machines that run ceph-osd?
> >>
I didn't know about this. I ru
he kernel client at this
> point, but that isn’t etched in stone.
> >
> > Curious if there is more to share.
> >
> > Reed
> >
> > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima
> wrote:
> >
> >
> > Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM:
>
p is_healthy 'OSD::osd_op_tp thread 0x7fdabd897700' had
> timed out after 90
>
>
>
> (I updated it to 90 instead of 15s)
>
>
>
> Regards,
>
>
>
>
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souza Lima
> *Sent:* 07 August
>>>
>>> >>> This is not a Ceph-specific thing -- it can also affect similar
>>> >>> systems like Lustre.
>>> >>>
>>> >>> The classic case is when under some memory pressure, the kernel tries
>>> >>> to f
g time.
> So I cannot get useful information from the command you provided.
>
> Thanks
>
> Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:10 PM:
>
>> You could also see open sessions at the MDS server by issuing `ceph
>> daemon mds.XX session ls`
>>
>> Regards,
>>
nt, I can't restart it every time.
>
> Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:33 PM:
>
>> Hi Zhenshi,
>>
>> if you still have the client mount hanging but no session is connected,
>> you probably have some PID waiting with blocked IO from cephfs mount.
>>
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 16, 2018 at 5:15 PM Webert de Souza Lima
wrote:
> Thanks Jack.
>
> That's good to know. It is definitely something to consider.
> In a distributed storage scenario we might build a dedica
I'd also try to boot up only one MDS until it's fully up and running, not
both of them.
Sometimes they keep switching states between each other.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Mar 29, 2018 at 7:32 AM, John Spray wro
Hello,
Currently, I run Jewel + Filestore for cephfs, with SSD-only pools used for
cephfs-metadata, and HDD-only pools for cephfs-data. The current
metadata/data ratio is something like 0.25% (50GB metadata for 20TB data).
Regarding bluestore architecture, assuming I have:
- SSDs for WAL+DB
-
I'm sorry, I have mixed up some information. The actual ratio I have now
is 0.0005% (*100MB for 20TB data*).
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 9, 2018 at 11:32 AM, Webert de Souza Lima wrote:
&g
Hey Jon!
On Wed, May 9, 2018 at 12:11 PM, John Spray wrote:
> It depends on the metadata intensity of your workload. It might be
> quite interesting to gather some drive stats on how many IOPS are
> currently hitting your metadata pool over a week of normal activity.
>
Any ceph built-in tool f
Basically what we're trying to figure out looks like what is being done
here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020958.html
But instead of using librados to store emails directly into RADOS, we're
still using CephFS for it, just figuring out if it makes sense to sep
You could use "mds_cache_size" to limit number of CAPS untill you have this
fixed, but I'd say for your number of caps and inodes, 20GB is normal.
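If you do want to cap it for now, something like this should work (the value is
hypothetical; mds_cache_size counts inodes, not bytes):
ceph tell mds.* injectargs '--mds_cache_size 500000'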
This MDS (jewel) here is consuming 24GB RAM:
{
"mds": {
"request": 7194867047,
"reply": 7194866688,
"reply_latency": {
This message seems to be very concerning:
>mds0: Metadata damage detected
but for the rest, the cluster still seems to be recovering. You could try
to speed things up with ceph tell, like:
ceph tell osd.* injectargs --osd_max_backfills=10
ceph tell osd.* injectargs --osd_recovery_sleep
(write/read)_bytes(_total)
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 9, 2018 at 2:23 PM Webert de Souza Lima
wrote:
> Hey Jon!
>
> On Wed, May 9, 2018 at 12:11 PM, John Spray wrote:
>
>> It depends
11, 2018 at 2:39 PM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> I think ceph doesn't have IO metrics with filters by pool, right? I see IO
>> metrics from clients only:
>>
>> ceph_client_io_ops
>> ceph_client_io_read_bytes
>> ceph_cli
On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER
wrote:
> The documentation (luminous) say:
>
> >mds cache size
> >
> >Description:The number of inodes to cache. A value of 0 indicates an
> unlimited number. It is recommended to use mds_cache_memory_limit to limit
> the amount of memory t
0",
> "osd_op_num_threads_per_shard_hdd": "1",
> "osd_op_num_threads_per_shard_ssd": "2",
> "osd_op_thread_suicide_timeout": "150",
> "osd_op_thread_timeout": "15",
> "os
I'm sending this message to both the dovecot and ceph-users MLs, so please don't
mind if something seems too obvious to you.
Hi,
I have a question for both dovecot and ceph lists and below I'll explain
what's going on.
Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when
using s
and will help you a lot:
> - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib)
> - Single-Instance-Storage (aka sis, aka "attachment deduplication" :
> https://www.dovecot.org/list/dovecot/2013-December/094276.html)
>
> Regards,
> On 05/16/2018 08:37 PM, Webert d
ction, but you can try it to run a POC.
>
> For more information check out my slides from Ceph Day London 2018:
> https://dalgaaf.github.io/cephday-london2018-emailstorage/#/cover-page
>
> The project can be found on github:
> https://github.com/ceph-dovecot/
>
> -Danny
>
*IRC NICK - WebertRLZ*
On Wed, May 16, 2018 at 4:45 PM Jack wrote:
> On 05/16/2018 09:35 PM, Webert de Souza Lima wrote:
> > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore
> > backend.
> > We'll have to do some work on how to simulat
Hello,
On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann
wrote:
> additionally: if rank 0 is lost, the whole FS stands still (no new
> client can mount the fs; no existing client can change a directory, etc.).
>
> my guess is that the root of a cephfs (/; which is always served by rank
> 0) is nee
Hi,
We're migrating from a Jewel / filestore based cephfs architecture to a
Luminous / bluestore based one.
One MUST HAVE is multiple active MDS daemons. I'm still lacking knowledge
of how they actually work.
After reading the docs and ML we learned that they work by sort of dividing
the responsibili
Hi Patrick
On Fri, May 18, 2018 at 6:20 PM Patrick Donnelly
wrote:
> Each MDS may have multiple subtrees they are authoritative for. Each
> MDS may also replicate metadata from another MDS as a form of load
> balancing.
OK, it's good to know that it actually does some load balancing. Thanks.
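As an aside, besides the automatic balancer a subtree can also be pinned to a
specific rank via an xattr (the path and rank below are only an example):
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir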
New
Hi Daniel,
Thanks for clarifying.
I'll have a look at dirfrag option.
Regards,
Webert Lima
Em sáb, 19 de mai de 2018 01:18, Daniel Baumann
escreveu:
> On 05/19/2018 01:13 AM, Webert de Souza Lima wrote:
> > New question: will it make any difference in the balancing if instead
Hello,
Is there any performance impact on cephfs when using file layouts to bind a
specific directory in cephfs to a given pool? Of course, such a pool is not
the default data pool for this cephfs.
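For context, the binding itself is just a layout xattr on the directory,
something like this (pool name and path are hypothetical, and the pool must
have been added to the filesystem first):
setfattr -n ceph.dir.layout.pool -v cephfs_ssd_data /mnt/cephfs/some/dir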
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRL
I think in this scenario the overhead may be acceptable for us.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jun 13, 2018 at 9:51 AM Yan, Zheng wrote:
> On Wed, Jun 13, 2018 at 3:34 AM Webert de Souza Lima
> wrote:
>
pool isn’t
> available you would stack up pending RADOS writes inside of your mds but
> the rest of the system would continue unless you manage to run the mds out
> of memory.
> -Greg
> On Wed, Jun 13, 2018 at 9:25 AM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>>
n’t. The backtrace does
> create another object but IIRC it’s a maximum one IO per create/rename (on
> the file).
> On Wed, Jun 13, 2018 at 1:12 PM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> Thanks for clarifying that, Gregory.
>>
>> As said bef
Keep in mind that the MDS server is CPU-bound, so during heavy workloads it
will eat up CPU, and the OSD daemons can affect or be affected by the
MDS daemon.
But it does work well. We've been running a few clusters with MON, MDS and
OSDs sharing the same hosts for a couple of years now.
Regar
Hello everyone,
I'm deploying a ceph cluster with cephfs and I'd like to tune ceph cache
tiering, and I'm
a little bit confused by the settings hit_set_count, hit_set_period and
min_read_recency_for_promote. The docs are very lean and I can't find any
more detailed explanation anywhere.
Could som
I have faced the same problem many times. Usually it doesn't cause anything
bad, but I had a 30 min system outage twice because of this.
It might be because of the number of inodes on your ceph filesystem. Go to
the MDS server and do (supposing your mds server id is intcfs-osd1):
ceph daemon mds.
Hi,
by issuing `ceph daemonperf mds.x` I see the following columns:
-mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log
rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
0 95 41 | 000 | 000 | 00 250 | 1
Hello all,
I've been using cephfs for a while but never really evaluated its
performance.
As I put up a new ceph cluster, I thought that I should run a benchmark to
see if I'm going the right way.
By the results I got, I see that RBD performs *a lot* better in comparison
to cephfs.
The cluster is
That 1 Gbps link is the only option I have for those servers, unfortunately.
It's all dedicated server rentals from OVH.
I don't have information regarding the internals of the vrack.
So from what you said, I understand that one should expect a performance drop
in comparison to ceph rbd using the sam
ms or so.
>
> Ceph spends much more time in the CPU than it will take the network to
> forward that IP-packet.
>
> I wouldn't be too afraid to run Ceph over a L3 network.
>
> Wido
>
> > On May 9, 2017 12:01 PM, "Webert de Souza Lima"
> > wrote:
On Tue, May 9, 2017 at 9:07 PM, Brady Deetz wrote:
> So with email, you're talking about lots of small reads and writes. In my
> experience with dicom data (thousands of 20KB files per directory), cephfs
> doesn't perform very well at all on platter drives. I haven't experimented
> with pure ssd
On Wed, May 10, 2017 at 4:09 AM, gjprabu wrote:
> Hi Webert,
>
> Thanks for your reply, can you please suggest the ceph pg value for data and
> metadata. I have set 128 for data and 128 for metadata, is this correct?
>
Well, I think this has nothing to do with your current problem, but the PG
number d
ote:
> On Tue, May 9, 2017 at 5:23 PM, Webert de Souza Lima
> wrote:
> > Hi,
> >
> > by issuing `ceph daemonperf mds.x` I see the following columns:
> >
> > -mds-- --mds_server-- ---objecter--- -mds_cache-
> > ---mds_log
> > rlat i
wrote:
> On Fri, May 12, 2017 at 12:47 PM, Webert de Souza Lima
> wrote:
> > Thanks John,
> >
> > I did as you suggested but unfortunately I only found information
> regarding
> > the objecter nicks "writ, read and actv", any more suggestions?
>
>