Hello,
today I got an MDS respawn with the following message:
2017-07-11 07:07:55.397645 7ffb7a1d7700 1 mds.b handle_mds_map i (
10.0.1.2:6822/28190) dne in the mdsmap, respawning myself
It happened 3 times within 5 minutes. After that, the MDS took 50 minutes to
recover.
I can't find what exactly
day was not the case.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Tue, Jul 11, 2017 at 11:36 AM, John Spray wrote:
> On Tue, Jul 11, 2017 at 3:23 PM, Webert de Souza Lima
> wrote:
> > Hello,
> >
> > today I got an MDS respawn wi
Thank you for all your efforts, Patrick.
Congratulations and good luck, Leo :)
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Thu, Jul 20, 2017 at 4:21 PM, Patrick McGarry
wrote:
> Hey cephers,
>
> As most of you know, my last day as the Ceph community lead is next
Hi Anish, in case you're still interested, we're using cephfs in production
since jewel 10.2.1.
I have a few similar clusters with some small setup variations. They're
not that big, but they're under heavy workload.
- 15~20 x 6TB HDD OSDs (5 per node), ~4 x 480GB SSD OSDs (2 per node, set
for cache
age Cluster.
>
> thanks,
> Anish
>
>
>
>
> ------
> On Tuesday, August 1, 2017, 9:55:39 AM PDT, Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>
> Hi Anish, in case you're still interested, we're using cephfs in
> production since
Hi,
I recently had an MDS outage because the MDS suicided due to "dne in the mds
map".
I've asked about it here before and I know that happens because the monitors took
this MDS out of the MDS map even though it was alive.
Weird thing, there were no network-related issues happening at the time,
which i
stem state was when this happened? Is there a time of day it is more
> likely to happen (expect to find a Cron at that time)?
>
> On Wed, Aug 9, 2017, 8:37 AM Webert de Souza Lima
> wrote:
>
>> Hi,
>>
>> I recently had an MDS outage because the MDS suicided due to &q
:6801/267422746) dne in the mdsmap, respawning
myself
2017-08-19 06:36:11.412808 7f929c788700 1 mds.bhs1-mail03-ds02 respawn
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Wed, Aug 9, 2017 at 10:53 AM, Webert de Souza Lima wrote:
> Hi David,
>
> thanks
eon).paxos(paxos recovering c
8317388..8318132) lease_timeout -- calling new election
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Mon, Aug 21, 2017 at 10:34 AM, Webert de Souza Lima <
webert.b...@gmail.com> wrote:
> I really need some help thr
Hello,
not an expert here but I think the answer is something like:
radosgw-admin orphans find --pool=_DATA_POOL_ --job-id=_JOB_ID_
radosgw-admin orphans finish --job-id=_JOB_ID_
_JOB_ID_ being anything.
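For example, a rough sketch (the pool name and job id below are made up; a
default-zone data pool is often called default.rgw.buckets.data, but check your
zone config first):
radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans-scan-1
radosgw-admin orphans finish --job-id=orphans-scan-1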
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Thu
the moment it just outputs a lot of text
> saying something like putting $num in orphans.$jobid.$shardnum and listing
> objects that are not orphans?
>
> Regards,
> Andreas
>
> On 28 Sep 2017 15:10, "Webert de Souza Lima"
> wrote:
>
> Hello,
>
> not
Hey Christian,
On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
> wrote:
>
>> I'm pretty sure the orphan find command does exactly just that -
>> finding orphans. I remember some emails on the dev list where Yehuda
>> said he wasn't 100% comfortable of automating the delete just yet.
>> So the purp
This looks like something wrong with the crush rule.
What's the size, min_size and crush_rule of this pool?
ceph osd pool get POOLNAME size
ceph osd pool get POOLNAME min_size
ceph osd pool get POOLNAME crush_ruleset
How is the crush rule?
ceph osd crush rule dump
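If the pool turns out to reference a rule that can't be satisfied, one possible
fix is to point it at a working rule (a sketch only, jewel-era syntax; rule id 0
is just an example):
ceph osd pool set POOLNAME crush_ruleset 0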
Regards,
Webert Lima
DevO
Hi,
I have a cephfs cluster as follows:
1 15x HDD data pool (primary cephfs data pool)
1 2x SSD data pool (linked to a specific dir via xattrs)
1 2x SSD metadata pool
1 2x SSD cache tier pool
The cache tier pool consists of 2 hosts, with one SSD OSD on each host, with
size=2 replicated by host.
L
That sounds like it. Thanks David.
I wonder if that behavior of ignoring the OSD full_ratio is intentional.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Wed, Oct 11, 2017 at 12:26 PM, David Turner
wrote:
> The full ratio is based on the max bytes. if yo
Hi Bryan.
I hope that solved it for you.
Another thing you can do in situations like this is to set the full_ratio
higher so you can work on the problem. Always set it back to a safe value
after the issue is solved.
*ceph pg set_full_ratio 0.98*
Regards,
Webert Lima
DevOps Engineer at MAV Te
When you unmount the device, is the error raised still the same?
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
On Mon, Oct 23, 2017 at 4:46 AM, Wido den Hollander wrote:
>
> > Op 22 oktober 2017 om 18:45 schreef Sean Sullivan :
> >
> >
> > On freshly insta
I have had many cases of corrupt objects in my radosgw cluster. Until now I
have looked at it as a software (my software) bug, still unresolved, though the
incidence has lowered a lot.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Dec
Hello,
On Wed, Dec 13, 2017 at 7:36 AM, Stefan Kooman wrote:
> Hi,
>
> Is there a way to ask Ceph which OSD_ID
> would be next up?
If I may suggest, "ceph osd create" allocates and returns an OSD ID. So you
could take it by doing:
ID=$(ceph osd create)
then remove it with
ceph osd rm $ID
Cool
On Wed, Dec 13, 2017 at 11:04 AM, Stefan Kooman wrote:
> So, a "ceph osd ls" should give us a list, and we will pick the smallest
> available number as the new osd id to use. We will make a check in the
> (ansible) deployment code to see Ceph will indeed use that number.
>
> Thanks,
>
> Gr
On Wed, Dec 13, 2017 at 11:51 AM, Stefan Kooman wrote:
> If we want to remove the OSD (for whatever reason) and release the ID then
> we
> will use "ceph osd purge* osd.$ID" ... which basically does what you
> suggest (ceph auth del osd.$OSD_ID, crush remove osd.$OSD_ID, ceph osd
> rm osd.$OSD_ID
I have experienced delayed free in used space before, in Jewel, but that
just stopped happening with no intervention.
Back then, unmounting all clients' filesystems would make it free the space rapidly.
I don't know if that's related.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte
Hi,
I've been looking at ceph mds perf counters and I saw that one of my clusters
was hugely different from the others in the number of caps:
rlat inos caps | hsr hcs hcr | writ read actv | recd recy stry purg | segs evts subm
0 3.0M 5.1M | 0 0 595 | 30440 | 0 0 13k 0
Hi Patrick,
On Thu, Dec 14, 2017 at 7:52 PM, Patrick Donnelly
wrote:
>
> It's likely you're a victim of a kernel backport that removed a dentry
> invalidation mechanism for FUSE mounts. The result is that ceph-fuse
> can't trim dentries.
>
even though I'm not using FUSE? I'm using the kernel mount
Hello, Mr. Yan
On Thu, Dec 14, 2017 at 11:36 PM, Yan, Zheng wrote:
>
> The client hold so many capabilities because kernel keeps lots of
> inodes in its cache. Kernel does not trim inodes by itself if it has
> no memory pressure. It seems you have set mds_cache_size config to a
> large value.
Thanks
On Fri, Dec 15, 2017 at 10:46 AM, Yan, Zheng wrote:
> recent
> version kernel client and ceph-fuse should trim their cache
> aggressively when mds recovers.
>
So the bug (not sure if I can call it a bug) is already fixed in newer
kernels? Can I just update the kernel and expect this to be
So,
On Fri, Dec 15, 2017 at 10:58 AM, Yan, Zheng wrote:
>
> 300k is already quite a lot. Opening them requires a long time. Does your
> mail server really open so many files?
Yes, probably. It's a commercial solution. A few thousand domains, tens
of thousands of users and god knows how many mailb
id" : "admin"
},
"replay_requests" : 0
},
Still 1.4M caps used.
Is upgrading the client kernel enough?
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Fri, Dec 15, 2017 at 11:16 AM, Webert de Souz
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Dec 21, 2017 at 11:55 AM, Yan, Zheng wrote:
> On Thu, Dec 21, 2017 at 7:33 PM, Webert de Souza Lima
> wrote:
> > I have upgraded the kernel on a client node (one that has close-to-zero
> >
On Fri, Dec 22, 2017 at 3:20 AM, Yan, Zheng wrote:
> idle client shouldn't hold so many caps.
>
I'll try to make it reproducible for you to test.
> yes. For now, it's better to run "echo 3 >/proc/sys/vm/drop_caches"
> after the cronjob finishes
Thanks. I'll adopt that for now.
Regards,
Webert L
It depends on how you use it. For me, it runs fine on the OSD hosts but the
MDS server consumes loads of RAM, so be aware of that.
If the system load average goes too high due to OSD disk utilization, the
MDS server might run into trouble too, as a delayed response from the host
could cause the MDS t
On Thu, Dec 21, 2017 at 12:52 PM, shadow_lin wrote:
>
> After 18:00 suddenly the write throughput dropped and the osd latency
> increased. TCmalloc started reclaiming page heap freelist much more
> frequently. All of this happened very fast and every osd had the identical
> pattern.
>
Could that be c
Try to kick out (evict) that cephfs client from the MDS node, see
http://docs.ceph.com/docs/master/cephfs/eviction/
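Roughly something like this (the MDS name and client id are placeholders; list
the sessions first to find the right id):
ceph daemon mds.<name> session ls
ceph tell mds.0 client evict id=<client_id>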
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jan 10, 2018 at 12:59 AM, Mark Schouten wrote:
> Hi,
>
> While up
On Wed, Jan 10, 2018 at 12:44 PM, Mark Schouten wrote:
> > Thanks, that's a good suggestion. Just one question, will this affect
> RBD-
> > access from the same (client)host?
I'm sorry that this didn't help. No, it does not affect RBD clients, as the MDS
is related only to cephfs.
Regards,
Webert
Good to know. I don't think this should trigger HEALTH_ERR though, but
HEALTH_WARN makes sense.
It makes sense to keep the backfillfull_ratio greater than the nearfull_ratio,
as one might need backfilling to avoid an OSD getting full during reweight
operations.
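For reference, on Luminous the current thresholds can be checked and adjusted
like this (the values below are only examples, not a recommendation):
ceph osd dump | grep full_ratio
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90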
Regards,
Webert Lima
DevOps Engineer at MAV Te
Hello,
I'm running a nearly out-of-service radosgw (very slow to write new objects)
and I suspect it's because ceph df is showing 100% usage in some pools,
though I don't know where that information comes from.
Pools:
#~ ceph osd pool ls detail -> http://termbin.com/lsd0
Crush Rules (important is
Also, there is no quota set for the pools.
Here is "ceph osd pool get xxx all": http://termbin.com/ix0n
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
Sorry I forgot, this is a ceph jewel 10.2.10
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Jan 18, 2018 at 8:05 PM, David Turner wrote:
> `ceph osd df` is a good command for you to see what's going on. Compare
> the osd numbers with `ceph osd tree`.
>
>
>>
>> On Thu, Jan 18, 2018 at 3:34 PM Webert de Souza Lima <
>>
With the help of robbat2 and llua on the IRC channel I was able to solve this
situation by taking down the 2-OSD-only hosts.
After crush reweighting OSDs 8 and 23 from host mia1-master-fe02 to 0, ceph
df showed the expected storage capacity usage (about 70%).
With this in mind, those guys have told me
available space.
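For reference, the reweighting mentioned above is just the standard crush
reweight command, something like (a sketch, not the exact invocation used):
ceph osd crush reweight osd.8 0
ceph osd crush reweight osd.23 0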
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Jan 18, 2018 at 8:21 PM, Webert de Souza Lima wrote:
> With the help of robbat2 and llua on IRC channel I was able to solve this
> situation by taking down the 2-OS
Hi,
On Fri, Jan 19, 2018 at 8:31 PM, zhangbingyin
wrote:
> 'MAX AVAIL' in the 'ceph df' output represents the amount of data that can
> be used before the first OSD becomes full, and not the sum of all free
> space across a set of OSDs.
>
Thank you very much. I figured this out by the end of t
Hi,
after running a cephfs on my ceph cluster I got stuck with the following
health status:
# ceph status
cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5
health HEALTH_WARN
128 pgs degraded
128 pgs stuck unclean
128 pgs undersized
recovery 24/4
Also, I instructed all unclean PGs to repair and nothing happened. I did it
like this:
~# for pg in `ceph pg dump_stuck unclean 2>&1 | grep -Po
'[0-9]+\.[A-Za-z0-9]+'`; do ceph pg repair $pg; done
On Tue, Nov 15, 2016 at 9:58 AM Webert de Souza Lima
wrote:
> Hi,
>
> af
Hey John.
Just to be sure; by "deleting the pools" you mean the *cephfs_metadata* and
*cephfs_metadata* pools, right?
Does it have any impact over radosgw? Thanks.
On Tue, Nov 15, 2016 at 10:10 AM John Spray wrote:
> On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima
>
I'm sorry, I meant *cephfs_data* and *cephfs_metadata*
On Tue, Nov 15, 2016 at 10:15 AM Webert de Souza Lima
wrote:
> Hey John.
>
> Just to be sure; by "deleting the pools" you mean the *cephfs_metadata*
> and *cephfs_metadata* pools, right?
> Does it have a
John Spray wrote:
> On Tue, Nov 15, 2016 at 12:14 PM, Webert de Souza Lima
> wrote:
> > Hey John.
> >
> > Just to be sure; by "deleting the pools" you mean the cephfs_metadata and
> > cephfs_metadata pools, right?
> > Does it have any impact over rado
data
master.rgw.meta
master.rgw.buckets.non-ec
rbd
cephfs_metadata
cephfs_data
*# ceph osd pool stats*
https://paste.debian.net/895840/
On Tue, Nov 15, 2016 at 10:33 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:
> Hi,
>
>
> On 11/15/2016 01:27 PM, Webert de
the id of the hdd crush rule.
On Tue, Nov 15, 2016 at 11:09 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:
> Hi,
>
> On 11/15/2016 01:55 PM, Webert de Souza Lima wrote:
>
> sure, as requested:
>
> *cephfs* was created using the following
I removed cephfs and its pools, created everything again using the default
crush ruleset, which is for the HDD, and now ceph health is OK.
I appreciate your help. Thank you very much.
On Tue, Nov 15, 2016 at 11:48 AM Webert de Souza Lima
wrote:
> Right, thank you.
>
> On this particula
Hi,
I have many clusters running cephfs, and in the last 45 days or so, 2 of
them started giving me the following message in *ceph health:*
*mds0: Client dc1-mx02-fe02:guest failing to respond to capability release*
When this happens, cephfs stops responding. It will only get back
after I *restar
I'm sorry, by server, I meant cluster.
On one cluster the rate of files created and read is about 5 per second.
On another cluster it's from 25 to 30 files created and read per second.
On Wed, Nov 16, 2016 at 2:03 PM Webert de Souza Lima
wrote:
> Hello John.
>
> I'
t. I'm not sure what would happen here.
On Wed, Nov 16, 2016 at 1:42 PM John Spray wrote:
> On Wed, Nov 16, 2016 at 3:15 PM, Webert de Souza Lima
> wrote:
> > hi,
> >
> > I have many clusters running cephfs, and in the last 45 days or so, 2 of
> > them star
Is it possible to count open file descriptors in cephfs only?
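One rough client-side approximation I can think of (an assumption on my part,
not something tested here; the mount point is hypothetical) would be to count
open files on the cephfs mount:
lsof /mnt/cephfs | wc -l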
On Wed, Nov 16, 2016 at 2:12 PM Webert de Souza Lima
wrote:
> I'm sorry, by server, I meant cluster.
> On one cluster the rate of files created and read is about 5 per second.
> On another cluster it's from 25 to 3
Bluestore doesn't have a journal like filestore does, but there is the
WAL (Write-Ahead Log), which looks like a journal but works differently.
You can (or must, depending on your needs) have SSDs to serve this WAL (and
for RocksDB).
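A minimal sketch of what that looks like at OSD creation time on Luminous with
ceph-volume (the device names here are hypothetical):
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1 --block.wal /dev/sdb2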
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*
Cheers!
Thanks for all the backports and fixes.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jul 11, 2018 at 1:46 PM Abhishek Lekshmanan
wrote:
>
> We're glad to announce v10.2.11 release of the Jewel stable release
> series.
The pool deletion might have triggered a lot of IO operations on the disks
and the processes might be too busy to respond to heartbeats, so the mons mark
them as down due to no response.
Check also the OSD logs to see if they are actually crashing and
restarting, and disk IO usage (e.g. iostat).
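A quick sketch of what I mean (the log path and interval are just examples):
iostat -x 5
grep -i heartbeat_check /var/log/ceph/ceph-osd.*.log | tail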
Rega
> Regards,
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souza Lima
> *Sent:* 31 July 2018 16:25
> *To:* ceph-users
> *Subject:* Re: [ceph-users] Whole cluster flapping
>
>
>
> The pool deletion might have triggered a lot of IO operations on the
Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Tue, Aug 7, 2018 at 10:47 AM CUZA Frédéric wrote:
> Pool is already deleted and no longer present in stats.
>
>
>
> Regards,
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souz
Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM:
> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou wrote:
> this can cause memory deadlock. you should avoid doing this
>
> > Yan, Zheng wrote on Tue, Aug 7, 2018 at 19:12:
> >>
> >> did you mount cephfs on the same machines that run ceph-osd?
> >>
I didn't know about this. I ru
he kernel client at this
> point, but that isn’t etched in stone.
> >
> > Curious if there is more to share.
> >
> > Reed
> >
> > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima
> wrote:
> >
> >
> > Yan, Zheng wrote on Tue, Aug 7, 2018 at 7:51 PM:
>
p is_healthy 'OSD::osd_op_tp thread 0x7fdabd897700' had
> timed out after 90
>
>
>
> (I updated it to 90 instead of 15s)
>
>
>
> Regards,
>
>
>
>
>
>
>
> *From:* ceph-users *On behalf of*
> Webert de Souza Lima
> *Sent:* 07 August
>>>
>>> >>> This is not a Ceph-specific thing -- it can also affect similar
>>> >>> systems like Lustre.
>>> >>>
>>> >>> The classic case is when under some memory pressure, the kernel tries
>>> >>> to f
g time.
> So I cannot get useful information from the command you provided.
>
> Thanks
>
> Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:10 PM:
>
>> You could also see open sessions at the MDS server by issuing `ceph
>> daemon mds.XX session ls`
>>
>> Regards,
>>
nt, I can't restart it every time.
>
> Webert de Souza Lima wrote on Wed, Aug 8, 2018 at 10:33 PM:
>
>> Hi Zhenshi,
>>
>> if you still have the client mount hanging but no session is connected,
>> you probably have some PID waiting with blocked IO from cephfs mount.
>>
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 16, 2018 at 5:15 PM Webert de Souza Lima
wrote:
> Thanks Jack.
>
> That's good to know. It is definitely something to consider.
> In a distributed storage scenario we might build a dedica
I'd also try to boot up only one MDS until it's fully up and running, not
both of them.
Sometimes they keep switching states between each other.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Thu, Mar 29, 2018 at 7:32 AM, John Spray wro
Hello,
Currently, I run Jewel + Filestore for cephfs, with SSD-only pools used for
cephfs-metadata, and HDD-only pools for cephfs-data. The current
metadata/data ratio is something like 0.25% (50GB metadata for 20TB data).
Regarding bluestore architecture, assuming I have:
- SSDs for WAL+DB
-
I'm sorry, I have mixed up some information. The actual ratio I have now
is 0.0005% (*100MB for 20TB data*).
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 9, 2018 at 11:32 AM, Webert de Souza Lima wrote:
&g
Hey Jon!
On Wed, May 9, 2018 at 12:11 PM, John Spray wrote:
> It depends on the metadata intensity of your workload. It might be
> quite interesting to gather some drive stats on how many IOPS are
> currently hitting your metadata pool over a week of normal activity.
>
Any ceph built-in tool f
Basically what we're trying to figure out looks like what is being done
here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020958.html
But instead of using librados to store emails directly into RADOS, we're
still using CephFS for it, just figuring out if it makes sense to sep
You could use "mds_cache_size" to limit number of CAPS untill you have this
fixed, but I'd say for your number of caps and inodes, 20GB is normal.
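If you do want to cap it for now, something like this should work (the value is
hypothetical; mds_cache_size counts inodes, not bytes):
ceph tell mds.* injectargs '--mds_cache_size 500000'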
This MDS (jewel) here is consuming 24GB RAM:
{
"mds": {
"request": 7194867047,
"reply": 7194866688,
"reply_latency": {
This message seems to be very concerning:
>mds0: Metadata damage detected
but for the rest, the cluster still seems to be recovering. You could try
to speed things up with ceph tell, like:
ceph tell osd.* injectargs --osd_max_backfills=10
ceph tell osd.* injectargs --osd_recovery_sleep
(write/read)_bytes(_total)
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, May 9, 2018 at 2:23 PM Webert de Souza Lima
wrote:
> Hey Jon!
>
> On Wed, May 9, 2018 at 12:11 PM, John Spray wrote:
>
>> It depends
11, 2018 at 2:39 PM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> I think ceph doesn't have IO metrics with filters by pool, right? I see IO
>> metrics from clients only:
>>
>> ceph_client_io_ops
>> ceph_client_io_read_bytes
>> ceph_cli
On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER
wrote:
> The documentation (luminous) say:
>
> >mds cache size
> >
> >Description:The number of inodes to cache. A value of 0 indicates an
> unlimited number. It is recommended to use mds_cache_memory_limit to limit
> the amount of memory t
0",
> "osd_op_num_threads_per_shard_hdd": "1",
> "osd_op_num_threads_per_shard_ssd": "2",
> "osd_op_thread_suicide_timeout": "150",
> "osd_op_thread_timeout": "15",
> "os
I'm sending this message to both the dovecot and ceph-users MLs, so please don't
mind if something seems too obvious to you.
Hi,
I have a question for both dovecot and ceph lists and below I'll explain
what's going on.
Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when
using s
and will help you a lot:
> - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib)
> - Single-Instance-Storage (aka sis, aka "attachment deduplication" :
> https://www.dovecot.org/list/dovecot/2013-December/094276.html)
>
> Regards,
> On 05/16/2018 08:37 PM, Webert d
ction, but you can try it to run a POC.
>
> For more information check out my slides from Ceph Day London 2018:
> https://dalgaaf.github.io/cephday-london2018-emailstorage/#/cover-page
>
> The project can be found on github:
> https://github.com/ceph-dovecot/
>
> -Danny
>
*IRC NICK - WebertRLZ*
On Wed, May 16, 2018 at 4:45 PM Jack wrote:
> On 05/16/2018 09:35 PM, Webert de Souza Lima wrote:
> > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore
> > backend.
> > We'll have to do some work on how to simulat
Hello,
On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann
wrote:
> additionally: if rank 0 is lost, the whole FS stands still (no new
> client can mount the fs; no existing client can change a directory, etc.).
>
> my guess is that the root of a cephfs (/; which is always served by rank
> 0) is nee
Hi,
We're migrating from a Jewel / filestore based cephfs architecture to a
Luminous / bluestore based one.
One MUST HAVE is multiple active MDS daemons. I'm still lacking knowledge
of how they actually work.
After reading the docs and ML we learned that they work by sort of dividing
the responsibili
Hi Patrick
On Fri, May 18, 2018 at 6:20 PM Patrick Donnelly
wrote:
> Each MDS may have multiple subtrees they are authoritative for. Each
> MDS may also replicate metadata from another MDS as a form of load
> balancing.
OK, it's good to know that it actually does some load balancing. Thanks.
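As an aside, besides the automatic balancer a subtree can also be pinned to a
specific rank via an xattr (the path and rank below are only an example):
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir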
New
Hi Daniel,
Thanks for clarifying.
I'll have a look at dirfrag option.
Regards,
Webert Lima
Em sáb, 19 de mai de 2018 01:18, Daniel Baumann
escreveu:
> On 05/19/2018 01:13 AM, Webert de Souza Lima wrote:
> > New question: will it make any difference in the balancing if instead
Hello,
Is there any performance impact on cephfs when using file layouts to bind a
specific directory in cephfs to a given pool? Of course, such a pool is not
the default data pool for this cephfs.
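For context, the binding itself is just a layout xattr on the directory,
something like this (pool name and path are hypothetical, and the pool must
have been added to the filesystem first):
setfattr -n ceph.dir.layout.pool -v cephfs_ssd_data /mnt/cephfs/some/dir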
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRL
I think in this scenario the overhead may be acceptable for us.
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Jun 13, 2018 at 9:51 AM Yan, Zheng wrote:
> On Wed, Jun 13, 2018 at 3:34 AM Webert de Souza Lima
> wrote:
>
pool isn’t
> available you would stack up pending RADOS writes inside of your mds but
> the rest of the system would continue unless you manage to run the mds out
> of memory.
> -Greg
> On Wed, Jun 13, 2018 at 9:25 AM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>>
n’t. The backtrace does
> create another object but IIRC it’s a maximum one IO per create/rename (on
> the file).
> On Wed, Jun 13, 2018 at 1:12 PM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> Thanks for clarifying that, Gregory.
>>
>> As said bef
Keep in mind that the MDS server is CPU-bound, so during heavy workloads it
will eat up CPU, and the OSD daemons can affect or be affected by the
MDS daemon.
But it does work well. We've been running a few clusters with MON, MDS and
OSDs sharing the same hosts for a couple of years now.
Regar
Hello everyone,
I'm deploying a ceph cluster with cephfs and I'd like to tune ceph cache
tiering, and I'm
a little bit confused by the settings hit_set_count, hit_set_period and
min_read_recency_for_promote. The docs are very lean and I can't find any
more detailed explanation anywhere.
Could som
I have faced the same problem many times. Usually it doesn't cause anything
bad, but I had a 30 min system outage twice because of this.
It might be because of the number of inodes on your ceph filesystem. Go to
the MDS server and do (supposing your mds server id is intcfs-osd1):
ceph daemon mds.
Hi,
by issuing `ceph daemonperf mds.x` I see the following columns:
-mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log
rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
0 95 41 | 000 | 000 | 00 250 | 1
Hello all,
I've been using cephfs for a while but never really evaluated its
performance.
As I put up a new ceph cluster, I thought that I should run a benchmark to
see if I'm going the right way.
By the results I got, I see that RBD performs *a lot* better in comparison
to cephfs.
The cluster is
That 1 Gbps link is the only option I have for those servers, unfortunately.
It's all dedicated server rentals from OVH.
I don't have information regarding the internals of the vrack.
So from what you said, I understand that one should expect a performance drop
in comparison to ceph rbd using the sam
ms or so.
>
> Ceph spends much more time in the CPU than it will take the network to
> forward that IP-packet.
>
> I wouldn't be too afraid to run Ceph over a L3 network.
>
> Wido
>
> > On May 9, 2017 12:01 PM, "Webert de Souza Lima"
> > wrote:
On Tue, May 9, 2017 at 9:07 PM, Brady Deetz wrote:
> So with email, you're talking about lots of small reads and writes. In my
> experience with dicom data (thousands of 20KB files per directory), cephfs
> doesn't perform very well at all on platter drives. I haven't experimented
> with pure ssd
On Wed, May 10, 2017 at 4:09 AM, gjprabu wrote:
> Hi Webert,
>
> Thanks for your reply, can you please suggest the ceph pg value for data and
> metadata. I have set 128 for data and 128 for metadata, is this correct?
>
Well, I think this has nothing to do with your current problem, but the PG
number d
ote:
> On Tue, May 9, 2017 at 5:23 PM, Webert de Souza Lima
> wrote:
> > Hi,
> >
> > by issuing `ceph daemonperf mds.x` I see the following columns:
> >
> > -mds-- --mds_server-- ---objecter--- -mds_cache-
> > ---mds_log
> > rlat i
wrote:
> On Fri, May 12, 2017 at 12:47 PM, Webert de Souza Lima
> wrote:
> > Thanks John,
> >
> > I did as you suggested but unfortunately I only found information
> regarding
> > the objecter nicks "writ, read and actv", any more suggestions?
>
>