need to reduce
bluestore_min_alloc_size.
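For reference, that looks something like this in ceph.conf (option name assuming
a recent release; the value is baked in when an OSD is created, so existing OSDs
have to be redeployed for it to take effect):

[osd]
bluestore_min_alloc_size_hdd = 4096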
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
g cleaned up. I guess
I'll see once I catch up on snapshot deletions.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
On 13/09/2019 16.25, Hector Martin wrote:
> Is this expected for CephFS? I know data deletions are asynchronous, but
> not being able to delete metadata/directories without an undue impact on
> the whole filesystem performance is somewhat problematic.
I think I'm getting a feelin
ch at that time, so I'm not sure what the
bottleneck is here.
Is this expected for CephFS? I know data deletions are asynchronous, but
not being able to delete metadata/directories without an undue impact on
the whole filesystem performance is somewhat problematic.
--
Hector Martin (hec...@marc
magine you should be able to get reasonable aggregate
performance out of the whole thing, but I've never tried a setup like that.
I'm actually considering this kind of thing in the future (moving from
one monolithic server to a more cluster-like setup) but it's just an
idea for now.
--
Hec
involve keeping two
months worth of snapshots? That CephFS can't support this kind of use
case (and in general that CephFS uses the stray subdir persistently for
files in snapshots that could remain forever, while the stray dirs don't
scale) sounds like a bug.
--
Hector Martin (hec
dx
0xd788 <+536>: mov    %edx,0x48(%r15)
That means req->r_reply_info.filelock_reply was NULL.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
about reconnections and such) and seems to be fine.
I can't find these errors anywhere, so I'm guessing they're not known bugs?
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
y and tested and
everything seems fine. I deployed it to production and got rid of the
drop_caches hack and I've seen no stuck ops for two days so far.
If there is a bug or PR opened for this can you point me to it so I can
track when it goes into a release?
Thanks!
--
Hector Martin (hec...@ma
On 13/06/2019 14.31, Hector Martin wrote:
> On 12/06/2019 22.33, Yan, Zheng wrote:
>> I have tracked down the bug. thank you for reporting this. 'echo 2 >
>> /proc/sys/vm/drop_caches' should fix the hang. If you can compile ceph
>> from source, please try following patch
if they need to be
base64 decoded or what have you) if you really want to go this route.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
--
*Pardhiv Karri*
"Rise and Rise again untilLAMBSbecome LIONS"
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
://mirrors.gigenet.com/ceph/
This one is *way* behind on sync, it doesn't even have Nautilus.
Perhaps there should be some monitoring for public mirror quality?
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
NAP))
> cap->mark_needsnapflush();
> }
>
>
>
That was quick, thanks! I can build from source but I won't have time to
do so and test it until next week, if that's okay.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
"event": "dispatched"
},
{
"time": "2019-06-12 16:15:59.096318",
"event": "failed to rdlock, waiting"
},
{
"time": "2019-06-12 16:15:59.268368",
"event": "failed to rdlock, waiting"
}
]
}
}
],
"num_ops": 1
}
My guess is somewhere along the line of this process there's a race
condition and the dirty client isn't properly flushing its data.
A 'sync' on host2 does not clear the stuck op. 'echo 1 >
/proc/sys/vm/drop_caches' does not either, while 'echo 2 >
/proc/sys/vm/drop_caches' does fix it. So I guess the problem is a
dentry/inode that is stuck dirty in the cache of host2?
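(For reference, the op dump above comes from the MDS admin socket; something
along these lines on the active MDS host shows requests stuck in states like
"failed to rdlock, waiting":

ceph daemon mds.<name> dump_ops_in_flight)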
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
ves in, and you need to hit all 3). This is
marginally higher than the ~ 0.00891% with uniformly distributed PGs,
because you've eliminated all sets of OSDs which share a host.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
t.
https://www.memset.com/support/resources/raid-calculator/
I'll take a look tonight :)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
EC
encoding, but if you do lose a PG you'll lose more data because there
are fewer PGs.
Feedback on my math welcome.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
In particular, you turned on CRUSH_TUNABLES5, which causes a large
amount of data movement:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#jewel-crush-tunables5
Going from Firefly to Hammer has a much smaller impact (see the CRUSH_V4
section).
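You can check which tunables profile a cluster is currently using with:

ceph osd crush show-tunables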
--
Hector Martin (hec...@marcansoft.com
oit.io>
> Tel: +49 89 1896585 90
>
> On Tue, Mar 12, 2019 at 10:07 AM Hector Martin
> mailto:hec...@marcansoft.com>> wrote:
> >
> > It's worth noting that most containerized deployments can effectively
> > limit RAM for containers (cgroups
e per day), have one active metadata server,
and change several TB daily - it's much, *much* faster than with fuse.
Cluster has 10 OSD nodes, currently storing 2PB, using ec 8:2 coding.
ta ta
Jake
On 3/6/19 11:10 AM, Hector Martin wrote:
On 06/03/2019 12:07, Zhenshi Zhou wrote:
Hi,
I'm gon
been doing this on two machines (single-host Ceph
clusters) for months with no ill effects. The FUSE client performs a lot
worse than the kernel client, so I switched to the latter, and it's been
working well with no deadlocks.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https
m OSDs without regard for the hosts; you will be able to use
effectively any EC widths you want, but there will be no guarantees of
data durability if you lose a whole host.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
for months now without any issues in two single-host setups. I'm
also in the process of testing and migrating a production cluster
workload from a different setup to CephFS on 13.2.4 and it's looking good.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
2 Sept 2028, the day and month
are also wrong.
Obvious question: are you sure the date/time on your cluster nodes and
your clients is correct? Can you track down which files (if any) have
the ctime in the future by following the rctime down the filesystem
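e.g. with something like this at each directory level (mountpoint assumed):

getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir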
to know about all
the files in a pool.
As far as I can tell you *can* read the ceph.file.layout.pool xattr on
any files in CephFS, even those that haven't had it explicitly set.
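e.g. (path assumed):

getfattr -n ceph.file.layout.pool /mnt/cephfs/some/file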
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
tat(), right. (I only just
realized this :-))
Are there Python bindings for what ceph-dencoder does, or at least a C
API? I could shell out to ceph-dencoder but I imagine that won't be too
great for performance.
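For the record, the shell-out usage would be roughly this (the type name here is
just an example):

ceph-dencoder type inode_backtrace_t import /tmp/blob decode dump_json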
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/
file (formerly
variable length, now I just pad it to the full 128 bytes and rewrite it
in-place). This is good information to know for optimizing things :-)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
the cluster seed.
>
>
> I appreciate small clusters are not the target use case of Ceph, but
> everyone has to start somewhere!
>
ust do the above dance for every hardlinked file to
move the primaries off, but this seems fragile and likely to break in
certain situations (or do needless work). Any other ideas?
Thanks,
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
an-it' and make sure there isn't a spurious entry for
it in ceph.conf, then re-deploy. Once you do that there is no possible
other place for the OSD to somehow remember its old IP.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
lt data pool.
The FSMap seems to store pools by ID, not by name, so renaming the pools
won't work.
This past thread has an untested procedure for migrating CephFS pools:
https://www.spinics.net/lists/ceph-users/msg29536.html
--
Hector Martin (hec...@marcansoft.com)
Public Key: https:/
g to connect via the external IP of that node.
Does your ceph.conf have the right network settings? Compare it with the
other nodes. Also check that your network interfaces and routes are
correctly configured on the problem node, of course.
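Roughly, every node's ceph.conf should agree on something like this (subnets
here are made up):

[global]
public network = 192.168.0.0/24
cluster network = 10.0.0.0/24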
--
Hector Martin (hec...@marcansoft.com)
Public Key: htt
apparently created on deletion (I
wasn't aware of this). So for ~700 snapshots the output you're seeing is
normal. It seems that using a "rolling snapshot" pattern in CephFS
inherently creates a "one present, one deleted" pattern in the
underlying pools.
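(Visible, e.g., in the removed_snaps intervals that "ceph osd dump" prints per
pool.)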
--
Hector Martin (hec...@ma
without truncation does not.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
might want to go through your snapshots and check that
you aren't leaking old snapshots forever, or deleting the wrong ones.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
ere's some discussion on this here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020510.html
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
they're atomic.
Is there any documentation on what write operations incur significant
overhead on CephFS like this, and why? This particular issue isn't
mentioned in http://docs.ceph.com/docs/master/cephfs/app-best-practices/
(which seems like it mostly deals with reads, not writes).
--
Hecto
> /etc/conf.d/ceph-osd
The Gentoo initscript setup for Ceph is unfortunately not very well
documented. I've been meaning to write a blogpost about this to try to
share what I've learned :-)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
s a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://m
LVM on both ends!)
Ultimately a lot of this is dictated by whatever tools you feel
comfortable using :-)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
On 19/01/2019 02.24, Brian Topping wrote:
>
>
>> On Jan 18, 2019, at 4:29 AM, Hector Martin wrote:
>>
>> On 12/01/2019 15:07, Brian Topping wrote:
>>> I’m a little nervous that BlueStore assumes it owns the partition table and
>>> will not be happy tha
On 18/01/2019 22.33, Alfredo Deza wrote:
> On Fri, Jan 18, 2019 at 7:07 AM Hector Martin wrote:
>>
>> On 17/01/2019 00:45, Sage Weil wrote:
>>> Hi everyone,
>>>
>>> This has come up several times before, but we need to make a final
>>> decis
) to hopefully squash more
lurking Python 3 bugs.
(just my 2c - maybe I got unlucky and otherwise things work well enough
for everyone else in Py3; I'm certainly happy to get rid of Py2 ASAP).
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
so far in my home cluster, but I haven't finished setting things up yet.
Those are definitely not SMR.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
with some
custom code, but then normal usage just uses ceph-disk (it certainly
doesn't care about extra partitions once everything is set up). This was
formerly FileStore and now BlueStore, but it's a legacy setup. I expect
to move this over to ceph-volume at some point.
--
Hector Martin (hec
blem then, good to know it isn't *supposed* to work yet :-)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
g-bluefs 20 --log-file
bluefs-bdev-expand.log"
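(The base command being, for reference, something like
"ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>",
with the path pointing at the OSD in question.)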
Perhaps it makes sense to open a ticket at ceph bug tracker to proceed...
Thanks,
Igor
On 12/27/2018 12:19 PM, Hector Martin wrote:
Hi list,
I'm slightly expanding the underlying LV for two OSDs and figured I
could use ceph-bluestore-tool to avo
this
again with osd.1 if needed and see if I can get it fixed. Otherwise I'll
just re-create it and move on.
# ceph --version
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
(stable)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
On 21/12/2018 03.02, Gregory Farnum wrote:
> RBD snapshots are indeed crash-consistent. :)
> -Greg
Thanks for the confirmation! May I suggest putting this little nugget in
the docs somewhere? This might help clarify things for others :)
--
Hector Martin (hec...@marcansoft.com)
Public Key:
ionally reset the VM if thawing fails.
Ultimately this whole thing is kind of fragile, so if I can get away
without freezing at all it would probably make the whole process a lot
more robust.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://
5 minutes, we'll see if the problem recurs.
Given this, it makes even more sense to just avoid the freeze if at all
reasonable. There's no real way to guarantee that a fsfreeze will
complete in a "reasonable" amount of time as far as I can tell.
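The closest you can get is probably bounding it yourself, e.g. (guest name made
up, assuming libvirt and the qemu guest agent):

timeout 10 virsh domfsfreeze myvm || virsh reset myvm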
--
Hector Martin (hec...@marcansoft.com)
has higher impact but also probably a much lower chance of messing up
(or having excess latency), since it doesn't involve the guest OS or the
qemu agent at all...
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
On 26/11/2018 11.05, Yan, Zheng wrote:
> On Mon, Nov 26, 2018 at 4:30 AM Hector Martin wrote:
>>
>> On 26/11/2018 00.19, Paul Emmerich wrote:
>>> No, wait. Which system did kernel panic? Your CephFS client running rsync?
>>> In this case this would be expect
ose pages are flushed?
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
hings
go down at once?)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
some DR tests when I set this up, to
prove to myself that it all works out :-)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
Would this be preferable to just restoring the mon from a backup? What
about the MDS map?
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
the cache, OSDs will
creep up in memory usage up to some threshold, and I'm not sure what
determines what that baseline usage is or whether it can be controlled.
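(Newer releases do have a knob for this; assuming a version that supports it,
something like

[osd]
osd_memory_target = 4294967296

aims each OSD, cache included, at roughly 4GB. Note it's a target, not a hard
limit.)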
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
ceph ceph 6 Oct 28 16:12 ready
-rw------- 1 ceph ceph 10 Oct 28 16:12 type
-rw------- 1 ceph ceph 3 Oct 28 16:12 whoami
(lockbox.keyring is for encryption, which you do not use)
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mr
because there is a backup at the end of the device, but wipefs *should*
know about that as far as I know.
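To be thorough you can zap both copies plus any other signatures, e.g.:

sgdisk --zap-all /dev/sdX
wipefs -a /dev/sdX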
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
ileStore remnant tries to mount phantom partitions.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
zap these 10 osds and start over although at
>this point I am afraid even zapping may not be a simple task
>
>
>
>On Tue, Nov 6, 2018 at 3:44 PM, Hector Martin
>wrote:
>
>> On 11/7/18 5:27 AM, Hayashida, Mami wrote:
>> > 1. Stopped osd.60-69: no
re of by the ceph-volume activation.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
, it's
safe to move or delete all those OSD directories for BlueStore OSDs and
try activating them cleanly again, which hopefully will do the right thing.
In the end this all might fix your device ownership woes too, making the
udev rule unnecessary. If it all works out, try a reboot and
and "mount | grep osd" instead and
see if ceph-60 through ceph-69 show up.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
8:112 0 3.7T 0 disk
> └─hdd60-data60 252:1 0 3.7T 0 lvm
>
> and "ceph osd tree" shows
> 60 hdd 3.63689 osd.60 up 1.0 1.0
That looks correct as far as the weight goes, but I'm really confused as
to why you have a 10GB "bl
links to block devices. I'm not sure what happened there.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
ENV{DM_LV_NAME}=="db*", ENV{DM_VG_NAME}=="ssd0", \
OWNER="ceph", GROUP="ceph", MODE="660"
Reboot after that and see if the OSDs come up without further action.
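(Assuming you drop the rule in something like
/etc/udev/rules.d/99-ceph-db-perms.rules; "udevadm control --reload" followed by
"udevadm trigger" should also apply it without a reboot.)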
--
Hector Martin (hec
On 11/6/18 3:21 AM, Alfredo Deza wrote:
> On Mon, Nov 5, 2018 at 11:51 AM Hector Martin wrote:
>>
>> Those units don't get triggered out of nowhere, there has to be a
>> partition table with magic GUIDs or a fstab or something to cause them
>> to be triggered. The bett
hat references any of the old partitions that don't exist
(/dev/sdh1 etc) should be removed. The disks are now full-disk LVM PVs
and should have no partitions.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
> >> >> >>> > ├─ssd0-db61 252:1 0 40G 0 lvm
> >> >> >>> > ├─ssd0-db62 252:2 0 40G 0 lvm
> >> >> >>> > ├─ssd0-db63 252:3 0 40G 0 lvm
> >> >> >>> > ├─ssd0-db64 252:4 0 40G 0 lvm
> >> >> >>> > ├─ssd0-db65 252:5 0 40G 0 lvm
On 11/6/18 1:08 AM, Hector Martin wrote:
> On 11/6/18 12:42 AM, Hayashida, Mami wrote:
>> Additional info -- I know that /var/lib/ceph/osd/ceph-{60..69} are not
>> mounted at this point (i.e. mount | grep ceph-60, and 61-69, returns
>> nothing.). They don't show up wh
emd is still trying to mount
the old OSDs, which used disk partitions. Look in /etc/fstab and in
/etc/systemd/system for any references to those filesystems and get rid
of them. /dev/sdh1 and company no longer exist, and nothing should
reference them.
--
Hector Martin (hec...@marcansoft.com
just try
to start the OSDs again? Maybe check the overall system log with
journalctl for hints.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
...
ceph-volume lvm activate --all
I think it might be possible to just let ceph-volume create the PV/VG/LV
for the data disks and only manually create the DB LVs, but it shouldn't
hurt to do it on your own and just give ready-made LVs to ceph-volume
for everything.
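Per OSD that would be roughly (VG/LV names matching your layout, flags assuming
a recent ceph-volume):

lvcreate -L 40G -n db60 ssd0
lvcreate -l 100%FREE -n data60 hdd60
ceph-volume lvm create --bluestore --data hdd60/data60 --block.db ssd0/db60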
--
Hector Martin (hec
On 2018-06-16 13:04, Hector Martin wrote:
> I'm at a loss as to what happened here.
Okay, I just realized CephFS has a default 1TB file size... that
explains what triggered the problem. I just bumped it to 10TB. What that
doesn't explain is why rsync didn't complain about anything. Maybe w
t've happened here? If this happens again / is
reproducible I'll try to see if I can do some more debugging...
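(For reference, the max_file_size bump was along these lines, fs name assumed:

ceph fs set <fs_name> max_file_size 10995116277760

i.e. 10 TiB in bytes.)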
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
is
currently very overprovisioned for space, so we're probably not going to
be adding OSDs for quite a while, but we'll be adding pools.
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
as you add pools.
We are following the hardware recommendations for RAM: 1GB per 1TB of
storage, so 16GB for each OSD box (4GB per OSD daemon, each OSD being
one 4TB drive).
--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
On 06/02/15 21:07, Udo Lembke wrote:
Am 06.02.2015 09:06, schrieb Hector Martin:
On 02/02/15 03:38, Udo Lembke wrote:
With 3 hosts only you can't survive a full node failure, because for
that you need
host = k + m.
Sure you can. k=2, m=1 with the failure domain set to host will survive