48.16 50.574872 24.01572
Same here - it should be cached in the BlueStore cache, as it is 16 GB x 84
OSDs... with a 1 GB test file.
Any thoughts - suggestions - insights ?
Jesper
--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovhcloud.com
Exclusive lock on RBD images will kill any (theoretical) performance gains.
Without exclusive lock, you lose some of RBD features.
Plus, using 2+ clients with a single image doesn't sound like a good idea.
or network hardware issues.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
messengers use the same protocol.
snapshots.
Removal of these "lightweight" snapshots would be instant (or near instant).
So what do others think about this?
object and remove only the objects indexed by these metadata. "--prefix" is used
when these metadata are lost or overwritten.
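For illustration, the second case (removal by "--prefix" once the metadata is gone) could look roughly like the sketch below. The pool name and block-name prefix are hypothetical placeholders; a real prefix would come from the (now lost) image header.

```shell
# Hypothetical sketch: remove leftover RBD data objects by their
# block-name prefix after the image metadata is lost.
# POOL and PREFIX are made-up placeholders.
POOL=rbd
PREFIX=rbd_data.1234567890ab
rados -p "$POOL" ls | grep "^${PREFIX}\." |
  while read -r obj; do
    rados -p "$POOL" rm "$obj"
  done
```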
or rbd images are named differently.
was running and how heavy it was.
scrubbing is fine).
> Shutdown all activity to the ceph cluster before that moment?
Depends on whether it's actually possible in your case and what load your
users generate - you have to decide.
if there are other pgs to backfill
and/or recover.
to find something out in the osd logs, but there is nothing about it there.
Any thoughts on how to avoid it?
Have you tried disabling scrub and deep scrub?
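For reference, the standard cluster-wide flags for that (run against a live cluster) are:

```shell
# Disable scheduled scrubbing cluster-wide while investigating,
# then re-enable once done. Leaving scrubbing off long-term is
# not recommended.
ceph osd set noscrub
ceph osd set nodeep-scrub
# ... observe whether the problem goes away ...
ceph osd unset noscrub
ceph osd unset nodeep-scrub
```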
. For us this already proved
useful in the past.
in the "rbd" utility.
So what can I do to make "rbd ls -l" faster, or to get comparable snapshot
hierarchy information some other way?
Can you run this command with the extra argument
"--rbd_concurrent_management_ops=1" and share the timing of that?
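A simple way to capture both timings for comparison (the pool name "rbd" below is just an example):

```shell
# Compare listing time with defaults vs. a single management op in
# flight; the second run serializes the per-image snapshot lookups.
time rbd ls -l rbd
time rbd ls -l rbd --rbd_concurrent_management_ops=1
```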
may want to try the above as well.
expect that non-size counters - like object counts - use base-10 units,
and size counters use base-2 units. Ceph's "standard" of using base-2
everywhere was confusing for me as well initially, but I got used to it...
Still, I wouldn't mind if that got sorted out once and for all.
On 17-12-15 03:58 PM, Sage Weil wrote:
On Fri, 15 Dec 2017, Piotr Dałek wrote:
On 17-12-14 05:31 PM, David Turner wrote:
I've tracked this in a much more manual way. I would grab a random subset
[..]
This was all on a Hammer cluster. The changes to the snap trimming queues
going
once disk space is all used up.
Hopefully it'll be convincing enough for devs. ;)
be helpful in
pushing this into the next Jewel release.
Thanks!
[1] one of our guys hacked a bash one-liner that printed out snap trim queue
lengths for all pgs, but a full run takes over an hour to complete on a
cluster with over 20k pgs...
[2] https://github.com/ceph/ceph/pull/19520
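The one-liner from [1] presumably looked something like the sketch below (assumes jq; whether and in what shape "snap_trimq" appears in pg query output varies by release - which is exactly why [2] proposes a proper counter):

```shell
# Sketch: print the snap trim queue length for every PG, one
# "ceph pg query" call at a time - hence the hour-long runtime on
# ~20k PGs. The JSON shape of "ceph pg ls" differs between releases;
# adjust the jq paths for yours.
for pg in $(ceph pg ls -f json | jq -r '.[].pgid'); do
  len=$(ceph pg "$pg" query -f json | jq '.snap_trimq | length')
  echo "$pg $len"
done
```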
t 3 lowest, or if that's not acceptable then at least set "osd heartbeat
min size" to 0.
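If you go the second route, that is a ceph.conf fragment like the one below (the option name is taken verbatim from the advice above):

```ini
[osd]
# don't pad heartbeat packets to a minimum size
osd heartbeat min size = 0
```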
area once such OSD fails).
a *big* problem with this (we haven't upgraded
to Luminous yet, so we can skip to the next point release and move to
ceph-volume together with Luminous). It's still a problem, though - now we
have more of our infrastructure to migrate and test, meaning even more
delays in production upgrades.
files in a subdir before merging into parent
NOTE: A negative value means to disable subdir merging
"
will a variable definition like "filestore_merge_threshold = -50" (negative
value) work? (In Jewel it worked like a charm.)
Yes, I don't see any changes to that.
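So the same Jewel-style fragment should keep working in ceph.conf:

```ini
[osd]
# a negative value disables subdirectory merging (same semantics as Jewel)
filestore merge threshold = -50
```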
tiple" is not observed for runtime
changes, meaning that the new value will be stored in the osd.0 process
memory, but not used at all.
Do I really need to restart the OSD to make the changes take effect?
ceph version 12.2.1 () luminous (stable)
Yes.
want to *stop* (as in, freeze) a process instead of killing it?
Anyway, with the processes still there, it may take a few minutes before the
cluster realizes that the daemons are stopped and kicks them out of the
cluster, restoring normal behavior (assuming correctly set CRUSH rules).
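For reference, freezing rather than killing is just SIGSTOP/SIGCONT. A minimal illustration with a stand-in process (for a real daemon you would target the ceph-osd pid instead):

```shell
# Demonstrate freezing a process with SIGSTOP and resuming it with
# SIGCONT. "sleep" stands in for a ceph-osd process here.
sleep 60 &
pid=$!
kill -STOP "$pid"
sleep 1                                  # give the kernel a moment
ps -o state= -p "$pid" | tr -d ' '       # "T" = stopped (frozen) on Linux
kill -CONT "$pid"                        # resume where it left off
kill "$pid"                              # clean up the stand-in
```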
g properly refreshed.
I'd love to, but that would require us to restart that client - not an
option. We'll try to reproduce this somehow anyway and let you know if
something interesting shows up.
, that makes things clear.
Seems like we have some Cinders utilizing the Infernalis (9.2.1) librbd. Are
you aware of any bugs in 9.2.x that could cause such behavior? We've seen it
for the first time...
ots but not remove them when the exclusive
lock on the image is taken? (Jewel bug?)
2. Why is the error transformed and then ignored?
).
You may want to look at this:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
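From memory, the test from that post boils down to a queue-depth-1, O_DSYNC 4k write run with fio; replace /dev/sdX with a *disposable* device, since this writes to it directly:

```shell
# Journal-like workload: small sequential writes with O_DSYNC at
# queue depth 1. WARNING: destroys data on the target device.
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
```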
:-)
Since Jewel (AFAIR), when (re)starting OSDs, pg status is reset to "never
contacted", resulting in "pgs are stuck inactive for more than 300 seconds"
being reported until the OSDs regain connections between themselves.
On 17-07-06 09:39 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 3:25 PM, Piotr Dałek <bra...@predictor.org.pl> wrote:
Is that deep copy an equivalent of what
Jewel librbd did at unspecified point of time, or extra one?
It's equivalent / replacement -- not an additional one.
On 17-07-06 04:40 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 10:22 AM, Piotr Dałek <piotr.da...@corp.ovh.com> wrote:
So I really see two problems here: lack of API docs and
backwards-incompatible change in API behavior.
Docs are always in need of update, so any pull requests
memory bus? So I really see two problems here:
lack of API docs and backwards-incompatible change in API behavior.
On 17-07-06 03:03 PM, Jason Dillaman wrote:
On Thu, Jul 6, 2017 at 8:26 AM, Piotr Dałek <piotr.da...@corp.ovh.com> wrote:
Hi,
If you're using "rbd_aio_write()" in your code, be aware of the fact that
before the Luminous release, this function expects the buffer to remain
unchanged until the operation completes.
ary memory allocation and
copy on your side (though it's probably unavoidable with current state of
Luminous).
On 17-06-21 03:24 PM, Sage Weil wrote:
On Wed, 21 Jun 2017, Piotr Dałek wrote:
On 17-06-14 03:44 PM, Sage Weil wrote:
On Wed, 14 Jun 2017, Paweł Sadowski wrote:
On 04/13/2017 04:23 PM, Piotr Dałek wrote:
On 04/06/2017 03:25 PM, Sage Weil wrote:
On Thu, 6 Apr 2017, Piotr Dałek wrote:
[snip]
if that would work for you (as others
wrote), or +1 this PR: https://github.com/ceph/ceph/pull/13723 (it's a bit
outdated as I'm constantly low on time, but I promise to push it forward!).
On 17-06-14 03:44 PM, Sage Weil wrote:
On Wed, 14 Jun 2017, Paweł Sadowski wrote:
On 04/13/2017 04:23 PM, Piotr Dałek wrote:
On 04/06/2017 03:25 PM, Sage Weil wrote:
On Thu, 6 Apr 2017, Piotr Dałek wrote:
[snip]
I think the solution here is to use sparse_read during recovery. The
PushOp
by the sending side. Try gathering some more examples of such crc
errors and isolate the osd/host that sends the malformed data, then do the
usual diagnostics, like a memory test on that machine.
drives, because Ceph is not
optimized for those.
restart of Ceph daemons is still required.
to figure out.
Yes, I understand that. But wouldn't it be faster and/or more convenient to
just recompile the binaries in place (or use network symlinks) instead of
packaging entire Ceph and (re)installing its packages each time you make a
change? Generating RPMs takes a while.
them via
nfs (or whatever) to build machine and build once there.
uldn't be a problem (at least we don't see it anymore).
in ceph -w. I haven't dug into it much, but just
wanted to second that I've seen this happen on a recent hammer to recent
jewel upgrade.
Thanks for confirmation.
We've prepared the patch which fixes the issue for us:
https://github.com/ceph/ceph/pull/13131
On 01/17/2017 12:52 PM, Piotr Dałek wrote:
During our testing we found out that during upgrade from 0.94.9 to 10.2.5
we're hitting issue http://tracker.ceph.com/issues/17386 ("Upgrading 0.94.6
-> 0.94.9 saturating mon node networking"). Apparently, there's a few
commits for both ham
supposed to fix this issue for
upgrades from 0.94.6 to 0.94.9 (and possibly for others), but we're still
seeing it when upgrading to Jewel, and the symptoms are exactly the same -
after upgrading MONs, each not-yet-upgraded OSD takes the full OSDMap from
the monitors after failing the CRC check. Anyone else encountered this?
intermediate data copy, which will reduce CPU and memory load on clients. If
you're using the librados C API for object writes, feel free to comment here
or in the pull request.