I created a tracker issue, maybe that will get some attention:
https://tracker.ceph.com/issues/61861
Hi,
adding the dev mailing list, hopefully someone there can chime in. But
apparently the LRC code hasn't been maintained for a few years
(https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's
see...
Hi Eugen,
Thank you very much for these detailed tests that match what I observed
and reported earlier. I'm happy to see that we have the same
understanding of how it should work (based on the documentation). Is
there any other way than this list to get in contact with the plugin
Hi, I have a real hardware cluster for testing available now. I'm not
sure whether I'm completely misunderstanding how it's supposed to work
or if it's a bug in the LRC plugin.
This cluster has 18 HDD nodes available across 3 rooms (or DCs); I
intend to use 15 nodes to be able to recover if
Hi,
I realize that the crush map I attached to one of my emails, probably
required to understand the discussion here, has been stripped by
mailman. To avoid polluting the thread with a long output, I put it at
https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are
Hi Eugen,
My LRC pool is also somewhat experimental, so nothing really urgent. If you
manage to do some tests that help me understand the problem, I remain
interested. I propose to keep this thread for that.
I shared my crush map in the email you answered, if the attachment was
not
Hi, I don’t have a good explanation for this yet, but I’ll soon get
the opportunity to play around with a decommissioned cluster. I’ll try
to get a better understanding of the LRC plugin, but it might take
some time, especially since my vacation is coming up. :-)
I have some thoughts about
Hi,
I've been following this thread with interest as it seems like a unique use
case to expand my knowledge. I don't use LRC or anything outside basic
erasure coding.
What are the steps in your current crush rule? I know you made changes
since your first post, and I had some thoughts I wanted to share,
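For anyone following along, the rule a pool uses can be dumped like this
(just a sketch; "lrcpool" is a placeholder for the actual pool name, and
the second command takes the rule name returned by the first):
  # find out which crush rule the pool uses
  ceph osd pool get lrcpool crush_rule
  # dump the steps of that rule
  ceph osd crush rule dump <rule_name>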
Hi Eugen,
Yes, sure, no problem to share it. I attach it to this email (as it might
clutter the discussion if included inline).
If somebody on the list has some clue about the LRC plugin, I'm still
interested in understanding what I'm doing wrong!
Cheers,
Michel
On 04/05/2023 at 15:07, Eugen Block wrote:
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Hi,
I had to restart one of my OSD servers today and the problem showed up
again. This time I managed to capture "ceph health detail" output
showing the problem with the 2 PGs:
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
pg 56.1 is down, acting
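As a side note for anyone hitting something similar, the state and acting
set of a down PG can be inspected with something like this (sketch, using
the PG id from above):
  ceph pg 56.1 query           # up/acting sets and the peering/recovery state
  ceph pg dump_stuck inactive  # list all inactive PGs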
I think I got it wrong with the locality setting. I'm still limited by
the number of hosts I have available in my test cluster, but as far as
I got with failure-domain=osd I believe k=6, m=3, l=3 with
locality=datacenter could fit your requirement, at least with regards
to the recovery
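In case it helps to reproduce, a profile along those lines could be created
roughly like this (untested sketch, the profile and pool names are made up):
  ceph osd erasure-code-profile set lrc633 \
       plugin=lrc k=6 m=3 l=3 \
       crush-locality=datacenter \
       crush-failure-domain=osd
  ceph osd pool create lrctestpool 32 32 erasure lrc633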
Hi,
disclaimer: I haven't used LRC in a real setup yet, so there might be
some misunderstandings on my side. But I tried to play around with one
of my test clusters (Nautilus). Because I'm limited in the number of
hosts (6 across 3 virtual DCs) I tried two different profiles with
lower
Hi,
No... our current setup is 3 datacenters with the same configuration,
i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus a total of 12
OSD servers. As with the LRC plugin k+m must be a multiple of l, I found
that k=9/m=6/l=5 with crush-locality=datacenter was achieving my goal
of
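For reference: if I read the plugin docs correctly, this satisfies the
constraint because k+m = 15 = 3 x 5, and each group of l=5 chunks gets one
additional local parity chunk, i.e. 15 + 3 = 18 chunks in total, 6 per
datacenter. Creating such a profile would look roughly like this (sketch,
the profile name is a placeholder):
  ceph osd erasure-code-profile set lrc965 \
       plugin=lrc k=9 m=6 l=5 \
       crush-locality=datacenter \
       crush-failure-domain=host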
Hello,
What is your current setup, 1 server per data center with 12 OSDs each?
What is your current crush rule and LRC crush rule?
On Fri, Apr 28, 2023, 12:29 Michel Jouvin wrote:
Hi,
I think I found a possible cause of my PG down but still don't understand
why. As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
m=6) but I have only 12 OSD servers in the cluster. To work around the
problem I defined the failure domain as 'osd' with the reasoning that as
I
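For reference, that workaround corresponds to something like the following
(sketch with a plain jerasure profile, since the 15-chunk pool above is
k=9/m=6 without locality; the profile name is made up):
  ceph osd erasure-code-profile set ec96osd \
       plugin=jerasure k=9 m=6 \
       crush-failure-domain=osd
  # with failure domain 'osd', several of the 15 chunks may end up on the same host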
Hi,
I'm still interested in getting feedback from those using the LRC
plugin about the right way to configure it... Last week I upgraded from
Pacific to Quincy (17.2.6) with cephadm, which does the upgrade host
by host, checking if an OSD is OK to stop before actually upgrading it.
I had
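As a side note, the check cephadm performs before stopping an OSD can also
be run manually, roughly like this (the OSD id is just an example):
  ceph osd ok-to-stop 12
  # reports whether stopping osd.12 would leave any PG without enough chunks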
Hi,
Is somebody using the LRC plugin?
I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
jerasure k=9, m=6 in terms of protection against failures and that I
should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9,
m=6. The example in the documentation (k=4, m=2,
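For reference, the documentation example mentioned here is created with
something like this (taken from the Ceph LRC plugin docs; the pg numbers
are illustrative):
  ceph osd erasure-code-profile set LRCprofile \
       plugin=lrc \
       k=4 m=2 l=3 \
       crush-failure-domain=host
  ceph osd pool create lrcpool 12 12 erasure LRCprofile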
Answering to myself, I found the reason for 2147483647: it's documented
as a failure to find enough OSDs (missing OSDs). And it is normal, as I
selected different hosts for the 15 OSDs but I have only 12 hosts!
I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4
configuration
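To illustrate (a sketch, the output in the comments is made up): 2147483647
is 2^31 - 1, the placeholder CRUSH inserts when it cannot map a slot to any
OSD, so with 15 chunks but only 12 distinct hosts, 3 slots stay unmapped:
  ceph pg map 56.1
  # up/acting would contain entries like ... 2147483647, 2147483647, 2147483647
  # (one per chunk that CRUSH could not place on a separate host)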