[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-30 Thread Eugen Block
I created a tracker issue, maybe that will get some attention: https://tracker.ceph.com/issues/61861 Quoting Michel Jouvin: Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, adding the dev mailing list, hopefully someone there can chime in. But apparently the LRC code hasn't been maintained for a few years (https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's see... Quoting Michel Jouvin: Hi Eugen, Thank you very much for these

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Michel Jouvin
Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of how it should work (based on the documentation). Is there any other way than this list to get in contact with the plugin

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, I have a real hardware cluster available for testing now. I'm not sure whether I'm completely misunderstanding how it's supposed to work or if it's a bug in the LRC plugin. This cluster has 18 HDD nodes available across 3 rooms (or DCs); I intend to use 15 nodes to be able to recover if

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-26 Thread Michel Jouvin
Hi, I realize that the crushmap I attached to one of my emails, probably required to understand the discussion here, has been stripped by mailman. To avoid polluting the thread with a long output, I put it at https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-21 Thread Michel Jouvin
Hi Eugen, My LRC pool is also somewhat experimental, so nothing really urgent. If you manage to do some tests that help me understand the problem, I remain interested. I propose to keep this thread for that. I shared my crush map in the email you answered, in case the attachment was not

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-18 Thread Eugen Block
Hi, I don’t have a good explanation for this yet, but I’ll soon get the opportunity to play around with a decommissioned cluster. I’ll try to get a better understanding of the LRC plugin, but it might take some time, especially since my vacation is coming up. :-) I have some thoughts about

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-17 Thread Curt
Hi, I've been following this thread with interest as it seems like a unique use case to expand my knowledge. I don't use LRC or anything outside basic erasure coding. What are your current crush rule steps? I know you made changes since your first post, and I had some thoughts I wanted to share,

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-16 Thread Michel Jouvin
Hi Eugen, Yes, sure, no problem sharing it. I attach it to this email (as it may clutter the discussion if inline). If somebody on the list has some clue about the LRC plugin, I'm still interested in understanding what I'm doing wrong! Cheers, Michel On 04/05/2023 at 15:07, Eugen Block wrote:

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Frank Schilder
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread, or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Quoting Michel Jouvin: > Hi, >

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Eugen Block
Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread, or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Quoting Michel Jouvin: Hi, I had to restart one of my OSD servers today and the problem showed up

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Michel Jouvin
Hi, I had to restart one of my OSD servers today and the problem showed up again. This time I managed to capture "ceph health detail" output showing the problem with the 2 PGs: [WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down     pg 56.1 is down, acting
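For a PG stuck down like this, querying the PG directly usually gives more detail than the health summary. A minimal sketch, reusing the pg id from the output above (field names vary somewhat between releases):

    ceph pg 56.1 query | less
    # under "recovery_state", entries such as "down_osds_we_would_probe" or
    # "blocked_by" show which OSDs/chunks peering is still missing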

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-03 Thread Eugen Block
I think I got it wrong with the locality setting. I'm still limited by the number of hosts I have available in my test cluster, but as far as I got with failure-domain=osd, I believe k=6, m=3, l=3 with locality=datacenter could fit your requirement, at least with regard to the recovery
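For reference, a profile along those lines could be created as sketched below; the profile and pool names are placeholders and the parameters simply mirror the description above, not a verified configuration:

    ceph osd erasure-code-profile set lrc-k6m3l3 \
        plugin=lrc k=6 m=3 l=3 \
        crush-locality=datacenter \
        crush-failure-domain=osd
    ceph osd pool create lrc-test 32 32 erasure lrc-k6m3l3

With l=3 the plugin adds (k+m)/l = 3 local parity chunks, so such a pool places 12 chunks in total, i.e. 4 per datacenter.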

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-02 Thread Eugen Block
Hi, disclaimer: I haven't used LRC in a real setup yet, so there might be some misunderstandings on my side. But I tried to play around with one of my test clusters (Nautilus). Because I'm limited in the number of hosts (6 across 3 virtual DCs) I tried two different profiles with lower
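As a rough yardstick for what fits into 6 hosts, assuming the total chunk count of the LRC plugin is k + m plus (k+m)/l local parity chunks:

    k=4, m=2, l=3  ->  4 + 2 + 2 = 8 chunks  (the documentation example, too big for 6 hosts with a host failure domain)
    k=2, m=2, l=2  ->  2 + 2 + 2 = 6 chunks  (just fits 6 hosts)

which is presumably why smaller profiles, or failure-domain=osd as in the follow-up above, are needed on such a small test cluster.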

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Michel Jouvin
Hi, No... our current setup is 3 datacenters with the same configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus a total of 12 OSD servers. As with the LRC plugin k+m must be a multiple of l, I found that k=9, m=6, l=5 with crush-locality=datacenter was achieving my goal of
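Spelled out as commands, such a profile might look like the sketch below. The profile name is a placeholder, and crush-failure-domain=osd reflects the workaround described further down the thread rather than something stated in this message:

    ceph osd erasure-code-profile set lrc-k9m6l5 \
        plugin=lrc k=9 m=6 l=5 \
        crush-locality=datacenter \
        crush-failure-domain=osd
    ceph osd erasure-code-profile get lrc-k9m6l5

With l=5 the plugin adds (9+6)/5 = 3 local parity chunks, i.e. 18 chunks in total or 6 per datacenter; with only 4 OSD servers per datacenter, those 6 chunks cannot all land on distinct hosts, so a host failure domain would not be satisfiable.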

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Curt
Hello, What is your current setup, 1 server per data center with 12 OSDs each? What is your current crush rule and LRC crush rule? On Fri, Apr 28, 2023, 12:29 Michel Jouvin wrote: > Hi, > > I think I found a possible cause of my PG down but still don't understand why. > As explained in a previous

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-28 Thread Michel Jouvin
Hi, I think I found a possible cause of my PG down but still don't understand why. As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9, m=6) but I have only 12 OSD servers in the cluster. To work around the problem I defined the failure domain as 'osd', with the reasoning that as I
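For context, the non-LRC profile being described would be something like the following sketch (profile/pool names and pg counts are made up):

    ceph osd erasure-code-profile set ec-k9m6 \
        plugin=jerasure k=9 m=6 \
        crush-failure-domain=osd
    ceph osd pool create ec-k9m6-pool 128 128 erasure ec-k9m6

With failure-domain=osd nothing prevents CRUSH from placing several of the 15 chunks on OSDs of the same host, which is probably why restarting a whole OSD server can take out enough chunks of a PG to bring it down.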

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-24 Thread Michel Jouvin
Hi, I'm still interested in getting feedback from those using the LRC plugin about the right way to configure it... Last week I upgraded from Pacific to Quincy (17.2.6) with cephadm, which does the upgrade host by host, checking whether an OSD is ok to stop before actually upgrading it. I had
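That "ok to stop" check can also be run by hand, which may help when trying to reproduce what the orchestrator saw (the OSD id below is an arbitrary example):

    # reports whether stopping this OSD would leave PGs without enough replicas/chunks
    ceph osd ok-to-stop 12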

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-06 Thread Michel Jouvin
Hi, Is somebody using the LRC plugin? I came to the conclusion that LRC k=9, m=3, l=4 is not the same as jerasure k=9, m=6 in terms of protection against failures, and that I should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9, m=6. The example in the documentation (k=4, m=2,
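For what it's worth, the chunk arithmetic behind that conclusion, assuming the local parity chunks only speed up recovery of a single chunk within their group and do not increase the number of arbitrary simultaneous losses tolerated:

    jerasure k=9, m=6       -> 15 chunks, survives any 6 lost chunks
    lrc      k=9, m=3, l=4  -> 12 + 12/4 = 15 chunks, but only m=3 arbitrary losses
    lrc      k=9, m=6, l=5  -> 15 + 15/5 = 18 chunks, m=6 arbitrary losses again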

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-04 Thread Michel Jouvin
Answering my own question, I found the reason for 2147483647: it's documented as a failure to find enough OSDs (missing OSDs). And it is expected, as I selected different hosts for the 15 OSDs but I have only 12 hosts! I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4 configuration
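One way to check that kind of mapping problem offline is to run the crush rule through crushtool with the number of chunks the pool needs; the file name and rule id below are placeholders:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 2 --num-rep 15 --show-bad-mappings

Mappings that come back with fewer than 15 OSDs correspond to the 2147483647 (ITEM_NONE) entries seen in the up/acting sets.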