Hi Frens Jan,

I agree that it is better to use regular Raft replication. How large are the objects you are expecting? Ozone uses a 4 MB chunk size when writing large objects that are megabytes or gigabytes in size.
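
As a rough illustration of the chunking idea, here is a minimal sketch in plain Java. It is not Ratis or Ozone code: the `Chunker` class, the `Chunk` fields (`objectId`, `index`, `last`) and the `split`/`join` helpers are hypothetical names made up for this example, and the 4 MB constant simply mirrors the Ozone chunk size mentioned above. The assumption is that each chunk would be sent as its own regular Raft write and applied by the state machine in index order.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Chunker {
    // 4 MB, mirroring the Ozone chunk size; tune for your own workload.
    static final int DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024;

    /** One piece of a larger object (hypothetical message shape, not a Ratis type). */
    static final class Chunk {
        final long objectId;   // identifies the object the chunk belongs to
        final int index;       // position of the chunk within the object
        final boolean last;    // true for the final chunk of the object
        final byte[] data;

        Chunk(long objectId, int index, boolean last, byte[] data) {
            this.objectId = objectId;
            this.index = index;
            this.last = last;
            this.data = data;
        }
    }

    /** Splits payload into chunks of at most chunkSize bytes; an empty payload yields one empty chunk. */
    static List<Chunk> split(long objectId, byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int offset = 0;
        int index = 0;
        do {
            int end = Math.min(payload.length, offset + chunkSize);
            boolean last = end == payload.length;
            chunks.add(new Chunk(objectId, index++, last, Arrays.copyOfRange(payload, offset, end)));
            offset = end;
        } while (offset < payload.length);
        return chunks;
    }

    /** Reassembles chunks that are already sorted by index, e.g. on the state machine side. */
    static byte[] join(List<Chunk> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : chunks) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }
}
```

With this shape, `Chunker.split(id, bytes, Chunker.DEFAULT_CHUNK_SIZE)` produces the messages to submit one by one, and the `last` flag tells the state machine when the object is complete.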
Tsz-Wo

On Thu, Jan 30, 2025 at 10:53 AM Frens Jan Rumph <[email protected]> wrote:

> Dear Tsz-Wo,
>
> Thank you for getting back to me on this!
>
> I think that I understand. For a system like Ozone this makes sense, as
> the data written can be retrieved from another node by reading it from the
> object/file stored. I’m not sure how that would work for data that’s
> overwritten, but that’s for another day. For a database, something like
> this is probably not so easy, as the contents of the stream aren’t reflected
> as-is in the state machine and there is no (easy) way to identify the side
> effects the mutation has caused.
>
> I’ll stick to using regular replication for now and devise some chunking
> strategy. Perhaps later for something like bulk-loading I’ll take another
> look at streaming.
>
> Best regards,
> Frens Jan
>
>
> On 30 Jan 2025, at 18:38, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>
> Hi Frens,
>
> Thanks a lot for trying Ratis and the Streaming feature!
>
> > ... But what about when a node is added to a group? ...
>
> An existing stream does not support adding a new node dynamically. The
> streams created afterward will be able to write to the new node.
>
> > ... I reckon that replication/recovery of such stream needs to be done
> > ‘outside’ of Ratis; is that correct? ...
>
> You are right that Ratis does not replicate stream data outside the
> original stream. It currently assumes that all data is already replicated
> before the "link" transaction.
>
> In both cases, we need a missing feature for replicating stream data outside
> the original stream. It should be done when the link transaction happens:
> if the data is missing, read it from another node. Without such a
> feature, a workaround is to use a snapshot: when the stream data is
> missing, trigger a snapshot.
>
> Please feel free to let us know if you have more questions.
>
> Tsz-Wo
>
>
> On Tue, Jan 28, 2025 at 1:00 PM Frens Jan Rumph <[email protected]>
> wrote:
>
>> Dear Ratis devs and users,
>>
>> I’m investigating use of Ratis for HA of RDF4J. In particular I’m
>> wondering about implementation patterns/advice on the stream feature. I’ve
>> read e.g.,
>> https://blog.cloudera.com/ozone-write-pipeline-v2-with-ratis-streaming/ and
>> I’ve got a small prototype working. But as the javadoc of
>> org.apache.ratis.statemachine.StateMachine.DataApi#link indicates, the
>> stream _may_ not be available.
>>
>> I understand there might be error cases that need to be handled. But what
>> about when a node is added to a group? In my investigation so far it seems
>> that also in that case the stream is unavailable. I reckon that
>> replication/recovery of such stream needs to be done ‘outside’ of Ratis; is
>> that correct? I didn’t see such facilities in the file store example:
>> https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/filestore/FileStoreStateMachine.java.
>> I’ve also tried to figure out how Ozone implements this, but this codebase
>> is a bit too big to easily wrap my head around how this would work.
>>
>> I would appreciate any pointers with regard to this matter.
>>
>> Thanks!
>> Frens Jan
