Dear Tsz-Wo,

Thank you for getting back to me on this!

I think I understand. For a system like Ozone this makes sense, as the 
data written can be retrieved from another node by reading it from the 
stored object/file. I’m not sure how that would work for data that’s 
overwritten, but that’s for another day. For a database, something like this is 
probably not so easy: the contents of the stream aren’t reflected as-is in 
the state machine, and there is no (easy) way to identify the side effects a 
mutation has caused.

I’ll stick to using regular replication for now and devise some chunking 
strategy. Perhaps later for something like bulk-loading I’ll take another look 
at streaming.
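
A minimal sketch of what such a chunking strategy could look like, assuming 
each chunk is submitted as its own regular (log-replicated) request and the 
state machine buffers chunks until the last one arrives. Nothing below is a 
Ratis API: Chunker, Chunk, split and reassemble are hypothetical names used 
for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Ratis: splits a payload into ordered chunks. */
final class Chunker {

  /** One piece of a larger write; intended to travel as one regular request. */
  static final class Chunk {
    final String transferId; // groups the chunks of one logical write
    final int index;         // position of this chunk within the transfer
    final boolean last;      // true only for the final chunk
    final byte[] data;

    Chunk(String transferId, int index, boolean last, byte[] data) {
      this.transferId = transferId;
      this.index = index;
      this.last = last;
      this.data = data;
    }
  }

  /** Splits payload into chunks of at most maxChunkSize bytes (always at least one chunk). */
  static List<Chunk> split(String transferId, byte[] payload, int maxChunkSize) {
    if (maxChunkSize <= 0) {
      throw new IllegalArgumentException("maxChunkSize must be positive");
    }
    List<Chunk> chunks = new ArrayList<>();
    int offset = 0;
    int index = 0;
    do {
      int len = Math.min(maxChunkSize, payload.length - offset);
      byte[] data = Arrays.copyOfRange(payload, offset, offset + len);
      offset += len;
      chunks.add(new Chunk(transferId, index++, offset == payload.length, data));
    } while (offset < payload.length);
    return chunks;
  }

  /** Rebuilds the payload; rejects gaps, reordering, or a missing final chunk. */
  static byte[] reassemble(List<Chunk> chunks) {
    if (chunks.isEmpty() || !chunks.get(chunks.size() - 1).last) {
      throw new IllegalStateException("incomplete transfer");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int expected = 0;
    for (Chunk c : chunks) {
      if (c.index != expected++) {
        throw new IllegalStateException("missing or reordered chunk");
      }
      out.write(c.data, 0, c.data.length);
    }
    return out.toByteArray();
  }
}
```

Because every chunk would go through the ordinary Raft log in this sketch, a 
node that joins the group later should receive the same chunks (or a snapshot 
that already reflects them), which is the property the stream data lacks.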

Best regards,
Frens Jan



> On 30 Jan 2025, at 18:38, Tsz-Wo Nicholas Sze <[email protected]> wrote:
> 
> Hi Frens,
> 
> Thanks a lot for trying Ratis and the Streaming feature!
> 
> > ... But what about when a node is added to a group? ...
> 
> An existing stream does not support adding a new node dynamically.  The 
> streams created afterward will be able to write to the new node.
> 
> > ...  I reckon that replication/recovery of such stream needs to be done 
> > ‘outside’ of Ratis; is that correct? ...
> 
> You are right that Ratis does not replicate stream data outside the original 
> stream.  It currently assumes that all data is already replicated before the 
> "link" transaction.
> 
> In both cases, we need a missing feature for replicating stream data outside 
> the original stream.  It should be done when the link transaction happens -- 
> if the data is missing, read it from another node.  Without such a feature, a 
> workaround is to use snapshot -- when the stream data is missing, trigger a 
> snapshot.
> 
> Please feel free to let us know if you have more questions.
> 
> Tsz-Wo
> 
> 
> On Tue, Jan 28, 2025 at 1:00 PM Frens Jan Rumph <[email protected]> wrote:
>> Dear Ratis devs and users,
>> 
>> I’m investigating use of Ratis for HA of RDF4J. In particular I’m wondering 
>> about implementation patterns/advice on the stream feature. I’ve read e.g., 
>> https://blog.cloudera.com/ozone-write-pipeline-v2-with-ratis-streaming/ and 
>> I’ve got a small prototype working. But as the javadoc of 
>> org.apache.ratis.statemachine.StateMachine.DataApi#link indicates, the 
>> stream _may_ not be available.
>> 
>> I understand there might be error cases that need to be handled. But what 
>> about when a node is added to a group? In my investigation so far it seems 
>> that also in that case the stream is unavailable. I reckon that 
>> replication/recovery of such stream needs to be done ‘outside’ of Ratis; is 
>> that correct? I didn’t see such facilities in the file store example: 
>> https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/filestore/FileStoreStateMachine.java.
>> I’ve also tried to figure out how Ozone implements this, but this codebase 
>> is a bit too big to easily wrap my head around how this would work.
>> 
>> I would appreciate any pointers with regard to this matter.
>> 
>> Thanks!
>> Frens Jan
>> 
