Hi Frens Jan,

I agree that it is better to use the regular RAFT replication.  How big is
the object size you are expecting?  Ozone uses a 4-MB chunk size for
creating large objects in megabytes or gigabytes.

Tsz-Wo


On Thu, Jan 30, 2025 at 10:53 AM Frens Jan Rumph <[email protected]>
wrote:

> Dear Tsz-Wo,
>
> Thank you for getting back to me on this!
>
> I think that I understand. For a system like Ozone this makes sense; as
> the data written can be retrieved from another node by reading it from the
> object/file stored. I’m not sure how that would work for data that’s
> overwritten, but that’s for another day. For a database, something like
> this is probably not so easy as the contents of the stream aren’t reflected
> as-is in the state machine and there is no (easy) way to identify the side
> effects the mutation has caused.
>
> I’ll stick to using regular replication for now and devise some chunking
> strategy. Perhaps later for something like bulk-loading I’ll take another
> look at streaming.
>
> Best regards,
> Frens Jan
>
>
>
> On 30 Jan 2025, at 18:38, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>
> Hi Fens,
>
> Thanks a lot for trying Raits and the Streaming feature!
>
> > ... But what about when a node is added to a group? ...
>
> An existing stream does not support adding a new node dynamically.  The
> streams created afterward will be able to write to the new node.
>
> > ...  I reckon that replication/recovery of such stream needs to be done
> ‘outside’ of Ratis; is that correct? ...
>
> You are right that Ratis does not replicate stream data outside the
> original stream.  It currently assumes that all data is already replicated
> before the "link" transaction.
>
> In both cases, we need a missing feature for replicating stream data outside
> the original stream.  It should be done when the link transaction happens
> -- if the data is missing, read it from another node.  Without such a
> feature, a workaround is to use snapshot -- when the stream data is
> missing, trigger a snapshot.
>
> Please feel free to let us know if you have more questions.
>
> Tsz-Wo
>
>
> On Tue, Jan 28, 2025 at 1:00 PM Frens Jan Rumph <[email protected]>
> wrote:
>
>> Dear Ratis devs and users,
>>
>> I’m investigating use of Ratis for HA of RDF4J. In particular I’m
>> wondering about implementation patterns/advice on the stream feature. I’ve
>> read e.g.,
>> https://blog.cloudera.com/ozone-write-pipeline-v2-with-ratis-streaming/ and
>> I’ve got a small prototype working. But as the javadoc of
>> org.apache.ratis.statemachine.StateMachine.DataApi#link indicates, the
>> stream _may_ not be available.
>>
>> I understand there might be error cases that need to be handled. But what
>> about when a node is added to a group? In my investigation so far it seems
>> that also in that case the stream is unavailable. I reckon that
>> replication/recovery of such stream needs to be done ‘outside’ of Ratis; is
>> that correct? I didn’t see such facilities in the file store example:
>> https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/filestore/FileStoreStateMachine.java.
>> I’ve also tried to figure out how Ozone implements this, but this codebase
>> is a bit to big to easily wrap my head around how this would work.
>>
>> I would appreciate any pointers with regards to this matter.
>>
>> Thanks!
>> Frens Jan
>>
>>
>>                Award-winning OSINT partner for Law Enforcement and
>> Defence.
>>
>>
>>
>> *Frens Jan Rumph*
>> Data platform engineering lead
>>
>> phone:
>> site:
>>
>> pgp: +31 50 21 11 622
>> web-iq.com
>>
>> CEE2 A4F1 972E 78C0 F816
>> 86BB D096 18E2 3AC0 16E0
>> The content of this email is confidential and intended for the
>> recipient(s) specified in this message only. It is strictly forbidden to
>> share any part of this message with any third party, without a written
>> consent of the sender. If you received this message by mistake, please
>> reply to this message and follow with its deletion, so that we can ensure
>> such a mistake does not occur in the future.
>>
>>
>>
>

Reply via email to