Dear Frens Jan,

> ... in principle terabytes for bulk loading RDF statements.  ...

This case definitely needs more attention since the original Raft design is
not suitable for data intensive applications.  The problems include
1) Write amplification -- data will be written to both the Raft log and
also the state machine storage.
2) A large number of transactions -- the terabytes have to be broken down
into many small megabytes-sized chunks.

For (1), Ratis has a feature to separate metadata and state machine data so
only metadata is written to Raft log and the state machine will manage
the state
machine data.

For (2), Ratis Steaming is designed to solve that problem.  However, as
mentioned previously, we do have a missing feature to replicate stream data
outside the original stream.

You may consider using Steaming only when the data size is in terabytes.
When there is a failure or a Raft group membership change, use snapshot as
a workaround.  The terabytes-sized data still has to be broken down to
tens/hundreds gigabytes chunks.

Hope it helps.

Tsz-Wo



On Thu, Jan 30, 2025 at 11:51 PM Frens Jan Rumph <[email protected]>
wrote:

> Dear Tsz-Wo,
>
> The “objects” could range from mere bytes for management commands to in
> principle terabytes for bulk loading RDF statements. So some type of
> chunking would be necessary. RDF statements themselves aren’t typically
> that large. So that should be doable. Or I could possibly treat them as
> opaque binary uploads and only parse them on commit.
>
> Frens Jan
>
>
> On 30 Jan 2025, at 22:13, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>
> Hi Frens Jan,
>
> I agree that it is better to use the regular RAFT replication.  How big
> is the object size you are expecting?  Ozone uses a 4-MB chunk size for
> creating large objects in megabytes or gigabytes.
>
> Tsz-Wo
>
>
> On Thu, Jan 30, 2025 at 10:53 AM Frens Jan Rumph <[email protected]>
> wrote:
>
>> Dear Tsz-Wo,
>>
>> Thank you for getting back to me on this!
>>
>> I think that I understand. For a system like Ozone this makes sense; as
>> the data written can be retrieved from another node by reading it from the
>> object/file stored. I’m not sure how that would work for data that’s
>> overwritten, but that’s for another day. For a database, something like
>> this is probably not so easy as the contents of the stream aren’t reflected
>> as-is in the state machine and there is no (easy) way to identify the side
>> effects the mutation has caused.
>>
>> I’ll stick to using regular replication for now and devise some chunking
>> strategy. Perhaps later for something like bulk-loading I’ll take another
>> look at streaming.
>>
>> Best regards,
>> Frens Jan
>>
>>
>>
>> On 30 Jan 2025, at 18:38, Tsz-Wo Nicholas Sze <[email protected]> wrote:
>>
>> Hi Fens,
>>
>> Thanks a lot for trying Raits and the Streaming feature!
>>
>> > ... But what about when a node is added to a group? ...
>>
>> An existing stream does not support adding a new node dynamically.  The
>> streams created afterward will be able to write to the new node.
>>
>> > ...  I reckon that replication/recovery of such stream needs to be
>> done ‘outside’ of Ratis; is that correct? ...
>>
>> You are right that Ratis does not replicate stream data outside the
>> original stream.  It currently assumes that all data is already replicated
>> before the "link" transaction.
>>
>> In both cases, we need a missing feature for replicating stream data outside
>> the original stream.  It should be done when the link transaction happens
>> -- if the data is missing, read it from another node.  Without such a
>> feature, a workaround is to use snapshot -- when the stream data is
>> missing, trigger a snapshot.
>>
>> Please feel free to let us know if you have more questions.
>>
>> Tsz-Wo
>>
>>
>> On Tue, Jan 28, 2025 at 1:00 PM Frens Jan Rumph <[email protected]>
>> wrote:
>>
>>> Dear Ratis devs and users,
>>>
>>> I’m investigating use of Ratis for HA of RDF4J. In particular I’m
>>> wondering about implementation patterns/advice on the stream feature. I’ve
>>> read e.g.,
>>> https://blog.cloudera.com/ozone-write-pipeline-v2-with-ratis-streaming/ and
>>> I’ve got a small prototype working. But as the javadoc of
>>> org.apache.ratis.statemachine.StateMachine.DataApi#link indicates, the
>>> stream _may_ not be available.
>>>
>>> I understand there might be error cases that need to be handled. But
>>> what about when a node is added to a group? In my investigation so far it
>>> seems that also in that case the stream is unavailable. I reckon that
>>> replication/recovery of such stream needs to be done ‘outside’ of Ratis; is
>>> that correct? I didn’t see such facilities in the file store example:
>>> https://github.com/apache/ratis/blob/master/ratis-examples/src/main/java/org/apache/ratis/examples/filestore/FileStoreStateMachine.java.
>>> I’ve also tried to figure out how Ozone implements this, but this codebase
>>> is a bit to big to easily wrap my head around how this would work.
>>>
>>> I would appreciate any pointers with regards to this matter.
>>>
>>> Thanks!
>>> Frens Jan
>>>
>>>
>>>                Award-winning OSINT partner for Law Enforcement and
>>> Defence.
>>>
>>>
>>>
>>> *Frens Jan Rumph*
>>> Data platform engineering lead
>>>
>>> phone:
>>> site:
>>>
>>> pgp: +31 50 21 11 622
>>> web-iq.com
>>>
>>> CEE2 A4F1 972E 78C0 F816
>>> 86BB D096 18E2 3AC0 16E0
>>> The content of this email is confidential and intended for the
>>> recipient(s) specified in this message only. It is strictly forbidden to
>>> share any part of this message with any third party, without a written
>>> consent of the sender. If you received this message by mistake, please
>>> reply to this message and follow with its deletion, so that we can ensure
>>> such a mistake does not occur in the future.
>>>
>>>
>>>
>>
>

Reply via email to