Manybubbles added a comment.
This isn't really a scale-out thing, but I'm putting it here anyway: one
option for installation is to keep all the BlazeGraph servers independently up to
date using the same mechanism that we'd use to sync data to a cluster. In this
case we don't need HA at all -
Thompsonbry.systap added a comment.
This depends on how you model the reified RDF data. However, the inlined
statements about statements are not in the same part of the statement indices
as the ground statements. This is because the IVs all have a prefix byte that
includes whether the IV is
Thompsonbry.systap added a comment.
BlazeGraph supports arbitrary nesting of statements on statements, so, yes,
that would be fine.
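
To illustrate what nesting statements on statements means, here is a minimal sketch. It is not BlazeGraph's API or storage format: the `Statement` class, the `depth` helper, and every `ex:` identifier are hypothetical names made up for the example. A statement is modeled as a triple whose subject or object may itself be a statement:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """Hypothetical triple whose subject or object may be another Statement."""
    subject: "Term"
    predicate: str
    obj: "Term"


# A term is either a plain identifier or a whole statement.
Term = Union[str, Statement]


def depth(term: Term) -> int:
    """Nesting depth: 0 for a plain term, 1 for a ground statement, and so on."""
    if isinstance(term, Statement):
        return 1 + max(depth(term.subject), depth(term.obj))
    return 0


# Illustrative identifiers only; none of these come from the actual data model.
ground = Statement("ex:entity", "ex:property", "ex:value")
sourced = Statement(ground, "ex:statedIn", "ex:reference")  # statement about a statement
checked = Statement(sourced, "ex:checkedBy", "ex:editor")   # one level deeper

print(depth(ground), depth(sourced), depth(checked))
```

Nothing in this structure limits how deep the nesting goes, which is the property being described above.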
TASK DETAIL
https://phabricator.wikimedia.org/T90117
Smalyshev added a subscriber: Smalyshev.
Smalyshev added a comment.
@Thompsonbry.systap currently our reification model is pretty close to the one
described here: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf except
that we also have a direct link between entity and value in addition to
Manybubbles added a comment.
One option might be to keep the truthy dump on nice SSDs and make sure those
are super duper fast but to support the more reified forms only on other
machines. Maybe those machines use spinning disks and allow more time between
writes?
Jdouglas added a subscriber: Jdouglas.
Jdouglas added a comment.
> BlazeGraph doesn't support clustering - only high availability
Does this refer to update scaling, or just query scaling? Blazegraph supports
replication clustering, which allows horizontal/linear //query// scaling.
Is
Manybubbles added a comment.
Already asked to get statistics on current usage but I imagine we can add
new nodes if we need to. I think we'll have to see what load is like once
we get it for wikigrok and see what it's like to run in labs. For more data,
scaling up looks to be the thing. I think