Manybubbles added a comment.

In https://phabricator.wikimedia.org/T90119#1065623, @Thompsonbry.systap wrote:

> The RDR inlining of reified statement models is handled by the 
> StatementBuffer class.   It is important to have a limited lexical scope in 
> the dump for the different RDF triples involved in the reified statement 
> model.  The code needs to buffer incomplete statement models until they 
> become complete statement models, at which point it can release the storage 
> associated with the partial model and write it out.  Also, if your output 
> includes a lot of blank nodes, it is a Good Idea to have limited resolution 
> scope for blank nodes since the parser must maintain them across the entire 
> document. Thus, outputting an RDF dump as a series of files can reduce the 
> parser overhead.


Are blank nodes required for the RDR inlining?  Is there any way in Turtle or 
N-Triples to allow blank nodes to go out of scope?  I ask because we'll 
certainly be outputting the dump as a single large document; that is how our 
dumps work, and fighting against that would be difficult.  We can create a tool 
to slice it into smaller files if there isn't a standard way to control scope.

I should note that this buffering requirement removes one of the nicest parts 
about N-Triples: you can no longer just slice the file at any newline to 
generate batches.  It's context-sensitive again.
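
For illustration, a slicing tool along those lines could look roughly like the 
sketch below.  It is only a sketch in Python under stated assumptions, not 
anything taken from the Wikidata dump code or from StatementBuffer: the name 
slice_ntriples, the batch_size parameter and the simplified blank node pattern 
are all hypothetical, and it only tracks blank node labels.  Whatever else ties 
a reified statement model together would need the same treatment.  It makes two 
passes: the first records the last line on which each blank node label occurs, 
the second cuts a batch only when no label seen so far is still needed later.

```python
import re
from typing import Dict, Iterator, List

# Simplified pattern for N-Triples blank node labels such as _:b0.
# Assumption: a false match inside a literal only delays a cut, so it is safe.
BNODE = re.compile(r"_:[A-Za-z0-9_]+")


def slice_ntriples(lines: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield batches of at least batch_size lines (the final batch may be
    shorter) without splitting the lines that share a blank node label."""
    # Pass 1: remember the last line index on which each label appears.
    last_seen: Dict[str, int] = {}
    for i, line in enumerate(lines):
        for label in BNODE.findall(line):
            last_seen[label] = i

    # Pass 2: cut only when every label seen so far is already "closed".
    batch: List[str] = []
    open_until = -1  # index of the last line that still needs an open label
    for i, line in enumerate(lines):
        batch.append(line)
        for label in BNODE.findall(line):
            open_until = max(open_until, last_seen[label])
        if len(batch) >= batch_size and open_until <= i:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [
        "<s1> <p> <o1> .",
        "<s2> <p> _:b0 .",
        '_:b0 <q> "x" .',
        "<s3> <p> <o3> .",
        "<s4> <p> <o4> .",
    ]
    for n, chunk in enumerate(slice_ntriples(sample, batch_size=2)):
        print(n, len(chunk))
```

If blank nodes are reused across the whole dump, the batches degrade to one 
huge batch, which is the same limited-scope concern raised in the quoted 
comment.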


TASK DETAIL
  https://phabricator.wikimedia.org/T90119

To: Manybubbles
Cc: Thompsonbry.systap, Smalyshev, Manybubbles, Aklapper, Haasepeter, 
Beebs.systap, daniel, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, 
JanZerebecki



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
