Re: Streaming JSON RowSets (JENA-2302)

2022-03-10 Thread Andy Seaborne
This PR is looking good. It'll be great to have a true streaming 
implementation of JSON results for the normal case of a result set that 
has "head" followed by the "results" array.


Andy


Streaming JSON RowSets (JENA-2302)

2022-03-09 Thread Claus Stadler

Dear all,


I would like to draw your attention to an active PR that makes RowSets 
over application/sparql-results+json streaming.



JIRA: https://issues.apache.org/jira/projects/JENA/issues/JENA-2302

PR: https://github.com/apache/jena/pull/1218


As JSON is nowadays the default content type used in Jena for SPARQL 
results, this PR aims to make it easier to work with large SPARQL result 
sets by having streaming work out of the box.


So far, the implementation used by Jena has loaded JSON SPARQL result 
sets into memory in full before processing them.
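
To illustrate the difference between loading a whole result set and 
streaming it, here is a minimal Python sketch. It is not the code from 
the PR and does not use Jena; the function name iter_bindings and the 
chunk size are hypothetical. It assumes the normal case, where a single 
"bindings" array is present, and it locates that array with a plain 
text search, which a real parser would not do.

```python
import io
import json


def iter_bindings(stream, chunk_size=64):
    """Yield one binding at a time from a SPARQL JSON result stream.

    Hypothetical helper: only one binding object needs to be buffered
    at a time, instead of the whole document.
    """
    decoder = json.JSONDecoder()
    buf = ""
    # Phase 1: read until the opening '[' of the "bindings" array is seen.
    while True:
        key = buf.find('"bindings"')
        start = buf.find("[", key) if key != -1 else -1
        if start != -1:
            break
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # no bindings array in the input
        buf += chunk
    pos = start + 1
    # Phase 2: decode the array elements one by one.
    while True:
        # Skip whitespace and the commas between elements.
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf) and buf[pos] == "]":
            return  # end of the bindings array
        try:
            obj, pos = decoder.raw_decode(buf, pos)
        except json.JSONDecodeError:
            # The next object is incomplete: drop consumed text, read more.
            chunk = stream.read(chunk_size)
            if not chunk:
                raise  # input ended in the middle of a binding
            buf = buf[pos:] + chunk
            pos = 0
            continue
        yield obj


example = (
    '{"head": {"vars": ["x"]}, "results": {"bindings": ['
    '{"x": {"type": "literal", "value": "a"}}, '
    '{"x": {"type": "literal", "value": "b"}}, '
    '{"x": {"type": "literal", "value": "c"}}]}}'
)
values = [row["x"]["value"] for row in iter_bindings(io.StringIO(example), 16)]
```

A caller can stop consuming the generator early, in which case the rest 
of the input is never read; json.load, by contrast, has to parse the 
complete document before the first row is available.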



The JSON format itself allows repeated keys (where the last one takes 
precedence), and keys may appear in any order. Both properties mean that 
the same SPARQL result set can be serialized in several different ways, 
and the implementation needs to handle all of them correctly.
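
As a small illustration of these two cases, the following Python sketch 
parses a result document in which "results" comes before "head" and 
"head" appears twice. The document and the helper name record_pairs are 
made up for this example, and the behavior shown is that of Python's 
json module, not of Jena.

```python
import json

# Hypothetical result document: "results" precedes "head",
# and the "head" key is repeated.
doc = """{
  "results": {"bindings": [{"x": {"type": "literal", "value": "a"}}]},
  "head": {"vars": ["ignored"]},
  "head": {"vars": ["x"]}
}"""

seen_keys = []


def record_pairs(pairs):
    # Called once per JSON object with its (key, value) pairs in
    # document order; inner objects are reported before outer ones.
    seen_keys.append([key for key, _ in pairs])
    # dict() keeps the last value for a repeated key.
    return dict(pairs)


parsed = json.loads(doc, object_pairs_hook=record_pairs)

top_level_keys = seen_keys[-1]        # keys of the outermost object
head_vars = parsed["head"]["vars"]    # value of the last "head" member
```

Here top_level_keys is ["results", "head", "head"] and head_vars is 
["x"]. A streaming reader sees the rows before it sees the final "head", 
so it has to decide what to do with rows whose variables are not yet 
known.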



Although the new implementation already passes all existing Jena tests, 
there is still a risk of breaking existing code that relies on 
particular behavior of the non-streaming approach.



Therefore, if you think this change might affect you negatively, please 
provide feedback on the proposed PR.



Best regards,

Claus Stadler

--
Dipl. Inf. Claus Stadler
Institute of Applied Informatics (InfAI) / University of Leipzig
Workpage & WebID: http://aksw.org/ClausStadler