date:20190807

Re: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Lorenz Buehmann




> 
>  "http://www.eclipse.org/jetty/configure_9_3.dtd";>
>
> 
>   
> org.eclipse.jetty.server.Request.maxFormContentSize
> 
>   
>
> [2019-08-06 17:07:48] Server ERROR SPARQLServer: Failed to configure server: 0
> java.lang.ArrayIndexOutOfBoundsException: 0
> at org.apache.jena.fuseki.cmd.JettyFusekiWebapp.configServer(JettyFusekiWebapp.java:297)
> at org.apache.jena.fuseki.cmd.JettyFusekiWebapp.buildServerWebapp(JettyFusekiWebapp.java:243)
> at org.apache.jena.fuseki.cmd.JettyFusekiWebapp.<init>(JettyFusekiWebapp.java:99)
> at org.apache.jena.fuseki.cmd.JettyFusekiWebapp.initializeServer(JettyFusekiWebapp.java:94)
> at org.apache.jena.fuseki.cmd.FusekiCmd.runFuseki(FusekiCmd.java:371)
> at org.apache.jena.fuseki.cmd.FusekiCmd$FusekiCmdInner.exec(FusekiCmd.java:356)
> at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
> at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
> at org.apache.jena.fuseki.cmd.FusekiCmd$FusekiCmdInner.innerMain(FusekiCmd.java:104)
> at org.apache.jena.fuseki.cmd.FusekiCmd.main(FusekiCmd.java:67)
>
> So I suppose I need a complete jetty config file rather than a snippet 
> (unless the above is erroneous anyway). I wasn't able to find the default 
> jetty configuration file in the jars. 
>
> I found this 
> https://github.com/apache/jena/blob/master/jena-fuseki2/examples/fuseki-jetty-https.xml
> But it mentions needing configuring further things and I have no clue how to 
> adapt it. 

If you don't need HTTPS/SSL, you can just remove everything that
configures the SSL stuff:
https://github.com/apache/jena/blob/master/jena-fuseki2/examples/fuseki-jetty-https.xml#L179-L285


And then add


  <Call name="setAttribute">
    <Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
    <Arg>200</Arg>
  </Call>

with whatever size you need.

And don't forget to change the port: the file configures 8082, whereas
the default is 3030.

RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Pierre Grenon

Thank you, Lorenz.

I did as you suggest and made the changes indicated.

Fuseki started and seems to have accepted the jetty config. But then when 
trying to send the update the same error occurs and the limit seems unmodified 
(I used2).

Caused by: java.lang.IllegalStateException: Form too large: 100948991 > 1000
at org.eclipse.jetty.server.Request.extractFormParameters(Request.java:545)
at org.eclipse.jetty.server.Request.extractContentParameters(Request.java:475)
at org.eclipse.jetty.server.Request.getParameters(Request.java:386)
... 50 more

Can it be that the config does not override some default set elsewhere in 
Fuseki?

I’ll try to figure out whether I’m doing something else wrong…

Many thanks,
Pierre

For reference:
https://www.eclipse.org/jetty/documentation/current/configuring-form-size.html



THIS E-MAIL MAY CONTAIN CONFIDENTIAL AND/OR PRIVILEGED INFORMATION. 
IF YOU ARE NOT THE INTENDED RECIPIENT (OR HAVE RECEIVED THIS E-MAIL 
IN ERROR) PLEASE NOTIFY THE SENDER IMMEDIATELY AND DESTROY THIS 
E-MAIL. ANY UNAUTHORISED COPYING, DISCLOSURE OR DISTRIBUTION OF THE 
MATERIAL IN THIS E-MAIL IS STRICTLY FORBIDDEN. 

IN ACCORDANCE WITH MIFID II RULES ON INDUCEMENTS, THE FIRM'S EMPLOYEES 
MAY ATTEND CORPORATE ACCESS EVENTS (DEFINED IN THE FCA HANDBOOK AS 
"THE SERVICE OF ARRANGING OR BRINGING ABOUT CONTACT BETWEEN AN INVESTMENT 
MANAGER AND AN ISSUER OR POTENTIAL ISSUER"). DURING SUCH MEETINGS, THE 
FIRM'S EMPLOYEES MAY ON NO ACCOUNT BE IN RECEIPT OF INSIDE INFORMATION 
(AS DESCRIBED IN ARTICLE 7 OF THE MARKET ABUSE REGULATION (EU) NO 596/2014). 
(https://www.handbook.fca.org.uk/handbook/glossary/G3532m.html)
COMPANIES WHO DISCLOSE INSIDE INFORMATION ARE IN BREACH OF REGULATION 
AND MUST IMMEDIATELY AND CLEARLY NOTIFY ALL ATTENDEES. FOR INFORMATION 
ON THE FIRM'S POLICY IN RELATION TO ITS PARTICIPATION IN MARKET SOUNDINGS, 
PLEASE SEE https://www.horizon-asset.co.uk/market-soundings/. 

HORIZON ASSET LLP IS AUTHORISED AND REGULATED 
BY THE FINANCIAL CONDUCT AUTHORITY.

Re: How to parse huge RDF data in a tar.gz file.

2019-08-07 Thread Andy Seaborne


Yasunori,

It should be possible to pass the InputStream for the tar entry contents 
directly to the RDFParserBuilder.source, no need to convert to a string 
first.


IIRC TarArchiveInputStream is a bit weird - it signals "end of file" at 
the end of each tar archive entry, then the app moves to the next entry 
and the input stream is then for that entry and can be passed to a new 
RDFParserBuilder call.


An RDFParser does not close an InputStream it is passed.

It will need a new RDFParser for each entry.

If that is not what happens, please let us know.

Andy


On 06/08/2019 23:31, Yasunori Yamamoto wrote:

Files in a tar are in RDF/XML or Turtle.

Yasunori

On 2019/08/07 3:11, ajs6f wrote:

In what format are these RDF files?

ajs6f


On Aug 6, 2019, at 10:05 AM, Yasunori Yamamoto  wrote:

Hello, I'm trying to learn how to parse RDF data archived in a tar.gz
file (e.g., rdfdatasets.tar.gz, which contains a set of RDF data files)
within my Java program.
The following code works, but it is inefficient because it loads the
entire content of each entry of the given tar.gz file into main memory
before parsing.
Could you please let me know a better way that saves memory?

TarArchiveInputStream tarInput = new TarArchiveInputStream(
    new GzipCompressorInputStream(new FileInputStream(filename)));
TarArchiveEntry currentEntry;
PipedRDFIterator<Triple> iter =
    new PipedRDFIterator<>(buffersize, false, pollTimeout, maxPolls);
final PipedRDFStream<Triple> inputStream = new PipedTriplesStream(iter);

while ((currentEntry = tarInput.getNextTarEntry()) != null) {
    String currentFile = currentEntry.getName();
    Lang lang = RDFLanguages.filenameToLang(currentFile);
    // The whole entry is read into a String here, before parsing starts.
    parser_object = RDFParserBuilder
        .create()
        .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
        .source(new StringReader(CharStreams.toString(new InputStreamReader(tarInput))))
        .checking(checking)
        .lang(lang)
        .build();
    parser_object.parse(inputStream);
}
tarInput.close();

Sincerely yours,
Yasunori Yamamoto

Re: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Martynas Jusevičius

Pierre,

what are you trying to do? Does the INSERT contain some variables/do
some pattern matching?

If not (e.g. it's INSERT DATA), then you might be better off using the
Graph Store Protocol:
https://jena.apache.org/documentation/fuseki2/soh.html#soh-sparql-http
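
A rough sketch of what such a Graph Store Protocol upload could look like from Python, assuming the data is available as Turtle and the dataset exposes the usual Fuseki `data` service. The dataset name `myDS`, the function name and the sample triple are placeholders, not taken from this thread. The request is only built here, not sent:

```python
from urllib.request import Request

def build_gsp_post(dataset_url, turtle_text):
    """Build (but do not send) a Graph Store Protocol POST that adds
    Turtle data to the default graph of a dataset."""
    return Request(
        url=dataset_url + "/data?default",  # assumed Fuseki GSP service name
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",  # POST adds to the graph; PUT would replace it
    )

req = build_gsp_post(
    "http://localhost:3030/myDS",  # hypothetical dataset URL
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the body is plain RDF, so no SPARQL parsing or HTML form handling is involved on the server side.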

Re: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Andy Seaborne


Pierre,

RDFLib/SPARQLWrapper is using an HTML form upload.

The scalable way is to POST with "Content-type: 
application/sparql-update" and the INSERT in the body, then it will 
stream - directly reading the update from the HTTP input stream with no 
HTML Form (Request.extractFormParameters) on the execution path.


For an HTML form, the entire request ends up in memory - it's the way 
that HTML forms have to be handled to see all the name=value pairs in the 
form. Incidentally, the same is true in the client.
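
To make the difference concrete, here is a small sketch of the two request bodies the SPARQL protocol allows for an update. The helper names and the sample update are made up for illustration, and nothing is sent over the network. The form variant is the kind of body that goes through `Request.extractFormParameters` on the server; the direct variant is the one that can be streamed:

```python
from urllib.parse import urlencode

def form_body(sparql):
    """Content type and body for an update sent as an HTML form."""
    body = urlencode({"update": sparql}).encode("utf-8")
    return "application/x-www-form-urlencoded", body

def direct_body(sparql):
    """Content type and body for an update sent directly."""
    return "application/sparql-update", sparql.encode("utf-8")

# Hypothetical one-triple update, standing in for the real 100 MB one.
update = "INSERT DATA { <http://example.org/s> <http://example.org/p> 'o' }"
for content_type, body in (form_body(update), direct_body(update)):
    print(content_type, len(body), body[:30])
```

The form body is the whole update percent-encoded as the value of a single `update` parameter, which is why the server has to hold all of it before it can see the name=value pair.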


The default form size is already bumped up to 10M from the Jetty default 
of 200K.


If the server is running in verbose mode, the entire SPARQL update is 
read in for logging/debugging purposes.


The default Jetty configuration is in code. The form size is set in 
JettyFusekiWebapp.createWebApp to 10M - we can make that default 
bigger, but not as big as 101M, which is the size of this request.


Otherwise, break the request into parts and send multiple requests.
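
One possible way to split a large `INSERT DATA` into parts, assuming the triples are available one per line in N-Triples syntax, so that any subset of lines is still valid inside `INSERT DATA { ... }`. The function name, the batch size and the sample triples are illustrative only:

```python
def batched_updates(ntriples_lines, batch_size=10000):
    """Yield INSERT DATA updates holding at most batch_size triples each.

    Caveat: blank node labels are scoped to a single request, so data
    that shares blank nodes across lines should not be split this way.
    """
    batch = []
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        batch.append(line)
        if len(batch) == batch_size:
            yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"
            batch = []
    if batch:  # remaining triples that did not fill a whole batch
        yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"

# Five made-up triples, split into batches of two.
lines = ['<http://example.org/s%d> <http://example.org/p> "%d" .' % (i, i)
         for i in range(5)]
updates = list(batched_updates(lines, batch_size=2))
print(len(updates))  # 3
```

Each yielded string can then be sent as its own update request; the batches are not applied as a single transaction, so a failure part-way leaves the earlier batches in place.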

Andy

Re: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Laura Morales

Basically, this request should work?

POST /database HTTP/1.1
Host: example.com
Content-Type: application/sparql-update

INSERT DATA { < 100 MB of triples > }





RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Pierre Grenon

Thanks, Andy. 

I'll drop the jetty config attempt. (I surmise the concern is that raising 
the form size very high would cause performance issues.) I can toy with the 
Jetty config locally for testing, but in any event it becomes tedious when 
working with remote servers deployed in varied ways. So I'll keep the strategy 
on the client side. 

With many thanks, 
Pierre

RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Pierre Grenon

FYC I basically do this 

from SPARQLWrapper import SPARQLWrapper

# a lot longer
myString = "INSERT DATA {}"

def insertFromString(url, sparql):
    endpoint = SPARQLWrapper(url)
    endpoint.setQuery(sparql)
    endpoint.method = 'POST'
    endpoint.query()

insertFromString('http://localhost:3030/myDS/update', myString)

Re: RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Laura Morales

> from SPARQLWrapper import SPARQLWrapper
>
> # a lot longer
> myString = "INSERT DATA {}"
>
> def insertFromString(url, sparql):
>     endpoint = SPARQLWrapper(url)
>     endpoint.setQuery(sparql)
>     endpoint.method = 'POST'
>     endpoint.query()
>
> insertFromString('http://localhost:3030/myDS/update', myString)


Can you change the HTTP headers and see if it works? Something like this:

endpoint.addCustomHttpHeader("Content-Type", "application/sparql-update")

RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Pierre Grenon

Following Andy's suggestion works perfectly indeed --- many thanks again!

Best, 
Pierre

import requests


def insertFromString(url, sparql):
    # POST the update directly in the request body (no HTML form)
    req_ = requests.post(url, headers={"Content-Type": "application/sparql-update"}, data=sparql)
    resp_ = req_.content
    return resp_

insertFromString('http://localhost:3030/myDS/update', myString)

Then 

queryFn(qls2, "select (Count(?x) as ?noot) where {?x  ?y ?z}")

  noot.value
0 899401
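
`queryFn` itself is not shown in this thread. As a rough idea of what such a check might involve, the sketch below pulls the count out of a SPARQL JSON result document. The sample response is hand-written to match the number above, not captured server output, and `count_from_results` is a made-up name:

```python
import json

def count_from_results(results_json, var="noot"):
    """Return the integer bound to `var` in the first row of a
    SPARQL 1.1 JSON results document."""
    doc = json.loads(results_json)
    row = doc["results"]["bindings"][0]
    return int(row[var]["value"])

# Hand-written sample in the SPARQL JSON results format (an assumption,
# not actual Fuseki output).
sample = """
{ "head": { "vars": ["noot"] },
  "results": { "bindings": [
    { "noot": { "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                "value": "899401" } } ] } }
"""
print(count_from_results(sample))
```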

RE: RE: Sensible size limit for SPARQL update payload to Fuseki2?

2019-08-07 Thread Pierre Grenon

As indicated by Andy, it doesn't work with that library, but just using 
requests works; see my follow-up email.

Best, 
P

> -Original Message-
> From: Laura Morales [mailto:laure...@mail.com]
> Sent: 07 August 2019 12:19
> To: users@jena.apache.org
> Cc: users@jena.apache.org
> Subject: Re: RE: Sensible size limit for SPARQL update payload to Fuseki2?
> 
> > from SPARQLWrapper import SPARQLWrapper
> >
> > # a lot longer
> > myString = "INSERT DATA {}"
> >
> > def insertFromString(url, sparql):
> >     endpoint = SPARQLWrapper(url)
> >     endpoint.setQuery(sparql)
> >     endpoint.method = 'POST'
> >     endpoint.query()
> >
> > insertFromString('http://localhost:3030/myDS/update', myString)
> 
> 
> Can you change the HTTP headers and see if it works? Something like this:
> 
> endpoint.addCustomHttpHeader("Content-Type", "application/sparql-update")

Re: How to parse huge RDF data in a tar.gz file.

2019-08-07 Thread Yasunori Yamamoto

Hi Andy,

Thank you for your reply.
Is the following code what you had in mind?
If so, it crashed with Exception in thread "main"
java.lang.NullPointerException.

TarArchiveInputStream tarInput = new TarArchiveInputStream(new ...);
TarArchiveEntry currentEntry;
while ((currentEntry = tarInput.getNextTarEntry()) != null) {
  ...
  parser_object = RDFParserBuilder
      .create()
      .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
      .source(tarInput)
      .checking(checking)
      .lang(lang)
      .build();
  ...
}

Error stack follows.
at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:296)
at java.io.InputStream.skip(java.base@9-internal/InputStream.java:351)
at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:111)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.skipRecordPadding(TarArchiveInputStream.java:344)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:271)
at ... ( where my code calls tarInput.getNextTarEntry() )

Regards,
Yasunori

On Wed, 7 Aug 2019 at 18:04, Andy Seaborne wrote:
>
> Yasunori,
>
> It should be possible to pass the InputStream for the tar entry contents
> directly to the RDFParserBuilder.source, no need to convert to a string
> first.
>
> IIRC TarArchiveInputStream is a bit weird - it signals "end of file" at
> the end of the tar archive entry, then the app moves to the next entry
> and the input stream is then for that entry and can be passed to a new
> RDFParserBuilder call.
>
> An RDFParser does not close an inputStream it is passed.
>
> It will need a new RDFParser for each entry.
>
> If that is not what is happening, please let us know.
>
>  Andy
>
>
> On 06/08/2019 23:31, Yasunori Yamamoto wrote:
> > Files in a tar are in RDF/XML or Turtle.
> >
> > Yasunori
> >
> > On 2019/08/07 3:11, ajs6f wrote:
> >
> > In what format are these RDF files?
> >
> > ajs6f
> >
> >> On Aug 6, 2019, at 10:05 AM, Yasunori Yamamoto  
> >> wrote:
> >>
> >> Hello, I'm trying to learn how to parse RDF data archived in a tar.gz
> >> file (e.g., rdfdatasets.tar.gz that contains a set of RDF data files)
> >> within my Java program.
> >> The following code works, but it is inefficient because it loads the
> >> entire RDF data of each entry in the given tar.gz file into main
> >> memory before parsing.
> >> Could you please suggest a more memory-efficient approach?
> >>
> >> TarArchiveInputStream tarInput = new TarArchiveInputStream(new
> >>     GzipCompressorInputStream(new FileInputStream(filename)));
> >> TarArchiveEntry currentEntry;
> >> PipedRDFIterator<Triple> iter = new
> >>     PipedRDFIterator<>(buffersize, false, pollTimeout, maxPolls);
> >> final PipedRDFStream<Triple> inputStream = new PipedTriplesStream(iter);
> >>
> >> while ((currentEntry = tarInput.getNextTarEntry()) != null) {
> >>     String currentFile = currentEntry.getName();
> >>     Lang lang = RDFLanguages.filenameToLang(currentFile);
> >>     parser_object = RDFParserBuilder
> >>         .create()
> >>         .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
> >>         .source(new StringReader(CharStreams.toString(new
> >>             InputStreamReader(tarInput))))
> >>         .checking(checking)
> >>         .lang(lang)
> >>         .build();
> >>     parser_object.parse(inputStream);
> >> }
> >> tarInput.close();
> >>
> >> Sincerely yours,
> >> Yasunori Yamamoto

Re: How to parse huge RDF data in a tar.gz file.

2019-08-07 Thread Andy Seaborne


Presumably on the second entry?

Protect the parser stream from the unwanted close with 
CloseShieldInputStream:


On 07/08/2019 17:57, Yasunori Yamamoto wrote:
> Hi Andy,
>
> Thank you for your reply.
> Is the following code what you assume?
> If so, it crashed with Exception in thread "main"
> java.lang.NullPointerException.
>
> TarArchiveInputStream tarInput = new TarArchiveInputStream(new ...);
> TarArchiveEntry currentEntry;
> while ((currentEntry = tarInput.getNextTarEntry()) != null) {
> ...

   InputStream in = new CloseShieldInputStream(tarInput);

>   parser_object = RDFParserBuilder
>     .create()
>     .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>     .source(tarInput)

      .source(in)

>     .checking(checking)
>     .lang(lang)
>     .build();
> ...
> }

which is also a good way of putting a breakpoint to track down who is
calling close()

(TokenizerText if the first entry is a Turtle file. And in the JDK XML
parser if the first entry is RDF/XML)

    Andy



Re: How to parse huge RDF data in a tar.gz file.

2019-08-07 Thread Yasunori Yamamoto

Thank you very much!
Yes, the crash was on the second entry, and I modified the code to use
CloseShieldInputStream.
It works without any problems.

On Thu, 8 Aug 2019 at 2:32, Andy Seaborne wrote:
>
> Presumably on the second entry?
>
> Protect the parser stream from the unwanted close with
> CloseShieldInputStream:
>
> On 07/08/2019 17:57, Yasunori Yamamoto wrote:
> > Hi Andy,
> >
> > Thank you for your reply.
> > Is the following code what you assume?
> > If so, it crashed with Exception in thread "main"
> > java.lang.NullPointerException.
> >
> > TarArchiveInputStream tarInput = new TarArchiveInputStream(new ...);
> > TarArchiveEntry currentEntry;
> > while ((currentEntry = tarInput.getNextTarEntry()) != null) {
> > ...
>
>  InputStream in = new CloseShieldInputStream(tarInput);
>
> >parser_object = RDFParserBuilder
> >  .create()
> >  .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
> >  .source(tarInput)
>
> .source(in)
>
> >  .checking(checking)
> >  .lang(lang)
> >  .build();
> > ...
> > }
> >
>
>
> which is also a good way of putting a breakpoint to track down who is
> calling close()
>
> (TokenizerText if the first entry is a Turtle file. And in the JDK XML
> parser if the first entry is RDF/XML)
>
>  Andy
>
