Re: TDB triple storage
On 26/07/16 15:10, Chao Wang wrote:
> You are right about the reasoner. I used GenericRuleReasoner and loaded a
> few rules from an external file. The statement
> reasoner.setOWLTranslation(true) is the cause of the issue. Not sure what
> it does.

It's a horrible, horrible hack that finds all explicit owl:intersectionOf groups in the data and inserts a set of forward and backward rules for each group (to compute and recognize the relevant subClassOf deductions). It is used by the OWL reasoners but shouldn't normally be used with your own rule sets.

Dave
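For reference, loading rules from an external file into a GenericRuleReasoner, without the OWL translation switch Dave warns about, might look like the sketch below. The file names `rules.txt` and `data.ttl` are illustrative, not from the thread:

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

import java.util.List;

public class RuleReasonerSketch {
    public static void main(String[] args) {
        // Load rules from an external rule file.
        List<Rule> rules = Rule.rulesFromURL("file:rules.txt");
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.HYBRID);
        // Deliberately NOT calling reasoner.setOWLTranslation(true):
        // that switch injects extra owl:intersectionOf rules intended
        // for the built-in OWL reasoners, not for custom rule sets.

        Model data = ModelFactory.createDefaultModel();
        data.read("data.ttl");
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.listStatements().forEachRemaining(System.out::println);
    }
}
```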
Re: TDB triple storage
You are right about the reasoner. I used GenericRuleReasoner and loaded a few rules from an external file. The statement reasoner.setOWLTranslation(true) is the cause of the issue. Not sure what it does.

On 7/26/16, 7:42 AM, "Andy Seaborne" <a...@apache.org> wrote:
>On 26/07/16 12:08, Chao Wang wrote:
>> Changed code to use RDFFormat.TURTLE_BLOCKS, set -Xmx8192m on a 16g i7 laptop.
>> Still getting an out-of-memory error after running for a while. Any suggestions?
>
>A complete, minimal example. That is, something someone else can run,
>and just large enough to illustrate the issue.
>
>Also details of which version of Jena, and which OS.
>
>The reasoner setup is probably a factor.
>
>    Andy
Re: TDB triple storage
On 26/07/16 12:08, Chao Wang wrote:
> Changed code to use RDFFormat.TURTLE_BLOCKS, set -Xmx8192m on a 16g i7
> laptop. Still getting an out-of-memory error after running for a while.
> Any suggestions?

A complete, minimal example. That is, something someone else can run, and just large enough to illustrate the issue.

Also details of which version of Jena, and which OS.

The reasoner setup is probably a factor.

    Andy
Re: TDB triple storage
Changed code to use RDFFormat.TURTLE_BLOCKS, set -Xmx8192m on a 16g i7 laptop.
Still getting an out-of-memory error after running for a while. Any suggestions?

On 7/25/16, 4:41 PM, "Andy Seaborne" <a...@apache.org> wrote:
>On 25/07/16 21:14, Chao Wang wrote:
>> Hi Dave,
>> As you suggested, I have computed the closure in memory, totaling over 4
>> million triples, and am trying to serialize it.
>> Is there a direct API to serialize the whole model into TDB?
>> Tried to serialize into a file; keep getting memory issues. What's the
>> typical resource need for this size of model?
>
>If you are getting problems as you write out the file, try using one of
>the streaming formats. The default format for RDF/XML or Turtle is
>"pretty" and takes a significant amount of working space for analysis
>before writing.
>
>Some streaming output formats are:
>
>Lang.NTRIPLES
>RDFFormat.TURTLE_BLOCKS
>
>https://jena.apache.org/documentation/io/rdf-output.html
>
>Or does it fail during writing, after some output?
>
>    Andy
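Andy's streaming-format advice can be sketched as below: write with a streamed format such as N-Triples so triples go out as they are visited, instead of the "pretty" writers' whole-model analysis. The output file name `closure.nt` is illustrative:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class StreamingWriteSketch {
    public static void main(String[] args) throws Exception {
        Model closure = ModelFactory.createDefaultModel();
        // ... populate 'closure' with the computed inference closure ...

        // NTRIPLES (like TURTLE_BLOCKS) is a streaming writer: it emits
        // triples as it goes, so memory use stays flat even for a model
        // of several million triples.
        try (OutputStream out = new FileOutputStream("closure.nt")) {
            RDFDataMgr.write(out, closure, RDFFormat.NTRIPLES);
        }
    }
}
```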
RE: TDB triple storage
Hi Dave,
As you suggested, I have computed the closure in memory, totaling over 4 million triples, and am trying to serialize it.
Is there a direct API to serialize the whole model into TDB?
I tried to serialize it into a file, but keep getting memory issues. What's the typical resource need for this size of model?

From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 9:09 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 13:45, Chao Wang wrote:
> Thanks Dave,
> So my Fuseki configuration uses TDB with the OWL reasoner. I preloaded the
> TDB with tdbloader, then started up Fuseki.
> My question is: when Fuseki starts up, does it load all triples, including
> inferred triples, into memory?

Yes. It's actually slightly worse than that. All the inferences will be in memory (including intermediate state), which will be bigger than the source data. But the data itself isn't loaded explicitly, which means the reasoner is going back to TDB for each query, which is a further slowdown.

Using a lighter reasoner config (OWL Micro, if you are not already using it) may help.

Otherwise, if your data is stable, then as I say, compute the closure once in memory, offline. Store that in TDB. Then have your Fuseki configuration use that precomputed closure with no runtime inference.

Dave
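On the "direct API to serialize the whole model into TDB" question above, one option is to add the in-memory model into a TDB-backed dataset inside a write transaction. A minimal sketch, assuming Apache Jena TDB on the classpath; the directory name `tdb-dir` is illustrative:

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.tdb.TDBFactory;

public class StoreInTdbSketch {
    public static void main(String[] args) {
        Model closure = ModelFactory.createDefaultModel();
        // ... the in-memory closure computed earlier ...

        Dataset dataset = TDBFactory.createDataset("tdb-dir");
        dataset.begin(ReadWrite.WRITE);
        try {
            // Model.add copies the closure, triple by triple, into the
            // TDB-backed default model -- no intermediate file needed.
            dataset.getDefaultModel().add(closure);
            dataset.commit();
        } finally {
            dataset.end();
        }
        dataset.close();
    }
}
```

For bulk loads, streaming the closure out to N-Triples and loading that with tdbloader (as used earlier in the thread) may be faster than the API route.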
Re: TDB triple storage
On 21/07/16 13:45, Chao Wang wrote:
> Thanks Dave,
> So my Fuseki configuration uses TDB with the OWL reasoner. I preloaded the
> TDB with tdbloader, then started up Fuseki.
> My question is: when Fuseki starts up, does it load all triples, including
> inferred triples, into memory?

Yes. It's actually slightly worse than that. All the inferences will be in memory (including intermediate state), which will be bigger than the source data. But the data itself isn't loaded explicitly, which means the reasoner is going back to TDB for each query, which is a further slowdown.

Using a lighter reasoner config (OWL Micro, if you are not already using it) may help.

Otherwise, if your data is stable, then as I say, compute the closure once in memory, offline. Store that in TDB. Then have your Fuseki configuration use that precomputed closure with no runtime inference.

Dave
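Dave's suggestion of computing the closure once, offline, with a lighter reasoner might look like the sketch below. The file name `data.ttl` is illustrative:

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;

public class OfflineClosureSketch {
    public static void main(String[] args) {
        Model data = ModelFactory.createDefaultModel();
        data.read("data.ttl");

        // OWL Micro: a lighter rule set than the full OWL reasoner.
        Reasoner reasoner = ReasonerRegistry.getOWLMicroReasoner();
        InfModel inf = ModelFactory.createInfModel(reasoner, data);

        // Copying the InfModel into a plain model materializes all
        // inferences; the result (data plus inferences) can then be
        // stored in TDB and queried with no runtime reasoner.
        Model closure = ModelFactory.createDefaultModel();
        closure.add(inf);
    }
}
```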
RE: TDB triple storage
Thanks Dave,
So my Fuseki configuration uses TDB with the OWL reasoner. I preloaded the TDB with tdbloader, then started up Fuseki.
My question is: when Fuseki starts up, does it load all triples, including inferred triples, into memory?
I am experiencing a hanging SPARQL query; it works fine with a small dataset. I am hoping reasoning is not done during query time...

From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 3:35 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 02:09, Chao Wang wrote:
> A newbie question:
> Does Jena store the inferred triples into TDB? If yes, when?

No. The current reasoners operate in memory.

If you wish, you can take the results of inference (either the entire closure or the results of some selective queries) and store those back in TDB yourself. A common pattern is to use separate named graphs for the original data and for the inference closure, and use union-default. All of this is under your control but is not done automatically for you.

There is also some support for generating a partial RDFS inference closure at the time you load TDB.

Dave
Re: TDB triple storage
On 21/07/16 02:09, Chao Wang wrote:
> A newbie question:
> Does Jena store the inferred triples into TDB? If yes, when?

No. The current reasoners operate in memory.

If you wish, you can take the results of inference (either the entire closure or the results of some selective queries) and store those back in TDB yourself. A common pattern is to use separate named graphs for the original data and for the inference closure, and use union-default. All of this is under your control but is not done automatically for you.

There is also some support for generating a partial RDFS inference closure at the time you load TDB.

Dave
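The union-default pattern Dave mentions can be switched on in a Fuseki/TDB assembler configuration, so queries against the default graph see the union of all named graphs (original data plus the stored inference closure). A minimal sketch; the service name "ds" and database location "DB" are illustrative:

```turtle
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .

<#service> rdf:type fuseki:Service ;
    fuseki:name         "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:dataset      <#dataset> .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    # Queries against the default graph see the union of all named graphs,
    # e.g. the original-data graph and the inference-closure graph.
    tdb:unionDefaultGraph true .
```

No reasoner appears in this configuration; inference happened offline when the closure graph was computed and stored.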