Re: TDB triple storage

2016-07-26 Thread Chao Wang
You are right about the reasoner. I used GenericRuleReasoner and loaded a few 
rules from an external file.
The statement reasoner.setOWLTranslation(true) is the cause of the issue. I am 
not sure what it does.




On 7/26/16, 7:42 AM, "Andy Seaborne" <a...@apache.org> wrote:

>On 26/07/16 12:08, Chao Wang wrote:
>> I changed the code to use RDFFormat.TURTLE_BLOCKS and set -Xmx8192m on a
>> 16 GB i7 laptop. I am still getting an out-of-memory error after it runs
>> for a while. Any suggestions?
>
>A complete, minimal example. That is, something someone else can run, 
>and just large enough to illustrate the issue.
>
>Also details of which version of Jena, and which OS.
>
>The reasoner setup is probably a factor.
>
>   Andy
>
>>
>>
>>
>>
>> On 7/25/16, 4:41 PM, "Andy Seaborne" <a...@apache.org> wrote:
>>
>>> On 25/07/16 21:14, Chao Wang wrote:
>>>> Hi Dave,
>>>> As you suggested, I have computed the closure in memory, totaling over
>>>> 4 million triples, and I am now trying to serialize it.
>>>> Is there a direct API to serialize the whole model into TDB?
>>>> I tried to serialize it to a file but keep running into memory issues.
>>>> What are the typical resource needs for a model of this size?
>>>
>>> If you are getting problems as you write out the file, try using one of
>>> the streaming formats.  The default format for RDF/XML or Turtle is
>>> "pretty" and takes a significant amount of working space for analysis
>>> before writing.
>>>
>>> Some streaming output formats are:
>>>
>>> Lang.NTRIPLES
>>> RDFFormat.TURTLE_BLOCKS
>>>
>>> https://jena.apache.org/documentation/io/rdf-output.html
>>>
>>> Or does it fail during writing, after some output?
>>>
>>> Andy
>>>
>>>> 
>>>> From: Dave Reynolds [dave.e.reyno...@gmail.com]
>>>> Sent: Thursday, July 21, 2016 9:09 AM
>>>> To: users@jena.apache.org
>>>> Subject: Re: TDB triple storage
>>>>
>>>> On 21/07/16 13:45, Chao Wang wrote:
>>>>> Thanks Dave,
>>>>> My Fuseki configuration uses TDB with an OWL reasoner. I preloaded
>>>>> TDB with tdbloader, then started Fuseki.
>>>>> My question is: when Fuseki starts up, does it load all triples,
>>>>> including inferred triples, into memory?
>>>>
>>>> Yes. It's actually slightly worse than that. All the inferences will be
>>>> in memory (including intermediate state) which will be bigger than the
>>>> source data. But the data itself isn't loaded explicitly which means
>>>> that the reasoner is going back to TDB for each query which is a further
>>>> slow down.
>>>>
>>>> Using a lighter reasoner config (OWL Micro if you are not already using
>>>> it) may help.
>>>>
>>>> Otherwise, if your data is stable, then as I say, compute the closure
>>>> once in memory, off line. Store that in TDB. Then have your fuseki
>>>> configuration use that precomputed closure with no runtime inference.
>>>>
>>>> Dave
>>>>
>>>>> I am seeing SPARQL queries hang, although they work fine with a small
>>>>> dataset. I am hoping reasoning is not done at query time...
>>>>> 
>>>>> From: Dave Reynolds [dave.e.reyno...@gmail.com]
>>>>> Sent: Thursday, July 21, 2016 3:35 AM
>>>>> To: users@jena.apache.org
>>>>> Subject: Re: TDB triple storage
>>>>>
>>>>> On 21/07/16 02:09, Chao Wang wrote:
>>>>>> A newbie question:
>>>>>> Does jena store the inferred triples into tdb? If yes, when?
>>>>>
>>>>> No. The current reasoners operate in memory.
>>>>>
>>>>> If you wish you can take the results of inference (either the entire
>>>>> closure or the results of some selective queries) and store those back
>>>>> in TDB yourself. A common pattern would be to use separate named graphs for
>>>>> the original data and for the inference closure and use union-default.
>>>>> All this is under your control but is not automatically done for you.
>>>>>
>>>>> There is also some support for generating a partial RDFS inference
>>>>> closure at the time you load TDB.
>>>>>
>>>>> Dave
>>>>>
>>>
>


Re: TDB triple storage

2016-07-26 Thread Chao Wang
I changed the code to use RDFFormat.TURTLE_BLOCKS and set -Xmx8192m on a 16 GB
i7 laptop. I am still getting an out-of-memory error after it runs for a while.
Any suggestions?




On 7/25/16, 4:41 PM, "Andy Seaborne" <a...@apache.org> wrote:

>On 25/07/16 21:14, Chao Wang wrote:
>> Hi Dave,
>> As you suggested, I have computed the closure in memory, totaling over
>> 4 million triples, and I am now trying to serialize it.
>> Is there a direct API to serialize the whole model into TDB?
>> I tried to serialize it to a file but keep running into memory issues.
>> What are the typical resource needs for a model of this size?
>
>If you are getting problems as you write out the file, try using one of 
>the streaming formats.  The default format for RDF/XML or Turtle is 
>"pretty" and takes a significant amount of working space for analysis 
>before writing.
>
>Some streaming output formats are:
>
>Lang.NTRIPLES
>RDFFormat.TURTLE_BLOCKS
>
>https://jena.apache.org/documentation/io/rdf-output.html
>
>Or does it fail during writing, after some output?
>
> Andy
>
>> 
>> From: Dave Reynolds [dave.e.reyno...@gmail.com]
>> Sent: Thursday, July 21, 2016 9:09 AM
>> To: users@jena.apache.org
>> Subject: Re: TDB triple storage
>>
>> On 21/07/16 13:45, Chao Wang wrote:
>>> Thanks Dave,
>>> My Fuseki configuration uses TDB with an OWL reasoner. I preloaded
>>> TDB with tdbloader, then started Fuseki.
>>> My question is: when Fuseki starts up, does it load all triples,
>>> including inferred triples, into memory?
>>
>> Yes. It's actually slightly worse than that. All the inferences will be
>> in memory (including intermediate state) which will be bigger than the
>> source data. But the data itself isn't loaded explicitly which means
>> that the reasoner is going back to TDB for each query which is a further
>> slow down.
>>
>> Using a lighter reasoner config (OWL Micro if you are not already using
>> it) may help.
>>
>> Otherwise, if your data is stable, then as I say, compute the closure
>> once in memory, off line. Store that in TDB. Then have your fuseki
>> configuration use that precomputed closure with no runtime inference.
>>
>> Dave
>>
>>> I am seeing SPARQL queries hang, although they work fine with a small
>>> dataset. I am hoping reasoning is not done at query time...
>>> 
>>> From: Dave Reynolds [dave.e.reyno...@gmail.com]
>>> Sent: Thursday, July 21, 2016 3:35 AM
>>> To: users@jena.apache.org
>>> Subject: Re: TDB triple storage
>>>
>>> On 21/07/16 02:09, Chao Wang wrote:
>>>> A newbie question:
>>>> Does jena store the inferred triples into tdb? If yes, when?
>>>
>>> No. The current reasoners operate in memory.
>>>
>>> If you wish you can take the results of inference (either the entire
>>> closure or the results of some selective queries) and store those back
>>> in TDB yourself. A common pattern would be to use separate named graphs for
>>> the original data and for the inference closure and use union-default.
>>> All this is under your control but is not automatically done for you.
>>>
>>> There is also some support for generating a partial RDFS inference
>>> closure at the time you load TDB.
>>>
>>> Dave
>>>
>


RE: TDB triple storage

2016-07-25 Thread Chao Wang
Hi Dave,
As you suggested, I have computed the closure in memory, totaling over
4 million triples, and I am now trying to serialize it.
Is there a direct API to serialize the whole model into TDB?
I tried to serialize it to a file but keep running into memory issues.
What are the typical resource needs for a model of this size?

From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 9:09 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 13:45, Chao Wang wrote:
> Thanks Dave,
> My Fuseki configuration uses TDB with an OWL reasoner. I preloaded
> TDB with tdbloader, then started Fuseki.
> My question is: when Fuseki starts up, does it load all triples,
> including inferred triples, into memory?

Yes. It's actually slightly worse than that. All the inferences will be
in memory (including intermediate state) which will be bigger than the
source data. But the data itself isn't loaded explicitly which means
that the reasoner is going back to TDB for each query which is a further
slow down.

Using a lighter reasoner config (OWL Micro if you are not already using
it) may help.
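
For illustration only, a Fuseki assembler configuration that applies the
OWL Micro rule set to a TDB-backed graph might look roughly like the sketch
below. The service name, node names, and TDB path are placeholders and do
not come from this thread; depending on the Fuseki version, additional
declarations may be needed.

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

# Placeholder service name; queries would go to /inf/sparql.
<#service> rdf:type fuseki:Service ;
    fuseki:name         "inf" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:dataset      <#dataset> .

# Dataset whose default graph is an inference model.
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#infModel> .

# Inference model: OWL Micro rules applied over the TDB-backed base graph.
<#infModel> rdf:type ja:InfModel ;
    ja:baseModel <#tdbGraph> ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner>
    ] .

<#tdbGraph> rdf:type tdb:GraphTDB ;
    tdb:dataset <#tdbDataset> .

# Placeholder path to the database built with tdbloader.
<#tdbDataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/path/to/tdb" .
```

Inference still runs in memory with this setup; the lighter rule set only
reduces how much work the reasoner does.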

Otherwise, if your data is stable, then as I say, compute the closure
once in memory, off line. Store that in TDB. Then have your fuseki
configuration use that precomputed closure with no runtime inference.
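
As a rough sketch of the "no runtime inference" setup, the Fuseki
configuration can point straight at the TDB database that holds the stored
closure, with no ja:InfModel in between. The service name and path below
are placeholders, not values from this thread.

```turtle
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .

# Placeholder service name; queries would go to /closure/sparql.
<#service> rdf:type fuseki:Service ;
    fuseki:name         "closure" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:dataset      <#dataset> .

# Plain TDB dataset: no reasoner attached, so queries read stored triples only.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/path/to/closure-tdb" .
```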

Dave

> I am seeing SPARQL queries hang, although they work fine with a small
> dataset. I am hoping reasoning is not done at query time...
> 
> From: Dave Reynolds [dave.e.reyno...@gmail.com]
> Sent: Thursday, July 21, 2016 3:35 AM
> To: users@jena.apache.org
> Subject: Re: TDB triple storage
>
> On 21/07/16 02:09, Chao Wang wrote:
>> A newbie question:
>> Does jena store the inferred triples into tdb? If yes, when?
>
> No. The current reasoners operate in memory.
>
> If you wish you can take the results of inference (either the entire
> closure or the results of some selective queries) and store those back
> in TDB yourself. A common pattern would be to use separate named graphs for
> the original data and for the inference closure and use union-default.
> All this is under your control but is not automatically done for you.
>
> There is also some support for generating a partial RDFS inference
> closure at the time you load TDB.
>
> Dave
>


RE: TDB triple storage

2016-07-21 Thread Chao Wang
Thanks Dave,
My Fuseki configuration uses TDB with an OWL reasoner. I preloaded TDB with
tdbloader, then started Fuseki.
My question is: when Fuseki starts up, does it load all triples, including
inferred triples, into memory?
I am seeing SPARQL queries hang, although they work fine with a small dataset.
I am hoping reasoning is not done at query time...

From: Dave Reynolds [dave.e.reyno...@gmail.com]
Sent: Thursday, July 21, 2016 3:35 AM
To: users@jena.apache.org
Subject: Re: TDB triple storage

On 21/07/16 02:09, Chao Wang wrote:
> A newbie question:
> Does jena store the inferred triples into tdb? If yes, when?

No. The current reasoners operate in memory.

If you wish you can take the results of inference (either the entire
closure or the results of some selective queries) and store those back
in TDB yourself. A common pattern would be to use separate named graphs for
the original data and for the inference closure and use union-default.
All this is under your control but is not automatically done for you.
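
A minimal sketch of the union-default part of that pattern, assuming the
original data and the inference closure have each been loaded into their
own named graph of one TDB database (the graph names in the comments and
the path are hypothetical):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .

# Assumed layout (hypothetical graph names):
#   <http://example.org/graph/data>      original triples
#   <http://example.org/graph/inferred>  stored inference closure
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/path/to/tdb" ;
    # Present the union of all named graphs as the default graph.
    tdb:unionDefaultGraph true .
```

With this setting, a query against the default graph sees both the asserted
and the stored inferred triples, while each named graph can still be
replaced or queried on its own.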

There is also some support for generating a partial RDFS inference
closure at the time you load TDB.

Dave


test question - ja:RDFDataset not defined in assembler.ttl

2016-06-30 Thread Chao Wang
I have seen many configuration files that use ja:RDFDataset, but I don’t see 
ja:RDFDataset defined in 
https://jena.apache.org/documentation/assembler/assembler.ttl
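
For context, the kind of usage being asked about typically looks like the
fragment below, a minimal sketch with placeholder node names. It shows how
ja:RDFDataset tends to appear in configuration files, not where the term is
defined.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .

# A general dataset assembled from an in-memory default graph.
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#model> .

<#model> rdf:type ja:MemoryModel .
```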