Re: Question about Dataset as created from a collection of ont files

Jack Park Sun, 08 Feb 2015 11:21:06 -0800

I have no clue how it is loading the owl file since it's not anywhere in
that physical space.


Right now, I am not even building an OntModel, just a Model.

Still, it's an open issue why it thinks it's supposed to be loading an owl
file; there are URIs in the TDB database with that name in them.

Lots to think about.
Many thanks!

On Sun, Feb 8, 2015 at 10:27 AM, Andy Seaborne <[email protected]> wrote:

> On 08/02/15 17:39, Jack Park wrote:
>
>> Thanks! Much food for thought here.
>> I wrote no code that uses FileManager, so I have no clue where that comes
>> from.
>> I gather, then, that I create the dataset first, then just getDefaultModel
>> each time I need it -- which, in theory, is just once, when I create the
>> OntModel (not in that code).
>>
>
> If the OntModel has inference, you may well run out of memory, as might
> large amounts of imports processing.
>
> You have to recreate the OntModel each time when used transactionally - it
> has a link to the database.  You may wish to increase the scope of the
> transaction.
>
> This is consistent with your original report.
> """
> showing all the ont classes it is reading. Then, it runs out of
> heap space and crashes.
> """
>
> BiomedGT.owl is 130Mb.  Something is loading it.  It's not the database.
>
>         Andy
>
>
>
>>
>> On Sun, Feb 8, 2015 at 9:28 AM, Andy Seaborne <[email protected]> wrote:
>>
>>> On 08/02/15 16:09, Jack Park wrote:
>>>
>>>> I could create a gist of the code, but here is the core. What would be
>>>> needed is the database I am using as well. It turns out that my dyslexia
>>>> (my bad) told me I had fully commented out making the model; that's not
>>>> true. Just running that snippet of code: createDataset runs fine and
>>>> creates a 12k log file.
>>>>
>>>> That means that my original question was a bad one after all. The issue
>>>> resides in making the model: here is that code:
>>>>
>>>> public Model getModel() {
>>>> Model m = null;
>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>
>>>>
>>> This will not cause OOME unless you have set the heap very low.
>>>
>>> Are you on a 32 bit machine?
>>>
>>> For either 32 or 64 bit heap, a 1.5 G heap is fine for this call.
>>> Actually, much smaller will work - all the caches are delayed.
>>>
>>>
>>>> dataset.begin(ReadWrite.READ);
>>>> m = dataset.getDefaultModel();
>>>> dataset.end();
>>>> return m;
>>>> }
>>>>
>>>>
>>> That code has a transaction leak - "m" is the model in the transaction
>>> ...
>>> but you said dataset.end();
>>>
>>> Get the model each time with dataset.getDefaultModel();
>>> (it's a cheap operation).
>>>
>>>
>>>> The log file blooms rather rapidly (including a nullpointer from a
>>>> "Mangled prefix map: graph name=" error; perhaps that's because the
>>>> mechanism I used to load Fuseki did not actually name a graph?).
>>>>
>>>> A fragment of that log is this
>>>>
>>>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>>>>
>>>>
>>> Who is calling readModel?  It's not TDB.
>>>
>>> Something is calling FileManager operations (if it is
>>> FileManager.readModel)
>>>
>>> TDB does not call FileManager.
>>>
>>> You could put a break point on FileManager.readModel to get a call stack.
>>>
>>>          Andy
>>>
>>>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl, null)
>>>> DEBUG 2015-02-08 07:57:14,524 [main] - Not mapped:
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:14,525 [main] - open(
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>>>> DEBUG 2015-02-08 07:57:14,525 [main] - Not mapped:
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:14,629 [main] - [1] GET
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:15,396 [main] - Connection request: [route: {}->
>>>> http://ncicb.nci.nih.gov][total kept alive: 0; route allocated: 0 of 5;
>>>> total allocated: 0 of 10]
>>>>
>>>>
>>>> On Sun, Feb 8, 2015 at 7:48 AM, Dave Reynolds <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>>> Hi Jack,
>>>>>
>>>>> There's nothing wrong with the line of code that I can see. That is the
>>>>> normal way to open a TDB dataset. TDB does not load all the data into
>>>>> memory and there are no magic flags to set.
>>>>>
>>>>> What I'm asking is what log file you are talking about.
>>>>>
>>>>> I don't see why a program with just that line of code and nothing else
>>>>> should be creating any log files at all, so I'm not clear on what log
>>>>> file you are talking about. I guess you might have a log4j.properties
>>>>> file in your path set to DEBUG level and writing to a file instead of
>>>>> stdout/err, but that's still not going to result in hundreds of Mb of
>>>>> output.
>>>>>
>>>>> Can you provide a complete minimal example of what you are doing?
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>> On 08/02/15 15:12, Jack Park wrote:
>>>>>
>>>>>> Sorry. While I am not new to Jena (I built systems with it in the
>>>>>> previous decade when it was much simpler to use), I suppose I get
>>>>>> confused easily. The code snippet I showed was found all over the web
>>>>>> in tutorials, and so forth. I spent some time reading around in the
>>>>>> Jena source code and find it hard to believe that "there's nothing to
>>>>>> write a log..." since several classes are loaded and exercised just by
>>>>>> calling TDBFactory.createDataset.
>>>>>>
>>>>>> But, I confess that I don't see anything in that code which suggests
>>>>>> it should actually read the database itself (though that might be the
>>>>>> case). Still it created a 238+mb log file before crashing.
>>>>>>
>>>>>> Rooting around in the Fuseki code to see how it boots a given database
>>>>>> is truly difficult. Maybe someone familiar with the code can explain
>>>>>> it.
>>>>>>
>>>>>> Maybe there is a better way to boot a given TDB database and create an
>>>>>> OntModel from that?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Feb 8, 2015 at 6:27 AM, Dave Reynolds <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On 08/02/15 13:55, Jack Park wrote:
>>>>>>>
>>>>>>>> Hi Dave,
>>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>> I should have stated more clearly that the code I did show is *all*
>>>>>>>> the code that is running. That snippet:
>>>>>>>>
>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>
>>>>>>>> is what is running when the system gets an OutOfMemory error. Even
>>>>>>>> with a 4gb heap, it still blows. All the code which does "begin",
>>>>>>>> "Model = ...," and so forth has been commented out.
>>>>>>>>
>>>>>>>> The behavior according to the log is that somewhere in the
>>>>>>>> createDataset code, it is reading every class in the ontology stored
>>>>>>>> in the TDB database it is, I presume, opening.
>>>>>>>
>>>>>>> What log? If that is literally the only code line then there's
>>>>>>> nothing to write a log and certainly nothing that will go round
>>>>>>> trying to read classes.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>> That's the current puzzle. It's almost as if there is some system
>>>>>>>> property I need to set somewhere which tells it that this is not an
>>>>>>>> in-memory event.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Jack
>>>>>>>>
>>>>>>>> On Sun, Feb 8, 2015 at 2:06 AM, Dave Reynolds <
>>>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 07/02/15 22:18, Jack Park wrote:
>>>>>>>>>
>>>>>>>>>> I used Jena to load, on behalf of Fuseki, a collection of owl
>>>>>>>>>> files. There might be 4gb of data all totaled in there.
>>>>>>>>>>
>>>>>>>>>> Now, rather than use Fuseki to access that data, I am writing code
>>>>>>>>>> which will use a Dataset opened against that database to create an
>>>>>>>>>> OntModel.
>>>>>>>>>>
>>>>>>>>>> I use this code, taken from a variety of sources:
>>>>>>>>>>
>>>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>>>
>>>>>>>>>> where dbPath points to the directory where Jena made the database.
>>>>>>>>>>
>>>>>>>>>> When I boot Fuseki against that data, it boots quickly and without
>>>>>>>>>> any issues.
>>>>>>>>>>
>>>>>>>>>> When I run that code against the same data, firstly, it blossoms a
>>>>>>>>>> logfile 260 mb, showing all the ont classes it is reading. Then, it
>>>>>>>>>> runs out of heap space and crashes.
>>>>>>>>>
>>>>>>>>> Simply accessing data in a TDBDataset won't load it all into memory
>>>>>>>>> so the problem will be in how you are creating the OntModels.
>>>>>>>>>
>>>>>>>>> Since you don't show "that code" it is hard to know what the
>>>>>>>>> problem is.
>>>>>>>>>
>>>>>>>>> It *might* be you have dynamic imports processing switched on and
>>>>>>>>> so your OntModels are going out to the original sources and
>>>>>>>>> reloading them.
>>>>>>>>>
>>>>>>>>> It is possible to do imports processing but have the imports be
>>>>>>>>> found as database models [1] but in your case since you have all
>>>>>>>>> the ontologies in there anyway then I would just switch off all
>>>>>>>>> imports processing.
>>>>>>>>>
>>>>>>>>> Or it might be nothing to do with imports processing but a bug in
>>>>>>>>> how you are creating the OntModels. Not enough information to tell.
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> [1] There used to be a somewhat old example of how to do this in
>>>>>>>>> the documentation but I can't find it in the current web site.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
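
The advice in the quoted thread (keep the model inside the transaction,
build the OntModel without inference, and switch off imports processing)
can be sketched roughly as follows. This is a minimal sketch, not code from
the thread. It assumes the Jena 2.x package names (com.hp.hpl.jena.*) that
were current in early 2015; later releases use org.apache.jena.* instead.
The names TdbReadSketch, specWithoutImports and countStatements are
hypothetical. Whether imports processing is really what fetches
BiomedGT.owl is not confirmed anywhere above.

```java
import com.hp.hpl.jena.ontology.OntDocumentManager;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TdbReadSketch {

    // Copy of OWL_MEM (no reasoner attached) whose document manager
    // does not follow owl:imports statements.
    static OntModelSpec specWithoutImports() {
        OntDocumentManager docManager = new OntDocumentManager();
        docManager.setProcessImports(false);
        OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
        spec.setDocumentManager(docManager);
        return spec;
    }

    // Counts the statements in the default graph. All use of the model
    // happens between begin() and end(); only a plain long is returned,
    // so no model escapes the transaction.
    static long countStatements(String dbPath) {
        Dataset dataset = TDBFactory.createDataset(dbPath);
        dataset.begin(ReadWrite.READ);
        try {
            // Fetched anew inside each transaction; per the thread this is cheap.
            Model base = dataset.getDefaultModel();
            // The OntModel is also rebuilt per transaction, since it keeps
            // a link to the database-backed base model.
            OntModel ont = ModelFactory.createOntologyModel(specWithoutImports(), base);
            return ont.size();
        } finally {
            dataset.end();
        }
    }

    public static void main(String[] args) {
        // args[0] is assumed to be the TDB database directory.
        System.out.println(countStatements(args[0]));
    }
}
```

Returning a plain value rather than the Model is the design choice that
avoids the transaction leak described in the thread. If the caller needs
the model for longer, the transaction scope would have to be widened to
cover that use.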
