Re: Question about Dataset as created from a collection of ont files

Jack Park Sun, 08 Feb 2015 09:42:57 -0800

Thanks! Much food for thought here.
I wrote no code that uses FileManager, so I have no clue where that comes
from.
I gather, then, that I create the dataset first, then just getDefaultModel
each time I need it -- which, in theory, is just once, when I create the
OntModel (not in that code).
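
A minimal sketch of the pattern Andy describes in the quoted reply: open
the dataset once, fetch the default model inside each transaction, and finish
using it before calling end(). The class name TdbReadExample and the method
defaultGraphIsEmpty are made up for illustration. The imports assume the
Jena 2.x package names (com.hp.hpl.jena...); Jena 3 and later use
org.apache.jena... instead. It needs the Jena TDB jars on the classpath.

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

// Hypothetical wrapper class, for illustration only.
public class TdbReadExample {
    private final Dataset dataset;

    public TdbReadExample(String dbPath) {
        // Open the TDB dataset once and keep it; this does not load the data into memory.
        this.dataset = TDBFactory.createDataset(dbPath);
    }

    public boolean defaultGraphIsEmpty() {
        dataset.begin(ReadWrite.READ);
        try {
            // Fetch the model inside the transaction (a cheap call) ...
            Model m = dataset.getDefaultModel();
            // ... and finish using it before the transaction ends.
            return m.isEmpty();
        } finally {
            // Always end the transaction; the model must not be used after this.
            dataset.end();
        }
    }

    public static void main(String[] args) {
        // args[0] is assumed to be the TDB database directory.
        TdbReadExample example = new TdbReadExample(args[0]);
        System.out.println("Default graph empty? " + example.defaultGraphIsEmpty());
    }
}
```

The difference from the getModel() quoted in this thread is that nothing
obtained inside the transaction is returned to the caller after end().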



On Sun, Feb 8, 2015 at 9:28 AM, Andy Seaborne <[email protected]> wrote:

> On 08/02/15 16:09, Jack Park wrote:
>
>> I could create a gist of the code, but here is the core. What would be
>> needed is the database I am using as well. It turns out that my dyslexia
>> (my bad) told me I had fully commented out making the model; that's not
>> true. Just running that snippet of code: createDataset runs fine and
>> creates a 12k log file.
>>
>> That means that my original question was a bad one after all. The issue
>> resides in making the model: here is that code:
>>
>> public Model getModel() {
>> Model m = null;
>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>
>
> This will not cause OOME unless you have set the heap very low.
>
> Are you on a 32 bit machine?
>
> For either 32 or 64 bit heap, a 1.5 G heap is fine for this call.
> Actually, much smaller will work - all the caches are delayed.
>
>> dataset.begin(ReadWrite.READ);
>> m = dataset.getDefaultModel();
>> dataset.end();
>> return m;
>> }
>>
>
> That code has a transaction leak - "m" is the model in the transaction ...
> but you said dataset.end();
>
> Get the model each time with dataset.getDefaultModel();
> (it's a cheap operation).
>
>
>> The log file blooms rather rapidly, including a nullpointer from a
>> "Mangled prefix map: graph name=" error (perhaps that's because the
>> mechanism I used to load Fuseki did not actually name a graph?).
>>
>> A fragment of that log is this:
>>
>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>>
>
> Who is calling readModel?  It's not TDB.
>
> Something is calling FileManager operations (if it is
> FileManager.readModel).
>
> TDB does not call FileManager.
>
> You could put a break point on FileManager.readModel to get a call stack.
>
>         Andy
>
>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl, null)
>> DEBUG 2015-02-08 07:57:14,524 [main] - Not mapped:
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:14,525 [main] - open(
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>> DEBUG 2015-02-08 07:57:14,525 [main] - Not mapped:
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:14,629 [main] - [1] GET
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:15,396 [main] - Connection request: [route: {}->
>> http://ncicb.nci.nih.gov][total kept alive: 0; route allocated: 0 of 5;
>> total allocated: 0 of 10]
>>
>>
>> On Sun, Feb 8, 2015 at 7:48 AM, Dave Reynolds <[email protected]>
>> wrote:
>>
>>> Hi Jack,
>>>
>>> There's nothing wrong with the line of code that I can see. That is the
>>> normal way to open a TDB dataset. TDB does not load all the data into
>>> memory and there are no magic flags to set.
>>>
>>> What I'm asking is what log file you are talking about.
>>>
>>> I don't see why a program with just that line of code and nothing else
>>> should be creating any log files at all, so I'm not clear on what log file
>>> you are talking about. I guess you might have a log4j.properties file in
>>> your path set to DEBUG level and writing to a file instead of stdout/err,
>>> but that's still not going to result in hundreds of MB of output.
>>>
>>> Can you provide a complete minimal example of what you are doing?
>>>
>>> Dave
>>>
>>>
>>> On 08/02/15 15:12, Jack Park wrote:
>>>
>>>> Sorry. While I am not new to Jena (I built systems with it in the previous
>>>> decade when it was much simpler to use), I suppose I get confused easily.
>>>> The code snippet I showed was found all over the web in tutorials, and so
>>>> forth. I spent some time reading around in the Jena source code and find
>>>> it hard to believe that "there's nothing to write a log..." since several
>>>> classes are loaded and exercised just by calling
>>>> TDBFactory.createDataset.
>>>>
>>>> But, I confess that I don't see anything in that code which suggests it
>>>> should actually read the database itself (though that might be the case).
>>>> Still, it created a 238+ MB log file before crashing.
>>>>
>>>> Rooting around in the Fuseki code to see how it boots a given database is
>>>> truly difficult. Maybe someone familiar with the code can explain it.
>>>>
>>>> Maybe there is a better way to boot a given TDB database and create an
>>>> OntModel from that?
>>>>
>>>>
>>>>
>>>> On Sun, Feb 8, 2015 at 6:27 AM, Dave Reynolds <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>>> On 08/02/15 13:55, Jack Park wrote:
>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> Thanks for your response.
>>>>>> I should have stated more clearly that the code I did show is *all* the
>>>>>> code that is running. That snippet:
>>>>>>
>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>
>>>>>> is what is running when the system gets an OutOfMemory error. Even with
>>>>>> a 4gb heap, it still blows. All the code which does "begin",
>>>>>> "Model = ...", and so forth has been commented out.
>>>>>>
>>>>>> The behavior according to the log is that somewhere in the createDataset
>>>>>> code, it is reading every class in the ontology stored in the TDB
>>>>>> database it is, I presume, opening.
>>>>>>
>>>>>>
>>>>> What log? If that is literally the only code line then there's nothing to
>>>>> write a log and certainly nothing that will go round trying to read
>>>>> classes.
>>>>>
>>>>> Dave
>>>>>
>>>>>> That's the current puzzle. It's almost as if there is some system
>>>>>> property I need to set somewhere which tells it that this is not an
>>>>>> in-memory event.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Jack
>>>>>>
>>>>>> On Sun, Feb 8, 2015 at 2:06 AM, Dave Reynolds <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On 07/02/15 22:18, Jack Park wrote:
>>>>>>>
>>>>>>>> I used Jena to load, on behalf of Fuseki, a collection of owl files.
>>>>>>>> There might be 4gb of data all totaled in there.
>>>>>>>>
>>>>>>>> Now, rather than use Fuseki to access that data, I am writing code
>>>>>>>> which will use a Dataset opened against that database to create an
>>>>>>>> OntModel.
>>>>>>>>
>>>>>>>> I use this code, taken from a variety of sources:
>>>>>>>>
>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>
>>>>>>>> where dbPath points to the directory where Jena made the database.
>>>>>>>>
>>>>>>>> When I boot Fuseki against that data, it boots quickly and without any
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> When I run that code against the same data, firstly, it blossoms a
>>>>>>>> 260 MB logfile, showing all the ont classes it is reading. Then, it
>>>>>>>> runs out of heap space and crashes.
>>>>>>>
>>>>>>> Simply accessing data in a TDBDataset won't load it all into memory, so
>>>>>>> the problem will be in how you are creating the OntModels.
>>>>>>>
>>>>>>> Since you don't show "that code" it is hard to know what the problem is.
>>>>>>>
>>>>>>> It *might* be you have dynamic imports processing switched on and so
>>>>>>> your OntModels are going out to the original sources and reloading them.
>>>>>>>
>>>>>>> It is possible to do imports processing but have the imports be found as
>>>>>>> database models [1] but in your case since you have all the ontologies
>>>>>>> in there anyway then I would just switch off all imports processing.
>>>>>>>
>>>>>>> Or it might be nothing to do with imports processing but a bug in how
>>>>>>> you are creating the OntModels. Not enough information to tell.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> [1] There used to be a somewhat old example of how to do this in the
>>>>>>> documentation but I can't find it in the current web site.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
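
Dave's suggestion at the start of the thread, switching off imports
processing when the OntModel is built over the TDB default model, could look
roughly like the sketch below. It is an illustration under assumptions, not
the code from this thread: the class name OntModelNoImports is made up, the
spec OWL_MEM (no reasoner) is a guess at what suits a large store, and the
imports again assume Jena 2.x package names (org.apache.jena... in Jena 3
and later). If imports processing is what triggers the readModel/GET lines
in the log above, this should stop those fetches.

```java
import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntDocumentManager;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.util.iterator.ExtendedIterator;

// Hypothetical class name, for illustration only.
public class OntModelNoImports {
    public static void main(String[] args) {
        // args[0] is assumed to be the TDB database directory.
        Dataset dataset = TDBFactory.createDataset(args[0]);

        // Copy the shared spec so the global OWL_MEM constant is not modified.
        OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
        OntDocumentManager docManager = new OntDocumentManager();
        // Do not follow owl:imports out to the original source documents.
        docManager.setProcessImports(false);
        spec.setDocumentManager(docManager);

        dataset.begin(ReadWrite.READ);
        try {
            // Wrap the TDB-backed default model; the data stays in the database.
            OntModel ont = ModelFactory.createOntologyModel(spec, dataset.getDefaultModel());

            // Touch the model inside the transaction: print one class, if any.
            ExtendedIterator<OntClass> classes = ont.listClasses();
            try {
                if (classes.hasNext()) {
                    System.out.println("First class: " + classes.next());
                }
            } finally {
                classes.close();
            }
        } finally {
            dataset.end();
        }
    }
}
```

As with the plain Model, the OntModel is created and used between begin()
and end() rather than being handed out after the transaction has finished.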
