On 08/02/15 16:09, Jack Park wrote:
I could create a gist of the code, but here is the core; the database
I am using would be needed as well. It turns out that my dyslexia (my
bad) convinced me that I had fully commented out making the model;
that's not true. Just running that snippet of code, createDataset runs
fine and creates a 12k log file.

That means my original question was the wrong one after all. The issue
resides in making the model; here is that code:

public Model getModel() {
    Model m = null;
    Dataset dataset = TDBFactory.createDataset(dbPath);

This will not cause OOME unless you have set the heap very low.

Are you on a 32-bit machine?

For either a 32- or 64-bit JVM, a 1.5G heap is fine for this call.
Actually, much smaller will work - all the caches are allocated
lazily.

    dataset.begin(ReadWrite.READ);
    m = dataset.getDefaultModel();
    dataset.end();
    return m;
}

That code has a transaction leak - "m" is the model inside the
transaction, but you then call dataset.end(), so "m" is handed back
for use outside the transaction that created it.

Get the model each time, inside the transaction, with
dataset.getDefaultModel() - it's a cheap operation.
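
A minimal sketch of that pattern, assuming the com.hp.hpl.jena
packages of the Jena 2.x era (the class name and the size() call are
illustrative):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TdbReader {
    private final Dataset dataset;

    public TdbReader(String dbPath) {
        // Opening the dataset is cheap; TDB does not pull the data
        // into memory at this point.
        this.dataset = TDBFactory.createDataset(dbPath);
    }

    // Fetch the model inside each READ transaction and use it only
    // there, rather than returning it past dataset.end().
    public long countDefaultModel() {
        dataset.begin(ReadWrite.READ);
        try {
            Model m = dataset.getDefaultModel(); // cheap per-call
            return m.size();
        } finally {
            dataset.end();
        }
    }
}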


The log file blooms rather rapidly, including a NullPointerException
from a "Mangled prefix map: graph name=" error (perhaps that's because
the mechanism I used to load Fuseki did not actually name a graph?).

A fragment of that log is this:

DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)

Who is calling readModel?  It's not TDB.

Something is calling FileManager operations (if it is FileManager.readModel)

TDB does not call FileManager.

You could put a break point on FileManager.readModel to get a call stack.
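
If running under a debugger is awkward, a rough alternative sketch -
an assumption, not the known cause: it only helps if the caller goes
through the global FileManager, and the class and method names here
are illustrative (Jena 2.x packages):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

// Hypothetical diagnostic: a FileManager that prints the call stack
// whenever readModel is invoked, then delegates to the normal code.
public class TracingFileManager extends FileManager {
    public TracingFileManager() {
        addLocatorFile(); // resolve plain file names
        addLocatorURL();  // resolve http: URLs, as in the log above
    }

    @Override
    public Model readModel(Model model, String filenameOrURI) {
        Thread.dumpStack(); // who is calling readModel?
        return super.readModel(model, filenameOrURI);
    }

    public static void install() {
        FileManager.setGlobalFileManager(new TracingFileManager());
    }
}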

        Andy

DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl, null)
DEBUG 2015-02-08 07:57:14,524 [main] - Not mapped:
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
DEBUG 2015-02-08 07:57:14,525 [main] - open(
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
DEBUG 2015-02-08 07:57:14,525 [main] - Not mapped:
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
DEBUG 2015-02-08 07:57:14,629 [main] - [1] GET
http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
DEBUG 2015-02-08 07:57:15,396 [main] - Connection request: [route: {}->
http://ncicb.nci.nih.gov][total kept alive: 0; route allocated: 0 of 5;
total allocated: 0 of 10]


On Sun, Feb 8, 2015 at 7:48 AM, Dave Reynolds <[email protected]>
wrote:

Hi Jack,

There's nothing wrong with the line of code that I can see. That is the
normal way to open a TDB dataset. TDB does not load all the data into
memory and there are no magic flags to set.

What I'm asking is what log file you are talking about.

I don't see why a program with just that line of code and nothing else
should be creating any log files at all, so I'm not clear on what log
file you are talking about. I guess you might have a log4j.properties
file on your classpath set to DEBUG level and writing to a file
instead of stdout/err, but even that is not going to result in
hundreds of MB of output.
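
For illustration, a hypothetical log4j 1.x properties file of that
kind - the appender and file names are made up, though the pattern
matches the log lines quoted above:

# Hypothetical log4j.properties - root logger at DEBUG, writing to a
# file instead of stdout/err.
log4j.rootLogger=DEBUG, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=jena.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
# Emits lines like: DEBUG 2015-02-08 07:57:14,524 [main] - message
log4j.appender.file.layout.ConversionPattern=%-5p %d [%t] - %m%n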

Can you provide a complete minimal example of what you are doing?

Dave


On 08/02/15 15:12, Jack Park wrote:

Sorry. While I am not new to Jena (I built systems with it in the
previous decade, when it was much simpler to use), I suppose I get
confused easily. The code snippet I showed was found all over the web
in tutorials and so forth. I spent some time reading around in the
Jena source code and find it hard to believe that "there's nothing to
write a log...", since several classes are loaded and exercised just
by calling TDBFactory.createDataset.

But I confess that I don't see anything in that code which suggests it
should actually read the database itself (though that might be the
case). Still, it created a 238+ MB log file before crashing.

Rooting around in the Fuseki code to see how it boots a given database is
truly difficult. Maybe someone familiar with the code can explain it.

Maybe there is a better way to boot a given TDB database and create an
OntModel from that?



On Sun, Feb 8, 2015 at 6:27 AM, Dave Reynolds <[email protected]>
wrote:

  On 08/02/15 13:55, Jack Park wrote:

  Hi Dave,

Thanks for your response.
I should have stated more clearly that the code I did show is *all* the
code that is running. That snippet:

Dataset dataset = TDBFactory.createDataset(dbPath);

is what is running when the system gets an OutOfMemory error. Even
with a 4 GB heap, it still blows. All the code which does "begin",
"Model m = ...", and so forth has been commented out.

The behavior according to the log is that somewhere in the
createDataset code, it is reading every class in the ontology stored
in the TDB database it is, I presume, opening.


What log? If that is literally the only code line then there's nothing to
write a log and certainly nothing that will go round trying to read
classes.

Dave


That's the current puzzle. It's almost as if there is some system
property I need to set somewhere which tells it that this is not an
in-memory event.

Thoughts?

Jack

On Sun, Feb 8, 2015 at 2:06 AM, Dave Reynolds <[email protected]>
wrote:

   On 07/02/15 22:18, Jack Park wrote:


I used Jena to load, on behalf of Fuseki, a collection of OWL files.
There might be 4 GB of data all totaled in there.

Now, rather than use Fuseki to access that data, I am writing code
which will use a Dataset opened against that database to create an
OntModel.

I use this code, taken from a variety of sources:

Dataset dataset = TDBFactory.createDataset(dbPath);

where dbPath points to the directory where Jena made the database.

When I boot Fuseki against that data, it boots quickly and without any
issues.

When I run that code against the same data, firstly it blossoms a
logfile of 260 MB, showing all the ont classes it is reading. Then it
runs out of heap space and crashes.



Simply accessing data in a TDB Dataset won't load it all into memory,
so the problem will be in how you are creating the OntModels.

Since you don't show "that code", it is hard to know what the problem
is.

It *might* be that you have dynamic imports processing switched on,
and so your OntModels are going out to the original sources and
reloading them.

It is possible to do imports processing but have the imports be found
as database models [1], but in your case, since you have all the
ontologies in there anyway, I would just switch off all imports
processing.
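
A minimal sketch of that, again assuming the Jena 2.x packages;
OWL_MEM (no reasoner) and the class listing are illustrative choices:

import com.hp.hpl.jena.ontology.OntDocumentManager;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.tdb.TDBFactory;

public class OntModelOverTdb {
    public static void main(String[] args) {
        // args[0]: path to the TDB database directory
        Dataset dataset = TDBFactory.createDataset(args[0]);
        dataset.begin(ReadWrite.READ);
        try {
            // Private document manager, so the global instance is
            // untouched; the imported ontologies are already loaded
            // into TDB, so there is nothing to fetch from the web.
            OntDocumentManager dm = new OntDocumentManager();
            dm.setProcessImports(false);

            OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
            spec.setDocumentManager(dm);

            OntModel ont = ModelFactory.createOntologyModel(
                    spec, dataset.getDefaultModel());
            System.out.println("Classes: "
                    + ont.listClasses().toList().size());
        } finally {
            dataset.end();
        }
    }
}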

Or it might be nothing to do with imports processing, but a bug in how
you are creating the OntModels. Not enough information to tell.

Dave

[1] There used to be a somewhat old example of how to do this in the
documentation, but I can't find it on the current web site.
