Thanks! Much food for thought here. I wrote no code that uses FileManager, so I have no clue where that comes from. I gather, then, that I create the dataset first, then just getDefaultModel each time I need it -- which, in theory, is just once, when I create the OntModel (not in that code).
On Sun, Feb 8, 2015 at 9:28 AM, Andy Seaborne <[email protected]> wrote:

> On 08/02/15 16:09, Jack Park wrote:
>
>> I could create a gist of the code, but here is the core. What would be
>> needed is the database I am using as well. It turns out that my dyslexia
>> (my bad) told me I had fully commented out making the model; that's not
>> true. Just running that snippet of code: createDataset runs fine and
>> creates a 12k log file.
>>
>> That means that my original question was a bad one after all. The issue
>> resides in making the model: here is that code:
>>
>> public Model getModel() {
>>     Model m = null;
>>     Dataset dataset = TDBFactory.createDataset(dbPath) ;
>
> This will not cause OOME unless you have set the heap very low.
>
> Are you on a 32 bit machine?
>
> For either 32 or 64 bit heap, a 1.5 G heap is fine for this call.
> Actually, much smaller will work - all the caches are delayed.
>
>>     dataset.begin(ReadWrite.READ);
>>     m = dataset.getDefaultModel();
>>     dataset.end();
>>     return m;
>> }
>
> That code has a transaction leak - "m" is the model in the transaction ...
> but you said dataset.end();
>
> Get the model each time with dataset.getDefaultModel();
> (it's a cheap operation).
>
>> The log file blooms rather rapidly (including a nullpointer from a
>> "Mangled prefix map: graph name=" error (perhaps that's because the
>> mechanism I used to load Fuseki did not actually name a graph?
>>
>> A fragment of that log is this
>>
>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>
> Who is called readModel? It's not TDB.
>
> Something is calling FileManager operations (if it is
> FileManager.readModel)
>
> TDB does not call FileManager.
>
> You could put a break point on FileManager.readModel to get a call stack.
>
> Andy
>
>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl, null)
>> DEBUG 2015-02-08 07:57:14,524 [main] - Not mapped:
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:14,525 [main] - open(
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>> DEBUG 2015-02-08 07:57:14,525 [main] - Not mapped:
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:14,629 [main] - [1] GET
>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>> DEBUG 2015-02-08 07:57:15,396 [main] - Connection request: [route: {}->
>> http://ncicb.nci.nih.gov][total kept alive: 0; route allocated: 0 of 5;
>> total allocated: 0 of 10]
>>
>> On Sun, Feb 8, 2015 at 7:48 AM, Dave Reynolds <[email protected]> wrote:
>>
>>> Hi Jack,
>>>
>>> There's nothing wrong with the line of code that I can see. That is the
>>> normal way to open a TDB dataset. TDB does not load all the data into
>>> memory and there are no magic flags to set.
>>>
>>> What I'm asking is what log file you are talking about.
>>>
>>> I don't see why a program with just that line of code and nothing else
>>> should be creating any log files at all, so I'm not clear on what log
>>> file you are talking about. I guess you might have a log4j.properties
>>> file in your path set to DEBUG level and writing to a file instead of
>>> stdout/err but that's still not going result in hundreds of Mb of
>>> output.
>>>
>>> Can you provide a complete minimal example of what you are doing?
>>>
>>> Dave
>>>
>>> On 08/02/15 15:12, Jack Park wrote:
>>>
>>>> Sorry. While I am not new to Jena (I built systems with it in the
>>>> previous decade when it was much simpler to use), I suppose I get
>>>> confused easily. The code snippet I showed was found all over the web
>>>> in tutorials, and so forth. I spent some time reading around in the
>>>> Jena source code and find it hard to believe that "there's nothing to
>>>> write a log..." since several classes are loaded and exercised just by
>>>> calling TDBFactory.createDataset.
>>>>
>>>> But, I confess that I don't see anything in that code which suggests it
>>>> should actually read the database itself (though that might be the
>>>> case). Still it created a 238+mb log file before crashing.
>>>>
>>>> Rooting around in the Fuseki code to see how it boots a given database
>>>> is truly difficult. Maybe someone familiar with the code can explain it.
>>>>
>>>> Maybe there is a better way to boot a given TDB database and create an
>>>> OntModel from that?
>>>>
>>>> On Sun, Feb 8, 2015 at 6:27 AM, Dave Reynolds <[email protected]> wrote:
>>>>
>>>>> On 08/02/15 13:55, Jack Park wrote:
>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> Thanks for your response.
>>>>>> I should have stated more clearly that the code I did show is *all*
>>>>>> the code that is running. That snippet:
>>>>>>
>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>
>>>>>> is what is running when the system gets an OutOfMemory error. Even
>>>>>> with a 4gb heap, it still blows. All the code which does "begin",
>>>>>> Model = ...," and so forth has been commented out.
>>>>>>
>>>>>> The behavior according to the log is that somewhere in the
>>>>>> createDataset code, it is reading every class in the ontology stored
>>>>>> in the TDB database it is, I presume, opening.
>>>>>
>>>>> What log? If that is literally the only code line then there's
>>>>> nothing to write a log and certainly nothing that will go round
>>>>> trying to read classes.
>>>>>
>>>>> Dave
>>>>>
>>>>>> That's the current puzzle. It's almost as if there is some system
>>>>>> property I need to set somewhere which tells it that this is not an
>>>>>> in-memory event.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Jack
>>>>>>
>>>>>> On Sun, Feb 8, 2015 at 2:06 AM, Dave Reynolds <[email protected]> wrote:
>>>>>>
>>>>>>> On 07/02/15 22:18, Jack Park wrote:
>>>>>>>
>>>>>>>> I used the Jena to load, on behalf of Fuseki, a collection of owl
>>>>>>>> files.
>>>>>>>>
>>>>>>>> There might be 4gb of data all totaled in there.
>>>>>>>>
>>>>>>>> Now, rather than use Fuseki to access that data, I am writing code
>>>>>>>> which will use a Dataset opened against that database to create an
>>>>>>>> OntModel.
>>>>>>>>
>>>>>>>> I use this code, taken from a variety of sources:
>>>>>>>>
>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>
>>>>>>>> where dbPath points to the directory where Jena made the database.
>>>>>>>>
>>>>>>>> When I boot Fuseki against that data, it boots quickly and without
>>>>>>>> any issues.
>>>>>>>>
>>>>>>>> When I run that code against the same data, firstly, it blossoms a
>>>>>>>> logfile 260 mb, showing all the ont classes it is reading. Then, it
>>>>>>>> runs out of heap space and crashes.
>>>>>>>
>>>>>>> Simply accessing data in a TDBDataset won't load it all into memory
>>>>>>> so the problem will be in how you are creating the OntModels.
>>>>>>>
>>>>>>> Since you don't show "that code" it is hard know what the problem is.
>>>>>>>
>>>>>>> It *might* be you have dynamic imports processing switched on and so
>>>>>>> your OntModels are going out to the original sources and reloading
>>>>>>> them.
>>>>>>>
>>>>>>> It is possible to do imports processing but have the imports be found
>>>>>>> as database models [1] but in your case since you have all the
>>>>>>> ontologies in there anyway then I would just switch off all imports
>>>>>>> processing.
>>>>>>>
>>>>>>> Or it might be nothing to do with imports processing but a bug in how
>>>>>>> you are creating the OntModels. Not enough information to tell.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> [1] There used to be a somewhat old example of how to do this in the
>>>>>>> documentation but I can't find it in the current web site.
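
On the heap questions raised in the quoted thread ("unless you have set the heap very low", "Are you on a 32 bit machine?"), one quick way to see what the running JVM actually has is a small check like the sketch below. `HeapCheck` is a hypothetical class name, not part of Jena; it uses only the standard `Runtime.maxMemory()` and `System.getProperty("os.arch")` calls, and the `os.arch` value is only a hint at whether the JVM is 32 or 64 bit.

```java
// Hypothetical helper (not part of Jena): reports the heap this JVM may use.
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB (normally set by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary; os.arch is only a hint at 32 vs 64 bit.
    static String describe() {
        return "max heap: " + maxHeapMiB() + " MiB, os.arch: "
                + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this with the same JVM options as the failing program shows whether the heap setting being passed is the one actually in effect.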
