I have no clue how it is loading the owl file since it's not anywhere in that physical space.
Right now, I am not even building an OntModel, just a Model. Still, it's an open issue why it thinks it's supposed to be loading an owl file; there are URIs in the TDB database with that name in them. Lots to think about.

Many thanks!

On Sun, Feb 8, 2015 at 10:27 AM, Andy Seaborne <[email protected]> wrote:

> On 08/02/15 17:39, Jack Park wrote:
>
>> Thanks! Much food for thought here.
>> I wrote no code that uses FileManager, so I have no clue where that comes
>> from.
>> I gather, then, that I create the dataset first, then just getDefaultModel
>> each time I need it -- which, in theory, is just once, when I create the
>> OntModel (not in that code).
>>
>
> If the OntModel has inference, you may well run out of memory, as might
> large amounts of imports processing.
>
> You have to recreate the OntModel each time when used transactionally - it
> has a link to the database. You may wish to increase the scope of the
> transaction.
>
> This is consistent with your original report.
> """
> showing all the ont classes it is reading. Then, it runs out of
> heap space and crashes.
> """
>
> BiomedGT.owl is 130Mb. Something is loading it. It's not the database.
>
> Andy
>
>> On Sun, Feb 8, 2015 at 9:28 AM, Andy Seaborne <[email protected]> wrote:
>>
>> On 08/02/15 16:09, Jack Park wrote:
>>>
>>> I could create a gist of the code, but here is the core. What would be
>>>> needed is the database I am using as well. It turns out that my dyslexia
>>>> (my bad) told me I had fully commented out making the model; that's not
>>>> true. Just running that snippet of code: createDataset runs fine and
>>>> creates a 12k log file.
>>>>
>>>> That means that my original question was a bad one after all. The issue
>>>> resides in making the model: here is that code:
>>>>
>>>> public Model getModel() {
>>>> Model m = null;
>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>
>>> This will not cause OOME unless you have set the heap very low.
>>>
>>> Are you on a 32 bit machine?
>>>
>>> For either 32 or 64 bit heap, a 1.5 G heap is fine for this call.
>>> Actually, much smaller will work - all the caches are delayed.
>>>
>>> dataset.begin(ReadWrite.READ);
>>>> m = dataset.getDefaultModel();
>>>> dataset.end();
>>>> return m;
>>>> }
>>>>
>>> That code has a transaction leak - "m" is the model in the transaction
>>> ...
>>> but you said dataset.end();
>>>
>>> Get the model each time with dataset.getDefaultModel();
>>> (it's a cheap operation).
>>>
>>> The log file blooms rather rapidly (including a nullpointer from a
>>>> "Mangled
>>>> prefix map: graph name=" error (perhaps that's because the mechanism I
>>>> used
>>>> to load Fuseki did not actually name a graph?
>>>>
>>>> A fragment of that log is this
>>>>
>>>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>>>>
>>> Who is called readModel? It's not TDB.
>>>
>>> Something is calling FileManager operations (if it is
>>> FileManager.readModel)
>>>
>>> TDB does not call FileManager.
>>>
>>> You could put a break point on FileManager.readModel to get a call stack.
>>>
>>> Andy
>>>
>>> DEBUG 2015-02-08 07:57:14,524 [main] - readModel(model,
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl, null)
>>>> DEBUG 2015-02-08 07:57:14,524 [main] - Not mapped:
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:14,525 [main] - open(
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl)
>>>> DEBUG 2015-02-08 07:57:14,525 [main] - Not mapped:
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:14,629 [main] - [1] GET
>>>> http://ncicb.nci.nih.gov/xml/owl/EVS/BiomedGT.owl
>>>> DEBUG 2015-02-08 07:57:15,396 [main] - Connection request: [route: {}->
>>>> http://ncicb.nci.nih.gov][total kept alive: 0; route allocated: 0 of 5;
>>>> total allocated: 0 of 10]
>>>>
>>>> On Sun, Feb 8, 2015 at 7:48 AM, Dave Reynolds <
>>>> [email protected]>
>>>> wrote:
>>>>
>>>> Hi Jack,
>>>>>
>>>>> There's nothing wrong with the line of code that I can see. That is the
>>>>> normal way to open a TDB dataset. TDB does not load all the data into
>>>>> memory and there are no magic flags to set.
>>>>>
>>>>> What I'm asking is what log file you are talking about.
>>>>>
>>>>> I don't see why a program with just that line of code and nothing else
>>>>> should be creating any log files at all, so I'm not clear on what log
>>>>> file
>>>>> you are talking about. I guess you might have a log4j.properties file
>>>>> in
>>>>> your path set to DEBUG level and writing to a file instead of
>>>>> stdout/err
>>>>> but that's still not going result in hundreds of Mb of output.
>>>>>
>>>>> Can you provide a complete minimal example of what you are doing?
>>>>>
>>>>> Dave
>>>>>
>>>>> On 08/02/15 15:12, Jack Park wrote:
>>>>>
>>>>> Sorry. While I am not new to Jena (I built systems with it in the
>>>>>> previous
>>>>>> decade when it was much simpler to use), I suppose I get confused
>>>>>> easily.
>>>>>> The code snippet I showed was found all over the web in tutorials, and
>>>>>> so
>>>>>> forth. I spent some time reading around in the Jena source code and
>>>>>> find
>>>>>> it
>>>>>> hard to believe that "there's nothing to write a log..." since several
>>>>>> classes are loaded and exercised just by calling
>>>>>> TDBFactory.createDataset.
>>>>>>
>>>>>> But, I confess that I don't see anything in that code which suggests
>>>>>> it
>>>>>> should actually read the database itself (though that might be the
>>>>>> case).
>>>>>> Still it created a 238+mb log file before crashing.
>>>>>>
>>>>>> Rooting around in the Fuseki code to see how it boots a given database
>>>>>> is
>>>>>> truly difficult. Maybe someone familiar with the code can explain it.
>>>>>>
>>>>>> Maybe there is a better way to boot a given TDB database and create an
>>>>>> OntModel from that?
>>>>>>
>>>>>> On Sun, Feb 8, 2015 at 6:27 AM, Dave Reynolds <
>>>>>> [email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> On 08/02/15 13:55, Jack Park wrote:
>>>>>>
>>>>>>> Hi Dave,
>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>> I should have stated more clearly that the code I did show is *all*
>>>>>>>> the
>>>>>>>> code that is running. That snippet:
>>>>>>>>
>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>
>>>>>>>> is what is running when the system gets an OutOfMemory error. Even
>>>>>>>> with
>>>>>>>> a
>>>>>>>> 4gb heap, it still blows. All the code which does "begin", Model =
>>>>>>>> ...,"
>>>>>>>> and so forth has been commented out.
>>>>>>>>
>>>>>>>> The behavior according to the log is that somewhere in the
>>>>>>>> createDataset
>>>>>>>> code, it is reading every class in the ontology stored in the TDB
>>>>>>>> database
>>>>>>>> it is, I presume, opening.
>>>>>>>>
>>>>>>>> What log?
>>>>>>>> If that is literally the only code line then there's
>>>>>>> nothing to
>>>>>>> write a log and certainly nothing that will go round trying to read
>>>>>>> classes.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> That's the current puzzle. It's almost as if there is some system
>>>>>>> property
>>>>>>> I need to set somewhere which tells it that this is not an
>>>>>>> in-memory
>>>>>>>> event.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Jack
>>>>>>>>
>>>>>>>> On Sun, Feb 8, 2015 at 2:06 AM, Dave Reynolds <
>>>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 07/02/15 22:18, Jack Park wrote:
>>>>>>>>
>>>>>>>> I used the Jena to load, on behalf of Fuseki, a collection of
>>>>>>>>> owl
>>>>>>>>> files.
>>>>>>>>>
>>>>>>>>> There might be 4gb of data all totaled in there.
>>>>>>>>>
>>>>>>>>>> Now, rather than use Fuseki to access that data, I am writing code
>>>>>>>>>> which
>>>>>>>>>> will use a Dataset opened against that database to create an
>>>>>>>>>> OntModel.
>>>>>>>>>>
>>>>>>>>>> I use this code, taken from a variety of sources:
>>>>>>>>>>
>>>>>>>>>> Dataset dataset = TDBFactory.createDataset(dbPath) ;
>>>>>>>>>>
>>>>>>>>>> where dbPath points to the directory where Jena made the database.
>>>>>>>>>>
>>>>>>>>>> When I boot Fuseki against that data, it boots quickly and without
>>>>>>>>>> any
>>>>>>>>>> issues.
>>>>>>>>>>
>>>>>>>>>> When I run that code against the same data, firstly, it blossoms a
>>>>>>>>>> logfile
>>>>>>>>>> 260 mb, showing all the ont classes it is reading. Then, it
>>>>>>>>>> runs
>>>>>>>>>> out of
>>>>>>>>>> heap space and crashes.
>>>>>>>>>>
>>>>>>>>>> Simply accessing data in a TDBDataset won't load it all into
>>>>>>>>>> memory
>>>>>>>>>> so
>>>>>>>>> the
>>>>>>>>> problem will be in how you are creating the OntModels.
>>>>>>>>>
>>>>>>>>> Since you don't show "that code" it is hard know what the problem
>>>>>>>>> is.
>>>>>>>>>
>>>>>>>>> It *might* be you have dynamic imports processing switched on and
>>>>>>>>> so
>>>>>>>>> your
>>>>>>>>> OntModels are going out to the original sources and reloading them.
>>>>>>>>>
>>>>>>>>> It is possible to do imports processing but have the imports be
>>>>>>>>> found
>>>>>>>>> as
>>>>>>>>> database models [1] but in your case since you have all the
>>>>>>>>> ontologies
>>>>>>>>> in
>>>>>>>>> there anyway then I would just switch off all imports processing.
>>>>>>>>>
>>>>>>>>> Or it might be nothing to do with imports processing but a bug in
>>>>>>>>> how
>>>>>>>>> you
>>>>>>>>> are creating the OntModels. Not enough information to tell.
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> [1] There used to be a somewhat old example of how to do this in
>>>>>>>>> the
>>>>>>>>> documentation but I can't find it in the current web site.
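
On the heap question raised in the quoted thread (32 vs 64 bit, and whether the 4gb setting is really in effect): a minimal sketch, using only the JDK, that prints what the running JVM actually got. The class name HeapCheck is hypothetical, and the "sun.arch.data.model" property is HotSpot-specific, so it may be absent on other JVMs. Running it with the same launcher settings as the failing program would show whether the -Xmx value is being picked up.

```java
// Hypothetical helper, not part of Jena: reports the heap limit and
// pointer width of the JVM that is actually running the code.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /**
     * Best-effort pointer width ("32" or "64").
     * Assumption: the property is set by HotSpot; otherwise "unknown".
     */
    public static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("data model (bits): " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```

If the reported maximum is far below what was requested, the heap option is probably being set on a different launcher (for example an IDE run configuration) than the one that starts the program.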

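
On the log size: Dave's guess in the quoted thread was a log4j.properties on the path set to DEBUG and writing to a file. A minimal sketch of a quieter configuration follows, assuming log4j 1.x properties syntax and the Jena 2.x package prefix com.hp.hpl.jena; the appender name "stdout" and the choice of WARN are illustrative, not taken from the thread.

```properties
# Assumed log4j 1.x configuration: log WARN and above to the console only.
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-5p %d{ISO8601} [%t] - %m%n

# Assumed logger names: Jena 2.x classes and Apache HttpClient.
log4j.logger.com.hp.hpl.jena=WARN
log4j.logger.org.apache.http=WARN
```

This only shrinks the log; it would not stop whatever is fetching BiomedGT.owl.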