Hi Alessandro,

It's much better now. Disk usage in the sling/felix/bundle85/data/tdb-data/mgraph folder is steady at 1.4M.

Open files were at ~700 at start-up and increased to ~1,600 after two tests. Now, after each test, they jump to ~2,100 and then drop back to ~1,600.

It's also much faster than before. I'll continue testing now with our customized ruleset.

BR
David

On Fri, Mar 16, 2012 at 7:38 PM, Alessandro Adamou <[email protected]> wrote:

> Hi David,
>
> after quite some work today I rewrote part of the Refactor Engine to avoid
> creating useless graphs.
>
> Many were blank ontologies created along with the SEO scope. They are no
> longer created.
>
> Many of the other graphs that you see are due to the fact that the engine
> merges together the entity signatures into an OntoNet session. Every such
> signature ends up resulting in its own ontology and therefore a graph in
> Clerezza/TDB.
>
> I have not modified this second behaviour, but I have seen to it that the
> refactor engine now destroys its own session *and its contents* when
> computeEnhancements() completes. This means a lot of space occupied during
> analysis but freed up right thereafter.
>
> It's more brutal than I wanted it to be, but a better implementation will
> come up once I add a couple new features to OntoNet that should make the
> process more reasonable.
>
> On the upside, the engine code is now smaller by some 250 lines.
>
> It would be super if you could update and try it out.
>
> Thanks
>
> Alessandro
>
> P.S. now I'm glad I added the "ontonet" prefix to those graph names...
>
> On 3/16/12 12:40 PM, David Riccitelli wrote:
>
>>> From what I've seen so far, yes. But it could depend on your engine
>>> configuration using a richer set of rules.
>>
>> Same thing happens when we use the default rules set (seo_rules.sem) from
>> SVN.
>>
>> We did not customize any other part of the installation with the exception
>> of loading a local DBpedia index in sling/datafiles.
>>
>> David
>>
>> On Fri, Mar 16, 2012 at 12:27 PM, Alessandro Adamou <[email protected]> wrote:
>>
>>> On 3/16/12 11:16 AM, David Riccitelli wrote:
>>>
>>>> Is this issue happening to us only?
>>>
>>> From what I've seen so far, yes. But it could depend on your engine
>>> configuration using a richer set of rules.
>>>
>>> Alessandro
>>>
>>>> On Fri, Mar 16, 2012 at 12:12 PM, Alessandro Adamou <[email protected]> wrote:
>>>>
>>>>> One thing that it would be great to do is to detect the ontology ID
>>>>> *before* creating the TripleCollection in Clerezza, so any mappings
>>>>> could be done before storing.
>>>>>
>>>>> But I don't know how this can be done with not so much code.
>>>>>
>>>>> Perhaps creating an IndexedGraph, exploring its content, then creating
>>>>> the Graph in the TcManager with the same content and the right graph
>>>>> name, then finally clearing the IndexedGraph could work.
>>>>>
>>>>> But it still means having twice the resource usage (disk+memory) for a
>>>>> period.
>>>>>
>>>>> Alessandro
>>>>>
>>>>> On 3/16/12 10:56 AM, Alessandro Adamou wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> well, I guess that depends pretty much on how heavy the usage of
>>>>>> OntoNet is in your Stanbol installation.
>>>>>>
>>>>>> Those are graphs created when OntoNet has to load an ontology from its
>>>>>> content rather than from a Web URI, so it cannot know the ontology ID
>>>>>> earlier.
>>>>>>
>>>>>> This happens e.g. by POSTing the ontology as the payload or by
>>>>>> passing a GraphContentInputSource to the Java API.
>>>>>>
>>>>>> Now I do not know why these graphs are created (perhaps the refactor
>>>>>> engine could be loading some), but I do know that a Clerezza graph in
>>>>>> Jena TDB occupies a LOT of disk space.
>>>>>>
>>>>>> Suffice it to say that my bundle had stored nine graphs of <100
>>>>>> triples each. Their disk space was about 1.8 GB, but when I tried to
>>>>>> make a zipfile out of it, it came out as about 2MB!
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>> On 3/16/12 10:30 AM, David Riccitelli wrote:
>>>>>>
>>>>>>> Dears,
>>>>>>>
>>>>>>> As I ran into disk issues, I found that this folder:
>>>>>>> sling/felix/bundleXXX/data/tdb-data/mgraph
>>>>>>>
>>>>>>> where XX is the bundle of:
>>>>>>> Clerezza - SCB Jena TDB Storage Provider
>>>>>>> org.apache.clerezza.rdf.jena.tdb.storage
>>>>>>>
>>>>>>> took almost 70 gbytes of disk space (then the disk space has been
>>>>>>> exhausted).
>>>>>>>
>>>>>>> These are some of the files I found inside:
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>>>>>
>>>>>>> Any clues?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David Riccitelli
>>>>>
>>>>> --
>>>>> M.Sc. Alessandro Adamou
>>>>>
>>>>> Alma Mater Studiorum - Università di Bologna
>>>>> Department of Computer Science
>>>>> Mura Anteo Zamboni 7, 40127 Bologna - Italy
>>>>>
>>>>> Semantic Technology Laboratory (STLab)
>>>>> Institute for Cognitive Science and Technology (ISTC)
>>>>> National Research Council (CNR)
>>>>> Via Nomentana 56, 00161 Rome - Italy
>>>>>
>>>>> "I will give you everything, so long as you do not demand anything."
>>>>> (Ettore Petrolini, 1930)
>>>>>
>>>>> Not sent from my iSnobTechDevice

--
David Riccitelli
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************
