The customized ruleset is working as well... I'll keep it running and see that it is stable.
I experienced another issue, which is unrelated, so I'll open a different thread.

Thanks for your help!
David

On Sat, Mar 17, 2012 at 10:01 AM, David Riccitelli <[email protected]> wrote:

> Hi Alessandro,
>
> It's much better now. Disk usage in the
> sling/felix/bundle85/data/tdb-data/mgraph folder is steady at 1.4M.
>
> Open files were at ~700 at start-up; they increased to ~1,600 after two
> tests. Now after each test they jump to ~2,100 and then decrease back to
> ~1,600.
>
> And it's also much faster than before.
>
> I'll continue testing now with our customized ruleset.
>
> BR
> David
>
> On Fri, Mar 16, 2012 at 7:38 PM, Alessandro Adamou <[email protected]> wrote:
>
>> Hi David,
>>
>> after quite some work today I rewrote part of the Refactor Engine to
>> avoid creating useless graphs.
>>
>> Many were blank ontologies created along with the SEO scope. They are no
>> longer created.
>>
>> Many of the other graphs that you see are due to the fact that the engine
>> merges together the entity signatures into an OntoNet session. Every such
>> signature ends up resulting in its own ontology and therefore a graph in
>> Clerezza/TDB.
>>
>> I have not modified this second behaviour, but I have seen to it that the
>> refactor engine now destroys its own session *and its contents* when
>> computeEnhancements() completes. This means a lot of space is occupied
>> during analysis but freed up right thereafter.
>>
>> It's more brutal than I wanted it to be, but a better implementation will
>> come up once I add a couple of new features to OntoNet that should make
>> the process more reasonable.
>>
>> On the upside, the engine code is now smaller by some 250 lines.
>>
>> It would be super if you could update and try it out.
>>
>> Thanks
>>
>> Alessandro
>>
>> P.S. now I'm glad I added the "ontonet" prefix to those graph names...
>>
>> On 3/16/12 12:40 PM, David Riccitelli wrote:
>>
>>>> From what I've seen so far, yes.
>>>> But it could depend on your engine
>>>> configuration using a richer set of rules.
>>>
>>> Same thing happens when we use the default rules set (seo_rules.sem) from
>>> SVN.
>>>
>>> We did not customize any other part of the installation with the
>>> exception of loading a local DBpedia index in sling/datafiles.
>>>
>>> David
>>>
>>> On Fri, Mar 16, 2012 at 12:27 PM, Alessandro Adamou <[email protected]> wrote:
>>>
>>>> On 3/16/12 11:16 AM, David Riccitelli wrote:
>>>>
>>>>> Is this issue happening to us only?
>>>>
>>>> From what I've seen so far, yes. But it could depend on your engine
>>>> configuration using a richer set of rules.
>>>>
>>>> Alessandro
>>>>
>>>>> On Fri, Mar 16, 2012 at 12:12 PM, Alessandro Adamou <[email protected]> wrote:
>>>>>
>>>>>> One thing that it would be great to do is to detect the ontology ID
>>>>>> *before* creating the TripleCollection in Clerezza, so any mappings
>>>>>> could be done before storing.
>>>>>>
>>>>>> But I don't know how this can be done with not so much code.
>>>>>>
>>>>>> Perhaps creating an IndexedGraph, exploring its content, then creating
>>>>>> the Graph in the TcManager with the same content and the right graph
>>>>>> name, then finally clearing the IndexedGraph could work.
>>>>>>
>>>>>> But it still means having twice the resource usage (disk+memory) for a
>>>>>> period.
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>> On 3/16/12 10:56 AM, Alessandro Adamou wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> well, I guess that depends pretty much on how heavy the usage of
>>>>>>> OntoNet is in your Stanbol installation.
>>>>>>>
>>>>>>> Those are graphs created when OntoNet has to load an ontology from
>>>>>>> its content rather than from a Web URI, so it cannot know the
>>>>>>> ontology ID earlier.
>>>>>>>
>>>>>>> This happens e.g. by POSTing the ontology as the payload or by
>>>>>>> passing a GraphContentInputSource to the Java API.
>>>>>>>
>>>>>>> Now I do not know why these graphs are created (perhaps the refactor
>>>>>>> engine could be loading some), but I do know that a Clerezza graph in
>>>>>>> Jena TDB occupies a LOT of disk space.
>>>>>>>
>>>>>>> Suffice it to say that my bundle had stored nine graphs of <100
>>>>>>> triples each. Their disk space was about 1.8 GB, but when I tried to
>>>>>>> make a zipfile out of it, it came out as about 2 MB!
>>>>>>>
>>>>>>> Alessandro
>>>>>>>
>>>>>>> On 3/16/12 10:30 AM, David Riccitelli wrote:
>>>>>>>
>>>>>>>> Dears,
>>>>>>>>
>>>>>>>> As I ran into disk issues, I found that this folder:
>>>>>>>> sling/felix/bundleXXX/data/tdb-data/mgraph
>>>>>>>>
>>>>>>>> where XXX is the bundle of:
>>>>>>>> Clerezza - SCB Jena TDB Storage Provider
>>>>>>>> org.apache.clerezza.rdf.jena.tdb.storage
>>>>>>>>
>>>>>>>> took almost 70 GB of disk space (at which point the disk space was
>>>>>>>> exhausted).
>>>>>>>>
>>>>>>>> These are some of the files I found inside:
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>>>>>>
>>>>>>>> Any clues?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David Riccitelli
>>>>>>>>
>>>>>>>> ********************************************************************************
>>>>>>>> InsideOut10 s.r.l.
>>>>>>>> P.IVA: IT-11381771002
>>>>>>>> Fax: +39 0110708239
>>>>>>>> ---
>>>>>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>>>>>> Twitter: ziodave
>>>>>>>> ---
>>>>>>>> Layar Partner Network
>>>>>>>> <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>>>>>>> ********************************************************************************
>>>>>>
>>>>>> --
>>>>>> M.Sc. Alessandro Adamou
>>>>>>
>>>>>> Alma Mater Studiorum - Università di Bologna
>>>>>> Department of Computer Science
>>>>>> Mura Anteo Zamboni 7, 40127 Bologna - Italy
>>>>>>
>>>>>> Semantic Technology Laboratory (STLab)
>>>>>> Institute for Cognitive Science and Technology (ISTC)
>>>>>> National Research Council (CNR)
>>>>>> Via Nomentana 56, 00161 Rome - Italy
>>>>>>
>>>>>> "I will give you everything, so long as you do not demand anything."
>>>>>> (Ettore Petrolini, 1930)
>>>>>>
>>>>>> Not sent from my iSnobTechDevice
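
For anyone who wants to keep an eye on the mgraph folder discussed in this thread, here is a minimal sketch of how the per-graph disk usage could be listed. It is not part of Stanbol or Clerezza: the function names `graph_disk_usage` and `largest_graphs` are made up for illustration, and it assumes that each entry in the folder is named after the percent-encoded graph name, as in the listing quoted above. It sums apparent file sizes, so the numbers may differ from what `du` reports if the TDB files are sparse.

```python
import os
from urllib.parse import unquote


def graph_disk_usage(mgraph_dir):
    """Return {decoded graph name: apparent size in bytes} for each entry.

    Assumption: entry names are percent-encoded graph names, e.g.
    'ontonet%3A%3Ainputstream%3Aontology889'.
    """
    usage = {}
    for entry in os.scandir(mgraph_dir):
        if entry.is_file():
            total = entry.stat().st_size
        else:
            # Sum every file below the graph's own folder.
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
        usage[unquote(entry.name)] = total
    return usage


def largest_graphs(mgraph_dir, prefix="", limit=10):
    """Return up to `limit` (name, bytes) pairs, largest first.

    `prefix` filters on the decoded name, e.g. 'ontonet::' for the
    graphs created by OntoNet.
    """
    usage = graph_disk_usage(mgraph_dir)
    matching = [(n, s) for n, s in usage.items() if n.startswith(prefix)]
    matching.sort(key=lambda pair: pair[1], reverse=True)
    return matching[:limit]
```

Calling `largest_graphs("sling/felix/bundle85/data/tdb-data/mgraph", prefix="ontonet::")` before and after an enhancement run would show whether the session graphs are really being removed once computeEnhancements() completes.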
