Hi Alessandro,

It's much better now. Disk usage in the
sling/felix/bundle85/data/tdb-data/mgraph folder is steady at 1.4M.
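
For anyone who wants to repeat the measurement per graph, here is a minimal sketch of how the sizes could be collected. It assumes each graph is stored as one sub-folder of the mgraph folder (as in the listing quoted below); the function name dir_sizes is just an illustration, not part of Stanbol or Clerezza. Note that it sums apparent file sizes, which can differ from what du reports for sparse files.

```python
import os
import sys


def dir_sizes(root):
    """Return {sub-folder name: total apparent size in bytes} for root.

    Hypothetical helper: assumes one sub-folder per stored graph.
    """
    sizes = {}
    for entry in os.scandir(root):
        # Only sub-folders are treated as graphs; loose files are ignored.
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks so nothing outside the folder is counted.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dir_sizes.py <path to the mgraph folder>
    for name, size in sorted(dir_sizes(sys.argv[1]).items()):
        print(f"{size / (1024 * 1024):8.1f}M  {name}")
```

Running it before and after a test run should show whether any graph folders are left behind.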

Open files were at ~700 at start-up and increased to ~1,600 after two
tests. Now after each test they jump to ~2,100 and then drop back to
~1,600.

And it's also much faster than before.

I'll continue testing now with our customized ruleset.

BR
David

On Fri, Mar 16, 2012 at 7:38 PM, Alessandro Adamou <[email protected]> wrote:

> Hi David,
>
> after quite some work today I rewrote part of the Refactor Engine to avoid
> creating useless graphs.
>
> Many were blank ontologies created along with the SEO scope. They are no
> longer created.
>
> Many of the other graphs that you see are due to the fact that the engine
> merges together the entity signatures into an OntoNet session. Every such
> signature ends up resulting in its own ontology and therefore a graph in
> Clerezza/TDB.
>
> I have not modified this second behaviour, but I have seen to it that the
> refactor engine now destroys its own session *and its contents* when
> computeEnhancements() completes. This means a lot of space occupied during
> analysis but freed up right thereafter.
>
> It's more brutal than I wanted it to be, but a better implementation will
> come up once I add a couple new features to OntoNet that should make the
> process more reasonable.
>
> On the upside, the engine code is now smaller by some 250 lines.
>
> It would be super if you could update and try it out.
>
> Thanks
>
> Alessandro
>
> P.S. now I'm glad I added the "ontonet" prefix to those graph names...
>
>
>
> On 3/16/12 12:40 PM, David Riccitelli wrote:
>
>>> From what I've seen so far, yes. But it could depend on your engine
>>> configuration using a richer set of rules.
>>>
>>
>> Same thing happens when we use the default rules set (seo_rules.sem) from
>> SVN.
>>
>> We did not customize any other part of the installation with the exception
>> of loading a local DBpedia index in sling/datafiles.
>>
>> David
>>
>> On Fri, Mar 16, 2012 at 12:27 PM, Alessandro Adamou <[email protected]>
>> wrote:
>>
>>> On 3/16/12 11:16 AM, David Riccitelli wrote:
>>>
>>>> Is this issue happening to us only?
>>>
>>> From what I've seen so far, yes. But it could depend on your engine
>>> configuration using a richer set of rules.
>>>
>>> Alessandro
>>>
>>>> On Fri, Mar 16, 2012 at 12:12 PM, Alessandro Adamou <[email protected]>
>>>> wrote:
>>>>
>>>>> One thing that it would be great to do is to detect the ontology ID
>>>>> *before* creating the TripleCollection in Clerezza, so any mappings
>>>>> could be done before storing.
>>>>>
>>>>> But I don't know how this can be done without much code.
>>>>>
>>>>> Perhaps creating an IndexedGraph, exploring its content, then creating
>>>>> the Graph in the TcManager with the same content and the right graph
>>>>> name, then finally clearing the IndexedGraph could work.
>>>>>
>>>>> But it still means having twice the resource usage (disk+memory) for a
>>>>> period.
>>>>>
>>>>> Alessandro
>>>>>
>>>>>
>>>>>
>>>>> On 3/16/12 10:56 AM, Alessandro Adamou wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> well, I guess that depends pretty much on how heavy the usage of
>>>>>> OntoNet is in your Stanbol installation.
>>>>>>
>>>>>> Those are graphs created when OntoNet has to load an ontology from its
>>>>>> content rather than from a Web URI, so it cannot know the ontology ID
>>>>>> earlier.
>>>>>>
>>>>>> This happens e.g. by POSTing the ontology as the payload or by
>>>>>> passing a GraphContentInputSource to the Java API.
>>>>>>
>>>>>> Now I do not know why these graphs are created (perhaps the refactor
>>>>>> engine could be loading some), but I do know that a Clerezza graph in
>>>>>> Jena TDB occupies a LOT of disk space.
>>>>>>
>>>>>> Suffice it to say that my bundle had stored nine graphs of <100
>>>>>> triples each. Their disk space was about 1.8 GB, but when I tried to
>>>>>> make a zipfile out of it, it came out as about 2 MB!
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>>
>>>>>> On 3/16/12 10:30 AM, David Riccitelli wrote:
>>>>>>
>>>>>>> Dears,
>>>>>>>
>>>>>>> As I ran into disk issues, I found that this folder:
>>>>>>>  sling/felix/bundleXXX/data/tdb-data/mgraph
>>>>>>>
>>>>>>> where XXX is the bundle ID of:
>>>>>>>  Clerezza - SCB Jena TDB Storage Provider
>>>>>>>  org.apache.clerezza.rdf.jena.tdb.storage
>>>>>>>
>>>>>>> took almost 70 GB of disk space (at which point the disk was
>>>>>>> exhausted).
>>>>>>>
>>>>>>> These are some of the files I found inside:
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>>>>>
>>>>>>> Any clues?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David Riccitelli
>>>>>>>
>>>>>>> ********************************************************************************
>>>>>>> InsideOut10 s.r.l.
>>>>>>> P.IVA: IT-11381771002
>>>>>>> Fax: +39 0110708239
>>>>>>> ---
>>>>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>>>>> Twitter: ziodave
>>>>>>> ---
>>>>>>> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>>>>>> ********************************************************************************
>>>>>>>
>>>>>>>   --
>>>>>>
>>>>> M.Sc. Alessandro Adamou
>>>>>
>>>>> Alma Mater Studiorum - Università di Bologna
>>>>> Department of Computer Science
>>>>> Mura Anteo Zamboni 7, 40127 Bologna - Italy
>>>>>
>>>>> Semantic Technology Laboratory (STLab)
>>>>> Institute for Cognitive Science and Technology (ISTC)
>>>>> National Research Council (CNR)
>>>>> Via Nomentana 56, 00161 Rome - Italy
>>>>>
>>>>>
>>>>> "I will give you everything, so long as you do not demand anything."
>>>>> (Ettore Petrolini, 1930)
>>>>>
>>>>> Not sent from my iSnobTechDevice
>>>>>
>>>>>
>>>>>
>>
>


-- 
David Riccitelli

********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************
