Hi Thierry,

Do you happen to have a tech report or any description of your work?


Stef


On 11/11/16 at 11:44, Thierry Goubier wrote:
On 11/11/2016 at 11:29, Stephan Eggermont wrote:
On 10/11/16 21:35, Igor Stasenko wrote:
No, no, no! This is simply not true.
It is you who writes the code that generates a lot of statistical
and analysis data, and its output is fairly predictable. Otherwise you are
not collecting any data, just random noise, isn't it?

That would be green field development. In brown field development, I
only get in when people start noticing there is a problem (why do we
need more than 4GBytes for this?). At that point I want to be able to
load everything they can give me in an image so I can start analyzing
and structuring it.

I mean, Doru is light years ahead of me and many others in the field of data
analysis, so what can I advise him on his playground?

Well, the current FAMIX model implementation is clearly not well
structured for analyzing large code bases. And it is difficult to
partition because of unpredictable access patterns and high
interconnection.

This is why you look for a general-purpose, efficient off-loading scheme that optimizes the general case and gets reasonable performance out of it (something like Fuel, but designed for partial unloading and loading: allow dangling references in a unit of load, and focus on per-page units to match the underlying storage layer or network).
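To make the idea of a unit of load with dangling references concrete, here is a minimal sketch in Python. It is not Fuel and not an existing library: `PagedStore`, `ref` and the `"$ref"` marker are hypothetical names, JSON stands in for the real serializer, and a dictionary stands in for the disk or network layer.

```python
import json


def ref(oid):
    """Build a dangling reference: it names an object that may live in another page."""
    return {"$ref": oid}


class PagedStore:
    """Hypothetical store that unloads and reloads objects one page at a time."""

    def __init__(self):
        self.page_of = {}   # object id -> page id
        self.loaded = {}    # page id -> {object id: object}
        self.storage = {}   # page id -> serialized page (stand-in for disk/network)
        self.loads = 0      # number of page loads, useful when benchmarking

    def add(self, page_id, oid, obj):
        self.page_of[oid] = page_id
        self.loaded.setdefault(page_id, {})[oid] = obj

    def unload(self, page_id):
        # The page is written as one unit. References to other pages stay as
        # plain markers, so unloading never drags neighbouring pages along.
        self.storage[page_id] = json.dumps(self.loaded.pop(page_id))

    def get(self, oid):
        page_id = self.page_of[oid]
        if page_id not in self.loaded:
            # Page fault: bring the whole page back in.
            self.loaded[page_id] = json.loads(self.storage[page_id])
            self.loads += 1
        return self.loaded[page_id][oid]

    def deref(self, value):
        # Follow a dangling reference, loading its page only when needed.
        if isinstance(value, dict) and "$ref" in value:
            return self.get(value["$ref"])
        return value


store = PagedStore()
store.add("p1", "ClassA", {"name": "ClassA", "superclass": ref("ClassB")})
store.add("p2", "ClassB", {"name": "ClassB", "superclass": None})
store.unload("p2")                                  # p2 now lives only in storage
parent = store.deref(store.get("ClassA")["superclass"])  # faults p2 back in
```

The `loads` counter is there because the cost that matters is the number of page faults, which depends entirely on how objects are assigned to pages.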

I wrote one such layer for VW a long time ago, but didn't have time to experiment with or qualify some of the techniques in it. There was an interesting attempt (IMHO; it was never qualified) at combining paging with automatic refinement of the application working set, based on previous experience implementing a hierarchical 2D object access scheme for large datasets on slow media (which decreased access time from 30 minutes to a few seconds).
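One possible reading of "automatic refinement of the working set" is to record an access trace and regroup objects so that those touched together end up on the same page. The sketch below is only that reading, not the VW layer described above: `refine_pages` and its greedy heuristic are assumptions made for illustration.

```python
from collections import Counter


def refine_pages(trace, page_size):
    """Regroup object ids into pages from an access trace (hypothetical heuristic)."""
    # Count how often two distinct objects are accessed back to back.
    pair_counts = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            pair_counts[tuple(sorted((a, b)))] += 1

    pages = []     # list of pages, each a list of object ids
    page_of = {}   # object id -> index into pages

    def place(oid, preferred=None):
        if oid in page_of:
            return
        if preferred is not None and len(pages[preferred]) < page_size:
            target = preferred
        else:
            pages.append([])
            target = len(pages) - 1
        pages[target].append(oid)
        page_of[oid] = target

    # Strongest pairs first: try to put both ends on the same page.
    for (a, b), _ in pair_counts.most_common():
        if a in page_of:
            place(b, page_of[a])
        elif b in page_of:
            place(a, page_of[b])
        else:
            place(a)
            place(b, page_of[a])

    # Objects that never appeared in a pair still need a page.
    for oid in trace:
        place(oid)
    return pages


trace = ["a", "b", "a", "b", "c", "d", "c", "d", "a", "b"]
layout = refine_pages(trace, page_size=2)
```

A greedy pass like this can leave pages half empty and says nothing about how stable the trace is, so it would need benchmarking against real access patterns before being trusted.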

The other approach I would look at is to take some of the support code for such an automatic layer and use it to unload parts of my model. I'm pretty sure that, if I don't benchmark intensively, I'll get the partitioning wrong :(

Overall, an interesting subject, totally not valid from a scientific point of view (the database guys have already solved everything). Only valid as a hobby, or if a company is ready to pay for a solution.

Thierry



