On 10 November 2016 at 19:58, Stephan Eggermont <[email protected]> wrote:
> Igor wrote:
> > Now I hope at the end of the day, the guys who do data mining /
> > statistical analysis will finally shut up and happily be able to work
> > with more bloat, without the need to learn how to properly manage
> > memory & resources and to finally implement that.
>
> The actual problem is of course having to work with all that data before
> you understand the structure. Or highly interconnected structures with
> unpredictable access patterns. Partial graphs are nice, once you
> understand how to partition. Needing to understand how to partition first
> is a dependency I'd rather avoid.

No, no, no! This is simply not true. It is you who writes the code that
generates all that statistical/analysis data, and its output is fairly
predictable.. otherwise you are not collecting data, just random noise,
aren't you? Those graphs are far from unpredictable, because they are the
product of software you wrote. They are not unpredictable, unless you claim
that the code you write is unpredictable; and then I wonder what you are
doing in the field of data analysis, if you admit that your data is nothing
but a dice roll. If you cannot tame and reason about the complexity of your
own code, then maybe it is better to change occupation and go work in a
casino? :)

I mean, Doru is light years ahead of me and many others in the field of
data analysis, so what can I advise him on his own playground? You are
absolutely right that the hardest part, as you identified, is finding a way
to dissect the graph data into smaller chunks. Storing such a dissected
graph in chunks on a hard drive outside the image, and loading them on
demand, is nothing compared to that first part. And if Doru can't handle
it, then who else can? Me? I have nothing compared to his experience in
that field; in my career I had only very little, occasional experience with
such a domain. C'mon..

> > Because even if you can fit all the data in memory, consider how much
> > time it takes for the GC to scan 4+ GB of memory,
>
> That's often not what is happening. The large data is mostly static, so
> it gets moved out of new space very quickly. Otherwise working with large
> data quickly becomes annoying indeed.

I fully agree with you on that.

> Stephan

-- 
Best regards,
Igor Stasenko.
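For what it's worth, here is a minimal sketch of the "dissect the graph
into chunks, store them outside the image, load on demand" idea discussed
above. The thread is about Pharo images, but Python is used below purely as
sketch pseudocode; every name in it (GraphStore, CHUNK_SIZE, the chunk file
layout) is hypothetical, not an existing library.

    # Hypothetical sketch: adjacency lists partitioned by node-id range,
    # one pickle file per chunk, loaded lazily with a small LRU cache.
    import pickle
    from functools import lru_cache
    from pathlib import Path

    CHUNK_SIZE = 100_000  # nodes per chunk; a tuning knob, chosen arbitrarily

    class GraphStore:
        """Graph dissected into chunks stored outside the running image."""

        def __init__(self, directory: str):
            self.dir = Path(directory)
            self.dir.mkdir(parents=True, exist_ok=True)

        def write(self, adjacency: dict[int, list[int]]) -> None:
            # Dissect the full graph into id-range chunks and persist each.
            chunks: dict[int, dict[int, list[int]]] = {}
            for node, neighbours in adjacency.items():
                chunks.setdefault(node // CHUNK_SIZE, {})[node] = neighbours
            for chunk_id, chunk in chunks.items():
                with open(self.dir / f"chunk-{chunk_id}.pickle", "wb") as f:
                    pickle.dump(chunk, f)

        @lru_cache(maxsize=16)  # keep only a few chunks in memory at a time
        def _load_chunk(self, chunk_id: int) -> dict[int, list[int]]:
            with open(self.dir / f"chunk-{chunk_id}.pickle", "rb") as f:
                return pickle.load(f)

        def neighbours(self, node: int) -> list[int]:
            # Only the chunk containing `node` is faulted in from disk.
            return self._load_chunk(node // CHUNK_SIZE).get(node, [])

    store = GraphStore("/tmp/graph-chunks")
    store.write({0: [1, 2], 1: [2], 200_000: [0]})
    print(store.neighbours(200_000))  # faults in only chunk 2 -> [0]

The naive id-range split in write() is exactly the part conceded to be hard
above: with a bad cut, almost every hop across the graph faults a different
chunk in from disk, so the partitioning scheme, not the storage code, is
where the real work lies.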

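On the GC point, here is a rough illustration of the observation that
mostly-static bulk data stops costing anything once it leaves new space.
CPython's generational collector stands in for Pharo's new/old space, so
this is an analogy under that assumption, not a measurement of the Pharo
VM; gc.freeze() (Python 3.7+) moves everything currently alive into a
permanent generation that later collections ignore, a coarse analogue of
tenuring static data out of new space.

    # Compare full-collection time before and after the bulk data is
    # "tenured" out of the collector's reach. Timings are machine-dependent.
    import gc
    import time

    # Build a large structure (~millions of tracked container objects).
    data = [[i, str(i), (i, i + 1)] for i in range(1_000_000)]

    def timed_full_collect() -> float:
        start = time.perf_counter()
        gc.collect()  # full collection: traverses every tracked object
        return time.perf_counter() - start

    print(f"full GC with data tracked: {timed_full_collect():.3f}s")

    gc.freeze()  # move everything alive now to the permanent generation
    print(f"full GC after freeze:      {timed_full_collect():.3f}s")

On a typical machine the second number drops to near zero: once the static
bulk is out of the generations the collector scans, collection cost is
driven by the small churning working set, which is the effect described in
the quoted paragraph.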