Hi Phil,

2016-11-23 12:17 GMT+01:00 [email protected] <[email protected]>:
> [...]
>
> It is really important to have such features to avoid massive GC pauses.
>
> My use case is to load the data sets from here:
> https://www.google.be/url?sa=t&source=web&rct=j&url=http://proba-v.vgt.vito.be/sites/default/files/Product_User_Manual.pdf&ved=0ahUKEwjwlOG-4L7QAhWBniwKHZVmDZcQFggpMAI&usg=AFQjCNGRME9ZyHWQ8yCPgAQBDi1PUmzhbQ&sig2=eyaT4DlWCTjqUdQGBhFY0w

I've used that type of data before, a long time ago. I consider tiled / on-demand block loading to be the way to go for those. Work with the header as long as possible, and stream tiles if you need to work on the full data set. There is a good chance that:

1- You're memory bound for anything you compute with them.
2- I/O time dominates, or becomes low enough not to matter (very fast SSDs).
3- It's very rare that you need full random access on the complete array.
4- GC doesn't matter.

Stream computing is your solution! This is how raster GIS systems are implemented.

What is hard for me is manipulating a very large graph, or a very large sparse structure, like a huge Famix model or an FPGA layout model with a full design laid out on top. There, you're randomly accessing the whole of the structure (or at least you see no obvious partition), and the structure is too large for the memory or the GC. This is why, a long time ago, I had the idea of an in-memory working set backed by the full structure on disk, with automatic determination of what the working set is.

For pointers, have a look at the Graph500 and HPCG benchmarks, especially the efficiency (ratio to peak) of HPCG runs, to see how difficult these cases are.

Regards,

Thierry
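P.S. To make the tile-streaming idea concrete, here is a minimal sketch in Python. The raw uint8 row-major layout, the tile size, and the tiny synthetic raster are my assumptions for illustration, not the actual PROBA-V format; the point is only that each tile is read on demand and nothing larger than a tile ever lives in memory.

```python
import os
import tempfile

def stream_tiles(path, width, height, tile=256):
    """Yield (x0, y0, cols, rows, block) tiles of a row-major uint8
    raster stored as a raw binary file. Only one tile-sized buffer is
    ever held in memory, so the GC never sees the full array."""
    with open(path, "rb") as f:
        for y0 in range(0, height, tile):
            rows = min(tile, height - y0)
            for x0 in range(0, width, tile):
                cols = min(tile, width - x0)
                block = bytearray()
                for r in range(rows):
                    # Seek to the start of this tile row and read just it.
                    f.seek((y0 + r) * width + x0)
                    block += f.read(cols)
                yield x0, y0, cols, rows, bytes(block)

# Demo on a tiny 5x4 synthetic raster; a real satellite band would be
# streamed the same way, just with much larger dimensions.
if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes(range(20)))  # 5 columns x 4 rows, values 0..19
    os.close(fd)
    running_max = max(max(block)
                      for *_, block in stream_tiles(path, 5, 4, tile=2))
    print(running_max)  # prints 19: the global max, computed tile by tile
    os.remove(path)
```

Any per-pixel reduction (histogram, min/max, NDVI accumulation, ...) fits this loop, which is why the memory-bound and I/O-bound points above usually hold.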
