Thanks Thierry.

Please also note that with the new satellites, resolution keeps
increasing (e.g. Sentinel,
http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4).

I understand the tile approach, and indeed a lot of the algorithms work
on tiles, but there are other ways to do this, and especially with
real-time geo queries on custom-defined polygons you only get so far
with tiles. That is one reason we use GeoTrellis backed by Accumulo: to
pump data very fast in random order.
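For the curious: the fast random-order access comes from indexing tiles
with a space-filling curve, so that spatially close tiles get nearby row
keys in Accumulo. A minimal sketch of such a Z-order (Morton) key, with
illustrative names (not the actual GeoTrellis API):

```python
# Sketch: Z-order (Morton) key for a tile grid, the kind of space-filling
# curve index used to lay tiles out as row keys so that spatially close
# tiles end up close on disk. All names here are illustrative.

def morton_key(col: int, row: int, bits: int = 16) -> int:
    """Interleave the bits of (col, row) into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)       # even bits from col
        key |= ((row >> i) & 1) << (2 * i + 1)   # odd bits from row
    return key

# Tiles covering a query polygon's bounding box can then be fetched as a
# few contiguous key ranges instead of many scattered point lookups.
tiles = [(c, r) for r in range(2) for c in range(2)]
keys = sorted(morton_key(c, r) for c, r in tiles)
```

The payoff is that a bounding-box query turns into a small number of
sequential scans over the key space, which is exactly what a store like
Accumulo is fast at.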

We are adding 30+ servers to the cluster at the moment just to deal with
the data sizes, as there is a project mapping energy landscapes:
https://vito.be/en/land-use/land-use/energy-landscapes. It spawns YARN
containers and uses CPU intensively; it is not uncommon for me to see
their workload eating everything for a serious number of CPU seconds.

It would be silly not to plug Pharo into all of this infrastructure, I
think.

Especially given the density of PhDs, postdocs and brainiacs per square
meter there. If you have seen the TV show Lost, well, working at that
place kind of feels like that, especially since it is somewhat hidden in
the woods.

Maybe you could have interesting interactions with them. These guys also
have their own nuclear reactor and do geothermal drilling.

Phil



On Wed, Nov 23, 2016 at 1:30 PM, Thierry Goubier <[email protected]>
wrote:

> Hi Phil,
>
> 2016-11-23 12:17 GMT+01:00 [email protected] <
> [email protected]>:
>
>> [ ...]
>>
>> It is really important to have such features to avoid massive GC pauses.
>>
>> My use case is to load the data sets from here.
>> https://www.google.be/url?sa=t&source=web&rct=j&url=http://proba-v.vgt.vito.be/sites/default/files/Product_User_Manual.pdf&ved=0ahUKEwjwlOG-4L7QAhWBniwKHZVmDZcQFggpMAI&usg=AFQjCNGRME9ZyHWQ8yCPgAQBDi1PUmzhbQ&sig2=eyaT4DlWCTjqUdQGBhFY0w
>>
> I've used that type of data before, a long time ago.
>
> I consider that tiled / on-demand block loading is the way to go for
> those. Work with the header as long as possible, stream tiles if you need
> to work on the full data set. There is a good chance that:
>
> 1- You're memory bound for anything you compute with them
> 2- I/O time dominates, or becomes low enough not to care (very fast SSDs)
> 3- It's very rare that you need full random access on the complete array
> 4- GC doesn't matter
>
> Stream computing is your solution! This is how raster GIS systems are
> implemented.
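For what it's worth, such a tile-streaming pass can be sketched roughly
like this (Python; the reader callback and tile size are made up for
illustration, not any particular API):

```python
# Sketch of the tile-streaming approach: compute a global statistic over
# a raster far larger than RAM by visiting one tile at a time. The
# reader callback and tile size are stand-ins for whatever format
# (e.g. the PROBA-V products above) you actually use.

from typing import Iterator, Tuple

TILE = 256  # tile edge in pixels; illustrative

def tiles(width: int, height: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) windows covering a width x height raster."""
    for y in range(0, height, TILE):
        for x in range(0, width, TILE):
            yield x, y, min(TILE, width - x), min(TILE, height - y)

def streamed_mean(read_window, width: int, height: int) -> float:
    """One pass, one tile in memory at a time: memory stays O(TILE^2)."""
    total, count = 0.0, 0
    for x, y, w, h in tiles(width, height):
        block = read_window(x, y, w, h)  # returns a list of pixel values
        total += sum(block)
        count += len(block)
    return total / count
```

Memory use stays bounded by a single tile no matter how big the raster
is, which is why points 1-4 above work out: you are I/O bound, never hold
the full array, and the GC has nothing big to chase.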
>
> What is hard for me is manipulating a very large graph, or a very large
> sparse structure, like a huge Famix model or an FPGA layout model with a
> full design laid out on top. There, you're randomly accessing the whole
> structure (or at least you see no obvious partition) and the structure
> is too large for the memory or the GC.
>
> This is why, a long time ago, I had this idea of an in-memory working
> set / on-disk full structure, with automatic determination of what the
> working set is.
>
> For pointers, have a look at the Graph500 and HPCG benchmarks, especially
> the efficiency (ratio to peak) of HPCG runs, to see how difficult these
> cases are.
>
> Regards,
>
> Thierry
>
