On 23/11/2016 at 20:11, [email protected] wrote:


On Wed, Nov 23, 2016 at 4:16 PM, Thierry Goubier
<[email protected]> wrote:



    2016-11-23 15:46 GMT+01:00 [email protected]:

        Thanks Thierry.

        Please also see that with new satellites, the resolution is
        ever increasing (e.g. Sentinel:
        http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4)


    It has always been so. Any time you reach a manageable size, they
    send up a new satellite with higher resolution / larger images :)



        I understand the tile approach, and indeed a lot of the
        algorithms work on tiles, but there are other ways to do this;
        with real-time geo queries on custom-defined polygons in
        particular, tiles only take you so far. That is one reason we
        use GeoTrellis backed by Accumulo: to pump data very fast in
        random order (see the key sketch below).
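        For context, the usual way to get fast random tile access out
        of a key-value store like Accumulo is a space-filling-curve
        key, and GeoTrellis indexes tiles along those lines. Below is
        a minimal Z-order (Morton) sketch in Python; it is illustrative
        only, not GeoTrellis's actual key layout or API.

```python
def z_order_key(col, row, bits=16):
    """Interleave the bits of (col, row) into one Morton key.

    Nearby tiles get nearby keys, so a range scan over the key
    space pulls back spatially coherent runs of tiles. Sketch
    only; not GeoTrellis's real key layout.
    """
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)
        key |= ((row >> i) & 1) << (2 * i + 1)
    return key

# Neighbouring tiles land on adjacent keys:
print(z_order_key(2, 3), z_order_key(3, 3))  # 14 15
```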


    But that means you're dealing with preprocessed / graph-georeferenced
    data (i.e. OpenStreetMap-type data). If you're dealing with raster,
    your polygons are approximated by a set of tiles (with a tile size
    well suited to your network / disk array), as in the sketch below.
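    A minimal Python sketch of that approximation, assuming
    axis-aligned grid tiles and keeping only tiles whose centre falls
    inside the polygon (real systems test the whole tile rectangle;
    this even-odd version is illustrative only):

```python
def tiles_covering(polygon, tile_size):
    """Approximate a polygon by the grid tiles it covers.

    polygon: list of (x, y) vertices; tile_size: tile edge in map
    units. Keeps tiles in the polygon's bounding box whose centre
    passes a crude even-odd point-in-polygon test.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    c0, c1 = int(min(xs) // tile_size), int(max(xs) // tile_size)
    r0, r1 = int(min(ys) // tile_size), int(max(ys) // tile_size)

    def inside(px, py):
        hit, n = False, len(polygon)
        for i in range(n):
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            if (y1 > py) != (y2 > py):  # edge crosses the scanline
                if px < x1 + (py - y1) * (x2 - x1) / (y2 - y1):
                    hit = not hit
        return hit

    return {(c, r)
            for c in range(c0, c1 + 1)
            for r in range(r0, r1 + 1)
            if inside((c + 0.5) * tile_size, (r + 0.5) * tile_size)}

# A triangle over a grid of 10-unit tiles:
print(sorted(tiles_covering([(0, 0), (40, 0), (0, 40)], 10)))
```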

    I had reasonable success a long time ago (1991, I think), for
    Ifremer, with an unbalanced, quadtree-like decomposition for highly
    irregular curves on the seabed (sketched below). The tree node /
    tile size was computed to be exactly the disk block size of a very
    slow medium. That sort of work is in the same vein as a geographic
    index for a database: optimise query accesses to geo-referenced
    objects... What is hard, and probably what you are doing, is
    combining geographic queries with graph queries (give me all houses
    in Belgium within a ten-minute bus + walk trip of a primary
    school)(*).
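    Roughly the idea, as a minimal Python sketch: split a cell only
    while its contents would overflow one disk block, so dense
    stretches of the curve get small, deep leaves and empty sea stays
    one big leaf. BLOCK_SIZE and POINT_SIZE are assumed values here,
    not the 1991 ones.

```python
BLOCK_SIZE = 4096                    # bytes per disk block (assumed)
POINT_SIZE = 16                      # bytes per stored vertex (assumed)
MAX_POINTS = BLOCK_SIZE // POINT_SIZE

def build(points, x, y, size):
    """Unbalanced quadtree over a square cell at (x, y).

    A leaf holds at most one disk block's worth of points, so each
    leaf visit costs exactly one read on a slow medium.
    """
    if len(points) <= MAX_POINTS or size <= 1:
        return ("leaf", points)
    half = size / 2
    quads = [[] for _ in range(4)]   # SW, SE, NW, NE
    for px, py in points:
        quads[(px >= x + half) + 2 * (py >= y + half)].append((px, py))
    return ("node", [
        build(quads[0], x, y, half),
        build(quads[1], x + half, y, half),
        build(quads[2], x, y + half, half),
        build(quads[3], x + half, y + half, half),
    ])

# Example: an irregular "curve" in a 1024x1024 extent.
tree = build([(i, i // 3) for i in range(0, 1024, 2)], 0, 0, 1024)
```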

    (*) One can work that out on a raster for speed; this is what GRASS
    does, for example. See the sketch below.
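    On a raster, that query collapses to an accumulated-cost surface
    (what GRASS's r.cost computes). A minimal 4-connected
    Dijkstra-on-a-grid sketch in Python, not the actual GRASS
    implementation:

```python
import heapq

def travel_time(cost, sources):
    """Accumulated-cost surface over a raster.

    cost[r][c] is the minutes needed to traverse a cell; sources
    are (row, col) seed cells (e.g. primary schools). Returns the
    minutes to the nearest source for every cell.
    """
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = [(0.0, r, c) for r, c in sources]
    for _, r, c in heap:
        dist[r][c] = 0.0
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                  # stale queue entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# "All houses within ten minutes" is then just dist[r][c] <= 10.
```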

    (**) I asked a student to accelerate some raster processing on a
    very small FPGA a long time ago. Once he had understood that he
    could pipeline the design to increase the frequency, he discovered
    that the FPGA would happily grok data faster than the computer bus
    could provide it :) leaving no bandwidth for the results to be
    written back to memory.
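    Back-of-envelope version of that wall, with made-up but plausible
    numbers for the era:

```python
# Hypothetical figures, only to show the imbalance: a pipelined
# design consuming one 32-bit pixel per cycle at 100 MHz needs
# 400 MB/s in and 400 MB/s out, while a 32-bit/33 MHz PCI bus
# tops out around 133 MB/s total for both directions.
pixel_bytes, fpga_mhz = 4, 100
need_mb_s = 2 * pixel_bytes * fpga_mhz       # read + write, MB/s
bus_mb_s = 133                               # PCI 32-bit / 33 MHz
print(f"FPGA wants {need_mb_s} MB/s; the bus delivers {bus_mb_s} MB/s")
```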


Yes, but the network can be pretty fast with bonded Ethernet interfaces
these days.

You mean they are not using HPC interconnects?

        We are adding 30+ servers to the cluster at the moment just to
        deal with the data sizes, as there is a project mapping energy
        landscapes
        (https://vito.be/en/land-use/land-use/energy-landscapes). This
        thing throws YARN containers around and uses CPU intensively.
        It is not uncommon to see their workload eating everything for
        a serious number of CPU-seconds.


    Only a few seconds?


CPU-seconds, that's the cluster usage unit for CPU:
http://serverfault.com/questions/138703/a-definition-for-a-cpu-second
So, say a couple million of them on a 640-core setup. CPU power seems
to be the limiting factor in these workloads.
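For scale (illustrative numbers, not an exact figure): a couple million
CPU-seconds on 640 cores is under an hour of wall-clock time at full
occupancy.

```python
cpu_seconds = 2_000_000          # "a couple million", assumed
cores = 640
hours = cpu_seconds / cores / 3600
print(f"{hours:.2f} h wall-clock at 100% occupancy")  # ~0.87 h
```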

If I understand correctly, the cluster has enough memory to load all the data in RAM, then.

        It would be silly not to plug Pharo into all of this
        infrastructure, I think.


    I've had quite bad results with Pharo on compute-intensive code
    recently, so I'd plan carefully how to use it. On that sort of
    hardware, in the projects I'm working on, 1000x faster than Pharo
    on a single node is roughly the expected target.


Sure, but the lower-level C/C++ parts are run from Python or Java, so
Pharo will not do worse. The good bit about Pharo is that one can ship
a preloaded image, which is easier than sending around gigabyte-sized
(!) uberjars that Java unzips before running; the same goes for
Python's myriad dependencies. An image file looks super small by
comparison.

Agreed. Pharo 64-bit is interesting there because it installs a lot
better than the 32-bit version. And as far as I could see, it is at
least as stable as the 32-bit version for my needs.

        Especially given the PhDs/postdocs/brainiacs per square meter
        there. If you have seen the Lost TV show, well, that is kind of
        what working at that place feels like. Especially given that it
        is kind of hidden in the woods.

        Maybe you could have interesting interactions with them. These
        guys also have their own nuclear reactor and geothermal drilling.


    I'd be interested, because we're working a bit on high-performance
    parallel runtimes and compilation for those. Perhaps one day, when
    you're ready to talk about it, you could visit our place? South of
    Paris, not too hard to reach by public transport :)

Sure, that would be awesome. But Q1 2017 then, because my schedule is
pretty packed at the moment. I can show you the thing over the web from
my side, so you can see where we are in terms of systems. I guess you
are much more advanced, but one of the goals of the project here is to
be pretty approachable and to gather a community that will
cross-pollinate algorithms and datasets for network effects.

Ok. We can arrange that; I'm also quite busy until year end ;) The goal
here is also to make such high-performance systems more usable, but, on
average, the targeted system is a bit more HPC-oriented (dedicated
interconnects, nodes with GPUs or Xeon Phi). We also have some
interesting work going on with microservers (densely packed,
high-efficiency servers with low-power CPUs, ARM, FPGAs).

Thierry
