On 23/11/2016 at 20:11, [email protected] wrote:
On Wed, Nov 23, 2016 at 4:16 PM, Thierry Goubier <[email protected]> wrote:
2016-11-23 15:46 GMT+01:00 [email protected] <[email protected]>:
Thanks Thierry.
Please also note that with new satellites, the resolution keeps increasing (e.g.
Sentinel, http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4).
It has always been so. Anytime you reach a reasonable size, they
send a new satellite with higher res / larger images :)
I understand the tile thing, and indeed a lot of the algos work
on tiles, but there are other ways to do this, and especially
with real-time geo queries on custom-defined polygons, tiles only
get you so far. That's one reason why we are using GeoTrellis
backed by Accumulo, in order to pump data very fast in random order.
But that means you're dealing with preprocessed / graph-georeferenced
data (aka OpenStreetMap-type data). If you're dealing with
raster data, your polygons are approximated by a set of tiles (with a
nice tile size well suited to your network / disk array).
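(A minimal sketch of that polygon-to-tile approximation, assuming axis-aligned
square tiles and using only the polygon's bounding box; the tile size and
coordinates below are illustrative assumptions, not GeoTrellis's actual API.)

import math

TILE_SIZE = 256.0  # assumed map units per tile edge, picked to suit disk/network blocks

def covering_tiles(min_x, min_y, max_x, max_y, tile_size=TILE_SIZE):
    """Return (col, row) indices of every tile intersecting the bounding box."""
    first_col, last_col = math.floor(min_x / tile_size), math.floor(max_x / tile_size)
    first_row, last_row = math.floor(min_y / tile_size), math.floor(max_y / tile_size)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# Example: a polygon whose bounding box spans about 600 x 300 map units
print(covering_tiles(100.0, 50.0, 700.0, 350.0))  # 3 columns x 2 rows = 6 tiles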
I had reasonable success a long time ago (1991, I think), for
Ifremer, with an unbalanced, sort of quadtree-based decomposition
for highly irregular curves on the seabed. Tree node size / tile
size was computed to be exactly equal to the disk block size on a
very slow medium. That sort of work is in the line of a geographic
index for a database: optimise query accesses to geo-referenced
objects... What is hard, and probably what you are doing, is
combining geographic queries with graph queries (give me all houses
in Belgium within a ten-minute bus + walk trip to a primary school) (*)
(*) One can work that out on a raster for speed. This is what GRASS
does for example.
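(A rough sketch of the disk-block-sized quadtree node idea above; the 4 KiB
block size and 16-byte point records are assumptions for illustration only,
not the original Ifremer code or figures.)

BLOCK_SIZE = 4096                       # assumed disk block size, in bytes
RECORD_SIZE = 16                        # assumed bytes per stored point (2 x float64)
CAPACITY = BLOCK_SIZE // RECORD_SIZE    # leaf payload = exactly one block

class QuadNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # square region covered by this node
        self.points = []                         # leaf payload, written as one block
        self.children = None                     # four children once the node splits

    def insert(self, px, py):
        if self.children is not None:
            return self._child_for(px, py).insert(px, py)
        self.points.append((px, py))
        if len(self.points) > CAPACITY:          # payload no longer fits one block: split
            half = self.size / 2
            self.children = [QuadNode(self.x + dx * half, self.y + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            for p in self.points:
                self._child_for(*p).insert(*p)
            self.points = []

    def _child_for(self, px, py):
        col = int(px >= self.x + self.size / 2)
        row = int(py >= self.y + self.size / 2)
        return self.children[row * 2 + col]

The tree ends up unbalanced because dense regions (the irregular seabed curves)
split deeper, while empty regions stay as single, block-sized leaves.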
(**) I asked a student to accelerate some raster processing on a
very small FPGA a long time ago. Once he had understood he could
pipeline the design to increase the frequency, he then discovered
that the FPGA would happily grok data faster than the computer bus
could provide it :) leaving no bandwidth for the data to be written
back to memory.
Yes, but the network can be pretty fast with bonded Ethernet interfaces
these days.
You mean they are not using HPC interconnects?
We are adding 30+ servers to the cluster at the moment just to
deal with the data sizes, as there is a project mapping the energy
landscape (https://vito.be/en/land-use/land-use/energy-landscapes).
This thing throws YARN containers around and uses CPU intensively.
It is not uncommon for me to see their workload eating everything
for a serious number of CPU-seconds.
Only a few seconds?
CPU-seconds, that's the cluster usage unit for CPU:
http://serverfault.com/questions/138703/a-definition-for-a-cpu-second
So, say a couple million of them on a 640-core setup. CPU power seems to be
the limiting factor in these workloads.
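(Back-of-the-envelope, taking "a couple million" as 2,000,000 CPU-seconds purely
for illustration:)

cpu_seconds = 2_000_000          # "a couple million", taken literally for the example
cores = 640
wall_clock_minutes = cpu_seconds / cores / 60
print(wall_clock_minutes)        # ~52 minutes with the whole cluster fully busy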
If I understand correctly, the cluster has enough memory to load all
the data into RAM, then.
It would be silly not to plug Pharo into all of this
infrastructure I think.
I've had quite bad results with Pharo on compute-intensive code
recently, so I'd plan carefully how I use it. On that sort of
hardware, in the projects I'm working on, 1000x faster than Pharo on
a single node is about the expected target.
Sure, but the lower-level C/C++ things are run from Python or Java, so Pharo
will not do worse. The good bit about Pharo is that one can ship a
preloaded image, which is easier than sending gigabyte-sized (!)
uberjars around that Java will unzip before running; the same goes for
Python's myriad dependencies. An image file looks super small by comparison.
Agreed. 64-bit Pharo is interesting there because it installs a lot
better than the 32-bit version. And as far as I could see, it is at least as
stable as the 32-bit version for my needs.
Especially given the density of PhDs/postdocs/brainiacs per square meter
there. If you have seen the TV show Lost, well, working at that place
kind of feels like it. Especially since it is kind of
hidden in the woods.
Maybe you could have interesting interactions with them. These
guys also have their own nuclear reactor and geothermal drilling.
I'd be interested, because we're working a bit on high-performance
parallel runtimes and compilation for those. Would you be willing, one day,
to come and talk about it at our place? South of Paris, not too
hard to reach by public transport :)
Sure, that would be awesome. But Q1 2017 then, because my schedule is
pretty packed at the moment. I can show you the thing over the web from
my side, so you can see where we are in terms of systems. I guess you are
much more advanced, but one of the goals of the project here is to be
pretty approachable and to gather a community that will cross-pollinate
algos and datasets for network effects.
Ok. We can arrange that; I'm also quite busy until year end ;) The goal
here is also to make such high-performance systems more usable, but, on
average, the targeted system is a bit more HPC-oriented (dedicated
interconnects, nodes with GPUs or Xeon Phi). We also have some
interesting work going on with microservers (highly packed,
high-efficiency servers with lower-power CPUs, ARM, FPGAs).
Thierry