On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda <[email protected]>
wrote:

>
>
> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe <[email protected]>
> wrote:
>
>>
>> > On 22 Nov 2016, at 19:16, [email protected] wrote:
>> >
>> >
>> >
>> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]>
>> wrote:
>> >
>> >
>> > On 15 November 2016 at 02:18, Eliot Miranda <[email protected]>
>> wrote:
>> > Hi Phil,
>> >
>> > On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <[email protected]>
>> wrote:
>> >
>> >
>> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
>> [email protected]> wrote:
>> >
>> > 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
>> > Ah, but then it may be more interesting to have a data image (maybe a
>> lot of these) and a front end image.
>> >
>> > Isn't Seamless something that could help us here? No need to bring the
>> data back, just manipulate it through proxies.
>> >
>> > The problem is that the server image will perform GC anyway. And it will
>> be slow if the server image is big, which will stop the whole world.
>> >
>> > What if we asked it to not do any GC at all? Like if we have tons of
>> RAM, why bother? Especially if what it is used for is keeping datasets: load
>> them, save image to disk. When needed, trash the loaded stuff and reload
>> from zero.
>> >
>> > Basically that is what happens with Spark.
>> >
>> > http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>> > https://0x0fff.com/spark-misconceptions/
>> >
>> > While global GC may not be useful for big data, scavenging probably will
>> be for any non-trivial query.  But I think I see a misconception here.  The
>> large RAM on a multicore machine would be divided up between the cores.  It
>> makes no sense to run a single Smalltalk across lots of cores (we're a long
>> way from having a thread-safe class library).  It makes much more sense to
>> have one Smalltalk per core.  So that brings the heap sizes down and makes
>> GC less scary.
>> >
>> > yep, that's the approach we tried in HydraVM
>> >
>> >
>> > and Tachyon/Alluxio is kind of solving this kind of issue (may be nice
>> to have that interacting with Pharo image). http://www.alluxio.org/ This
>> thing basically keeps stuff in memory in case one needs to reuse the data
>> between workload runs.
>> >
>> > Sure.  We have all the facilities we need to do this.  We can add and
>> remove code at runtime so we can keep live instances running, and send the
>> code to them along with the data we want them to crunch.
>> >
>> >
>> > Or have an object memory for work and one for datasets (first one gets
>> GC'd, the other one isn't).
>> >
>> > Or have policies which one can switch.  There are quite a few levers
>> into the GC from the image and one can easily switch off global GC with the
>> right levers.  One doesn't need a VM that doesn't contain a GC.  One needs
>> an image that is using the right policy.
>> >
>> > or just mark whole data (sub)graphs with some bit telling the GC to skip
>> over them, so it won't attempt to scan them and treats them as always alive.
>> > This is where we get back to my idea of heap spaces, where you can
>> toss a subgraph into a special heap space with a policy such that it is
>> never scanned/GCed automatically and GC can be triggered only manually, or
>> something like that.
>> >
>> > Could be very useful for all kinds of large binary data, like videos
>> and sounds that we can load once and keep in the heap space.
>> >
>> > How hard would it be to get something like that?
>>
>> Large binary data poses no problem (as long as it's not a copying GC).
>> Since a binary blob contains no subpointers, no work needs to be done. A 1M
>> or 1G ByteArray is the same amount of GC work.
>>
>
> +1
>

Amen to that. But a dataset made of a gazillion composites is not the
same, right?
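
A rough way to see the difference is to count what a tracing collector has
to follow. The following is only a toy sketch in Python, not how the VM's
collector works; the names Blob, Composite and mark are made up for
illustration, and it models marking only (no copying or compaction).

```python
# Toy mark phase: counts the pointer slots a tracing collector must follow.
# Hypothetical model for illustration, not the Pharo VM's implementation.

class Blob:
    """Pointer-free object, like a ByteArray: the payload is raw bytes."""
    def __init__(self, size):
        self.data = bytearray(size)

    def pointers(self):
        return ()  # no outgoing references, whatever the size


class Composite:
    """Object whose slots reference other heap objects."""
    def __init__(self, children):
        self.children = list(children)

    def pointers(self):
        return self.children


def mark(root):
    """Return (objects marked, pointer slots followed) from the given root."""
    marked = set()
    slots_followed = 0
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        for child in obj.pointers():
            slots_followed += 1
            stack.append(child)
    return len(marked), slots_followed


small_blob = mark(Blob(1_000))
large_blob = mark(Blob(1_000_000))
# 10,000 small composites, each holding one tiny blob.
dataset = mark(Composite(Composite([Blob(8)]) for _ in range(10_000)))

print("small blob:", small_blob)
print("large blob:", large_blob)
print("composite dataset:", dataset)
```

In this model both blobs cost one marked object and zero followed slots,
while the composite dataset costs work proportional to its object count.
A copying collector would still pay for moving the blob's bytes, which is
the caveat Sven mentions.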

Phil

>
> _,,,^..^,,,_
> best, Eliot
>
