On 15 November 2016 at 02:18, Eliot Miranda <eliot.mira...@gmail.com> wrote:

> Hi Phil,
>
> On Thu, Nov 10, 2016 at 2:19 AM, p...@highoctane.be <p...@highoctane.be>
> wrote:
>
>>
>>
>> On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <dionisi...@gmail.com>
>> wrote:
>>
>>>
>>> 2016-11-10 9:49 GMT+01:00 p...@highoctane.be <p...@highoctane.be>:
>>>
>>>> Ah, but then it may be more interesting to have a data image (maybe a
>>>> lot of these) and a front end image.
>>>>
>>>> Isn't Seamless something that could help us here? No need to bring the
>>>> data back, just manipulate it through proxies.
>>>>
>>>
>>> The problem is that the server image will perform GC anyway. And if the
>>> server image is big, that GC will be slow and will stop the whole world.
>>>
>>
>> What if we asked it not to do any GC at all? If we have tons of RAM, why
>> bother? Especially if all it is used for is holding datasets: load them,
>> save the image to disk, and when needed trash the loaded stuff and reload
>> from zero.
>>
>> Basically that is what happens with Spark.
>>
>> http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>> https://0x0fff.com/spark-misconceptions/
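>>
>> A minimal Pharo sketch of that load / snapshot / trash cycle (the file
>> path and the #BigDataset global are assumed examples; the messages used
>> are standard Pharo):
>>
>>   "load a dataset into a global so it survives in the saved image"
>>   Smalltalk globals
>>       at: #BigDataset
>>       put: '/data/huge.csv' asFileReference contents lines.
>>
>>   "freeze the loaded state to disk without quitting"
>>   Smalltalk snapshot: true andQuit: false.
>>
>>   "later: trash the loaded stuff and reclaim the space on demand"
>>   Smalltalk globals at: #BigDataset put: nil.
>>   Smalltalk garbageCollect.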
>>
>
> While global GC may not be useful for big data, scavenging probably will be
> for any non-trivial query.  But I think I see a misconception here.  The
> large RAM on a multicore machine would be divided up between the cores.  It
> makes no sense to run a single Smalltalk across lots of cores (we're a long
> way from having a thread-safe class library).  It makes much more sense to
> have one Smalltalk per core.  So that brings the heap sizes down and makes
> GC less scary.
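>
> (To put assumed numbers on it: a 256 GB box with 32 cores gives each
> per-core image roughly 8 GB of heap, a far less scary size to collect than
> one 256 GB heap.)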
>

yep, that's the approach we tried in HydraVM


>
>
>> and Tachyon/Alluxio solves this kind of issue (it may be nice to have it
>> interacting with a Pharo image). http://www.alluxio.org/ It basically
>> keeps data in memory so it can be reused between workload runs.
>>
>
> Sure.  We have all the facilities we need to do this.  We can add and
> remove code at runtime so we can keep live instances running, and send the
> code to them along with the data we want them to crunch.
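>
> A minimal sketch of that in Pharo, assuming the code arrives as plain
> strings over whatever transport; DataWorker is a hypothetical receiver
> class, while #compile:classified: and the compiler's #evaluate: are
> standard:
>
>   | source |
>   "install a freshly received method on the running worker image"
>   source := 'crunch: aCollection
>       ^ aCollection inject: 0 into: [ :sum :each | sum + each ]'.
>   DataWorker compile: source classified: 'remote'.
>
>   "then evaluate an expression shipped as text against the local data"
>   Smalltalk compiler evaluate: 'DataWorker new crunch: #(1 2 3 4)'.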
>
>
>>
>> Or have an object memory for work and one for datasets (the first one gets
>> GC'd, the other one doesn't).
>>
>
> Or have policies that one can switch.  There are quite a few levers into
> the GC from the image, and one can easily switch off global GC with the
> right ones.  One doesn't need a VM without a GC; one needs an image that
> uses the right policy.
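>
> As a minimal sketch of those image-side levers (the parameter index below
> is only an assumed example, check your VM; the explicit collection
> messages are standard Pharo/Squeak):
>
>   "inspect a GC-related VM parameter from the image"
>   Smalltalk vm parameterAt: 55.
>
>   "a 'no global GC' policy can simply mean: never trigger a full
>    collection automatically, keep scavenging new space, and collect old
>    space only on demand"
>   Smalltalk garbageCollectMost.  "scavenge the young generation only"
>   Smalltalk garbageCollect.      "explicit full GC, run when the app decides"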
>

Or just mark whole data (sub)graphs with some bit telling the GC to skip
them, so it won't attempt to scan them and treats them as always alive.
This is where we get back to my idea of heap spaces, where you can toss a
subgraph into a special heap space whose policy is that it is never scanned
or GC'd automatically, and collection can only be triggered manually, or
something like that.
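
Something like this as the image-side API, purely hypothetical (no such
HeapSpace class exists today, it is just a sketch of the idea):

  "hypothetical heap-space API"
  staticSpace := HeapSpace newUncollected.  "a space the GC never scans"
  staticSpace adoptGraph: hugeDataset.      "tenure the whole subgraph into it"
  staticSpace collectNow.                   "reclamation happens only when asked"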


>
> Phil
>>
>
> _,,,^..^,,,_
> best, Eliot
>



-- 
Best regards,
Igor Stasenko.
