> On 22 Nov 2016, at 19:16, [email protected] wrote:
> 
> 
> 
> On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]> wrote:
> 
> 
> On 15 November 2016 at 02:18, Eliot Miranda <[email protected]> wrote:
> Hi Phil,
> 
> On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <[email protected]> 
> wrote:
> 
> 
> On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <[email protected]> 
> wrote:
> 
> 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
> Ah, but then it may be more interesting to have a data image (maybe a lot of 
> these) and a front end image.
> 
> Isn't Seamless something that could help us here? No need to bring the data 
> back, just manipulate it through proxies.
> 
> The problem is that the server image will perform GC anyway. And it will be 
> slow if the server image is big, which will stop the whole world.
> 
> What if we asked it to not do any GC at all? Like if we have tons of RAM, why 
> bother? Especially if it is only used to keep datasets: load them, save the 
> image to disk. When needed, trash the loaded stuff and reload from zero.
> 
> Basically that is what happens with Spark.
> 
> http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
> https://0x0fff.com/spark-misconceptions/
> 
> While global GC may not be useful for big data, scavenging probably will be 
> for any non-trivial query.  But I think I see a misconception here.  The 
> large RAM on a multicore machine would be divided up between the cores.  It 
> makes no sense to run a single Smalltalk across lots of cores (we're a long 
> way from having a thread-safe class library).  It makes much more sense to 
> have one Smalltalk per core.  So that brings the heap sizes down and makes GC 
> less scary.
> 
> Yep, that is the approach we tried in HydraVM.
>  
>  
> And Tachyon/Alluxio is solving this kind of issue (it may be nice to 
> have that interacting with a Pharo image). http://www.alluxio.org/ This thing 
> basically keeps stuff in memory in case one needs to reuse the data between 
> workload runs.
> 
> Sure.  We have all the facilities we need to do this.  We can add and remove 
> code at runtime so we can keep live instances running, and send the code to 
> them along with the data we want them to crunch.
>  
> 
> Or have an object memory for work and one for datasets (first one gets GC'd, 
> the other one isn't).
> 
> Or have policies which one can switch.  There are quite a few levers into the 
> GC from the image and one can easily switch off global GC with the right 
> levers.  One doesn't need a VM that doesn't contain a GC.  One needs an image 
> that is using the right policy.
> 
> Or just mark whole data (sub)graphs with some bit telling the GC to skip over 
> them, so it won't attempt to scan them and treats them as always alive.
> This is where we get back to my idea of heap spaces, where you can toss a 
> subgraph into a special heap space with a policy that it is never 
> scanned/GCed automatically and can be triggered only manually or something 
> like that.
> 
> Could be very useful for all kinds of large binary data, like videos and 
> sounds that we can load once and keep in the heap space.
> 
> How hard would it be to get something like that?

Large binary data poses no problem (as long as it's not a copying GC). Since a 
binary blob contains no subpointers, no work needs to be done. A 1M or 1G 
ByteArray is the same amount of GC work.
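
To make that concrete, here is a toy sketch in Python (not Smalltalk, and not 
how any real VM is implemented) of the mark phase of a non-copying collector. 
The Obj class and the mark and work_for_blob functions are made up for 
illustration. Marking only follows pointer slots, so the byte payload of a blob 
is never touched and its size does not change the amount of work.

```python
class Obj:
    """Toy heap object: a list of pointer slots plus an opaque byte payload."""

    def __init__(self, refs=(), payload=b""):
        self.refs = list(refs)    # pointer slots the marker must follow
        self.payload = payload    # raw bytes, never inspected by the marker
        self.marked = False


def mark(root):
    """Mark everything reachable from root.

    Returns the number of pointer slots scanned, used here as a rough
    stand-in for "GC work".
    """
    slots_scanned = 0
    root.marked = True
    stack = [root]
    while stack:
        obj = stack.pop()
        for child in obj.refs:
            slots_scanned += 1
            if not child.marked:
                child.marked = True
                stack.append(child)
    return slots_scanned


def work_for_blob(size):
    """Mark a root that holds a single binary blob of the given size."""
    blob = Obj(payload=bytes(size))
    return mark(Obj(refs=[blob]))


def work_for_pointer_array(count):
    """Mark a root that holds `count` separate small objects."""
    return mark(Obj(refs=[Obj() for _ in range(count)]))


if __name__ == "__main__":
    print(work_for_blob(1_000), work_for_blob(1_000_000))
    print(work_for_pointer_array(10), work_for_pointer_array(1_000))
```

In this model the blob costs one slot scan whatever its size, while an array 
of pointers costs one scan per element. A copying collector would be 
different, since it has to move the payload bytes as well.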

> Phil
>  
>  
> 
> Phil
> 
> _,,,^..^,,,_
> best, Eliot
> 
> 
> 
> -- 
> Best regards,
> Igor Stasenko.

