> On 22 Nov 2016, at 19:16, [email protected] wrote:
>
> On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]> wrote:
>
> On 15 November 2016 at 02:18, Eliot Miranda <[email protected]> wrote:
> Hi Phil,
>
> On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <[email protected]>
> wrote:
>
> On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <[email protected]>
> wrote:
>
> 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
> Ah, but then it may be more interesting to have a data image (maybe a lot of
> these) and a front end image.
>
> Isn't Seamless something that could help us here? No need to bring the data
> back, just manipulate it through proxies.
>
> Problem that server image will anyway perform GC. And it will be slow if
> server image is big which will stop all world.
>
> What if we asked it to not do any GC at all? Like if we have tons of RAM, why
> bother? Especially if what it is used to is to keep datasets: load them, save
> image to disk. When needed trash the loaded stuff and reload from zero.
>
> Basically that is what happens with Spark.
>
> http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
> https://0x0fff.com/spark-misconceptions/
>
> While global GC may not be useful for big-data scavenging probably will be
> for any non-trivial query. But I think I see a misconception here. The
> large RAM on a multiword machine would be divided up between the cores. It
> makes no sense to run a single Smalltalk across lots of cores (we're a long
> way from having a thread-safe class library). It makes much more sense to
> have one Smalltalk per core. So that brings the heap sizes down and makes GC
> less scary.
>
> yep, that approach what we're tried in HydraVM
>
> and Tachyon/Alluxio is kind of solving this kind of issue (may be nice to
> have that interacting with Pharo image). http://www.alluxio.org/ This thing
> basically keeps stuff in memory in case one needs to reuse the data between
> workload runs.
>
> Sure. We have all the facilities we need to do this. We can add and remove
> code at runtime so we can keep live instances running, and send the code to
> them along with the data we want them to crunch.
>
> Or have an object memory for work and one for datasets (first one gets GC'd,
> the other one isn't).
>
> Or have policies which one can switch. There are quite a few levers into the
> GC from the image and one can easily switch off global GC with the right
> levers. One doesn't need a VM that doesn't contain a GC. One needs an image
> that is using the right policy.
>
> or just mark whole data (sub)graphs with some bit, telling GC to skip over
> this so it won't attempt to scan it treating them as always alive..
> this is where we getting back to my idea of heap spaces, where you can toss a
> subgraph into a special heap space that has such policy, that it is never
> scanned/GCed automatically and can be triggered only manually or something
> like that.
>
> Could be very useful for all kinds of large binary data, like videos and
> sounds that we can load once and keep in the heap space.
>
> How hard would it be to get something like that?

Large binary data poses no problem (as long as it's not a copying GC). Since a binary blob contains no subpointers, no work needs to be done. A 1M or 1G ByteArray is the same amount of GC work.

> Phil
>
> Phil
>
> _,,,^..^,,,_
> best, Eliot
>
> --
> Best regards,
> Igor Stasenko.
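
To illustrate the point about binary blobs, here is a minimal sketch of a non-copying mark phase over a toy heap. It is not how the Pharo VM is implemented; `HeapObject`, `mark`, and the two helper functions are hypothetical names made up for this example. The only assumption is the one stated above: the collector traces pointer slots and never looks inside raw bytes.

```python
class HeapObject:
    """Toy heap object: holds pointer slots, or raw bytes, for illustration only."""

    def __init__(self, slots=None, raw_bytes=0):
        self.slots = slots if slots is not None else []  # references the GC must trace
        self.raw_bytes = raw_bytes  # size of pointer-free payload, never traced
        self.marked = False


def mark(roots):
    """Non-copying mark phase. Returns (objects_marked, pointer_slots_scanned)."""
    objects_marked = 0
    slots_scanned = 0
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        objects_marked += 1
        # Only pointer slots are visited; raw_bytes is never read here.
        for child in obj.slots:
            slots_scanned += 1
            stack.append(child)
    return objects_marked, slots_scanned


def mark_cost_for_blob(size_in_bytes):
    """Mark cost of one root that points at a single byte blob of the given size."""
    blob = HeapObject(raw_bytes=size_in_bytes)
    root = HeapObject(slots=[blob])
    return mark([root])


def mark_cost_for_pointer_array(element_count):
    """Mark cost of one root that points at an array of element_count objects."""
    elements = [HeapObject() for _ in range(element_count)]
    array = HeapObject(slots=elements)
    root = HeapObject(slots=[array])
    return mark([root])


small_blob = mark_cost_for_blob(1_000_000)        # stands in for a 1M ByteArray
large_blob = mark_cost_for_blob(1_000_000_000)    # stands in for a 1G ByteArray
pointer_array = mark_cost_for_pointer_array(1000)
print(small_blob, large_blob, pointer_array)
```

In this model the two blobs cost exactly the same to mark, while the pointer array costs work proportional to its element count. A copying collector would behave differently, since it has to move the bytes, which is why the caveat above matters.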
