On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko <[email protected]> wrote:
> On 23 November 2016 at 10:50, [email protected] <[email protected]> wrote:
>
>> On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda <[email protected]> wrote:
>>
>>> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe <[email protected]> wrote:
>>>
>>>> > On 22 Nov 2016, at 19:16, [email protected] wrote:
>>>> >
>>>> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]> wrote:
>>>> >
>>>> > On 15 November 2016 at 02:18, Eliot Miranda <[email protected]> wrote:
>>>> > Hi Phil,
>>>> >
>>>> > On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <[email protected]> wrote:
>>>> >
>>>> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <[email protected]> wrote:
>>>> >
>>>> > 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
>>>> > Ah, but then it may be more interesting to have a data image (maybe a lot of these) and a front-end image.
>>>> >
>>>> > Isn't Seamless something that could help us here? No need to bring the data back, just manipulate it through proxies.
>>>> >
>>>> > The problem is that the server image will perform GC anyway. And it will be slow if the server image is big, which will stop the whole world.
>>>> >
>>>> > What if we asked it to not do any GC at all? Like, if we have tons of RAM, why bother? Especially if what it is used for is to keep datasets: load them, save the image to disk. When needed, trash the loaded stuff and reload from zero.
>>>> >
>>>> > Basically that is what happens with Spark.
>>>> >
>>>> > http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>>>> > https://0x0fff.com/spark-misconceptions/
>>>> >
>>>> > While global GC may not be useful for big data, scavenging probably will be for any non-trivial query. But I think I see a misconception here. The large RAM on a multicore machine would be divided up between the cores. It makes no sense to run a single Smalltalk across lots of cores (we're a long way from having a thread-safe class library). It makes much more sense to have one Smalltalk per core. That brings the heap sizes down and makes GC less scary.
>>>> >
>>>> > Yep, that's the approach we tried in HydraVM.
>>>> >
>>>> > And Tachyon/Alluxio is kind of solving this kind of issue (it may be nice to have that interacting with a Pharo image): http://www.alluxio.org/ It basically keeps stuff in memory in case one needs to reuse the data between workload runs.
>>>> >
>>>> > Sure. We have all the facilities we need to do this. We can add and remove code at runtime, so we can keep live instances running and send them the code along with the data we want them to crunch.
>>>> >
>>>> > Or have an object memory for work and one for datasets (the first one gets GC'd, the other one isn't).
>>>> >
>>>> > Or have policies that one can switch. There are quite a few levers into the GC from the image, and one can easily switch off global GC with the right levers. One doesn't need a VM that doesn't contain a GC; one needs an image that is using the right policy.
>>>> >
>>>> > Or just mark whole data (sub)graphs with some bit, telling the GC to skip over them so it won't attempt to scan them, treating them as always alive.
>>>> > This is where we get back to my idea of heap spaces, where you can toss a subgraph into a special heap space with a policy such that it is never scanned/GCed automatically and can only be collected manually, or something like that.
>>>> >
>>>> > That could be very useful for all kinds of large binary data, like videos and sounds, that we can load once and keep in the heap space.
>>>> >
>>>> > How hard would it be to get something like that?
>>>>
>>>> Large binary data poses no problem (as long as it's not a copying GC). Since a binary blob contains no subpointers, no work needs to be done. A 1M or 1G ByteArray is the same amount of GC work.
>>>
>>> +1
>>
>> Amen to that. But a dataset made of a gazillion of composites is not the same, right?
>>
> Yep, as soon as you have references in your data, you add more work for the GC.

That's what I thought. I have seen Craig Latta marking some objects with special flags in the object headers. Could there be some generic mechanism there, now that we have 64-bit, super large headers? Like setting/resetting a kind of bitmask to let some spaces be GC'd or left alone? Things that we could manage image side? I have put a few rough sketches at the end of this mail to show the kind of thing I mean.

(Damn, I need more money in the bank to let me work on these things for a long stretch, it is so frustrating. </end of rant>)

Phil

>> Phil
>>
>>> _,,,^..^,,,_
>>> best, Eliot
>
> --
> Best regards,
> Igor Stasenko.
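
P.S. To make Eliot's "levers" point a bit more concrete, here is the kind of image-side policy I have in mind. It is only a sketch and untested: the two parameter indices are what I remember from the vmParameterAt: comment for Spur (55 should be the growth ratio above which a full GC runs after a scavenge, 25 the headroom when growing old space) and need to be checked on a current VM; garbageCollectMost and garbageCollect are the usual scavenge / full GC entry points.

"Sketch: keep the scavenger as is, but make the global (stop-the-world) GC effectively manual-only."
"Assumed indices; verify against the vmParameterAt: comment on your VM before relying on them."
Smalltalk vm parameterAt: 55 put: 1000000.            "never hit the post-scavenge full-GC trigger"
Smalltalk vm parameterAt: 25 put: 256 * 1024 * 1024.  "grow old space in big steps instead of collecting"

"Cheap young-space scavenges can still be requested while queries run..."
Smalltalk garbageCollectMost.

"...and the expensive full GC runs only when we decide to, e.g. after a dataset has been dropped."
Smalltalk garbageCollect.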
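
On the "mark objects so the GC leaves them alone" idea: the closest image-side lever I know of today is pinning, and if I read it right it only tells the compactor not to move an object; the tracer still scans its references, so it is not yet the heap-space policy Igor describes. A tiny sketch, assuming the pinning protocol on Object (pinInMemory) that I believe recent Pharo/Spur exposes; worth double-checking the selector on your image:

| videoBytes |
"A big binary blob we want to load once and keep around."
videoBytes := ByteArray new: 500 * 1024 * 1024.
"Ask the VM not to move it during compaction (handy for FFI); it is still traced, and stays alive only while referenced."
videoBytes pinInMemory.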
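
And on the Seamless/proxies point: the bare Smalltalk idea behind it (Seamless itself does much more, with transporters, transfer strategies and so on, so this is not its real API) is a ProtoObject subclass that forwards whatever it does not understand to the remote image. RemoteHandle and the transport's send:withArguments:to: are made-up names for the sketch:

ProtoObject subclass: #RemoteHandle
    instanceVariableNames: 'transport objectId'
    classVariableNames: ''
    category: 'BigData-Sketches'

RemoteHandle >> doesNotUnderstand: aMessage
    "Ship the message to the data image instead of resolving it locally;
    the answer may come back as a plain value or as another RemoteHandle."
    ^ transport
        send: aMessage selector
        withArguments: aMessage arguments
        to: objectId

That way the gazillion composites never leave the data image; only selectors, arguments and small answers cross the wire, and the front-end image's GC never has to scan the dataset at all.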
