Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

[email protected] Wed, 23 Nov 2016 02:42:51 -0800

On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko <[email protected]> wrote:


>
>
> On 23 November 2016 at 10:50, [email protected] <[email protected]>
> wrote:
>
>>
>>
>> On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe <[email protected]>
>>> wrote:
>>>
>>>>
>>>> > On 22 Nov 2016, at 19:16, [email protected] wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]>
>>>> wrote:
>>>> >
>>>> >
>>>> > On 15 November 2016 at 02:18, Eliot Miranda <[email protected]>
>>>> wrote:
>>>> > Hi Phil,
>>>> >
>>>> > On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <
>>>> [email protected]> wrote:
>>>> >
>>>> >
>>>> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
>>>> [email protected]> wrote:
>>>> >
>>>> > 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
>>>> > Ah, but then it may be more interesting to have a data image (maybe a
>>>> lot of these) and a front end image.
>>>> >
>>>> > Isn't Seamless something that could help us here? No need to bring
>>>> the data back, just manipulate it through proxies.
>>>> >
>>>> > The problem is that the server image will perform GC anyway. It will be
>>>> slow if the server image is big, and it will stop the whole world.
>>>> >
>>>> > What if we asked it to not do any GC at all? Like if we have tons of
>>>> RAM, why bother? Especially if it is only used to keep datasets: load
>>>> them, save image to disk. When needed, trash the loaded stuff and reload
>>>> from zero.
>>>> >
>>>> > Basically that is what happens with Spark.
>>>> >
>>>> > http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>>>> > https://0x0fff.com/spark-misconceptions/
>>>> >
>>>> > While global GC may not be useful for big data, scavenging probably
>>>> will be for any non-trivial query.  But I think I see a misconception
>>>> here.  The large RAM on a multicore machine would be divided up between the
>>>> cores.  It makes no sense to run a single Smalltalk across lots of cores
>>>> (we're a long way from having a thread-safe class library).  It makes much
>>>> more sense to have one Smalltalk per core.  So that brings the heap sizes
>>>> down and makes GC less scary.
>>>> >
>>>> > yep, that's the approach we tried in HydraVM
>>>> >
>>>> >
>>>> > and Tachyon/Alluxio is kind of solving this kind of issue (may be
>>>> nice to have that interacting with Pharo image).
>>>> http://www.alluxio.org/ This thing basically keeps stuff in memory in
>>>> case one needs to reuse the data between workload runs.
>>>> >
>>>> > Sure.  We have all the facilities we need to do this.  We can add and
>>>> remove code at runtime so we can keep live instances running, and send the
>>>> code to them along with the data we want them to crunch.
>>>> >
>>>> >
>>>> > Or have an object memory for work and one for datasets (first one
>>>> gets GC'd, the other one isn't).
>>>> >
>>>> > Or have policies which one can switch.  There are quite a few levers
>>>> into the GC from the image and one can easily switch off global GC with the
>>>> right levers.  One doesn't need a VM that doesn't contain a GC.  One needs
>>>> an image that is using the right policy.
>>>> >
>>>> > or just mark whole data (sub)graphs with some bit, telling GC to skip
>>>> over them, so it won't attempt to scan them and treats them as always alive.
>>>> > this is where we get back to my idea of heap spaces, where you
>>>> can toss a subgraph into a special heap space whose policy is that it
>>>> is never scanned/GCed automatically and can be triggered only manually or
>>>> something like that.
>>>> >
>>>> > Could be very useful for all kinds of large binary data, like videos
>>>> and sounds that we can load once and keep in the heap space.
>>>> >
>>>> > How hard would it be to get something like that?
>>>>
>>>> Large binary data poses no problem (as long as it's not a copying GC).
>>>> Since a binary blob contains no subpointers, no work needs to be done. A 1M
>>>> or 1G ByteArray is the same amount of GC work.
>>>>
>>>
>>> +1
>>>
>>
>> Amen to that. But a dataset made of a gazillion of composites is not the
>> same, right?
>>
>> yep, as soon as you have references in your data, you add more work for GC
>
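
To make the blob-versus-composites point concrete, here is a toy sketch in Python. It is not how the Pharo VM does it; the Obj class and the mark function are made up for illustration. It just counts how many reference slots a simple mark phase has to scan for a single pointer-free blob versus an object holding many references.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Hypothetical toy heap object: outgoing references plus an opaque payload."""
    refs: list = field(default_factory=list)
    payload: bytes = b""


def mark(roots):
    """Toy iterative mark phase; returns (ids of marked objects, reference slots scanned)."""
    marked = set()
    scanned = 0
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        # Only reference slots cost tracing work; the payload is never looked at.
        for child in obj.refs:
            scanned += 1
            stack.append(child)
    return marked, scanned


blob = Obj(payload=bytes(1_000_000))                 # one big pointer-free object
composite = Obj(refs=[Obj() for _ in range(1_000)])  # many small referenced objects

_, blob_scanned = mark([blob])
_, composite_scanned = mark([composite])
print(blob_scanned, composite_scanned)
```

In this model the blob costs the marker nothing beyond visiting its header, whatever its size, while the composite costs one scan per reference.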

That's what I thought. I have seen Craig Latta marking some objects with
special flags in the object headers. Could there be some generic mechanism
there now that we have 64-bit, super large headers? Like setting/resetting
a kind of bitmask to let some spaces be GC'd or left alone? Things that we
could manage image side?
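
Roughly what I have in mind, as a toy sketch in Python rather than anything the VM offers today. SKIP_BIT, Node and mark are names I made up, and the header is modelled as a plain integer. An object whose flag is set is treated as always alive and the marker does not trace into it.

```python
SKIP_BIT = 0b1  # hypothetical header flag: "treat as alive, do not scan"


class Node:
    """Hypothetical toy object with a flags word standing in for header bits."""

    def __init__(self, refs=(), flags=0):
        self.refs = list(refs)
        self.flags = flags


def mark(roots):
    """Toy mark phase that does not trace into flagged objects."""
    marked, scanned = set(), 0
    stack = list(roots)
    while stack:
        node = stack.pop()
        if id(node) in marked:
            continue
        marked.add(id(node))
        if node.flags & SKIP_BIT:
            continue  # assumed always alive; its fields are left untraced
        for child in node.refs:
            scanned += 1
            stack.append(child)
    return marked, scanned


def build_dataset(n, flags):
    """Build a dataset root holding n leaf objects."""
    return Node(refs=[Node() for _ in range(n)], flags=flags)


plain_root = Node(refs=[build_dataset(10_000, flags=0)])
flagged_root = Node(refs=[build_dataset(10_000, flags=SKIP_BIT)])

_, plain_cost = mark([plain_root])
_, flagged_cost = mark([flagged_root])
print(plain_cost, flagged_cost)
```

The catch in a real collector is that a skipped subgraph may point at ordinary objects, so those references would have to be tracked some other way or the objects would wrongly look dead.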

(damn, I need more money in the bank to let me work on these things for a
long stretch, it is so frustrating </end of rant>).

Phil


>
>> Phil
>>
>>>
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>
>>
>
>
> --
> Best regards,
> Igor Stasenko.
>
