On 23 November 2016 at 12:41, [email protected] <[email protected]> wrote:

>
>
> On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko <[email protected]>
> wrote:
>
>>
>>
>> On 23 November 2016 at 10:50, [email protected] <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda <[email protected]
>>> > wrote:
>>>
>>>>
>>>>
>>>> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> > On 22 Nov 2016, at 19:16, [email protected] wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> > On 15 November 2016 at 02:18, Eliot Miranda <[email protected]>
>>>>> wrote:
>>>>> > Hi Phil,
>>>>> >
>>>>> > On Thu, Nov 10, 2016 at 2:19 AM, [email protected] <
>>>>> [email protected]> wrote:
>>>>> >
>>>>> >
>>>>> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
>>>>> [email protected]> wrote:
>>>>> >
>>>>> > 2016-11-10 9:49 GMT+01:00 [email protected] <[email protected]>:
>>>>> > Ah, but then it may be more interesting to have a data image (maybe
>>>>> a lot of these) and a front end image.
>>>>> >
>>>>> > Isn't Seamless something that could help us here? No need to bring
>>>>> the data back, just manipulate it through proxies.
>>>>> >
>>>>> > The problem is that the server image will perform GC anyway. And it
>>>>> will be slow if the server image is big, which will stop the whole world.
>>>>> >
>>>>> > What if we asked it to not do any GC at all? Like, if we have tons of
>>>>> RAM, why bother? Especially if what it is used for is to keep datasets:
>>>>> load them, save the image to disk. When needed, trash the loaded stuff
>>>>> and reload from scratch.
>>>>> >
>>>>> > Basically that is what happens with Spark.
>>>>> >
>>>>> > http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>>>>> > https://0x0fff.com/spark-misconceptions/
>>>>> >
>>>>> > While global GC may not be useful for big data, scavenging probably
>>>>> will be for any non-trivial query.  But I think I see a misconception
>>>>> here.  The large RAM on a multicore machine would be divided up between
>>>>> the cores.  It makes no sense to run a single Smalltalk across lots of
>>>>> cores (we're a long way from having a thread-safe class library).  It
>>>>> makes much more sense to have one Smalltalk per core.  So that brings
>>>>> the heap sizes down and makes GC less scary.
>>>>> >
>>>>> > yep, that's the approach we tried in HydraVM
>>>>> >
>>>>> >
>>>>> > and Tachyon/Alluxio is solving this kind of issue (it may be nice to
>>>>> have that interacting with a Pharo image).
>>>>> http://www.alluxio.org/ This thing basically keeps stuff in memory in
>>>>> case one needs to reuse the data between workload runs.
>>>>> >
>>>>> > Sure.  We have all the facilities we need to do this.  We can add
>>>>> and remove code at runtime so we can keep live instances running, and send
>>>>> the code to them along with the data we want them to crunch.
>>>>> >
>>>>> >
>>>>> > Or have an object memory for work and one for datasets (the first one
>>>>> gets GC'd, the other one doesn't).
>>>>> >
>>>>> > Or have policies which one can switch.  There are quite a few levers
>>>>> into the GC from the image and one can easily switch off global GC with
>>>>> the right levers.  One doesn't need a VM that doesn't contain a GC.  One
>>>>> needs an image that is using the right policy.
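[The image-side "levers" described above have analogues in other runtimes. As a hedged illustration — Python rather than Pharo, so the names differ from whatever the Pharo image exposes — CPython lets a running program switch off automatic collection and pin long-lived data out of the collector's reach:]

```python
# A rough analogy in Python rather than Pharo: CPython exposes "levers"
# into its collector from the running program, so a process that only
# holds static datasets can opt out of cyclic GC entirely.
import gc

dataset = {"rows": [[i, i * i] for i in range(1000)]}  # long-lived data

gc.collect()   # one last full collection while we still want one
gc.freeze()    # move every surviving object into a permanent generation
               # that future collections skip - much like a heap space
               # the GC treats as always alive
gc.disable()   # and/or switch off automatic collection altogether

frozen = gc.get_freeze_count()   # objects now exempt from collection
enabled = gc.isenabled()         # False: no automatic GC from here on

# The levers are reversible - the policy lives program-side, not in the VM:
gc.enable()
gc.unfreeze()
```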
>>>>> >
>>>>> > or just mark whole data (sub)graphs with some bit, telling the GC to
>>>>> skip over them so it won't attempt to scan them, treating them as always
>>>>> alive..
>>>>> > this is where we get back to my idea of heap spaces, where you can
>>>>> toss a subgraph into a special heap space with such a policy that it is
>>>>> never scanned/GCed automatically and can be triggered only manually, or
>>>>> something like that.
>>>>> >
>>>>> > Could be very useful for all kinds of large binary data, like videos
>>>>> and sounds that we can load once and keep in the heap space.
>>>>> >
>>>>> > How hard would it be to get something like that?
>>>>>
>>>>> Large binary data poses no problem (as long as it's not a copying GC).
>>>>> Since a binary blob contains no subpointers, no work needs to be done.
>>>>> A 1M or 1G ByteArray is the same amount of GC work.
>>>>>
>>>>
>>>> +1
>>>>
>>>
>>> Amen to that. But a dataset made of a gazillion composite objects is not
>>> the same, right?
>>>
>> yep, as soon as you have references in your data, you add more work for
>> the GC
>>
>
> That's what I thought. I have seen Craig Latta marking some objects with
> special flags in the object headers. Could there be some generic mechanism
> there now that we have 64-bit, super large headers? Like setting/resetting
> a kind of bitmask to let some spaces be GC'd or left alone? Things that we
> could manage image side?
>
well, adding the bit(s) is just the simplest part of the story. The main
part is implementing a GC discipline that does not walk over the marked
object(s), but also having a mechanism to ensure that the marked object(s)
form a closed subgraph (i.e. there are no references pointing outside of
it).
Scanning+marking a graph is usually a simple matter; you just need to
provide the root(s). I experimented with this in HydraVM, with a process we
called mytosis - but it had a slightly different purpose:
- I implemented two primitives: one that scans a graph and reports whether
it is fully isolated,
- and another that basically clones the graph into a separate memory region
to start it as an image in its own thread, etc.
But in our scenario, I imagine, you cannot fully avoid external
references - the most obvious one being instance->class references. In that
case, we need some kind of mechanism to ensure that the class objects
referenced by objects in the desired data set are kept in the system for as
long as our blob is unchanged. That could be solved by simply declaring a
'fixed' set of external references per subgraph, which live as normal
objects in the system, with the only exception, like I mentioned, that it
needs to be ensured they won't be GCed - or, even better, won't be moved -
as long as our isolated graph is in use.
Then the only thing left is to set the whole graph into read-only mode and
you're ready to go..
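[The first of those primitives - scan a designated graph and report whether it is isolated, modulo a declared set of external references - can be sketched like this. It is a Python analogy only: `references_of` and `is_isolated` are hypothetical names, and a real VM would walk object slots rather than just containers:]

```python
# Hedged sketch (hypothetical names, Python stand-in for a VM primitive):
# check that a user-designated set of objects forms a closed subgraph,
# allowing a declared set of external references (e.g. the
# instance->class references mentioned above).

def references_of(obj):
    """The objects an object directly references. Only containers are
    followed here; immediates such as ints need no tracing, much like
    Smalltalk immediates."""
    if isinstance(obj, (list, tuple)):
        refs = list(obj)
    elif isinstance(obj, dict):
        refs = list(obj.keys()) + list(obj.values())
    else:
        refs = []
    return [r for r in refs if isinstance(r, (list, tuple, dict))]

def is_isolated(members, allowed_externals=()):
    """True iff every reference from inside `members` stays inside, or
    lands in the declared `allowed_externals` set."""
    inside = {id(o) for o in members}
    allowed = {id(o) for o in allowed_externals}
    return all(
        id(ref) in inside or id(ref) in allowed
        for obj in members
        for ref in references_of(obj)
    )

inner = [1, 2]
outer = [inner]
closed = is_isolated([outer, inner])        # a closed subgraph
shared = [3]
leaky = is_isolated([[shared]])             # leaks a reference to `shared`
pinned = is_isolated([[shared]], allowed_externals=[shared])
```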
And then, as you can imagine, having such a mechanism opens even more
interesting opportunities, like offloading a graph to disk and/or
(re)loading it on demand, etc. Which is closely related to my flame-topic
in this thread :)
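[To make the offloading idea concrete - again a rough Python analogy, not Pharo's mechanism, with illustrative names and file format - the payoff of a closed subgraph is that serializing it to disk and reloading it on demand needs no external reference fix-ups:]

```python
import os
import pickle
import tempfile

# A self-contained data graph: nothing in it references the outside
# world, so the whole thing can be offloaded and reloaded as a unit.
dataset = {"samples": [[1, 2, 3], [4, 5, 6]], "labels": ["a", "b"]}

path = os.path.join(tempfile.mkdtemp(), "subgraph.bin")
with open(path, "wb") as f:
    pickle.dump(dataset, f)   # offload the graph to disk

del dataset                   # drop the in-memory copy entirely

with open(path, "rb") as f:
    reloaded = pickle.load(f) # ...and (re)load it on demand later
```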
But the point is that identifying and designating subgraph(s) cannot be
automated - this will always be the responsibility of the user, because
only the user knows best what he wants treated as static data and what
not, etc.


> (damn, I need more money in the bank to let me work on these things for a
> long stretch, it is so frustrating </end of rant>).
>
> Phil
>
>
>>
>>> Phil
>>>
>>>>
>>>> _,,,^..^,,,_
>>>> best, Eliot
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Igor Stasenko.
>>
>
>


-- 
Best regards,
Igor Stasenko.
