Re: Garbage collection (crashing on Windows)

Ben Rubinstein Mon, 22 Aug 2016 05:56:26 -0700

Mark,

Thanks so much for this detailed and very useful response. A few quick follow-up questions.

1)

> https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/

This looks like a great tip - before I go into the ring with the client's IT dept (always a tricky exercise) can I just check that LiveCode does have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in the image header as described in that article? And do you know from which version that is true?



(not a question)
> We actually changed this mechanism to make it less conservative in
> 6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted objects wouldn't get
> actually freed until the root event loop runs

Aha! Sadly I just updated the app to 6.7.11 without improving the situation (and, on an experimental-only basis, to 8.1 also, similarly without improvement).

2)

> LiveCode doesn't use what is generally referred to as 'garbage collection'
> as it generally frees 'things' up as soon as they are no longer referenced.

But does 'freed' literally release the memory, or just mark the object as available? Surely you still need to do some kind of garbage collection in order to collapse what may be isolated fragments of 'free' memory?


3)
In another thread (20/08/2016 19:44) you just wrote:

> For optimization purposes the best approach is to measure the amount of memory *actually* in use before and after any particular operation you perform - just as you do with time when profiling for speed (rather than memory footprint).

How can we do this? As noted, hasMemory is defunct; heapSpace is Mac only; is there a method I can use to profile the memory usage?



Many thanks,

Ben

On 19/08/2016 18:42, Mark Waddingham wrote:

Hi Ben,

When I got to the end of this email I remembered something quite pertinent -
you mentioned that the limit you were hitting was 2Gb... One thing to check is
that the install of Windows you are running on cannot be poked to actually
raise this limit to 3Gb:

https://blogs.technet.microsoft.com/askperf/2007/03/23/memory-management-demystifying-3gb/


Perhaps others with more insider Windows knowledge can chip in there. It will
depend on the machine, the version of Windows and probably lots of other
factors. Given that 'hardware is cheap' compared to rewriting software - if
the Windows install being used currently does not use that 'trick', and can
be, you'll probably find you get a fair bit of mileage with a bit of computer
configuration - rather than coding!

Assuming that cannot be done then...

On 2016-08-17 19:52, Ben Rubinstein wrote:

Please refresh my memory: is there any way to cause/allow garbage to
be collected without ending all script running?


LiveCode doesn't use what is generally referred to as 'garbage collection' as
it generally frees 'things' up as soon as they are no longer referenced. Now I
say 'generally' because things fall into two classes:

   1) Values (strings, arrays, data, numbers)

   2) Objects (stacks, cards, buttons etc.)

I'll deal with Objects first:

Objects are deleted as soon as they can be relative to the requirements of the
engine. We actually changed this mechanism to make it less conservative in
6.7.11, 7.1.4 and 8.0 onwards. Previously, deleted objects wouldn't get
actually freed until the root event loop runs (i.e. when there is no script
running); now they will generally get freed much closer to when they are
deleted, especially if they were created 'at the same level or above' where
the object is deleted. e.g.

   on foo
     create button "bar"
     delete button "bar"
   end foo

Here the delete will free the object immediately (as the engine knows that it
cannot be holding any internal references to it - in particular on the C
stack).

It sounds like the problem you are having (assuming you aren't creating and
deleting lots of controls) is to do with values and so...

Values are freed *as soon as* there is no longer any reference to them. In 6.7
and before that would be whenever a variable is changed (the old value was
released immediately), or whenever the variable goes out of scope (e.g. locals
in a handler get released when the handler ends, script locals are released
when the object is deleted). In 7.0+ this happens as soon as there are no
variables referencing the same instance of the value. e.g.

  (1) local tVariable1, tVariable2
  (2) put "foo" & "bar" into tVariable1
  (3) put tVariable1 into tVariable2
  (4) put empty into tVariable1

After step (3), tVariable1 and tVariable2 will reference the same value. At
step (4) the reference tVariable1 holds will be removed, but the value will
not be deleted (from memory) until tVariable2 changes, or goes out of scope.
The general mechanism is that values are shared when copied into different
variables, and are only copied when a variable is mutated. e.g.

  (1) local tVariable1, tVariable2
  (2) put "foo" & "bar" into tVariable1
  (3) put tVariable1 into tVariable2
  (4) put "baz" after tVariable2
  (5) put empty into tVariable1

Here, at step (4), the value referenced by tVariable2 will be copied (and so
tVariable1 and tVariable2 will no longer reference the same value), and then
changed. This means that at step (5) the value previously referenced by
tVariable1 *will* be freed, because it is not shared with tVariable2
(obviously - because tVariable2 is no longer the same value!).

The reason I was being so paedagogic in the above is that it opens an
opportunity for you to potentially reduce the memory footprint of your dataset
(which sounds like it is what is causing the problem) by doing some
pre-processing and exploiting the fact that values are not copied until they
are modified. Of course, I don't know what the structure of the data you are
processing is - so I'm going to assume you are loading in lots of text files
and breaking them up into pieces, presumably storing in arrays with the
individual array elements being numbers and strings.

In this case there are a few interesting things to note about the engine's
implementation of values...

Array keys are *always* shared (up to case). When you do:

   put tElement into tArray[tKey]

The engine first 'uniques' tKey - this means it ensures that there is only one
copy of tKey (up to case differences) in memory. So - for every single array
in memory which contains a key "foo", the value representing the key "foo"
will not be copied, just referenced from all the arrays. Note that "foo" and
"Foo", whilst referencing the same value (unless caseSensitive is true), will
be stored in memory as different values, which leads to memory optimization
tip 1:

   When constructing arrays from external data, where the case of the key is
   irrelevant, use:
     put X into tArray[toLower(Y)] -- or toUpper (whichever you prefer)
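
As a minimal sketch of tip 1, assuming the external data arrives as lines of
the form key<TAB>value (the handler name arrayFromLines and that layout are
hypothetical, chosen only for illustration):

```livecode
-- Hypothetical helper: builds an array from "key<TAB>value" lines.
-- Each key is lower-cased before use, so "Foo" and "foo" end up as
-- one key rather than two differently-cased key values.
function arrayFromLines pText
   local tArray, tLine
   set the itemDelimiter to tab
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      -- item 1 is the key; everything after the first tab is the value
      put item 2 to -1 of tLine into tArray[toLower(item 1 of tLine)]
   end repeat
   return tArray
end arrayFromLines
```

Note that this only makes sense where the case of the key really is
irrelevant: if two lines differ only in the case of their key, the later line
overwrites the earlier one.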

For the values bound to by keys, the story is different. If you do:

   put myString & "1" into tArray["foo"]
   put myString & "1" into tArray["bar"]

Then the two values of the keys "foo" and "bar" *will be different*. This is
because they have been constructed differently.

You can optimize this for memory size by using another array to 'index' your
string values:

   local sValueCache -- script local, so the cache persists between calls

   command shareAndStoreKey @xArray, pKey, pValue
     -- this is assuming your values are sensitive to case
     set the caseSensitive to true
     if pValue is not among the keys of sValueCache then
         put pValue into sValueCache[pValue]
     end if
     put sValueCache[pValue] into xArray[pKey]
   end shareAndStoreKey

After you have processed all your arrays like this, and 'put empty into
sValueCache' - all string elements in your arrays which are case-sensitively
the same will share the same value.
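
A hedged usage sketch of the handler above, assuming it lives in the same
script as a script local named sValueCache, and assuming the data is lines of
the form id<TAB>country (the function name countriesByID and that layout are
hypothetical):

```livecode
-- Hypothetical caller: must be in the same script as shareAndStoreKey
-- and the script local sValueCache, declared above both handlers.
function countriesByID pText
   local tCountryByID, tLine
   set the itemDelimiter to tab
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      -- identical country strings are stored once and shared
      shareAndStoreKey tCountryByID, item 1 of tLine, item 2 of tLine
   end repeat
   -- the cache is only needed while building; the stored values stay shared
   put empty into sValueCache
   return tCountryByID
end countriesByID
```

How much this saves depends on how many repeated values the data contains; if
most values are distinct, the cache only adds overhead during the build.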

Of course, you can play the same trick with arrays - although it is a little
more tricky, admittedly.

So, anyway, before anyone asks 'why doesn't the engine just do this?'
(particularly since it does so for array keys) then the answer is performance.
It is costly to work out which values (which are computed dynamically, or are
substrings of another string in different places) are actually the same - thus
you'd end up saving memory but costing performance if the engine uniqued
*everything*.

So, the next question is probably going to be, 'why does the engine do it for
array keys then?' and the answer here is because string comparison is slow -
case-less string comparison more so. When you look up a key in an associative
array, it might well take multiple string comparisons to find. By 'uniquing'
the strings used in array keys, after the engine has processed the lookup
request it is a constant time operation to do each of these comparisons to
find the actual element you want. On balance, this means you save time -
assuming that you are accessing your arrays much more frequently than building
them - which is usually the case.

Now, all the above I say with caution - the engine may change how it works in
the future. It might become more 'clever' in some cases, and less 'clever' in
others; thus you should only go as far as trying to optimize your code for
memory footprint (if you can afford the cost of the pre-processing) if YOU
REALLY NEED TO.

Clearly, in your (Ben's) case you really do - you are hitting the Windows 2Gb
process limit at the moment, and it sounds like it is a batch process running
unattended so an initial 'memory minimization process' run on the dataset is
probably a cost you can afford to pay.

Anyway, without more details of what you are needing to do the above might be
completely useless...

Just my 2 pence.

Warmest Regards,

Mark.



