I could reproduce with Squeak 5 and Squeak 4.6 so I was able to use the VM
profiler.

I used:

| prototype |
prototype := BorderedMorph new.
10000
timesRepeat: [prototype veryDeepCopy]

The profiler ran for ~0.8 second. Maybe you can have more accurate result
with longer bench but it takes already a while to display results.

*In Spur:*

1044 samples in the VM (1057 samples in the entire program)  98.77% of total

754 samples in generated vm code 72.22% of entire vm (71.33% of total)
290 samples in vanilla vm code 27.78% of entire vm (27.44% of total)

% of generated vm code (% of total) (samples) (cumulative)
37.67%    (26.87%) Dictionary>>associationsDo: (284) (37.67%)
30.37%    (21.67%) Object>>at: (229) (68.04%)
  3.18%    (  2.27%) WeakKeyDictionary>>associationsDo: (24) (71.22%)
  2.12%    (  1.51%) MethodDictionary>>scanFor: (16) (73.34%)
  1.59%    (  1.14%) SmallInteger>>\\ (12) (74.93%)

% of vanilla vm code (% of total) (samples) (cumulative)
58.62%    (16.08%) scavengeReferentsOf (170) (58.62%)
24.48%    (  6.72%) instantiateClassindexableSize (71) (83.10%)
  2.76%    (  0.76%) copyAndForward (8) (85.86%)
  2.07%    (  0.57%) doScavenge (6) (87.93%)
  2.07%    (  0.57%) lookupOrdinaryNoMNUEtcInClass (6) (90.00%)

13 samples in the rest    1.23% of total

% of rest (% of total) (samples) (cumulative)
53.85%    (  0.66%) SCDynamicStoreCreate (7) (53.85%)
46.15%    (  0.57%) _platform_memmove$VARIANT$sse42 (6) (100.0%)
46.15%    (  0.57%) OSAtomicAdd32Barrier (6) (146.15%)

**GCs**
full 0 totalling 0ms (0% elapsed time)
scavenges 258 totalling 151ms (18.97% elapsed time), avg 0.585ms

*In SqueakV3:*

1189 samples in the VM (1196 samples in the entire program)  99.41% of total

998 samples in generated vm code 83.94% of entire vm (83.44% of total)
191 samples in vanilla vm code 16.06% of entire vm (15.97% of total)

% of generated vm code (% of total) (samples) (cumulative)
43.19%    (36.04%) Object>>at: (431) (43.19%)
35.07%    (29.26%) Dictionary>>associationsDo: (350) (78.26%)
  1.90%    (  1.59%) WeakKeyDictionary>>associationsDo: (19) (80.16%)
  1.30%    (  1.09%) Object>>veryDeepCopyWith: (13) (81.46%)
  1.20%    (  1.00%) BlockClosure>>value (12) (82.67%)

% of vanilla vm code (% of total) (samples) (cumulative)
30.37%    (  4.85%) instantiateClassindexableSize (58) (30.37%)
13.61%    (  2.17%) incrementalGC (26) (43.98%)
  9.42%    (  1.51%) markAndTrace (18) (53.40%)
  8.38%    (  1.34%) marryFrameSP (16) (61.78%)
  6.81%    (  1.09%) eeInstantiateSmallClassnumSlots (13) (68.59%)
  5.24%    (  0.84%) createClosureNumArgsnumCopiedstartpc (10) (73.82%)
  4.71%    (  0.75%) mapPointersInObjectsFromto (9) (78.53%)

**GCs**
full 0 totalling 0ms (0% elapsed time)
incr 270 totalling 46ms (5.094% elapsed time), avg 0.17ms

*Conclusion ?*

It feels that the main issue is with the scavenges that take overall more
time in Spur in this case. Especially, the method scavengeReferentsOf seems
to be guilty and may be worth looking at.

There are other details, like now a little bit of execution time seems to
be used to allocate new memory segments and associationsDo: looks slower,
likely because of == and forwarders, but this is not as important.

What do you think Eliot ?


2015-12-19 11:06 GMT+01:00 Denis Kudriashov <[email protected]>:

> Hi
>
> 2015-12-19 4:18 GMT+01:00 Eliot Miranda <[email protected]>:
>
>> and used in some copying code to access indexable fields.  Chris Muller
>> found this in some of his Magma code and so we added a pair of new
>> primitives, primitiveSlotAt and primitiveSlotAtPut that access any slot
>> within the sequence of named followed by indexable inst vars, avoiding the
>> primitive failure. So in Squeak Spur instVarAt: now reads:
>>
>> instVarAt: index
>> "Primitive. Answer a fixed variable in an object. The numbering of the
>> variables
>> corresponds to the named instance variables, followed by the indexed
>> instance
>> variables. Fail if the index is not an Integer or is not the index of a
>> fixed variable.
>> Essential. See Object documentation whatIsAPrimitive."
>>
>> <primitive: 173 error: ec>
>> self primitiveFailed
>>
>
> Same code used in Pharo too.
>
> I try same test on latest squeak images:
>
> Cog: '10,100 per second. 98.8 microseconds per run.'
> Spur:  '7,610 per second. 131 microseconds per run.'
>

Reply via email to