no time to think about it but I found that interesting.
Begin forwarded message: > From: Paul Baumann <[email protected]> > Subject: [vwnc] [Tuning] Set class>>new: > Date: January 11, 2012 1:11:19 AM GMT+01:00 > To: "'VWNC'" <[email protected]> > > Inspect this code to see hashed collection instances grouped by their free > space percentage: > > ^((ObjectMemory allInstancesOfAny: Set withAllSubclasses > asArray) asOrderedCollection > reject: [:ea | ea size < 10 ]) > groupedBy: [:ea | ea capacity > / ea size - 1.0 ] > > Hashing collections in VW need to have extra space allocated to act as search > limits between logical groups. Sets performance drops there are aren't many > free slots allocated. When there are only 25% free slots allocated then the > collection grows. VW rounds up to a prime number for instance creation and > then targets 33% rounded up to a prime for growths afterwards. A target of > 33% rounded to prime skimps on memory at the expense of performance. > > As a collection grows to the minimum 25% free space goal (in #fullCheck) > there is a 75% chance of crossing into the next logical bucket before finding > a nil value. Crossing into the next logical bucket just once can double the > number of slots searched for a miss. Four logical buckets may be traversed > for each search-miss by the time the 25% minimum is reached. > > When VW skimps on allocation then it has to grow collections more frequently > and has to do more comparisons for each lookup. That is a general performance > problem, but there is another problem with how VW allocates memory for > pre-sized collections... > > When you see collections created for an expected size like these: > > anArray := Array new: 1000. > aSet := Set new: 1000. > anOc := OrderedCollection new: 1000. > aDict := Dictionary new: 1000. > > How many items would you expect each of these collections to efficiently > contain? Clearly the Array will be exactly 1000. The OrderedCollection will > take up to 1000 additions before it needs to grow. Pre-sized collections are > created to avoid incremental growth costs. Is it not common and natural to > think that the Set and Dictionary would also be created to efficiently > contain 1000 items too? Would you be surprised to learn that all subclasses > of Set in VW (and likely only VW) are inefficient with this kind of > pre-allocation when you add 1000 objects to them? > > No application code should be expected to know how Sets are implemented and > to pad sizes having knowledge of VW implementation details! Look at the > capacity variances that VW creates even though the Set has been created each > time to anticipate 1000 additions: > > (Set withAll: (Array new: 1000)) capacity > 1511 > > (Set withAll: ((1 to: 1000) collect: [:i | Object new ])) capacity > 1511 > > (Set new: 1000) capacity > 1009 > > ((Set new: 1000) addAll: ((1 to: 1000) collect: [:i | Object new ]); > yourself) capacity > 2027 > > ((Set new) addAll: ((1 to: 1000) collect: [:i | Object new ]); yourself) > capacity > 1361 > > The first four should all have the same result. The last one can be different > because it is not pre-sized. The third example shows that only 0.9% free > space is allocated in addition to the requested size. The fourth example > shows that the under-allocation caused a growth that then over-allocated to > end at 102.7% free space. The costs of the third and fourth example can be > seen in more detail with this test: > > | objects set timings cap0 t0 t1 | > objects := (1 to: ObjectMemory maximumIdentityHashValue) collect: [:i | > Object new]. > timings := OrderedCollection new: objects size. > t0 := t1 := Time microsecondClock. > set := Set new: objects size. > cap0 := set capacity. > objects size to: 1 by: -1 do: [ :i | > set add: (objects at: i). > i \\ 1000 = 1 ifTrue: [timings add: (t1 negated + (t1 :=Time > microsecondClock))]. > ]. > (Array new: 5) > at: 1 put: cap0; "Starting capacity" > at: 2 put: set capacity; "Ending capacity" > at: 3 put: (1 - (set capacity / set size) * -100.0); > "Percentage of free slots" > at: 4 put: timings asArray; "Timings of each 1000" > at: 5 put: t1 - t0; "Total time" > yourself > > #(17417 34939 113.264 #(147 264 262 261 262 262 265 261 261 260 261 260 260 > 2474 184 184 184) 6312) "primes" > > A difference between the first and second numbers indicates growth costs were > incurred while adding. The third number shows there were 1.13 free slots > allocated for each logical bucket. The last number is total execution time. > Notice that performance of each group improves 41% just after the collection > grows. > > Load a package named ICXVwHashTuning from the Cincom Public Code Repository > and try the test again. > > #(28670 28670 74.9985 #(104 182 179 179 180 263 180 183 185 184 190 189 188 > 188 189 254 188) 3205) "After loading ICXVwHashTuning" > > Total execution time was cut by 49% while consuming 18% less memory and > ending with a (very efficient) 75% free slots. > > When application code says "Set new: 1000" then they mean that they want a > Set instance that should expect to contain 1000 additions. It is the > responsibility of the Set to know how much physical space should be allocated > to contain 1000 items. These are simple changes that make a broad > performance improvement while also having intuitive performance. Cincom > should take notice of opportunities like this but at least you can improve > the performance of your existing code if they don't. > > You might want to also execute "(ObjectMemory allInstancesOfAny: Set > withAllSubclasses asArray) do: [:ea | ea trim ]." to optimize free space > allocation for all the hashed instances. You may notice a performance boost > from doing that once after loading ICXVwHashTuning. > > Paul Baumann > > > This message may contain confidential information and is intended for > specific recipients unless explicitly noted otherwise. If you have reason to > believe you are not an intended recipient of this message, please delete it > and notify the sender. This message may not represent the opinion of > IntercontinentalExchange, Inc. (ICE), its subsidiaries or affiliates, and > does not constitute a contract or guarantee. Unencrypted electronic mail is > not secure and the recipient of this message is expected to provide > safeguards from viruses and pursue alternate means of communication where > privacy or a binding message is desired. > _______________________________________________ > vwnc mailing list > [email protected] > http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
