Great results, Marcus.

> On 29 Jan 2018, at 09:18, Marcus Denker <[email protected]> wrote:
>
> Right now #embeddSourceInTrailer encodes and decodes every method to utf8.
> This is fairly slow.
>
> We do not need to actually use utf8, the only thing important is that we
> interpret the bits correctly when we decode (wide string or not?).
> As a first step we then can even just utf8 encode the widestrings, there are
> not many in the image.

As a speedup it is certainly a good strategy to encode ByteStrings into Latin1 ByteStrings, since this is a no-op. But I would always encode WideStrings as UTF-8, since that is a much more efficient, variable-length encoding. Storing a WideString as 32-bit characters would be quite wasteful.

Intuitively it feels like a simple compression scheme with a shared dictionary of a couple of thousand of the most common substrings in method source code would be able to compress sources quite a bit. Such compression would not break literal searching.

> Benchmark:
> [SystemNavigation default browseMethodsWithSourceString: 'Method source with it' matchCase: true] timeToRun.
>
> sources from Disk:
> "0:00:00:02.103" "0:00:00:02.265"
>
> CompiledMethod allInstances do: #embeddSourceInTrailer.
>
> "embedded sources"
> "0:00:00:02.442" "0:00:00:02.924"
>
> embedded after speed improvements (already in Pharo7):
> "0:00:00:01.273" "0:00:00:01.345"
>
> (fun thing: a lot of that time is spent in the progress bar... without that
> and not sorting twice, we are at ~0.8 seconds for full image search over all
> source…)
>
> Interesting:
>
> [Smalltalk compiler evaluate: '3+4'] bench
>
> before: "'4,673 per second'"
> after: "'6,008 per second'"
>
> This is because we embed sources of Doit Methods so we can debug DoIts
> without the need to decompile. After thinking about it,
> I think the embedding for Doit was wrong. As the AST for Doits is transformed
> (return added, temp access via arg for DoItIn:), we
> have to pretty-print the AST, which is not a good idea as eval speed should
> be optimised for.
>
> We just gain temp names, which we can do differently if we really need them.
> -> we should just rely on the decompiler. I will do that change later this week.
>
> Marcus
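
To make the encoding suggestion in the reply concrete, here is a minimal sketch in Python (not Pharo): byte-only source is stored as Latin-1, anything wider as UTF-8, with one leading tag byte so the decoder knows how to interpret the bits. The tag values and the names `encode_source`, `decode_source` and `utf32_size` are hypothetical; the thread does not say how the method trailer actually records the encoding. Also note that in Python the Latin-1 step still copies the data, whereas for a Pharo ByteString it would be a no-op.

```python
# Hypothetical tag bytes: an assumption for illustration, not Pharo's trailer format.
TAG_LATIN1 = 0
TAG_UTF8 = 1


def encode_source(source: str) -> bytes:
    """Encode method source using the cheaper of the two representations."""
    try:
        # All code points <= 255: one byte per character (the ByteString case).
        return bytes([TAG_LATIN1]) + source.encode("latin-1")
    except UnicodeEncodeError:
        # At least one wide character: fall back to variable-length UTF-8.
        return bytes([TAG_UTF8]) + source.encode("utf-8")


def decode_source(data: bytes) -> str:
    """Decode bytes produced by encode_source, dispatching on the tag byte."""
    tag, payload = data[0], data[1:]
    if tag == TAG_LATIN1:
        return payload.decode("latin-1")
    if tag == TAG_UTF8:
        return payload.decode("utf-8")
    raise ValueError(f"unknown encoding tag: {tag}")


def utf32_size(source: str) -> int:
    """Size in bytes if every character were stored as a 32-bit value."""
    return len(source.encode("utf-32-le"))


if __name__ == "__main__":
    narrow = "foo ^ 3 + 4"
    wide = "x := 'π'"  # contains one character outside Latin-1
    for src in (narrow, wide):
        encoded = encode_source(src)
        assert decode_source(encoded) == src
        print(repr(src), len(encoded), "bytes tagged vs", utf32_size(src), "bytes as UTF-32")
```

For mostly-ASCII source with a handful of wide characters, the UTF-8 form stays close to one byte per character, which is the reason for preferring it over fixed 32-bit storage.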
