Re: [Pharo-dev] Speed up #embeddSourceInTrailer read and write

Marcus Denker Mon, 29 Jan 2018 01:16:17 -0800


> On 29 Jan 2018, at 09:59, Sven Van Caekenberghe <[email protected]> wrote:
> 
> Great results, Marcus.
> 
>> On 29 Jan 2018, at 09:18, Marcus Denker <[email protected]> wrote:
>> 
>> Right now #embeddSourceInTrailer encodes and decodes every method to UTF-8.
>> This is fairly slow.
>> 
>> We do not actually need to use UTF-8; the only important thing is that we
>> interpret the bits correctly when we decode (wide string or not?).
>> As a first step we can even just UTF-8 encode only the WideStrings; there are
>> not many in the image.
> 
> As a speedup it is certainly a good strategy to encode ByteStrings into 
> Latin1 ByteStrings, since this is a no-op. But I would always encode 
> WideStrings as UTF-8 since that is a much more efficient, variable length 
> encoding. Storing WideStrings as 32-bit characters would be quite wasteful.
> 
> Intuitively it feels like a simple compression scheme with a shared 
> dictionary of a couple of thousand of the most common substrings in method 
> source code would be able to compress sources quite a bit. Such compression 
> would not break literal searching.
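
The encoding strategy discussed above (store ByteStrings as-is, UTF-8 encode
only WideStrings, and remember which case applies) could be sketched roughly
as follows. This is an illustration in Python, not Pharo code: the one-byte
flag, its values, and the function names are assumptions made up for the
example and do not describe the actual method trailer format.

```python
# Hypothetical flag values; NOT the real Pharo trailer layout.
BYTE_STRING = 0  # every character fits in one byte, stored as-is (Latin-1)
WIDE_STRING = 1  # at least one character above 255, stored as UTF-8


def encode_source(source: str) -> bytes:
    """Prefix the payload with a flag byte so the decoder knows how to read it."""
    try:
        # Common case: one byte per character, essentially a plain copy.
        return bytes([BYTE_STRING]) + source.encode("latin-1")
    except UnicodeEncodeError:
        # Rare case: wide strings get the variable-length UTF-8 encoding.
        return bytes([WIDE_STRING]) + source.encode("utf-8")


def decode_source(data: bytes) -> str:
    """Interpret the payload according to the leading flag byte."""
    flag, payload = data[0], data[1:]
    if flag == BYTE_STRING:
        return payload.decode("latin-1")
    if flag == WIDE_STRING:
        return payload.decode("utf-8")
    raise ValueError(f"unknown flag byte: {flag}")
```

Only the rare wide-string case pays for a real encoding step; everything else
is a byte copy plus one flag byte.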



Yes, and for real search speed we could look again into indexing… It should be 
possible to build a search index on demand before the first search and
cache it (so it would never be saved in the image and never waste memory in 
deployment).

With that we could even get real-time full-text search. That is, it would be 
faster than “senders of” is now.
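
A rough sketch of the "build on demand, then cache" idea, in Python rather
than Pharo. The trigram index used here is just one possible indexing
technique picked for illustration; the class name, method names, and the
dictionary of method sources are all hypothetical.

```python
from collections import defaultdict


class LazySourceIndex:
    """Full-text index over method sources, built on first search only."""

    def __init__(self, sources):
        self._sources = sources  # dict: method name -> source text
        self._index = None       # not built (and not stored) until needed

    def _build(self):
        # Map every 3-character substring to the methods containing it.
        index = defaultdict(set)
        for name, text in self._sources.items():
            for i in range(len(text) - 2):
                index[text[i:i + 3]].add(name)
        return index

    def search(self, needle):
        if len(needle) < 3:
            # Too short for trigrams: fall back to a plain scan.
            return sorted(n for n, t in self._sources.items() if needle in t)
        if self._index is None:
            self._index = self._build()  # built once, cached afterwards
        candidates = None
        for i in range(len(needle) - 2):
            hits = self._index.get(needle[i:i + 3], set())
            candidates = hits if candidates is None else candidates & hits
            if not candidates:
                return []
        # Trigram hits are only candidates; confirm with a literal match.
        return sorted(n for n in candidates if needle in self._sources[n])

    def invalidate(self):
        """Drop the cache, e.g. before saving an image or after source changes."""
        self._index = None
```

Because the index lives only in the cache and can be dropped at any time, it
never needs to be saved and costs nothing until the first search.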

        Marcus
