Nadav, can you clarify what we’re really trying to accomplish here?  “Smaller 
binaries” isn’t too important of a goal in and of itself.

Are we trying to:
– reduce storage used on disk
– reduce load time
– reduce loaded memory footprint
– make emitting swift binaries more efficient
– something else?

Yes, I know, “all of the above”, but understanding something about what’s most 
important would help evaluate the proposal.

It’s also worth keeping in mind that iOS and OS X have been aggressively 
adopting pervasive system-wide compression both on disk and in memory.  This 
trend will continue, and it makes it quite a bit less important for individual 
components to explicitly adopt compression techniques themselves, except in 
cases where there’s a lot of special structure that those components can 
leverage to get better compression than a general-purpose lossless compressor 
can manage (images and sound are the two obvious examples of this, but also 
cases like huge arrays of floating-point data where the low-order bits don’t 
matter, etc).  Linux hasn’t been as aggressive about doing this yet, but 
pervasive system-level compression is The Future.

– Steve

> On Dec 20, 2015, at 5:17 AM, Dmitri Gribenko <griboz...@gmail.com> wrote:
> 
> + Stephen Canon, because he probably has good ideas in this domain.
> 
> On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev 
> <swift-dev@swift.org> wrote:
> 
> What’s next?
> 
> The small experiment I described above showed that compressing the names in 
> the string table has a huge potential for reducing the size of swift 
> binaries. I’d like for us (swift-developers) to talk about the implications 
> of this change and start working on the two tasks of tightening our existing 
> mangling format and on implementing a new compression layer on top. 
> 
> Hi Nadav,
> 
> This is a great start that shows that there is a potential for improvement in 
> our mangled names!
> 
> To make this effort more visible, I would suggest creating a bug on 
> https://bugs.swift.org/.
> 
> I think we should survey existing solutions that industry has developed for 
> compressing short messages.  What comes to mind:
> 
> - header compression in HTTP/2:
> https://http2.github.io/http2-spec/compression.html
> 
> - PPM algorithms are among the best-performing compression algorithms for 
> text.
> 
> - Arithmetic coding is also a natural starting point for experimentation.
> 
> Since the input mangled name also comes in a restricted character set, we 
> could also remove useless bits first, and try an existing compression 
> algorithm on the resulting binary string.
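To make the "remove useless bits" idea concrete, here is a minimal sketch. It assumes mangled names only use letters, digits, `_` and `$` (64 symbols, so 6 bits each); that alphabet, its ordering, and the names `pack6`/`unpack6` are illustrative assumptions, not part of the actual Swift mangling.

```python
import string

# Assumed 64-symbol alphabet (hypothetical ordering): 6 bits per character.
ALPHABET = string.ascii_letters + string.digits + "_$"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack6(name: str) -> bytes:
    """Pack each character into 6 bits, zero-padding the tail to a whole byte."""
    bits = 0
    for ch in name:
        bits = (bits << 6) | INDEX[ch]
    nbits = 6 * len(name)
    pad = -nbits % 8  # number of padding bits needed to reach a byte boundary
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack6(data: bytes, count: int) -> str:
    """Inverse of pack6; the character count must be stored separately."""
    bits = int.from_bytes(data, "big") >> (-6 * count % 8)
    return "".join(
        ALPHABET[(bits >> (6 * (count - 1 - i))) & 63] for i in range(count)
    )
```

The packed bytes would then be handed to whichever general-purpose compressor is being evaluated; packing alone already shrinks a name to three quarters of its size.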
> 
> We should also build a scheme that uses the shorter of the compressed 
> and non-compressed names.
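A rough sketch of the "use whichever is shorter" scheme follows. It uses `zlib` purely as a stand-in compressor, and the one-byte tags `R` and `Z` are hypothetical markers invented for this example; a real scheme would also need output restricted to valid symbol characters.

```python
import zlib

# Hypothetical one-byte tags telling the decoder which form follows.
RAW, PACKED = b"R", b"Z"


def encode(name: str) -> bytes:
    """Emit the compressed form only when it is actually shorter."""
    raw = name.encode("ascii")
    packed = zlib.compress(raw, 9)
    if len(packed) < len(raw):
        return PACKED + packed
    return RAW + raw


def decode(blob: bytes) -> str:
    """Inverse of encode: inspect the tag, decompress if needed."""
    tag, body = blob[:1], blob[1:]
    if tag == PACKED:
        body = zlib.decompress(body)
    return body.decode("ascii")
```

With this shape, short names where the compressor's fixed overhead dominates stay uncompressed, so the worst case is the cost of the tag.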
> 
> For running experiments it would be useful to publish a sample corpus of 
> mangled names that we will be using for comparing the algorithms and 
> approaches.
> 
> I also have a concern about making mangled names completely unreadable.  
> Today, I can frequently at least get a gist of what the referenced entity is 
> without a demangler.  What we could do is make the name consist of a 
> human-readable prefix that encodes just the base name and a compressed suffix 
> that encodes the rest of the information.
> 
> _T<length><class name><length><method name><compressed suffix>
> 
> We would be able to use references to the class and the method name from the 
> compressed part, so that character data isn't completely wasted.
> 
> This scheme that injects human-readable parts will also allow the debugger to 
> quickly match the names without the need to decompress them.
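As an illustration of how a tool could match names without decompressing them, here is a small sketch that splits a symbol of the form `_T<length><class name><length><method name><compressed suffix>`. It assumes decimal lengths and identifiers that never start with a digit; `split_readable` is a hypothetical helper, not an existing API.

```python
def split_readable(symbol: str):
    """Return (class_name, method_name, opaque_suffix) without decoding the suffix."""
    if not symbol.startswith("_T"):
        raise ValueError("not a mangled name in the assumed format")
    pos = 2
    parts = []
    for _ in range(2):  # class name, then method name
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError("expected a decimal length")
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return parts[0], parts[1], symbol[pos:]
```

For example, `split_readable("_T5Stack4pushXYZ")` yields `("Stack", "push", "XYZ")`, where `XYZ` stands in for the compressed remainder.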
> 
> We should also investigate improving the existing mangling scheme to produce 
> shorter results.  For example, one idea that comes to mind is using base-60 
> instead of base-10 for single-digit numbers that specify identifier 
> length, falling back to base-10 for longer numbers to avoid ambiguity.  This 
> would save one character for every identifier longer than 9 characters and 
> shorter than 60, which is actually the common case.
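One possible reading of the base-60 idea, sketched below: lengths 0 to 59 are written as a single base-60 digit, and lengths of 60 or more fall back to multi-digit decimal. The digit alphabet (`0-9`, `a-z`, `A-X`) is an assumption made for this example, and the decoder relies on identifiers being non-empty and never starting with a digit.

```python
import string

# Hypothetical base-60 digit set: 0-9, a-z, A-X (10 + 26 + 24 symbols).
DIGITS60 = string.digits + string.ascii_lowercase + string.ascii_uppercase[:24]


def encode_ident(ident: str) -> str:
    """Prefix an identifier with its length: one base-60 digit, or decimal if >= 60."""
    n = len(ident)
    prefix = DIGITS60[n] if n < 60 else str(n)
    return prefix + ident


def decode_ident(text: str, pos: int = 0):
    """Return (identifier, next position) for the identifier starting at pos."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end - pos >= 2:
        # Two or more decimal digits can only be the base-10 fallback.
        n = int(text[pos:end])
    else:
        # A single character is a base-60 digit (decimal 0-9 keep their value).
        n = DIGITS60.index(text[pos])
        end = pos + 1
    return text[end:end + n], end + n
```

Under these assumptions a 10-character identifier such as `Dictionary` is written as `aDictionary` (11 characters) instead of `10Dictionary` (12 characters).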
> 
> Dmitri
> 
> -- 
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <griboz...@gmail.com>*/

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
