Nadav, can you clarify what we’re really trying to accomplish here? “Smaller binaries” isn’t too important a goal in and of itself.
Are we trying to:

– reduce storage used on disk
– reduce load time
– reduce loaded memory footprint
– make emitting swift binaries more efficient
– something else?

Yes, I know, “all of the above”, but understanding something about what’s most important would help evaluate the proposal.

It’s also worth keeping in mind that iOS and OS X have been aggressively adopting pervasive system-wide compression both on disk and in memory. This trend will continue, and it makes it quite a bit less important for individual components to explicitly adopt compression techniques themselves, except in cases where there’s a lot of special structure that those components can leverage to get better compression than a general-purpose lossless compressor can manage (images and sound are the two obvious examples of this, but also cases like huge arrays of floating-point data where the low-order bits don’t matter, etc). Linux hasn’t been as aggressive about doing this yet, but pervasive system-level compression is The Future.

– Steve

> On Dec 20, 2015, at 5:17 AM, Dmitri Gribenko <griboz...@gmail.com> wrote:
>
> + Stephen Canon, because he probably has good ideas in this domain.
>
> On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev
> <swift-dev@swift.org> wrote:
> > What’s next?
> >
> > The small experiment I described above showed that compressing the names
> > in the string table has a huge potential for reducing the size of swift
> > binaries. I’d like for us (swift-developers) to talk about the
> > implications of this change and start working on the two tasks of
> > tightening our existing mangling format and on implementing a new
> > compression layer on top.
>
> Hi Nadav,
>
> This is a great start that shows that there is a potential for improvement
> in our mangled names!
>
> To make this effort more visible, I would suggest creating a bug on
> https://bugs.swift.org/ .
>
> I think we should survey existing solutions that industry has developed for
> compressing short messages. What comes to mind:
>
> - header compression in HTTP2:
>   https://http2.github.io/http2-spec/compression.html
>
> - PPM algorithms are among the best-performing compression algorithms for
>   text.
>
> - Arithmetic coding is also a natural starting point for experimentation.
>
> Since the input mangled name also comes in a restricted character set, we
> could also remove useless bits first, and try an existing compression
> algorithm on the resulting binary string.
>
> We should also build a scheme that uses the shorter of the compressed and
> non-compressed names.
>
> For running experiments it would be useful to publish a sample corpus of
> mangled names that we will be using for comparing the algorithms and
> approaches.
>
> I also have a concern about making mangled names completely unreadable.
> Today, I can frequently at least get a gist of what the referenced entity
> is without a demangler. What we could do is make the name consist of a
> human-readable prefix that encodes just the base name and a compressed
> suffix that encodes the rest of the information.
>
> _T<length><class name><length><method name><compressed suffix>
>
> We would be able to use references to the class and the method name from
> the compressed part, so that character data isn't completely wasted.
>
> This scheme that injects human-readable parts will also allow the debugger
> to quickly match the names without the need to decompress them.
>
> We should also investigate improving the existing mangling scheme to
> produce shorter results. For example, one idea that comes to mind is using
> base-60 instead of base-10 for single-digit numbers that specify identifier
> length, falling back to base-10 for longer numbers to avoid ambiguity. This
> would save one character for every identifier longer than 9 characters and
> shorter than 60, which is actually the common case.
>
> Dmitri
>
> --
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <griboz...@gmail.com>*/
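
For anyone who wants to experiment with the base-60 length idea quoted above, here is a minimal Python sketch. The 60-symbol alphabet, the use of '0' as an escape, and the '_' terminator for the decimal fallback are all assumptions made for illustration; the thread does not specify how the ambiguity would be resolved, and this is not the Swift mangler's actual format.

```python
import string

# Hypothetical 60-symbol alphabet (an assumption): 0-9, A-Z, a-x.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase[:24]
# '0' never encodes a real length in this sketch, so it marks the decimal fallback.
ESCAPE = ALPHABET[0]


def encode_identifier(name):
    """Prefix `name` with its length: one base-60 symbol for lengths 1..59,
    otherwise ESCAPE + decimal length + '_'."""
    n = len(name)
    if n == 0:
        raise ValueError("empty identifiers are not supported in this sketch")
    if n < len(ALPHABET):
        return ALPHABET[n] + name
    return f"{ESCAPE}{n}_{name}"


def decode_identifier(mangled, pos=0):
    """Read one length-prefixed identifier starting at `pos`.
    Returns (identifier, position just past it)."""
    head = mangled[pos]
    if head == ESCAPE:
        # Decimal fallback: digits run up to the '_' terminator.
        end = mangled.index("_", pos + 1)
        n = int(mangled[pos + 1:end])
        start = end + 1
    else:
        # Single base-60 symbol: its index in the alphabet is the length.
        n = ALPHABET.index(head)
        start = pos + 1
    return mangled[start:start + n], start + n
```

Under these assumptions "Dictionary" encodes as "ADictionary" (11 characters) rather than "10Dictionary" (12), while identifiers shorter than 10 characters are unchanged. The escape costs two extra characters for identifiers of 60 characters or more compared with plain decimal, which is the price this particular sketch pays for unambiguous decoding.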
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev