Re: Policing dead/zombie code in m-c
On Tue, Jun 3, 2014 at 9:13 PM, Trevor Saunders trev.saund...@gmail.com wrote: On Tue, Jun 03, 2014 at 12:08:52PM -0400, Ehsan Akhgari wrote: On 2014-06-03, 5:57 AM, Henri Sivonen wrote: On Wed, May 28, 2014 at 5:45 PM, Nathan Froyd froy...@mozilla.com wrote: Assuming that ICU is already compiled with the moral equivalent of GCC's -ffunction-sections -fdata-sections or MSVC's /Gy, then statically linking ICU into libxul should already strip out all the un-needed ICU bits (when using the appropriate linker option). Disabling ICU IDNA code made libxul smaller, so it seems practice doesn't match the above theory. (I.e. clearly the linker wasn't automatically dropping ICU IDNA code.) Well, was this tested in a local build, or ones done in automation? Tested with tryserver builds. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Wed, May 28, 2014 at 5:45 PM, Nathan Froyd froy...@mozilla.com wrote: Assuming that ICU is already compiled with the moral equivalent of GCC's -ffunction-sections -fdata-sections or MSVC's /Gy, then statically linking ICU into libxul should already strip out all the un-needed ICU bits (when using the appropriate linker option). Disabling ICU IDNA code made libxul smaller, so it seems practice doesn't match the above theory. (I.e. clearly the linker wasn't automatically dropping ICU IDNA code.) -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-06-03, 5:57 AM, Henri Sivonen wrote: On Wed, May 28, 2014 at 5:45 PM, Nathan Froyd froy...@mozilla.com wrote: Assuming that ICU is already compiled with the moral equivalent of GCC's -ffunction-sections -fdata-sections or MSVC's /Gy, then statically linking ICU into libxul should already strip out all the un-needed ICU bits (when using the appropriate linker option). Disabling ICU IDNA code made libxul smaller, so it seems practice doesn't match the above theory. (I.e. clearly the linker wasn't automatically dropping ICU IDNA code.) The linker can only eliminate symbols that are not referenced by anything else. Basically the linker creates a graph and preserves everything that is referenced by some roots, for example, main, static initializers, and very importantly, vtables. So, anything that is a virtual function or is referenced by one will remain in the binary even if there is no code _calling_ that virtual function anywhere. Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Tue, Jun 03, 2014 at 12:08:52PM -0400, Ehsan Akhgari wrote: On 2014-06-03, 5:57 AM, Henri Sivonen wrote: On Wed, May 28, 2014 at 5:45 PM, Nathan Froyd froy...@mozilla.com wrote: Assuming that ICU is already compiled with the moral equivalent of GCC's -ffunction-sections -fdata-sections or MSVC's /Gy, then statically linking ICU into libxul should already strip out all the un-needed ICU bits (when using the appropriate linker option). Disabling ICU IDNA code made libxul smaller, so it seems practice doesn't match the above theory. (I.e. clearly the linker wasn't automatically dropping ICU IDNA code.) Well, was this tested in a local build, or ones done in automation? We don't seem to pass --gc-sections to the linker on local builds presumably to make builds faster. The linker can only eliminate symbols that are not referenced by anything else. Basically the linker creates a graph and preserves everything that is referenced by some roots, for example, main, static initializers, and very I'd expect static initializers don't get explicitly added to the set of roots since they should be rooted through .init_array / .ctors. This also misses that function exported from the library are obviously roots. and ICU exports a *ton* of symbols from libxul which I'm trying to fix for ELF targets in bug 1019744. importantly, vtables. So, anything that is a virtual function or is referenced by one will remain in the binary even if there is no code _calling_ that virtual function anywhere. I'd expect that vtables aren't special if there class is hidden and you can gc all the references in the library to the vtable then you can gc the vtable too because nobody can get at it. If the class isn't local then you have to keep the vtable, but that would be true anyway because its a global symbol. Trev Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform signature.asc Description: Digital signature ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
This is at least in part due to bug 915735, you can go and read the bug to see that I did the best I could there, given the constraints that I had (not understanding how the magic of the interaction of ICU and our build system worked, not understanding which parts of ICU we actually want to use, the performance and code quality limitations of MSVC's PGO compiler, the fact that increasing the amount of time we allow our builds to progress is rocket science, etc. I don’t have numbers for this change specifically, but during my investigation I found that on Mac statically linking ICU into libmozjs reduced the binary size by almost 600KB, compared to building ICU as a separate DLL. We should probably be able to make that difference smaller with a bit of work. We just need to get ICU to build in a configuration where the library only exports what we need, of course it would be better to just not build what we don't need. On Apr 25, 2014, at 0:31 , Henri Sivonen hsivo...@hsivonen.fi wrote: Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. * Ustdio * ICU encoding converters and their mapping tables. * ICU break iterators and their data. * ICU transliterators and their data. All done, except for a few remaining encoding converters: However, that flag is misdesigned in the sense that it considers US-ASCII, ISO-8859-1, UTF-7, UTF-32, CESU-8, SCSU and BOCU-1 as non-legacy, even though, frankly, those are legacy, too. (UTF-16 is legacy also, but it's legacy we need, since both ICU and Gecko are UTF-16 legacy code bases!) http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#267 UTF-16 may be legacy from the point of view of encoding HTML pages, but otherwise it’s an essential part of most of today’s computing environments – and the alternative in many cases would be UTF-32. But at least four of the above are clearly obsolete. for talking to win32 API sure, but win32 is a crazy beast, and we aren't using ICU for that anyway. Also, If the ICU build system is an configurable enough, I think we should consider identifying what parts of ICU we can delete even though the build system doesn't let us to and then automate the deletion as a script so that it can be repeated with future imports of ICU. That seems worth investigating, but getting the build system to strip out as much as possible might be more effective – see the next item. easier probably, better no think of the hundreds of people who are stuck building it all the time. On Apr 25, 2014, at 8:12 , Trevor Saunders trev.saund...@gmail.com wrote: Well, it really depends how you approach the problem of finding code thats dead. One way is to decide code really should be dead by reading it and knowing what it accomplishes doesn't matter. Another is to prove to your self that code is unreachable and so can be removed without effecting anything. Both of these require a pretty good bit of knowledge and skill. On the other hand the below question prompts another approach. Is there a way to give the linker a list of functions that you want to have as public entry points of a dynamically linked library, and have it strip out everything that can’t be reached from these functions? That’s essentially what happens when you statically link a library into another, and the list of ICU functions that Mozilla code calls wouldn’t be excessively long. The bulk of them can be found in this stubs section: http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp#62 yes, though how hard they are to use with the ICU build system is left as an exercise for the reader. Trev On Apr 28, 2014, at 5:59 , Henri Sivonen hsivo...@hsivonen.fi wrote: I haven't looked into Ustdio. That’s disabled by the --enable-icuio=no flag in http://mxr.mozilla.org/mozilla-central/source/build/autoconf/icu.m4#213 However, I noticed that we aren't turning off ICU IDNA but we should: https://bugzilla.mozilla.org/show_bug.cgi?id=1002435 Thanks for finding and fixing this issue, as well as 1002437! On Apr 25, 2014, at 0:31 , Henri Sivonen hsivo...@hsivonen.fi wrote: Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? Correctness from the user’s point of view – quality of locale and time zone data – is clearly an issue (conformance with the Ecma standard is not, as Jeff pointed out). In addition, a system ICU isn’t always available to us: On Mac, it’s not public API; on some Linux versions, it’s built with version numbers baked into function names, which we can’t link against. On May 1, 2014, at 12:25 , Jeff Walden jwalden+...@mit.edu wrote: On 04/28/2014 05:59 AM, Henri Sivonen wrote: Hopefully we didn't remove collation rules, since that's the part we are supposed to
Re: Policing dead/zombie code in m-c
- Original Message - Is there a way to give the linker a list of functions that you want to have as public entry points of a dynamically linked library, and have it strip out everything that can’t be reached from these functions? That’s essentially what happens when you statically link a library into another, and the list of ICU functions that Mozilla code calls wouldn’t be excessively long. The bulk of them can be found in this stubs section: http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp#62 Assuming that ICU is already compiled with the moral equivalent of GCC's -ffunction-sections -fdata-sections or MSVC's /Gy, then statically linking ICU into libxul should already strip out all the un-needed ICU bits (when using the appropriate linker option). I see that intl/icu/source/configure.ac has bits to turn those on for Linux, but not for Mac or Windows. I can't tell whether we enable all the flags to convince ICU to use those options, though we might be passing some of Gecko's compilation flags into ICU's build process, in which case this is a moot point. For the dynamic library case, if you're using symbol maps (Linux) or export lists (Windows), using -ffunction-sections -fdata-sections or equivalent should given you the same effect, assuming you trim down your mapfile appropriately. -Nathan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
Sorry, I only now discovered this thread. Below are replies to many of the issues raised with regards to the work I did adding ICU and the ECMAScript Internationalization API early last year. A lot of background information is in this document (no longer fully up to date): http://lindenbergsoftware.com/mozilla/implementation.html and in Jeff’s more current one: https://wiki.mozilla.org/User:Waldo/Internationalization_API On Apr 24, 2014, at 5:31 , Henri Sivonen hsivo...@hsivonen.fi wrote: * Are we building and shipping dead code in ICU on B2G? B2G doesn’t have ICU and the internationalization API yet: https://bugzilla.mozilla.org/show_bug.cgi?id=866301 …but of course that’s only a short-term fix. On Apr 24, 2014, at 11:31 , ISHIKAWA,chiaki ishik...@yk.rim.or.jp wrote: There is at least some code in ICU that is - used by FF, but - not used by TB. As far as I know, the ECMAScript Internationalization API is so far only enabled for desktop Firefox. On Apr 24, 2014, at 5:49 , Till Schneidereit t...@tillschneidereit.net wrote: One idea was to slim down the ICU build and get rid of some things not needed for Intl. I might very well be mistaken, but I'm not aware of this having happened. It did happen. ICU has a number of build flags that can be used to reduce code and data size. I used the applicable subset to shrink the binary size to less than half of the size of a full ICU build. Mozilla doesn’t even have the full set of ICU source files – the import script drops the sources for encoding mappings, break iterators, language names and much else. http://mxr.mozilla.org/mozilla-central/source/build/autoconf/icu.m4#145 http://mxr.mozilla.org/mozilla-central/source/build/autoconf/icu.m4#213 http://mxr.mozilla.org/mozilla-central/source/intl/update-icu.sh#32 On Apr 24, 2014, at 16:03 , Ehsan Akhgari ehsan.akhg...@gmail.com wrote: ... We build and ship *all* of ICU, and presumably ship a tiny portion of it. We don’t – see above. This is at least in part due to bug 915735, you can go and read the bug to see that I did the best I could there, given the constraints that I had (not understanding how the magic of the interaction of ICU and our build system worked, not understanding which parts of ICU we actually want to use, the performance and code quality limitations of MSVC's PGO compiler, the fact that increasing the amount of time we allow our builds to progress is rocket science, etc. I don’t have numbers for this change specifically, but during my investigation I found that on Mac statically linking ICU into libmozjs reduced the binary size by almost 600KB, compared to building ICU as a separate DLL. On Apr 25, 2014, at 0:31 , Henri Sivonen hsivo...@hsivonen.fi wrote: Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. * Ustdio * ICU encoding converters and their mapping tables. * ICU break iterators and their data. * ICU transliterators and their data. All done, except for a few remaining encoding converters: However, that flag is misdesigned in the sense that it considers US-ASCII, ISO-8859-1, UTF-7, UTF-32, CESU-8, SCSU and BOCU-1 as non-legacy, even though, frankly, those are legacy, too. (UTF-16 is legacy also, but it's legacy we need, since both ICU and Gecko are UTF-16 legacy code bases!) http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#267 UTF-16 may be legacy from the point of view of encoding HTML pages, but otherwise it’s an essential part of most of today’s computing environments – and the alternative in many cases would be UTF-32. But at least four of the above are clearly obsolete. Also, If the ICU build system is an configurable enough, I think we should consider identifying what parts of ICU we can delete even though the build system doesn't let us to and then automate the deletion as a script so that it can be repeated with future imports of ICU. That seems worth investigating, but getting the build system to strip out as much as possible might be more effective – see the next item. On Apr 25, 2014, at 8:12 , Trevor Saunders trev.saund...@gmail.com wrote: Well, it really depends how you approach the problem of finding code thats dead. One way is to decide code really should be dead by reading it and knowing what it accomplishes doesn't matter. Another is to prove to your self that code is unreachable and so can be removed without effecting anything. Both of these require a pretty good bit of knowledge and skill. On the other hand the below question prompts another approach. Is there a way to give the linker a list of functions that you want to have as public entry points of a dynamically linked library, and have it strip out everything that can’t be reached from these functions? That’s essentially what happens when you statically link a library into another, and the list of ICU functions that Mozilla code calls wouldn’t be
Re: Policing dead/zombie code in m-c
On 04/28/2014 05:59 AM, Henri Sivonen wrote: Hopefully we didn't remove collation rules, since that's the part we are supposed to be using ICU for! :-) I'm uncertain exactly what the terminology means necessarily for us, but intl/icu-patches/genrb-omitCollationRules.diff exists because a flag passed in the ICU build process to omit collation rules was broken in 52, and we had to fix it ourselves. (Well, technically I wrote a patch-fix, submitted it upstream, then they rewrote it to be sane and I backported it to our tree. :-) ) So we do omit collation-related data, that was claimed in the code and elsewhere to be collation rules, that's not being used by us. Jeff ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-29, 4:46 AM, Henri Sivonen wrote: On Mon, Apr 28, 2014 at 5:23 PM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: On 2014-04-28, 10:17 AM, Henri Sivonen wrote: On Mon, Apr 28, 2014 at 4:36 PM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: On 2014-04-28, 8:59 AM, Henri Sivonen wrote: New question: We have various scriptable nsIFoo stuff (e.g. nsIParserService, nsIScriptableUConv) on the fringes of Gecko for use by mailnews, the Firefox UI or extensions. Since Gaia doesn't use XPCOM, those things are dead code in B2G, right? Would it make sense to invest time in #ifdefing them out for B2G? Or are they already left out by some mechanism not obvious to me? No, there is no such mechanism. We should turn these off where we can. OK. Is if CONFIG['MOZ_WIDGET_TOOLKIT'] != 'gonk': the right check to use in moz.build? Depends on what you mean. That check would be true for b2g desktop for example, since it doesn't use gonk. If you want to turn something off for b2g entirely, use CONFIG['MOZ_B2G']. Thanks. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1003028 . When attempting to file dependencies, I discovered that some cruft that I thought could be turned off for B2G is used more broadly than I thought, because previously I was looking at C++ callers forgetting that so .jsm files in m-c are still B2G-relevant. Is there an easy way to see what JS code in mozilla-central is used by B2G? Not that I know of. You need to look at the moz.build/jar.mn file and see if the file is included based on any conditionals. Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Mon, Apr 28, 2014 at 5:23 PM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: On 2014-04-28, 10:17 AM, Henri Sivonen wrote: On Mon, Apr 28, 2014 at 4:36 PM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: On 2014-04-28, 8:59 AM, Henri Sivonen wrote: New question: We have various scriptable nsIFoo stuff (e.g. nsIParserService, nsIScriptableUConv) on the fringes of Gecko for use by mailnews, the Firefox UI or extensions. Since Gaia doesn't use XPCOM, those things are dead code in B2G, right? Would it make sense to invest time in #ifdefing them out for B2G? Or are they already left out by some mechanism not obvious to me? No, there is no such mechanism. We should turn these off where we can. OK. Is if CONFIG['MOZ_WIDGET_TOOLKIT'] != 'gonk': the right check to use in moz.build? Depends on what you mean. That check would be true for b2g desktop for example, since it doesn't use gonk. If you want to turn something off for b2g entirely, use CONFIG['MOZ_B2G']. Thanks. I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1003028 . When attempting to file dependencies, I discovered that some cruft that I thought could be turned off for B2G is used more broadly than I thought, because previously I was looking at C++ callers forgetting that so .jsm files in m-c are still B2G-relevant. Is there an easy way to see what JS code in mozilla-central is used by B2G? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-28, 8:59 AM, Henri Sivonen wrote: New question: We have various scriptable nsIFoo stuff (e.g. nsIParserService, nsIScriptableUConv) on the fringes of Gecko for use by mailnews, the Firefox UI or extensions. Since Gaia doesn't use XPCOM, those things are dead code in B2G, right? Would it make sense to invest time in #ifdefing them out for B2G? Or are they already left out by some mechanism not obvious to me? No, there is no such mechanism. We should turn these off where we can. With bug 996061, even marking them as non-scriptable would help. Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 04/24/2014 05:49 AM, Till Schneidereit wrote: Questions: * Are we building and shipping dead code in ICU on B2G? I don't know the state of ICU on B2G, but if we have it enabled there, then almost certainly, yes. There's doubtless *some* dead code in the ICU we build, yes. (I don't remember whether we build ICU on b2g yet, tho.) Various ICU components were removed as part of building it into Mozilla (I'd have to check Makefile.in/moz.build to get a complete list). It's possible there are more, but I thought Norbert picked off the low-hanging fruit at the very least. Someone might be able to further split up ICU into subcomponents and pick off more, with enough investment of time in it. * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? I guess that this isn't entirely clear. In practice it's largely no one's. I've been the one to update ICU the one time it's happened so far (we're due for another at this point, I just need to find time to do it). But that's just porting our patch set forward, plus debugging upstream issues. I have no particular expertise (beyond that of any random Mozillian) in tackling this sort of thing in ICU. * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? I agree in spirit, but this is kind of pedantic. I really don't think clearly is so obviously true, given that by nature of how we use ICU, we're only feeding it UChar/jschar/char16_t UTF-16 strings, so it's totally non-obvious how ICU encoders would ever be triggered. Not ideal, sure. But as we use it, encoders simply don't come into things, at least not in any user-visible way. (Conceivably they might be invoked in the back end when reading locale data from the file system, but that's presumably correctly encoded not as UTF-7 or so. It only really matters if the encoders are visible on the web, which I don't believe they are.) I remember that back when Norbert first presented his Intl implementation work at the JS work week in 2012, we discussed the impact ICU would have on download size. One idea was to slim down the ICU build and get rid of some things not needed for Intl. I might very well be mistaken, but I'm not aware of this having happened. It happened. We removed a number of things like collation rules, some of the ICU components, and more beyond that. We looked at what other vendors were doing and saw that Google essentially just bundle all of ICU and swallow the increase in download size. Well, that was justification for shipping ICU. Not so much justification for shipping more of it than needed. Also, properly integrating ICU into our build system was a major undertaking in itself, so getting to a point where we could ship it at all might have taken precedence. That's reasonably accurate, combined with the basic steps to prune size and built code having already been taken. If there's more of ICU that's easily pruned, I'm all for it. If pruning harder requires splitting ICU up into more components, that's fine, too. But someone has to do it, and my skills and knowledge lie elsewhere. Jeff ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 04/25/2014 02:10 AM, Henri Sivonen wrote: Different means old, right? Having an old version is a correctness problem, if the CLDR has changed since. Depends what's considered correctness. The ECMA Internationalization API doesn't specify behavior, so older just means not-as-good, not incorrect. (Technically you could expose the Intl API without *any* locale data. It'd defeat the purpose of Intl, but it would not be incorrect.) I don't know to what extent older data is a correctness issue for anything else using ICU. I'd be kind of astonished if slightly-old data ever constituted a black-letter correctness issue. Specs don't/can't move fast enough for the difference between ICU/CLDR 51/52 to matter, correctness-wise. Jeff ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Thu, Apr 24, 2014 at 4:20 PM, Benoit Jacob jacob.benoi...@gmail.com wrote: 2014-04-24 8:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. Have we measured the impact of this change on actual memory usage (as opposed to virtual address space size) ? No, we haven't. I don't have a B2G phone, but I could give my whole patch queue in one diff to someone who wants to try. Have we explored how much this problem could be automatically helped by the linker being smart about locality? Not to my knowledge, but I'm very skeptical about getting these benefits by having the linker be smart so that the dead code ends up on memory pages that aren't actually mapped to real RAM. The code that is no longer in use is sufficiently intermingled with code that's still is in use. Useful and useless plain old C data is included side-by-side. Useful and useless classes are included next to each other in unified compilation units. Since the classes are instantiated via XPCOM, a linker that's unaware of XPCOM couldn't tell that some classes are in use and some aren't via static analysis. All of them would look equally dead or alive depending on what we do you take on the root of the caller chain being function pointers in a contract ID table. Using PGO to determine what's dead code and what's not wouldn't work, either, if the profiling run was load mozilla.org, because the run would exercise too little code, or if the profiling run was all the unit tests, because the profiling run would exercise too much code. On Fri, Apr 25, 2014 at 2:03 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: * Are we building and shipping dead code in ICU on B2G? No. That is at least partly covered by bug 864843. Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? For a given phone, the Android base system practically never updates, so for a given Firefox version, the Web-exposed APIs would have as many behaviors as there are differing ICU snapshots on different Android versions out there. As for B2G, considering that Gonk is supposed to update less often than Gecko, it seems like a bad idea to have ICU be part of Gonk rather than part of Gecko on B2G. In my experience, ICU is unfortunately a hot potato. :( The real blocker there is finding someone who can tell us what bits of ICU _are_ used in the JS engine. Apart from ICU initialization/shutdown, the callers seem to be http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp and http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#852 . So the JS engine uses: * Collation * Number formatting * Date and time formatting * Normalization It looks like the JS engine has its own copy of the Unicode database for other purposes. It seems like that should be unified with ICU so that there'd be only one copy of the Unicode database. Additionally, we should probably rewrite nsCollation users to use ICU collation and delete nsCollation. Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. * Ustdio * ICU encoding converters and their mapping tables. * ICU break iterators and their data. * ICU transliterators and their data. http://apps.icu-project.org/datacustom/ gives a good idea of what there is to turn off. The parts used in Gecko for input type=number are pretty small. And of course someone needs to figure out the black magic of conveying the information to the ICU build system. So it looks like we already build with UCONFIG_NO_LEGACY_CONVERSION: http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#264 However, that flag is misdesigned in the sense that it considers US-ASCII, ISO-8859-1, UTF-7, UTF-32, CESU-8, SCSU and BOCU-1 as non-legacy, even though, frankly, those are legacy, too. (UTF-16 is legacy also, but it's legacy we need, since both ICU and Gecko are UTF-16 legacy code bases!) http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#267 So I guess the situation isn't quite as bad as I thought. We should probably set UCONFIG_NO_CONVERSION to 1 and U_CHARSET_IS_UTF8 to 1 per: http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#248 After all, we should easily be able to make sure that we don't use non-UTF-8 encodings when passing char* to ICU. Also, If the ICU build system is an configurable enough, I think we should consider identifying what parts of ICU we can delete even though the build system doesn't let us to and then automate the deletion as a script so that it can be repeated with future imports of ICU. * Do we have any mechanisms in place for
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 10:31 AM, Henri Sivonen hsivo...@hsivonen.fi wrote: So it looks like we already build with UCONFIG_NO_LEGACY_CONVERSION: http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#264 Oops. No. I misread. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 10:31:44AM +0300, Henri Sivonen wrote: Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? Actually the main reason is that every version of desktop linux has a different version of ICU, which has a different ABI. I /guess/ some linux distros are actually using system ICU. Mike ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 10:31 AM, Henri Sivonen hsivo...@hsivonen.fi wrote: Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. I don't find this code in the tree at all. * Ustdio I don't yet know what we are doing about this one. * ICU encoding converters and their mapping tables. It looks like we are actually turning of the converters that need conversion tables. http://mxr.mozilla.org/mozilla-central/source/build/autoconf/icu.m4#146 (Previously, I thought we had customized uconfig.h, but then I found icu.m4.) We are still building algorithmic converters, though. I'll try to turn those off. * ICU break iterators and their data. * ICU transliterators and their data. Looking at the icu.m4 script, it looks like we already turn off the code for these. It's not yet clear to me what part of the build system is responsible for omitting the associated data, but considering that the word as that ICU added around 3 MB, we have to be subsetting the data already. Additionally, I noticed that we are not using ICU IDNA support and IDNA support can be turned off with a define. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 11:26 AM, Mike Hommey m...@glandium.org wrote: On Fri, Apr 25, 2014 at 10:31:44AM +0300, Henri Sivonen wrote: Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? Actually the main reason is that every version of desktop linux has a different version of ICU Different means old, right? Having an old version is a correctness problem, if the CLDR has changed since. -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
2014-04-25 3:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: On Thu, Apr 24, 2014 at 4:20 PM, Benoit Jacob jacob.benoi...@gmail.com wrote: 2014-04-24 8:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. Have we measured the impact of this change on actual memory usage (as opposed to virtual address space size) ? No, we haven't. I don't have a B2G phone, but I could give my whole patch queue in one diff to someone who wants to try. Have we explored how much this problem could be automatically helped by the linker being smart about locality? Not to my knowledge, but I'm very skeptical about getting these benefits by having the linker be smart so that the dead code ends up on memory pages that aren't actually mapped to real RAM. The code that is no longer in use is sufficiently intermingled with code that's still is in use. Useful and useless plain old C data is included side-by-side. Useful and useless classes are included next to each other in unified compilation units. Since the classes are instantiated via XPCOM, a linker that's unaware of XPCOM couldn't tell that some classes are in use and some aren't via static analysis. All of them would look equally dead or alive depending on what we do you take on the root of the caller chain being function pointers in a contract ID table. Using PGO to determine what's dead code and what's not wouldn't work, either, if the profiling run was load mozilla.org, because the run would exercise too little code, or if the profiling run was all the unit tests, because the profiling run would exercise too much code. Thanks for this answer, it does totally make sense (and shed light on the specifics here that make this hard to solve automatically). Benoit On Fri, Apr 25, 2014 at 2:03 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: * Are we building and shipping dead code in ICU on B2G? No. That is at least partly covered by bug 864843. Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? For a given phone, the Android base system practically never updates, so for a given Firefox version, the Web-exposed APIs would have as many behaviors as there are differing ICU snapshots on different Android versions out there. As for B2G, considering that Gonk is supposed to update less often than Gecko, it seems like a bad idea to have ICU be part of Gonk rather than part of Gecko on B2G. In my experience, ICU is unfortunately a hot potato. :( The real blocker there is finding someone who can tell us what bits of ICU _are_ used in the JS engine. Apart from ICU initialization/shutdown, the callers seem to be http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp and http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#852 . So the JS engine uses: * Collation * Number formatting * Date and time formatting * Normalization It looks like the JS engine has its own copy of the Unicode database for other purposes. It seems like that should be unified with ICU so that there'd be only one copy of the Unicode database. Additionally, we should probably rewrite nsCollation users to use ICU collation and delete nsCollation. Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. * Ustdio * ICU encoding converters and their mapping tables. * ICU break iterators and their data. * ICU transliterators and their data. http://apps.icu-project.org/datacustom/ gives a good idea of what there is to turn off. The parts used in Gecko for input type=number are pretty small. And of course someone needs to figure out the black magic of conveying the information to the ICU build system. So it looks like we already build with UCONFIG_NO_LEGACY_CONVERSION: http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#264 However, that flag is misdesigned in the sense that it considers US-ASCII, ISO-8859-1, UTF-7, UTF-32, CESU-8, SCSU and BOCU-1 as non-legacy, even though, frankly, those are legacy, too. (UTF-16 is legacy also, but it's legacy we need, since both ICU and Gecko are UTF-16 legacy code bases!) http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#267 So I guess the situation isn't quite as bad as I thought. We should probably set UCONFIG_NO_CONVERSION to 1 and U_CHARSET_IS_UTF8 to 1 per: http://mxr.mozilla.org/mozilla-central/source/intl/icu/source/common/unicode/uconfig.h#248 After all, we should easily be able to make sure that we don't use non-UTF-8 encodings when passing char* to ICU. Also, If the ICU
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 12:10:24PM +0300, Henri Sivonen wrote: On Fri, Apr 25, 2014 at 11:26 AM, Mike Hommey m...@glandium.org wrote: On Fri, Apr 25, 2014 at 10:31:44AM +0300, Henri Sivonen wrote: Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? Actually the main reason is that every version of desktop linux has a different version of ICU Different means old, right? Having an old version is a correctness problem, if the CLDR has changed since. Can be both ways. Before we upgraded to 52, the version on my system was newer than the one in nightly. In all likeliness, the version on my system is still newer than the one in one or several or release, beta and aurora. ICU changes soname for each version, so it's never backwards compatible. Mike ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Fri, Apr 25, 2014 at 10:42:43AM -0400, Mike Hoye wrote: On 2014-04-25, 3:31 AM, Henri Sivonen wrote: On Thu, Apr 24, 2014 at 4:20 PM, Benoit Jacob jacob.benoi...@gmail.com wrote: * How should we identify code that we build but that isn't used anywhere? I'm afraid we need humans for that. Yeah, but how do we get humans to do that? We ask them! There are thousands of people out there who want to help us do stuff, if we can give them the right stuff to do. Noob question here: how accessible is the profiling process, and can we/would it be useful to make that participatory? Well, it really depends how you approach the problem of finding code thats dead. One way is to decide code really should be dead by reading it and knowing what it accomplishes doesn't matter. Another is to prove to your self that code is unreachable and so can be removed without effecting anything. Both of these require a pretty good bit of knowledge and skill. On the other hand the below question prompts another approach. If we ask a thousand people to run a nightly under a profiler for a few days and aggregated the results, would that be valuable information? If the users are a good distribution and we collect a list of every function we run we should be able to compare that to a list of things that go into libxul, and maybe we'd find something useful. Trev - mhoye ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform signature.asc Description: Digital signature ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-25, 11:12 AM, Trevor Saunders wrote: On Fri, Apr 25, 2014 at 10:42:43AM -0400, Mike Hoye wrote: If we ask a thousand people to run a nightly under a profiler for a few days and aggregated the results, would that be valuable information? If the users are a good distribution and we collect a list of every function we run we should be able to compare that to a list of things that go into libxul, and maybe we'd find something useful. I guess the question becomes how practical it is to set this up and run it, vs. the anticipated value of the results. (Whether it's a good fit for this task or not, community-assisted tooling is something we can do that not a lot of other orgs can, and I think it's worth remembering that we've got that potentially very heavy club in the bag.) - mhoye ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-25, 3:31 AM, Henri Sivonen wrote: On Fri, Apr 25, 2014 at 2:03 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote: * Are we building and shipping dead code in ICU on B2G? No. That is at least partly covered by bug 864843. Using system ICU seems wrong in terms of correctness. That's the reason why we don't use system ICU on Mac and desktop Linux, right? If I'm reading that bug correctly, we're not trying to use the system ICU there. In my experience, ICU is unfortunately a hot potato. :( The real blocker there is finding someone who can tell us what bits of ICU _are_ used in the JS engine. Apart from ICU initialization/shutdown, the callers seem to be http://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.cpp and http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#852 . So the JS engine uses: * Collation * Number formatting * Date and time formatting * Normalization It looks like the JS engine has its own copy of the Unicode database for other purposes. It seems like that should be unified with ICU so that there'd be only one copy of the Unicode database. Additionally, we should probably rewrite nsCollation users to use ICU collation and delete nsCollation. Therefore, it looks like we should turn off (if we haven't already): * The ICU LayoutEngine. * Ustdio * ICU encoding converters and their mapping tables. * ICU break iterators and their data. * ICU transliterators and their data. http://apps.icu-project.org/datacustom/ gives a good idea of what there is to turn off. Are you looking into these? They all seem like great things to investigate. Also, If the ICU build system is an configurable enough, I think we should consider identifying what parts of ICU we can delete even though the build system doesn't let us to and then automate the deletion as a script so that it can be repeated with future imports of ICU. You need to ask Waldo about that, I think, since he is the person who did the latest ICU update. * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? No, that is not possible to automate. I was thinking of policy / review solutions. Oh I see. I think this should be part of the normal code review process already. But perhaps we just need to act on it better. I can't think of any policy changes which would be of a substantial help, the best that I can think of is reminding people through these kinds of discussions (and thanks for starting this one!) * How should we identify code that we build but that isn't used anywhere? I'm afraid we need humans for that. Yeah, but how do we get humans to do that? It's a hard question. Not everybody is going to care much about this. In the case of something like ICU it's actually relatively easy for someone to detect the dead code here even without intimate knowledge of the code being used, but in many cases determining dead code really requires a bit of domain expertise, or the willingness to read and learn some new code. I think it would be a great idea if module owners filed mentored bugs on deleting the dead code that they already know of, or even to investigate whether something that they suspect is actually dead code. Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Policing dead/zombie code in m-c
We intend to ship Gecko on RAM-constrained devices. Yet, we build and ship code that is pure bloat: code that is built with Firefox/B2G but is used only by c-c code or code that's built with Firefox/B2G but not used by anyone. I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . This isn't even a matter of building code that some might argue we maybe should use (https://bugzilla.mozilla.org/show_bug.cgi?id=724540). We are even building ICU encoding converters that we should never use even if we gave up on our own converters. We're building stuff like BOCU-1 that's explicitly banned by the HTML spec! (In general, it's uncool that abandoned researchy stuff like BOCU-1 is included by default in a widely-used production library like ICU.) Questions: * Are we building and shipping dead code in ICU on B2G? * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? * How should we identify code that we build but that isn't used anywhere? * How should we identify code that we build as part of Firefox but is used only in other apps (Thunderbird, SeaMonkey, etc.)? * Are there obvious places that people should inspect for code that's being built but not used? Some libs that got imported for WebRTC maybe? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
(CC'ing people who have worked on the ICU integration) On Thu, Apr 24, 2014 at 2:31 PM, Henri Sivonen hsivo...@hsivonen.fi wrote: However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . This isn't even a matter of building code that some might argue we maybe should use (https://bugzilla.mozilla.org/show_bug.cgi?id=724540). We are even building ICU encoding converters that we should never use even if we gave up on our own converters. We're building stuff like BOCU-1 that's explicitly banned by the HTML spec! (In general, it's uncool that abandoned researchy stuff like BOCU-1 is included by default in a widely-used production library like ICU.) Questions: * Are we building and shipping dead code in ICU on B2G? I don't know the state of ICU on B2G, but if we have it enabled there, then almost certainly, yes. * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? I guess that this isn't entirely clear. * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? I remember that back when Norbert first presented his Intl implementation work at the JS work week in 2012, we discussed the impact ICU would have on download size. One idea was to slim down the ICU build and get rid of some things not needed for Intl. I might very well be mistaken, but I'm not aware of this having happened. We looked at what other vendors were doing and saw that Google essentially just bundle all of ICU and swallow the increase in download size. Also, properly integrating ICU into our build system was a major undertaking in itself, so getting to a point where we could ship it at all might have taken precedence. -- till ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
2014-04-24 8:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. Have we measured the impact of this change on actual memory usage (as opposed to virtual address space size) ? Have we explored how much this problem could be automatically helped by the linker being smart about locality? I totally agree about the value of removing dead code (if only to make the codebase easier to read and maintain), I just wonder if there might be a shortcut to get the short-term memory usage benefits that you mention. Benoit However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . This isn't even a matter of building code that some might argue we maybe should use (https://bugzilla.mozilla.org/show_bug.cgi?id=724540). We are even building ICU encoding converters that we should never use even if we gave up on our own converters. We're building stuff like BOCU-1 that's explicitly banned by the HTML spec! (In general, it's uncool that abandoned researchy stuff like BOCU-1 is included by default in a widely-used production library like ICU.) Questions: * Are we building and shipping dead code in ICU on B2G? * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? * How should we identify code that we build but that isn't used anywhere? * How should we identify code that we build as part of Firefox but is used only in other apps (Thunderbird, SeaMonkey, etc.)? * Are there obvious places that people should inspect for code that's being built but not used? Some libs that got imported for WebRTC maybe? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 4/24/2014 9:20 AM, Benoit Jacob wrote: 2014-04-24 8:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. Have we measured the impact of this change on actual memory usage (as opposed to virtual address space size) ? We are also limited by disk space in the system partition on devices like tarako. So reducing lib size has a benefit even if the code is usually paged out. Ben ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Thu, Apr 24, 2014 at 09:20:06AM -0400, Benoit Jacob wrote: 2014-04-24 8:31 GMT-04:00 Henri Sivonen hsivo...@hsivonen.fi: I have prepared a queue of patches that removes Netscape-era (circa 1999) internationalization code that efforts to implement the Encoding Standard have shown unnecessary to have in Firefox. This makes libxul on ARMv7 smaller by 181 KB, so that's a win. Have we measured the impact of this change on actual memory usage (as opposed to virtual address space size) ? that's probably harder, and might depend on how much spare RAM you have on the test machine. Have we explored how much this problem could be automatically helped by the linker being smart about locality? people have and are working on that, so to some degree it probably is already happening. Trev I totally agree about the value of removing dead code (if only to make the codebase easier to read and maintain), I just wonder if there might be a shortcut to get the short-term memory usage benefits that you mention. Benoit However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . This isn't even a matter of building code that some might argue we maybe should use (https://bugzilla.mozilla.org/show_bug.cgi?id=724540). We are even building ICU encoding converters that we should never use even if we gave up on our own converters. We're building stuff like BOCU-1 that's explicitly banned by the HTML spec! (In general, it's uncool that abandoned researchy stuff like BOCU-1 is included by default in a widely-used production library like ICU.) Questions: * Are we building and shipping dead code in ICU on B2G? * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? * How should we identify code that we build but that isn't used anywhere? * How should we identify code that we build as part of Firefox but is used only in other apps (Thunderbird, SeaMonkey, etc.)? * Are there obvious places that people should inspect for code that's being built but not used? Some libs that got imported for WebRTC maybe? -- Henri Sivonen hsivo...@hsivonen.fi https://hsivonen.fi/ ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform signature.asc Description: Digital signature ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
(2014/04/24 21:49), Till Schneidereit wrote: (CC'ing people who have worked on the ICU integration) On Thu, Apr 24, 2014 at 2:31 PM, Henri Sivonen hsivo...@hsivonen.fi wrote: However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . This isn't even a matter of building code that some might argue we maybe should use (https://bugzilla.mozilla.org/show_bug.cgi?id=724540). We are even building ICU encoding converters that we should never use even if we gave up on our own converters. We're building stuff like BOCU-1 that's explicitly banned by the HTML spec! (In general, it's uncool that abandoned researchy stuff like BOCU-1 is included by default in a widely-used production library like ICU.) Questions: * Are we building and shipping dead code in ICU on B2G? I don't know the state of ICU on B2G, but if we have it enabled there, then almost certainly, yes. * The bug about building useless code in ICU has been open since November. Whose responsibility is it to make sure we stop building stuff that we don't use in ICU? I guess that this isn't entirely clear. * Do we have any mechanisms in place for preventing stuff like the ICU encoding converters becoming part of the building the future? When people propose to import third-party code, do reviewers typically ask if we are importing more than we intend to use? Clearly, considering that it is hard to get people to remove unused code from the build after the code has landed, we shouldn't have allowed code like the ICU encoding converters to become part of the build in the first place? I remember that back when Norbert first presented his Intl implementation work at the JS work week in 2012, we discussed the impact ICU would have on download size. One idea was to slim down the ICU build and get rid of some things not needed for Intl. I might very well be mistaken, but I'm not aware of this having happened. We looked at what other vendors were doing and saw that Google essentially just bundle all of ICU and swallow the increase in download size. Also, properly integrating ICU into our build system was a major undertaking in itself, so getting to a point where we could ship it at all might have taken precedence. -- till This is just a random observation, not answering the original questions. There is at least some code in ICU that is - used by FF, but - not used by TB. I noticed this when I began compiling FF ( under ./mozilla subdirectory of C-C, which is basically a copy of M-C [could be outdated by a few patches behind.]. There are a few files which did not compile if I enabled -DDEBUG=1 on my compilation command line due to broken code within #ifdef DEBUG, #endif. But this compilation failure was only seen during FF build. TB build using the same tree did not complain about this. So obviously TB does not use such files. Just an observation I noticed recently. Bug 963090 - Compilation error while compiling DEBUG BUILD of Firefox (C-C) rbnf.cpp, ucbuf.c https://bugzilla.mozilla.org/show_bug.cgi?id=963090 ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On Thu, Apr 24, 2014 at 4:03 PM, Ehsan Akhgari ehsan.akhg...@gmail.comwrote: * Are there obvious places that people should inspect for code that's being built but not used? Some libs that got imported for WebRTC maybe? Nothing big comes to my mind. Perhaps hunspell on b2g? https://bugzilla.mozilla.org/show_bug.cgi?id=611781: Project to reduce the size of NSS libraries included in Firefox distributions. IIRC, this would cut 500KB of object code. Cheers, Brian ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-24, 7:24 PM, Mike Hommey wrote: On Thu, Apr 24, 2014 at 07:03:09PM -0400, Ehsan Akhgari wrote: On 2014-04-24, 8:31 AM, Henri Sivonen wrote: However, especially in the context of slimming down our own set of encoding converters, it's rather demotivating to see that at least on desktop, we are building ICU encoding converters that we don't use. See https://bugzilla.mozilla.org/show_bug.cgi?id=944348 . Please allow me to upset you even further! We build and ship *all* of ICU, and presumably ship a tiny portion of it. This is at least in part due to bug 915735, you can go and read the bug to see that I did the best I could there, given the constraints that I had (not understanding how the magic of the interaction of ICU and our build system worked, not understanding which parts of ICU we actually want to use, the performance and code quality limitations of MSVC's PGO compiler, the fact that increasing the amount of time we allow our builds to progress is rocket science, etc. None of this was fun, but please understand that it wasn't like somebody sat down and decided that it's a great idea to build and ship all of ICU. I'm still amazed that we allowed ICU to land that way at all. And there I don't mean your changes, which don't particularly change how much of ICU is built. I'm pretty sure my changes did make things worse, since they effectively made the linker unable to eliminate anything assuming that it did back when ICU was built into mozjs.dll... Cheers, Ehsan ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policing dead/zombie code in m-c
On 2014-04-24, at 05:31, Henri Sivonen hsivo...@hsivonen.fi wrote: * Are there obvious places that people should inspect for code that's being built but not used? Some libs that got imported for WebRTC maybe? https://bugzilla.mozilla.org/show_bug.cgi?id=1001114 I’m told that sipcc is 3M stripped. We certainly don’t use all of it, but I’m not sure how much can be safely removed. ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform