Re: Unused GCC builtins
On 01/24/2018 07:09 AM, Jakub Jelinek wrote:
> On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
>> In a second step, we also considered internal builtins and found that the
>> vararg handling builtins (__builtin_va_start, __builtin_va_end,
>> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
>> even though they are undocumented in GCC's builtins API. Could they be
>> added to the documentation?
>
> Why? What is documented is va_start/va_end/va_arg/va_copy, that is
> what people should use, the builtins are just internal implementation of
> those macros.

There are a number of reasons why documenting visible APIs is helpful whether or not they are meant to be used by end users.

Features that are not meant to be used should be documented as such. Mentioning that they are meant only for internal use makes their purpose clear and sets the right expectation about the level of support and portability between GCC versions. It also makes it clear that we didn't forget to document them by accident.

The manual isn't just a reference for GCC users. It's also a helpful reference for developers of GCC-compatible compilers who are not allowed to read GCC source code due to copyright or licensing constraints, or for people maintaining or supporting their own GCC-based operating environments. Finally, it is also a reference for GCC developers.

For all these reasons I think every built-in that can be used (intentionally or otherwise) deserves to be documented in the manual.

Martin
Re: Unused GCC builtins
* Jakub Jelinek:

> On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
>> In a second step, we also considered internal builtins and found that the
>> vararg handling builtins (__builtin_va_start, __builtin_va_end,
>> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
>> even though they are undocumented in GCC's builtins API. Could they be
>> added to the documentation?
>
> Why? What is documented is va_start/va_end/va_arg/va_copy, that is
> what people should use, the builtins are just internal implementation of
> those macros.

And these builtins differ from the math builtins because <stdarg.h> is provided by GCC, but <math.h> is not, and there are many different implementations.
Re: Unused GCC builtins
On Wed, Jan 24, 2018 at 03:04:55PM +0100, Manuel Rigger wrote:
> In a second step, we also considered internal builtins and found that the
> vararg handling builtins (__builtin_va_start, __builtin_va_end,
> __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects,
> even though they are undocumented in GCC's builtins API. Could they be
> added to the documentation?

Why? What is documented is va_start/va_end/va_arg/va_copy, that is what people should use, the builtins are just internal implementation of those macros.

Jakub
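For readers following the thread, here is a minimal sketch of the documented interface under discussion: a variadic function written only with the standard <stdarg.h> macros. The function name sum_ints is hypothetical and not taken from any message. With GCC's own <stdarg.h>, these macros are thin wrappers around the __builtin_va_* functions, so code like this uses the builtins without ever naming them.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: returns the sum of `count` int arguments. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    va_start(args, count);          /* start after the last named parameter */

    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next argument as an int */
    }

    va_end(args);                   /* every va_start needs a matching va_end */
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds the three trailing arguments. Sticking to the macros keeps the code portable to compilers that implement <stdarg.h> differently.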
Re: Unused GCC builtins
Thank you for all the answers, which are very useful for us!

As you pointed out, we only considered GitHub projects. If I understood correctly, builtins would still not be deprecated even if we considered all other open-source hosting sites, because closed-source projects could still rely on them, right? Additionally, target-specific builtins could not be deprecated or removed because of vendor ABIs.

Several of you noted that we did not consider internal builtins that are used in the implementation of GCC headers or directly by the compiler. The documentation also mentions that GCC provides "a large number of built-in functions other than the ones mentioned" for "internal use" which "are not documented here because they may change from time to time" (see https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Other-Builtins.html#Other-Builtins).

We deliberately looked only at public builtins (and not internal ones), as we are mainly interested in the effort needed to support GCC builtins in other tools that process C code (e.g., other compilers or analysis tools). We want to spare such tool developers from having to implement internal or unused builtins. So even if we cannot remove the implementation of a builtin, removing it from the documentation could already be a win.

In a second step, we also considered internal builtins and found that the vararg handling builtins (__builtin_va_start, __builtin_va_end, __builtin_va_arg, and __builtin_va_copy) are relied upon by many projects, even though they are undocumented in GCC's builtins API. Could they be added to the documentation?

Thanks,
Manuel

2018-01-22 19:29 GMT+01:00 Florian Weimer:
> * Manuel Rigger:
>
> > Details: We downloaded all C projects from GitHub that had more than 80
> > GitHub stars, which yielded almost 5,000 projects with a total of more
> > than one billion lines of C code. We filtered GCC, forks of GCC, and
> > other compilers as we did not want to incorporate internal usages of GCC
> > builtins or test cases. We extracted all builtin names from the GCC
> > docs, and also tried to find such names in the source code, which we
> > considered as builtin usages.
>
> You actually need to compile the sources with an instrumented compiler
> to discover uses of built-ins. Not all references will have verbatim,
> textual references in source code, but their names are constructed
> using preprocessor macros. This happens for the majority of the
> floating-point-related built-ins you listed, I think.
Re: Unused GCC builtins
* Manuel Rigger:

> Details: We downloaded all C projects from GitHub that had more than 80
> GitHub stars, which yielded almost 5,000 projects with a total of more
> than one billion lines of C code. We filtered GCC, forks of GCC, and
> other compilers as we did not want to incorporate internal usages of GCC
> builtins or test cases. We extracted all builtin names from the GCC
> docs, and also tried to find such names in the source code, which we
> considered as builtin usages.

You actually need to compile the sources with an instrumented compiler to discover uses of built-ins. Not all references will have verbatim, textual references in source code, but their names are constructed using preprocessor macros. This happens for the majority of the floating-point-related built-ins you listed, I think.
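To illustrate the point about names built by the preprocessor, here is a small sketch. The macro GCC_BUILTIN and both function names are hypothetical. It uses integer bit-counting builtins rather than the floating-point ones mentioned above, only so that it links without the math library. The full builtin names never appear as single tokens in the code, so a textual search for them would miss both uses.

```c
#include <assert.h>

/* Hypothetical helper macro: builds a builtin's name by token pasting. */
#define GCC_BUILTIN(name) __builtin_##name

/* Counts the set bits in value via the pasted popcountll builtin. */
int count_set_bits(unsigned long long value)
{
    return GCC_BUILTIN(popcountll)(value);
}

/* Counts trailing zero bits via the pasted ctzll builtin. */
int count_trailing_zeros(unsigned long long value)
{
    assert(value != 0);   /* the ctz builtins are undefined for 0 */
    return GCC_BUILTIN(ctzll)(value);
}
```

Only the preprocessed output contains the complete names, which is why compiling with an instrumented compiler finds uses that a source-level search cannot.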
Re: Unused GCC builtins
On Mon, Jan 22, 2018 at 7:55 AM, David Brown wrote:
> On 22/01/18 16:46, Manuel Rigger wrote:
>> Hi everyone,
>>
>> As part of my research, we have been analyzing the usage of GCC builtins
>> in 5,000 C GitHub projects. One of our findings is that many of these
>> builtins are unused, even though they are described in the documentation
>> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
>> and obviously took time to develop and maintain. I’ve uploaded a CSV
>> file with the unused builtins to
>> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
>>
>> Details: We downloaded all C projects from GitHub that had more than 80
>> GitHub stars, which yielded almost 5,000 projects with a total of more
>> than one billion lines of C code. We filtered GCC, forks of GCC, and
>> other compilers as we did not want to incorporate internal usages of GCC
>> builtins or test cases. We extracted all builtin names from the GCC
>> docs, and also tried to find such names in the source code, which we
>> considered as builtin usages. We excluded subdirectories with GCC or
>> Clang, and removed other false positives. In total, we found 320k
>> builtin usages in these projects, and 3030 unused builtins out of a
>> total of 6039 builtins.
>>
>> What is your take on this? Do you believe that some of these unused
>> builtins could be removed from the GCC docs or deprecated? Or are they
>> used in special "niche" domains that we did not consider? If yes, do you
>> think it is worth to maintain them? Are some of them only used in C++
>> projects? Might it be possible to remove their implementations (which
>> has already happened for the Cilk Plus builtins)?
>>
>> We would be glad for any feedback.
>>
>> - Manuel
>>
>
> Many of these are going to be used automatically by the compiler. You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup". I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
>
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub. They will be used in embedded software that is not open source.

And many of the target ones are used indirectly via another function/macro (e.g. __builtin_ia32_ptestc256). The function/macro is defined in a header that GCC provides too.

Thanks,
Andrew

>
> I am sure there are builtins that are rarely or never used - but I doubt
> if it is anything like as many as you have identified from this survey.
Re: Unused GCC builtins
On Mon, Jan 22, 2018 at 04:55:42PM +0100, David Brown wrote:
> Many of these are going to be used automatically by the compiler. You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup". I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
>
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub. They will be used in embedded software that is not open source.

Not just that. If the statistics e.g. ignored GCC headers, then obviously it will miss most of the target builtins, because the normal and only supported way for the target builtins is to use them through the intrinsic inline functions or macros provided by those headers.

So, take those out (usually a vendor ABI is something that says what intrinsics are provided, so even if you made statistics on what intrinsic is used in the 5000 most popular projects, we still couldn't remove them) and taking out the above category, where the builtins are just an alternative for a standard function and depending on prototype and chosen standard some functions are treated like builtins, pretty much nothing remains in your survey.

Jakub
Re: Unused GCC builtins
On 1/22/2018 9:55 AM, David Brown wrote:
> On 22/01/18 16:46, Manuel Rigger wrote:
>> Hi everyone,
>>
>> As part of my research, we have been analyzing the usage of GCC builtins
>> in 5,000 C GitHub projects. One of our findings is that many of these
>> builtins are unused, even though they are described in the documentation
>> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
>> and obviously took time to develop and maintain. I’ve uploaded a CSV
>> file with the unused builtins to
>> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
>>
>> Details: We downloaded all C projects from GitHub that had more than 80
>> GitHub stars, which yielded almost 5,000 projects with a total of more
>> than one billion lines of C code. We filtered GCC, forks of GCC, and
>> other compilers as we did not want to incorporate internal usages of GCC
>> builtins or test cases. We extracted all builtin names from the GCC
>> docs, and also tried to find such names in the source code, which we
>> considered as builtin usages. We excluded subdirectories with GCC or
>> Clang, and removed other false positives. In total, we found 320k
>> builtin usages in these projects, and 3030 unused builtins out of a
>> total of 6039 builtins.
>>
>> What is your take on this? Do you believe that some of these unused
>> builtins could be removed from the GCC docs or deprecated? Or are they
>> used in special "niche" domains that we did not consider? If yes, do you
>> think it is worth to maintain them? Are some of them only used in C++
>> projects? Might it be possible to remove their implementations (which
>> has already happened for the Cilk Plus builtins)?
>>
>> We would be glad for any feedback.
>>
>> - Manuel
>
> Many of these are going to be used automatically by the compiler. You
> write "strdup" in your code, and the compiler treats it as
> "__builtin_strdup". I don't know that such functions need to be
> documented as extensions, but they are certainly in use.
>
> You will also find that a large number of the builtins are for specific
> target processors, and projects using them are not going to turn up on
> GitHub. They will be used in embedded software that is not open source.
>
> I am sure there are builtins that are rarely or never used - but I doubt
> if it is anything like as many as you have identified from this survey.

My first thought was that there is a lot of free and open source software that is not hosted at github. Larger projects are often self-hosted.

Does this list cover all GNU, Savannah, sourceware.org, Apache, KDE, *BSD, Mozilla, etc projects?

You might get lucky and some like RTEMS and FreeBSD (I think) have a github mirror. But github is not the entire universe of free and open source software.

--joel sherrill
RTEMS
Re: Unused GCC builtins
On 22/01/18 16:46, Manuel Rigger wrote:
> Hi everyone,
>
> As part of my research, we have been analyzing the usage of GCC builtins
> in 5,000 C GitHub projects. One of our findings is that many of these
> builtins are unused, even though they are described in the documentation
> (see https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions)
> and obviously took time to develop and maintain. I’ve uploaded a CSV
> file with the unused builtins to
> http://ssw.jku.at/General/Staff/ManuelRigger/unused-builtins.csv.
>
> Details: We downloaded all C projects from GitHub that had more than 80
> GitHub stars, which yielded almost 5,000 projects with a total of more
> than one billion lines of C code. We filtered GCC, forks of GCC, and
> other compilers as we did not want to incorporate internal usages of GCC
> builtins or test cases. We extracted all builtin names from the GCC
> docs, and also tried to find such names in the source code, which we
> considered as builtin usages. We excluded subdirectories with GCC or
> Clang, and removed other false positives. In total, we found 320k
> builtin usages in these projects, and 3030 unused builtins out of a
> total of 6039 builtins.
>
> What is your take on this? Do you believe that some of these unused
> builtins could be removed from the GCC docs or deprecated? Or are they
> used in special "niche" domains that we did not consider? If yes, do you
> think it is worth to maintain them? Are some of them only used in C++
> projects? Might it be possible to remove their implementations (which
> has already happened for the Cilk Plus builtins)?
>
> We would be glad for any feedback.
>
> - Manuel
>

Many of these are going to be used automatically by the compiler. You write "strdup" in your code, and the compiler treats it as "__builtin_strdup". I don't know that such functions need to be documented as extensions, but they are certainly in use.

You will also find that a large number of the builtins are for specific target processors, and projects using them are not going to turn up on GitHub. They will be used in embedded software that is not open source.

I am sure there are builtins that are rarely or never used - but I doubt if it is anything like as many as you have identified from this survey.
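As a small illustration of the "used automatically" point, the sketch below calls the same routine once by its standard library name and once by its builtin name. It uses strlen rather than strdup only to avoid allocation in a short example, and both function names are hypothetical. GCC recognizes many standard C functions as builtins unless told otherwise (for example with -fno-builtin), so the first form can end up using the builtin even though the source never mentions it.

```c
#include <assert.h>
#include <string.h>

/* Spelled with the standard name; GCC may treat this call as the builtin. */
size_t length_via_library_name(const char *text)
{
    assert(text != NULL);
    return strlen(text);
}

/* Spelled with the builtin name explicitly; same result. */
size_t length_via_builtin_name(const char *text)
{
    assert(text != NULL);
    return __builtin_strlen(text);
}
```

A survey that counts only the second spelling would record no use of the builtin for code written in the first style.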