Re: [PATCH v5] docs/zh_CN: add translations in zh_CN/dev-tools/gcov

2021-04-14 Thread Fangrui Song

Reviewed-by: Fangrui Song 

Inlined some suggestions.

On 2021-04-14, Alex Shi wrote:

Reviewed-by: Alex Shi 

On 2021/4/14 9:21 PM, Wu XiangCheng wrote:

From: Bernard Zhao 

Add new zh translations
* zh_CN/dev-tools/gcov.rst
* zh_CN/dev-tools/index.rst
and link them to zh_CN/index.rst

Signed-off-by: Bernard Zhao 
Reviewed-by: Wu XiangCheng 
Signed-off-by: Wu XiangCheng 
---
base: linux-next
commit 269dd42f4776 ("docs/zh_CN: add riscv to zh_CN index")

Changes since V4:
* reword some phrases following Alex Shi's advice

Changes since V3:
* update to newest linux-next
* fix ``
* fix tags
* fix list indent

Changes since V2:
* fix some inaccurate translation

Changes since V1:
* add index.rst in dev-tools and link to zh_CN/index.rst
* fix some inaccurate translation

 .../translations/zh_CN/dev-tools/gcov.rst | 265 ++
 .../translations/zh_CN/dev-tools/index.rst|  35 +++
 Documentation/translations/zh_CN/index.rst|   1 +
 3 files changed, 301 insertions(+)
 create mode 100644 Documentation/translations/zh_CN/dev-tools/gcov.rst
 create mode 100644 Documentation/translations/zh_CN/dev-tools/index.rst

diff --git a/Documentation/translations/zh_CN/dev-tools/gcov.rst b/Documentation/translations/zh_CN/dev-tools/gcov.rst
new file mode 100644
index ..7515b488bc4e
--- /dev/null
+++ b/Documentation/translations/zh_CN/dev-tools/gcov.rst
@@ -0,0 +1,265 @@
+.. include:: ../disclaimer-zh_CN.rst
+
+:Original: Documentation/dev-tools/gcov.rst
+:Translator: 赵军奎 Bernard Zhao 
+
+在Linux内核里使用gcov做代码覆盖率检查
+=
+
+gcov是linux中已经集成的一个分析模块,该模块在内核中对GCC的代码覆盖率统


"instrumentation" is usually translated as 插桩, not 分析.


+计提供了支持。
+linux内核运行时的代码覆盖率数据会以gcov兼容的格式存储在debug-fs中,可


Linux is a proper noun and should be capitalized.


+以通过gcov的 ``-o`` 选项(如下示例)获得指定文件的代码运行覆盖率统计数据
+(需要跳转到内核编译路径下并且要有root权限)::
+
+# cd /tmp/linux-out
+# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
+
+这将在当前目录中创建带有执行计数注释的源代码文件。
+在获得这些统计文件后,可以使用图形化的 gcov_ 前端工具(比如 lcov_ ),来实现
+自动化处理linux内核的覆盖率运行数据,同时生成易于阅读的HTML格式文件。
+
+可能的用途:
+
+* 调试(用来判断每一行的代码是否已经运行过)
+* 测试改进(如何修改测试代码,尽可能地覆盖到没有运行过的代码)
+* 内核配置优化(对于某一个选项配置,如果关联的代码从来没有运行过,是
+  否还需要这个配置)


minimizing: 优化 -> 最小化/简化


+.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
+.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
+
+
+准备
+
+
+内核打开如下配置::
+
+CONFIG_DEBUG_FS=y
+CONFIG_GCOV_KERNEL=y
+
+获取整个内核的覆盖率数据,还需要打开::
+
+CONFIG_GCOV_PROFILE_ALL=y
+
+需要注意的是,整个内核开启覆盖率统计会造成内核镜像文件尺寸的增大,
+同时内核运行的也会变慢一些。


s/的//


+另外,并不是所有的架构都支持整个内核开启覆盖率统计。
+
+代码运行覆盖率数据只在debugfs挂载完成后才可以访问::
+
+mount -t debugfs none /sys/kernel/debug
+
+
+定制化
+--
+
+如果要单独针对某一个路径或者文件进行代码覆盖率统计,可以在内核相应路
+径的Makefile中增加如下的配置:
+
+- 单独统计单个文件(例如main.o)::
+
+GCOV_PROFILE_main.o := y
+
+- 单独统计某一个路径::
+
+GCOV_PROFILE := y
+
+如果要在整个内核的覆盖率统计(开启CONFIG_GCOV_PROFILE_ALL)中单独排除
+某一个文件或者路径,可以使用如下的方法::
+
+GCOV_PROFILE_main.o := n
+
+和::
+
+GCOV_PROFILE := n
+
+此机制仅支持链接到内核镜像或编译为内核模块的文件。
+
+
+相关文件
+
+
+gcov功能需要在debugfs中创建如下文件:
+
+``/sys/kernel/debug/gcov``
+gcov相关功能的根路径
+
+``/sys/kernel/debug/gcov/reset``
+全局复位文件:向该文件写入数据后会将所有的gcov统计数据清0
+
+``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda``
+gcov工具可以识别的覆盖率统计数据文件,向该文件写入数据后
+ 会将本文件的gcov统计数据清0
+
+``/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno``
+gcov工具需要的软连接文件(指向编译时生成的信息统计文件),这个文件是
+在gcc编译时如果配置了选项 ``-ftest-coverage`` 时生成的。
+
+
+针对模块的统计
+--
+
+内核中的模块会动态的加载和卸载,模块卸载时对应的数据会被清除掉。
+gcov提供了一种机制,通过保留相关数据的副本来收集这部分卸载模块的覆盖率数据。
+模块卸载后这些备份数据在debugfs中会继续存在。
+一旦这个模块重新加载,模块关联的运行统计会被初始化成debugfs中备份的数据。
+
+可以通过对内核参数gcov_persist的修改来停用gcov对模块的备份机制::
+
+gcov_persist = 0
+
+在运行时,用户还可以通过写入模块的数据文件或者写入gcov复位文件来丢弃已卸
+载模块的数据。
+
+
+编译机和测试机分离
+--
+
+gcov的内核分析架构支持内核的编译和运行是在同一台机器上,也可以编译和运


分析 -> 插桩


+行是在不同的机器上。
+如果内核编译和运行是不同的机器,那么需要额外的准备工作,这取决于gcov工具
+是在哪里使用的:
+
+.. _gcov-test_zh:
+
+a) 若gcov运行在测试机上
+
+测试机上面gcov工具的版本必须要跟内核编译机器使用的gcc版本相兼容,
+同时下面的文件要从编译机拷贝到测试机上:
+
+从源代码中:
+  - 所有的C文件和头文件
+
+从编译目录中:
+  - 所有的C文件和头文件
+  - 所有的.gcda文件和.gcno文件
+  - 所有目录的链接
+
+特别需要注意,测试机器上面的目录结构跟编译机器上面的目录机构必须
+完全一致。
+如果文件是软链接,需要替换成真正的目录文件(这是由make的当前工作
+目录变量CURDIR引起的)。
+
+.. _gcov-build_zh:
+
+b) 若gcov运行在编译机上
+
+测试用例运行结束后,如下的文件需要从测试机中拷贝到编译机上:
+
+从sysfs中的gcov目录中:
+  - 所有的.gcda文件
+  - 所有的.gcno文件软链接
+
+这些文件可以拷贝到编译机的任意目录下,gcov使用-o选项指定拷贝的
+目录。
+
+比如一个是示例的目录结构如下::
+
+  /tmp/linux:内核源码目录
+  /tmp/out:  内核编译文件路径(make O=指定)
+  /tmp/coverage: 从测试机器上面拷贝的数据文件路径
+
+  [user@build] cd /tmp/out
+  [user@build] gcov -o /tmp/coverage/tmp/out/init main.c
+
+
+关于编译器的注意事项
+
+
+GCC和LLVM gcov工具不一定兼容。
+如果编译器是GCC,使用 gcov_ 来处理.gcno和.gcda文件,如果是Clang编译器,
+则使用 llvm-cov_ 。
+
+.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
+.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html
+
+GCC和Clang gcov之间的版本差异由Kconfig处理的。
+kconfig会根据编译工具链的检查自动选择合适的gcov格式。
+
+问题定位
+
+
+可能出现的问题1
+编译到

Re: [PATCH 2/2] gcov: re-drop support for clang-10

2021-04-07 Thread Fangrui Song



On 2021-04-07, Nick Desaulniers wrote:

LLVM changed the expected function signatures for
llvm_gcda_emit_function() in the clang-11 release.  Drop the older
implementations and require folks to upgrade their compiler if they're
interested in GCOV support.

Signed-off-by: Nick Desaulniers 
---
kernel/gcov/clang.c | 40 
1 file changed, 40 deletions(-)

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index 1747204541bf..78c4dc751080 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -69,9 +69,6 @@ struct gcov_fn_info {

u32 ident;
u32 checksum;
-#if CONFIG_CLANG_VERSION < 11
-   u8 use_extra_checksum;
-#endif
u32 cfg_checksum;

u32 num_counters;
@@ -113,23 +110,6 @@ void llvm_gcda_start_file(const char *orig_filename, u32 version, u32 checksum)
}
EXPORT_SYMBOL(llvm_gcda_start_file);

-#if CONFIG_CLANG_VERSION < 11
-void llvm_gcda_emit_function(u32 ident, u32 func_checksum,
-   u8 use_extra_checksum, u32 cfg_checksum)
-{
-   struct gcov_fn_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
-
-   if (!info)
-   return;
-
-   INIT_LIST_HEAD(&info->head);
-   info->ident = ident;
-   info->checksum = func_checksum;
-   info->use_extra_checksum = use_extra_checksum;
-   info->cfg_checksum = cfg_checksum;
-   list_add_tail(&info->head, &current_info->functions);
-}
-#else
void llvm_gcda_emit_function(u32 ident, u32 func_checksum, u32 cfg_checksum)
{
struct gcov_fn_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
@@ -143,7 +123,6 @@ void llvm_gcda_emit_function(u32 ident, u32 func_checksum, u32 cfg_checksum)
info->cfg_checksum = cfg_checksum;
list_add_tail(&info->head, &current_info->functions);
}
-#endif
EXPORT_SYMBOL(llvm_gcda_emit_function);

void llvm_gcda_emit_arcs(u32 num_counters, u64 *counters)
@@ -274,16 +253,8 @@ int gcov_info_is_compatible(struct gcov_info *info1, struct gcov_info *info2)
!list_is_last(&fn_ptr2->head, &info2->functions)) {
if (fn_ptr1->checksum != fn_ptr2->checksum)
return false;
-#if CONFIG_CLANG_VERSION < 11
-   if (fn_ptr1->use_extra_checksum != fn_ptr2->use_extra_checksum)
-   return false;
-   if (fn_ptr1->use_extra_checksum &&
-   fn_ptr1->cfg_checksum != fn_ptr2->cfg_checksum)
-   return false;
-#else
if (fn_ptr1->cfg_checksum != fn_ptr2->cfg_checksum)
return false;
-#endif
fn_ptr1 = list_next_entry(fn_ptr1, head);
fn_ptr2 = list_next_entry(fn_ptr2, head);
}
@@ -403,21 +374,10 @@ size_t convert_to_gcda(char *buffer, struct gcov_info *info)
u32 i;

pos += store_gcov_u32(buffer, pos, GCOV_TAG_FUNCTION);
-#if CONFIG_CLANG_VERSION < 11
-   pos += store_gcov_u32(buffer, pos,
-   fi_ptr->use_extra_checksum ? 3 : 2);
-#else
pos += store_gcov_u32(buffer, pos, 3);
-#endif
pos += store_gcov_u32(buffer, pos, fi_ptr->ident);
pos += store_gcov_u32(buffer, pos, fi_ptr->checksum);
-#if CONFIG_CLANG_VERSION < 11
-   if (fi_ptr->use_extra_checksum)
-   pos += store_gcov_u32(buffer, pos, fi_ptr->cfg_checksum);
-#else
pos += store_gcov_u32(buffer, pos, fi_ptr->cfg_checksum);
-#endif
-
pos += store_gcov_u32(buffer, pos, GCOV_TAG_COUNTER_BASE);
pos += store_gcov_u32(buffer, pos, fi_ptr->num_counters * 2);
for (i = 0; i < fi_ptr->num_counters; i++)
--
2.31.1.295.g9ea45b61b8-goog



Looks good for both. Thanks!

Reviewed-by: Fangrui Song 


Re: [PATCH] riscv: Use $(LD) instead of $(CC) to link vDSO

2021-03-26 Thread Fangrui Song



On 2021-03-25, Nathan Chancellor wrote:

Currently, the VDSO is being linked through $(CC). This does not match
how the rest of the kernel links objects, which is through the $(LD)
variable.

When linking with clang, there are a couple of warnings about flags that
will not be used during the link:

clang-12: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
clang-12: warning: argument unused during compilation: '-pg' [-Wunused-command-line-argument]

'-no-pie' was added in commit 85602bea297f ("RISC-V: build vdso-dummy.o
with -no-pie") to override '-pie' getting added to the ld command from
distribution versions of GCC that enable PIE by default. It is
technically no longer needed after commit c2c81bb2f691 ("RISC-V: Fix the
VDSO symbol generaton for binutils-2.35+"), which removed vdso-dummy.o
in favor of generating vdso-syms.S from vdso.so with $(NM) but this also
resolves the issue in case it ever comes back due to having full control
over the $(LD) command. '-pg' is for function tracing, it is not used
during linking as clang states.


Looks good.

-pg affects the link action: it changes crt1.o to gcrt1.o.
Since the Makefile uses -nostdlib, crt1.o is suppressed, so -pg
is entirely unneeded.
(-nostdlib implies -nostartfiles so the previous usage has a redundant
option.)


These flags could be removed/filtered to fix the warnings but it is
easier to just match the rest of the kernel and use $(LD) directly for
linking. See commits

 fe00e50b2db8 ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
 2ff906994b6c ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
 2b2a25845d53 ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")

for more information.

The flags are converted to linker flags and '--eh-frame-hdr' is added to
match what is added by GCC implicitly, which can be seen by adding '-v'
to GCC's invocation.
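
As an illustration of that conversion, here is a minimal Python sketch. It is not part of the patch, and `driver_to_ld_flags` is a made-up helper name. It unwraps `-Wl,` arguments the way a compiler driver does before handing them to the linker, then appends '--eh-frame-hdr' explicitly because $(LD) will not add it on its own.

```python
def driver_to_ld_flags(flags):
    """Translate compiler-driver flags into flags for a direct ld call.

    Only the -Wl,<a>,<b> form is unwrapped; other flags pass through.
    """
    out = []
    for flag in flags:
        if flag.startswith("-Wl,"):
            # -Wl,-T,vdso.lds hands "-T" and "vdso.lds" to the linker
            out.extend(flag[len("-Wl,"):].split(","))
        else:
            out.append(flag)
    return out


# The old SYSCFLAGS_vdso.so.dbg value from the diff in this patch
old = ["-shared", "-s", "-Wl,-soname=linux-vdso.so.1",
       "-Wl,--build-id=sha1", "-Wl,--hash-style=both"]

# GCC adds --eh-frame-hdr implicitly, so it is spelled out for ld
new = driver_to_ld_flags(old) + ["--eh-frame-hdr"]
print(" ".join(new))
```

The printed flags should line up with the new LDFLAGS_vdso.so.dbg value in the diff.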


Another minor change which may be shipped together: --hash-style=both
can be --hash-style=gnu. We don't need the SysV .hash section: glibc
and musl have supported .gnu.hash for years, and .gnu.hash is often
smaller than .hash.

Reviewed-by: Fangrui Song 


Additionally, since this area is being modified, use the $(OBJCOPY)
variable instead of an open coded $(CROSS_COMPILE)objcopy so that the
user's choice of objcopy binary is respected.

Link: https://github.com/ClangBuiltLinux/linux/issues/803
Link: https://github.com/ClangBuiltLinux/linux/issues/970
Signed-off-by: Nathan Chancellor 
---
arch/riscv/kernel/vdso/Makefile | 12 
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 71a315e73cbe..ca2b40dfd24b 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -41,11 +41,10 @@ KASAN_SANITIZE := n
$(obj)/vdso.o: $(obj)/vdso.so

# link rule for the .so file, .lds has to be first
-SYSCFLAGS_vdso.so.dbg = $(c_flags)
$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
$(call if_changed,vdsold)
-SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \
-   -Wl,--build-id=sha1 -Wl,--hash-style=both
+LDFLAGS_vdso.so.dbg = -shared -s -soname=linux-vdso.so.1 \
+   --build-id=sha1 --hash-style=both --eh-frame-hdr

# We also create a special relocatable object that should mirror the symbol
# table and layout of the linked DSO. With ld --just-symbols we can then
@@ -60,13 +59,10 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE

# actual build commands
# The DSO images are built using a special linker script
-# Add -lgcc so rv32 gets static muldi3 and lshrdi3 definitions.
# Make sure only to export the intended __vdso_xxx symbol offsets.
quiet_cmd_vdsold = VDSOLD  $@
-  cmd_vdsold = $(CC) $(KBUILD_CFLAGS) $(call cc-option, -no-pie) -nostdlib -nostartfiles $(SYSCFLAGS_$(@F)) \
-   -Wl,-T,$(filter-out FORCE,$^) -o $@.tmp && \
-   $(CROSS_COMPILE)objcopy \
-   $(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@ && \
+  cmd_vdsold = $(LD) $(ld_flags) -T $(filter-out FORCE,$^) -o $@.tmp && \
+   $(OBJCOPY) $(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@ && \
   rm $@.tmp

# Extracts symbol offsets from the VDSO, converting them into an assembly file
--
2.31.0

--
You received this message because you are subscribed to the Google Groups "Clang 
Built Linux" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clang-built-linux+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/clang-built-linux/20210325215156.1986901-1-nathan%40kernel.org.


Re: [PATCH] riscv: Use $(LD) instead of $(CC) to link vDSO

2021-03-26 Thread Fangrui Song



On 2021-03-26, Nathan Chancellor wrote:

On Sat, Mar 27, 2021 at 12:05:34AM +0800, kernel test robot wrote:

Hi Nathan,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.12-rc4 next-20210326]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nathan-Chancellor/riscv-Use-LD-instead-of-CC-to-link-vDSO/20210326-055421
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 002322402dafd846c424ffa9240a937f49b48c42
config: riscv-randconfig-r032-20210326 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project f490a5969bd52c8a48586f134ff8f02ccbb295b3)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/dfdcaf93f40f0d15ffc3f25128442c1688e612d6
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Nathan-Chancellor/riscv-Use-LD-instead-of-CC-to-link-vDSO/20210326-055421
git checkout dfdcaf93f40f0d15ffc3f25128442c1688e612d6
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv


For the record, I tried to use this script to reproduce but it has a
couple of bugs:

1. It does not download the right version of clang. This report says
that it is clang-13 but the one that the script downloaded is clang-12.

2. It does not download it to the right location. The script expects
~/0day/clang-latest but it is downloaded to ~/0day/clang it seems. I
symlinked it to get around it.


If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> riscv64-linux-gnu-objcopy: 'arch/riscv/kernel/vdso/vdso.so.dbg': No such file


This error only occurs because of errors before it that are not shown
due to a denylist:

ld.lld: error: arch/riscv/kernel/vdso/rt_sigreturn.o:(.text+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax
ld.lld: error: arch/riscv/kernel/vdso/getcpu.o:(.text+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax
ld.lld: error: arch/riscv/kernel/vdso/flush_icache.o:(.text+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax

My patch only adds another occurrence of this error because we move from
$(CC)'s default linker (in clang's case, ld.bfd) to $(LD), which in the
case of 0day appears to be ld.lld. ld.lld should not be used with RISC-V
in its current form due to errors of this nature, which happen without
my patch as well:

https://github.com/ClangBuiltLinux/linux/issues/1020

Linker relaxation in ld.lld for RISC-V is an ongoing debate/process.
Please give RISC-V the same treatment that s390 currently gets with
ld.lld for the time being to get meaningful reports. We will reach out
once that issue has been resolved.



TL;DR: Patch exposes existing issue with LD=ld.lld that would have
happened without it in different areas, the report can be ignored.


Yes, lkp frequently reports this error. It can be suppressed by using
-mno-relax... if ld.lld is picked.

Hmm. This motivated me to file
https://github.com/riscv/riscv-elf-psabi-doc/issues/183
R_RISCV_ALIGN friendly to linkers not supporting relaxation 
(riscv_relax_delete_bytes).


Cheers!
Nathan

To view this discussion on the web visit 
https://groups.google.com/d/msgid/clang-built-linux/20210326235839.zgfvmtfxrb3hy6i4%40archlinux-ax161.


Re: [PATCH 3/3] riscv: Select HAVE_DYNAMIC_FTRACE when -fpatchable-function-entry is available

2021-03-25 Thread Fangrui Song

On 2021-03-25, Nathan Chancellor wrote:

clang prior to 13.0.0 does not support -fpatchable-function-entry for
RISC-V.

clang: error: unsupported option '-fpatchable-function-entry=8' for target 'riscv64-unknown-linux-gnu'

To avoid this error, only select HAVE_DYNAMIC_FTRACE when this option is
available.


If clang -fpatchable-function-entry=8 does not fail with an "unsupported
option" error for a target, the backend supports the feature on that
target.
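
For readers unfamiliar with the probe, here is a rough Python model of what a cc-option style check does: run the compiler on an empty input with the flag and look at the exit status. This is only an approximation; the real macro lives in the kernel's Kconfig support files and differs in detail. The function names and the injectable `run` parameter are assumptions made for this sketch.

```python
import subprocess


def cc_option(cc, flag, run=subprocess.run):
    """Return True if `cc` accepts `flag` when compiling an empty C input."""
    cmd = [cc, "-Werror", flag, "-S", "-x", "c", "/dev/null", "-o", "/dev/null"]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        # compiler not installed: treat the option as unavailable
        return False
    return result.returncode == 0


def have_dynamic_ftrace(mmu, cc, run=subprocess.run):
    # models: select HAVE_DYNAMIC_FTRACE if MMU && $(cc-option,...)
    return mmu and cc_option(cc, "-fpatchable-function-entry=8", run)
```

Calling `have_dynamic_ftrace(True, "clang")` on a machine with a RISC-V capable clang would additionally need the target flags the kernel build passes; the sketch leaves those out.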

Reviewed-by: Fangrui Song 


Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
Link: https://github.com/ClangBuiltLinux/linux/issues/1268
Reported-by: kernel test robot 
Signed-off-by: Nathan Chancellor 
---
arch/riscv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 87d7b52f278f..ba1d07640b66 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -227,7 +227,7 @@ config ARCH_RV64I
bool "RV64I"
select 64BIT
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && GCC_VERSION >= 5
-   select HAVE_DYNAMIC_FTRACE if MMU
+   select HAVE_DYNAMIC_FTRACE if MMU && $(cc-option,-fpatchable-function-entry=8)
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
--
2.31.0

To view this discussion on the web visit 
https://groups.google.com/d/msgid/clang-built-linux/20210325223807.2423265-4-nathan%40kernel.org.


Re: [PATCH 1/3] scripts/recordmcount.pl: Fix RISC-V regex for clang

2021-03-25 Thread Fangrui Song

On 2021-03-25, Nathan Chancellor wrote:

Clang can generate R_RISCV_CALL_PLT relocations to _mcount:

$ llvm-objdump -dr build/riscv/init/main.o | rg mcount
   000e:  R_RISCV_CALL_PLT _mcount
   004e:  R_RISCV_CALL_PLT _mcount

After this, the __start_mcount_loc section is properly generated and
function tracing still works.



R_RISCV_CALL_PLT can replace R_RISCV_CALL in all use cases.
R_RISCV_CALL can/may be deprecated:
https://github.com/ClangBuiltLinux/linux/issues/1331#issuecomment-802468296
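
To see the effect of the regex change in this patch, here is a small Python sketch that runs the old and new patterns from the diff over two relocation lines. The sample lines are hypothetical, written in the single-space layout the pattern expects, and `offsets` is a helper invented for this example.

```python
import re

# Patterns from the diff, with Perl string escaping removed
OLD = re.compile(r"^\s*([0-9a-fA-F]+):\sR_RISCV_CALL\s_mcount$")
NEW = re.compile(r"^\s*([0-9a-fA-F]+):\sR_RISCV_CALL(_PLT)?\s_mcount$")

# Hypothetical relocation lines, one per relocation type
samples = [
    "\t\t\te: R_RISCV_CALL\t_mcount",
    "\t\t\t4e: R_RISCV_CALL_PLT\t_mcount",
]


def offsets(regex, lines):
    """Return call-site offsets (as ints) for the lines matching `regex`."""
    return [int(m.group(1), 16) for m in map(regex.match, lines) if m]


print(offsets(OLD, samples))  # only the R_RISCV_CALL line matches
print(offsets(NEW, samples))  # both relocation types match
```

With the old pattern the R_RISCV_CALL_PLT call site is silently skipped, which is why the mcount location table came out incomplete for clang-built objects.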

Reviewed-by: Fangrui Song 



Cc: sta...@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1331
Signed-off-by: Nathan Chancellor 
---
scripts/recordmcount.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 867860ea57da..a36df04cfa09 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -392,7 +392,7 @@ if ($arch eq "x86_64") {
$mcount_regex = "^\\s*([0-9a-fA-F]+):.*\\s_mcount\$";
} elsif ($arch eq "riscv") {
$function_regex = "^([0-9a-fA-F]+)\\s+<([^.0-9][0-9a-zA-Z_\\.]+)>:";
-$mcount_regex = "^\\s*([0-9a-fA-F]+):\\sR_RISCV_CALL\\s_mcount\$";
+$mcount_regex = "^\\s*([0-9a-fA-F]+):\\sR_RISCV_CALL(_PLT)?\\s_mcount\$";
$type = ".quad";
$alignment = 2;
} elsif ($arch eq "nds32") {
--
2.31.0

To view this discussion on the web visit 
https://groups.google.com/d/msgid/clang-built-linux/20210325223807.2423265-2-nathan%40kernel.org.


Re: [PATCH] gcov: fix clang-11+ support

2021-03-12 Thread Fangrui Song

On 2021-03-12, Nick Desaulniers wrote:

On Fri, Mar 12, 2021 at 12:25 PM 'Fangrui Song' via Clang Built Linux
 wrote:


function_name can be unconditionally deleted. It is not used by llvm-cov
gcov.  You'll need to delete a few assignments to gcov_info_free but you
can then unify the gcov_fn_info_dup and gcov_info_free implementations.

LG. On big-endian systems, clang < 11 emitted .gcno/.gcda files do not
work with llvm-cov gcov < 11.  To fix it and make .gcno/.gcda work with
gcc gcov I chose to break compatibility (and make all the breaking
changes like deleting some CC1 options) in a short window. At that time
I was not aware that there is the kernel implementation. Later on I was
CCed on a few https://github.com/ClangBuiltLinux/linux/ gcov issues but
I forgot to mention the interface change.


These are all good suggestions. Since in v2 I'll drop support for
clang < 11, I will skip additional patches to disable GCOV when using
older clang for BE, and the function_name cleanup.


Only llvm_gcda_start_file & llvm_gcda_emit_function need version
dispatch. In that case (since there will just be two #if in the file) we
don't even need

  depends on CC_IS_GCC || CLANG_VERSION >= 11


Now in clang 11 onward, clang --coverage defaults to the gcov 4.8
compatible format. You can specify the CC1 option (internal option,
subject to change) -coverage-version to make it compatible with other
versions' gcov.

-Xclang -coverage-version='407*' => 4.7
-Xclang -coverage-version='704*' => 7.4
-Xclang -coverage-version='B02*' => 10.2 (('B'-'A')*10 = 10)
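
The encoding can be read off the three examples plus the ('B'-'A')*10 hint. The Python sketch below decodes such a tag under that inferred rule: a leading digit is the major version followed by a two-digit minor, while a leading letter encodes the tens of the major version. The function name is made up, and the rule is an inference from this mail rather than a statement of the exact LLVM implementation.

```python
def decode_coverage_version(tag):
    """Decode a 4-character version tag such as '407*' into (major, minor)."""
    if len(tag) != 4:
        raise ValueError("expected a 4-character tag like '407*'")
    first, rest = tag[0], tag[1:3]
    if not (rest.isascii() and rest.isdigit()):
        raise ValueError("characters 2 and 3 must be decimal digits")
    if first.isascii() and first.isdigit():
        # '407*': major 4, minor 07
        return int(first), int(rest)
    if "A" <= first <= "Z":
        # 'B02*': ('B' - 'A') * 10 + 0 = 10, minor 2
        return (ord(first) - ord("A")) * 10 + int(rest[0]), int(rest[1])
    raise ValueError("unrecognized leading character")


for tag in ("407*", "704*", "B02*"):
    print(tag, decode_coverage_version(tag))
```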


How come LLVM doesn't default to 10.2 format, if it can optionally
produce it?  We might be able to reuse more code in the kernel between
the two implementations, though I expect the symbols the runtime is
expected to provide will still differ. Seeing the `B` in `B02*` is
also curious.

Thanks for the review, will include your tag in v2.


4.8 has the widest range of compiler support. gcov 4.8~7.* use the same format.

clang instrumentation does not support the column field (useless in my
opinion) introduced in gcov 9, so it just writes zeros.


Re: [PATCH] gcov: fix clang-11+ support

2021-03-12 Thread Fangrui Song

On 2021-03-12, Nick Desaulniers wrote:

LLVM changed the expected function signatures for llvm_gcda_start_file()
and llvm_gcda_emit_function() in the clang-11 release. Users of clang-11
or newer may have noticed their kernels failing to boot due to a panic
when enabling CONFIG_GCOV_KERNEL=y +CONFIG_GCOV_PROFILE_ALL=y.  Fix up
the function signatures so calling these functions doesn't panic the
kernel.

When we drop clang-10 support from the kernel, we should carefully
update the original implementations to try to preserve git blame,
deleting these implementations.

Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Cc: Fangrui Song 
Reported-by: Prasad Sodagudi
Signed-off-by: Nick Desaulniers 
---
kernel/gcov/clang.c | 69 +
1 file changed, 69 insertions(+)

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index c94b820a1b62..20e6760ec05d 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -75,7 +75,9 @@ struct gcov_fn_info {

u32 num_counters;
u64 *counters;
+#if __clang_major__ < 11
const char *function_name;
+#endif


function_name can be unconditionally deleted. It is not used by llvm-cov
gcov.  You'll need to delete a few assignments to gcov_info_free but you
can then unify the gcov_fn_info_dup and gcov_info_free implementations.


};

static struct gcov_info *current_info;
@@ -105,6 +107,7 @@ void llvm_gcov_init(llvm_gcov_callback writeout, llvm_gcov_callback flush)
}
EXPORT_SYMBOL(llvm_gcov_init);

+#if __clang_major__ < 11
void llvm_gcda_start_file(const char *orig_filename, const char version[4],
u32 checksum)
{
@@ -113,7 +116,17 @@ void llvm_gcda_start_file(const char *orig_filename, const char version[4],
current_info->checksum = checksum;
}
EXPORT_SYMBOL(llvm_gcda_start_file);
+#else
+void llvm_gcda_start_file(const char *orig_filename, u32 version, u32 checksum)
+{
+   current_info->filename = orig_filename;
+   current_info->version = version;
+   current_info->checksum = checksum;
+}
+EXPORT_SYMBOL(llvm_gcda_start_file);
+#endif


LG. On big-endian systems, clang < 11 emitted .gcno/.gcda files do not
work with llvm-cov gcov < 11.  To fix it and make .gcno/.gcda work with
gcc gcov I chose to break compatibility (and make all the breaking
changes like deleting some CC1 options) in a short window. At that time
I was not aware that there is the kernel implementation. Later on I was
CCed on a few https://github.com/ClangBuiltLinux/linux/ gcov issues but
I forgot to mention the interface change.

Now in clang 11 onward, clang --coverage defaults to the gcov 4.8
compatible format. You can specify the CC1 option (internal option,
subject to change) -coverage-version to make it compatible with other
versions' gcov.

-Xclang -coverage-version='407*' => 4.7
-Xclang -coverage-version='704*' => 7.4
-Xclang -coverage-version='B02*' => 10.2 (('B'-'A')*10 = 10)

Reviewed-by: Fangrui Song 


+#if __clang_major__ < 11
void llvm_gcda_emit_function(u32 ident, const char *function_name,
u32 func_checksum, u8 use_extra_checksum, u32 cfg_checksum)
{
@@ -133,6 +146,24 @@ void llvm_gcda_emit_function(u32 ident, const char *function_name,
list_add_tail(&info->head, &current_info->functions);
}
EXPORT_SYMBOL(llvm_gcda_emit_function);
+#else
+void llvm_gcda_emit_function(u32 ident, u32 func_checksum,
+   u8 use_extra_checksum, u32 cfg_checksum)
+{
+   struct gcov_fn_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
+
+   if (!info)
+   return;
+
+   INIT_LIST_HEAD(&info->head);
+   info->ident = ident;
+   info->checksum = func_checksum;
+   info->use_extra_checksum = use_extra_checksum;
+   info->cfg_checksum = cfg_checksum;
+   list_add_tail(&info->head, &current_info->functions);
+}
+EXPORT_SYMBOL(llvm_gcda_emit_function);
+#endif

void llvm_gcda_emit_arcs(u32 num_counters, u64 *counters)
{
@@ -295,6 +326,7 @@ void gcov_info_add(struct gcov_info *dst, struct gcov_info *src)
}
}

+#if __clang_major__ < 11
static struct gcov_fn_info *gcov_fn_info_dup(struct gcov_fn_info *fn)
{
size_t cv_size; /* counter values size */
@@ -322,6 +354,28 @@ static struct gcov_fn_info *gcov_fn_info_dup(struct gcov_fn_info *fn)
kfree(fn_dup);
return NULL;
}
+#else
+static struct gcov_fn_info *gcov_fn_info_dup(struct gcov_fn_info *fn)
+{
+   size_t cv_size; /* counter values size */
+   struct gcov_fn_info *fn_dup = kmemdup(fn, sizeof(*fn),
+   GFP_KERNEL);
+   if (!fn_dup)
+   return NULL;
+   INIT_LIST_HEAD(&fn_dup->head);
+
+   cv_size = fn->num_counters * sizeof(fn->counters[0]);
+   fn_dup->counters = vmalloc(cv_size);
+   if (!fn_dup->counters) {
+   kfree(fn_dup);
+ 

Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

2021-03-10 Thread Fangrui Song

On 2021-03-10, Nicolas Pitre wrote:

On Mon, 1 Mar 2021, Nicholas Piggin wrote:


Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> Unlike what Nick expected in his submission, I now think the annotations
> will be needed for LTO just like they are for --gc-sections.

Yeah I wasn't sure exactly what LTO looks like or how it would work.
I thought perhaps LTO might be able to find dead code with circular /
back references, we could put references from the code back to these
tables or something so they would be kept without KEEP. I don't know, I
was handwaving!

I managed to get powerpc (and IIRC x86?) working with gc sections with
those KEEP annotations, but effectiveness of course is far worse than
what Nicolas was able to achieve with all his techniques and tricks.

But yes unless there is some other mechanism to handle these tables,
then KEEP probably has to stay. I suggest this wants a very explicit and
systematic way to handle it (maybe with some toolchain support) rather
than trying to just remove things case by case and see what breaks.

I don't know if Nicolas is still working on his shrinking patches
recently but he probably knows more than anyone about this stuff.


Looks like not much has changed since last time I played with this stuff.

There is a way to omit the KEEP() on tables, but something must create a
dependency from the code being pointed to by those tables to the table
entries themselves. I did write my findings in the following article
(just skip over the introductory blurb):

https://lwn.net/Articles/741494/


Hey, this article taught me R_*_NONE which motivated me to add various R_*_NONE
support to LLVM 9!

Over the weekend I noticed that with binutils>=2.26, one can use
.reloc ., BFD_RELOC_NONE, target
(https://sourceware.org/bugzilla/show_bug.cgi?id=27530 ).
I implemented it for many targets in LLVM, but that will require 13.0.0.


Once that dependency is there, then the KEEP() may go and
garbage-collecting a function will also garbage-collect the table entry
automatically.

OTOH this trickery is not needed with LTO as garbage collection happens
at the source code optimization level. The KEEP() may remain in that
case as unneeded table entries will simply not be created in the first
place.


For Thin LTO, --gc-sections is still very useful.
I have more notes at
https://maskray.me/blog/2021-02-28-linker-garbage-collection#link-time-optimization


Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

2021-03-10 Thread Fangrui Song

On 2021-03-10, Arnd Bergmann wrote:

On Wed, Mar 10, 2021 at 9:50 PM Masahiro Yamada  wrote:

On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin  wrote:
> Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:




masahiro@oscar:~/ref/linux$ echo 'void this_func_is_unused(void) {}' >> kernel/cpu.c
masahiro@oscar:~/ref/linux$ export CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
masahiro@oscar:~/ref/linux$ make ARCH=powerpc defconfig
masahiro@oscar:~/ref/linux$ ./scripts/config -e EXPERT
masahiro@oscar:~/ref/linux$ ./scripts/config -e LD_DEAD_CODE_DATA_ELIMINATION
masahiro@oscar:~/ref/linux$ ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n vmlinux | grep this_func
c0170560 T .this_func_is_unused
c1d8d560 D this_func_is_unused
masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y


If I remember correctly,
LD_DEAD_CODE_DATA_ELIMINATION dropped unused functions
when I tried it last time.


--gc-sections drops unused sections.
If the unused function is part of a larger section which is retained due
to other symbols (-fno-function-sections), the unused function will be
retained as well.
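
A toy model may make the granularity point concrete. The Python sketch below marks sections reachable from a set of root symbols and keeps whole sections, so an unused function survives whenever it shares a section with a used one. The data layout and function names are invented for illustration and are not how any real linker represents its input.

```python
def gc_sections(sections, roots):
    """Return the names of sections kept by a mark phase starting at roots.

    `sections` maps a section name to {"symbols": [...], "refs": [...]},
    where refs are the symbol names referenced from that section.
    """
    owner = {sym: name
             for name, sec in sections.items()
             for sym in sec["symbols"]}
    kept, work = set(), [owner[r] for r in roots]
    while work:
        name = work.pop()
        if name in kept:
            continue
        kept.add(name)
        work.extend(owner[s] for s in sections[name]["refs"])
    return kept


def surviving_symbols(sections, roots):
    """List the symbols that remain after section-level collection."""
    kept = gc_sections(sections, roots)
    return sorted(s for n in kept for s in sections[n]["symbols"])


# -fno-function-sections: both functions live in one .text section
merged = {".text": {"symbols": ["used", "this_func_is_unused"], "refs": []}}

# -ffunction-sections: one section per function
split = {
    ".text.used": {"symbols": ["used"], "refs": []},
    ".text.this_func_is_unused": {"symbols": ["this_func_is_unused"],
                                  "refs": []},
}

print(surviving_symbols(merged, ["used"]))  # unused function is retained
print(surviving_symbols(split, ["used"]))   # unused function is dropped
```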




I also tried arm64 with a HAVE_LD_DEAD_CODE_DATA_ELIMINATION hack.
The result was the same.



Am I missing something?


It's possible that it only works in combination with CLANG_LTO now
because something broke. I definitely saw a reduction in kernel
size when both options are enabled, but did not try a simple test
case like you did.

Maybe some other reference gets created that prevents the function
from being garbage-collected unless that other option is removed
as well?

Arnd


I believe with LLVM regular LTO, --gc-sections has very little benefit
on compiler generated sections. It is still useful for assembly generated sections
(but most such sections are probably needed):

* Target specific optimizations can drop references on constants (e.g. `memcpy(..., , sizeof(constant));`)
* Due to phase ordering issues some definitions are not discarded by the optimizer.

For ThinLTO there are more compiler generated sections discarded by `--gc-sections`:

* ThinLTO can cause a definition to be imported to other modules. The original definition may be unneeded after imports.
* The definition may survive after intra-module optimization. After imports, a round of (inter-module) IR optimizations after `computeDeadSymbolsWithConstProp` may make the definition unneeded.
* Symbol resolution is conservative.

Regarding symbol resolution, symbol resolution happens before LTO and LTO 
happens before --gc-sections. The symbol resolution process may be 
conservative: it may communicate to LTO that some symbols are referenced by 
regular object files while in the GC stage the references turn out to not exist 
because of discarded sections with more precise GC roots.

(I've added the above points to my blog post at
https://maskray.me/blog/2021-02-28-linker-garbage-collection#link-time-optimization)


Re: [PATCH v2 1/2] Makefile: Remove '--gcc-toolchain' flag

2021-03-09 Thread Fangrui Song



On 2021-03-09, Nathan Chancellor wrote:

This flag was originally added to allow clang to find the GNU cross
tools in commit 785f11aa595b ("kbuild: Add better clang cross build
support"). This flag was not enough to find the tools at times so
'--prefix' was added to the list in commit ef8c4ed9db80 ("kbuild: allow
to use GCC toolchain not in Clang search path") and improved upon in
commit ca9b31f6bb9c ("Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang
cross compilation"). Now that '--prefix' specifies a full path and
prefix, '--gcc-toolchain' serves no purpose because the kernel builds
with '-nostdinc' and '-nostdlib'.

This has been verified with self compiled LLVM 10.0.1 and LLVM 13.0.0 as
well as a distribution version of LLVM 11.1.0 without binutils in the
LLVM toolchain locations.

Link: https://reviews.llvm.org/D97902
Signed-off-by: Nathan Chancellor 


The wording looks good.

Reviewed-by: Fangrui Song 


[PATCH] Replace __toc_start + 0x8000 with .TOC.

2021-03-06 Thread Fangrui Song
TOC relocations are like GOT relocations on other architectures.
However, unlike other architectures, GNU ld's ppc64 port defines .TOC.
relative to the .got output section instead of the linker synthesized
.got input section. LLD defines .TOC. as the .got input section plus
0x8000. When CONFIG_PPC_OF_BOOT_TRAMPOLINE=y,
arch/powerpc/kernel/prom_init.o is built, and LLD computed .TOC. can be
different from __toc_start defined by the linker script.

Simplify kernel_toc_addr with asm label .TOC. so that we can get rid of
__toc_start.

With this change, powernv_defconfig with CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
is bootable with LLD. There is still an untriaged issue with Alexey's
configuration.

Link: https://github.com/ClangBuiltLinux/linux/issues/1318
Reported-by: Alexey Kardashevskiy 
Signed-off-by: Fangrui Song 
---
 arch/powerpc/boot/crt0.S|  2 +-
 arch/powerpc/boot/zImage.lds.S  |  1 -
 arch/powerpc/include/asm/sections.h | 10 ++
 arch/powerpc/kernel/head_64.S   |  2 +-
 arch/powerpc/kernel/vmlinux.lds.S   |  1 -
 5 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/boot/crt0.S b/arch/powerpc/boot/crt0.S
index 1d83966f5ef6..e45907fe468f 100644
--- a/arch/powerpc/boot/crt0.S
+++ b/arch/powerpc/boot/crt0.S
@@ -28,7 +28,7 @@ p_etext:  .8byte  _etext
 p_bss_start:   .8byte  __bss_start
 p_end: .8byte  _end
 
-p_toc: .8byte  __toc_start + 0x8000 - p_base
+p_toc: .8byte  .TOC. - p_base
 p_dyn: .8byte  __dynamic_start - p_base
 p_rela:.8byte  __rela_dyn_start - p_base
 p_prom:.8byte  0
diff --git a/arch/powerpc/boot/zImage.lds.S b/arch/powerpc/boot/zImage.lds.S
index d6f072865627..32cf7816292f 100644
--- a/arch/powerpc/boot/zImage.lds.S
+++ b/arch/powerpc/boot/zImage.lds.S
@@ -39,7 +39,6 @@ SECTIONS
   . = ALIGN(256);
   .got :
   {
-__toc_start = .;
 *(.got)
 *(.toc)
   }
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 324d7b298ec3..bd22ca0b5eca 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -48,14 +48,8 @@ static inline int in_kernel_text(unsigned long addr)
 
 static inline unsigned long kernel_toc_addr(void)
 {
-   /* Defined by the linker, see vmlinux.lds.S */
-   extern unsigned long __toc_start;
-
-   /*
-* The TOC register (r2) points 32kB into the TOC, so that 64kB of
-* the TOC can be addressed using a single machine instruction.
-*/
-   return (unsigned long)(&__toc_start) + 0x8000UL;
+   extern unsigned long toc asm(".TOC.");
+   return (unsigned long)(&toc);
 }
 
 static inline int overlaps_interrupt_vector_text(unsigned long start,
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index ece7f97bafff..9542d03b2efe 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -899,7 +899,7 @@ _GLOBAL(relative_toc)
blr
 
 .balign 8
-p_toc: .8byte  __toc_start + 0x8000 - 0b
+p_toc: .8byte  .TOC. - 0b
 
 /*
  * This is where the main kernel code starts.
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 72fa3c00229a..c28f4e5bae3f 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -328,7 +328,6 @@ SECTIONS
 
. = ALIGN(256);
.got : AT(ADDR(.got) - LOAD_OFFSET) {
-   __toc_start = .;
 #ifndef CONFIG_RELOCATABLE
__prom_init_toc_start = .;
arch/powerpc/kernel/prom_init.o*(.toc .got)
-- 
2.30.1.766.gb4fecdf3b7-goog



Re: [PATCH 1/2] Makefile: Remove '--gcc-toolchain' flag

2021-03-03 Thread Fangrui Song



On 2021-03-03, Masahiro Yamada wrote:

Hi.

On Wed, Mar 3, 2021 at 6:44 AM Fangrui Song  wrote:


Reviewed-by: Fangrui Song 

Thanks for the clean-up!
--gcc-toolchain= is an obscure way of searching for GCC installation prefixes
(--prefix).
The logic is complex and differs between distributions/architectures.

If we specify --prefix= (-B) explicitly, --gcc-toolchain is not needed.



I tested this, and worked for me too.

Before applying this patch, could you please
help me understand the logic?




I checked the manual
(https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-b-dir)



-B<dir>, --prefix <arg>, --prefix=<arg>
   Add <dir> to search path for binaries and object files used implicitly

--gcc-toolchain=<arg>, -gcc-toolchain <arg>
   Use the gcc toolchain at the given directory


Hmm, this description is too concise
to understand how it works...



I use Ubuntu 20.10.

I use distro's default clang
located in /usr/bin/clang.

I place my aarch64 linaro toolchain in
/home/masahiro/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-gcc,
which is not in my PATH environment.




From some of my experiments,

clang --target=aarch64-linux-gnu -no-integrated-as \
--prefix=/home/masahiro/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-  ...

works almost equivalently to

PATH=/home/masahiro/tools/aarch64-linaro-7.5/bin:$PATH \
clang --target=aarch64-linux-gnu -no-integrated-as ...


Then, clang will pick up aarch64-linux-gnu-as
found in the search path.

Is this correct?


On the other hand, I could not understand
what the purpose of --gcc-toolchain= is.


Even if I add --gcc-toolchain=/home/masahiro/tools/aarch64-linaro-7.5,
it does not make any difference, and it is completely useless.


I read the comment from stephenhines:
https://github.com/ClangBuiltLinux/linux/issues/78

How could --gcc-toolchain be used
in a useful way?


--gcc-toolchain was introduced in
https://reviews.llvm.org/rG1af7c219c7113a35415651127f05cdf056b63110
to provide a flexible alternative to autoconf configure-time 
--with-gcc-toolchain (now cmake variable GCC_INSTALL_PREFIX).

I agree the option is confusing, the documentation is poor, and probably very
few people understand what it does.
I apologize that my previous reply was not particularly correct.
A more accurate answer is below:



A --prefix option can specify either of

1) A directory (for something like /a/b/lib/gcc/arm-linux-androideabi, this 
should be /a/b, the parent directory of 'lib')
2) A path fragment like /usr/bin/aarch64-linux-gnu-

The directory values of the --prefix list and --gcc-toolchain are used to 
detect GCC installation directories. The directory is used to fetch include 
directories, system library directories and binutils directories (as, objcopy).
(See below, Linux kernel only needs the binutils executables, so the 
include/library logic is really useless to us)
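
For tool lookup only, the two forms of --prefix can be modelled roughly as below. This is a simplified shell sketch, not Clang's code: `find_tool` and the fake toolchain layout are hypothetical, and trying the triple-prefixed name first in the directory case is an assumption about the driver's behaviour. GCC installation detection is a separate mechanism and is not modelled here.

```shell
#!/bin/sh
# Simplified model of resolving a tool through a --prefix value.
# Not Clang's implementation; find_tool and the layout are hypothetical.
set -e
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# Fake cross toolchain: $root/cross/bin/aarch64-linux-gnu-as
mkdir -p "$root/cross/bin"
printf '#!/bin/sh\n' > "$root/cross/bin/aarch64-linux-gnu-as"
chmod +x "$root/cross/bin/aarch64-linux-gnu-as"

# find_tool PREFIX TRIPLE TOOL
# A directory prefix is searched for "$TRIPLE-$TOOL" and then "$TOOL"
# (assumed order); any other value is treated as a path fragment and the
# tool name is appended to it directly.
find_tool() {
    prefix=$1 triple=$2 tool=$3
    if [ -d "$prefix" ]; then
        for name in "$triple-$tool" "$tool"; do
            if [ -x "$prefix/$name" ]; then
                echo "$prefix/$name"
                return 0
            fi
        done
    elif [ -x "$prefix$tool" ]; then
        echo "$prefix$tool"
        return 0
    fi
    return 1
}

as_by_dir=$(find_tool "$root/cross/bin" aarch64-linux-gnu as)
as_by_fragment=$(find_tool "$root/cross/bin/aarch64-linux-gnu-" aarch64-linux-gnu as)
echo "directory prefix: $as_by_dir"
echo "fragment prefix:  $as_by_fragment"
```

Both calls resolve to the same fake assembler, which is why a fragment such as /usr/bin/aarch64-linux-gnu- is enough for the kernel build: only the binutils executables need to be found.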

The logic is around 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Gnu.cpp#L1910

  Prefixes = --prefix/-B list (only the directory subset is effective)
  StringRef GCCToolchainDir = --gcc-toolchain= or CMake variable GCC_INSTALL_PREFIX
  if (GCCToolchainDir != "") {
    Prefixes.push_back(std::string(GCCToolchainDir));
  } else {
    if (!D.SysRoot.empty()) {
      Prefixes.push_back(D.SysRoot);
      // Add D.SysRoot+"/usr" to Prefixes. Some distributions add more directories.
      AddDefaultGCCPrefixes(TargetTriple, Prefixes, D.SysRoot);
    }

    // D.InstalledDir is the directory of the clang executable, e.g. /usr/bin
    Prefixes.push_back(D.InstalledDir + "/..");

    if (D.SysRoot.empty())
      AddDefaultGCCPrefixes(TargetTriple, Prefixes, D.SysRoot);
  }

  // Gentoo / ChromeOS specific logic.
  // I think this block is misplaced.
  if (GCCToolchainDir == "" || GCCToolchainDir == D.SysRoot + "/usr") {
    ...
  }

  // Loop over the various components which exist and select the best GCC
  // installation available. GCC installs are ranked by version number.
  Version = GCCVersion::Parse("0.0.0");
  for (const std::string &Prefix : Prefixes) {
    auto &VFS = D.getVFS();
    if (!VFS.exists(Prefix))
      continue;

    // CandidateLibDirs is a subset of {/lib64, /lib32, /lib}.
    for (StringRef Suffix : CandidateLibDirs) {
      const std::string LibDir = Prefix + Suffix.str();
      if (!VFS.exists(LibDir))
        continue;
      bool GCCDirExists = VFS.exists(LibDir + "/gcc");
      bool GCCCrossDirExists = VFS.exists(LibDir + "/gcc-cross");

      // Precise match. Detect $Prefix/lib/$--target
      ScanLibDirForGCCTriple(TargetTriple, Args, LibDir, TargetTriple.str(),
                             false, GCCDirExists, GCCCrossDirExists);
      // Usually empty.
      for (StringRef Candidate : ExtraTripleAliases) // Try these first.
        ScanLibDirForGCCTriple(TargetTriple, Args, LibDir, Candidate, false,
                               GCCDirExists, GCCCrossDirExists);
      //

Re: [PATCH 2/2] Makefile: Only specify '--prefix=' when building with clang + GNU as

2021-03-02 Thread Fangrui Song

On 2021-03-02, Nathan Chancellor wrote:

When building with LLVM_IAS=1, there is no point to specifying
'--prefix=' because that flag is only used to find the cross assembler,
which is clang itself when building with LLVM_IAS=1. All of the other
tools are invoked directly from PATH or a full path specified via the
command line, which does not depend on the value of '--prefix='.

Sharing commands to reproduce issues becomes a little bit easier without
a '--prefix=' value because that '--prefix=' value is specific to a
user's machine due to it being an absolute path.

Signed-off-by: Nathan Chancellor 


Reviewed-by: Fangrui Song 

clang can spawn GNU as (if -f?no-integrated-as is specified) and GNU
objcopy (-f?no-integrated-as and -gsplit-dwarf and -g[123]).

With LLVM_IAS=1, these cases are ruled out.


Re: [PATCH 1/2] Makefile: Remove '--gcc-toolchain' flag

2021-03-02 Thread Fangrui Song

Reviewed-by: Fangrui Song 

Thanks for the clean-up!
--gcc-toolchain= is an obscure way of searching for GCC installation prefixes
(--prefix).
The logic is complex and differs between distributions/architectures.

If we specify --prefix= (-B) explicitly, --gcc-toolchain is not needed.

On 2021-03-02, Nathan Chancellor wrote:

This is not necessary anymore now that we specify '--prefix=', which
tells clang exactly where to find the GNU cross tools. This has been
verified with self compiled LLVM 10.0.1 and LLVM 13.0.0 as well as a
distribution version of LLVM 11.1.0 without binutils in the LLVM
toolchain locations.

Signed-off-by: Nathan Chancellor 
---
Makefile | 4 
1 file changed, 4 deletions(-)

diff --git a/Makefile b/Makefile
index f9b54da2fca0..c20f0ad8be73 100644
--- a/Makefile
+++ b/Makefile
@@ -568,10 +568,6 @@ ifneq ($(CROSS_COMPILE),)
CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))
-GCC_TOOLCHAIN  := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
-endif
-ifneq ($(GCC_TOOLCHAIN),)
-CLANG_FLAGS+= --gcc-toolchain=$(GCC_TOOLCHAIN)
endif
ifneq ($(LLVM_IAS),1)
CLANG_FLAGS += -no-integrated-as

base-commit: 7a7fd0de4a9804299793e564a555a49c1fc924cb
--
2.31.0.rc0.75.gec125d1bc1



Re: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

2021-02-28 Thread Fangrui Song

On 2021-02-28, Fangrui Song wrote:

Reviewed-by: Fangrui Song 

Some minor items below:

On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote:

From: Sami Tolvanen 

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

$ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
$ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

$ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen 
Co-developed-by: Bill Wendling 
Signed-off-by: Bill Wendling 
---
v8: - Rebased on top-of-tree.
v7: - Fix minor build failure reported by Sedat.
v6: - Add better documentation about the locking scheme and other things.
  - Rename macros to better match the same macros in LLVM's source code.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
own popcount implementation, based on Nick Desaulniers's comment.
v3: - Added change log section based on Sedat Dilek's comments.
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
testing.
  - Corrected documentation, re PGO flags when using LTO, based on Fangrui
Song's comments.
---
Documentation/dev-tools/index.rst |   1 +
Documentation/dev-tools/pgo.rst   | 127 +
MAINTAINERS   |   9 +
Makefile  |   3 +
arch/Kconfig  |   1 +
arch/x86/Kconfig  |   1 +
arch/x86/boot/Makefile|   1 +
arch/x86/boot/compressed/Makefile |   1 +
arch/x86/crypto/Makefile  |   4 +
arch/x86/entry/vdso/Makefile  |   1 +
arch/x86/kernel/vmlinux.lds.S |   2 +
arch/x86/platform/efi/Makefile|   1 +
arch/x86/purgatory/Makefile   |   1 +
arch/x86/realmode/rm/Makefile |   1 +
arch/x86/um/vdso/Makefile |   1 +
drivers/firmware/efi/libstub/Makefile |   1 +
include/asm-generic/vmlinux.lds.h |  44 +++
kernel/Makefile   |   1 +
kernel/pgo/Kconfig|  35 +++
kernel/pgo/Makefile   |   5 +
kernel/pgo/fs.c   | 389 ++
kernel/pgo/instrument.c   | 189 +
kernel/pgo/pgo.h  | 203 ++
scripts/Makefile.lib  |  10 +
24 files changed, 1032 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst 
b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
  kgdb
  kselftest
  kunit/index
+   pgo


.. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index ..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Using PGO with the Linux kernel
+===
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: 
https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+ PGO_PROFILE_main.o 

Re: [PATCH v8] pgo: add clang's Profile Guided Optimization infrastructure

2021-02-28 Thread Fangrui Song

Reviewed-by: Fangrui Song 

Some minor items below:

On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote:

From: Sami Tolvanen 

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool
before it can be used during recompilation:

 $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
 $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can now be used by the compiler:

 $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...

This initial submission is restricted to x86, as that's the platform we
know works. This restriction can be lifted once other platforms have
been verified to work with PGO.

Note that this method of profiling the kernel is clang-native, unlike
the clang support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen 
Co-developed-by: Bill Wendling 
Signed-off-by: Bill Wendling 
---
v8: - Rebased on top-of-tree.
v7: - Fix minor build failure reported by Sedat.
v6: - Add better documentation about the locking scheme and other things.
   - Rename macros to better match the same macros in LLVM's source code.
v5: - Correct padding calculation, discovered by Nathan Chancellor.
v4: - Remove non-x86 Makefile changes and use "hweight64" instead of using our
 own popcount implementation, based on Nick Desaulniers's comment.
v3: - Added change log section based on Sedat Dilek's comments.
v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's
 testing.
   - Corrected documentation, re PGO flags when using LTO, based on Fangrui
 Song's comments.
---
Documentation/dev-tools/index.rst |   1 +
Documentation/dev-tools/pgo.rst   | 127 +
MAINTAINERS   |   9 +
Makefile  |   3 +
arch/Kconfig  |   1 +
arch/x86/Kconfig  |   1 +
arch/x86/boot/Makefile|   1 +
arch/x86/boot/compressed/Makefile |   1 +
arch/x86/crypto/Makefile  |   4 +
arch/x86/entry/vdso/Makefile  |   1 +
arch/x86/kernel/vmlinux.lds.S |   2 +
arch/x86/platform/efi/Makefile|   1 +
arch/x86/purgatory/Makefile   |   1 +
arch/x86/realmode/rm/Makefile |   1 +
arch/x86/um/vdso/Makefile |   1 +
drivers/firmware/efi/libstub/Makefile |   1 +
include/asm-generic/vmlinux.lds.h |  44 +++
kernel/Makefile   |   1 +
kernel/pgo/Kconfig|  35 +++
kernel/pgo/Makefile   |   5 +
kernel/pgo/fs.c   | 389 ++
kernel/pgo/instrument.c   | 189 +
kernel/pgo/pgo.h  | 203 ++
scripts/Makefile.lib  |  10 +
24 files changed, 1032 insertions(+)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst 
b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
   kgdb
   kselftest
   kunit/index
+   pgo


.. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index ..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Using PGO with the Linux kernel
+===
+
+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: 
https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+ PGO_PROFILE_main.o := y
+
+- For all files in on

Re: [PATCH RFC] x86: remove toolchain check for X32 ABI capability

2021-02-27 Thread Fangrui Song

On 2021-02-28, Masahiro Yamada wrote:

This commit reverts 0bf6276392e9 ("x32: Warn and disable rather than
error if binutils too old").

The help text in arch/x86/Kconfig says enabling the X32 ABI support
needs binutils 2.22 or later. This is met because the minimal binutils
version is 2.23 according to Documentation/process/changes.rst.

I would not say I am familiar with toolchain configuration, but
I checked the configure.tgt code in binutils. The elf32_x86_64
emulation mode seems to be included when it is configured for the
x86_64-*-linux-* target.

I also tried lld and llvm-objcopy, and succeeded in building x32 VDSO.

I removed the compile-time check in arch/x86/Makefile, in the hope of
elf32_x86_64 being always supported.

With this, CONFIG_X86_X32 and CONFIG_X86_X32_ABI will be equivalent.
Rename the former to the latter.


Hi Masahiro, the cleanup looks nice!

As for LLVM toolchain support, I don't know of any users who use the LLVM binary
utilities or LLD for x32.
The support needed in the binary utilities should be minimal anyway (EM_X86_64,
ELFCLASS32 and ELFDATA2LSB are mostly all that many utilities need to know), so
many of them should just work.

For llvm-objcopy, I know two issues related to `$(OBJCOPY) -O elf32-x86-64`
(actually `objcopy -I elf64-x86-64 -O elf32-x86-64`).  Such an operation tries
to convert an ELFCLASS64 object file to an ELFCLASS32 object file. It is not
very clear what GNU objcopy does. llvm-objcopy is dumb and does not do fancy
CLASS conversion.

* Object files produced by {gcc,clang} -gz{,=zlib}. The Elf{32,64}_Chdr headers are different.
  It seems that GNU objcopy can convert the headers
  (https://github.com/ClangBuiltLinux/linux/issues/514).
  llvm-objcopy cannot do it.
* It seems that GNU objcopy can convert .note.gnu.property
  (https://github.com/ClangBuiltLinux/linux/issues/1141#issuecomment-678798228).
  llvm-objcopy cannot do it.


On the linker side, I know TLS relaxations and IBT need special care and I
believe LLD does not handle them correctly. Thankfully the kernel does not use
thread-local storage so this is not an issue. So perhaps for most configurations
it is already working.  Since you've tested it, that is good news to me:)


Re: [PATCH] arm64: vmlinux.lds.S: keep .entry.tramp.text section

2021-02-26 Thread Fangrui Song



On 2021-02-26, Kees Cook wrote:

On Fri, Feb 26, 2021 at 03:03:39PM +0100, Arnd Bergmann wrote:

From: Arnd Bergmann 

When building with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION,
I sometimes see an assertion

 ld.lld: error: Entry trampoline text too big


Heh, "too big" seems a weird report for having it discarded. :)

Any idea on this Fangrui?

( I see this is https://github.com/ClangBuiltLinux/linux/issues/1311 )


This diagnostic is from an ASSERT in arch/arm64/kernel/vmlinux.lds

  ASSERT((__entry_tramp_text_end - __entry_tramp_text_start) == (1 << 16),
   "Entry trampoline text too big")

In our case (aarch64-linux-gnu-ld or LLD, --gc-sections), all the input
sections with this name are discarded, so the output section is either absent
(GNU ld) or empty (LLD). __entry_tramp_text_end - __entry_tramp_text_start is
then 0 rather than 1 << 16, and the ASSERT fires.

KEEP makes the sections GC roots, and it is appropriate to use here.


However, I worry that many other KEEP keywords in vmlinux.lds are unnecessary:
https://lore.kernel.org/linux-arm-kernel/20210226211323.arkvjnr4hifxa...@google.com/

git log -S KEEP -- include/asm-generic/vmlinux.lds.h => there is quite a
bit of unjustified usage. Sure, adding KEEP (GC root) is easy and
works around problems, but it is not good for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.

Reviewed-by: Fangrui Song 







This happens when any reference to the trampoline is discarded at link
time. Marking the section as KEEP() avoids the assertion, but I have
not figured out whether this is the correct solution for the underlying
problem.

Signed-off-by: Arnd Bergmann 


As a work-around, it seems fine to me.

Reviewed-by: Kees Cook 

-Kees


---
 arch/arm64/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 926cdb597a45..c5ee9d5842db 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -96,7 +96,7 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT \
. = ALIGN(PAGE_SIZE);   \
__entry_tramp_text_start = .;   \
-   *(.entry.tramp.text)\
+   KEEP(*(.entry.tramp.text))  \
. = ALIGN(PAGE_SIZE);   \
__entry_tramp_text_end = .;
 #else
--
2.29.2



--
Kees Cook


Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

2021-02-26 Thread Fangrui Song

On 2021-02-25, Arnd Bergmann wrote:

From: Arnd Bergmann 

When looking at kernel size optimizations, I found that arm64
does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
which enables the --gc-sections flag to the linker.

I see that for a defconfig build with llvm, there are some
notable improvements from enabling this, in particular when
combined with the recently added CONFIG_LTO_CLANG_THIN
and CONFIG_TRIM_UNUSED_KSYMS:

    text     data    bss      dec     hex filename
16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux

I needed a small change to the linker script to get clean randconfig
builds, but I have not done any meaningful boot testing on it to
see if it works. If there are no regressions, I wonder whether this
should be automatically done for LTO builds, given that it improves
both kernel size and compile speed.
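
To put the table in relative terms, the "dec" totals can be compared with plain shell arithmetic. The helper name `saving_bp` is made up for this sketch, and it assumes a shell with 64-bit integer arithmetic; results are rounded down.

```shell
#!/bin/sh
# saving_bp BASE NEW: size reduction of NEW relative to BASE in basis points
# (1 bp = 0.01%). Integer arithmetic, so the result is rounded down.
# Assumes 64-bit shell arithmetic; the intermediate product exceeds 2^32.
saving_bp() {
    echo $(( ($1 - $2) * 10000 / $1 ))
}

# "dec" column (text + data + bss) from the table above.
defconfig=28075407
gc=27770373
thinlto=28647283
gc_thinlto=27892945
gc_thinlto_trim=26794640

echo "gc vs defconfig:              $(saving_bp "$defconfig" "$gc") bp"
echo "gc+thinlto vs thinlto:        $(saving_bp "$thinlto" "$gc_thinlto") bp"
echo "gc+thinlto+trim vs defconfig: $(saving_bp "$defconfig" "$gc_thinlto_trim") bp"
```

This gives roughly 1.1% for --gc-sections alone, 2.6% for --gc-sections on top of ThinLTO, and 4.6% for all three options combined relative to the plain defconfig build.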

Link: 
https://lore.kernel.org/lkml/cak8p3a05vz9hskrzvtxtn+1nf9e+gqebjwtj6n23nfm+elh...@mail.gmail.com/
Signed-off-by: Arnd Bergmann 


For folks who are interested in --gc-sections on metadata sections,
I want to make you aware of the implications of __start_/__stop_ symbols
and C identifier name sections.
You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
(Its linked blog article has some examples.)

In the kernel linker scripts, most C identifier name sections begin with a
double underscore (__).
Some are surrounded by `KEEP(...)`, some are not.

* A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffective for the matched sections.
* Without `KEEP`, __start_/__stop_ references from a live input section
  can unnecessarily retain all the associated C identifier name input
  sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.

As an example, a __start___jump_table reference from a live section
causes all `__jump_table` input sections to be retained, even if you
change `KEEP(__jump_table)` to `(__jump_table)`.
(If you change the symbol name from `__start_${section}` to something
else (e.g. `__start${section}`), the rule will not apply.)


There is a lot of KEEP usage. Perhaps some of it can be dropped to facilitate
ld --gc-sections.


Re: [PATCH v7 1/2] Kbuild: make DWARF version a choice

2021-02-04 Thread Fangrui Song

On 2021-02-04, Nick Desaulniers wrote:

On Thu, Feb 4, 2021 at 12:28 PM Mark Wielaard  wrote:


On Thu, 2021-02-04 at 12:04 -0800, Nick Desaulniers wrote:
> On Thu, Feb 4, 2021 at 11:56 AM Mark Wielaard  wrote:
> > I agree with Jakub. Now that GCC has defaulted to DWARF5 all the tools
> > have adapted to the new default version. And DWARF5 has been out for
>
> "all of the tools" ?

I believe so yes, we did a mass-rebuild of all of Fedora a few weeks
back with a GCC11 pre-release and did find some tools which weren't
ready, but as far as I know all have been fixed now. I did try to


Congrats, I'm sure that was _a lot_ of work.  Our toolchain folks have
been pouring a lot of effort over getting our internal code all moved
over, and it doesn't look like it's been easy from my perspective.


coordinate with the Suse and Debian packagers too, so all the major
distros should have all the necessary updates once switching to GCC11.


That's great for users of the next Fedora release who can and will
upgrade, but I wouldn't assume kernel developers can, or will (or are
even using those distros).

Most recently, there was discussion on the list about upgrading the
minimally required version of GCC for building the kernel to GCC 5.1;
we still had developers complain about abandoning GCC 4.9.  And
Guenter shared with me a list of architectures he tests with where he
cannot upgrade the version of GCC in order to keep building them.
https://github.com/groeck/linux-build-test/blob/master/bin/stable-build-arch.sh
(I hope someone sent bug reports for those)

My intent is very much to allow for users of toolchains that have not
switched the implicit default (such as all of the supported versions
of GCC that have been released ie. up to GCC 10.2, and Clang; so all
toolchains the kernel still supports) to enjoy the size saving of
DWARF v5, and find what other tooling needs to be updated.


> > more than 4 years already. It isn't unreasonable to assume that people
> > using GCC11 will also be using the rest of the toolchain that has moved
> > on. Which DWARF consumers are you concerned about not being ready for
> > GCC defaulting to DWARF5 once GCC11 is released?
>
> Folks who don't have top of tree pahole or binutils are the two that
> come to mind.

I believe pahole just saw a 1.20 release. I am sure it will be widely
available once GCC11 is released (which will still be 1 or 2 months)
and people are actually using it. Or do you expect distros/people are
going to upgrade to GCC11 without updating their other toolchain tools?


Does no one test linux kernel builds with top of tree GCC built from
source?  Or does that require "updating their other toolchain tools?"
If I build ToT GCC from source, do I need to do the same for
binutils-gdb in order to build the kernel?  Pretty sure I don't.

https://bugzilla.redhat.com/show_bug.cgi?id=1922707 and
https://bugzilla.redhat.com/show_bug.cgi?id=1922698 look like user
reports to me, but hopefully some CI system reported earlier that
builds of the Linux kernel with GCC 11 pre release would produce the
warnings from those bug reports.  Otherwise it looks like evidence that
users "are going to upgrade to GCC11 without updating their other
toolchain tools."  In the case of pahole, they could not, because
fixes were not yet written.  "Just upgrade" doesn't work if there's no
fix yet upstream.  (pahole is reported fixed for that specific issue,
FWIW).


BTW. GCC11 doesn't need top of tree binutils, it will detect the
binutils capabilities (bugs) and adjust its DWARF output based on it.


Yes, I saw 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6923255e35a3d54f2083ad0f67edebb3f1b86506
and 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1aeb7d7d67d167297ca2f4a97ef20f68e7546b4c.
It's nice that GCC can tightly couple to a version of binutils. Clang
without its integrated assembler can make no such assumptions about
which assembler the user will prefer to use instead at runtime, which,
without binutils 2.35.1 being as widely available as we would all like,
leads to issues shipping DWARF v5 by default.


>   I don't have specifics on out of tree consumers, but
> some Aarch64 extensions which had some changes to DWARF for ARMv8.3
> PAC support broke some debuggers.

It would be really helpful if you could provide some specifics. I did
fix some consumers to handle the PAC operands in CFI last year, but I
don't believe that had anything to do with the default DWARF version,
just with dealing with DW_CFA_AARCH64_negate_ra_state.


Yep, that's the one.  I suspect that the same out of tree consumers of
DWARF that did not see that coming will similarly be stumped when
presented with DWARF v5, but it's hypothetical, so not much of an
argument I'll admit.  I just wouldn't bet that an upgrade to DWARF v5
will be painless at this point in time, as evidenced by how much blood
has been poured into finding what tools out there were broken and
needed to be fixed.  I also recognize we can't drag our heels 

Re: [PATCH v7 1/2] Kbuild: make DWARF version a choice

2021-02-03 Thread Fangrui Song

On 2021-02-04, Masahiro Yamada wrote:

On Sat, Jan 30, 2021 at 10:52 AM Nathan Chancellor  wrote:


On Fri, Jan 29, 2021 at 04:44:00PM -0800, Nick Desaulniers wrote:
> Modifies CONFIG_DEBUG_INFO_DWARF4 to be a member of a choice which is
> the default. Does so in a way that's forward compatible with existing
> configs, and makes adding future versions more straightforward.
>
> GCC since ~4.8 has defaulted to this DWARF version implicitly.
>
> Suggested-by: Arvind Sankar 
> Suggested-by: Fangrui Song 
> Suggested-by: Nathan Chancellor 
> Suggested-by: Masahiro Yamada 
> Signed-off-by: Nick Desaulniers 

One comment below:

Reviewed-by: Nathan Chancellor 

> ---
>  Makefile  |  5 ++---
>  lib/Kconfig.debug | 16 +++-
>  2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 95ab9856f357..d2b4980807e0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -830,9 +830,8 @@ ifneq ($(LLVM_IAS),1)
>  KBUILD_AFLAGS+= -Wa,-gdwarf-2

It is probably worth a comment somewhere that assembly files will still
have DWARF v2.


I agree.
Noting the reason would be helpful.

Could you summarize Jakub's comment in short?
https://patchwork.kernel.org/project/linux-kbuild/patch/20201022012106.1875129-1-ndesaulni...@google.com/#23727667

One more question.


Can we remove the -g option as follows?


ifdef CONFIG_DEBUG_INFO_SPLIT
DEBUG_CFLAGS   += -gsplit-dwarf
-else
-DEBUG_CFLAGS   += -g
endif


With GCC 11/Clang 12, -gsplit-dwarf no longer implies -g2
(https://reviews.llvm.org/D80391). It may be worth checking whether
-gsplit-dwarf is used without an option that enables debug info.

In the current mainline code,
-g is the only debug option
if CONFIG_DEBUG_INFO_DWARF4 is disabled.


The GCC manual says:
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Debugging-Options.html#Debugging-Options


-g

   Produce debugging information in the operating system’s
   native format (stabs, COFF, XCOFF, or DWARF).
   GDB can work with this debugging information.


Of course, we expect the -g option will produce
the debug info in the DWARF format.

With this patch set applied, it is very explicit.

Not only the format type, but also the version.

The compiler will be given either
-gdwarf-4 or -gdwarf-5,
making the -g option redundant, I think.


-gdwarf-N does imply -g2, but personally I would not suggest removing -g
if it already exists. The non-orthogonality is the reason Clang has
-fdebug-default-version (https://reviews.llvm.org/D69822).

>  endif
>
> -ifdef CONFIG_DEBUG_INFO_DWARF4
> -DEBUG_CFLAGS += -gdwarf-4
> -endif
> +dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4
> +DEBUG_CFLAGS += -gdwarf-$(dwarf-version-y)
>
>  ifdef CONFIG_DEBUG_INFO_REDUCED
>  DEBUG_CFLAGS += $(call cc-option, -femit-struct-debug-baseonly) \
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e906ea906cb7..94c1a7ed6306 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -256,13 +256,19 @@ config DEBUG_INFO_SPLIT
> to know about the .dwo files and include them.
> Incompatible with older versions of ccache.
>
> +choice
> + prompt "DWARF version"
> + help
> +   Which version of DWARF debug info to emit.
> +
>  config DEBUG_INFO_DWARF4
> - bool "Generate dwarf4 debuginfo"
> + bool "Generate DWARF Version 4 debuginfo"
>   help
> -   Generate dwarf4 debug info. This requires recent versions
> -   of gcc and gdb. It makes the debug information larger.
> -   But it significantly improves the success of resolving
> -   variables in gdb on optimized code.
> +   Generate DWARF v4 debug info. This requires gcc 4.5+ and gdb 7.0+.
> +   It makes the debug information larger, but it significantly
> +   improves the success of resolving variables in gdb on optimized code.
> +
> +endchoice # "DWARF version"
>
>  config DEBUG_INFO_BTF
>   bool "Generate BTF typeinfo"
> --
> 2.30.0.365.g02bc693789-goog
>




--
Best Regards
Masahiro Yamada


Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5

2021-01-29 Thread Fangrui Song

On 2021-01-29, Nick Desaulniers wrote:

DWARF v5 is the latest standard of the DWARF debug info format.

Feature detection of DWARF5 is onerous, especially given that we've
removed $(AS), so we must query $(CC) for DWARF5 assembler directive
support.

The DWARF version of a binary can be validated with:
$ llvm-dwarfdump vmlinux | head -n 4 | grep version
or
$ readelf --debug-dump=info vmlinux 2>/dev/null | grep Version
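
For readers wondering what those two commands actually report: the version
is a 2-byte field that directly follows the unit_length at the start of each
unit header in .debug_info. Below is a minimal sketch of reading it, assuming
the raw, uncompressed section bytes are already in a buffer and the target is
little-endian. The helper name dwarf_unit_version is made up for illustration
and is not part of any tool mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the DWARF version found in the first unit
 * header of a raw .debug_info buffer, or -1 if the buffer is too short.
 * Assumes little-endian encoding and uncompressed section contents.
 */
static int dwarf_unit_version(const uint8_t *buf, size_t len)
{
	size_t off = 4;	/* 32-bit DWARF: unit_length is 4 bytes */

	if (len < 4)
		return -1;

	/* 64-bit DWARF: 0xffffffff escape, then an 8-byte unit_length */
	if (buf[0] == 0xff && buf[1] == 0xff && buf[2] == 0xff && buf[3] == 0xff)
		off = 12;

	if (len < off + 2)
		return -1;

	/* version is a 2-byte unsigned value right after unit_length */
	return buf[off] | (buf[off + 1] << 8);
}
```

With CONFIG_DEBUG_INFO_COMPRESSED the section would have to be decompressed
first, which this sketch does not attempt.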

DWARF5 wins significantly in terms of size when mixed with compression
(CONFIG_DEBUG_INFO_COMPRESSED).

363M vmlinux.clang12.dwarf5.compressed
434M vmlinux.clang12.dwarf4.compressed
439M vmlinux.clang12.dwarf2.compressed
457M vmlinux.clang12.dwarf5
536M vmlinux.clang12.dwarf4
548M vmlinux.clang12.dwarf2

515M vmlinux.gcc10.2.dwarf5.compressed
599M vmlinux.gcc10.2.dwarf4.compressed
624M vmlinux.gcc10.2.dwarf2.compressed
630M vmlinux.gcc10.2.dwarf5
765M vmlinux.gcc10.2.dwarf4
809M vmlinux.gcc10.2.dwarf2

Though the quality of debug info is harder to quantify; size is not a
proxy for quality.

Jakub notes:
 All [GCC] 5.1 - 6.x did was start accepting -gdwarf-5 as experimental
 option that enabled some small DWARF subset (initially only a few
 DW_LANG_* codes newly added to DWARF5 drafts).  Only GCC 7 (released
 after DWARF 5 has been finalized) started emitting DWARF5 section
 headers and got most of the DWARF5 changes in...

Version check GCC so that we don't need to worry about the difference in
command line args between GNU readelf and llvm-readelf/llvm-dwarfdump to
validate the DWARF Version in the assembler feature detection script.

GNU `as` only recently gained support for specifying -gdwarf-5, so when
compiling with Clang but without Clang's integrated assembler
(LLVM_IAS=1 is not set), explicitly add -Wa,-gdwarf-5 to DEBUG_CFLAGS.
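
The flag selection described above can be modelled roughly as follows. This
is only a sketch of the decision logic in C, not the Makefile itself; the
struct and function names (debug_cfg, debug_cflags) are invented for
illustration, and the three fields stand in for the selected
CONFIG_DEBUG_INFO_DWARF* option, CONFIG_CC_IS_CLANG and LLVM_IAS=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the relevant Kconfig/make variables. */
struct debug_cfg {
	int  dwarf_version;	/* 2, 4 or 5 */
	bool cc_is_clang;	/* CONFIG_CC_IS_CLANG */
	bool llvm_ias;		/* LLVM_IAS=1 */
};

/* Write the DWARF-related part of DEBUG_CFLAGS into out. */
static void debug_cflags(const struct debug_cfg *cfg, char *out, size_t len)
{
	int n = snprintf(out, len, "-gdwarf-%d", cfg->dwarf_version);

	if (n < 0 || (size_t)n >= len)
		return;

	/*
	 * Clang handing assembly to GNU as: tell the assembler explicitly
	 * that DWARF v5 directives are coming.
	 */
	if (cfg->cc_is_clang && !cfg->llvm_ias && cfg->dwarf_version == 5)
		snprintf(out + n, len - (size_t)n, " -Wa,-gdwarf-5");
}
```

Only the Clang plus GNU as plus DWARF v5 combination gets the extra
assembler flag; every other combination ends up with -gdwarf-N alone.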

Disabled for now if CONFIG_DEBUG_INFO_BTF is set; pahole doesn't yet
recognize the new additions to the DWARF debug info. Thanks to Sedat for
the report.

Link: http://www.dwarfstd.org/doc/DWARF5.pdf
Reported-by: Sedat Dilek 
Suggested-by: Arvind Sankar 
Suggested-by: Caroline Tice 
Suggested-by: Fangrui Song 
Suggested-by: Jakub Jelinek 
Suggested-by: Masahiro Yamada 
Suggested-by: Nathan Chancellor 
Signed-off-by: Nick Desaulniers 
---
Makefile  | 12 
include/asm-generic/vmlinux.lds.h |  6 +-
lib/Kconfig.debug | 18 ++
scripts/test_dwarf5_support.sh|  8 
4 files changed, 43 insertions(+), 1 deletion(-)
create mode 100755 scripts/test_dwarf5_support.sh

diff --git a/Makefile b/Makefile
index 20141cd9319e..bed8b3b180b8 100644
--- a/Makefile
+++ b/Makefile
@@ -832,8 +832,20 @@ endif

dwarf-version-$(CONFIG_DEBUG_INFO_DWARF2) := 2
dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4
+dwarf-version-$(CONFIG_DEBUG_INFO_DWARF5) := 5
DEBUG_CFLAGS+= -gdwarf-$(dwarf-version-y)

+# If using clang without the integrated assembler, we need to explicitly tell
+# GAS that we will be feeding it DWARF v5 assembler directives. Kconfig should
+# detect whether the version of GAS supports DWARF v5.
+ifdef CONFIG_CC_IS_CLANG
+ifneq ($(LLVM_IAS),1)
+ifeq ($(dwarf-version-y),5)
+DEBUG_CFLAGS   += -Wa,-gdwarf-5
+endif
+endif
+endif
+
ifdef CONFIG_DEBUG_INFO_REDUCED
DEBUG_CFLAGS+= $(call cc-option, -femit-struct-debug-baseonly) \
   $(call cc-option,-fno-var-tracking)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 34b7e0d2346c..f8d5455cd87f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -843,7 +843,11 @@
.debug_types0 : { *(.debug_types) } \
/* DWARF 5 */   \
.debug_macro0 : { *(.debug_macro) } \
-   .debug_addr 0 : { *(.debug_addr) }
+   .debug_addr 0 : { *(.debug_addr) }  \
+   .debug_line_str 0 : { *(.debug_line_str) }  \
+   .debug_loclists 0 : { *(.debug_loclists) }  \
+   .debug_rnglists 0 : { *(.debug_rnglists) }  \
+   .debug_str_offsets  0 : { *(.debug_str_offsets) }

/* Stabs debugging sections. */
#define STABS_DEBUG \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1850728b23e6..09146b1af20d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -273,6 +273,24 @@ config DEBUG_INFO_DWARF4
  It makes the debug information larger, but it significantly
  improves the success of resolving variables in gdb on optimized code.

+config DEBUG_INFO_DWARF5
+   bool "Generate DWARF Version 5 debuginfo"
+   depends on GCC_VERSION >= 5 || CC_IS_CLANG
+   depends on CC_IS_GCC || $(success,$(srctree)/scripts/test_dwarf5_support.sh $(CC) $(CLANG_FLAGS))
+   depends on !DEBUG_INFO_

Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5

2021-01-29 Thread Fangrui Song

On 2021-01-29, Nick Desaulniers wrote:

DWARF v5 is the latest standard of the DWARF debug info format.

Feature detection of DWARF5 is onerous, especially given that we've
removed $(AS), so we must query $(CC) for DWARF5 assembler directive
support.

The DWARF version of a binary can be validated with:
$ llvm-dwarfdump vmlinux | head -n 4 | grep version
or
$ readelf --debug-dump=info vmlinux 2>/dev/null | grep Version

DWARF5 wins significantly in terms of size when mixed with compression
(CONFIG_DEBUG_INFO_COMPRESSED).

363M vmlinux.clang12.dwarf5.compressed
434M vmlinux.clang12.dwarf4.compressed
439M vmlinux.clang12.dwarf2.compressed
457M vmlinux.clang12.dwarf5
536M vmlinux.clang12.dwarf4
548M vmlinux.clang12.dwarf2

515M vmlinux.gcc10.2.dwarf5.compressed
599M vmlinux.gcc10.2.dwarf4.compressed
624M vmlinux.gcc10.2.dwarf2.compressed
630M vmlinux.gcc10.2.dwarf5
765M vmlinux.gcc10.2.dwarf4
809M vmlinux.gcc10.2.dwarf2

Though the quality of debug info is harder to quantify; size is not a
proxy for quality.
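
To put the sizes quoted in this message into relative terms, going from
DWARF v4 to v5 works out to roughly 14.7% (clang12, 536M to 457M) and 17.6%
(gcc10.2, 765M to 630M) uncompressed, and roughly 16.4% (434M to 363M) and
14.0% (599M to 515M) compressed. The arithmetic is just the following; the
helper name percent_saved is made up for this note.

```c
#include <assert.h>

/* Relative size reduction, in percent, when going from `before` to `after`. */
static double percent_saved(double before, double after)
{
	assert(before > 0);
	return (before - after) / before * 100.0;
}
```

For example, percent_saved(536, 457) gives the clang12 uncompressed figure.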

Jakub notes:
 All [GCC] 5.1 - 6.x did was start accepting -gdwarf-5 as experimental
 option that enabled some small DWARF subset (initially only a few
 DW_LANG_* codes newly added to DWARF5 drafts).  Only GCC 7 (released
 after DWARF 5 has been finalized) started emitting DWARF5 section
 headers and got most of the DWARF5 changes in...

Version check GCC so that we don't need to worry about the difference in
command line args between GNU readelf and llvm-readelf/llvm-dwarfdump to
validate the DWARF Version in the assembler feature detection script.

GNU `as` only recently gained support for specifying -gdwarf-5, so when
compiling with Clang but without Clang's integrated assembler
(LLVM_IAS=1 is not set), explicitly add -Wa,-gdwarf-5 to DEBUG_CFLAGS.

Disabled for now if CONFIG_DEBUG_INFO_BTF is set; pahole doesn't yet
recognize the new additions to the DWARF debug info. Thanks to Sedat for
the report.

Link: http://www.dwarfstd.org/doc/DWARF5.pdf
Reported-by: Sedat Dilek 
Suggested-by: Arvind Sankar 
Suggested-by: Caroline Tice 
Suggested-by: Fangrui Song 
Suggested-by: Jakub Jelinek 
Suggested-by: Masahiro Yamada 
Suggested-by: Nathan Chancellor 
Signed-off-by: Nick Desaulniers 
---
Makefile  | 12 
include/asm-generic/vmlinux.lds.h |  6 +-
lib/Kconfig.debug | 18 ++
scripts/test_dwarf5_support.sh|  8 
4 files changed, 43 insertions(+), 1 deletion(-)
create mode 100755 scripts/test_dwarf5_support.sh

diff --git a/Makefile b/Makefile
index 20141cd9319e..bed8b3b180b8 100644
--- a/Makefile
+++ b/Makefile
@@ -832,8 +832,20 @@ endif

dwarf-version-$(CONFIG_DEBUG_INFO_DWARF2) := 2
dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4
+dwarf-version-$(CONFIG_DEBUG_INFO_DWARF5) := 5
DEBUG_CFLAGS+= -gdwarf-$(dwarf-version-y)

+# If using clang without the integrated assembler, we need to explicitly tell
+# GAS that we will be feeding it DWARF v5 assembler directives. Kconfig should
+# detect whether the version of GAS supports DWARF v5.
+ifdef CONFIG_CC_IS_CLANG
+ifneq ($(LLVM_IAS),1)
+ifeq ($(dwarf-version-y),5)
+DEBUG_CFLAGS   += -Wa,-gdwarf-5
+endif
+endif
+endif
+
ifdef CONFIG_DEBUG_INFO_REDUCED
DEBUG_CFLAGS+= $(call cc-option, -femit-struct-debug-baseonly) \
   $(call cc-option,-fno-var-tracking)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 34b7e0d2346c..f8d5455cd87f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -843,7 +843,11 @@
.debug_types0 : { *(.debug_types) } \
/* DWARF 5 */   \
.debug_macro0 : { *(.debug_macro) } \
-   .debug_addr 0 : { *(.debug_addr) }
+   .debug_addr 0 : { *(.debug_addr) }  \
+   .debug_line_str 0 : { *(.debug_line_str) }  \
+   .debug_loclists 0 : { *(.debug_loclists) }  \
+   .debug_rnglists 0 : { *(.debug_rnglists) }  \
+   .debug_str_offsets  0 : { *(.debug_str_offsets) }


Please add .debug_names as well (for -gdwarf-5 -gpubnames).

The internal linker script of GNU ld 2.36 will have it.
https://sourceware.org/pipermail/binutils/2021-January/115064.html

(Compilers don't generate .debug_sup; I added it to GNU ld just for
future-proofing.)


/* Stabs debugging sections. */
#define STABS_DEBUG \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1850728b23e6..09146b1af20d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -273,6 +273,24 @@ config DEBUG_INFO_DWARF4
  It makes the debug information larger, but it significantly
  improves the success of resolving variables in gdb on optimized code.

+config DEBUG_INFO_DWARF5
+   b

Re: [PATCH] vmlinux.lds.h: Define SANTIZER_DISCARDS with CONFIG_GCOV_KERNEL=y

2021-01-29 Thread Fangrui Song

On 2021-01-29, Nick Desaulniers wrote:

On Fri, Jan 29, 2021 at 12:11 PM Nathan Chancellor  wrote:


clang produces .eh_frame sections when CONFIG_GCOV_KERNEL is enabled,
even when -fno-asynchronous-unwind-tables is in KBUILD_CFLAGS:

$ make CC=clang vmlinux
...
ld: warning: orphan section `.eh_frame' from `init/main.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/version.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/do_mounts.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/do_mounts_initrd.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/initramfs.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/calibrate.o' being placed in section `.eh_frame'
ld: warning: orphan section `.eh_frame' from `init/init_task.o' being placed in section `.eh_frame'
...

$ rg "GCOV_KERNEL|GCOV_PROFILE_ALL" .config
CONFIG_GCOV_KERNEL=y
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_GCOV_PROFILE_ALL=y

This was already handled for a couple of other options in
commit d812db78288d ("vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted
sections") and there is an open LLVM bug for this issue. Take advantage
of that section for this config as well so that there are no more orphan
warnings.

Link: https://bugs.llvm.org/show_bug.cgi?id=46478
Link: https://github.com/ClangBuiltLinux/linux/issues/1069
Reported-by: kernel test robot 
Signed-off-by: Nathan Chancellor 


Reviewed-by: Nick Desaulniers 

I suspect we're going to need to add module level attributes in LLVM
IR for these options, then check those when synthesizing new function
definitions within LLVM.  At least we'll be able to point to this file
and say "hey, this is a general problem in LLVM, and here are 3
specific cases now where it's a problem."  Not a large problem, but
would help us save some bytes in the final object.  LLVM is not
producing data in this section for all code, just these synthesized
routines.


Maybe. There is also a long list of security features which may impose
additional requirements. Adding module flag metadata for each such
feature will be a long battle. For .eh_frame, I think it is
important/generic enough, and benefits enough other applications, that
it deserves special handling (and I can look into it). For .init_array,
I am not too sure.


---
 include/asm-generic/vmlinux.lds.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..f753fd449436 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -988,12 +988,13 @@
 #endif

 /*
- * Clang's -fsanitize=kernel-address and -fsanitize=thread produce
+ * Clang's -fsanitize=kernel-address, -fsanitize=thread,
+ * and -fprofile-arcs -ftest-coverage produce
  * unwanted sections (.eh_frame and .init_array.*), but
  * CONFIG_CONSTRUCTORS wants to keep any .init_array.* sections.
  * https://bugs.llvm.org/show_bug.cgi?id=46478
  */
-#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KCSAN)
+#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KCSAN) || defined(CONFIG_GCOV_KERNEL)
 # ifdef CONFIG_CONSTRUCTORS
 #  define SANITIZER_DISCARDS   \
*(.eh_frame)

base-commit: bec4c2968fce2f44ce62d05288a633cd99a722eb
--
2.30.0



Please drop -ftest-coverage from the comment: -ftest-coverage just
produces .gcno files and does not affect code generation.

Reviewed-by: Fangrui Song 


[tip: x86/build] x86/build: Treat R_386_PLT32 relocation as R_386_PC32

2021-01-28 Thread tip-bot2 for Fangrui Song
The following commit has been merged into the x86/build branch of tip:

Commit-ID:     bb73d07148c405c293e576b40af37737faf23a6a
Gitweb:        https://git.kernel.org/tip/bb73d07148c405c293e576b40af37737faf23a6a
Author:        Fangrui Song 
AuthorDate:    Wed, 27 Jan 2021 12:56:00 -08:00
Committer:     Borislav Petkov 
CommitterDate: Thu, 28 Jan 2021 12:24:06 +01:00

x86/build: Treat R_386_PLT32 relocation as R_386_PC32

This is similar to commit

  b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as R_X86_64_PC32")

but for i386. As far as the kernel is concerned, R_386_PLT32 can be
treated the same as R_386_PC32.

R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types which
can only be used by branches. If the referenced symbol is defined
externally, a PLT will be used.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types which can be
used by address taking operations and branches. If the referenced symbol
is defined externally, a copy relocation/canonical PLT entry will be
created in the executable.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.
This avoids canonical PLT entries (st_shndx=0, st_value!=0).

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently,
the GCC/GNU as convention is to use R_386_PC32 for non-PIC PLT and
R_386_PLT32 for PIC PLT. Copy relocations/canonical PLT entries
are possible ABI issues but GCC/GNU as will likely keep the status
quo because (1) the ABI is legacy (2) the change will drop a GNU
ld diagnostic for non-default visibility ifunc in shared objects.

clang-12 -fno-pic (since [1]) can emit R_386_PLT32 for compiler
generated function declarations, because preventing canonical PLT
entries is weighed over the rare ifunc diagnostic.
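
As an illustration of why the two relocation types can share one code path
once the symbol is resolved: both boil down to "symbol value plus addend
minus the address being patched". The sketch below models that in plain C,
away from the kernel. The function name apply_pcrel32 and its parameter
layout are hypothetical; the type numbers 2 and 4 are the i386 psABI values
for R_386_PC32 and R_386_PLT32.

```c
#include <assert.h>
#include <stdint.h>

/* i386 psABI relocation type numbers */
#define R_386_PC32	2
#define R_386_PLT32	4

/*
 * Hypothetical, simplified model of a REL-style PC-relative relocation.
 *   addend:    value already stored at the patched location
 *   sym_value: resolved address of the referenced symbol (S)
 *   place:     address of the location being patched (P)
 * Returns 0 and stores the result in *out, or -1 for other types.
 */
static int apply_pcrel32(unsigned int type, uint32_t addend,
			 uint32_t sym_value, uint32_t place, uint32_t *out)
{
	switch (type) {
	case R_386_PC32:
	case R_386_PLT32:
		/* Symbol is defined, so no PLT is needed: S + A - P */
		*out = addend + sym_value - place;
		return 0;
	default:
		return -1;
	}
}
```

For a `call foo` whose 4-byte operand sits at 0x1000 with the usual -4
addend and foo at 0x2000, both types yield 0xffc.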

Further info for the more interested:

  https://github.com/ClangBuiltLinux/linux/issues/1210
  https://sourceware.org/bugzilla/show_bug.cgi?id=27169
  
  https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6 [1]

 [ bp: Massage commit message. ]

Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Signed-off-by: Borislav Petkov 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 
Tested-by: Sedat Dilek 
Link: https://lkml.kernel.org/r/20210127205600.1227437-1-mask...@google.com
---
 arch/x86/kernel/module.c |  1 +
 arch/x86/tools/relocs.c  | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153c..5e9a34b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188c..1c3a196 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,9 +867,11 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
-* NONE can be ignored and PC relative relocations don't
-* need to be adjusted.
+* NONE can be ignored and PC relative relocations don't need
+* to be adjusted. Because sym must be defined, R_386_PLT32 can
+* be treated the same way as R_386_PC32.
 */
break;
 
@@ -910,9 +912,11 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
-* NONE can be ignored and PC relative relocations don't
-* need to be adjusted.
+* NONE can be ignored and PC relative relocations don't need
+* to be adjusted. Because sym must be defined, R_386_PLT32 can
+* be treated the same way as R_386_PC32.
 */
break;
 


[PATCH v4] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-27 Thread Fangrui Song
This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as the Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types which can
only be used by branches. If the referenced symbol is defined
externally, a PLT will be used.
R_386_PC32/R_X86_64_PC32 are PC-relative relocation types which can be
used by address taking operations and branches. If the referenced symbol
is defined externally, a copy relocation/canonical PLT entry will be
created in the executable.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.
This avoids canonical PLT entries (st_shndx=0, st_value!=0).
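
For anyone unfamiliar with the "(st_shndx=0, st_value!=0)" shorthand: a
canonical PLT entry shows up in the symbol table as an undefined symbol
that nevertheless carries a non-zero value (the address of its PLT entry).
A minimal sketch of that test follows. The struct sym_view type and the
function name are invented here so the snippet does not depend on <elf.h>;
the two fields mirror the same-named Elf32_Sym members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHN_UNDEF 0	/* section index of an undefined symbol */

/* Hypothetical stand-in for the two Elf32_Sym fields used below. */
struct sym_view {
	uint32_t st_value;
	uint16_t st_shndx;
};

/* True if the symbol looks like a canonical PLT entry. */
static bool is_canonical_plt_entry(const struct sym_view *sym)
{
	return sym->st_shndx == SHN_UNDEF && sym->st_value != 0;
}
```

An ordinary undefined symbol has st_value == 0, and a defined symbol has a
non-zero st_shndx, so neither matches.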

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
GCC/GNU as convention is to use R_386_PC32 for non-PIC PLT and
R_386_PLT32 for PIC PLT. Copy relocations/canonical PLT entries are
possible ABI issues but GCC/GNU as will likely keep the status quo
because (1) the ABI is legacy (2) the change will drop a GNU ld
diagnostic for non-default visibility ifunc in shared objects.
https://sourceware.org/bugzilla/show_bug.cgi?id=27169

clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit R_386_PLT32 for compiler generated function declarations,
because preventing canonical PLT entries is weighed over the rare ifunc
diagnostic.

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 

---
Change in v2:
* Improve commit message
---
Change in v3:
* Change the GCC link to the more relevant GNU as link.
* Fix the relevant llvm-project commit.
---
Change in v4:
* Improve comments and commit message
---
 arch/x86/kernel/module.c |  1 +
 arch/x86/tools/relocs.c  | 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..5e9a34b5bd74 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..1c3a1962cade 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,9 +867,11 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
-* NONE can be ignored and PC relative relocations don't
-* need to be adjusted.
+* NONE can be ignored and PC relative relocations don't need
+* to be adjusted. Because sym must be defined, R_386_PLT32 can
+* be treated the same way as R_386_PC32.
 */
break;
 
@@ -910,9 +912,11 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
-* NONE can be ignored and PC relative relocations don't
-* need to be adjusted.
+* NONE can be ignored and PC relative relocations don't need
+* to be adjusted. Because sym must be defined, R_386_PLT32 can
+* be treated the same way as R_386_PC32.
 */
break;
 
-- 
2.30.0.280.ga3ce27912f-goog



Re: [PATCH v3] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-25 Thread Fangrui Song



On 2021-01-25, Borislav Petkov wrote:

It's a good thing I have a toolchain guy who can explain to me what you
guys are doing because you need to start writing those commit messages
for !toolchain developers.


How about the following message? I'll answer your questions inline as
well. Explaining everything in the message would make it quite long. If
anyone is interested, I have put everything possibly related in
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected


This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as the Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types which can
only be used by branches. If the referenced symbol is defined
externally, a PLT will be used.
R_386_PC32/R_X86_64_PC32 are PC-relative relocation types which can be
used by address taking operations and branches. If the referenced symbol
is defined externally, a copy relocation/canonical PLT entry will be
created in the executable.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.
This avoids copy relocations/canonical PLT entries.

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
GCC/GNU as convention is to use R_386_PC32 for non-PIC PLT and
R_386_PLT32 for PIC PLT. Copy relocations/canonical PLT entries are
possible ABI issues but GCC/GNU as will likely keep the status quo
because (1) the ABI is legacy (2) the change will drop a GNU ld
diagnostic for non-default visibility ifunc in shared objects.
https://sourceware.org/bugzilla/show_bug.cgi?id=27169

clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit R_386_PLT32 for compiler generated function declarations,
because preventing canonical PLT entries is weighed over the rare ifunc
diagnostic.

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 



On Thu, Jan 14, 2021 at 02:48:19PM -0800, Fangrui Song wrote:

This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as the Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types with the
requirement that the symbol address is significant.
R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types without the
address significance requirement.


I was told what "significant" means in that context and while it is
clear to you, I'm pretty sure it is not clear to kernel developers who
haven't looked at toolchains in depth. So please elaborate.


Expanded "significant" to more words. See above.


On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.


Also, please explain in short why LLVM is generating R_X86_64_PLT32
relocs now? I.e., is it the same reason as why binutils does that?

I.e., mentioning the big picture of things would help as to why you're
doing this.


It has been explained. The LLVM change was in 2018, at roughly the same
time GNU as started emitting R_X86_64_PLT32. I think it does not need an
extended explanation because of the separate paragraph on canonical PLT
entries.


On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
convention is to use R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC
PLT.


Convention in general or convention for LLVM?


Changed to "GCC/GNU as convention".


clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit R_386_PLT32 for compiler generated function declarations as
well to avoid a canonical PLT entry (st_shndx=0, st_value!=0) if the
symbol turns out to be defined externally. GCC/GNU as will likely keep
using R_386_PC32 because (1) the ABI is legacy (2) the change will drop
a GNU ld non-default visibility ifunc for shared objects.
https://sourceware.org/bugzilla/show_bug.cgi?id=27169


Not sure how useful this paragraph is for kernel developers...


Reorganized it a bit.


Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 

---
Change in v2:
* Improve commit message
---
Change in v3:
* Change the GCC link to the more relevant GNU as link.
* Fix the relevant llvm-project commit id.
---
 arch/x86/kernel/module.c | 1 +
 a

Re: [PATCH v4 00/10] Function Granular KASLR

2021-01-23 Thread Fangrui Song

On 2020-08-28, Josh Poimboeuf wrote:

On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:

> Hi there! I was trying to find a super easy way to address this, so I
> thought the best thing would be if there were a compiler or linker
> switch to just eliminate any duplicate symbols at compile time for
> vmlinux. I filed this question on the binutils bugzilla looking to see
> if there were existing flags that might do this, but H.J. Lu went ahead
> and created a new one "-z unique", that seems to do what we would need
> it to do.
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26391
>
> When I use this option, it renames any duplicate symbols with an
> extension - for example duplicatefunc.1 or duplicatefunc.2. You could
> either match on the full unique name of the specific binary you are
> trying to patch, or you match the base name and use the extension to
> determine original position. Do you think this solution would work?

Yes, I think so (thanks, Joe, for testing!).

It looks cleaner to me than the options above, but it may just be a matter
of taste. Anyway, I'd go with full name matching, because -z unique-symbol
would allow us to remove sympos altogether, which is appealing.

> If
> so, I can modify livepatch to refuse to patch on duplicated symbols if
> CONFIG_FG_KASLR and when this option is merged into the tool chain I
> can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
> should work in all cases.

Ok.

Josh, Petr, would this work for you too?


Sounds good to me.  Kristen, thanks for finding a solution!


(I am not subscribed. I came here via 
https://sourceware.org/bugzilla/show_bug.cgi?id=26391 (ld -z unique-symbol))


This works great after randomization because it always receives the
current address at runtime rather than relying on any kind of
buildtime address. The issue is with the live-patching code's
algorithm for resolving duplicate symbol names. If they request a
symbol by name from the kernel and there are 3 symbols with the same
name, they use the symbol's position in the built binary image to
select the correct symbol.


Suppose a.o, b.o and c.o each define a local symbol 'foo'.
By position, do you mean that

* the live-patching code uses something like (findall("foo")[0], findall("foo")[1], findall("foo")[2])?
* shuffling a.o/b.o/c.o will make the returned triple different?

Local symbols are not required to be unique. Instead of patching the toolchain,
have you thought about making the live-patching code smarter?
(Depending on the number of duplicates, such a linker option can increase the
link time/binary size considerably, and I don't know in what other cases such
an option would be useful.)

For the following example, 


https://sourceware.org/bugzilla/show_bug.cgi?id=26822

  # RUN: split-file %s %t
  # RUN: gcc -c %t/a.s -o %t/a.o
  # RUN: gcc -c %t/b.s -o %t/b.o
  # RUN: gcc -c %t/c.s -o %t/c.o
  # RUN: ld-new %t/a.o %t/b.o %t/c.o -z unique-symbol -o %t.exe
  
  #--- a.s

  a: a.1: a.2: nop
  #--- b.s
  a: nop
  #--- c.s
  a: nop

readelf -Ws output:

Symbol table '.symtab' contains 13 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS a.o
     2: 00401000     0 NOTYPE  LOCAL  DEFAULT    1 a
     3: 00401000     0 NOTYPE  LOCAL  DEFAULT    1 a.1
     4: 00401000     0 NOTYPE  LOCAL  DEFAULT    1 a.2
     5: 00000000     0 FILE    LOCAL  DEFAULT  ABS b.o
     6: 00401001     0 NOTYPE  LOCAL  DEFAULT    1 a.1
     7: 00000000     0 FILE    LOCAL  DEFAULT  ABS c.o
     8: 00401002     0 NOTYPE  LOCAL  DEFAULT    1 a.2
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
    10: 00402000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
    11: 00402000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
    12: 00402000     0 NOTYPE  GLOBAL DEFAULT    1 _end

Note that you have STT_FILE SHN_ABS symbols.
If the compiler does not produce them, they will be synthesized by GNU ld.

  https://sourceware.org/bugzilla/show_bug.cgi?id=26822
  ld.bfd copies non-STT_SECTION local symbols from input object files.  If an
  object file does not have STT_FILE symbols (no .file directive) but has
  non-STT_SECTION local symbols, ld.bfd synthesizes a STT_FILE symbol

The filenames are usually base names, so "a.o" and "a.o" in two directories will
be indistinguishable.  The live-patching code can possibly work around this by
not changing the relative order of the two "a.o".
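
To make the "smarter live-patching code" idea concrete, here is a minimal sketch of resolving a duplicated local symbol by the STT_FILE symbol that precedes it, instead of by a global position. It is not the livepatch implementation: find_local_symbol and its parameters are hypothetical, it assumes the ELF64 definitions from glibc's <elf.h>, and it assumes each object's locals directly follow its STT_FILE symbol as in the readelf output above. The occurrence argument covers the case where two directories both contribute an "a.o".

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not livepatch code: find the local symbol `name`
 * inside the `occurrence`-th (0-based) STT_FILE group named `file`.
 * Returns the symbol index, or -1 if nothing matches.
 */
long find_local_symbol(const Elf64_Sym *syms, size_t nsyms,
                       const char *strtab, const char *file,
                       size_t occurrence, const char *name)
{
    int in_group = 0;   /* are we inside the requested file's locals? */
    size_t seen = 0;    /* STT_FILE symbols named `file` seen so far */

    for (size_t i = 0; i < nsyms; i++) {
        const char *sym_name = strtab + syms[i].st_name;

        /* Locals come first in .symtab; the first global ends the scan. */
        if (ELF64_ST_BIND(syms[i].st_info) != STB_LOCAL)
            break;

        if (ELF64_ST_TYPE(syms[i].st_info) == STT_FILE) {
            /* Every STT_FILE symbol starts a new group of locals. */
            in_group = strcmp(sym_name, file) == 0 && seen++ == occurrence;
            continue;
        }
        if (in_group && strcmp(sym_name, name) == 0)
            return (long)i;
    }
    return -1;
}
```

With the three-object example above (and without -z unique-symbol), asking for ("b.o", 0, "a") would select b.o's copy of 'a' regardless of how many other objects define the same name.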


Re: [PATCH bpf-next v2] samples/bpf: Update README.rst and Makefile for manually compiling LLVM and clang

2021-01-19 Thread Fangrui Song

On 2021-01-19, Tiezhu Yang wrote:

The current llvm/clang build procedure in samples/bpf/README.rst is
out of date. See below that the links are not accessible any more.

$ git clone http://llvm.org/git/llvm.git
Cloning into 'llvm'...
fatal: unable to access 'http://llvm.org/git/llvm.git/': Maximum (20) redirects followed
$ git clone --depth 1 http://llvm.org/git/clang.git
Cloning into 'clang'...
fatal: unable to access 'http://llvm.org/git/clang.git/': Maximum (20) redirects followed

The llvm community has adopted new ways to build the compiler. There are
different ways to build llvm/clang, the Clang Getting Started page [1] has
one way. As Yonghong said, it is better to just copy the build procedure
in Documentation/bpf/bpf_devel_QA.rst to keep consistent.

I verified the procedure and it is proved to be feasible, so we should
update README.rst to reflect the reality. At the same time, update the
related comment in Makefile.

[1] https://clang.llvm.org/get_started.html

Signed-off-by: Tiezhu Yang 
Acked-by: Yonghong Song 
---

v2: Update the commit message suggested by Yonghong,
   thank you very much.

samples/bpf/Makefile   |  2 +-
samples/bpf/README.rst | 17 ++---
2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 26fc96c..d061446 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -208,7 +208,7 @@ TPROGLDLIBS_xdpsock += -pthread -lcap
TPROGLDLIBS_xsk_fwd += -pthread

# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
-#  make M=samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+# make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang
LLC ?= llc
CLANG ?= clang
OPT ?= opt
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index dd34b2d..d1be438 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -65,17 +65,20 @@ To generate a smaller llc binary one can use::
Quick sniplet for manually compiling LLVM and clang
(build dependencies are cmake and gcc-c++)::

- $ git clone http://llvm.org/git/llvm.git
- $ cd llvm/tools
- $ git clone --depth 1 http://llvm.org/git/clang.git
- $ cd ..; mkdir build; cd build
- $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
- $ make -j $(getconf _NPROCESSORS_ONLN)
+ $ git clone https://github.com/llvm/llvm-project.git
+ $ mkdir -p llvm-project/llvm/build/install


llvm-project/llvm/build/install is not used.


+ $ cd llvm-project/llvm/build
+ $ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
+-DLLVM_ENABLE_PROJECTS="clang"\
+-DBUILD_SHARED_LIBS=OFF   \


-DBUILD_SHARED_LIBS=OFF is the default. It can be omitted.


+-DCMAKE_BUILD_TYPE=Release\
+-DLLVM_BUILD_RUNTIME=OFF


-DLLVM_BUILD_RUNTIME=OFF can be omitted if none of
compiler-rt/libc++/libc++abi is built.


+ $ ninja

It is also possible to point make to the newly compiled 'llc' or
'clang' command via redefining LLC or CLANG on the make command line::

- make M=samples/bpf LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+ make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang

Cross compiling samples
---
--
2.1.0



Re: [PATCH v5 2/3] Kbuild: make DWARF version a choice

2021-01-15 Thread Fangrui Song

On 2021-01-15, Sedat Dilek wrote:

On Fri, Jan 15, 2021 at 10:06 PM Nick Desaulniers
 wrote:


Modifies CONFIG_DEBUG_INFO_DWARF4 to be a member of a choice. Adds an
explicit CONFIG_DEBUG_INFO_DWARF2, which is the default. Does so in a
way that's forward compatible with existing configs, and makes adding
future versions more straightforward.

Suggested-by: Arvind Sankar 
Suggested-by: Fangrui Song 
Suggested-by: Masahiro Yamada 
Signed-off-by: Nick Desaulniers 
---
 Makefile  | 13 ++---
 lib/Kconfig.debug | 21 -
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/Makefile b/Makefile
index d49c3f39ceb4..4eb3bf7ee974 100644
--- a/Makefile
+++ b/Makefile
@@ -826,13 +826,12 @@ else
 DEBUG_CFLAGS   += -g
 endif

-ifneq ($(LLVM_IAS),1)
-KBUILD_AFLAGS  += -Wa,-gdwarf-2
-endif
-
-ifdef CONFIG_DEBUG_INFO_DWARF4
-DEBUG_CFLAGS   += -gdwarf-4
-endif
+dwarf-version-$(CONFIG_DEBUG_INFO_DWARF2) := 2
+dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4
+DEBUG_CFLAGS   += -gdwarf-$(dwarf-version-y)
+# Binutils 2.35+ required for -gdwarf-4+ support.
+dwarf-aflag:= $(call as-option,-Wa$(comma)-gdwarf-$(dwarf-version-y))
+KBUILD_AFLAGS  += $(dwarf-aflag)

 ifdef CONFIG_DEBUG_INFO_REDUCED
 DEBUG_CFLAGS   += $(call cc-option, -femit-struct-debug-baseonly) \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index dd7d8d35b2a5..e80770fac4f0 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -256,13 +256,24 @@ config DEBUG_INFO_SPLIT
  to know about the .dwo files and include them.
  Incompatible with older versions of ccache.

+choice
+   prompt "DWARF version"


Here you use "DWARF version" so keep this for v2 and v4.


+   help
+ Which version of DWARF debug info to emit.
+
+config DEBUG_INFO_DWARF2
+   bool "Generate DWARF Version 2 debuginfo"


s/DWARF Version/DWARF version


+   help
+ Generate DWARF v2 debug info.
+
 config DEBUG_INFO_DWARF4
-   bool "Generate dwarf4 debuginfo"
+   bool "Generate DWARF Version 4 debuginfo"


Same here: s/DWARF Version/DWARF version


DWARF Version 2 is fine and preferable.

I have checked DWARF Version 2/3/4/5 specifications.
"DWARF Version 2" is the official way that version is referred to...



- Sedat -


help
- Generate dwarf4 debug info. This requires recent versions
- of gcc and gdb. It makes the debug information larger.
- But it significantly improves the success of resolving
- variables in gdb on optimized code.
+ Generate DWARF v4 debug info. This requires gcc 4.5+ and gdb 7.0+.
+ It makes the debug information larger, but it significantly
+ improves the success of resolving variables in gdb on optimized code.
+
+endchoice # "DWARF version"

 config DEBUG_INFO_BTF
bool "Generate BTF typeinfo"
--
2.30.0.284.gd98b1dd5eaa7-goog



Re: [PATCH v5 1/3] Remove $(cc-option,-gdwarf-4) dependency from CONFIG_DEBUG_INFO_DWARF4

2021-01-15 Thread Fangrui Song

On 2021-01-15, Nick Desaulniers wrote:

From: Masahiro Yamada 

The -gdwarf-4 flag is supported by GCC 4.5+, and also by Clang.

You can see it at https://godbolt.org/z/6ed1oW

 For gcc 4.5.3 pane,    line 37:    .value 0x4
 For clang 10.0.1 pane, line 117:   .short 4

Given Documentation/process/changes.rst stating GCC 4.9 is the minimal
version, this cc-option is unneeded.

Note


CONFIG_DEBUG_INFO_DWARF4 controls the DWARF version only for C files.

As you can see in the top Makefile, -gdwarf-4 is only passed to CFLAGS.

 ifdef CONFIG_DEBUG_INFO_DWARF4
 DEBUG_CFLAGS    += -gdwarf-4
 endif

This flag is used when compiling *.c files.

On the other hand, the assembler is always given -gdwarf-2.

 KBUILD_AFLAGS   += -Wa,-gdwarf-2

Hence, the debug info that comes from *.S files is always DWARF v2.
This is simply because GAS supported only -gdwarf-2 for a long time.

Recently, GAS gained the support for --dwarf-[3|4|5] options. [1]


The gas commit description has a typo. The supported options are -gdwarf-[345] 
or --gdwarf-[345].
-gdwarf2 and --gdwarf2 are kept for compatibility.

Looks good otherwise.


And, also we have Clang integrated assembler. So, the debug info
for *.S files might be improved if we want.

In my understanding, the current code is intentional, not a bug.

[1] 
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=31bf18645d98b4d3d7357353be840e320649a67d

Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Signed-off-by: Masahiro Yamada 
---
lib/Kconfig.debug | 1 -
1 file changed, 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 78361f0abe3a..dd7d8d35b2a5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -258,7 +258,6 @@ config DEBUG_INFO_SPLIT

config DEBUG_INFO_DWARF4
bool "Generate dwarf4 debuginfo"
-   depends on $(cc-option,-gdwarf-4)
help
  Generate dwarf4 debug info. This requires recent versions
  of gcc and gdb. It makes the debug information larger.
--
2.30.0.284.gd98b1dd5eaa7-goog



Re: [PATCH] mips: vdso: fix DWARF2 warning

2021-01-15 Thread Fangrui Song



On 2021-01-15, Anders Roxell wrote:

On Fri, 15 Jan 2021 at 20:28, Nathan Chancellor
 wrote:


On Fri, Jan 15, 2021 at 08:13:30PM +0100, Anders Roxell wrote:
> When building mips tinyconfig the following warnings show up
>
> make --silent --keep-going --jobs=8 O=/home/anders/src/kernel/next/out/builddir ARCH=mips CROSS_COMPILE=mips-linux-gnu- HOSTCC=clang CC=clang
> /srv/src/kernel/next/arch/mips/vdso/elf.S:14:1: warning: DWARF2 only supports one section per compilation unit
> .pushsection .note.Linux, "a",@note ; .balign 4 ; .long 2f - 1f ; .long 4484f - 3f ; .long 0 ; 1:.asciz "Linux" ; 2:.balign 4 ; 3:
> ^
> /srv/src/kernel/next/arch/mips/vdso/elf.S:34:2: warning: DWARF2 only supports one section per compilation unit
>  .section .mips_abiflags, "a"
>  ^
>
> Rework so the mips vdso Makefile adds flag '-no-integrated-as' unless
> LLVM_IAS is defined.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1256
> Cc: sta...@vger.kernel.org # v4.19+
> Suggested-by: Nick Desaulniers 
> Signed-off-by: Anders Roxell 

I believe this is the better solution:

https://lore.kernel.org/r/20210115192622.3828545-1-natechancel...@gmail.com/


Yes, I agree.

Cheers,
Anders


http://lore.kernel.org/r/20201202010850.jibrjpyu6xgkf...@google.com
Personally I'd drop DWARF v2 as an option.


[PATCH v3] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols

2021-01-15 Thread Fangrui Song
clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail`
on x86.  The two forms should have identical behaviors on x86-64 but the
former causes GNU as<2.37 to produce an unreferenced undefined symbol
_GLOBAL_OFFSET_TABLE_.

(On x86-32, there is an R_386_PC32 vs R_386_PLT32 difference but the
linker behavior is identical as far as Linux kernel is concerned.)

Simply ignore _GLOBAL_OFFSET_TABLE_ for now, like what
scripts/mod/modpost.c:ignore_undef_symbol does. This also fixes the
problem for gcc/clang -fpie and -fpic, which may emit `call foo@PLT` for
external function calls on x86.

Note: ld -z defs and dynamic loaders do not error for unreferenced
undefined symbols so the module loader is reading too much.  If we ever
need to ignore more symbols, the code should be refactored to ignore
unreferenced symbols.

Reported-by: Marco Elver 
Link: https://github.com/ClangBuiltLinux/linux/issues/1250
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Tested-by: Marco Elver 
Cc: 

---
Changes in v2:
* Fix Marco's email address
* Add a function ignore_undef_symbol similar to 
scripts/mod/modpost.c:ignore_undef_symbol
---
Changes in v3:
* Fix the style of a multi-line comment.
* Use static bool ignore_undef_symbol.
---
 kernel/module.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4bf30e4b3eaa..805c49d1b86d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2348,6 +2348,21 @@ static int verify_exported_symbols(struct module *mod)
return 0;
 }
 
+static bool ignore_undef_symbol(Elf_Half emachine, const char *name)
+{
+   /*
+* On x86, PIC code and Clang non-PIC code may have call foo@PLT. GNU as
+* before 2.37 produces an unreferenced _GLOBAL_OFFSET_TABLE_ on x86-64.
+* i386 has a similar problem but may not deserve a fix.
+*
+* If we ever have to ignore many symbols, consider refactoring the code to
+* only warn if referenced by a relocation.
+*/
+   if (emachine == EM_386 || emachine == EM_X86_64)
+   return !strcmp(name, "_GLOBAL_OFFSET_TABLE_");
+   return false;
+}
+
 /* Change all symbols so that st_value encodes the pointer directly. */
 static int simplify_symbols(struct module *mod, const struct load_info *info)
 {
@@ -2395,8 +2410,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
break;
}
 
-   /* Ok if weak.  */
-   if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
+   /* Ok if weak or ignored.  */
+   if (!ksym &&
+   (ELF_ST_BIND(sym[i].st_info) == STB_WEAK ||
+ignore_undef_symbol(info->hdr->e_machine, name)))
break;
 
ret = PTR_ERR(ksym) ?: -ENOENT;
-- 
2.30.0.296.g2bfb1c46d8-goog
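
The commit message and the code comment both mention a possible follow-up: only warn about an undefined symbol if some relocation actually references it. A minimal sketch of that check follows. It is an illustration, not part of this patch: symbol_is_referenced is a hypothetical name, it uses the ELF64 RELA definitions from glibc's <elf.h> (as on x86-64; i386 modules use REL entries instead), and a real refactoring would have to walk every relocation section of the module.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether symbol table index `symidx` is the
 * target of at least one entry in an array of RELA relocations.
 */
bool symbol_is_referenced(const Elf64_Rela *relas, size_t nrelas,
                          unsigned int symidx)
{
    for (size_t i = 0; i < nrelas; i++) {
        /* The upper 32 bits of r_info hold the symbol index. */
        if (ELF64_R_SYM(relas[i].r_info) == symidx)
            return true;
    }
    return false;
}
```

Under this scheme an unreferenced _GLOBAL_OFFSET_TABLE_ would be skipped without a name-based special case, which matches what ld -z defs and dynamic loaders already do.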



[PATCH v3] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-14 Thread Fangrui Song
This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types with the
requirement that the symbol address is significant.
R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types without the
address significance requirement.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
convention is to use R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC
PLT.

clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit R_386_PLT32 for compiler generated function declarations as
well to avoid a canonical PLT entry (st_shndx=0, st_value!=0) if the
symbol turns out to be defined externally. GCC/GNU as will likely keep
using R_386_PC32 because (1) the ABI is legacy (2) the change will drop
a GNU ld non-default visibility ifunc for shared objects.
https://sourceware.org/bugzilla/show_bug.cgi?id=27169

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 

---
Change in v2:
* Improve commit message
---
Change in v3:
* Change the GCC link to the more relevant GNU as link.
* Fix the relevant llvm-project commit id.
---
 arch/x86/kernel/module.c | 1 +
 arch/x86/tools/relocs.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..5e9a34b5bd74 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..717e48ca28b6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,6 +867,7 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
@@ -910,6 +911,7 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
-- 
2.30.0.296.g2bfb1c46d8-goog
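
For readers less familiar with relocation processing, the arithmetic shared by the R_386_PC32 and R_386_PLT32 cases in apply_relocate() can be sketched in user space as follows. This is an illustration under stated assumptions, not kernel code: apply_pc32 and its parameters are hypothetical names, and the run-time address of the field is passed in explicitly rather than taken from the pointer as the kernel does.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: apply a 32-bit PC-relative REL-style relocation.
 * `place` points at the 4-byte field, `place_addr` is the address P the
 * field will have at run time, and `sym_addr` is the symbol address S.
 * The addend A is read from the field itself, as with Elf32_Rel.
 */
void apply_pc32(uint8_t *place, uint32_t place_addr, uint32_t sym_addr)
{
    uint32_t field;

    memcpy(&field, place, sizeof(field));   /* implicit addend A */
    field += sym_addr - place_addr;         /* S + A - P, modulo 2^32 */
    memcpy(place, &field, sizeof(field));
}
```

For a `call foo` whose 4-byte operand sits at 0x1001 with the usual addend of -4 and foo at 0x2000, the field becomes 0xffb, so the address of the next instruction (0x1005) plus the displacement lands on 0x2000. Since the kernel has no PLT, the same computation is applied for either relocation type.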



[PATCH v2] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols

2021-01-14 Thread Fangrui Song
clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail`
on x86.  The two forms should have identical behaviors on x86-64 but the
former causes GNU as<2.37 to produce an unreferenced undefined symbol
_GLOBAL_OFFSET_TABLE_.

(On x86-32, there is an R_386_PC32 vs R_386_PLT32 difference but the
linker behavior is identical as far as Linux kernel is concerned.)

Simply ignore _GLOBAL_OFFSET_TABLE_ for now, like what
scripts/mod/modpost.c:ignore_undef_symbol does. This also fixes the
problem for gcc/clang -fpie and -fpic, which may emit `call foo@PLT` for
external function calls on x86.

Note: ld -z defs and dynamic loaders do not error for unreferenced
undefined symbols so the module loader is reading too much.  If we ever
need to ignore more symbols, the code should be refactored to ignore
unreferenced symbols.

Reported-by: Marco Elver 
Link: https://github.com/ClangBuiltLinux/linux/issues/1250
Signed-off-by: Fangrui Song 
---
 kernel/module.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)
---
Changes in v2:
* Fix Marco's email address
* Add a function ignore_undef_symbol similar to 
scripts/mod/modpost.c:ignore_undef_symbol

diff --git a/kernel/module.c b/kernel/module.c
index 4bf30e4b3eaa..278f5129bde2 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2348,6 +2348,20 @@ static int verify_exported_symbols(struct module *mod)
return 0;
 }
 
+static int ignore_undef_symbol(Elf_Half emachine, const char *name)
+{
+   /* On x86, PIC code and Clang non-PIC code may have call foo@PLT. GNU as
+* before 2.37 produces an unreferenced _GLOBAL_OFFSET_TABLE_ on x86-64.
+* i386 has a similar problem but may not deserve a fix.
+*
+* If we ever have to ignore many symbols, consider refactoring the code to
+* only warn if referenced by a relocation.
+*/
+   if (emachine == EM_386 || emachine == EM_X86_64)
+   return !strcmp(name, "_GLOBAL_OFFSET_TABLE_");
+   return 0;
+}
+
 /* Change all symbols so that st_value encodes the pointer directly. */
 static int simplify_symbols(struct module *mod, const struct load_info *info)
 {
@@ -2395,8 +2409,10 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
break;
}
 
-   /* Ok if weak.  */
-   if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
+   /* Ok if weak or ignored.  */
+   if (!ksym &&
+   (ELF_ST_BIND(sym[i].st_info) == STB_WEAK ||
+ignore_undef_symbol(info->hdr->e_machine, name)))
break;
 
ret = PTR_ERR(ksym) ?: -ENOENT;
-- 
2.30.0.296.g2bfb1c46d8-goog



[PATCH] module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols

2021-01-13 Thread Fangrui Song
clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de008232da3f1d6)
can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail`
on x86.  The two forms should have identical behaviors on x86-64 but the
former causes GNU as<2.37 to produce an unreferenced undefined symbol
_GLOBAL_OFFSET_TABLE_.

(On x86-32, there is an R_386_PC32 vs R_386_PLT32 difference but the
linker behavior is identical as far as Linux kernel is concerned.)

Simply ignore _GLOBAL_OFFSET_TABLE_ for now, like what
scripts/mod/modpost.c:ignore_undef_symbol does. This also fixes the
problem for gcc/clang -fpie and -fpic, which may emit `call foo@PLT` for
external function calls on x86.

Note: ld -z defs and dynamic loaders do not error for unreferenced
undefined symbols so the module loader is reading too much.  If we ever
need to ignore more symbols, the code should be refactored to ignore
unreferenced symbols.

Reported-by: Marco Elver 
Link: https://github.com/ClangBuiltLinux/linux/issues/1250
Signed-off-by: Fangrui Song 
---
 kernel/module.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4bf30e4b3eaa..2e2deea99289 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2395,8 +2395,14 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
break;
}
 
-   /* Ok if weak.  */
-   if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
+   /* Ok if weak. Also allow _GLOBAL_OFFSET_TABLE_:
+* GNU as before 2.37 produces an unreferenced _GLOBAL_OFFSET_TABLE_
+* for call foo@PLT on x86-64.  If the code ever needs to ignore
+* more symbols, refactor the code to only warn if referenced by
+* a relocation.
+*/
+   if (!ksym && (ELF_ST_BIND(sym[i].st_info) == STB_WEAK ||
+ !strcmp(name, "_GLOBAL_OFFSET_TABLE_")))
break;
 
ret = PTR_ERR(ksym) ?: -ENOENT;
-- 
2.30.0.284.gd98b1dd5eaa7-goog



Re: [PATCH v3] x86/entry: emit a symbol for register restoring thunk

2021-01-11 Thread Fangrui Song



On 2021-01-11, Nick Desaulniers wrote:

Arnd found a randconfig that produces the warning:

arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at
offset 0x3e

when building with LLVM_IAS=1 (use Clang's integrated assembler). Josh
notes:

 With the LLVM assembler stripping the .text section symbol, objtool
 has no way to reference this code when it generates ORC unwinder
 entries, because this code is outside of any ELF function.

Fangrui notes that this optimization is helpful for reducing image size
when compiling with -ffunction-sections and -fdata-sections. I have
observed on the order of tens of thousands of symbols for the kernel
images built with those flags. A patch has been authored against GNU
binutils to match this behavior, with a new flag
--generate-unused-section-symbols=[yes|no].


https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d1bcae833b32f1408485ce69f844dcd7ded093a8
has been committed. The patch should be included in binutils 2.37.
The maintainers welcome the idea, but fixing all the arch-specific tests
is tricky.

H.J. fixed the x86 tests and enabled this for x86. When binutils 2.37
comes out, some other architectures may follow as well.


We can omit the .L prefix on a label to emit an entry into the symbol
table for the label, with STB_LOCAL binding.  This enables objtool to
generate proper unwind info here with LLVM_IAS=1.


Josh, I think objtool orc generate needs to synthesize STT_SECTION
symbols even if they do not exist in object files.

rg 'SYM_CODE.*\.L' reveals a few other .S files which may have similar problems.


Cc: Fangrui Song 
Link: https://github.com/ClangBuiltLinux/linux/issues/1209
Link: https://reviews.llvm.org/D93783
Link: https://sourceware.org/binutils/docs/as/Symbol-Names.html
Link: https://sourceware.org/pipermail/binutils/2020-December/114671.html
Reported-by: Arnd Bergmann 
Suggested-by: Josh Poimboeuf 
Signed-off-by: Nick Desaulniers 
---
Changes v2 -> v3:
* rework to use STB_LOCAL rather than STB_GLOBAL by dropping .L prefix,
 as per Josh.
* rename oneline to drop STB_GLOBAL in commit message.
* add link to GAS docs on .L prefix.
* drop Josh's ack since patch changed.

Changes v1 -> v2:
* Pick up Josh's Ack.
* Add commit message info about -ffunction-sections/-fdata-sections, and
 link to binutils patch.


arch/x86/entry/thunk_64.S | 8 
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/thunk_64.S b/arch/x86/entry/thunk_64.S
index ccd32877a3c4..c9a9fbf1655f 100644
--- a/arch/x86/entry/thunk_64.S
+++ b/arch/x86/entry/thunk_64.S
@@ -31,7 +31,7 @@ SYM_FUNC_START_NOALIGN(\name)
.endif

call \func
-   jmp  .L_restore
+   jmp  __thunk_restore
SYM_FUNC_END(\name)
_ASM_NOKPROBE(\name)
.endm
@@ -44,7 +44,7 @@ SYM_FUNC_END(\name)
#endif

#ifdef CONFIG_PREEMPTION
-SYM_CODE_START_LOCAL_NOALIGN(.L_restore)
+SYM_CODE_START_LOCAL_NOALIGN(__thunk_restore)
popq %r11
popq %r10
popq %r9
@@ -56,6 +56,6 @@ SYM_CODE_START_LOCAL_NOALIGN(.L_restore)
popq %rdi
popq %rbp
ret
-   _ASM_NOKPROBE(.L_restore)
-SYM_CODE_END(.L_restore)
+   _ASM_NOKPROBE(__thunk_restore)
+SYM_CODE_END(__thunk_restore)
#endif
--
2.30.0.284.gd98b1dd5eaa7-goog



Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

2021-01-11 Thread Fangrui Song

On 2021-01-11, Bill Wendling wrote:

On Mon, Jan 11, 2021 at 12:12 PM Fangrui Song  wrote:


On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:
>From: Sami Tolvanen 
>
>Enable the use of clang's Profile-Guided Optimization[1]. To generate a
>profile, the kernel is instrumented with PGO counters, a representative
>workload is run, and the raw profile data is collected from
>/sys/kernel/debug/pgo/profraw.
>
>The raw profile data must be processed by clang's "llvm-profdata" tool before
>it can be used during recompilation:
>
>  $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
>  $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
>
>Multiple raw profiles may be merged during this step.
>
>The data can be used either by the compiler if LTO isn't enabled:
>
>... -fprofile-use=vmlinux.profdata ...
>
>or by LLD if LTO is enabled:
>
>... -lto-cs-profile-file=vmlinux.profdata ...

This LLD option does not exist.
LLD does have some `--lto-*` options but the `-lto-*` form is not supported
(it clashes with -l) https://reviews.llvm.org/D79371


That's strange. I've been using that option for years now. :-) Is this
a recent change?


The more frequently used options (specified by the clang driver) are
-plugin-opt=... (options implemented by LLVMgold.so).
`-lto-*` is rare.


(There is an earlier -fprofile-instr-generate which does
instrumentation in Clang, but the option does not have broad usage.
It is used more for code coverage, not for optimization.
Noticeably, it does not even implement the Kirchhoff's current law
optimization)


Right. I've been told outside of this email that -fprofile-generate is
the preferred flag to use.


-fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).

clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the 
linker.
For regular PGO, this option is effectively a no-op (confirmed with CSPGO main 
developer).

So I think the "or by LLD if LTO is enabled:" part should be removed.


But what if you specify the linking step explicitly? Linux doesn't
call "clang" when linking, but "ld.lld".


Regular PGO+LTO does not need -plugin-opt=cs-profile-path=
CSPGO+LTO needs it.
Because -fprofile-use= may be used by both, Clang driver adds it.
CSPGO is not relevant to this patch, so the linker option does not need to be
mentioned.


Re: [PATCH] pgo: add clang's Profile Guided Optimization infrastructure

2021-01-11 Thread Fangrui Song

On 2021-01-11, 'Bill Wendling' via Clang Built Linux wrote:

From: Sami Tolvanen 

Enable the use of clang's Profile-Guided Optimization[1]. To generate a
profile, the kernel is instrumented with PGO counters, a representative
workload is run, and the raw profile data is collected from
/sys/kernel/debug/pgo/profraw.

The raw profile data must be processed by clang's "llvm-profdata" tool before
it can be used during recompilation:

 $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
 $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw

Multiple raw profiles may be merged during this step.

The data can be used either by the compiler if LTO isn't enabled:

   ... -fprofile-use=vmlinux.profdata ...

or by LLD if LTO is enabled:

   ... -lto-cs-profile-file=vmlinux.profdata ...


This LLD option does not exist.
LLD does have some `--lto-*` options but the `-lto-*` form is not supported
(it clashes with -l) https://reviews.llvm.org/D79371

(There is an earlier option, -fprofile-instr-generate, which does
instrumentation in Clang, but it is not broadly used.
It is used more for code coverage than for optimization.
Notably, it does not even implement the Kirchhoff's current law
optimization.)

-fprofile-use= is used by both regular PGO and context-sensitive PGO (CSPGO).

clang -flto=thin -fprofile-use= passes -plugin-opt=cs-profile-path= to the 
linker.
For regular PGO, this option is effectively a no-op (confirmed with CSPGO main 
developer).

So I think the "or by LLD if LTO is enabled:" part should be removed.


This initial submission is restricted to x86, as that's the platform we know
works. This restriction can be lifted once other platforms have been verified
to work with PGO.

Note that this method of profiling the kernel is clang-native and isn't
compatible with clang's gcov support in kernel/gcov.

[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

Signed-off-by: Sami Tolvanen 
Co-developed-by: Bill Wendling 
Signed-off-by: Bill Wendling 
---
Documentation/dev-tools/index.rst |   1 +
Documentation/dev-tools/pgo.rst   | 127 +
MAINTAINERS   |   9 +
Makefile  |   3 +
arch/Kconfig  |   1 +
arch/arm/boot/bootp/Makefile  |   1 +
arch/arm/boot/compressed/Makefile |   1 +
arch/arm/vdso/Makefile|   3 +-
arch/arm64/kernel/vdso/Makefile   |   3 +-
arch/arm64/kvm/hyp/nvhe/Makefile  |   1 +
arch/mips/boot/compressed/Makefile|   1 +
arch/mips/vdso/Makefile   |   1 +
arch/nds32/kernel/vdso/Makefile   |   4 +-
arch/parisc/boot/compressed/Makefile  |   1 +
arch/powerpc/kernel/Makefile  |   6 +-
arch/powerpc/kernel/trace/Makefile|   3 +-
arch/powerpc/kernel/vdso32/Makefile   |   1 +
arch/powerpc/kernel/vdso64/Makefile   |   1 +
arch/powerpc/kexec/Makefile   |   3 +-
arch/powerpc/xmon/Makefile|   1 +
arch/riscv/kernel/vdso/Makefile   |   3 +-
arch/s390/boot/Makefile   |   1 +
arch/s390/boot/compressed/Makefile|   1 +
arch/s390/kernel/Makefile |   1 +
arch/s390/kernel/vdso64/Makefile  |   3 +-
arch/s390/purgatory/Makefile  |   1 +
arch/sh/boot/compressed/Makefile  |   1 +
arch/sh/mm/Makefile   |   1 +
arch/sparc/vdso/Makefile  |   1 +
arch/x86/Kconfig  |   1 +
arch/x86/boot/Makefile|   1 +
arch/x86/boot/compressed/Makefile |   1 +
arch/x86/entry/vdso/Makefile  |   1 +
arch/x86/kernel/vmlinux.lds.S |   2 +
arch/x86/platform/efi/Makefile|   1 +
arch/x86/purgatory/Makefile   |   1 +
arch/x86/realmode/rm/Makefile |   1 +
arch/x86/um/vdso/Makefile |   1 +
drivers/firmware/efi/libstub/Makefile |   1 +
drivers/s390/char/Makefile|   1 +
include/asm-generic/vmlinux.lds.h |  44 +++
kernel/Makefile   |   1 +
kernel/pgo/Kconfig|  34 +++
kernel/pgo/Makefile   |   5 +
kernel/pgo/fs.c   | 382 ++
kernel/pgo/instrument.c   | 147 ++
kernel/pgo/pgo.h  | 206 ++
scripts/Makefile.lib  |  10 +
48 files changed, 1017 insertions(+), 9 deletions(-)
create mode 100644 Documentation/dev-tools/pgo.rst
create mode 100644 kernel/pgo/Kconfig
create mode 100644 kernel/pgo/Makefile
create mode 100644 kernel/pgo/fs.c
create mode 100644 kernel/pgo/instrument.c
create mode 100644 kernel/pgo/pgo.h

diff --git a/Documentation/dev-tools/index.rst 
b/Documentation/dev-tools/index.rst
index f7809c7b1ba9e..8d6418e858062 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
   kgdb
   kselftest
   kunit/index
+   pgo


.. only::  subproject and html
diff --git a/Documentation/dev-tools/pgo.rst 

[PATCH v2] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-07 Thread Fangrui Song
This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types with the
requirement that the symbol address is significant.
R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types without the
address significance requirement.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
convention is to use R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC
PLT, but R_386_PLT32 is arguably preferable for -fno-pic code as well:
this can avoid a "canonical PLT entry" (st_shndx=0, st_value!=0) if the
symbol turns out to be defined externally.

clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/961f31d8ad14c66829991522d73e14b5a96ff6d4)
can emit R_386_PLT32 for compiler produced symbols (if we drop
-ffreestanding for CONFIG_X86_32, library call optimization can produce
newer declarations) and future GCC -fno-pic may emit R_386_PLT32 as well
if an option like -fno-direct-access-external-data is adopted to avoid
canonical PLT entry/copy relocations.

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Link: 
https://github.com/llvm/llvm-project/commit/961f31d8ad14c66829991522d73e14b5a96ff6d4
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 

---
Change in v2:
* Improve commit message
---
 arch/x86/kernel/module.c | 1 +
 arch/x86/tools/relocs.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..5e9a34b5bd74 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..717e48ca28b6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,6 +867,7 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
@@ -910,6 +911,7 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
-- 
2.29.2.729.g45daf8777d-goog
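
The effect of the added case labels can be sketched outside the kernel. The block below is a minimal illustration, not the kernel's apply_relocate(): the helper name apply_reloc_i386 and its parameters are hypothetical, and addresses are passed as plain integers instead of pointers. The relocation type numbers are the ones defined by the i386 psABI. It only shows that, once R_386_PLT32 shares the R_386_PC32 branch, both types compute "addend + symbol - place".

```c
#include <stdint.h>

/* Relocation type numbers from the i386 psABI. */
#define R_386_32    1
#define R_386_PC32  2
#define R_386_PLT32 4

/*
 * Hypothetical helper (not kernel code): compute the 32-bit value to store
 * at a relocated location.
 *   place     - address of the field being relocated
 *   sym_value - address of the referenced symbol
 *   addend    - value already stored at the location (i386 uses REL
 *               relocations, so the addend is implicit)
 * Returns 0 on success, -1 for a type this sketch does not model.
 */
static int apply_reloc_i386(unsigned int type, uint32_t place,
			    uint32_t sym_value, uint32_t addend,
			    uint32_t *out)
{
	switch (type) {
	case R_386_32:
		/* Absolute: add the symbol value. */
		*out = addend + sym_value;
		return 0;
	case R_386_PC32:
	case R_386_PLT32:
		/* PC-relative: add the value, subtract the position. */
		*out = addend + sym_value - place;
		return 0;
	default:
		return -1;
	}
}
```

For a call whose displacement field sits at 0x1000, with the usual implicit addend of -4 and a target at 0x2000, both R_386_PC32 and R_386_PLT32 give the same displacement in this model.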



Re: [PATCH 4/4] x86: don't build CONFIG_X86_32 as -ffreestanding

2021-01-06 Thread Fangrui Song

On 2020-08-17, Nick Desaulniers wrote:

-ffreestanding typically inhibits "libcall optimizations" where calls to
certain library functions can be replaced by the compiler in certain
cases to calls to other library functions that may be more efficient.
This can be problematic for embedded targets that don't provide full
libc implementations.

-ffreestanding inhibits all such optimizations, which is the safe
choice, but generally we want the optimizations that are performed. The
Linux kernel does implement a fair amount of libc routines. Instead of
-ffreestanding (which makes more sense in smaller images like kexec's
purgatory image), prefer -fno-builtin-* flags to disable the compiler
from emitting calls to functions which may not be defined.

If you see a linkage failure due to a missing symbol that's typically
defined in a libc, and not explicitly called from the source code, then
the compiler may have done such a transform.  You can either implement
such a function (e.g. in lib/string.c) or disable the transform outright
via a -fno-builtin-* flag (where * is the name of the library routine, e.g.
-fno-builtin-bcmp).

i386_defconfig build+boot tested with GCC and Clang. Removes a pretty
old TODO from the codebase.

Fixes: 6edfba1b33c7 ("x86_64: Don't define string functions to builtin")
Suggested-by: Arvind Sankar 
Signed-off-by: Nick Desaulniers 
Reviewed-by: Kees Cook 
---
arch/x86/Makefile | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4346ffb2e39f..2383a96cf4fd 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -80,9 +80,6 @@ ifeq ($(CONFIG_X86_32),y)
# CPU-specific tuning. Anything which can be shared with UML should go 
here.
include arch/x86/Makefile_32.cpu
KBUILD_CFLAGS += $(cflags-y)
-
-# temporary until string.h is fixed
-KBUILD_CFLAGS += -ffreestanding
else
BITS := 64
UTS_MACHINE := x86_64
--
2.28.0.220.ged08abb693-goog


Reviewed-by: Fangrui Song 

However, dropping -ffreestanding makes the compiler produce new
declarations, so
https://lore.kernel.org/lkml/20210107001739.1321725-1-mask...@google.com/
("x86: Treat R_386_PLT32 as R_386_PC32") is a prerequisite for building
with trunk Clang: https://github.com/ClangBuiltLinux/linux/issues/1210

Since more than 4 months have passed, it seems that something else
regressed the non -ffreestanding build. Maybe another -fno-builtin-* is
needed somewhere.
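
As an illustration of the "libcall optimizations" discussed in the quoted commit message, here is a minimal sketch of source code that never calls memset() or bcmp() directly but may still end up referencing them. The function names zero_bytes and same_bytes are hypothetical. Whether a given compiler version performs either transform depends on the optimization level and flags, so treat this as an assumption to check against the generated assembly, not as guaranteed behaviour.

```c
#include <stddef.h>
#include <string.h>

/*
 * A plain byte-clearing loop. Without -ffreestanding (or
 * -fno-builtin-memset), an optimizing compiler may recognize this idiom
 * and emit a call to memset() even though the source never calls it.
 */
void zero_bytes(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = 0;
}

/*
 * Only the equal/not-equal result of memcmp() is used here. A compiler
 * may replace such a call with bcmp(), which is the case that
 * -fno-builtin-bcmp is meant to suppress.
 */
int same_bytes(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```

If the resulting object references a symbol such as bcmp that the kernel does not define, the choices are the ones listed in the commit message: provide the function or pass the matching -fno-builtin-* flag.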


[PATCH] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-06 Thread Fangrui Song
This is similar to commit b21ebf2fb4cde1618915a97cc773e287ff49173e "x86:
Treat R_X86_64_PLT32 as R_X86_64_PC32", but for i386.  As far as Linux
kernel is concerned, R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types with the
requirement that the symbol address is significant.
R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types without the
address significance requirement.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
convention is to use R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC
PLT, but R_386_PLT32 is arguably preferable for -fno-pic code as well:
this can avoid a "canonical PLT entry" (st_shndx=0, st_value!=0) if the
symbol turns out to be defined externally. Latest Clang (since
961f31d8ad14c66829991522d73e14b5a96ff6d4) can use R_386_PLT32 for
compiler produced symbols (if we drop -ffreestanding for CONFIG_X86_32,
library call optimization can produce newer declarations) and future GCC
may use R_386_PLT32 as well if the maintainers agree to adopt an option
like -fdirect-access-external-data to avoid "canonical PLT entry"/copy
relocations https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
---
 arch/x86/kernel/module.c | 1 +
 arch/x86/tools/relocs.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..5e9a34b5bd74 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..717e48ca28b6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,6 +867,7 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
@@ -910,6 +911,7 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
-- 
2.29.2.729.g45daf8777d-goog



Re: building csky with CC=clang

2020-12-22 Thread Fangrui Song

On 2020-12-22, 'Nick Desaulniers' via Clang Built Linux wrote:

Hello!
I was playing with some of LLVM's experimental backends (m68k) and saw
there was a CSKY backend. I rebuilt LLVM to support CSKY, but I ran
into trouble building the kernel before even getting to the compiler
invocation:

$ ARCH=csky CROSS_COMPILE=csky-linux-gnu- make CC=clang -j71 defconfig
...
scripts/Kconfig.include:40: linker 'csky-linux-gnu-ld' not found

My distro doesn't package binutils-csky-linux-gnu, is there
documentation on how to build the kernel targeting CSKY, starting with
building GNU binutils configured with CSKY emulation?


Note also that llvm/lib/Target/CSKY has not been fully upstreamed
yet. It is a WIP:
https://lists.llvm.org/pipermail/llvm-dev/2020-August/144481.html
I would not expect clang csky to work currently.
(The latest committed LLVM patch is https://reviews.llvm.org/D93372
Normally, committing an important piece of a large patch series like this
should wait a bit longer after someone in the community has accepted it:
https://llvm.org/docs/CodeReview.html#can-code-be-reviewed-after-it-is-committed
 )

I do want to highlight the recent LLVM M68k target. Its patches ([M68k] (Patch */8))
are very well organized and the main proposer shares updates on llvm-dev regularly.
There is a lot that the C-SKY target can learn from that process.


Re: [PATCH v8 00/16] Add support for Clang LTO

2020-12-08 Thread Fangrui Song



On 2020-12-08, 'Sami Tolvanen' via Clang Built Linux wrote:

On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann  wrote:


On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux
 wrote:
>
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> to be used in the kernel. Google has shipped millions of Pixel
> devices running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
>
> Note that arm64 support depends on Will's memory ordering patches
> [1]. I will post x86_64 patches separately after we have fixed the
> remaining objtool warnings [2][3].
>
> [1] 
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
> [3] 
https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
>
> You can also pull this series from
>
>   https://github.com/samitolvanen/linux.git lto-v8

I've tried pulling this into my randconfig test tree to give it a spin.


Great, thank you for testing this!


So far I have
not managed to get a working build out of it, the main problem so far being
that it is really slow to build because the link stage only uses one CPU.
These are the other issues I've seen so far:


By default, ld.lld ThinLTO uses as many threads as there are physical cores
enabled by affinity.


You may want to limit your testing only to ThinLTO at first, because
full LTO is going to be extremely slow with larger configs, especially
when building arm64 kernels.


- one build seems to take even longer to link. It's currently at 35GB RAM
  usage and 40 minutes into the final link, but I'm worried it might not
  complete before it runs out of memory.  I only have 128GB installed, and
  google-chrome uses another 30GB of that, and I'm also doing some other
  builds in parallel.
  Is there a minimum recommended amount of memory for doing LTO builds?


When building arm64 defconfig, the maximum memory usage I measured
with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
larger configurations, but I believe LLD can easily consume 3-4x that
much with full LTO allyesconfig.


- One build failed with
 ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o
-T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o
init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a
certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a
security/built-in.a crypto/built-in.a block/built-in.a
arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a
sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
--start-group arch/arm64/lib/lib.a lib/lib.a
./drivers/firmware/efi/libstub/lib.a --end-group
  "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index"
  after about 30 minutes


That's interesting. Did you use LLVM_IAS=1?


It may be worth checking which relocation (or which SHT_GROUP section's
sh_info) in arch/arm64/kernel/head.o is incorrect.


- CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
  doesn't work with ld.bfd.
  I've added a CPU_LITTLE_ENDIAN dependency to
  ARCH_SUPPORTS_LTO_CLANG{,THIN}


Ah, good point. I'll fix this in v9.


Full/Thin LTO should work with GNU ld and gold with LLVMgold.so built from
llvm-project (https://llvm.org/docs/GoldPlugin.html ). You'll need to make sure
that LLVMgold.so is newer than clang. (Newer clang may introduce bitcode
attributes which are unrecognizable by older LLVMgold.so/ld.lld)


[...]

Not sure if these are all known issues. If there is one you'd like me to
take a closer look at to find which config options break it, I can try.


No, none of these are known issues. I would be happy to take a closer
look if you can share configs that reproduce these.

Sami



[PATCH v2] firmware_loader: Align .builtin_fw to 8

2020-12-07 Thread Fangrui Song
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error. 32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).

Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Reported-by: kernel test robot 
Signed-off-by: Fangrui Song 
Acked-by: Arnd Bergmann 

---
Change in v2:
* Use output section alignment instead of inappropriate ALIGN_FUNCTION()
---
 include/asm-generic/vmlinux.lds.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..b97c628ad91f 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -459,7 +459,7 @@
}   \
\
/* Built-in firmware blobs */   \
-   .builtin_fw: AT(ADDR(.builtin_fw) - LOAD_OFFSET) {  \
+   .builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) ALIGN(8) {\
__start_builtin_fw = .; \
KEEP(*(.builtin_fw))\
__end_builtin_fw = .;   \
-- 
2.29.2.576.ga3fc446d84-goog
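
The alignment requirement described in the commit message can be sketched in C. This is a simplified model under stated assumptions, not linker source: the helper names align_up and ldst64_lo12_encodable are hypothetical, and the only behaviour modelled is that a 64-bit load/store immediate encodes the low 12 bits of the target address scaled by 8, so an address that is not a multiple of 8 cannot be encoded. ALIGN(8) on the output section corresponds to rounding the section start up before __start_builtin_fw is assigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the check behind R_AARCH64_LDST64_ABS_LO12_NC:
 * the page offset (low 12 bits) of the target is stored divided by 8,
 * so the target must be 8-byte aligned. On success, *imm holds the
 * scaled immediate; on failure, *imm is left untouched.
 */
static bool ldst64_lo12_encodable(uint64_t target, uint32_t *imm)
{
	if (target & 7)
		return false;
	*imm = (uint32_t)((target & 0xfff) >> 3);
	return true;
}
```

In this model, an empty section that happens to start at 0x1004 is rejected, while the same section start rounded up with align_up(0x1004, 8) is accepted.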



Re: [PATCH] firmware_loader: Align .builtin_fw to 8

2020-12-03 Thread Fangrui Song

On 2020-12-03, Nick Desaulniers wrote:

On Thu, Dec 3, 2020 at 9:05 AM Fangrui Song  wrote:


arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error.

Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Reported-by: kernel test robot 
Signed-off-by: Fangrui Song 
---
 include/asm-generic/vmlinux.lds.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..3cd4bd1193ab 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -459,6 +459,7 @@
}   \
\
/* Built-in firmware blobs */   \
+   ALIGN_FUNCTION();   \


Thanks for the patch!

I'm going to repeat my question from the above link
(https://github.com/ClangBuiltLinux/linux/issues/1204#issuecomment-737610582)
just in case it's not naive:

The ALIGN_FUNCTION() C preprocessor macro seems to be used to realign
code, while STRUCT_ALIGN() seems to be used to realign data.  It looks
to me like only data is put into .builtin_fw.  If these relocations
require an alignment of 8, then multiples of 8 should also be fine
(STRUCT_ALIGN is 32 for all toolchain versions, except gcc 4.9 where it is
64; both are multiples of 8 though).  It looks like only structs are
placed in .builtin_fw; i.e. data.  In that case, I worry that using
ALIGN_FUNCTION/8 might actually be under-aligning data in this
section.


Regarding STRUCT_ALIGN (32 for GCC>4.9) in
include/asm-generic/vmlinux.lds.h, it is probably not suitable for
.builtin_fw

* Its comment is a bit unclear. It probably should mention that the
  32-byte overalignment is only for global structure variables which are
  at least 32 bytes large. But this is just my observation. Adding a GCC
  maintainer to comment on this.
* Even if GCC does overalign defined global struct variables, it is unlikely
  that GCC will leverage this property for undefined `extern struct
  builtin_fw __start_builtin_fw[]` (drivers/base/firmware_loader/main.c)

To make .builtin_fw aligned, I agree that ALIGN_FUNCTION() is probably a
misuse. Maybe I should just use `. = ALIGN(8)` if the kernel linker
script prefers `. = ALIGN(8)` to an output section alignment
(https://sourceware.org/binutils/docs/ld/Output-Section-Description.html#Output-Section-Description
https://lld.llvm.org/ELF/linker_script.html#output-section-alignment)


Though, in 
https://github.com/ClangBuiltLinux/linux/issues/1204#issuecomment-737625134
you comment:


In GNU ld, the empty .builtin_fw is removed


So that's a difference in behavior between ld.bfd and ld.lld, which is
fine, but it makes me wonder whether we should instead or additionally
be discarding this section explicitly via linker script when
CONFIG_FW_LOADER is not set?


Short answer: No, we should not discard .builtin_fw

  .builtin_fw: AT(ADDR(.builtin_fw) - LOAD_OFFSET) {
  __start_builtin_fw = .; ... }

In LLD, either a section reference (`ADDR(.builtin_fw)`) or a
non-PROVIDE symbol assignment __start_builtin_fw makes the section 
non-discardable.

It can be argued that discarding an output section with a symbol
assignment (GNU ld) is strange because the symbol (st_shndx) will be
defined relative to an arbitrary unrelated section. Retaining the
section can avoid some other issues.


.builtin_fw: AT(ADDR(.builtin_fw) - LOAD_OFFSET) {  \
__start_builtin_fw = .; \
KEEP(*(.builtin_fw))\
--
2.29.2.576.ga3fc446d84-goog




--
Thanks,
~Nick Desaulniers


[PATCH] firmware_loader: Align .builtin_fw to 8

2020-12-03 Thread Fangrui Song
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error.

Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Reported-by: kernel test robot 
Signed-off-by: Fangrui Song 
---
 include/asm-generic/vmlinux.lds.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..3cd4bd1193ab 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -459,6 +459,7 @@
}   \
\
/* Built-in firmware blobs */   \
+   ALIGN_FUNCTION();   \
.builtin_fw: AT(ADDR(.builtin_fw) - LOAD_OFFSET) {  \
__start_builtin_fw = .; \
KEEP(*(.builtin_fw))\
-- 
2.29.2.576.ga3fc446d84-goog



Re: [PATCH v2 2/4] Kbuild: do not emit debug info for assembly with LLVM_IAS=1

2020-11-04 Thread Fangrui Song



On 2020-11-04, Nathan Chancellor wrote:

On Tue, Nov 03, 2020 at 04:53:41PM -0800, Nick Desaulniers wrote:

Clang's integrated assembler produces the warning for assembly files:

warning: DWARF2 only supports one section per compilation unit

If -Wa,-gdwarf-* is unspecified, then debug info is not emitted.  This


Is this something that should be called out somewhere? If I understand
this correctly, LLVM_IAS=1 + CONFIG_DEBUG_INFO=y won't work? Maybe this
should be handled in Kconfig?


will be re-enabled for new DWARF versions in a follow up patch.

Enables defconfig+CONFIG_DEBUG_INFO to build cleanly with
LLVM=1 LLVM_IAS=1 for x86_64 and arm64.

Cc: 
Link: https://github.com/ClangBuiltLinux/linux/issues/716
Reported-by: Nathan Chancellor 
Suggested-by: Dmitry Golovin 


If you happen to respin, Dmitry deserves a Reported-by tag too :)


Suggested-by: Sedat Dilek 
Signed-off-by: Nick Desaulniers 


Regardless of the other two comments, this is fine as is as a fix for
stable to unblock Android + CrOS since we have been running something
similar to it in CI:

Reviewed-by: Nathan Chancellor 


---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index f353886dbf44..75b1a3dcbf30 100644
--- a/Makefile
+++ b/Makefile
@@ -826,7 +826,9 @@ else
 DEBUG_CFLAGS   += -g
 endif

+ifndef LLVM_IAS


Nit: this should probably match the existing LLVM_IAS check

ifneq ($(LLVM_IAS),1)


 KBUILD_AFLAGS  += -Wa,-gdwarf-2
+endif

 ifdef CONFIG_DEBUG_INFO_DWARF4
 DEBUG_CFLAGS   += -gdwarf-4
--
2.29.1.341.ge80a0c044ae-goog



The root cause is that DWARF v2 has no DW_AT_ranges, so it cannot
represent non-contiguous address ranges. It seems that GNU as -gdwarf-3
emits DW_AT_ranges as well and emits an entry for a non-executable section.
In any case, the option is of very low value, at least for LLVM.


Reviewed-by: Fangrui Song 


[tip: x86/urgent] x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S

2020-11-04 Thread tip-bot2 for Fangrui Song
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 4d6ffa27b8e5116c0abb318790fd01d4e12d75e6
Gitweb:
https://git.kernel.org/tip/4d6ffa27b8e5116c0abb318790fd01d4e12d75e6
Author:Fangrui Song 
AuthorDate:Mon, 02 Nov 2020 17:23:58 -08:00
Committer: Borislav Petkov 
CommitterDate: Wed, 04 Nov 2020 12:30:20 +01:00

x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S

Commit

  393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy 
functions")

added .weak directives to arch/x86/lib/mem*_64.S instead of changing the
existing ENTRY macros to WEAK. This can lead to the assembly snippet

  .weak memcpy
  ...
  .globl memcpy

which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy
with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
https://reviews.llvm.org/D90108) will error on such an overridden symbol
binding.

Commit

  ef1e03152cb0 ("x86/asm: Make some functions local")

changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which
was ineffective due to the preceding .weak directive.

Use the appropriate SYM_FUNC_START_WEAK instead.

Fixes: 393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy 
functions")
Fixes: ef1e03152cb0 ("x86/asm: Make some functions local")
Reported-by: Sami Tolvanen 
Signed-off-by: Fangrui Song 
Signed-off-by: Borislav Petkov 
Reviewed-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Cc: 
Link: https://lkml.kernel.org/r/20201103012358.168682-1-mask...@google.com
---
 arch/x86/lib/memcpy_64.S  | 4 +---
 arch/x86/lib/memmove_64.S | 4 +---
 arch/x86/lib/memset_64.S  | 4 +---
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 037faac..1e299ac 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -16,8 +16,6 @@
  * to a jmp to memcpy_erms which does the REP; MOVSB mem copy.
  */
 
-.weak memcpy
-
 /*
  * memcpy - Copy a memory block.
  *
@@ -30,7 +28,7 @@
  * rax original destination
  */
 SYM_FUNC_START_ALIAS(__memcpy)
-SYM_FUNC_START_LOCAL(memcpy)
+SYM_FUNC_START_WEAK(memcpy)
ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
  "jmp memcpy_erms", X86_FEATURE_ERMS
 
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 7ff00ea..41902fe 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -24,9 +24,7 @@
  * Output:
  * rax: dest
  */
-.weak memmove
-
-SYM_FUNC_START_ALIAS(memmove)
+SYM_FUNC_START_WEAK(memmove)
 SYM_FUNC_START(__memmove)
 
mov %rdi, %rax
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9ff15ee..0bfd26e 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -6,8 +6,6 @@
 #include 
 #include 
 
-.weak memset
-
 /*
  * ISO C memset - set a memory block to a byte value. This function uses fast
  * string to get better performance than the original function. The code is
@@ -19,7 +17,7 @@
  *
  * rax   original destination
  */
-SYM_FUNC_START_ALIAS(memset)
+SYM_FUNC_START_WEAK(memset)
 SYM_FUNC_START(__memset)
/*
 * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended


[PATCH] perf bench: Update arch/x86/lib/mem{cpy,set}_64.S

2020-11-03 Thread Fangrui Song
In memset_64.S, the macros expand to `.weak MEMSET ... .globl MEMSET`
which will produce a STB_WEAK MEMSET with GNU as but STB_GLOBAL MEMSET
with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
https://reviews.llvm.org/D90108) will error on such an overridden symbol
binding. memcpy_64.S is similar.

Port http://lore.kernel.org/r/20201103012358.168682-1-mask...@google.com
("x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S")
to fix the issue. Additionally, port SYM_L_WEAK and SYM_FUNC_START_WEAK
from include/linux/linkage.h to tools/perf/util/include/linux/linkage.h

Fixes: 7d7d1bf1d1da ("perf bench: Copy kernel files needed to build 
mem{cpy,set} x86_64 benchmarks")
Link: https://lore.kernel.org/r/20201103012358.168682-1-mask...@google.com
Signed-off-by: Fangrui Song 
---
 tools/arch/x86/lib/memcpy_64.S  | 4 +---
 tools/arch/x86/lib/memset_64.S  | 4 +---
 tools/perf/util/include/linux/linkage.h | 7 +++
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/arch/x86/lib/memcpy_64.S b/tools/arch/x86/lib/memcpy_64.S
index 0b5b8ae56bd9..9428f251df0f 100644
--- a/tools/arch/x86/lib/memcpy_64.S
+++ b/tools/arch/x86/lib/memcpy_64.S
@@ -16,8 +16,6 @@
  * to a jmp to memcpy_erms which does the REP; MOVSB mem copy.
  */
 
-.weak memcpy
-
 /*
  * memcpy - Copy a memory block.
  *
@@ -30,7 +28,7 @@
  * rax original destination
  */
 SYM_FUNC_START_ALIAS(__memcpy)
-SYM_FUNC_START_LOCAL(memcpy)
+SYM_FUNC_START_WEAK(memcpy)
ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
  "jmp memcpy_erms", X86_FEATURE_ERMS
 
diff --git a/tools/arch/x86/lib/memset_64.S b/tools/arch/x86/lib/memset_64.S
index fd5d25a474b7..1f9b11f9244d 100644
--- a/tools/arch/x86/lib/memset_64.S
+++ b/tools/arch/x86/lib/memset_64.S
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-.weak memset
-
 /*
  * ISO C memset - set a memory block to a byte value. This function uses fast
  * string to get better performance than the original function. The code is
@@ -18,7 +16,7 @@
  *
  * rax   original destination
  */
-SYM_FUNC_START_ALIAS(memset)
+SYM_FUNC_START_WEAK(memset)
 SYM_FUNC_START(__memset)
/*
 * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended
diff --git a/tools/perf/util/include/linux/linkage.h b/tools/perf/util/include/linux/linkage.h
index b8a5159361b4..0e493bf3151b 100644
--- a/tools/perf/util/include/linux/linkage.h
+++ b/tools/perf/util/include/linux/linkage.h
@@ -25,6 +25,7 @@
 
 /* SYM_L_* -- linkage of symbols */
 #define SYM_L_GLOBAL(name) .globl name
+#define SYM_L_WEAK(name)   .weak name
 #define SYM_L_LOCAL(name)  /* nothing */
 
 #define ALIGN __ALIGN
@@ -78,6 +79,12 @@
SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)
 #endif
 
+/* SYM_FUNC_START_WEAK -- use for weak functions */
+#ifndef SYM_FUNC_START_WEAK
+#define SYM_FUNC_START_WEAK(name)  \
+   SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)
+#endif
+
 /* SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function */
 #ifndef SYM_FUNC_END_ALIAS
 #define SYM_FUNC_END_ALIAS(name)   \
-- 
2.29.1.341.ge80a0c044ae-goog



Re: [PATCH] module: use hidden visibility for weak symbol references

2020-10-27 Thread Fangrui Song

One nit about ".got" in the message:

Reviewed-by: Fangrui Song 

On 2020-10-27, Nick Desaulniers wrote:

+ Fangrui

On Tue, Oct 27, 2020 at 8:11 AM Ard Biesheuvel  wrote:


Geert reports that commit be2881824ae9eb92 ("arm64/build: Assert for
unwanted sections") results in build errors on arm64 for configurations
that have CONFIG_MODULES disabled.

The commit in question added ASSERT()s to the arm64 linker script to
ensure that linker generated sections such as .got, .plt etc are empty,


.got -> .got.plt

be2881824ae9eb92 does not ASSERT on .got (it can).

Strangely *(.got) is placed in .text in arch/arm64/kernel/vmlinux.lds.S
I think that line can be removed. On x86, aarch64 and many other archs,
the start of .got.plt is the GOT base. .got is not needed (ppc/arm/riscv
use .got instead of .got.plt as the GOT base anchor).


but as it turns out, there are corner cases where the linker does emit
content into those sections. More specifically, weak references to
function symbols (which can remain unsatisfied, and can therefore not
be emitted as relative references) will be emitted as GOT and PLT
entries when linking the kernel in PIE mode (which is the case when
CONFIG_RELOCATABLE is enabled, which is on by default).


Confirmed.


What happens is that code such as

  struct device *(*fn)(struct device *dev);
  struct device *iommu_device;

  fn = symbol_get(mdev_get_iommu_device);
  if (fn) {
          iommu_device = fn(dev);

essentially gets converted into the following when CONFIG_MODULES is off:

  struct device *iommu_device;

  if (&mdev_get_iommu_device) {
          iommu_device = mdev_get_iommu_device(dev);

where mdev_get_iommu_device is emitted as a weak symbol reference into
the object file. The first reference is decorated with an ordinary
ABS64 data relocation (which yields 0x0 if the reference remains
unsatisfied). However, the indirect call is turned into a direct call
covered by a R_AARCH64_CALL26 relocation, which is converted into a
call via a PLT entry taking the target address from the associated
GOT entry.


Yes, the R_AARCH64_CALL26 relocation referencing an undefined weak
symbol causes one .plt entry and one .got.plt entry.
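
To illustrate the weak-reference behaviour discussed above, here is a minimal user-space sketch (not kernel code). The names optional_hook and call_hook_or are made up for this example. It assumes an ELF target with GCC or Clang, and it uses a plain weak reference without the hidden visibility that the patch adds.

```c
/*
 * Hypothetical symbol: declared weak and deliberately never defined
 * anywhere in this program, so the reference stays unsatisfied.
 */
extern int optional_hook(int value) __attribute__((weak));

/*
 * Calls optional_hook(value) when a definition was linked in;
 * otherwise returns the caller-supplied fallback.
 */
int call_hook_or(int value, int fallback)
{
	/* The address of an unsatisfied weak reference evaluates to 0. */
	if (optional_hook)
		return optional_hook(value);
	return fallback;
}
```

With no definition of optional_hook in the link, call_hook_or(41, -1) should take the fallback path. That is the same shape as a symbol_get() caller when CONFIG_MODULES is disabled: the address test is cheap, and the guarded call is the part that can make the linker emit PLT/GOT entries.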


Given that such GOT and PLT entries are unnecessary for fully linked
binaries such as the kernel, let's give these weak symbol references
hidden visibility, so that the linker knows that the weak reference
via R_AARCH64_CALL26 can simply remain unsatisfied.

Cc: Jessica Yu 
Cc: Kees Cook 
Cc: Geert Uytterhoeven 
Cc: Nick Desaulniers 
Signed-off-by: Ard Biesheuvel 
---
 include/linux/module.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 7ccdf87f376f..6264617bab4d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -740,7 +740,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
 }

 /* Get/put a kernel symbol (calls should be symmetric) */
-#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); })
+#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
 #define symbol_put(x) do { } while (0)
 #define symbol_put_addr(x) do { } while (0)

--
2.17.1




--
Thanks,
~Nick Desaulniers


Re: [PATCH] Kbuild: implement support for DWARF5

2020-10-21 Thread Fangrui Song

On 2020-10-21, 'Nick Desaulniers' via Clang Built Linux wrote:

DWARF5 is the latest standard of the DWARF debug info format.

Feature detection of DWARF5 is onerous, especially given that we've
removed $(AS), so we must query $(CC) for DWARF5 assembler directive
support. Further -gdwarf-X where X is an unsupported value doesn't
produce an error in $(CC). GNU `as` only recently gained support for
specifying -gdwarf-5.

The DWARF version of a binary can be validated with:


To be more correct: this is just the version number of the .debug_info section.
Other sections can use different version numbers.
(For example, GNU as still does not support version 5 .debug_line)


$ llvm-dwarfdump vmlinux | head -n 5 | grep version
or
$ readelf --debug-dump=info vmlinux 2>/dev/null | grep Version
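
As a rough illustration of where that version number lives: each unit in .debug_info starts with a unit_length field followed by a 2-byte version. The sketch below reads it from a raw header. The function name dwarf_unit_version is made up for this example, and it assumes little-endian data; it is not how llvm-dwarfdump or readelf are implemented.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the DWARF version stored in a unit header, or -1 if the
 * buffer is too short to contain one. Assumes little-endian data.
 */
int dwarf_unit_version(const uint8_t *unit, size_t size)
{
	size_t off = 4;	/* 32-bit DWARF: 4-byte unit_length */

	if (size < 4)
		return -1;

	/* 64-bit DWARF: 0xffffffff escape followed by an 8-byte length. */
	if (unit[0] == 0xff && unit[1] == 0xff &&
	    unit[2] == 0xff && unit[3] == 0xff)
		off = 12;

	if (size < off + 2)
		return -1;

	return unit[off] | (unit[off + 1] << 8);
}
```

This only inspects one .debug_info unit header. As noted above, other sections such as .debug_line carry their own version fields.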

DWARF5 wins significantly in terms of size when mixed with compression
(CONFIG_DEBUG_INFO_COMPRESSED).

363M    vmlinux.clang12.dwarf5.compressed
434M    vmlinux.clang12.dwarf4.compressed
439M    vmlinux.clang12.dwarf2.compressed
457M    vmlinux.clang12.dwarf5
536M    vmlinux.clang12.dwarf4
548M    vmlinux.clang12.dwarf2

Make CONFIG_DEBUG_INFO_DWARF4 part of a Kconfig choice to preserve
forward compatibility.

Link: http://www.dwarfstd.org/doc/DWARF5.pdf
Signed-off-by: Nick Desaulniers 
---
RFC because this patch is super half baked, but I'm looking for
feedback.

I would logically split this into a series of patches;
1. disable -Wa,-gdwarf-2 for LLVM_IAS=1, see also
   https://github.com/ClangBuiltLinux/linux/issues/716
   https://github.com/ClangBuiltLinux/continuous-integration/blob/master/patches/llvm-all/linux-next/arm64/silence-dwarf2-warnings.patch
   that way we can backport for improved LLVM_IAS support.
2. move CONFIG_DEBUG_INFO_DWARF4 to choice.
3. implement the rest on top.

I'm pretty sure GNU `as` only recently gained the ability to specify
-gdwarf-4 without erroring in binutils 2.35, so that part likely needs
to be fixed.

Makefile  | 19 ---
include/asm-generic/vmlinux.lds.h |  6 +-
lib/Kconfig.debug | 29 +
scripts/test_dwarf5_support.sh|  4 
4 files changed, 50 insertions(+), 8 deletions(-)
create mode 100755 scripts/test_dwarf5_support.sh

diff --git a/Makefile b/Makefile
index e71979882e4f..0862df5b1a24 100644
--- a/Makefile
+++ b/Makefile
@@ -828,10 +828,23 @@ else
DEBUG_CFLAGS   += -g
endif

-KBUILD_AFLAGS  += -Wa,-gdwarf-2
-
+DWARF_VERSION=2
ifdef CONFIG_DEBUG_INFO_DWARF4
-DEBUG_CFLAGS   += -gdwarf-4
+DWARF_VERSION=4
+endif
+ifdef CONFIG_DEBUG_INFO_DWARF5
+DWARF_VERSION=5
+endif
+DEBUG_CFLAGS   += -gdwarf-$(DWARF_VERSION)
+
+ifneq ($(DWARF_VERSION)$(LLVM_IAS),21)
+KBUILD_AFLAGS  += -Wa,-gdwarf-$(DWARF_VERSION)
+endif
+
+ifdef CONFIG_CC_IS_CLANG
+ifneq ($(LLVM_IAS),1)
+KBUILD_CFLAGS  += -Wa,-gdwarf-$(DWARF_VERSION)
+endif
endif

ifdef CONFIG_DEBUG_INFO_REDUCED
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index cd1bf600..0382808ef9fe 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -828,7 +828,11 @@
.debug_types 0 : { *(.debug_types) } \
/* DWARF 5 */   \
.debug_macro 0 : { *(.debug_macro) } \
-   .debug_addr 0 : { *(.debug_addr) }
+   .debug_addr 0 : { *(.debug_addr) }  \
+   .debug_line_str 0 : { *(.debug_line_str) }  \
+   .debug_loclists 0 : { *(.debug_loclists) }  \
+   .debug_rnglists 0 : { *(.debug_rnglists) }  \
+   .debug_str_offsets 0 : { *(.debug_str_offsets) }


Consider adding .debug_names for the accelerator table.
It is the DWARF v5 version of .debug_pub{names,types} (which are mentioned
a few lines above).


/* Stabs debugging sections. */
#define STABS_DEBUG \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 537cf3c2937d..6b01f0e2dad8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -256,14 +256,35 @@ config DEBUG_INFO_SPLIT
  to know about the .dwo files and include them.
  Incompatible with older versions of ccache.

+choice
+prompt "DWARF version"
+   depends on DEBUG_INFO
+   default DEBUG_INFO_DWARF2
+   help
+ Which version of DWARF debug info to emit.
+
+config DEBUG_INFO_DWARF2
+   bool "Generate dwarf2 debuginfo"
+   help
+ Generate dwarf2 debug info.


In documentation, a more official way to refer to the format is: DWARF v2.
(While "DWARF5" and "DWARF v5" are acceptable, the latter is preferred)
Ditto below.


config DEBUG_INFO_DWARF4
bool "Generate dwarf4 debuginfo"
depends on $(cc-option,-gdwarf-4)
help
- Generate dwarf4 debug info. This requires recent versions
- of gcc and gdb. It makes the 

Re: [tip:x86/seves] BUILD SUCCESS WITH WARNING e6eb15c9ba3165698488ae5c34920eea20eaa38e

2020-09-16 Thread Fangrui Song

On 2020-09-16, 'Marco Elver' via Clang Built Linux wrote:

On Wed, 16 Sep 2020 at 20:22, 'Nick Desaulniers' via kasan-dev
 wrote:


On Wed, Sep 16, 2020 at 1:46 AM Marco Elver  wrote:
>
> On Wed, 16 Sep 2020 at 10:30,  wrote:
> > On Tue, Sep 15, 2020 at 08:09:16PM +0200, Marco Elver wrote:
> > > On Tue, 15 Sep 2020 at 19:40, Nick Desaulniers wrote:
> > > > On Tue, Sep 15, 2020 at 10:21 AM Borislav Petkov  wrote:
> >
> > > > > init/calibrate.o: warning: objtool: asan.module_ctor()+0xc: call without frame pointer save/setup
> > > > > init/calibrate.o: warning: objtool: asan.module_dtor()+0xc: call without frame pointer save/setup
> > > > > init/version.o: warning: objtool: asan.module_ctor()+0xc: call without frame pointer save/setup
> > > > > init/version.o: warning: objtool: asan.module_dtor()+0xc: call without frame pointer save/setup
> > > > > certs/system_keyring.o: warning: objtool: asan.module_ctor()+0xc: call without frame pointer save/setup
> > > > > certs/system_keyring.o: warning: objtool: asan.module_dtor()+0xc: call without frame pointer save/setup
> > >
> > > This one also appears with Clang 11. This is new I think because we
> > > started emitting ASAN ctors for globals redzone initialization.
> > >
> > > I think we really do not care about precise stack frames in these
> > > compiler-generated functions. So, would it be reasonable to make
> > > objtool ignore all *san.module_ctor and *san.module_dtor functions (we
> > > have them for ASAN, TSAN, MSAN)?
> >
> > The thing is, if objtool cannot follow, it cannot generate ORC data and
> > our unwinder cannot unwind through the instrumentation, and that is a
> > fail.
> >
> > Or am I missing something here?
>
> They aren't about the actual instrumentation. The warnings are about
> module_ctor/module_dtor functions which are compiler-generated, and
> these are only called on initialization/destruction (dtors only for
> modules I guess).
>
> E.g. for KASAN it's the calls to __asan_register_globals that are
> called from asan.module_ctor. For KCSAN the tsan.module_ctor is
> effectively a noop (because __tsan_init() is a noop), so it really
> doesn't matter much.
>
> Is my assumption correct that the only effect would be if something
> called by them fails, we just don't see the full stack trace? I think
> we can live with that, there are only few central places that deal
> with ctors/dtors (do_ctors(), ...?).
>
> The "real" fix would be to teach the compilers about "frame pointer
> save/setup" for generated functions, but I don't think that's
> realistic.

So this has come up before, specifically in the context of gcov:
https://github.com/ClangBuiltLinux/linux/issues/955.

I looked into this a bit, and IIRC, the issue was that compiler
generated functions aren't very good about keeping track of whether
they should or should not emit framepointer setup/teardown
prolog/epilogs.  In LLVM's IR, -fno-omit-frame-pointer gets attached
to every function as a function level attribute.
https://godbolt.org/z/fcn9c6 ("frame-pointer"="all").

There were some recent LLVM patches for BTI (arm64) that made some BTI
related command line flags module level attributes, which I thought
was interesting; I was wondering last night if -fno-omit-frame-pointer
and maybe even the level of stack protector should be?  I guess LTO
would complicate things; not sure it would be good to merge modules
with different attributes; I'm not sure how that's handled today in
LLVM.

Basically, when the compiler is synthesizing a new function
definition, it should check whether a frame pointer should be emitted
or not.  We could do that today by maybe scanning all other function
definitions for the presence of "frame-pointer"="all" fn attr,
breaking early if we find one, and emitting the frame pointer setup in
that case.  Though I guess it's "frame-pointer"="none" otherwise, so
maybe checking any other fn def would be fine; I don't see any C fn
attr's that allow you to keep frame pointers or not.  What's tricky is
that the front end flag was resolved much earlier than where this code
gets generated, so it would need to look for traces that the flag ever
existed, which sounds brittle on paper to me.


Thanks for the summary -- yeah, that was my suspicion, that some
attribute was being lost somewhere. And I think if we generalize this,
and don't just try to attach "frame-pointer" attr to the function, we
probably also solve the BTI issue that Mark still pointed out with
these module_ctor/dtors.

I was trying to see if there was a generic way to attach all the
common attributes to the function generated here:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/ModuleUtils.cpp#L122
-- but we probably can't attach all attributes, and need to remove a
bunch of them again like the sanitizers (or alternatively just select
the ones we need). But, I'm still digging for the function that
attaches all the common attributes...

Thanks,
-- Marco


Speaking of gcov, do 

Re: [PATCH v2 2/7] Revert "kbuild: disable clang's default use of -fmerge-all-constants"

2020-09-01 Thread Fangrui Song




On 2020-08-31, Nathan Chancellor wrote:

On Mon, Aug 31, 2020 at 05:23:21PM -0700, Nick Desaulniers wrote:

This reverts commit 87e0d4f0f37fb0c8c4aeeac46fff5e957738df79.

This was fixed in clang-6; the minimum supported version of clang in the
kernel is clang-10 (10.0.1).

Link: https://reviews.llvm.org/rL329300.
Link: https://github.com/ClangBuiltLinux/linux/issues/9
Suggested-by: Nathan Chancellor 
Signed-off-by: Nick Desaulniers 


Reviewed-by: Nathan Chancellor 


How about expanding "This was fixed in clang-6" to say that
-fno-merge-all-constants has been the default since clang-6?

(Both gcc|clang -fmerge-all-constants can cause an assertion failure for
the example on https://bugs.llvm.org/show_bug.cgi?id=18538 )

Reviewed-by: Fangrui Song 
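
For context on why -fmerge-all-constants is treated as non-conforming: C requires distinct objects to have distinct addresses, and GCC documents the option as non-conforming for that reason. A minimal sketch of code that relies on this guarantee follows. The table and function names are made up, and this is not the test case from the bug report linked above.

```c
/* Two distinct objects that happen to hold identical contents. */
static const int first_table[3]  = { 1, 2, 3 };
static const int second_table[3] = { 1, 2, 3 };

/*
 * Returns 1 when the two objects live at different addresses, which
 * a conforming implementation guarantees for distinct objects.
 */
int tables_are_distinct(void)
{
	return (const void *)first_table != (const void *)second_table;
}
```

Under the default (conforming) behaviour this returns 1. Code that compares addresses to tell objects apart may break if a compiler is allowed to merge such constants.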


---
 Makefile | 9 -
 1 file changed, 9 deletions(-)

diff --git a/Makefile b/Makefile
index 37739ee53f27..144ac6a073ff 100644
--- a/Makefile
+++ b/Makefile
@@ -932,15 +932,6 @@ KBUILD_CFLAGS += $(call cc-disable-warning, maybe-uninitialized)
 # disable invalid "can't wrap" optimizations for signed / pointers
 KBUILD_CFLAGS  += $(call cc-option,-fno-strict-overflow)

-# clang sets -fmerge-all-constants by default as optimization, but this
-# is non-conforming behavior for C and in fact breaks the kernel, so we
-# need to disable it here generally.
-KBUILD_CFLAGS  += $(call cc-option,-fno-merge-all-constants)
-
-# for gcc -fno-merge-all-constants disables everything, but it is fine
-# to have actual conforming behavior enabled.
-KBUILD_CFLAGS  += $(call cc-option,-fmerge-constants)
-
 # Make sure -fstack-check isn't enabled (like gentoo apparently did)
 KBUILD_CFLAGS  += $(call cc-option,-fno-stack-check,)

--
2.28.0.402.g5ffc5be6b7-goog



--
You received this message because you are subscribed to the Google Groups "Clang 
Built Linux" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clang-built-linux+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/clang-built-linux/20200901045516.GA1561318%40ubuntu-n2-xlarge-x86.


Re: [PATCH v5 23/36] arm/build: Explicitly keep .ARM.attributes sections

2020-08-17 Thread Fangrui Song

On 2020-08-03, 'Nick Desaulniers' via Clang Built Linux wrote:

On Fri, Jul 31, 2020 at 4:18 PM Kees Cook  wrote:


In preparation for adding --orphan-handling=warn, explicitly keep the
.ARM.attributes section by expanding the existing ELF_DETAILS macro into
ARM_DETAILS.

Suggested-by: Nick Desaulniers 
Link: https://lore.kernel.org/lkml/cakwvodk-racgq5pxsogs6vtifbtrk5fmkmnolxrqmaovv0n...@mail.gmail.com/
Signed-off-by: Kees Cook 
---
 arch/arm/include/asm/vmlinux.lds.h | 4 
 arch/arm/kernel/vmlinux-xip.lds.S  | 2 +-
 arch/arm/kernel/vmlinux.lds.S  | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
index a08f4301b718..c4af5182ab48 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -52,6 +52,10 @@
ARM_MMU_DISCARD(*(__ex_table))  \
COMMON_DISCARDS

+#define ARM_DETAILS \
+   ELF_DETAILS \
+   .ARM.attributes 0 : { *(.ARM.attributes) }


I had to look up what the `0` meant:
https://sourceware.org/binutils/docs/ld/Output-Section-Attributes.html#Output-Section-Attributes
mentions it's an "address" and
https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html#SEC21
mentions it as "start" (an address).
Unless we need those, can we drop them? (Sorry for the resulting churn
that would cause).  I think the NO_LOAD stuff makes more sense, but
I'm curious if the kernel checks for that.


NOLOAD means SHT_NOBITS (usually SHF_ALLOC). .ARM.attributes is a
non-SHF_ALLOC section.

An explicit 0 (output section address) is good: GNU ld's internal
linker scripts (ld --verbose output) use 0 for such non-SHF_ALLOC sections.
Without the 0, the section may get a non-zero address, which is not
wrong but does not look good. See https://reviews.llvm.org/D85867
for details.


Reviewed-by: Fangrui Song 


+
 #define ARM_STUBS_TEXT \
*(.gnu.warning) \
*(.glue_7)  \
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index 904c31fa20ed..57fcbf55f913 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -150,7 +150,7 @@ SECTIONS
_end = .;

STABS_DEBUG
-   ELF_DETAILS
+   ARM_DETAILS
 }

 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index bb950c896a67..1d3d3b599635 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -149,7 +149,7 @@ SECTIONS
_end = .;

STABS_DEBUG
-   ELF_DETAILS
+   ARM_DETAILS
 }

 #ifdef CONFIG_STRICT_KERNEL_RWX
--
2.25.1



Re: [PATCH v2] lib/string.c: implement stpcpy

2020-08-15 Thread Fangrui Song

On 2020-08-15, 'Nick Desaulniers' via Clang Built Linux wrote:

On Sat, Aug 15, 2020 at 2:31 PM Joe Perches  wrote:


On Sat, 2020-08-15 at 14:28 -0700, Nick Desaulniers wrote:
> On Sat, Aug 15, 2020 at 2:24 PM Joe Perches  wrote:
> > On Sat, 2020-08-15 at 13:47 -0700, Nick Desaulniers wrote:
> > > On Sat, Aug 15, 2020 at 9:34 AM Kees Cook  wrote:
> > > > On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> > > > > LLVM implemented a recent "libcall optimization" that lowers calls to
> > > > > `sprintf(dest, "%s", str)` where the return value is used to
> > > > > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> > > > > in parsing format strings.  Calling `sprintf` with overlapping arguments
> > > > > was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
> > > > >
> > > > > `stpcpy` is just like `strcpy` except it returns the pointer to the new
> > > > > tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> > > > > one statement.
> > > >
> > > > O_O What?
> > > >
> > > > No; this is a _terrible_ API: there is no bounds checking, there are no
> > > > buffer sizes. Anything using the example sprintf() pattern is _already_
> > > > wrong and must be removed from the kernel. (Yes, I realize that the
> > > > kernel is *filled* with this bad assumption that "I'll never write more
> > > > than PAGE_SIZE bytes to this buffer", but that's both theoretically
> > > > wrong ("640k is enough for anybody") and has been known to be wrong in
> > > > practice too (e.g. when suddenly your writing routine is reachable by
> > > > splice(2) and you may not have a PAGE_SIZE buffer).
> > > >
> > > > But we cannot _add_ another dangerous string API. We're already in a
> > > > terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
> > > > needs to be addressed by removing the unbounded sprintf() uses. (And
> > > > to do so without introducing bugs related to using snprintf() when
> > > > scnprintf() is expected[4].)
> > >
> > > Well, everything (-next, mainline, stable) is broken right now (with
> > > ToT Clang) without providing this symbol.  I'm not going to go clean
> > > the entire kernel's use of sprintf to get our CI back to being green.
> >
> > Maybe this should get placed in compiler-clang.h so it isn't
> > generic and public.
>
> https://bugs.llvm.org/show_bug.cgi?id=47162#c7 and
> https://bugs.llvm.org/show_bug.cgi?id=47144
> Seem to imply that Clang is not the only compiler that can lower a
> sequence of libcalls to stpcpy.  Do we want to wait until we have a
> fire drill w/ GCC to move such an implementation from
> include/linux/compiler-clang.h back in to lib/string.c?

My guess is yes, wait until gcc, if ever, needs it.


The suggestion to use static inline doesn't even make sense. The
compiler is lowering calls to other library routines; `stpcpy` isn't
being explicitly called.  Even if it was, not sure we want it being
inlined.  No symbol definition will be emitted; problem not solved.
And I refuse to add any more code using `extern inline`.  Putting the
definition in lib/string.c is the most straightforward and avoids
revisiting this issue in the future for other toolchains.  I'll limit
access by removing the declaration, and adding a comment to avoid its
use.  But if you're going to use a gnu target triple without using
-ffreestanding because you *want* libcall optimizations, then you have
to provide symbols for all possible library routines!


Adding a definition without a declaration for stpcpy looks good.
Clang LTO will work.

(If the kernel does not want to provide these routines,
is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
probably wrong? (why remove -ffreestanding from the main Makefile) )
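
For readers unfamiliar with stpcpy, here is a minimal user-space sketch of its semantics and of the sprintf lowering described in the quoted commit message. The names sketch_stpcpy and copy_len_via_stpcpy are made up to avoid clashing with libc. This is not the code from the patch, and like stpcpy itself it does no bounds checking.

```c
#include <stddef.h>

/*
 * Copies src (including the terminating NUL) to dest and returns a
 * pointer to the NUL byte written into dest. No bounds checking.
 */
char *sketch_stpcpy(char *dest, const char *src)
{
	while ((*dest = *src++) != '\0')
		dest++;
	return dest;
}

/*
 * The value sprintf(dest, "%s", src) would return, computed the way
 * the libcall optimization does: stpcpy(dest, src) - dest.
 */
size_t copy_len_via_stpcpy(char *dest, const char *src)
{
	return (size_t)(sketch_stpcpy(dest, src) - dest);
}
```

Because the return value points at the new tail, calls can be chained, for example sketch_stpcpy(sketch_stpcpy(buf, "foo"), "bar"), provided buf is large enough for the combined string.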


Re: [PATCH v3] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-21 Thread Fangrui Song

On 2020-07-22, Masahiro Yamada wrote:

On Wed, Jul 22, 2020 at 9:14 AM Fangrui Song  wrote:


On 2020-07-22, Masahiro Yamada wrote:
>On Wed, Jul 22, 2020 at 2:31 AM 'Fangrui Song' via Clang Built Linux
> wrote:
>>
>> When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
>> $(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-elfedit,
>> GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
>> /usr/bin/ and Clang as of 11 will search for both
>> $(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.
>>
>> GCC searches for $(prefix)aarch64-linux-gnu/$version/$needle,
>> $(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
>> $(prefix)aarch64-linux-gnu/$needle rarely contains executables.
>>
>> To better model how GCC's -B/--prefix takes effect in practice, newer
>> Clang (since
>> https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90)
>> only searches for $(prefix)$needle. Currently it will find /usr/bin/as
>> instead of /usr/bin/aarch64-linux-gnu-as.
>>
>> Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
>> (/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
>> appropriate cross compiling GNU as (when -no-integrated-as is in
>> effect).
>>
>> Cc: sta...@vger.kernel.org
>> Reported-by: Nathan Chancellor 
>> Signed-off-by: Fangrui Song 
>> Reviewed-by: Nathan Chancellor 
>> Tested-by: Nathan Chancellor 
>> Tested-by: Nick Desaulniers 
>> Link: https://github.com/ClangBuiltLinux/linux/issues/1099
>> ---
>> Changes in v2:
>> * Updated description to add tags and the llvm-project commit link.
>> * Fixed a typo.
>>
>> Changes in v3:
>> * Add Cc: sta...@vger.kernel.org
>> ---
>>  Makefile | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Makefile b/Makefile
>> index 0b5f8538bde5..3ac83e375b61 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
>>  ifneq ($(CROSS_COMPILE),)
>>  CLANG_FLAGS    += --target=$(notdir $(CROSS_COMPILE:%-=%))
>>  GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
>> -CLANG_FLAGS    += --prefix=$(GCC_TOOLCHAIN_DIR)
>> +CLANG_FLAGS    += --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
>
>
>
>CROSS_COMPILE may contain the directory path
>to the cross toolchains.
>
>
>For example, I use aarch64-linux-gnu-*
>installed in
>/home/masahiro/tools/aarch64-linaro-7.5/bin
>
>
>
>Basically, there are two ways to use it.
>
>[1]
>PATH=$PATH:/home/masahiro/tools/aarch64-linaro-7.5/bin
>CROSS_COMPILE=aarch64-linux-gnu-
>
>
>[2]
>Without setting PATH,
>CROSS_COMPILE=~/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-
>
>
>
>I usually do [2] (and so does intel's 0day bot).
>
>
>
>This patch works for the use-case [1]
>but if I do [2], --prefix is set to a strange path:
>
>--prefix=/home/masahiro/tools/aarch64-linaro-7.5/bin//home/masahiro/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-

Thanks. I did not know the use-case [2].
This explains why there is a `$(notdir ...)` in
`CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))`


>
>
>Interestingly, the build is still successful.
>Presumably Clang searches for more paths
>when $(prefix)$needle is not found ?

The priority order is:

-B(--prefix), COMPILER_PATH, detected gcc-cross paths

(In GCC, -B paths get prepended to the COMPILER_PATH list. Clang<=11 incorrectly
adds -B to the COMPILER_PATH list. I have fixed it for 12.0.0)

If -B fails, the detected gcc-cross paths may still be able to find
/usr/bin/aarch64-linux-gnu-

For example, on my machine (a variant of Debian testing), Clang finds
$(realpath
/usr/lib/gcc-cross/aarch64-linux-gnu/9/../../../../aarch64-linux-gnu/bin/as),
which is /usr/bin/aarch64-linux-gnu-as

>
>I applied your patch and added -v option
>to see which assembler was internally invoked:
>
> "/home/masahiro/tools/aarch64-linaro-7.5/lib/gcc/aarch64-linux-gnu/7.5.0/../../../../aarch64-linux-gnu/bin/as"
>-EL -I ./arch/arm64/include -I ./arch/arm64/include/generated -I
>./include -I ./arch/arm64/include/uapi -I
>./arch/arm64/include/generated/uapi -I ./include/uapi -I
>./include/generated/uapi -o kernel/smp.o /tmp/smp-2ec2c7.s
>
>
>Ok, it looks like Clang found an alternative path
>to the correct 'as'.
>
>
>
>
>But, to keep the original behavior for both [1] and [2],
>how about this?
>
>CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))
>
>
>
>Then, I can get this:
>
> "/home/masah

Re: [PATCH v3] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-21 Thread Fangrui Song

On 2020-07-22, Masahiro Yamada wrote:

On Wed, Jul 22, 2020 at 2:31 AM 'Fangrui Song' via Clang Built Linux
 wrote:


When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-elfedit,
GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
/usr/bin/ and Clang as of 11 will search for both
$(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.

GCC searches for $(prefix)aarch64-linux-gnu/$version/$needle,
$(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
$(prefix)aarch64-linux-gnu/$needle rarely contains executables.

To better model how GCC's -B/--prefix takes effect in practice, newer
Clang (since
https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90)
only searches for $(prefix)$needle. Currently it will find /usr/bin/as
instead of /usr/bin/aarch64-linux-gnu-as.

Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
(/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
appropriate cross compiling GNU as (when -no-integrated-as is in
effect).

Cc: sta...@vger.kernel.org
Reported-by: Nathan Chancellor 
Signed-off-by: Fangrui Song 
Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
---
Changes in v2:
* Updated description to add tags and the llvm-project commit link.
* Fixed a typo.

Changes in v3:
* Add Cc: sta...@vger.kernel.org
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0b5f8538bde5..3ac83e375b61 100644
--- a/Makefile
+++ b/Makefile
@@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS    += --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
-CLANG_FLAGS    += --prefix=$(GCC_TOOLCHAIN_DIR)
+CLANG_FLAGS    += --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)




CROSS_COMPILE may contain the directory path
to the cross toolchains.


For example, I use aarch64-linux-gnu-*
installed in
/home/masahiro/tools/aarch64-linaro-7.5/bin



Basically, there are two ways to use it.

[1]
PATH=$PATH:/home/masahiro/tools/aarch64-linaro-7.5/bin
CROSS_COMPILE=aarch64-linux-gnu-


[2]
Without setting PATH,
CROSS_COMPILE=~/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-



I usually do [2] (and so does intel's 0day bot).



This patch works for the use-case [1]
but if I do [2], --prefix is set to a strange path:

--prefix=/home/masahiro/tools/aarch64-linaro-7.5/bin//home/masahiro/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-


Thanks. I did not know the use-case [2].
This explains why there is a `$(notdir ...)` in
`CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))`





Interestingly, the build is still successful.
Presumably Clang searches for more paths
when $(prefix)$needle is not found ?


The priority order is:

-B(--prefix), COMPILER_PATH, detected gcc-cross paths

(In GCC, -B paths get prepended to the COMPILER_PATH list. Clang<=11 incorrectly
adds -B to the COMPILER_PATH list. I have fixed it for 12.0.0)

If -B fails, the detected gcc-cross paths may still be able to find 
/usr/bin/aarch64-linux-gnu-


For example, on my machine (a variant of Debian testing), Clang finds
$(realpath
/usr/lib/gcc-cross/aarch64-linux-gnu/9/../../../../aarch64-linux-gnu/bin/as),
which is /usr/bin/aarch64-linux-gnu-as



I applied your patch and added -v option
to see which assembler was internally invoked:

"/home/masahiro/tools/aarch64-linaro-7.5/lib/gcc/aarch64-linux-gnu/7.5.0/../../../../aarch64-linux-gnu/bin/as"
-EL -I ./arch/arm64/include -I ./arch/arm64/include/generated -I
./include -I ./arch/arm64/include/uapi -I
./arch/arm64/include/generated/uapi -I ./include/uapi -I
./include/generated/uapi -o kernel/smp.o /tmp/smp-2ec2c7.s


Ok, it looks like Clang found an alternative path
to the correct 'as'.




But, to keep the original behavior for both [1] and [2],
how about this?

CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))



Then, I can get this:

"/home/masahiro/tools/aarch64-linaro-7.5/bin/aarch64-linux-gnu-as"
-EL -I ./arch/arm64/include -I ./arch/arm64/include/generated -I
./include -I ./arch/arm64/include/uapi -I
./arch/arm64/include/generated/uapi -I ./include/uapi -I
./include/generated/uapi -o kernel/smp.o /tmp/smp-16d76f.s


This looks good.

Agreed that `--prefix=$(GCC_TOOLCHAIN_DIR)$(notdir $(CROSS_COMPILE))` should 
work for both [1] and [2].
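
To see why $(notdir ...) helps, here is a rough Python model of the two Makefile expressions. make_dir and make_notdir are hypothetical stand-ins for GNU make's $(dir ...) and $(notdir ...), and the shape of use-case [1] (bare prefix with the toolchain directory on PATH) is an assumption, since only [2] is spelled out in this part of the thread.

```python
def make_dir(path):
    """Mimic GNU make's $(dir ...): keep everything up to the last slash."""
    head, sep, _ = path.rpartition("/")
    return head + sep if sep else "./"


def make_notdir(path):
    """Mimic GNU make's $(notdir ...): keep everything after the last slash."""
    return path.rpartition("/")[2]


def prefixes(cross_compile, elfedit_path):
    """Return (v3 prefix, proposed prefix) for one CROSS_COMPILE value.

    elfedit_path stands in for the output of `which $(CROSS_COMPILE)elfedit`.
    """
    gcc_toolchain_dir = make_dir(elfedit_path)
    v3 = gcc_toolchain_dir + cross_compile
    proposed = gcc_toolchain_dir + make_notdir(cross_compile)
    return v3, proposed


TOOL_DIR = "/home/masahiro/tools/aarch64-linaro-7.5/bin/"

# Assumed shape of use-case [1]: bare prefix, toolchain directory on PATH.
case1 = prefixes("aarch64-linux-gnu-", TOOL_DIR + "aarch64-linux-gnu-elfedit")
# Use-case [2]: CROSS_COMPILE carries the full path.
case2 = prefixes(TOOL_DIR + "aarch64-linux-gnu-", TOOL_DIR + "aarch64-linux-gnu-elfedit")

for label, (v3, proposed) in (("[1]", case1), ("[2]", case2)):
    print(label, "v3:      ", v3)
    print(label, "proposed:", proposed)
```

In this model the v3 expression doubles the directory for [2], while the proposed expression yields the same prefix for both cases.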

Shall I send a v4? Or would you be kind enough to add your Signed-off-by: tag
and fix that up for me? :)





 GCC_TOOLCHAIN  := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
 ifneq ($(GCC_TOOLCHAIN),)
--
2.28.0.rc0.105.gf9edc3c819-goog


[PATCH v3] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-21 Thread Fangrui Song
When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-elfedit,
GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
/usr/bin/ and Clang as of 11 will search for both
$(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.

GCC searches for $(prefix)aarch64-linux-gnu/$version/$needle,
$(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
$(prefix)aarch64-linux-gnu/$needle rarely contains executables.

To better model how GCC's -B/--prefix takes effect in practice, newer
Clang (since
https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90)
only searches for $(prefix)$needle. Currently it will find /usr/bin/as
instead of /usr/bin/aarch64-linux-gnu-as.

Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
(/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
appropriate cross compiling GNU as (when -no-integrated-as is in
effect).

Cc: sta...@vger.kernel.org
Reported-by: Nathan Chancellor 
Signed-off-by: Fangrui Song 
Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
---
Changes in v2:
* Updated description to add tags and the llvm-project commit link.
* Fixed a typo.

Changes in v3:
* Add Cc: sta...@vger.kernel.org
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0b5f8538bde5..3ac83e375b61 100644
--- a/Makefile
+++ b/Makefile
@@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
-CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)
+CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
 GCC_TOOLCHAIN  := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
 ifneq ($(GCC_TOOLCHAIN),)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



[PATCH v2] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-20 Thread Fangrui Song
When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-elfedit,
GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
/usr/bin/ and Clang as of 11 will search for both
$(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.

GCC searches for $(prefix)aarch64-linux-gnu/$version/$needle,
$(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
$(prefix)aarch64-linux-gnu/$needle rarely contains executables.

To better model how GCC's -B/--prefix takes effect in practice, newer
Clang (since
https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90)
only searches for $(prefix)$needle. Currently it will find /usr/bin/as
instead of /usr/bin/aarch64-linux-gnu-as.

Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
(/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
appropriate cross compiling GNU as (when -no-integrated-as is in
effect).

Reported-by: Nathan Chancellor 
Signed-off-by: Fangrui Song 
Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
---
Changes in v2:
* Updated description to add tags and the llvm-project commit link.
* Fixed a typo.
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0b5f8538bde5..3ac83e375b61 100644
--- a/Makefile
+++ b/Makefile
@@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
-CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)
+CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
 GCC_TOOLCHAIN  := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
 ifneq ($(GCC_TOOLCHAIN),)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



Re: [PATCH] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-20 Thread Fangrui Song

On 2020-07-20, Nick Desaulniers wrote:

On Mon, Jul 20, 2020 at 11:16 AM Nathan Chancellor
 wrote:


On Mon, Jul 20, 2020 at 11:12:22AM -0700, Fangrui Song wrote:
> When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
> $(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-,
> GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
> /usr/bin/ and Clang as of 11 will search for both
> $(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.
>
> GCC searchs for $(prefix)aarch64-linux-gnu/$version/$needle,
> $(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
> $(prefix)aarch64-linux-gnu/$needle rarely contains executables.
>
> To better model how GCC's -B/--prefix takes in effect in practice, newer
> Clang only searches for $(prefix)$needle and for example it will find


"newer Clang" requires the reader to recall that "Clang as of 11" was
the previous frame of reference. I think it would be clearer to:
1. call out clang-12 as having a difference in behavior.
2. explicitly link to the commit, i.e.:
Link: https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90


> /usr/bin/as instead of /usr/bin/aarch64-linux-gnu-as.


That's a common source of pain (for example, when cross compiling
without having the proper cross binutils installed, it's common to get
spooky errors about unsupported configs or host binutils not
recognizing flags specific to cross building).

/usr/bin/as: unrecognized option '-EL'

being the most common.  So in that case, I'm actually very happy with
the llvm change if it solves that particularly common pain point.
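
For readers following along, the lookup difference described in the quoted commit message can be modeled roughly as below. This is only a sketch: the function names are made up, the search order for Clang as of 11 is assumed to be the order in which the commit message lists the two candidates, and the "installed" set is an illustrative stand-in for a host that has both a native and a cross assembler.

```python
def candidates(prefix, target, needle, prepend_target):
    """List the paths tried for one -B/--prefix value.

    prepend_target=True models the behaviour the commit message attributes to
    Clang as of 11; prepend_target=False models the newer behaviour.
    """
    paths = []
    if prepend_target:
        paths.append(prefix + target + "-" + needle)
    paths.append(prefix + needle)
    return paths


def first_hit(paths, installed):
    """Return the first candidate that exists in the (illustrative) file set."""
    return next((p for p in paths if p in installed), None)


# Illustrative host: native and cross GNU as both live in /usr/bin.
installed = {"/usr/bin/as", "/usr/bin/aarch64-linux-gnu-as"}
target = "aarch64-linux-gnu"

old_prefix = "/usr/bin/"                    # --prefix=$(GCC_TOOLCHAIN_DIR)
new_prefix = "/usr/bin/aarch64-linux-gnu-"  # --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)

old_clang_old_prefix = first_hit(candidates(old_prefix, target, "as", True), installed)
new_clang_old_prefix = first_hit(candidates(old_prefix, target, "as", False), installed)
old_clang_new_prefix = first_hit(candidates(new_prefix, target, "as", True), installed)
new_clang_new_prefix = first_hit(candidates(new_prefix, target, "as", False), installed)

print(old_clang_old_prefix, new_clang_old_prefix)
print(old_clang_new_prefix, new_clang_new_prefix)
```

Under these assumptions only the combination of newer Clang and the old prefix lands on the host assembler, which is the case that produces the '-EL' error.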


>
> Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
> (/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
> appropriate cross compiling GNU as (when -no-integrated-as is in
> effect).
>
> Signed-off-by: Nathan Chancellor 
> Signed-off-by: Fangrui Song 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1099

Sorry that I did not pay attention before but this needs

Cc: sta...@vger.kernel.org


Agreed.  This change to llvm will blow up all of our CI jobs that
cross compile if not backported to stable.


Thanks. I did not know this.



in the body so that it gets automatically backported into all of our
stable branches. I am not sure if Masahiro is okay with adding that
after the fact or if he will want a v2.

I am fine with having my signed-off-by on the patch but I did not really
do much :) I am fine with having that downgraded to


Not a big deal, but there's only really two cases I can think of where
it's appropriate to attach someone else's "SOB" to a patch:
1. It's their patch that you're resending on their behalf, possibly as
part of a larger series.
2. You're a maintainer, and...well I guess that's also case 1 above.

Reported-by is more appropriate, and you can include the tags
collected from this thread.  Please ping me internally for help
sending a v2, if needed.


Nathan's draft patch on
https://github.com/ClangBuiltLinux/linux/issues/1099 actually works.
I removed the slash. The words are my own. Since Nathan explicitly
requested a downgrade of his tag, I'll do that for V2.

I'll do that anyway because I need to fix a typo in the description:

$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-
$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-elfedit



Reviewed-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 


I tested with this llvm pre- and post-
https://github.com/llvm/llvm-project/commit/3452a0d8c17f7166f479706b293caf6ac76ffd90.
I also tested downstream Android kernel builds with
3452a0d8c17f7166f479706b293caf6ac76ffd90. Builds that don't make use
of CROSS_COMPILE (native host targets) are obviously unaffected.  We
might see this issue pop up a few more times internally if the patch
isn't picked up by stable, or if those downstream kernel trees don't
rebase on stable kernel trees as often as they refresh their
toolchain.

Tested-by: Nick Desaulniers 


Thanks for offering a proofreading service! I'm working on the
description...



if people find it odd.

Thanks for sending this along!

Cheers,
Nathan

> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Makefile b/Makefile
> index 0b5f8538bde5..3ac83e375b61 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep 
clang),)
>  ifneq ($(CROSS_COMPILE),)
>  CLANG_FLAGS  += --target=$(notdir $(CROSS_COMPILE:%-=%))
>  GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
> -CLANG_FLAGS  += --prefix=$(GCC_TOOLCHAIN_DIR)
> +CLANG_FLAGS  += --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
>  GCC_TOOLCHAIN:= $(realpath $(GCC_TOOLCHAIN_DIR)/..)
>  endif
>  ifneq ($(GCC_TOOLCHAIN),)
> --
> 2.28.0.rc0.105.gf9edc3c819-goog
>

--


--
Thanks,
~Nick Desaulniers


[PATCH] Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation

2020-07-20 Thread Fangrui Song
When CROSS_COMPILE is set (e.g. aarch64-linux-gnu-), if
$(CROSS_COMPILE)elfedit is found at /usr/bin/aarch64-linux-gnu-,
GCC_TOOLCHAIN_DIR will be set to /usr/bin/.  --prefix= will be set to
/usr/bin/ and Clang as of 11 will search for both
$(prefix)aarch64-linux-gnu-$needle and $(prefix)$needle.

GCC searches for $(prefix)aarch64-linux-gnu/$version/$needle,
$(prefix)aarch64-linux-gnu/$needle and $(prefix)$needle. In practice,
$(prefix)aarch64-linux-gnu/$needle rarely contains executables.

To better model how GCC's -B/--prefix takes effect in practice, newer
Clang only searches for $(prefix)$needle and for example it will find
/usr/bin/as instead of /usr/bin/aarch64-linux-gnu-as.

Set --prefix= to $(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
(/usr/bin/aarch64-linux-gnu-) so that newer Clang can find the
appropriate cross compiling GNU as (when -no-integrated-as is in
effect).

Signed-off-by: Nathan Chancellor 
Signed-off-by: Fangrui Song 
Link: https://github.com/ClangBuiltLinux/linux/issues/1099
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0b5f8538bde5..3ac83e375b61 100644
--- a/Makefile
+++ b/Makefile
@@ -567,7 +567,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_FLAGS+= --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
-CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)
+CLANG_FLAGS+= --prefix=$(GCC_TOOLCHAIN_DIR)$(CROSS_COMPILE)
 GCC_TOOLCHAIN  := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
 endif
 ifneq ($(GCC_TOOLCHAIN),)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



Re: Plumbers session on GNU+LLVM collab?

2020-07-10 Thread Fangrui Song

On 2020-07-09, 'Nick Desaulniers' via Clang Built Linux wrote:

Hi Segher, Rasmus, and Ramana,
I am working on finalizing a proposal for an LLVM microconference at
plumbers, which is focusing on a lot of issues we currently face on
the LLVM side.

I'd really like to host a session with more GNU toolchain developers
to discuss collaboration more.

I was curious; are either of you planning on attending plumbers this year?

If so, would such a session be interesting enough for you to attend?


Looks like a good idea. I am interested.

Perhaps Tom Stellard, Jeremy Bennett, Nathan Sidwell and Iain Sandoe have some
ideas. They gave a talk about GCC/LLVM collaboration:
https://gcc.gnu.org/wiki/cauldron2019#cauldron2019talks.GCC_LLVM_Collaboration_BoF


I was curious too, who else we should explicitly invite?  I ran a
quick set analysis on who's contributed to both kernel and
, and the list was much much bigger than I was expecting.
https://gist.github.com/nickdesaulniers/5330eea6f46dea93e7766bb03311d474
89 contributors to both linux and llvm
283 linux+gcc
159 linux+binutils
(No one to all four yet...also, not super scientific, since I'm using
name+email for the set, and emails change. Point being I don't want to
explicitly invite hundreds of people)
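
A set analysis of this kind boils down to intersecting author identities. The sketch below is not the script behind the gist; the identities are hypothetical placeholders, and keying on (name, email) pairs reproduces the caveat mentioned above that a changed email makes the same person look like two.

```python
def contributors_to_both(authors_a, authors_b):
    """Intersect two collections of (name, email) identities."""
    return set(authors_a) & set(authors_b)


# Hypothetical placeholder identities, e.g. as collected from `git log` output.
linux_authors = {
    ("Dev A", "dev-a@example.org"),
    ("Dev B", "dev-b@example.org"),
    ("Dev C", "dev-c@old.example.org"),
}
llvm_authors = {
    ("Dev A", "dev-a@example.org"),
    ("Dev C", "dev-c@new.example.org"),  # same person, newer email: not matched
}

both = contributors_to_both(linux_authors, llvm_authors)
print(len(both), sorted(both))
```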


Might be worth sending an email to g...@gcc.gnu.org as well.

This month's archive: https://sourceware.org/pipermail/gcc/2020-July/


Re: [PATCH v3 7/7] x86/boot: Check that there are no runtime relocations

2020-06-30 Thread Fangrui Song

* Ard Biesheuvel

On Tue, 30 Jun 2020 at 01:34, Fangrui Song  wrote:
>
> On 2020-06-29, Ard Biesheuvel wrote:
> >On Mon, 29 Jun 2020 at 19:37, Fangrui Song  wrote:
> >>
> >> On 2020-06-29, Arvind Sankar wrote:
> >> >On Mon, Jun 29, 2020 at 09:20:31AM -0700, Kees Cook wrote:
> >> >> On Mon, Jun 29, 2020 at 06:11:59PM +0200, Ard Biesheuvel wrote:
> >> >> > On Mon, 29 Jun 2020 at 18:09, Kees Cook  wrote:
> >> >> > >
> >> >> > > On Mon, Jun 29, 2020 at 10:09:28AM -0400, Arvind Sankar wrote:
> >> >> > > > Add a linker script check that there are no runtime relocations, 
and
> >> >> > > > remove the old one that tries to check via looking for 
specially-named
> >> >> > > > sections in the object files.
> >> >> > > >
> >> >> > > > Drop the tests for -fPIE compiler option and -pie linker option, 
as they
> >> >> > > > are available in all supported gcc and binutils versions (as well 
as
> >> >> > > > clang and lld).
> >> >> > > >
> >> >> > > > Signed-off-by: Arvind Sankar 
> >> >> > > > Reviewed-by: Ard Biesheuvel 
> >> >> > > > Reviewed-by: Fangrui Song 
> >> >> > > > ---
> >> >> > > >  arch/x86/boot/compressed/Makefile  | 28 
+++---
> >> >> > > >  arch/x86/boot/compressed/vmlinux.lds.S |  8 
> >> >> > > >  2 files changed, 11 insertions(+), 25 deletions(-)
> >> >> > >
> >> >> > > Reviewed-by: Kees Cook 
> >> >> > >
> >> >> > > question below ...
> >> >> > >
> >> >> > > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> > > > index a4a4a59a2628..a78510046eec 100644
> >> >> > > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> > > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> > > > @@ -42,6 +42,12 @@ SECTIONS
> >> >> > > >   *(.rodata.*)
> >> >> > > >   _erodata = . ;
> >> >> > > >   }
> >> >> > > > + .rel.dyn : {
> >> >> > > > + *(.rel.*)
> >> >> > > > + }
> >> >> > > > + .rela.dyn : {
> >> >> > > > + *(.rela.*)
> >> >> > > > + }
> >> >> > > >   .got : {
> >> >> > > >   *(.got)
> >> >> > > >   }
> >> >> > >
> >> >> > > Should these be marked (INFO) as well?
> >> >> > >
> >> >> >
> >> >> > Given that sections marked as (INFO) will still be emitted into the
> >> >> > ELF image, it does not really make a difference to do this for zero
> >> >> > sized sections.
> >> >>
> >> >> Oh, I misunderstood -- I though they were _not_ emitted; I see now what
> >> >> you said was not allocated. So, disk space used for the .got.plt case,
> >> >> but not memory space used. Sorry for the confusion!
> >> >>
> >> >> -Kees
> >>
> >> About output section type (INFO):
> >> 
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html#Output-Section-Type
> >> says "These type names are supported for backward compatibility, and are
> >> rarely used."
> >>
> >> If all input section don't have the SHF_ALLOC flag, the output section
> >> will not have this flag as well. This type is not useful...
> >>
> >> If .got and .got.plt were used, they should be considered dynamic
> >> relocations which should be part of the loadable image. So they should
> >> have the SHF_ALLOC flag. (INFO) will not be applicable anyway.
> >>
> >
> >I don't care deeply either way, but Kees indicated that he would like
> >to get rid of the 24 bytes of .got.plt magic entries that we have no
> >need for.
> >
> >In fact, a lot of this mangling is caused by the fact that the linker
> >is creating a relocatable binary, and assumes that it is a hosted
> >binary that is loaded by a dynamic loader. It would actually be much

Re: [PATCH v3 7/7] x86/boot: Check that there are no runtime relocations

2020-06-29 Thread Fangrui Song

On 2020-06-29, Ard Biesheuvel wrote:

On Mon, 29 Jun 2020 at 19:37, Fangrui Song  wrote:


On 2020-06-29, Arvind Sankar wrote:
>On Mon, Jun 29, 2020 at 09:20:31AM -0700, Kees Cook wrote:
>> On Mon, Jun 29, 2020 at 06:11:59PM +0200, Ard Biesheuvel wrote:
>> > On Mon, 29 Jun 2020 at 18:09, Kees Cook  wrote:
>> > >
>> > > On Mon, Jun 29, 2020 at 10:09:28AM -0400, Arvind Sankar wrote:
>> > > > Add a linker script check that there are no runtime relocations, and
>> > > > remove the old one that tries to check via looking for specially-named
>> > > > sections in the object files.
>> > > >
>> > > > Drop the tests for -fPIE compiler option and -pie linker option, as 
they
>> > > > are available in all supported gcc and binutils versions (as well as
>> > > > clang and lld).
>> > > >
>> > > > Signed-off-by: Arvind Sankar 
>> > > > Reviewed-by: Ard Biesheuvel 
>> > > > Reviewed-by: Fangrui Song 
>> > > > ---
>> > > >  arch/x86/boot/compressed/Makefile  | 28 +++---
>> > > >  arch/x86/boot/compressed/vmlinux.lds.S |  8 
>> > > >  2 files changed, 11 insertions(+), 25 deletions(-)
>> > >
>> > > Reviewed-by: Kees Cook 
>> > >
>> > > question below ...
>> > >
>> > > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
>> > > > index a4a4a59a2628..a78510046eec 100644
>> > > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
>> > > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
>> > > > @@ -42,6 +42,12 @@ SECTIONS
>> > > >   *(.rodata.*)
>> > > >   _erodata = . ;
>> > > >   }
>> > > > + .rel.dyn : {
>> > > > + *(.rel.*)
>> > > > + }
>> > > > + .rela.dyn : {
>> > > > + *(.rela.*)
>> > > > + }
>> > > >   .got : {
>> > > >   *(.got)
>> > > >   }
>> > >
>> > > Should these be marked (INFO) as well?
>> > >
>> >
>> > Given that sections marked as (INFO) will still be emitted into the
>> > ELF image, it does not really make a difference to do this for zero
>> > sized sections.
>>
>> Oh, I misunderstood -- I though they were _not_ emitted; I see now what
>> you said was not allocated. So, disk space used for the .got.plt case,
>> but not memory space used. Sorry for the confusion!
>>
>> -Kees

About output section type (INFO):
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html#Output-Section-Type
says "These type names are supported for backward compatibility, and are
rarely used."

If none of the input sections has the SHF_ALLOC flag, the output section
will not have it either. This type is not useful...

If .got and .got.plt were used, they should be considered dynamic
relocations which should be part of the loadable image. So they should
have the SHF_ALLOC flag. (INFO) will not be applicable anyway.



I don't care deeply either way, but Kees indicated that he would like
to get rid of the 24 bytes of .got.plt magic entries that we have no
need for.

In fact, a lot of this mangling is caused by the fact that the linker
is creating a relocatable binary, and assumes that it is a hosted
binary that is loaded by a dynamic loader. It would actually be much
better if the compiler and linker would take -ffreestanding into
account, and suppress GOT entries, PLTs, dynamic program headers for
shared libraries altogether.


Linkers (GNU ld and LLD) don't create .got or .got.plt just because the linker
command line has -pie or -shared.  They create .got or .got.plt if there are
specific needs.

For .got.plt, if there is (1) any .plt/.iplt entry, (2) any .got.plt based
relocation (e.g. R_X86_64_GOTPC32 on x86-64), or (3) if _GLOBAL_OFFSET_TABLE_ is
referenced, .got.plt will be created (both GNU ld and LLD) with usually 3
entries (for ld.so purposes).

If (1) is not satisfied, the created .got.plt merely serves as an anchor for
things that want to reference it (the distance from the GOT base to some point).
The linker will still reserve 3 words, but the words are likely not needed.
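
As a rough illustration of the three conditions above (not how either linker is actually implemented), the decision and the size of the reserved header can be written down as follows. The function and parameter names are made up for this sketch.

```python
RESERVED_ENTRIES = 3  # the header words normally reserved for ld.so


def needs_got_plt(has_plt_or_iplt_entry, has_gotplt_based_reloc, references_got_symbol):
    """.got.plt is created if any one of the three conditions holds."""
    return has_plt_or_iplt_entry or has_gotplt_based_reloc or references_got_symbol


def reserved_header_bytes(created, word_size):
    """Bytes taken by the reserved entries; word_size is 8 on x86-64, 4 on i386."""
    return RESERVED_ENTRIES * word_size if created else 0


# Only _GLOBAL_OFFSET_TABLE_ is referenced: the section exists purely as an anchor.
anchor_only = needs_got_plt(False, False, True)
print(anchor_only, reserved_header_bytes(anchor_only, 8))

# None of the conditions holds: no .got.plt at all.
unused = needs_got_plt(False, False, False)
print(unused, reserved_header_bytes(unused, 8))
```

With 8-byte words the anchor-only case accounts for the 24 bytes mentioned earlier in the thread.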

I don't think there is a specific need for another option to teach the linker
(GNU ld or LLD) that this is a kernel link.  For -ffreestanding builds, cc
-static (ld -no-pie) and cc -static-pie (ld -pie) already work quite well.


Re: [PATCH v3 7/7] x86/boot: Check that there are no runtime relocations

2020-06-29 Thread Fangrui Song

On 2020-06-29, Arvind Sankar wrote:

On Mon, Jun 29, 2020 at 09:20:31AM -0700, Kees Cook wrote:

On Mon, Jun 29, 2020 at 06:11:59PM +0200, Ard Biesheuvel wrote:
> On Mon, 29 Jun 2020 at 18:09, Kees Cook  wrote:
> >
> > On Mon, Jun 29, 2020 at 10:09:28AM -0400, Arvind Sankar wrote:
> > > Add a linker script check that there are no runtime relocations, and
> > > remove the old one that tries to check via looking for specially-named
> > > sections in the object files.
> > >
> > > Drop the tests for -fPIE compiler option and -pie linker option, as they
> > > are available in all supported gcc and binutils versions (as well as
> > > clang and lld).
> > >
> > > Signed-off-by: Arvind Sankar 
> > > Reviewed-by: Ard Biesheuvel 
> > > Reviewed-by: Fangrui Song 
> > > ---
> > >  arch/x86/boot/compressed/Makefile  | 28 +++---
> > >  arch/x86/boot/compressed/vmlinux.lds.S |  8 
> > >  2 files changed, 11 insertions(+), 25 deletions(-)
> >
> > Reviewed-by: Kees Cook 
> >
> > question below ...
> >
> > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
> > > index a4a4a59a2628..a78510046eec 100644
> > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > > @@ -42,6 +42,12 @@ SECTIONS
> > >   *(.rodata.*)
> > >   _erodata = . ;
> > >   }
> > > + .rel.dyn : {
> > > + *(.rel.*)
> > > + }
> > > + .rela.dyn : {
> > > + *(.rela.*)
> > > + }
> > >   .got : {
> > >   *(.got)
> > >   }
> >
> > Should these be marked (INFO) as well?
> >
>
> Given that sections marked as (INFO) will still be emitted into the
> ELF image, it does not really make a difference to do this for zero
> sized sections.

Oh, I misunderstood -- I thought they were _not_ emitted; I see now what
you said was not allocated. So, disk space used for the .got.plt case,
but not memory space used. Sorry for the confusion!

-Kees


About output section type (INFO):
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html#Output-Section-Type
says "These type names are supported for backward compatibility, and are
rarely used."

If none of the input sections has the SHF_ALLOC flag, the output section
will not have it either. This type is not useful...

If .got and .got.plt were used, they should be considered dynamic
relocations which should be part of the loadable image. So they should
have the SHF_ALLOC flag. (INFO) will not be applicable anyway.

SHT_REL[A] may be allocable or not. Usually .rel[a].dyn and .rel[a].plt
are linker created allocable sections. (INFO) does not make sense for them.
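
The SHF_ALLOC point above can be pictured with a deliberately simplified model in which an output section's flags are the union of its input sections' flags. Real linkers treat several flags specially, so this is only a sketch of the one property under discussion; the flag values are the standard ELF constants, the function name is made up.

```python
# Standard ELF section flag values.
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4


def output_section_flags(input_flags):
    """Simplified model: OR together the flags of all input sections."""
    flags = 0
    for f in input_flags:
        flags |= f
    return flags


# All inputs non-allocable (e.g. .comment-like sections): output is non-allocable.
non_alloc = output_section_flags([0, 0])
# One allocable input is enough to make the output section allocable.
mixed = output_section_flags([0, SHF_ALLOC | SHF_WRITE])

print(bool(non_alloc & SHF_ALLOC), bool(mixed & SHF_ALLOC))
```

In this model an (INFO) marking adds nothing in the first case and would contradict the inputs in the second.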


In the case of the REL[A] and .got sections, they are actually already
not emitted at all into the ELF file now that they are zero size.

For .got.plt, it is only emitted for 32-bit (with the 3 reserved
entries), the 64-bit linker seems to get rid of it.


Re: [PATCH v3 2/9] vmlinux.lds.h: Add .symtab, .strtab, and .shstrtab to STABS_DEBUG

2020-06-24 Thread Fangrui Song



On 2020-06-24, Arvind Sankar wrote:

On Wed, Jun 24, 2020 at 09:16:43AM -0700, Fangrui Song wrote:


On 2020-06-24, Arvind Sankar wrote:
>On Tue, Jun 23, 2020 at 06:49:33PM -0700, Kees Cook wrote:
>> When linking vmlinux with LLD, the synthetic sections .symtab, .strtab,
>> and .shstrtab are listed as orphaned. Add them to the STABS_DEBUG section
>> so there will be no warnings when --orphan-handling=warn is used more
>> widely. (They are added above comment as it is the more common
>
>Nit 1: is "after .comment" better than "above comment"? It's above in the
>sense of higher file offset, but it's below in readelf output.

I mean this order:)

   .comment
   .symtab
   .shstrtab
   .strtab

This is the case in the absence of a linker script if at least one object file
has .comment (mostly for GCC/clang version information) or if the linker is LLD,
which adds a .comment itself.

>Nit 2: These aren't actually debugging sections, no? Is it better to add
>a new macro for it, and is there any plan to stop LLD from warning about
>them?

https://reviews.llvm.org/D75149 ("[ELF] --orphan-handling=: don't warn/error for
unused synthesized sections") describes how .symtab, .shstrtab and .strtab are
treated differently in GNU ld. Since many other GNU ld synthesized sections
(.rela.dyn, .plt, ...) can be renamed or dropped via output section
descriptions, I don't understand why these 3 sections can't be customized.


So IIUC, lld will now warn about .rela.dyn etc only if they're non-empty?


HEAD and future 11.0.0 will not warn about unused synthesized sections
like .rela.dyn

For most synthesized sections, empty = unused.



I created a feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=26168
(If this is supported, warning about orphan .symtab/.strtab/.shstrtab is
consistent behavior.

There may be a 50% chance that the maintainer decides that "LLD diverges".
I would disagree: there is no fundamental problem with
.symtab/.strtab/.shstrtab that makes them special in output section
descriptions or orphan handling.)



.shstrtab is a little special in that it can't be discarded if the ELF
file contains any sections at all. But yeah, there's no reason they
can't be renamed or placed in a custom location in the file.


https://sourceware.org/pipermail/binutils/2020-March/000179.html
proposes -z nosectionheader. With this option, I believe .shstrtab is
not needed. /DISCARD/ : { *(.shstrtab) }  should achieve a similar effect.


Re: [PATCH v3 2/9] vmlinux.lds.h: Add .symtab, .strtab, and .shstrtab to STABS_DEBUG

2020-06-24 Thread Fangrui Song



On 2020-06-24, Arvind Sankar wrote:

On Tue, Jun 23, 2020 at 06:49:33PM -0700, Kees Cook wrote:

When linking vmlinux with LLD, the synthetic sections .symtab, .strtab,
and .shstrtab are listed as orphaned. Add them to the STABS_DEBUG section
so there will be no warnings when --orphan-handling=warn is used more
widely. (They are added above comment as it is the more common


Nit 1: is "after .comment" better than "above comment"? It's above in the
sense of higher file offset, but it's below in readelf output.


I mean this order:)

  .comment
  .symtab
  .shstrtab
  .strtab

This is the case in the absence of a linker script if at least one object file
has .comment (mostly for GCC/clang version information) or if the linker is LLD,
which adds a .comment itself.


Nit 2: These aren't actually debugging sections, no? Is it better to add
a new macro for it, and is there any plan to stop LLD from warning about
them?


https://reviews.llvm.org/D75149 ("[ELF] --orphan-handling=: don't warn/error for
unused synthesized sections") describes how .symtab, .shstrtab and .strtab are
treated differently in GNU ld. Since many other GNU ld synthesized sections
(.rela.dyn, .plt, ...) can be renamed or dropped via output section
descriptions, I don't understand why these 3 sections can't be customized.

I created a feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=26168
(If this is supported, warning about orphan .symtab/.strtab/.shstrtab is
consistent behavior.

There may be a 50% chance that the maintainer decides that "LLD diverges".
I would disagree: there is no fundamental problem with
.symtab/.strtab/.shstrtab that makes them special in output section
descriptions or orphan handling.)


order[1].)

ld.lld: warning: :(.symtab) is being placed in '.symtab'
ld.lld: warning: :(.shstrtab) is being placed in '.shstrtab'
ld.lld: warning: :(.strtab) is being placed in '.strtab'

[1] https://lore.kernel.org/lkml/2020064928.o2a7jkq33guxf...@google.com/

Reported-by: Fangrui Song 
Reviewed-by: Fangrui Song 
Signed-off-by: Kees Cook 
---
 include/asm-generic/vmlinux.lds.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1248a206be8d..8e71757f485b 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -792,7 +792,10 @@
.stab.exclstr 0 : { *(.stab.exclstr) }  \
.stab.index 0 : { *(.stab.index) }  \
.stab.indexstr 0 : { *(.stab.indexstr) }\
-   .comment 0 : { *(.comment) }
+   .comment 0 : { *(.comment) }\
+   .symtab 0 : { *(.symtab) }  \
+   .strtab 0 : { *(.strtab) }  \
+   .shstrtab 0 : { *(.shstrtab) }

 #ifdef CONFIG_GENERIC_BUG
 #define BUG_TABLE  \
--
2.25.1



Re: [PATCH v3 3/9] efi/libstub: Remove .note.gnu.property

2020-06-23 Thread Fangrui Song

On 2020-06-23, Kees Cook wrote:

In preparation for adding --orphan-handling=warn to more architectures,
make sure unwanted sections don't end up appearing under the .init
section prefix that libstub adds to itself during objcopy.

Signed-off-by: Kees Cook 
---
drivers/firmware/efi/libstub/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 75daaf20374e..9d2d2e784bca 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -66,6 +66,9 @@ lib-$(CONFIG_X86) += x86-stub.o
CFLAGS_arm32-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
CFLAGS_arm64-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET)

+# Remove unwanted sections first.
+STUBCOPY_FLAGS-y   += --remove-section=.note.gnu.property
+
#
# For x86, bootloaders like systemd-boot or grub-efi do not zero-initialize the
# .bss section, so the .bss section of the EFI stub needs to be included in the


arch/arm64/Kconfig enables ARM64_PTR_AUTH by default. When the config is on

ifeq ($(CONFIG_ARM64_BTI_KERNEL),y)
branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET_BTI) := -mbranch-protection=pac-ret+leaf+bti
else
branch-prot-flags-$(CONFIG_CC_HAS_BRANCH_PROT_PAC_RET) := -mbranch-protection=pac-ret+leaf
endif

This option creates .note.gnu.property:

% readelf -n drivers/firmware/efi/libstub/efi-stub.o

Displaying notes found in: .note.gnu.property
  Owner    Data size    Description
  GNU      0x0010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: PAC

If .note.gnu.property is not desired in drivers/firmware/efi/libstub, specifying
-mbranch-protection=none can override -mbranch-protection=pac-ret+leaf


Re: [PATCH v2 1/3] vmlinux.lds.h: Add .gnu.version* to DISCARDS

2020-06-22 Thread Fangrui Song

On 2020-06-22, Kees Cook wrote:

On Mon, Jun 22, 2020 at 03:00:43PM -0700, Fangrui Song wrote:

On 2020-06-22, Kees Cook wrote:
> For vmlinux linking, no architecture uses the .gnu.version* section,
> so remove it via the common DISCARDS macro in preparation for adding
> --orphan-handling=warn more widely.
>
> Signed-off-by: Kees Cook 
> ---
> include/asm-generic/vmlinux.lds.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
> index db600ef218d7..6fbe9ed10cdb 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -934,6 +934,7 @@
>*(.discard) \
>*(.discard.*)   \
>*(.modinfo) \
> +  *(.gnu.version*)\
>}
>
> /**
> --
> 2.25.1

I wonder what led to .gnu.version{,_d,_r} sections in the kernel.


This looks like a bug in bfd.ld? There are no versioned symbols in any
of the input files (and no output section either!)

The link command is:
$ ld -m elf_x86_64 --no-ld-generated-unwind-info -z noreloc-overflow -pie \
--no-dynamic-linker   --orphan-handling=warn -T \
arch/x86/boot/compressed/vmlinux.lds \
arch/x86/boot/compressed/kernel_info.o \
arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o \
arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o \
arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o \
arch/x86/boot/compressed/cpuflags.o \
arch/x86/boot/compressed/early_serial_console.o \
arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o \
arch/x86/boot/compressed/mem_encrypt.o \
arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o \
-o arch/x86/boot/compressed/vmlinux

None of the inputs have the section:

$ for i in arch/x86/boot/compressed/kernel_info.o \
arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o \
arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o \
arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o \
arch/x86/boot/compressed/cpuflags.o \
arch/x86/boot/compressed/early_serial_console.o \
arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o \
arch/x86/boot/compressed/mem_encrypt.o \
arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o \
; do echo -n $i": "; readelf -Vs $i | grep 'version'; done
arch/x86/boot/compressed/kernel_info.o: No version information found in this file.
arch/x86/boot/compressed/head_64.o: No version information found in this file.
arch/x86/boot/compressed/misc.o: No version information found in this file.
arch/x86/boot/compressed/string.o: No version information found in this file.
arch/x86/boot/compressed/cmdline.o: No version information found in this file.
arch/x86/boot/compressed/error.o: No version information found in this file.
arch/x86/boot/compressed/piggy.o: No version information found in this file.
arch/x86/boot/compressed/cpuflags.o: No version information found in this file.
arch/x86/boot/compressed/early_serial_console.o: No version information found in this file.
arch/x86/boot/compressed/kaslr.o: No version information found in this file.
arch/x86/boot/compressed/kaslr_64.o: No version information found in this file.
arch/x86/boot/compressed/mem_encrypt.o: No version information found in this file.
arch/x86/boot/compressed/pgtable_64.o: No version information found in this file.
arch/x86/boot/compressed/acpi.o: No version information found in this file.

And it's not in the output:

$ readelf -Vs arch/x86/boot/compressed/vmlinux | grep version
No version information found in this file.

So... for the kernel we need to silence it right now.


Re-link with -M (or -Map file) to check where .gnu.version{,_d,_r} input
sections come from?

If it is a bug, we should probably figure out which version of binutils
has fixed the bug.


Re: [PATCH v2 3/3] x86/boot: Warn on orphan section placement

2020-06-22 Thread Fangrui Song

On 2020-06-22, Kees Cook wrote:

On Mon, Jun 22, 2020 at 03:06:28PM -0700, Fangrui Song wrote:

LLD may report warnings for 3 synthetic sections if they are orphans:

ld.lld: warning: :(.symtab) is being placed in '.symtab'
ld.lld: warning: :(.shstrtab) is being placed in '.shstrtab'
ld.lld: warning: :(.strtab) is being placed in '.strtab'

Are they described?


Perhaps:

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index db600ef218d7..57e9c142e401 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -792,6 +792,9 @@
.stab.exclstr 0 : { *(.stab.exclstr) }  \
.stab.index 0 : { *(.stab.index) }  \
.stab.indexstr 0 : { *(.stab.indexstr) }\
+   .symtab 0 : { *(.symtab) }  \
+   .strtab 0 : { *(.strtab) }  \
+   .shstrtab 0 : { *(.shstrtab) }  \
.comment 0 : { *(.comment) }

#ifdef CONFIG_GENERIC_BUG


This LGTM. Nit: .comment before .symtab is a more common order.

Reviewed-by: Fangrui Song 


Re: [PATCH v2 3/3] x86/boot: Warn on orphan section placement

2020-06-22 Thread Fangrui Song

On 2020-06-22, Kees Cook wrote:

We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Add the common debugging sections. Discard the unused note, rel, plt,
dyn, and hash sections that are not needed in the compressed vmlinux.
Disable .eh_frame generation in the linker and enable orphan section
warnings.

Signed-off-by: Kees Cook 
---
arch/x86/boot/compressed/Makefile  |  3 ++-
arch/x86/boot/compressed/vmlinux.lds.S | 11 +++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 7619742f91c9..646720a05f89 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -48,6 +48,7 @@ GCOV_PROFILE := n
UBSAN_SANITIZE :=n

KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
+KBUILD_LDFLAGS += $(call ld-option,--no-ld-generated-unwind-info)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
ifeq ($(CONFIG_X86_32),y)
@@ -59,7 +60,7 @@ else
KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
&& echo "-z noreloc-overflow -pie --no-dynamic-linker")
endif
-LDFLAGS_vmlinux := -T
+LDFLAGS_vmlinux := --orphan-handling=warn -T

hostprogs   := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index 8f1025d1f681..6fe3ecdfd685 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -75,5 +75,16 @@ SECTIONS
. = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
_end = .;

+   STABS_DEBUG
+   DWARF_DEBUG
+
DISCARDS
+   /DISCARD/ : {
+   *(.note.*)
+   *(.rela.*) *(.rela_*)
+   *(.rel.*) *(.rel_*)
+   *(.plt) *(.plt.*)
+   *(.dyn*)
+   *(.hash) *(.gnu.hash)
+   }
}
--
2.25.1


LLD may report warnings for 3 synthetic sections if they are orphans:

ld.lld: warning: :(.symtab) is being placed in '.symtab'
ld.lld: warning: :(.shstrtab) is being placed in '.shstrtab'
ld.lld: warning: :(.strtab) is being placed in '.strtab'

Are they described?


Re: [PATCH v2 1/3] vmlinux.lds.h: Add .gnu.version* to DISCARDS

2020-06-22 Thread Fangrui Song

On 2020-06-22, Kees Cook wrote:

For vmlinux linking, no architecture uses the .gnu.version* section,
so remove it via the common DISCARDS macro in preparation for adding
--orphan-handling=warn more widely.

Signed-off-by: Kees Cook 
---
include/asm-generic/vmlinux.lds.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index db600ef218d7..6fbe9ed10cdb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -934,6 +934,7 @@
*(.discard) \
*(.discard.*)   \
*(.modinfo) \
+   *(.gnu.version*)\
}

/**
--
2.25.1


I wonder what led to .gnu.version{,_d,_r} sections in the kernel.

tools/lib/bpf/libbpf_internal.h uses `.symver` directive and
-Wl,--version-script, which may lead to .gnu.version{,_d}, but this only
applies to the userspace libbpf.so

libperf.so has a similar -Wl,--version-script.

Linking vmlinux does not appear to use any symbol versioning.
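
One way to double-check whether a linked image really carries any
.gnu.version* sections is to scan its section table. The following is a
minimal sketch in Python, assuming the table comes from `readelf -S -W`;
the helper name sections_with_prefix and the SAMPLE table are hypothetical
and only illustrate the row layout, they are not output from this thread.

```python
import re
from typing import List

# Matches the "[Nr] Name" columns of one row of a `readelf -S -W` section table.
_ROW = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)")


def sections_with_prefix(readelf_output: str, prefix: str) -> List[str]:
    """Return the section names in the table that start with `prefix`."""
    names = []
    for line in readelf_output.splitlines():
        match = _ROW.match(line)
        if match and match.group(1).startswith(prefix):
            names.append(match.group(1))
    return names


# Hypothetical section table, shaped like `readelf -S -W` output.
SAMPLE = """\
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .head.text        PROGBITS        0000000000000000 001000 000200 00  AX  0   0  1
  [ 2] .gnu.version      VERSYM          0000000000000200 001200 000002 02   A  0   0  2
  [ 3] .gnu.version_d    VERDEF          0000000000000208 001208 000038 00   A  0   1  8
"""

if __name__ == "__main__":
    print(sections_with_prefix(SAMPLE, ".gnu.version"))
```

In practice the input would be the captured output of
`readelf -S -W arch/x86/boot/compressed/vmlinux` rather than SAMPLE.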


Re: [kbuild-all] Re: gcc-5: error: -gz is not supported in this configuration

2020-06-10 Thread Fangrui Song

But if that gcc was originally
_configured_ with a version of binutils that doesn't support -gz=zlib,


I agree with this theory :)

On 2020-06-10, Arvind Sankar wrote:

On Tue, Jun 09, 2020 at 11:23:31PM -0400, Arvind Sankar wrote:

On Tue, Jun 09, 2020 at 11:12:25PM -0400, Arvind Sankar wrote:
> The output of gcc-5 -dumpspecs may also be useful.
>
> The exact Kconfig check should have been
>gcc-5 -Werror -gz=zlib -S -x c /dev/null -o /dev/null
>
> I can't see how that would succeed if the a.c test didn't but maybe just
> in case?

Oh wait, -S instead of -c. Which means it runs neither the assembler nor
the linker, so gcc won't error out. But if that gcc was originally
_configured_ with a version of binutils that doesn't support -gz=zlib,
it will give an error on -c regardless of whether the runtime binutils
would actually support it or not.
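
The difference between the two probes can be sketched as follows. This is
only an illustration in Python of the command lines involved, assuming the
Kconfig-style check quoted above; the helpers probe_argv and flag_supported
are hypothetical names, and writing the -c output to a temporary file
(rather than /dev/null) is a choice made for this sketch.

```python
import os
import subprocess
import tempfile
from typing import List


def probe_argv(cc: str, flag: str, stage: str, out: str = "/dev/null") -> List[str]:
    """Build a feature-probe command line.

    stage "-S" stops before the assembler runs; stage "-c" also assembles,
    so a driver that was configured without -gz support gets a chance to fail.
    """
    if stage not in ("-S", "-c"):
        raise ValueError("stage must be -S or -c")
    return [cc, "-Werror", flag, stage, "-x", "c", "/dev/null", "-o", out]


def flag_supported(cc: str, flag: str, stage: str) -> bool:
    """Run the probe and report whether the compiler driver accepted the flag."""
    with tempfile.TemporaryDirectory() as tmp:
        out = "/dev/null" if stage == "-S" else os.path.join(tmp, "probe.o")
        try:
            result = subprocess.run(
                probe_argv(cc, flag, stage, out),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler binary missing or not executable.
            return False
    return result.returncode == 0


if __name__ == "__main__":
    for stage in ("-S", "-c"):
        print(stage, flag_supported("gcc", "-gz=zlib", stage))
```

On a toolchain like the one in the report, the expectation from the
discussion is that the -S probe passes while the -c probe fails.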


I think the below might be better than passing the option via -Wa, since
gcc will translate -gz=zlib into the right assembler option anyway, and
it will also generate an error if the compiler driver was misconfigured
and won't support the option even if the rest of the toolchain does,
fixing the config dependency.

Unless this doesn't work with Clang?


Clang>=6 supports -gz=zlib


Alternatively (or even in addition), we should redefine cc-option to use
-c, it uses -S in the Kconfig version, apparently for speed, but -c in
the Kbuild version.


Unifying cc-option in scripts/Kbuild.include & scripts/Kconfig.include
sounds good.


diff --git a/Makefile b/Makefile
index 839f9fee22cb..cb29e56f227a 100644
--- a/Makefile
+++ b/Makefile
@@ -842,7 +842,7 @@ endif

ifdef CONFIG_DEBUG_INFO_COMPRESSED
DEBUG_CFLAGS+= -gz=zlib
-KBUILD_AFLAGS  += -Wa,--compress-debug-sections=zlib
+KBUILD_AFLAGS  += -gz=zlib
KBUILD_LDFLAGS  += --compress-debug-sections=zlib
endif

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cb98741601bd..94ce36be470c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -229,7 +229,7 @@ config DEBUG_INFO_COMPRESSED
bool "Compressed debugging information"
depends on DEBUG_INFO
depends on $(cc-option,-gz=zlib)
-   depends on $(as-option,-Wa$(comma)--compress-debug-sections=zlib)
+   depends on $(as-option,-gz=zlib)
depends on $(ld-option,--compress-debug-sections=zlib)
help
  Compress the debug information using zlib.  Requires GCC 5.0+ or Clang


This patch looks good.

(clang cc1as only supports (hardcodes) a limited number of -Wa, options: it
parses the options by itself, rather than delegating to GNU as like GCC does.
If there is a compiler driver option, that is usually preferable.)


Re: [kbuild-all] Re: gcc-5: error: -gz is not supported in this configuration

2020-06-09 Thread Fangrui Song

On 2020-06-10, Rong Chen wrote:



On 6/10/20 1:49 AM, Fangrui Song wrote:

On 2020-06-09, Nick Desaulniers wrote:

On Tue, Jun 9, 2020 at 6:12 AM kernel test robot  wrote:


tree: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master

head:   abfbb29297c27e3f101f348dc9e467b0fe70f919
commit: 10e68b02c861ccf2b3adb59d3f0c10dc6b5e3ace Makefile: 
support compressed debug info

date:   12 days ago
config: x86_64-randconfig-r032-20200609 (attached as .config)
compiler: gcc-5 (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
reproduce (this is a W=1 build):
    git checkout 10e68b02c861ccf2b3adb59d3f0c10dc6b5e3ace
    # save the attached .config to linux build tree
    make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):


gcc-5: error: -gz is not supported in this configuration


Hmm...I wonder if the feature detection is incomplete?  I suspect it's
possible to not depend on zlib.


make[2]: *** [scripts/Makefile.build:277: scripts/mod/empty.o] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1169: prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:185: __sub-make] Error 2


The output of gcc-5 -v --version on that machine may help.  The
convoluted gcc_cv_ld_compress_de logic in gcc/configure.ac may be
related, but I can't find any mistake that our
CONFIG_DEBUG_INFO_COMPRESSED conditions may make.


Hi Fangrui,

Here is the output:

$gcc-5 -v --version
Using built-in specs.
COLLECT_GCC=gcc-5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
gcc-5 (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
5.5.0-12ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs 
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-5 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ 
--enable-clocale=gnu --enable-libstdcxx-debug 
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new 
--enable-gnu-unique-object --disable-vtable-verify --enable-libmpx 
--enable-plugin --enable-default-pie --with-system-zlib 
--enable-objc-gc --enable-multiarch --disable-werror 
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 
--enable-multilib --with-tune=generic --enable-checking=release 
--build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu

Thread model: posix
gcc version 5.5.0 20171010 (Ubuntu 5.5.0-12ubuntu1)
COLLECT_GCC_OPTIONS='-v' '--version' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/5/cc1 -quiet -v -imultiarch 
x86_64-linux-gnu help-dummy -quiet -dumpbase help-dummy -mtune=generic 
-march=x86-64 -auxbase help-dummy -version --version 
-fstack-protector-strong -Wformat -Wformat-security -o /tmp/ccqnZumV.s

GNU C11 (Ubuntu 5.5.0-12ubuntu1) version 5.5.0 20171010 (x86_64-linux-gnu)
    compiled by GNU C version 5.5.0 20171010, GMP version 6.1.2, 
MPFR version 4.0.1, MPC version 1.1.0

warning: MPFR header version 4.0.1 differs from library version 4.0.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-v' '--version' '-mtune=generic' '-march=x86-64'
 as -v --64 --version -o /tmp/ccRPgs9J.o /tmp/ccqnZumV.s
GNU assembler version 2.34 (x86_64-linux-gnu) using BFD version (GNU 
Binutils for Ubuntu) 2.34

GNU assembler (GNU Binutils for Ubuntu) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-linux-gnu'.
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/5/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/5/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '--version' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/5/collect2 -plugin 
/usr/lib/gcc/x86_64-linux-gnu/5/liblto_plugin.so 
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper 
-plugin-opt=-fresolution=/tmp/ccJLhs3y.res 
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s 
-plugin-opt=-pass-through=-lc

Re: gcc-5: error: -gz is not supported in this configuration

2020-06-09 Thread Fangrui Song

On 2020-06-09, Nick Desaulniers wrote:

On Tue, Jun 9, 2020 at 6:12 AM kernel test robot  wrote:


tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   abfbb29297c27e3f101f348dc9e467b0fe70f919
commit: 10e68b02c861ccf2b3adb59d3f0c10dc6b5e3ace Makefile: support compressed 
debug info
date:   12 days ago
config: x86_64-randconfig-r032-20200609 (attached as .config)
compiler: gcc-5 (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
reproduce (this is a W=1 build):
git checkout 10e68b02c861ccf2b3adb59d3f0c10dc6b5e3ace
# save the attached .config to linux build tree
make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> gcc-5: error: -gz is not supported in this configuration


Hmm...I wonder if the feature detection is incomplete?  I suspect it's
possible to not depend on zlib.


make[2]: *** [scripts/Makefile.build:277: scripts/mod/empty.o] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1169: prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:185: __sub-make] Error 2


The output of gcc-5 -v --version on that machine may help.  The
convoluted gcc_cv_ld_compress_de logic in gcc/configure.ac may be
related, but I can't find any mistake that our
CONFIG_DEBUG_INFO_COMPRESSED conditions may make.


Re: [PATCH] arm64: disable -fsanitize=shadow-call-stack for big-endian

2020-05-27 Thread Fangrui Song

On 2020-05-27, Arnd Bergmann wrote:

On Wed, May 27, 2020 at 7:28 PM 'Nick Desaulniers' via Clang Built
Linux  wrote:


On Wed, May 27, 2020 at 8:24 AM Mark Rutland  wrote:
>
> On Wed, May 27, 2020 at 03:39:46PM +0200, Arnd Bergmann wrote:
> > clang-11 and earlier do not support -fsanitize=shadow-call-stack
> > in combination with -mbig-endian, but the Kconfig check does not
> > pass the endianness flag, so building a big-endian kernel with
> > this fails at build time:
> >
> > clang: error: unsupported option '-fsanitize=shadow-call-stack' for target 
'aarch64_be-unknown-linux'
> >
> > Change the Kconfig check to let Kconfig figure this out earlier
> > and prevent the broken configuration. I assume this is a bug
> > in clang that needs to be fixed, but we also have to work
> > around existing releases.
> >
> > Fixes: 5287569a790d ("arm64: Implement Shadow Call Stack")
> > Link: https://bugs.llvm.org/show_bug.cgi?id=46076
> > Signed-off-by: Arnd Bergmann 
>
> I suspect this is similar to the patchable-function-entry issue, and
> this is an oversight that we'd rather fix toolchain side.
>
> Nick, Fangrui, thoughts?

Exactly, Fangrui already has a fix: https://reviews.llvm.org/D80647.
Thanks Fangrui!


Ok, great! I had opened the bug first so I could reference it in the
commit changelog, it seems the fix came faster than I managed to
send out the kernel workaround.

Do we still want the kernel workaround anyway to make it work
with older clang versions, or do we expect to fall back to not
use the integrated assembler for the moment?

 Arnd


We can condition it on `CLANG_VERSION >= 11` (assuming Tom (CCed)
is happy, and there is still time, to cherry-pick the two commits from
https://bugs.llvm.org/show_bug.cgi?id=46076 into clang 10.0.1).


Re: [PATCH] arm64: fix clang integrated assembler build

2020-05-27 Thread Fangrui Song



On 2020-05-27, 'Nick Desaulniers' via Clang Built Linux wrote:

On Wed, May 27, 2020 at 7:14 AM Arnd Bergmann  wrote:


clang and gas seem to interpret the symbols in memmove.S and
memset.S differently, such that clang does not make them
'weak' as expected, which leads to a linker error, with both
ld.bfd and ld.lld:

ld.lld: error: duplicate symbol: memmove
>>> defined at common.c
>>>kasan/common.o:(memmove) in archive mm/built-in.a
>>> defined at memmove.o:(__memmove) in archive arch/arm64/lib/lib.a

ld.lld: error: duplicate symbol: memset
>>> defined at common.c
>>>kasan/common.o:(memset) in archive mm/built-in.a
>>> defined at memset.o:(__memset) in archive arch/arm64/lib/lib.a

Copy the exact way these are written in memcpy_64.S, which does
not have the same problem.

I don't know why this makes a difference, and it would be good
to have someone with a better understanding of assembler internals
review it.

It might be either a bug in the kernel or a bug in the assembler,
no idea which one. My patch makes it work with all versions of
clang and gcc, which is probably helpful even if it's a workaround
for a clang bug.


+ Bill, Fangrui, Jian
I think we saw this bug or a very similar bug internally around the
ordering of .weak to .global.


This may be another instance of
https://sourceware.org/pipermail/binutils/2020-March/000299.html
https://lore.kernel.org/linuxppc-dev/20200325164257.170229-1-mask...@google.com/

I haven't checked but there may be both a .globl directive and a .weak
directive


Cc: sta...@vger.kernel.org
Signed-off-by: Arnd Bergmann 
---
---
 arch/arm64/lib/memcpy.S  | 3 +--
 arch/arm64/lib/memmove.S | 3 +--
 arch/arm64/lib/memset.S  | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
index e0bf83d556f2..dc8d2a216a6e 100644
--- a/arch/arm64/lib/memcpy.S
+++ b/arch/arm64/lib/memcpy.S
@@ -56,9 +56,8 @@
stp \reg1, \reg2, [\ptr], \val
.endm

-   .weak memcpy
 SYM_FUNC_START_ALIAS(__memcpy)
-SYM_FUNC_START_PI(memcpy)
+SYM_FUNC_START_WEAK_PI(memcpy)
 #include "copy_template.S"
ret
 SYM_FUNC_END_PI(memcpy)
diff --git a/arch/arm64/lib/memmove.S b/arch/arm64/lib/memmove.S
index 02cda2e33bde..1035dce4bdaf 100644
--- a/arch/arm64/lib/memmove.S
+++ b/arch/arm64/lib/memmove.S
@@ -45,9 +45,8 @@ C_h   .req x12
 D_l .req x13
 D_h .req x14

-   .weak memmove
 SYM_FUNC_START_ALIAS(__memmove)
-SYM_FUNC_START_PI(memmove)
+SYM_FUNC_START_WEAK_PI(memmove)
cmp dstin, src
b.lo __memcpy
add tmp1, src, count
diff --git a/arch/arm64/lib/memset.S b/arch/arm64/lib/memset.S
index 77c3c7ba0084..a9c1c9a01ea9 100644
--- a/arch/arm64/lib/memset.S
+++ b/arch/arm64/lib/memset.S
@@ -42,9 +42,8 @@ dst   .req x8
 tmp3w  .req w9
 tmp3   .req x9

-   .weak memset
 SYM_FUNC_START_ALIAS(__memset)
-SYM_FUNC_START_PI(memset)
+SYM_FUNC_START_WEAK_PI(memset)
mov dst, dstin  /* Preserve return value.  */
and A_lw, val, #255
orr A_lw, A_lw, A_lw, lsl #8
--
2.26.2


--
Thanks,
~Nick Desaulniers



Re: [PATCH v2 4/4] x86/boot: Check that there are no runtime relocations

2020-05-26 Thread Fangrui Song



On 2020-05-26, Arvind Sankar wrote:

On Tue, May 26, 2020 at 08:11:56AM +0200, Ard Biesheuvel wrote:

On Tue, 26 May 2020 at 00:59, Arvind Sankar  wrote:
>  # Compressed kernel should be built as PIE since it may be loaded at any
>  # address by the bootloader.
> -KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
> +KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)

Do we still need -pie linking with these changes applied?



I think it's currently not strictly necessary -- eg the 64bit kernel
doesn't get linked as pie right now with LLD or old binutils. However,
it is safer to do so to ensure that the result remains PIC with future
versions of the linker. There are linker optimizations that can convert
certain PIC instructions when PIE is disabled. While I think they
currently all focus on eliminating indirection through the GOT (and thus
wouldn't be applicable any more),


There are 3 forms described by x86-64 psABI B.2 "Optimize GOTPCRELX Relocations":

(1) movq foo@GOTPCREL(%rip), %reg -> leaq foo(%rip), %reg
(2) call *foo@GOTPCREL(%rip) -> nop; call foo
(3) jmp *foo@GOTPCREL(%rip) -> jmp foo; nop

ld.bfd and gold perform (1) even for R_X86_64_GOTPCREL. LLD requires
R_X86_64_[REX_]GOTPCRELX.
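
To make the three forms concrete, here is a minimal sketch in Python of the
opcode rewriting only. It assumes the usual x86-64 encodings (8b /r for mov,
8d /r for lea, ff /2 and ff /4 for indirect call and jmp, e8 and e9 for the
direct forms, 90 for nop). The function name relax_gotpcrelx is hypothetical,
and the 4-byte displacement is emitted as a zero placeholder because a real
linker recomputes it from the symbol address instead of the GOT slot.

```python
def relax_gotpcrelx(insn: bytes) -> bytes:
    """Rewrite one GOT-indirect instruction into its direct form.

    Expected layout: [REX] opcode modrm disp32, with RIP-relative addressing.
    The returned displacement is a zero placeholder, not a resolved value.
    """
    if len(insn) not in (6, 7):
        raise ValueError("expected [REX] opcode modrm disp32")
    prefix, opcode, modrm = insn[:-6], insn[-6], insn[-5]
    if modrm & 0xC7 != 0x05:
        raise ValueError("not a RIP-relative memory operand")
    disp = bytes(4)

    if opcode == 0x8B:
        # (1) movq foo@GOTPCREL(%rip), %reg -> leaq foo(%rip), %reg
        return prefix + bytes([0x8D, modrm]) + disp
    if opcode == 0xFF and modrm == 0x15 and not prefix:
        # (2) call *foo@GOTPCREL(%rip) -> nop; call foo
        return bytes([0x90, 0xE8]) + disp
    if opcode == 0xFF and modrm == 0x25 and not prefix:
        # (3) jmp *foo@GOTPCREL(%rip) -> jmp foo; nop
        return bytes([0xE9]) + disp + bytes([0x90])
    raise ValueError("instruction is not one of the three relaxable forms")


if __name__ == "__main__":
    for text in ("488b0500000000", "ff1500000000", "ff2500000000"):
        print(text, "->", relax_gotpcrelx(bytes.fromhex(text)).hex())
```

Each rewrite keeps the instruction length unchanged, which is why forms (2)
and (3) need the extra nop byte.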


it's easy to imagine that they could
get extended to, for eg, convert
leaq foo(%rip), %rax
to
movl $foo, %eax
with some nop padding, etc.


Not with NOP padding, but probably with instruction prefixes. It is
unclear whether the rewriting would be beneficial. Rewriting instructions
definitely requires a dedicated relocation type like
R_X86_64_[REX_]GOTPCRELX.


Also, the relocation check that's being added here would only work with
PIE linking.


Re: [PATCH 0/4] x86/boot: Remove runtime relocations from compressed kernel

2020-05-25 Thread Fangrui Song

On 2020-05-24, Arvind Sankar wrote:

The compressed kernel currently contains bogus runtime relocations in
the startup code in head_{32,64}.S, which are generated by the linker,
but must not actually be processed at runtime.

This generates warnings when linking with the BFD linker, and errors
with LLD, which defaults to erroring on runtime relocations in read-only
sections. It also requires the -z noreloc-overflow hack for the 64-bit
kernel, which prevents us from linking it as -pie on an older BFD linker
(<= 2.26) or on LLD, because the locations that are to be apparently
relocated are only 32-bits in size and so cannot normally have
R_X86_64_RELATIVE relocations.

This series aims to get rid of these relocations. It is based on
efi/next (efi-changes-for-v5.8), where the latest patches touch the
head code to eliminate the global offset table.

The first patch is an independent fix for LLD, to avoid an orphan
section in arch/x86/boot/setup.elf [0].

The second patch gets rid of almost all the relocations. It uses
standard PIC addressing technique for 32-bit, i.e. loading a register
with the address of _GLOBAL_OFFSET_TABLE_ and then using GOTOFF
references to access variables. For 64-bit, there is 32-bit code that
cannot use RIP-relative addressing, and also cannot use the 32-bit
method, since GOTOFF references are 64-bit only. This is instead handled
using a macro to replace a reference like gdt with (gdt-startup_32)
instead. The assembler will generate a PC32 relocation entry, with
addend set to (.-startup_32), and these will be replaced with constants
at link time. This works as long as all the code using such references
lives in the same section as startup_32, i.e. in .head.text.
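
The arithmetic behind the (gdt-startup_32) trick can be sketched in a few
lines of Python. All addresses below are made up for illustration, and the
function names are hypothetical; the point is only that the displacement is
a link-time constant, while the run-time address comes from a base register.

```python
def link_offset(sym_link_addr: int, startup_32_link_addr: int = 0) -> int:
    """Displacement of a symbol from startup_32, fixed at link time."""
    return sym_link_addr - startup_32_link_addr


def runtime_address(load_base: int, sym_link_addr: int,
                    startup_32_link_addr: int = 0) -> int:
    """Address the code computes at run time: base register plus displacement.

    load_base stands for the run-time address of startup_32 held in a register.
    """
    return load_base + link_offset(sym_link_addr, startup_32_link_addr)


if __name__ == "__main__":
    GDT_LINK_ADDR = 0x1A0  # hypothetical link address of gdt
    for base in (0x100000, 0x1000000):  # two hypothetical load addresses
        # The displacement is identical for every load address, so no
        # run-time relocation entry is needed for the reference.
        print(hex(base), hex(link_offset(GDT_LINK_ADDR)),
              hex(runtime_address(base, GDT_LINK_ADDR)))
```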

The third patch addresses a remaining issue with the BFD linker, which
insists on generating runtime relocations for absolute symbols. We use
z_input_len and z_output_len, defined in the generated piggy.S file, as
symbols whose absolute "addresses" are actually the size of the
compressed payload and the size of the decompressed kernel image
respectively. LLD does not generate relocations for these two symbols,
but the BFD linker does. To get around this, piggy.S is extended to also
define two u32 variables (in .rodata) with the lengths, and the head
code is modified to use those instead of the symbol addresses.

An alternative way to handle z_input_len/z_output_len would be to just
include piggy.S in head_{32,64}.S instead of as a separate object file,
since the GNU assembler doesn't generate relocations for symbols set to
constants.

The last patch adds a check in the linker script to ensure that no
runtime relocations get reintroduced. Since the GOT has been eliminated
as well, the compressed kernel has no runtime relocations whatsoever any
more.

[0] https://lore.kernel.org/lkml/20200521152459.558081-1-nived...@alum.mit.edu/

Arvind Sankar (4):
 x86/boot: Add .text.startup to setup.ld
 x86/boot: Remove runtime relocations from .head.text code
 x86/boot: Remove runtime relocations from head_{32,64}.S
 x86/boot: Check that there are no runtime relocations

arch/x86/boot/compressed/Makefile  | 36 +-
arch/x86/boot/compressed/head_32.S | 59 +++
arch/x86/boot/compressed/head_64.S | 99 +++---
arch/x86/boot/compressed/mkpiggy.c |  6 ++
arch/x86/boot/compressed/vmlinux.lds.S | 11 +++
arch/x86/boot/setup.ld |  2 +-
6 files changed, 109 insertions(+), 104 deletions(-)

--
2.26.2


All 4 commits look good.

Reviewed-by: Fangrui Song 


Re: [PATCH 4/4] x86/boot: Check that there are no runtime relocations

2020-05-25 Thread Fangrui Song

On 2020-05-25, Ard Biesheuvel wrote:

On Sun, 24 May 2020 at 23:28, Arvind Sankar  wrote:


Add a linker script check that there are no runtime relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.

Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).

Signed-off-by: Arvind Sankar 
---
 arch/x86/boot/compressed/Makefile  | 28 +++---
 arch/x86/boot/compressed/vmlinux.lds.S | 11 ++
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d3e882e855ee..679a2b383bfe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 
vmlinux.bin.lzma \
vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4

 KBUILD_CFLAGS := -m$(BITS) -O2
-KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
+KBUILD_CFLAGS += -fno-strict-aliasing -fPIE
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
 cflags-$(CONFIG_X86_32) := -march=i386
 cflags-$(CONFIG_X86_64) := -mcmodel=small
@@ -49,7 +49,7 @@ UBSAN_SANITIZE :=n
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
 # Compressed kernel should be built as PIE since it may be loaded at any
 # address by the bootloader.
-KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
+KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
 LDFLAGS_vmlinux := -T

 hostprogs  := mkpiggy
@@ -84,30 +84,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o

-# The compressed kernel is built with -fPIC/-fPIE so that a boot loader
-# can place it anywhere in memory and it will still run. However, since
-# it is executed as-is without any ELF relocation processing performed
-# (and has already had all relocation sections stripped from the binary),
-# none of the code can use data relocations (e.g. static assignments of
-# pointer values), since they will be meaningless at runtime. This check
-# will refuse to link the vmlinux if any of these relocations are found.
-quiet_cmd_check_data_rel = DATAREL $@
-define cmd_check_data_rel
-   for obj in $(filter %.o,$^); do \
-   $(READELF) -S $$obj | grep -qF .rel.local && { \
-   echo "error: $$obj has data relocations!" >&2; \
-   exit 1; \
-   } || true; \
-   done
-endef
-
-# We need to run two commands under "if_changed", so merge them into a
-# single invocation.
-quiet_cmd_check-and-link-vmlinux = LD  $@
-  cmd_check-and-link-vmlinux = $(cmd_check_data_rel); $(cmd_ld)
-
 $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
-   $(call if_changed,check-and-link-vmlinux)
+   $(call if_changed,ld)

 OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index d826ab38a8f9..0ac14feacb24 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -11,9 +11,15 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
 #ifdef CONFIG_X86_64
 OUTPUT_ARCH(i386:x86-64)
 ENTRY(startup_64)
+
+#define REL .rela
+
 #else
 OUTPUT_ARCH(i386)
 ENTRY(startup_32)
+
+#define REL .rel
+
 #endif

 SECTIONS
@@ -42,6 +48,9 @@ SECTIONS
*(.rodata.*)
_erodata = . ;
}
+   REL.dyn : {
+   *(REL.*)
+   }


Do we really need the macro here? Could we just do


The output section name does not matter: it will be discarded by the linker.


.rel.dyn : { *(.rel.* .rela.*) }


If for some reason there is at least one SHT_REL and at least one
SHT_RELA, LLD will error "section type mismatch for .rel.dyn", while the
intended diagnostic is the assert below.


(or even

.rel.dyn  : { *(.rel.*) }
.rela.dyn : { *(.rela.*) }

if the output section name matters, and always assert that both are empty)?


  .rel.dyn  : { *(.rel.*) }
  .rela.dyn : { *(.rela.*) }

looks good to me.


FWIW I intend to add -z rel and -z rela to LLD: 
https://reviews.llvm.org/D80496#inline-738804
(binutils thread https://sourceware.org/pipermail/binutils/2020-May/111244.html)

In case someone builds the x86-64 kernel with -z rel, your suggested
input section description will work out of the box...



.got : {
*(.got)
}
@@ -83,3 +92,5 @@ ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT en
 #else
 ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
 #endif
+
+ASSERT(SIZEOF(REL.dyn) == 0, "Unexpected runtime relocations detected!")
--
2.26.2



Re: [PATCH 2/4] x86/boot: Remove runtime relocations from .head.text code

2020-05-24 Thread Fangrui Song



On 2020-05-24, Arvind Sankar wrote:

On Sun, May 24, 2020 at 03:53:59PM -0700, Fangrui Song wrote:

On 2020-05-24, Arvind Sankar wrote:
>The assembly code in head_{32,64}.S, while meant to be
>position-independent, generates run-time relocations because it uses
>instructions such as
>lealgdt(%edx), %eax
>which make the assembler and linker think that the code is using %edx as
>an index into gdt, and hence gdt needs to be relocated to its run-time
>address.
>
>With the BFD linker, this generates a warning during the build:
>  LD  arch/x86/boot/compressed/vmlinux
>ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only 
section `.head.text'
>ld: warning: creating a DT_TEXTREL in object

Interesting. How does the build generate a warning by default?
Do you use Gentoo Linux which appears to ship a --warn-shared-textrel
enabled-by-default patch? (https://bugs.gentoo.org/700488)


Ah, yes I am using gentoo. I didn't realize that was a distro
modification.


>+
>+/*
>+ * This macro gives the link address of X. It's the same as X, since 
startup_32
>+ * has link address 0, but defining it this way tells the assembler/linker 
that
>+ * we want the link address, and not the run-time address of X. This prevents
>+ * the linker from creating a run-time relocation entry for this reference.
>+ * The macro should be used as a displacement with a base register containing
>+ * the run-time address of startup_32 [i.e. la(X)(%reg)], or as an
>+ * immediate [$ la(X)].
>+ *
>+ * This macro can only be used from within the .head.text section, since the
>+ * expression requires startup_32 to be in the same section as the code being
>+ * assembled.
>+ */
>+#define la(X) ((X) - startup_32)
>+

IIRC, %ebp contains the address of startup_32. la(X) references X
relative to startup_32. The fixup (in GNU as and clang integrated
assembler's term) is a constant which is resolved by the assembler.

There is no R_386_32 or R_386_PC32 for the linker to resolve.


This is incorrect (or maybe I'm not understanding you correctly). X is a
symbol whose final location relative to startup_32 is in most cases not
known to the assembler (there are a couple of cases where X is a label
within .head.text which do get completely resolved by the assembler).

For example, taking the instruction loading the gdt address, this is
what we get from the assembler:
    lea     la(gdt)(%ebp), %eax
becomes in the object file:
  11:   8d 85 00 00 00 00       lea    0x0(%ebp),%eax
                        13: R_X86_64_PC32       .data+0x23
or a cleaner example using a global symbol:
    subl    la(image_offset)(%ebp), %ebx
becomes
  41:   2b 9d 00 00 00 00       sub    0x0(%ebp),%ebx
                        43: R_X86_64_PC32       image_offset+0x43

So in general you get PC32 relocations, with the addend being set by the
assembler to .-startup_32, modulo the adjustment for where within the
instruction the displacement needs to be. These relocations are resolved
by the static linker to produce constants in the final executable.



You are right. I am not familiar with the code and only looked at 1b.

I just preprocessed head_64.S and verified that many target symbols are in
.data and .pgtable. The assembler converts an expression
`foo - symbol_defined_in_same_section` to `foo - . + offset`, which can be
encoded as an R_X86_64_PC32 (or it resolves the fixup itself if the result
is a constant, e.g. `1b - startup_32`).
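
The arithmetic can be sketched in Python. This is only an illustration
under assumed numbers: the helper names and the addresses are made up,
and the small adjustment for where the displacement sits inside the
instruction is ignored. The only thing taken as given is the usual
PC-relative relocation formula S + A - P.

```python
# Sketch of how a PC32 relocation ends up encoding `X - startup_32`.
def pc32_value(S, A, P):
    """ELF PC-relative relocation: S + A - P, truncated to 32 bits."""
    return (S + A - P) & 0xFFFFFFFF


def la_via_pc32(sym_addr, head_text_base, startup_off, place_off):
    # The assembler only knows offsets inside .head.text, so it picks
    # addend = place - startup_32 (both live in the same section).
    addend = place_off - startup_off
    place_addr = head_text_base + place_off
    return pc32_value(sym_addr, addend, place_addr)


# Hypothetical link-time address of the target symbol.
image_offset_addr = 0x5000

# Displacement at offset 0x43 gets addend 0x43, as in the objdump output.
at_43 = la_via_pc32(image_offset_addr, 0x0, 0x0, 0x43)
# A reference from a different place in .head.text yields the same constant.
at_13 = la_via_pc32(image_offset_addr, 0x0, 0x0, 0x13)
print(hex(at_43), hex(at_13))
```

Wherever the reference sits in .head.text, the linker writes the same
value, namely the distance from startup_32 to the symbol.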



I'm not sure that stating "since startup_32 has link address 0" is
appropriate here (probably because I haven't seen the term "link address"
before). If my understanding above is correct, I think you can just
reword the comment to express that X is referenced relative to
startup_32, which is stored in %ebp.



Yeah, the more standard term is virtual address/VMA, but that sounds
confusing to me with PIE code since the _actual_ virtual address at
which this code is going to run isn't 0, that's just the address assumed
for linking. I can reword it to avoid referencing "link address" but
then it's not obvious why the macro is named "la" :) I'm open to
suggestions on a better name, I could use offset but that's a bit
long-winded. I could just use vma() if nobody else finds it confusing.

Thanks.


With your approach, the important property is that "the distance between
startup_32 and the target symbol is a constant", not that "startup_32
has link address 0". `ra`, `rva`, `rvma`, or any other term which expresses
"relative" should work. Hope someone can come up with a good suggestion :)
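
As a quick sanity check of that property, here is a toy Python model.
The link-time addresses and load bases are invented for illustration;
only the shape of the la() macro comes from the patch.

```python
# Toy model: la(X) is a link-time constant, so base register + la(X)
# gives X's run-time address wherever the image is loaded.
LINK_ADDR = {"startup_32": 0x0, "gdt": 0x4200}  # made-up link-time layout


def la(sym):
    """Mirror of `#define la(X) ((X) - startup_32)`."""
    return LINK_ADDR[sym] - LINK_ADDR["startup_32"]


def runtime_addr(sym, load_base):
    """Address of sym once the image is loaded at load_base."""
    return load_base + LINK_ADDR[sym]


for load_base in (0x100000, 0x1000000, 0x7F000000):
    ebp = runtime_addr("startup_32", load_base)  # what the code keeps in %ebp
    assert ebp + la("gdt") == runtime_addr("gdt", load_base)

print(hex(la("gdt")))
```

The check would pass just as well if startup_32 had a non-zero link
address, which is why a name expressing "relative" seems to fit.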


Re: [PATCH 1/4] x86/boot: Add .text.startup to setup.ld

2020-05-24 Thread Fangrui Song

On 2020-05-24, Arvind Sankar wrote:

On Sun, May 24, 2020 at 03:13:26PM -0700, Fangrui Song wrote:

On 2020-05-24, Arvind Sankar wrote:
>gcc puts the main function into .text.startup when compiled with -Os (or
>-O2). This results in arch/x86/boot/main.c having a .text.startup
>section which is currently not included explicitly in the linker script
>setup.ld in the same directory.
>
>The BFD linker places this orphan section immediately after .text, so
>this still works. However, LLD git, since [1], is choosing to place it
>immediately after the .bstext section instead (this is the first code
>section). This plays havoc with the section layout that setup.elf
>requires to create the setup header, for eg on 64-bit:
>
>LD  arch/x86/boot/setup.elf
>  ld.lld: error: section .text.startup file range overlaps with .header
>  >>> .text.startup range is [0x200040, 0x2001FE]
>  >>> .header range is [0x2001EF, 0x20026B]
>
>  ld.lld: error: section .header file range overlaps with .bsdata
>  >>> .header range is [0x2001EF, 0x20026B]
>  >>> .bsdata range is [0x2001FF, 0x200398]
>
>  ld.lld: error: section .bsdata file range overlaps with .entrytext
>  >>> .bsdata range is [0x2001FF, 0x200398]
>  >>> .entrytext range is [0x20026C, 0x2002D3]
>
>  ld.lld: error: section .text.startup virtual address range overlaps
>  with .header
>  >>> .text.startup range is [0x40, 0x1FE]
>  >>> .header range is [0x1EF, 0x26B]
>
>  ld.lld: error: section .header virtual address range overlaps with
>  .bsdata
>  >>> .header range is [0x1EF, 0x26B]
>  >>> .bsdata range is [0x1FF, 0x398]
>
>  ld.lld: error: section .bsdata virtual address range overlaps with
>  .entrytext
>  >>> .bsdata range is [0x1FF, 0x398]
>  >>> .entrytext range is [0x26C, 0x2D3]
>
>  ld.lld: error: section .text.startup load address range overlaps with
>  .header
>  >>> .text.startup range is [0x40, 0x1FE]
>  >>> .header range is [0x1EF, 0x26B]
>
>  ld.lld: error: section .header load address range overlaps with
>  .bsdata
>  >>> .header range is [0x1EF, 0x26B]
>  >>> .bsdata range is [0x1FF, 0x398]
>
>  ld.lld: error: section .bsdata load address range overlaps with
>  .entrytext
>  >>> .bsdata range is [0x1FF, 0x398]
>  >>> .entrytext range is [0x26C, 0x2D3]
>
>Explicitly pull .text.startup into the .text output section to avoid
>this.
>
>[1] https://reviews.llvm.org/D75225
>
>Signed-off-by: Arvind Sankar 
>Reviewed-by: Fangrui Song 
>---
> arch/x86/boot/setup.ld | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
>index 24c95522f231..ed60abcdb089 100644
>--- a/arch/x86/boot/setup.ld
>+++ b/arch/x86/boot/setup.ld
>@@ -20,7 +20,7 @@ SECTIONS
>.initdata   : { *(.initdata) }
>__end_init = .;
>
>-   .text   : { *(.text) }
>+   .text   : { *(.text.startup) *(.text) }
>.text32 : { *(.text32) }
>
>. = ALIGN(16);
>--
>2.26.2

Should .text.startup* be used instead? If -ffunction-sections is used,

// a.c
int main() {}

gcc -O2 a.c # .text.startup
gcc -Os a.c # .text.startup

gcc -O2 -ffunction-sections a.c # .text.startup.main
gcc -Os -ffunction-sections a.c # .text.startup.main


It's probably unlikely we'll use function-sections on the setup code,
but *(.text.*) might be more future-proof, since gcc/clang might grow
the ability to stick code into .text.hot or .text.unlikely etc
automatically.


*(.text.*) looks good to me. When you send PATCH v2, feel free to add

(There is indeed no guarantee that future clang FDO will not place it in
.text.unknown. :)

Reviewed-by: Fangrui Song 
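
To compare the candidate input section descriptions, here is a rough
Python sketch. It assumes that Python's fnmatchcase is a close enough
stand-in for linker-script wildcards for these simple patterns; it is
not the matcher either linker uses, and the list of section names is
just the ones mentioned in this thread.

```python
from fnmatch import fnmatchcase

# Section names mentioned in the thread (the last two are hypothetical
# future placements, not something the setup code has today).
sections = [".text", ".text.startup", ".text.startup.main",
            ".text.hot", ".text.unlikely"]


def matched(patterns):
    """Return the section names matched by any of the given patterns."""
    return [s for s in sections if any(fnmatchcase(s, p) for p in patterns)]


print(matched([".text.startup", ".text"]))   # the v1 patch
print(matched([".text.startup*", ".text"]))  # also covers -ffunction-sections
print(matched([".text.*", ".text"]))         # covers everything listed
```

Note that `.text.*` alone does not match plain `.text`, so `*(.text)`
still has to be listed next to it.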



-

In case anyone wants to CC a GCC dev for the citation that
  main compiles to `.text.startup` in -Os or -O2 mode, I have a small request
  that `.text.startup.` probably makes more sense. See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

I made an llvm change recently https://reviews.llvm.org/D79600


Re: [PATCH 4/4] x86/boot: Check that there are no runtime relocations

2020-05-24 Thread Fangrui Song

On 2020-05-24, Arvind Sankar wrote:

Add a linker script check that there are no runtime relocations, and
remove the old one that tries to check via looking for specially-named
sections in the object files.

Drop the tests for -fPIE compiler option and -pie linker option, as they
are available in all supported gcc and binutils versions (as well as
clang and lld).

Signed-off-by: Arvind Sankar 
---
arch/x86/boot/compressed/Makefile  | 28 +++---
arch/x86/boot/compressed/vmlinux.lds.S | 11 ++
2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index d3e882e855ee..679a2b383bfe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -27,7 +27,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 
vmlinux.bin.lzma \
vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4

KBUILD_CFLAGS := -m$(BITS) -O2
-KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
+KBUILD_CFLAGS += -fno-strict-aliasing -fPIE
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
@@ -49,7 +49,7 @@ UBSAN_SANITIZE :=n
KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
# Compressed kernel should be built as PIE since it may be loaded at any
# address by the bootloader.
-KBUILD_LDFLAGS += $(call ld-option, -pie) $(call ld-option, 
--no-dynamic-linker)
+KBUILD_LDFLAGS += -pie $(call ld-option, --no-dynamic-linker)
LDFLAGS_vmlinux := -T

hostprogs   := mkpiggy
@@ -84,30 +84,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o

-# The compressed kernel is built with -fPIC/-fPIE so that a boot loader
-# can place it anywhere in memory and it will still run. However, since
-# it is executed as-is without any ELF relocation processing performed
-# (and has already had all relocation sections stripped from the binary),
-# none of the code can use data relocations (e.g. static assignments of
-# pointer values), since they will be meaningless at runtime. This check
-# will refuse to link the vmlinux if any of these relocations are found.
-quiet_cmd_check_data_rel = DATAREL $@
-define cmd_check_data_rel
-   for obj in $(filter %.o,$^); do \
-   $(READELF) -S $$obj | grep -qF .rel.local && { \
-   echo "error: $$obj has data relocations!" >&2; \
-   exit 1; \
-   } || true; \
-   done
-endef
-
-# We need to run two commands under "if_changed", so merge them into a
-# single invocation.
-quiet_cmd_check-and-link-vmlinux = LD  $@
-  cmd_check-and-link-vmlinux = $(cmd_check_data_rel); $(cmd_ld)
-
$(obj)/vmlinux: $(vmlinux-objs-y) FORCE
-   $(call if_changed,check-and-link-vmlinux)
+   $(call if_changed,ld)

OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
$(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index d826ab38a8f9..0ac14feacb24 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -11,9 +11,15 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
#ifdef CONFIG_X86_64
OUTPUT_ARCH(i386:x86-64)
ENTRY(startup_64)
+
+#define REL .rela
+
#else
OUTPUT_ARCH(i386)
ENTRY(startup_32)
+
+#define REL .rel
+
#endif

SECTIONS
@@ -42,6 +48,9 @@ SECTIONS
*(.rodata.*)
_erodata = . ;
}
+   REL.dyn : {
+   *(REL.*)
+   }
.got : {
*(.got)
}
@@ -83,3 +92,5 @@ ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, 
"Unexpected GOT/PLT en
#else
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries 
detected!")
#endif
+
+ASSERT(SIZEOF(REL.dyn) == 0, "Unexpected runtime relocations detected!")
--
2.26.2



`grep -qF .rel.local` from 98f78525371b55ccd1c480207ce10296c72fa340
may be incorrect. None of these synthesized dynamic relocation sections is
called *.rel.local* ...
(it probably wanted to name .rel.data.rel.ro or .rel.data)


Reviewed-by: Fangrui Song 


Re: [PATCH 3/4] x86/boot: Remove runtime relocations from head_{32,64}.S

2020-05-24 Thread Fangrui Song
% cat a.s
pushl $z_input_len
% cat b.s
.globl z_input_len
z_input_len = 0xb612
% gcc -m32 -c a.s b.s
% ld.bfd -m elf_i386 -pie a.o b.o  # has an incorrect R_386_RELATIVE before binutils 2.35


Reviewed-by: Fangrui Song 


Re: [PATCH 2/4] x86/boot: Remove runtime relocations from .head.text code

2020-05-24 Thread Fangrui Song

On 2020-05-24, Arvind Sankar wrote:

The assembly code in head_{32,64}.S, while meant to be
position-independent, generates run-time relocations because it uses
instructions such as
    leal gdt(%edx), %eax
which make the assembler and linker think that the code is using %edx as
an index into gdt, and hence gdt needs to be relocated to its run-time
address.

With the BFD linker, this generates a warning during the build:
 LD  arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating a DT_TEXTREL in object


Interesting. How does the build generate a warning by default?
Do you use Gentoo Linux which appears to ship a --warn-shared-textrel
enabled-by-default patch? (https://bugs.gentoo.org/700488)

% cat a.s
leal gdt(%edx), %eax
% as --32 a.s -o a.o
% ld.bfd -m elf_i386 -shared a.o -z notext # DT_TEXTREL is set. R_386_32

% ld.bfd -m elf_i386 -shared a.o   # on-demand text relocations. DT_TEXTREL is set. R_386_32

% ld.bfd -m elf_i386 -shared a.o --warn-shared-textrel
ld.bfd: a.o: warning: relocation against `gdt' in read-only section `.text'
ld.bfd: warning: creating a DT_TEXTREL in a shared object

% ld.bfd -m elf_i386 -shared a.o -z text
ld.bfd: a.o: warning: relocation against `gdt' in read-only section `.text'
ld.bfd: read-only segment has dynamic relocations
## The above is an error. Output is suppressed

lld has only two modes: -z text (default) and -z notext. There is no
on-demand state. By default it will error.
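
The behaviors above can be summarized as a small decision table. The
Python below is only a transcription of the commands shown in this
mail, not either linker's real logic; the function name and the outcome
strings are made up, and combinations that were not tried here are
deliberately left unmodeled.

```python
# Toy decision table for a dynamic relocation in a read-only section.
def textrel_outcome(linker, z=None, warn_shared_textrel=False):
    """linker: 'bfd' or 'lld'; z: None, 'text' or 'notext'."""
    if linker not in ("bfd", "lld"):
        raise ValueError(f"unknown linker: {linker}")
    if z not in (None, "text", "notext"):
        raise ValueError(f"unknown -z value: {z}")
    if warn_shared_textrel and (linker != "bfd" or z is not None):
        # Not demonstrated in the thread, so not guessed at here.
        raise NotImplementedError("combination not shown in the thread")
    if linker == "lld" and z is None:
        z = "text"  # lld has no on-demand state; -z text is the default
    if z == "text":
        return "error"
    if warn_shared_textrel:
        return "allowed, with warning"
    # -z notext, or GNU ld's on-demand default.
    return "allowed"


for args in [("bfd",), ("bfd", "notext"), ("bfd", "text"),
             ("lld",), ("lld", "notext")]:
    print(args, "->", textrel_outcome(*args))
print("bfd --warn-shared-textrel ->",
      textrel_outcome("bfd", warn_shared_textrel=True))
```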

There are feature requests to make -z text default, at least for some
architectures. I just found
https://sourceware.org/bugzilla/show_bug.cgi?id=20824


With lld, Dmitry Golovin reports that this results in a link-time error
with default options (i.e. unless -z notext is explicitly passed):
 LD  arch/x86/boot/compressed/vmlinux
ld.lld: error: can't create dynamic relocation R_386_32 against local
symbol in readonly segment; recompile object files with -fPIC or pass
'-Wl,-z,notext' to allow text relocations in the output

Start fixing this by removing relocations from .head.text:
- On 32-bit, use a base register that holds the address of the GOT and
 reference symbol addresses using @GOTOFF, i.e.
    leal gdt@GOTOFF(%edx), %eax


Looks good to me.


- On 64-bit, most of the code can (and already does) use %rip-relative
 addressing, however the .code32 bits can't, and the 64-bit code also
 needs to reference symbol addresses as they will be after moving the
 compressed kernel to the end of the decompression buffer.
 For these cases, reference the symbols as an offset to startup_32 to
 avoid creating relocations, i.e.
    leal (gdt-startup_32)(%bp), %eax
 This only works in .head.text as the subtraction cannot be represented
 as a PC-relative relocation unless startup_32 is in the same section
 as the code. Move efi32_pe_entry into .head.text so that it can use
 the same method to avoid relocations.


I have a nit about the startup_32 comment. See below.


Signed-off-by: Arvind Sankar 
---
arch/x86/boot/compressed/head_32.S | 40 +++--
arch/x86/boot/compressed/head_64.S | 95 ++
2 files changed, 77 insertions(+), 58 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S 
b/arch/x86/boot/compressed/head_32.S
index dfa4131c65df..66657bb99aae 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -73,10 +73,10 @@ SYM_FUNC_START(startup_32)
        leal    (BP_scratch+4)(%esi), %esp
        call    1f
1:      popl    %edx
-       subl    $1b, %edx
+       addl    $_GLOBAL_OFFSET_TABLE_+(.-1b), %edx

        /* Load new GDT */
-       leal    gdt(%edx), %eax
+       leal    gdt@GOTOFF(%edx), %eax
        movl    %eax, 2(%eax)
        lgdt    (%eax)

@@ -89,14 +89,16 @@ SYM_FUNC_START(startup_32)
        movl    %eax, %ss

/*
- * %edx contains the address we are loaded at by the boot loader and %ebx
- * contains the address where we should move the kernel image temporarily
- * for safe in-place decompression. %ebp contains the address that the kernel
- * will be decompressed to.
+ * %edx contains the address we are loaded at by the boot loader (plus the
+ * offset to the GOT).  The below code calculates %ebx to be the address where
+ * we should move the kernel image temporarily for safe in-place decompression
+ * (again, plus the offset to the GOT).
+ *
+ * %ebp is calculated to be the address that the kernel will be decompressed to.
 */

#ifdef CONFIG_RELOCATABLE
-       movl    %edx, %ebx
+       leal    startup_32@GOTOFF(%edx), %ebx

#ifdef CONFIG_EFI_STUB
/*
@@ -107,7 +109,7 @@ SYM_FUNC_START(startup_32)
 *  image_offset = startup_32 - image_base
 * Otherwise image_offset will be zero and has no effect on the calculations.
 */
-       subl    image_offset(%edx), %ebx
+       subl    image_offset@GOTOFF(%edx), %ebx
#endif

        movl    BP_kernel_alignment(%esi), %eax
@@ -124,10 

Re: [PATCH 1/4] x86/boot: Add .text.startup to setup.ld

2020-05-24 Thread Fangrui Song

On 2020-05-24, Arvind Sankar wrote:

gcc puts the main function into .text.startup when compiled with -Os (or
-O2). This results in arch/x86/boot/main.c having a .text.startup
section which is currently not included explicitly in the linker script
setup.ld in the same directory.

The BFD linker places this orphan section immediately after .text, so
this still works. However, LLD git, since [1], is choosing to place it
immediately after the .bstext section instead (this is the first code
section). This plays havoc with the section layout that setup.elf
requires to create the setup header, for eg on 64-bit:

   LD  arch/x86/boot/setup.elf
 ld.lld: error: section .text.startup file range overlaps with .header
 >>> .text.startup range is [0x200040, 0x2001FE]
 >>> .header range is [0x2001EF, 0x20026B]

 ld.lld: error: section .header file range overlaps with .bsdata
 >>> .header range is [0x2001EF, 0x20026B]
 >>> .bsdata range is [0x2001FF, 0x200398]

 ld.lld: error: section .bsdata file range overlaps with .entrytext
 >>> .bsdata range is [0x2001FF, 0x200398]
 >>> .entrytext range is [0x20026C, 0x2002D3]

 ld.lld: error: section .text.startup virtual address range overlaps
 with .header
 >>> .text.startup range is [0x40, 0x1FE]
 >>> .header range is [0x1EF, 0x26B]

 ld.lld: error: section .header virtual address range overlaps with
 .bsdata
 >>> .header range is [0x1EF, 0x26B]
 >>> .bsdata range is [0x1FF, 0x398]

 ld.lld: error: section .bsdata virtual address range overlaps with
 .entrytext
 >>> .bsdata range is [0x1FF, 0x398]
 >>> .entrytext range is [0x26C, 0x2D3]

 ld.lld: error: section .text.startup load address range overlaps with
 .header
 >>> .text.startup range is [0x40, 0x1FE]
 >>> .header range is [0x1EF, 0x26B]

 ld.lld: error: section .header load address range overlaps with
 .bsdata
 >>> .header range is [0x1EF, 0x26B]
 >>> .bsdata range is [0x1FF, 0x398]

 ld.lld: error: section .bsdata load address range overlaps with
 .entrytext
 >>> .bsdata range is [0x1FF, 0x398]
 >>> .entrytext range is [0x26C, 0x2D3]

Explicitly pull .text.startup into the .text output section to avoid
this.

[1] https://reviews.llvm.org/D75225

Signed-off-by: Arvind Sankar 
Reviewed-by: Fangrui Song 
---
arch/x86/boot/setup.ld | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 24c95522f231..ed60abcdb089 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -20,7 +20,7 @@ SECTIONS
.initdata   : { *(.initdata) }
__end_init = .;

-   .text   : { *(.text) }
+   .text   : { *(.text.startup) *(.text) }
.text32 : { *(.text32) }

. = ALIGN(16);
--
2.26.2


Should .text.startup* be used instead? If -ffunction-sections is used,

// a.c
int main() {}

gcc -O2 a.c # .text.startup
gcc -Os a.c # .text.startup

gcc -O2 -ffunction-sections a.c # .text.startup.main
gcc -Os -ffunction-sections a.c # .text.startup.main

-

In case anyone wants to CC a GCC dev for the citation that 
 main compiles to `.text.startup` in -Os or -O2 mode, I have a small request

 that `.text.startup.` probably makes more sense. See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

I made an llvm change recently https://reviews.llvm.org/D79600


[tip: x86/build] x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux

2020-05-22 Thread tip-bot2 for Fangrui Song
The following commit has been merged into the x86/build branch of tip:

Commit-ID: d6ee6529436a15a0541aff6e1697989ee7dc2c44
Gitweb:
https://git.kernel.org/tip/d6ee6529436a15a0541aff6e1697989ee7dc2c44
Author:Fangrui Song 
AuthorDate:Wed, 20 May 2020 11:20:10 -07:00
Committer: Borislav Petkov 
CommitterDate: Fri, 22 May 2020 12:42:07 +02:00

x86/boot: Discard .discard.unreachable for arch/x86/boot/compressed/vmlinux

With commit

  ce5e3f909fc0 ("efi/printf: Add 64-bit and 8-bit integer support")

arch/x86/boot/compressed/vmlinux may have an undesired .discard.unreachable
section coming from drivers/firmware/efi/libstub/vsprintf.stub.o. That section
gets generated from unreachable() annotations when CONFIG_STACK_VALIDATION is
enabled.

.discard.unreachable contains an R_X86_64_PC32 relocation which will be
warned about by LLD: a non-SHF_ALLOC section (.discard.unreachable) is
not part of the memory image, thus conceptually the distance between a
non-SHF_ALLOC and a SHF_ALLOC is not a constant which can be resolved at
link time:

  % ld.lld -m elf_x86_64 -T arch/x86/boot/compressed/vmlinux.lds ... -o arch/x86/boot/compressed/vmlinux
  ld.lld: warning: vsprintf.c:(.discard.unreachable+0x0): has non-ABS relocation R_X86_64_PC32 against symbol ''

Reuse the DISCARDS macro which includes .discard.* to drop
.discard.unreachable.

 [ bp: Massage and complete the commit message. ]

Reported-by: kbuild test robot 
Signed-off-by: Fangrui Song 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Tested-by: Arvind Sankar 
Tested-by: Sedat Dilek 
Link: https://lkml.kernel.org/r/20200520182010.242489-1-mask...@google.com
---
 arch/x86/boot/compressed/vmlinux.lds.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6..1031af1 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,6 @@ SECTIONS
 #endif
. = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
_end = .;
+
+   DISCARDS
 }


Re: [PATCH 0/1] x86/boot: lld fix

2020-05-20 Thread Fangrui Song

On 2020-05-20, Arvind Sankar wrote:

On Wed, May 20, 2020 at 06:56:53PM -0400, Arvind Sankar wrote:

arch/x86/boot/setup.elf currently has an orphan section .text.startup,
and lld git as of ebf14d9b6d8b is breaking on 64-bit due to what seems
to be a change in behavior on orphan section placement (details in patch
commit message).

I'm not sure if this was an intentional change in lld, but it seems like
a good idea to explicitly include .text.startup anyway.

Arvind Sankar (1):
  x86/boot: Add .text.startup to setup.ld

 arch/x86/boot/setup.ld | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.26.2



I found your PATCH 1/1 on https://lkml.org/lkml/2020/5/20/1491 


-   .text   : { *(.text) }
+   .text   : { *(.text.startup) *(.text) }

The LLD behavior change was introduced in
https://reviews.llvm.org/D75225 (will be included in 11.0.0)
It was intended to match GNU ld.

But yes, orphan section placement is still different in the two linkers.

Placing .text.startup before .text seems good.
In GNU ld's internal linker script (ld --verbose),
.text.startup is placed before .text

Reviewed-by: Fangrui Song 


Re: [PATCH] x86/boot: allow a relocatable kernel to be linked with lld

2020-05-17 Thread Fangrui Song

On 2020-05-16, Dmitry Golovin wrote:

15.05.2020, 21:50, "Borislav Petkov" :


I need more info here about which segment is read-only?

Is this something LLD does by default or what's happening?



Probably should have quoted the original error message:

   ld.lld: error: can't create dynamic relocation R_386_32 against
   symbol: _bss in readonly segment; recompile object files with -fPIC
   or pass '-Wl,-z,notext' to allow text relocations in the output


Do we know where these R_386_32 relocations come from?

When linking in -shared mode, the linker assumes the image is a shared
object and has undetermined image base at runtime. An absolute
relocation needs a text relocation (a relocation against a readonly
segment).

When neither -z notext nor -z text is specified, GNU ld is in an
indefinite state where it will enable text relocations (DT_TEXTREL
DF_TEXTREL) on demand. It is not considered a good practice for
userspace applications to do this.

Of course the kernel is different... I know little about the kernel,
but if there is a way to make the sections containing R_386_32
relocations writable (SHF_WRITE), that will be a better solution to me.
In LLD, -z notext is like making every section SHF_WRITE.



IOW, don't be afraid to be more verbose in the commit message. :)



Tried both BFD and LLD for linking to understand the difference more and
rewrite the commit message, and came to the conclusion that the patch is
wrong. I will submit v2 when I figure out the correct solution.

Regards,
Dmitry



Re: [PATCH] Makefile: support compressed debug info

2020-05-12 Thread Fangrui Song

On 2020-05-12, Nick Desaulniers wrote:

On Mon, May 11, 2020 at 10:54 PM Masahiro Yamada  wrote:


> >On Mon, May 4, 2020 at 5:13 AM Nick Desaulniers
> > wrote:
> >>
> >> As debug information gets larger and larger, it helps significantly save
> >> the size of vmlinux images to compress the information in the debug
> >> information sections. Note: this debug info is typically split off from
> >> the final compressed kernel image, which is why vmlinux is what's used
> >> in conjunction with GDB. Minimizing the debug info size should have no
> >> impact on boot times, or final compressed kernel image size.
> >>
Nick,

I am OK with this patch.

Fangrui provided the minimal requirement for
--compress-debug-sections=zlib


Is it worth recording in the help text?
Do you want to send v2?


Yes I'd like to record that information.  I can also record Sedat's
Tested-by tag.  Thank you for testing Sedat.

I don't know what "linux-image-dbg file" are, or why they would be
bigger.  The size of the debug info is the primary concern with this
config.  It sounds like however that file is created might be
problematic.

Fangrui, I wasn't able to easily find what version of binutils first
added support.  Can you please teach me how to fish?


I actually downloaded https://ftp.gnu.org/gnu/binutils/ archives and
located the sources... I think an easier way is:

% cd binutils-gdb
% git show binutils-2_26:./gas/as.c | grep compress-debug-sections
--compress-debug-sections[={none|zlib|zlib-gnu|zlib-gabi}]\n\
...

GNU as 2.25 only supports --compress-debug-sections which means "zlib-gnu" in
newer versions.

Similarly, for GNU ld:

% git show binutils-2_26:./ld/lexsup.c | grep compress-debug-sections
  --compress-debug-sections=[none|zlib|zlib-gnu|zlib-gabi]\n\

(I have spent a lot of time investigating GNU ld's behavior :)


Another question I had for Fangrui is, if the linker can compress
these sections, shouldn't we just have the linker do it, not the
compiler and assembler?  IIUC the debug info can contain relocations,
so the linker would have to decompress these, perform relocations,
then recompress these?  I guess having the compiler and assembler
compress the debug info as well would minimize the size of the .o
files on disk.


The linker will decompress debug info unconditionally, because
input .debug_info sections need to be concatenated to form the output
.debug_info. Whether the output .debug_info is compressed is controlled
by the linker option --compress-debug-sections=zlib, which is not
affected by the compression state of object files.
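
A minimal Python sketch of that flow, using the standard zlib module.
This is a simplified model and not what a linker really does: it omits
the compression header that real ELF compressed sections carry, skips
relocation processing, and the function name and payloads are made up.

```python
import zlib


def link_debug_info(inputs, compress_output):
    """inputs: list of (payload, is_compressed) pairs.

    Returns (output_bytes, output_is_compressed).
    """
    # Inputs are always inflated first so they can be concatenated.
    merged = b"".join(zlib.decompress(p) if c else p for p, c in inputs)
    # Output compression depends only on the linker option.
    if compress_output:
        return zlib.compress(merged), True
    return merged, False


a = b"debug info from a.o " * 50
b = b"debug info from b.o " * 50
inputs = [(zlib.compress(a), True), (b, False)]  # mixed input states

plain, plain_is_compressed = link_debug_info(inputs, compress_output=False)
packed, packed_is_compressed = link_debug_info(inputs, compress_output=True)
print(len(plain), len(packed))
```

Compressing the object files therefore only affects their size on
disk; the output is decided separately at link time.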

Both GNU as and GNU ld name the option --compress-debug-sections=zlib.
In a compiler driver context, an unfamiliar user may find
-Wa,--compress-debug-sections=zlib -Wl,--compress-debug-sections=zlib
confusing:/


Otherwise I should add this flag to the assembler invocation, too, in
v2.  Thoughts?


Compressing object files along with the linked output should be fine. It
can save disk space. (It'd be great if you paste the comparison
with and w/o object files compressed)

Feel free to add:

Reviewed-by: Fangrui Song 


I have a patch series that enables dwarf5 support in the kernel that
I'm working up to.  I wanted to send this first.  Both roughly reduce
the debug info size by 20% each, though I haven't measured them
together, yet.  Requires ToT binutils because there have been many
fixes from reports of mine recently.


This will be awesome! I also heard that enabling DWARF v5 for our object
files can easily make debug info size smaller by 20%. Glad that the
kernel can benefit from it as well :)


Re: [PATCH] arm64: disable patchable function entry on big-endian clang builds

2020-05-06 Thread Fangrui Song

On 2020-05-06, Nick Desaulniers wrote:

On Wed, May 6, 2020 at 8:46 AM 'Fangrui Song' via Clang Built Linux
 wrote:

Created https://reviews.llvm.org/D79495 to allow the function attribute
'patchable_function_entry' on aarch64_be.
I think -fpatchable-function-entry= just works.

Note, LLD does not support aarch64_be
(https://github.com/ClangBuiltLinux/linux/issues/380).


I've approved the patch. Thanks for the quick fix.  Looks like we
backported -fpatchable-function-entry= to the clang-10 release, so we
should cherry pick your fix to the release-10 branch for the clang
10.1 release.

I'd rather have this fixed on the toolchain side.


+1.

Cherry picked to release/10.x
https://github.com/llvm/llvm-project/commit/98f9f73f6d2367aa8001c4d16de9d3b347febb08
I did not use anything endianness-sensitive in the original implementation,
so hopefully this will not run into issues.


The scheduled rc1 of LLVM 10.0.1 will happen on May 18, 2020
(https://lists.llvm.org/pipermail/llvm-dev/2020-April/141128.html)
We should be quick if we want to test it on qemu or real hardware.


Re: [PATCH] arm64: disable patchable function entry on big-endian clang builds

2020-05-06 Thread Fangrui Song

On 2020-05-06, Nathan Chancellor wrote:

On Wed, May 06, 2020 at 12:22:58PM +0200, Arnd Bergmann wrote:

On Wed, May 6, 2020 at 5:45 AM Nathan Chancellor
 wrote:
> On Tue, May 05, 2020 at 07:42:43PM +0200, Torsten Duwe wrote:
> > On Tue, 5 May 2020 15:25:56 +0100 Mark Rutland  wrote:
> > > On Tue, May 05, 2020 at 04:12:36PM +0200, Arnd Bergmann wrote:
> > > This practically rules out a BE distro kernel with things like PAC,
> > > which isn't ideal.
>
> To be fair, are there big endian AArch64 distros?
>
> https://wiki.debian.org/Arm64Port: "There is also a big-endian version
> of the architecture/ABI: aarch64_be-linux-gnu but we're not supporting
> that in Debian (so there is no corresponding Debian architecture name)
> and hopefully will never have to. Nevertheless you might want to check
> for this by way of completeness in upstream code."
>
> OpenSUSE and Fedora don't appear to have support for big endian.

I don't think any of the binary distros ship big endian ARM64. There are
a couple of users that tend to build everything from source using Yocto
or similar embedded distros, but as far as I can tell this is getting less
common over time as applications get ported to be compatible with
big-endian, or get phased out and replaced by code running on regular
little-endian systems.

The users we see today are likely in telco, military or aerospace
settings (While earth is mostly little-endian these days, space is
definitely big-endian) that got ported from big-endian hardware, but
often with a high degree of customization and long service life.


Ah yes, that makes sense, thanks for the information and background.
Helps orient myself for the future.


My policy for Arm specific kernel code submissions is generally that
it should be written so it can work on either big-endian or little-endian
using the available abstractions (just like any architecture independent
code), but I don't normally expect it to be tested on big endian.

There are some important examples of code that just doesn't work
on big-endian because it's far too hard, e.g. the UEFI runtime services.
That is also ok, if anyone really needs it, they can do the work.

> > I suggest to get a quote from clang folks first about their schedule and
> > regarded importance of patchable-function-entries on BE, and leave it as
> > is: broken on certain clang configurations. It's not the kernel's fault.
>
> We can file an upstream PR (https://bugs.llvm.org) to talk about this
> (although I've CC'd Fangrui) but you would rather the kernel fail to
> work properly than prevent the user from being able to select that
> option? Why even have the "select" or "depends on" keyword then?


Created https://reviews.llvm.org/D79495 to allow the function attribute
'patchable_function_entry' on aarch64_be.
I think -fpatchable-function-entry= just works.

Note, LLD does not support aarch64_be
(https://github.com/ClangBuiltLinux/linux/issues/380).


I definitely want all randconfig kernels to build without warnings,
and I agree with you that making it just fail at build time is not
a good solution.

> That said, I do think we should hold off on this patch until we hear
> from the LLVM developers.

+1

  Arnd


Glad we are on the same page.

Cheers,
Nathan


Re: [PATCH] Makefile: support compressed debug info

2020-05-04 Thread Fangrui Song



On 2020-05-04, Sedat Dilek wrote:

On Mon, May 4, 2020 at 5:13 AM Nick Desaulniers
 wrote:


As debug information gets larger and larger, compressing the debug
information sections significantly reduces the size of vmlinux images.
Note: this debug info is typically split off from
the final compressed kernel image, which is why vmlinux is what's used
in conjunction with GDB. Minimizing the debug info size should have no
impact on boot times, or final compressed kernel image size.

All of the debug sections will have a `C` flag set.
$ readelf -S 

$ bloaty vmlinux.gcc75.compressed.dwarf4 -- \
vmlinux.gcc75.uncompressed.dwarf4

    FILE SIZE        VM SIZE
 --------------  --------------
  +0.0%     +18  [ = ]       0    [Unmapped]
 -73.3%  -114Ki  [ = ]       0    .debug_aranges
 -76.2% -2.01Mi  [ = ]       0    .debug_frame
 -73.6% -2.89Mi  [ = ]       0    .debug_str
 -80.7% -4.66Mi  [ = ]       0    .debug_abbrev
 -82.9% -4.88Mi  [ = ]       0    .debug_ranges
 -70.5% -9.04Mi  [ = ]       0    .debug_line
 -79.3% -10.9Mi  [ = ]       0    .debug_loc
 -39.5% -88.6Mi  [ = ]       0    .debug_info
 -18.2%  -123Mi  [ = ]       0    TOTAL

$ bloaty vmlinux.clang11.compressed.dwarf4 -- \
vmlinux.clang11.uncompressed.dwarf4

    FILE SIZE        VM SIZE
 --------------  --------------
  +0.0%     +23  [ = ]       0    [Unmapped]
 -65.6%    -871  [ = ]       0    .debug_aranges
 -77.4% -1.84Mi  [ = ]       0    .debug_frame
 -82.9% -2.33Mi  [ = ]       0    .debug_abbrev
 -73.1% -2.43Mi  [ = ]       0    .debug_str
 -84.8% -3.07Mi  [ = ]       0    .debug_ranges
 -65.9% -8.62Mi  [ = ]       0    .debug_line
 -86.2% -40.0Mi  [ = ]       0    .debug_loc
 -42.0% -64.1Mi  [ = ]       0    .debug_info
 -22.1%  -122Mi  [ = ]       0    TOTAL



Hi Nick,

thanks for the patch.

I have slightly modified it to adapt to Linux v5.7-rc4 (what was your base?).

Which linker did you use, and does it have an impact if you switch from
ld.bfd to ld.lld?


lld has supported the linker option --compress-debug-sections=zlib since
about 5.0.0 (https://reviews.llvm.org/D31941)


I did a first, normal run and a second one with
CONFIG_DEBUG_INFO_COMPRESSED=y, both with clang-10 and ld.lld-10.

My numbers (sizes in MiB):

[ diffconfig ]

$ scripts/diffconfig /boot/config-5.7.0-rc4-1-amd64-clang \
/boot/config-5.7.0-rc4-2-amd64-clang
BUILD_SALT "5.7.0-rc4-1-amd64-clang" -> "5.7.0-rc4-2-amd64-clang"
+DEBUG_INFO_COMPRESSED y

[ compiler and linker ]

$ clang-10 -v
ClangBuiltLinux clang version 10.0.1
(https://github.com/llvm/llvm-project
92d5c1be9ee93850c0a8903f05f36a23ee835dc2)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/dileks/src/llvm-toolchain/install/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/10
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/10
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

$ ld.lld-10 -v
LLD 10.0.1 (https://github.com/llvm/llvm-project
92d5c1be9ee93850c0a8903f05f36a23ee835dc2) (compatible with GNU
linkers)

[ sizes vmlinux ]

$ du -m 5.7.0-rc4-*/vmlinux*
409 5.7.0-rc4-1-amd64-clang/vmlinux
7   5.7.0-rc4-1-amd64-clang/vmlinux.compressed
404 5.7.0-rc4-1-amd64-clang/vmlinux.o
324 5.7.0-rc4-2-amd64-clang/vmlinux
7   5.7.0-rc4-2-amd64-clang/vmlinux.compressed
299 5.7.0-rc4-2-amd64-clang/vmlinux.o
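
The relative savings implied by the du figures above can be worked out with
a few lines of Python. This is only a sketch: the sizes are copied from the
listing, and the helper name percent_saved is made up for illustration.

```python
def percent_saved(before_mib: int, after_mib: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mib - after_mib) / before_mib * 100.0


# Sizes in MiB, copied from the `du -m` output above:
# (without CONFIG_DEBUG_INFO_COMPRESSED, with CONFIG_DEBUG_INFO_COMPRESSED=y)
sizes = {
    "vmlinux": (409, 324),
    "vmlinux.o": (404, 299),
}

for name, (before, after) in sizes.items():
    print(f"{name}: {percent_saved(before, after):.1f}% smaller")
```

vmlinux.compressed is 7 MiB in both builds, which matches the claim that
the final compressed kernel image is unaffected.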

[ readelf (.debug_info as example) ]

$ readelf -S vmlinux.o
 [33] .debug_info   PROGBITS   01d6a5e8
  06be1ee6     0 0 1

$ readelf -S vmlinux.o
 [33] .debug_info   PROGBITS   01749f18
  02ef04d2     C   0 0 1 <--- XXX: "C (compressed)" Flag

Key to Flags:
 W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
 L (link order), O (extra OS processing required), G (group), T (TLS),
 C (compressed), x (unknown), o (OS specific), E (exclude),
 l (large), p (processor specific)
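
For readers who want to check the flag programmatically, here is a minimal
sketch of how the generic sh_flags bits map to the letters in the key above.
The bit values are the standard ELF gABI ones; the function names
flag_letters and is_compressed are hypothetical, and the OS- and
processor-specific letters (o, E, l, p, x) are deliberately left out.

```python
# Generic ELF section flag bits (sh_flags) and the letter readelf prints.
SHF_LETTERS = [
    (0x1, "W"),    # SHF_WRITE
    (0x2, "A"),    # SHF_ALLOC
    (0x4, "X"),    # SHF_EXECINSTR
    (0x10, "M"),   # SHF_MERGE
    (0x20, "S"),   # SHF_STRINGS
    (0x40, "I"),   # SHF_INFO_LINK
    (0x80, "L"),   # SHF_LINK_ORDER
    (0x100, "O"),  # SHF_OS_NONCONFORMING
    (0x200, "G"),  # SHF_GROUP
    (0x400, "T"),  # SHF_TLS
    (0x800, "C"),  # SHF_COMPRESSED
]

SHF_COMPRESSED = 0x800


def flag_letters(sh_flags: int) -> str:
    """Render the generic bits of sh_flags as a string of flag letters."""
    return "".join(letter for bit, letter in SHF_LETTERS if sh_flags & bit)


def is_compressed(sh_flags: int) -> bool:
    """True if the section carries the 'C (compressed)' flag."""
    return bool(sh_flags & SHF_COMPRESSED)


# A compressed .debug_info section has only the C flag set.
print(flag_letters(SHF_COMPRESSED), is_compressed(SHF_COMPRESSED))
```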

[ sizes linux-image debian packages ]

$ du -m 5.7.0-rc4-*/linux-image*.deb
47  5.7.0-rc4-1-amd64-clang/linux-image-5.7.0-rc4-1-amd64-clang_5.7.0~rc4-1~bullseye+dileks1_amd64.deb
424 5.7.0-rc4-1-amd64-clang/linux-image-5.7.0-rc4-1-amd64-clang-dbg_5.7.0~rc4-1~bullseye+dileks1_amd64.deb
47  5.7.0-rc4-2-amd64-clang/linux-image-5.7.0-rc4-2-amd64-clang_5.7.0~rc4-2~bullseye+dileks1_amd64.deb
771 5.7.0-rc4-2-amd64-clang/linux-image-5.7.0-rc4-2-amd64-clang-dbg_5.7.0~rc4-2~bullseye+dileks1_amd64.deb

[ sizes linux-git dir (compilation finished) ]

5.7.0-rc4-1-amd64-clang: 17963   /home/dileks/src/linux-kernel/linux
5.7.0-rc4-2-amd64-clang: 14328   /home/dileks/src/linux-kernel/linux

[ xz compressed linux-image-dbg packages ]

$ file 

Re: [PATCH v4 4/5] MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO

2020-04-28 Thread Fangrui Song



On 2020-04-28, Nathan Chancellor wrote:

Currently, the VDSO is being linked through $(CC). This does not match
how the rest of the kernel links objects, which is through the $(LD)
variable.

When clang is built in a default configuration, it first attempts to use
the target triple's default linker then the system's default linker,
unless told otherwise through -fuse-ld=... We do not use -fuse-ld=
because it can be brittle and we have support for invoking $(LD)
directly. See commit fe00e50b2db8c ("ARM: 8858/1: vdso: use $(LD)
instead of $(CC) to link VDSO") and commit 691efbedc60d2 ("arm64: vdso:
use $(LD) instead of $(CC) to link VDSO") for examples of doing this in
the VDSO.

Do the same thing here. Replace the custom linking logic with $(cmd_ld)
and ldflags-y so that $(LD) is respected. We need to explicitly add two
flags to the linker that were implicitly passed by the compiler:
-G 0 (which comes from ccflags-vdso) and --eh-frame-hdr.

Before this patch (generated by adding '-v' to VDSO_LDFLAGS):

/libexec/gcc/mips64-linux/9.3.0/collect2 \
-plugin /libexec/gcc/mips64-linux/9.3.0/liblto_plugin.so \
-plugin-opt=/libexec/gcc/mips64-linux/9.3.0/lto-wrapper \
-plugin-opt=-fresolution=/tmp/ccGEi5Ka.res \
--eh-frame-hdr \
-G 0 \
-EB \
-mips64r2 \
-shared \
-melf64btsmip \
-o arch/mips/vdso/vdso.so.dbg.raw \
-L/lib/gcc/mips64-linux/9.3.0/64 \
-L/lib/gcc/mips64-linux/9.3.0 \
-L/lib/gcc/mips64-linux/9.3.0/../../../../mips64-linux/lib \
-Bsymbolic \
--no-undefined \
-soname=linux-vdso.so.1 \
-EB \
--hash-style=sysv \
--build-id \
-T arch/mips/vdso/vdso.lds \
arch/mips/vdso/elf.o \
arch/mips/vdso/vgettimeofday.o \
arch/mips/vdso/sigreturn.o

After this patch:

/bin/mips64-linux-ld \
-m elf64btsmip \
-Bsymbolic \
--no-undefined \
-soname=linux-vdso.so.1 \
-EB \
-nostdlib \
-shared \
-G 0 \
--eh-frame-hdr \
--hash-style=sysv \
--build-id \
-T  arch/mips/vdso/vdso.lds \
arch/mips/vdso/elf.o \
arch/mips/vdso/vgettimeofday.o \
arch/mips/vdso/sigreturn.o \
-o arch/mips/vdso/vdso.so.dbg.raw
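
The relationship between the two command lines can be sketched in Python:
the compiler driver unwraps each -Wl,... argument into one or more linker
arguments, which is what the new ldflags-y spells out directly. This is an
illustration only; driver_to_linker_flags is a hypothetical helper, and it
assumes every non -Wl argument it is given (here -nostdlib and -shared) is
also understood by the linker, which is not true of driver flags in general.

```python
def driver_to_linker_flags(args):
    """Unwrap '-Wl,a,b' into ['a', 'b']; pass other arguments through."""
    out = []
    for arg in args:
        if arg.startswith("-Wl,"):
            # Everything after '-Wl,' is a comma-separated list of ld args.
            out.extend(arg[len("-Wl,"):].split(","))
        else:
            out.append(arg)
    return out


# Flags from the old VDSO_LDFLAGS (without the make-time $(filter ...) part).
old_flags = [
    "-Wl,-Bsymbolic", "-Wl,--no-undefined", "-Wl,-soname=linux-vdso.so.1",
    "-nostdlib", "-shared", "-Wl,--hash-style=sysv", "-Wl,--build-id",
]
print(" ".join(driver_to_linker_flags(old_flags)))
```

The sketch does not add -G 0 or --eh-frame-hdr; those were supplied
implicitly by the compiler driver and have to be listed by hand once $(LD)
is invoked directly.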

Note that we leave behind -mips64r2. Turns out that ld ignores it (see
get_emulation in ld/ldmain.c). This is true of current trunk and 2.23,
which is the minimum supported version for the kernel:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldmain.c;hb=aa4209e7b679afd74a3860ce25659e71cc4847d5#l593
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldmain.c;hb=a55e30b51bc6227d8d41f707654d0a5620978dcf#l641

Before this patch, LD=ld.lld did nothing:

$ llvm-readelf -p.comment arch/mips/vdso/vdso.so.dbg | sed 's/(.*//'
String dump of section '.comment':
[ 0] ClangBuiltLinux clang version 11.0.0

After this patch, it does:

$ llvm-readelf -p.comment arch/mips/vdso/vdso.so.dbg | sed 's/(.*//'
String dump of section '.comment':
[ 0] Linker: LLD 11.0.0
[62] ClangBuiltLinux clang version 11.0.0
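
The same check can be scripted. The sketch below mimics the sed expression
(drop everything from the first '(' onward) and then looks for the
"Linker: LLD" string that LLD records in .comment. The function names are
hypothetical, and the parenthesized suffixes in the sample lines are
placeholders, not real output.

```python
import re


def strip_parenthesized(line: str) -> str:
    """Equivalent of sed 's/(.*//': cut the line at the first '('."""
    return re.sub(r"\(.*", "", line)


def linked_with_lld(comment_lines) -> bool:
    """True if any .comment string identifies LLD as the linker."""
    return any("Linker: LLD" in strip_parenthesized(l) for l in comment_lines)


# Sample .comment strings; the "(...)" parts are made-up placeholders.
before = ["[ 0] ClangBuiltLinux clang version 11.0.0 (placeholder)"]
after = [
    "[ 0] Linker: LLD 11.0.0 (placeholder)",
    "[62] ClangBuiltLinux clang version 11.0.0 (placeholder)",
]
print(linked_with_lld(before), linked_with_lld(after))
```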

Link: https://github.com/ClangBuiltLinux/linux/issues/785
Signed-off-by: Nathan Chancellor 
---

v3 -> v4:

* Improve commit message to show that ld command is effectively the
 same as the one generated by GCC.

* Add '-G 0' and '--eh-frame-hdr' because they were added by GCC.


My understanding is that we are starting to use -fno-asynchronous-unwind-tables
more widely to eliminate .eh_frame in object files.
Without .eh_frame, the linker's --eh-frame-hdr is not really useful.


Sigh...  -G 0. This is an option ignored by LLD. GCC devs probably should
have used the long option --gpsize rather than take the short option -G.
Even better, -z gpsize= or similar if this option is specific to ELF.


v2 -> v3:

* New patch.

arch/mips/vdso/Makefile | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile
index 92b53d1df42c3..2e64c7600eead 100644
--- a/arch/mips/vdso/Makefile
+++ b/arch/mips/vdso/Makefile
@@ -60,10 +60,9 @@ ifdef CONFIG_MIPS_DISABLE_VDSO
endif

# VDSO linker flags.
-VDSO_LDFLAGS := \
-   -Wl,-Bsymbolic -Wl,--no-undefined -Wl,-soname=linux-vdso.so.1 \
-   $(addprefix -Wl$(comma),$(filter -E%,$(KBUILD_CFLAGS))) \
-   -nostdlib -shared -Wl,--hash-style=sysv -Wl,--build-id
+ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
+   $(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared \
+   -G 0 --eh-frame-hdr --hash-style=sysv --build-id -T

CFLAGS_REMOVE_vdso.o = -pg

@@ -82,11 +81,7 @@ quiet_cmd_vdso_mips_check = VDSOCHK $@
#

quiet_cmd_vdsold_and_vdso_check = LD  $@
-  cmd_vdsold_and_vdso_check = $(cmd_vdsold); $(cmd_vdso_check); $(cmd_vdso_mips_check)
-
-quiet_cmd_vdsold = VDSO$@
-  cmd_vdsold = $(CC) $(c_flags) $(VDSO_LDFLAGS) \
-   -Wl,-T $(filter %.lds,$^) $(filter %.o,$^) -o $@
+  cmd_vdsold_and_vdso_check = $(cmd_ld); $(cmd_vdso_check); $(cmd_vdso_mips_check)

quiet_cmd_vdsoas_o_S = AS  $@
  cmd_vdsoas_o_S = $(CC) $(a_flags) -c -o $@ $<
--
2.26.2