On Sun, May 31, 2020 at 04:40 Nadav Har'El <n...@scylladb.com> wrote:

>
> On Sat, May 30, 2020 at 5:25 PM Waldek Kozaczuk <jwkozac...@gmail.com>
> wrote:
>
>> On my Ubuntu 20.04 applying the patch, I am attaching below, shrinks the
>> kernel.elf from 6.8MB down to 5MB (on Fedora 32 from 6.7MB down to 5MB, and
>> older Ubuntu 19.04 [gcc 8.3] from 6.2MB down to 4.9MB). And this is before
>> applying any GCC optimizations I was suggesting in
>> https://groups.google.com/forum/#!topic/osv-dev/hCYGRzytaJ4 and
>> modularization ideas in
>> https://groups.google.com/forum/#!topic/osv-dev/BHwN6Stm3n4) (except
>> hiding C++ library).
>>
>> diff --git a/Makefile b/Makefile
>> index 20ddf3b1..8caac77b 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1873,10 +1873,11 @@ $(out)/kernel.elf: $(stage1_targets)
>> arch/$(arch)/loader.ld $(out)/empty_bootfs.
>>       --defsym=OSV_KERNEL_VM_BASE=$(kernel_vm_base)
>> --defsym=OSV_KERNEL_VM_SHIFT=$(kernel_vm_shift) \
>>   -Bdynamic --export-dynamic --eh-frame-hdr --enable-new-dtags
>> -L$(out)/arch/$(arch) \
>>       $(^:%.ld=-T %.ld) \
>> +     --version-script=/tmp/version_script \
>>       --whole-archive \
>> -       $(libstdc++.a) $(libgcc_eh.a) \
>> +       $(libgcc_eh.a) \
>>         $(boost-libs) \
>>
>
> If you don't --whole-archive the C++ library, why do it for the even more
> esoteric Boost libraries? And gcc_eh? Maybe you could save even more?
>
Sure. Do you know what is the purpose of libgcc_eh.a and what was the
original reason of linking it ‘—whole-archive’? And if we don’t, would we
have to add some other .so file to the image like with C++ standard library?

>
> -     --no-whole-archive $(libgcc.a), \
>> +     --no-whole-archive $(libstdc++.a) $(libgcc.a), \
>>   LINK kernel.elf)
>>   $(call quiet, $(STRIP) $(out)/kernel.elf -o
>> $(out)/kernel-stripped.elf, STRIP kernel.elf -> kernel-stripped.elf )
>>   $(call very-quiet, cp $(out)/kernel-stripped.elf $(out)/kernel.elf)
>> diff --git a/core/elf.cc b/core/elf.cc
>> index ffb16004..830b2577 100644
>> --- a/core/elf.cc
>> +++ b/core/elf.cc
>> @@ -1310,7 +1310,6 @@ program::program(void* addr)
>>            "libpthread.so.0",
>>            "libdl.so.2",
>>            "librt.so.1",
>> -          "libstdc++.so.6",
>>            "libaio.so.1",
>>            "libxenstore.so.3.0",
>>            "libcrypt.so.1",
>> diff --git a/scripts/generate_version_script.sh
>> b/scripts/generate_version_script.sh
>> new file mode 100755
>> index 00000000..607c71fa
>> --- /dev/null
>> +++ b/scripts/generate_version_script.sh
>> @@ -0,0 +1,42 @@
>> +#!/bin/bash
>> +
>> +VERSION_SCRIPT_FILE=$1
>> +
>> +extract_symbols_from_library()
>> +{
>> +  local LIB_NAME=$1
>> +  local LIB_PATH=$(gcc -print-file-name=$LIB_NAME)
>> +  echo "/*------- $LIB_PATH */" >> $VERSION_SCRIPT_FILE
>> +  nm -g -CD --defined-only $LIB_PATH | awk '// { printf("    %s;\n",$3)
>> }' >> $VERSION_SCRIPT_FILE
>> +}
>> +
>> +
>> +VERSION_SCRIPT_START=$(cat <<"EOF"
>> +{
>> +  global:
>> +EOF
>> +)
>> +
>> +VERSION_SCRIPT_END=$(cat <<"EOF"
>> +  local:
>> +    *;
>> +};
>> +EOF
>> +)
>> +
>> +echo "$VERSION_SCRIPT_START" > $VERSION_SCRIPT_FILE
>> +
>> +extract_symbols_from_library "libresolv.so.2"
>> +extract_symbols_from_library "libc.so.6"
>> +extract_symbols_from_library "libm.so.6"
>> +extract_symbols_from_library "ld-linux-x86-64.so.2"
>> +extract_symbols_from_library "libpthread.so.0"
>> +extract_symbols_from_library "libdl.so.2"
>> +extract_symbols_from_library "librt.so.1"
>> +extract_symbols_from_library "libcrypt.so.1"
>> +extract_symbols_from_library "libaio.so.1"
>> +extract_symbols_from_library "libxenstore.so.3.0"
>> +#extract_symbols_from_library "libboost_system.so"
>> +
>> +echo "$VERSION_SCRIPT_END" >> $VERSION_SCRIPT_FILE
>> +#-fvisibility=hidden -fvisibility-inlines-hidden
>> diff --git a/scripts/manifest_from_host.sh b/scripts/manifest_from_host.sh
>> index 71b3340d..82b47d1a 100755
>> --- a/scripts/manifest_from_host.sh
>> +++ b/scripts/manifest_from_host.sh
>> @@ -61,7 +61,7 @@ output_manifest()
>>   echo "# Dependencies" | tee -a $OUTPUT
>>   echo "# --------------------" | tee -a $OUTPUT
>>   lddtree $so_path | grep -v "not found" | grep -v "$so_path" | grep -v
>> 'ld-linux-x86-64' | \
>> - grep -Pv
>> 'lib(gcc_s|resolv|c|m|pthread|dl|rt|stdc\+\+|aio|xenstore|crypt|selinux)\.so([\d.]+)?'
>> | \
>> + grep -Pv
>> 'lib(gcc_s|resolv|c|m|pthread|dl|rt|aio|xenstore|crypt|selinux)\.so([\d.]+)?'
>> | \
>>   sed 's/ =>/:/' | sed 's/^\s*lib/\/usr\/lib\/lib/' | sort | uniq | tee
>> -a $OUTPUT
>>  }
>>
>> @@ -135,7 +135,7 @@ if [[ -d $NAME_OR_PATH ]]; then
>>   echo "# Dependencies" | tee -a $OUTPUT
>>   echo "# --------------------" | tee -a $OUTPUT
>>   lddtree $SO_FILES | grep -v "not found" | grep -v
>> "$NAME_OR_PATH/$SUBDIRECTORY_PATH" | grep -v 'ld-linux-x86-64' | \
>> - grep -Pv
>> 'lib(gcc_s|resolv|c|m|pthread|dl|rt|stdc\+\+|aio|xenstore|crypt|selinux)\.so([\d.]+)?'
>> | \
>> + grep -Pv
>> 'lib(gcc_s|resolv|c|m|pthread|dl|rt|aio|xenstore|crypt|selinux)\.so([\d.]+)?'
>> | \
>>   sed 's/ =>/:/' | sed 's/^\s*lib/\/usr\/lib\/lib/' | sort | uniq | tee
>> -a $OUTPUT
>>   fi
>>   exit 0
>>
>>
>> In essence, this PoC patch hides most symbols as well as
>> links libstdc++.a with '--no-whole-archive' (please note it only applies to
>> kernel.elf not loader.elf for now). I have successfully tested both ZFS and
>> ROFS images with native-example, java-example (running '/usr/bin/java -cp
>> /java-example Hello !' so that it does not use our java wrapper),
>> golang-pie-example and redis-memonly.
>>
>> You will notice that it adds a new script - 
>> scripts/generate_version_script.sh
>> - to generate limited number of symbols to be visible from kernel.elf.
>> Feel free to offer suggestions for doing it better (other options for nm or
>> using readelf).
>>
>
> What does this "version script" do if it finds a symbol in the host
> library which we never implemented?
>
>From my experiments it looks like it simply ignores such missing symbol,
which is what we want. As a matter of fact the generated version-script
contains over 4000 symbols but we end up with only a little 1,300 ones
exported in kernel.elf. I also noticed that if you compile individual
sources with  '-fvisibility=hidden -fvisibility-inlines-hidden' and have
relevant symbols listed in the version script files, it is too late in the
process and they do not get exported. If we relied on  '-fvisibility=hidden
-fvisibility-inlines-hidden' and added proper annotation to keep what we
want as public it would be pretty painful to do so.

>
> I'm a bit worried that this approach makes use depend even more on the
> setup on the host - if there's a function not defined there, it also won't
> be available from OSv. On the other hand, it may be a valid approach at
> least as an *option*, while we're still considering if this is a good idea
> or not.
>
In this demo patch (PoC) I am assuming we generate version script
dynamically as the part of the build process. But instead we could use the
same or similar build script generate_version_script.sh to create the
version file once and check it into source control. We would probably have
to create some sort of superset version file and run this tool against
library files from Ubuntu and Fedora host in case those libraries (like
"libc.so.6) contain different symbols. The downside would be that we would
have to manually add new symbols to this checked in version file
(effectively maintain it), but maybe it is not so bad given it would be
single file. What do you think?

>
>

>
>>
>> Here is the output from bloaty before applying my patch:
>>
>>     FILE SIZE        VM SIZE
>>  --------------  --------------
>>   55.8%  3.74Mi  51.3%  3.74Mi    .text
>>   10.4%   714Ki   9.6%   714Ki    .dynstr
>>    0.0%       0   8.1%   605Ki    .bss
>>    8.7%   597Ki   8.0%   597Ki    .eh_frame
>>    6.8%   467Ki   6.3%   467Ki    .rodata
>>    5.8%   397Ki   5.3%   397Ki    .dynsym
>>    2.8%   192Ki   2.6%   192Ki    .percpu
>>    2.1%   146Ki   2.0%   146Ki    .gnu.hash
>>    1.9%   130Ki   1.7%   130Ki    .hash
>>    1.9%   130Ki   1.7%   130Ki    .eh_frame_hdr
>>    1.5%   102Ki   1.4%   102Ki    .gcc_except_table
>>    1.2%  80.1Ki   1.1%  80.0Ki    .data.rel.ro
>>    0.4%  26.7Ki   0.4%  26.7Ki    .data
>>    0.2%  10.4Ki   0.2%  12.5Ki    [LOAD #0 [RWX]]
>>    0.2%  12.3Ki   0.2%  12.2Ki    .tracepoint_patch_sites
>>    0.2%  10.6Ki   0.1%  10.6Ki    .data.rel.local
>>    0.1%  6.98Ki   0.1%  3.95Ki    [47 Others]
>>    0.1%  5.61Ki   0.1%  5.55Ki    .data.rel
>>    0.0%  2.22Ki   0.0%  2.16Ki    .init_array
>>    0.0%  1.99Ki   0.0%       0    .shstrtab
>>    0.0%       0   0.0%  1.73Ki    .tbss
>>  100.0%  6.71Mi 100.0%  7.30Mi    TOTAL
>>
>> and number of exported symbols:
>> nm -g -CD --defined-only build/release/kernel.elf | wc -l
>> 16960
>>
>>
>> After hiding most symbols but C++ library still whole-archive:
>>     FILE SIZE        VM SIZE
>>  --------------  --------------
>>   69.1%  3.74Mi  62.3%  3.74Mi    .text
>>    0.0%       0   9.8%   605Ki    .bss
>>   10.8%   597Ki   9.7%   597Ki    .eh_frame
>>    8.4%   467Ki   7.6%   467Ki    .rodata
>>    3.5%   192Ki   3.1%   192Ki    .percpu
>>    2.3%   130Ki   2.1%   130Ki    .eh_frame_hdr
>>    1.9%   102Ki   1.7%   102Ki    .gcc_except_table
>>    1.4%  80.1Ki   1.3%  80.0Ki    .data.rel.ro
>>    0.6%  31.6Ki   0.5%  31.6Ki    .dynsym
>>    0.5%  26.7Ki   0.4%  26.7Ki    .data
>>    0.2%  12.5Ki   0.2%  14.6Ki    [LOAD #0 [RWX]]
>>    0.2%  12.3Ki   0.2%  12.2Ki    .tracepoint_patch_sites
>>    0.2%  12.3Ki   0.2%  12.2Ki    .dynstr
>>
>
> Unsurprisingly, most of the savings seem to come from .dynsym and .dynstr.
> Which makes me think if we can't get most of this savings by just hiding
> specific very long symbol names (perhaps a lot of instances of some C++
> template? Or what?), making more implementation details "static" or
> "visibility hidden" - without trying to hide almost everything except what
> we want to deliberately allow.
>
I am afraid it would be more painful. I think we still would want to
compile with visibility hidden as much as possible, but having central
version file that lists all symbols we export is really better because we
would de facto define the "interface" for the apps.

>
>    0.2%  10.6Ki   0.2%  10.6Ki    .data.rel.local
>>    0.2%  10.3Ki   0.2%  10.3Ki    .gnu.hash
>>    0.2%  9.36Ki   0.2%  9.30Ki    .hash
>>    0.1%  6.58Ki   0.1%  3.54Ki    [47 Others]
>>    0.1%  5.61Ki   0.1%  5.55Ki    .data.rel
>>    0.0%  2.22Ki   0.0%  2.16Ki    .init_array
>>    0.0%  1.99Ki   0.0%       0    .shstrtab
>>    0.0%       0   0.0%  1.73Ki    .tbss
>>  100.0%  5.42Mi 100.0%  6.01Mi    TOTAL
>>
>> nm -g -CD --defined-only build/release/kernel.elf | wc -l
>> 1339
>>
>> Full patch:
>>     FILE SIZE        VM SIZE
>>  --------------  --------------
>>   68.5%  3.42Mi  61.3%  3.42Mi    .text
>>    0.0%       0  10.6%   604Ki    .bss
>>   10.6%   541Ki   9.5%   541Ki    .eh_frame
>>    8.9%   454Ki   7.9%   454Ki    .rodata
>>    3.8%   192Ki   3.4%   192Ki    .percpu
>>    2.3%   119Ki   2.1%   119Ki    .eh_frame_hdr
>>    1.8%  90.1Ki   1.6%  90.0Ki    .gcc_except_table
>>    1.5%  74.6Ki   1.3%  74.5Ki    .data.rel.ro
>>    0.6%  31.6Ki   0.6%  31.6Ki    .dynsym
>>    0.5%  26.7Ki   0.5%  26.7Ki    .data
>>    0.2%  11.9Ki   0.2%  14.1Ki    [LOAD #0 [RWX]]
>>    0.2%  12.3Ki   0.2%  12.2Ki    .tracepoint_patch_sites
>>    0.2%  12.3Ki   0.2%  12.2Ki    .dynstr
>>    0.2%  10.6Ki   0.2%  10.6Ki    .data.rel.local
>>    0.2%  10.3Ki   0.2%  10.3Ki    .gnu.hash
>>    0.2%  9.36Ki   0.2%  9.30Ki    .hash
>>    0.1%  6.08Ki   0.1%  3.48Ki    [39 Others]
>>    0.1%  5.61Ki   0.1%  5.55Ki    .data.rel
>>    0.0%  2.20Ki   0.0%  2.13Ki    .init_array
>>    0.0%       0   0.0%  1.73Ki    .tbss
>>    0.0%  1.62Ki   0.0%       0    .shstrtab
>>  100.0%  5.00Mi 100.0%  5.59Mi    TOTAL
>>
>> nm -g -CD --defined-only build/release/kernel.elf | wc -l
>> 1339
>>
>> I wonder if it would be beneficial to compile as many sources as possible
>> with '-fvisibility=hidden -fvisibility-inlines-hidden' - it will not make
>> the kernel much smaller if any, but may produce more optimized code
>> sometimes, no?
>>
>
> Yes, this similar to what I said above. It will make the kernel smaller if
> we don't use this version file thing.
> I don't know if it's something we should do at a per-source-file level or
> perhaps (but this will take a lot of work) as a default and just enable
> visibility of specific things.
>
We should try to do it as much as possible but still use single version
file to define all symbols exported.

>
>
>>
>> We still need to figure out what exactly to do with internal apps like
>> cpiod, httpserver and tests that need some of the hidden symbols available.
>> Possibly expose those extra symbols as C.
>>
>
> Note that it's fine to expose C++ symbols as in <osv/power.hh> - which
> have mangled names and even namespaces. The problems start when such
> functions take C++ types (even something as simple as std::string) or throw
> exceptions. Those require the C++ on both kernel and application to be the
> same one :-(
>
> Anyway, you're probably right that whatever new APIs add should probably
> be in C, not C++. In the beginning we thought we would have a lot of those,
> but it seems this will never happen. We already started converting - or at
> least just wrapping - some of the new APIs to C because we needed to use
> them in C code - see include/osv/osv_c_wrappers.h and I think there's even
> more I don't remember.
>
Agreed.

>
>
>
>> Waldek
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "OSv Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to osv-dev+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/osv-dev/147d66a0-dcc3-41f8-b94f-71d761dbb98e%40googlegroups.com
>> <https://groups.google.com/d/msgid/osv-dev/147d66a0-dcc3-41f8-b94f-71d761dbb98e%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CAL9cFfNFgEZ83k_Jvcs-7ykPQHBoM-hdhL%3DQp5aPrzd%2BaWzcOg%40mail.gmail.com.

Reply via email to