Added a zipped archive of diff files referenced by this email.

On Sunday, August 2, 2020 at 1:32:49 AM UTC-4 Waldek Kozaczuk wrote:

> This is possibly one of the longest emails I have ever written. So bear 
> with me ...
>
> After all the recent patches, I have submitted that replaced many source 
> files under libc/ (~70) and headers under include/api/ (~25) with their 
> identical copies under musl/ folder, there is still a lot of stuff left to 
> do something about. And this is not going to be an easy walk but then after 
> some substantial research I have done, it seems to be doable. In general, 
> in order to ease the upgrade to the newer version of musl (probably 
> 1.1.24), we should first prepare (clean, prune) existing code under 
> include/api and libc/ and at the same time identify files that do not need 
> any change (should not be affected by upgrade).
>
> Before we agree on the exact steps I am proposing later (with some 
> outstanding questions), let me first do an inventory of what is left under 
> include/api and libc/.
>
> So at this point, we have:
>
>    - 93 header files under include/api - (find include/api/ -type f -name 
>    \*.h | wc -l)
>       - I think most of those originate from musl but were manually 
>       changed. I hope most of them with some effort can be replaced with the 
>       current or new version of musl - any other ideas?
>       - Out of those, 19 files under include/api/aarch64 for sure come 
>       from newer version fo musl so until the upgrade they should stay as is
>    - under libc/ (total of 215 files)
>       - 15 C header files
>       - 153 C source files 
>       - 6 C++ header files
>       - 41 C++ source files
>    
> Now let me break those down into following buckets:
>
>    - The files I believe *do NOT need to change *with musl upgrade of 
>    because they originate from somewhere else or were written from scratch 
>    (mostly in C++); see some of my comments/questions next to each one [total 
>    of 56 files]
>       - 
>       - misc/realpath.c
>       - vdso/vdso.c // Somehow get rid of it
>       - process/execle.c
>       - process/execve.cc
>       - process/waitpid.cc
>       - unistd/getppid.c
>       - unistd/getsid.c
>       - unistd/getpgid.c
>       - unistd/getpgrp.c
>       - unistd/sethostname.c
>       - math/finite.c
>       - math/finitef.c
>       - math/finitel.c
>       - math/aliases.c // SEE if there is a way to somehow use aliases 
>       without mdifying musl or new musl has those aliases
>       - misc/uname.c
>       - linux/makedev.c // some relationship to musl macros
>       - errno/strerror.c // small file originally taken from musl but 
>       does not look like the one in musl anymore
>       - libc/libc.hh // depends on "internal/libc.h"
>       - libc/pthread.hh
>       - libc/pipe_buffer.hh
>       - libc/network/__dns.hh
>       - libc/signal.hh
>       - libc/arch/x64/ucontext/ucontext.cc
>       - libc/misc/backtrace.cc
>       - libc/misc/mntent.cc
>       - libc/misc/mntent.cc
>       - libc/network/__dns.cc // Where did it come from? Does anything 
>       have/should come/get upgraded from musl or other place?
>       - libc/process/execve.cc
>       - libc/process/waitpid.cc
>       - libc/stdio/__stdout_write.cc
>       - libc/stdio/printf-hooks.cc
>       - libc/unistd/setpgid.cc - just a stub
>       - libc/unistd/setsid.cc - just a stub
>       - libc/unistd/sync.cc - just a stub
>       - libc/af_local.cc
>       - libc/cxa_thread_atexit.cc
>       - libc/cpu_set.cc
>       - libc/dlfcn.cc
>       - libc/eventfd.cc
>       - libc/malloc_hooks.cc
>       - libc/mallopt.cc
>       - libc/mman.cc
>       - libc/mount.cc
>       - libc/notify.cc
>       - libc/pipe.cc
>       - libc/pipe_buffer.cc
>       - libc/pthread.cc
>       - libc/pthread_barrier.cc
>       - libc/resource.cc
>       - libc/sem.cc
>       - libc/shm.cc
>       - libc/signal.cc
>       - libc/time.cc
>       - libc/timerfd.cc
>       - libc/user.cc
>       - libc/arch/x64/setjmp/sigrtmin.c -> musl/src/signal/sigrtmin.c // 
>       there does not seem to be aarch64 version, but OSv has
>       - The family of *_chk() function that provide small layer of glibc 
>    compability and *should NOT need to change*. Should we move them to 
>    libc/glibc-compat dir (there is already include/glibc-compat directory)?, 
>    total of 17
>       - 
>       - __read_chk.c
>       - stdio/__fprintf_chk.c
>       - stdio/__fread_chk.c
>       - stdio/__vfprintf_chk.c
>       - string/__stpcpy_chk.c
>       - string/__strcpy_chk.c
>       - string/__wcscpy_chk.c
>       - string/__memcpy_chk.c
>       - string/__strcat_chk.c
>       - string/__memset_chk.c
>       - string/__strncat_chk.c
>       - string/__explicit_bzero_chk.c
>       - string/__memmove_chk.c
>       - string/__strncpy_chk.c
>       - __pread64_chk.cc -> why is it a C++ function?
>       - internal/_chk_fail.cc
>       - misc/__longjmp_chk.cc -> why is it a C++ function?
>       - Also, consider moving out __realpath_chk() out of misc/realpath.c 
>       to its own file.
>       - Files that are different from current musl equivalent ones but 
>    should most likely be replaced in Makefile with the musl ones (see the 
>    point_to_musl_as_is.diff attachment):
>       - string/strsignal.c - the current musl one seems to be newer and 
>       most up to date
>       - no signficant differences it seems
>          - internal/floatscan.h
>          - internal/floatscan.c
>          - internal/intscan.h
>          - internal/intscan.c
>          - internal/shgetc.c
>       - possibly can be replaced with musl ones
>          - prng/random.c
>          - network/gai_strerror.cc -> ../musl/src/network/gai_strerror.c
>          - misc/lockf.cc ../musl/src/misc/lockf.c // why is it C++? seems 
>          to have minimal changes
>       - Files under arch (21 files) I have not idea if they are subject 
>    to upgrade
>       - 
>       - libc/arch/aarch64/setjmp/sigsetjmp.s
>       - libc/arch/aarch64/setjmp/sigrtmin.c
>       - libc/arch/aarch64/setjmp/block.c
>       - libc/arch/aarch64/setjmp/longjmp.s
>       - libc/arch/aarch64/setjmp/siglongjmp.c
>       - libc/arch/aarch64/setjmp/sigrtmax.c
>       - libc/arch/aarch64/setjmp/setjmp.s
>       - libc/arch/aarch64/atomic.h
>       - libc/arch/x64/ucontext/getcontext.s
>       - libc/arch/x64/ucontext/ucontext.cc
>       - libc/arch/x64/ucontext/start_context.s
>       - libc/arch/x64/ucontext/setcontext.s
>       - libc/arch/x64/setjmp/sigsetjmp.s
>       - libc/arch/x64/setjmp/sigrtmin.c
>       - libc/arch/x64/setjmp/block.c
>       - libc/arch/x64/setjmp/longjmp.s
>       - libc/arch/x64/setjmp/siglongjmp.c
>       - libc/arch/x64/setjmp/sigrtmax.c
>       - libc/arch/x64/setjmp/setjmp.s
>       - libc/arch/x64/atomic.h
>       - libc/arch/arm/src/__aeabi_atexit.c
>    - locale/ (26 files) - I think most of those come from older musl and 
>    can be replaced with their newer copies in current musl; I think 
>    libc/locale have two files for each family of functions - for example 
>    libc/locale/strcoll.c, libc/locale/strcoll_l.c vs single 
>    musl/src/locale/strcoll.c; see locale.diff attachment for details and this 
>    musl commit as an example - 
>    
> https://git.musl-libc.org/cgit/musl/diff/?id=4b0306c83c8c3614afbaf18a18e22d24f335ea04
>       - 
>       - 
>       - locale/strcoll_l.c
>       - locale/wcsftime_l.c
>       - locale/iswctype_l.c
>       - locale/freelocale.c
>       - locale/wcsxfrm_l.c
>       - locale/uselocale.c
>       - locale/setlocale.c
>       - locale/strtold_l.c
>       - locale/wcsxfrm.c
>       - locale/strfmon.c
>       - locale/strcoll.c
>       - locale/wcscoll.c
>       - locale/strxfrm.c
>       - locale/strtof_l.c
>       - locale/towlower_l.c
>       - locale/towupper_l.c
>       - locale/toupper_l.c
>       - locale/nl_langinfo_l.c
>       - locale/langinfo.c
>       - locale/strtod_l.c
>       - locale/strftime_l.c
>       - locale/wctype_l.c
>       - locale/wcscoll_l.c
>       - locale/duplocale.c
>       - locale/strxfrm_l.c
>    - time/ - time functios that I think come from older musl but were not 
>    changed after; with little effort can be replaced (point to) musl 
>    originals; see the time.diff attachment and this musl commit - 
>    
> https://git.musl-libc.org/cgit/musl/diff/?id=1cc81f5cb0df2b66a795ff0c26d7bbc4d16e13c6
>    - 
>       - time/__tm_to_time.c
>       - time/__time_to_tm.c
>       - time/timegm.c
>       - time/__asctime.c
>       - time/__time.h
>       - time/mktime.c
>       - time/strftime.c
>       - time/tzset.c
>       - time/wcsftime.c
>       - time/localtime_r.c
>       - time/gmtime_r.c
>       - time/localtime.c
>       - time/gmtime.c
>       - time/ftime.c
>    - network/ - the diffs to musl
>    - 
>       - network/if_indextoname.c - SYS_close replaced with close() and 
>       socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) replaced with 
> socket(AF_UNIX, 
>       SOCK_DGRAM, 0)
>       - network/gai_strerror.cc - ?
>       - network/__ipparse.c - includes __dns.hh instead of __dns.h
>       - network/if_nameindex.c - socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 
>       0) replaced with socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, 0);
>       - network/gethostbyname_r.c - I think it originates from musl and 
>       is identical except for extra gethostbyname() that can probably be 
> moved to 
>       a separate file?
>       - network/inet_addr.c - does NOT exist in musl
>       - network/getifaddrs.c - completely different from musl copy
>       - network/getnameinfo.c - seems to originate from muls but has 
>       extra code and maybe has to stay as is just like getaddrinfo()
>       - network/res_init.c - only diff: weak_alias(res_init, __res_init)
>       - network/if_nametoindex.c - similar to if_indextoname (see above)
>       - network/__dns.hh/__dns.cc - should stay as is 
>       - network/getaddrinfo.c - seems to originate from muls but quite 
>       different from the current musl version so maybe should stay as is
>       - network/inet_ntop.c - originates from musl and has not changed 
>       since; but different from current musl
>       - network/inet_aton.c - does not exist in musl but the only commit 
>       says it was taken from musl-libc
>    - unistd/ - only files that possibly are subject to upgrade
>       - ttyname_r.c - implemented by Rean for node js and I do not think 
>       it comes from musl
>       - ttyname.c is - almost identical to an existing file in musl (see 
>       attached diff) and delegates to ttyname_r() maybe we should point to 
> musl 
>       one
>       - I think we have a unit tests for both
>    - stdlib/
>       - 
>       - wcstol.c - besides including different headers it replaces f.lock 
>       = -1 with f.no_locking = true (what is this exactly about?)
>       - strtol.c - similar as wcstol.c
>       - strtod.c - similar as wcstol.c
>       - qsort_r.c - does not exist in current musl (I checked in 1.1.24) 
>       so maybe should stay as is
>    - string/ - (_chk.c files and others not subject to upgrade skipped)
>    - 
>       - string/memset.c - seems identical but memset_base() vs memset() - 
>       what is it about?
>       - string/memmove.c - this is OSv specific so probably should stay 
>       as is
>       - string/strtok_r.c - seems identical except for '__restrict' vs 
>       'restrict' and #undef
>       - string/stresep.c - per commit seems to come from netbsd (
>       ftp.tku.edu.tw/NetBSD/NetBSD-current/src/lib/libc/string/stresep.c) 
>       -> should stay as is
>       - string/__strndup.c  - not in musl , seems to come from somewhere 
>       else
>       - string/rawmemchr.c - not from musl, where does it come from?
>       - string/strerror_r.c - originates from musl but seems to be 
>       completely different from current musl copy; modified to be both POSIX 
> and 
>       glibc compliant
>       - string/memcpy.c - implementation seems to be OSv specfic, so 
>       maybe is not subject to musl upgrade
>       - string/strlcat.c - slightly different from current musl but has 
>       not been modfied since original import -> probably we should use musl 
>       version
>       - string/explicit_bzero.c - not from musl it seems; glibc?
>       - stdio - this one is the largest in terms of changes (but maybe 
>    not) and biggest mess
>    - 
>       - fmemopen.c - has extra 'if (!libc.threaded) f->lock = -1;' to 
>       deal with some thread related issue - see commit 
>       3f053b8f950be4b89fcb928f956ddc31ba8ba68a
>       - ftrylockfile.c - pthread_self()->tid vs mutex_owned(&f->mutex)
>       - __stdio_read.c - SYS_readv vs readv() mostly
>       - open_wmemstream.c - extra 'if (!libc.threaded) f->lock = -1;' 
>       just like fmemopen.c
>       - __fdopen.c - SYS_fcntl vs fcntl() and SYS_ioctl vs ioctl() mostly
>       - __stdio_close.c - SYS_close vs close()
>       - vfwscanf.c - quite different fromm current musl but seems to not 
>       have been modified for OSv purposes > so maybe just needs to be 
> replaced in 
>       makefile to musl copy?
>       - vdprintf.c - .lock = -1 vs .no_locking = true (what is this 
>       about?)
>       - vfprintf.c - the only difference is handling of L prefix for long 
>       long int
>       - __stdio_seek.c - SYS_lseek vs lseek()
>       - tmpnam.c - SYS_access vs access() and SYS_clock_gettime vs 
>       clock_gettime()
>       - getc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if 
>       (f->no_locking || !__lockfile(f))
>       - fopen.c - SYS_open vs open() and SYS_close vs close()
>       - vsnprintf.c - .lock = -1 vs .no_locking = true plus extra 
>       handling of some overflow condition
>       - flockfile.c - seems to have been rewritten for OSv, ftrylockfile 
>       vs mutex_owned() etc
>       - vfscanf.c - seems to originate from musl but per couple of old 
>       commits from 2013 certian features have been disabled; should we 
> repoint ot 
>       tp current musl?
>       - vswscanf.c - '.lock = -1' vs '.no_locking = true'
>       - stdin.c - modified for all kinds of reasons
>       - freopen.c - SYS_fcntl vs fcntl() and __dup3() vs dup3()
>       - sscanf.c - extra weak_alias for __isoc99_sscanf
>       - tmpfile.c - SYS_open vs open() and SYS_close vs unlink()
>       - stdout.c - extra '.lock = -1' - what is this about?
>       - open_memstream.c - extra 'if (!libc.threaded) f->lock = -1;'
>       - putc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if 
>       (f->no_locking || !__lockfile(f))'
>       - __lockfile.c - seems to have been heavily adapted for OSv purposes
>       - remove.c - SYS_unlink vs unlink() plus some POSIX compliance 
>       changes
>       - fputc.c -  'if (f->lock < 0 || !__lockfile(f))' vs 'if 
>       (f->no_locking || !__lockfile(f))'
>       - __stdio_write.c - SYS_writev vs writev()
>       - stderr.c - extra '.lock = -1,'
>       - funlockfile.c - includes "pthread_impl.h"
>       - fgetc.c -  'if (f->lock < 0 || !__lockfile(f))' vs 'if 
>       (f->no_locking || !__lockfile(f))'
>       - vsscanf.c - '.read = do_read, .lock = -1' vs '.read = do_read, 
>       .no_locking = true,' plus week_alias for __isoc99_vsscanf
>       - vswprintf.c  - 'f.lock = -1;' vs 'f.no_locking = true;'
>       - __fopen_rb_ca.c - SYS_open vs open() plus 'f->lock = -1;' (what 
>       is this about?)
>    
> From above, It seems there are 3 categories of files under libc/:
>
>    1. Brand new implementations of libc functionality mostly in C++ that 
>    will not need to change as part of musl upgrade
>    2. Glibc compatibility or implementations not from musl; like above 
>    they will not need to change as part of musl upgrade
>    3. Files that originate from musl but need to be adapted for OSv 
>    specific reasons
>
>
> To help us with the upgrade we should add more unit tests. To that extent, 
> I have found musl unit test - https://wiki.musl-libc.org/libc-test.html and 
> possibly some unit tests from bionic (
> https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdio_test.cpp
>  or 
>
> https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdlib_test.cpp).
>  
> The bionic ones would need to be adapted but the musl ones could be used 
> as-is.
>
> To minimize effort for future upgrade I was wondering if the following 
> ideas could help us:
> - use some preprocessor macros "magic" to replace for syscall() calls with 
> regular functions - it seems like many files under libc are original musl 
> source except for these type of changes
> - in cases, we are forced to modify musl originals, maybe instead of 
> storing modified files under libc/ like we have now, we should store the 
> *.patch files instead and apply as part of new build process step; would it 
> be better?
>
> So I think the upgrade plan should be this: 
> 1. Add more unit tests
> 2. Prune/clean libc to minimize the set of files that are part of category 
> 3 (see above)
> 3. Do the upgrade (per Nadav's original plan):
>
>    - Bring the latest Musl version into a "musl-1.1.24" subdirectory in 
>    OSv.
>    - Leave the old "musl/" and "libc/" directories as well.
>    - In Makefile, make a new list of objects (say, "nmusl") which will 
>    take code from musl-1.1.24/ instead of musl/.
>    - Start to switch individual files and directories from "musl += ..." 
>    to "nmusl += ...".  We can start with the math functions needed for 
>    aarch64. Eventually, everything can be converted to nmusl and hopefully, 
>    nothing or little will break and need to be fixed.
>    - Replace our include/api with musl-1.1.24/include. Would be even 
>    better to drop include/api, and just use musl-1.1.5/include directly if we 
>    can. I think, though, this is low priority, and might take quite a bit of 
>    work ("git log include/api" shows we modified this quite a bit since we 
>    took it from musl). I'm hoping that the old header files will work 
>    correctly also for the newer musl.
>       - I think this might be tricky one at this point
>    
>
> What do you think?
>
> Waldek
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/32333b6e-25e8-47a0-8be8-a6fb04d795bdn%40googlegroups.com.

Attachment: categories_to_zip.tar.gz
Description: GNU Zip compressed data

Reply via email to