[osv-dev] Librarization/Modularization

2020-05-25 Thread Waldek Kozaczuk
Hi,

I am going to send a proper "Next Release Proposal" email later this 
week (or next), and "Librarization/Modularization" will be a key part of it. 
Currently, OSv kernel provides quite a significant subset of the 
functionality of some standard Linux libraries listed here - 
https://github.com/cloudius-systems/osv#kernel-size. In reality, many 
applications do not need all of this functionality, but they "get it" 
whether they need it or not. Even Java, which used to need lots of symbols 
from standard libraries, has become way more modular, and with the advent 
of GraalVM and other AOT-type technologies, OSv kernel does not need to 
provide all this functionality universally to every app. Worse, if you run 
an app on Firecracker, which needs only the console, non-PCI virtio-blk and 
virtio-net drivers, you still get all the other drivers, including the ones 
for VirtualBox, Xen, VMware, etc. This makes OSv barely a unikernel, or at 
best a "fat" one. It has real negative consequences: higher memory 
utilization (the kernel needs to be loaded in memory), a larger kernel file 
(which makes decompression take longer), poorer security because of the 
fairly vast number of exported symbols (at this moment everything 
non-static gets exported) and, finally, possibly less optimized code. On the 
other hand, because of this "universality", it is *quite easy*, compared 
to other unikernels, to run an arbitrary Linux app on OSv. And no matter 
what we do to make OSv more modular, we should preserve that "ease" and not 
make it harder, at least by default, to run an app on OSv.

So in general, what I am advocating for is an ability (and a mechanism) to 
create more "stripped-down" versions of the kernel *tailored to the needs of 
a specific app and/or the specific hypervisor* OSv will run on, while 
preserving the default universal kernel, and also to shrink the universal 
kernel by *extracting optional functionality* from it, where it makes sense 
and is relatively easy to do, as shared libraries to be loaded during the 
boot process. The latter should ideally also involve the build process 
(compile/link) optimizations I proposed in the email I sent to the group a 
week ago - 
https://groups.google.com/d/msg/osv-dev/hCYGRzytaJ4/D23S_ibNAgAJ
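
As a rough illustration of what a per-hypervisor build could look like, the sketch below gates driver registration behind compile-time switches. All names here (the CONF_drivers_* macros, enabled_drivers() and the driver name strings) are hypothetical and are not existing OSv configuration options; a real implementation would hook into the makefile and the driver probing code.

```cpp
#include <string>
#include <vector>

// Hypothetical build-time switches. A build profile for a given hypervisor
// would override them on the compiler command line, for example with
// -DCONF_drivers_pci=0 -DCONF_drivers_xen=0.
#ifndef CONF_drivers_mmio
#define CONF_drivers_mmio 1
#endif
#ifndef CONF_drivers_pci
#define CONF_drivers_pci 1
#endif
#ifndef CONF_drivers_xen
#define CONF_drivers_xen 1
#endif

// Returns the names of the driver groups compiled into this (imaginary)
// kernel. Code for a disabled group is not compiled at all.
std::vector<std::string> enabled_drivers()
{
    std::vector<std::string> drivers;
#if CONF_drivers_mmio
    drivers.push_back("virtio-mmio");
#endif
#if CONF_drivers_pci
    drivers.push_back("virtio-pci");
#endif
#if CONF_drivers_xen
    drivers.push_back("xen");
#endif
    return drivers;
}
```

With no overrides, every group stays enabled, which would correspond to the default universal kernel.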

In the end, what I am proposing could be organized in the following three 
categories:

   1. Tailor kernel (and really drivers) to a specific hypervisor - this 
   could be as simple as defining more granular sets of targets in the main 
   makefile, adding #ifdef in all relevant places and possibly using the 
   existing ./conf/*.mk-based mechanism; for starters we could define a 
   build configuration for Firecracker and the QEMU microvm machine, which I 
   believe require the same small subset of drivers.
   2. Extract optional functionality into shared libraries - this is more 
   difficult than the above. One example of such functionality is ZFS and 
   there is already an open issue - 
   https://github.com/cloudius-systems/osv/issues/1009. Some drivers could 
   be extracted as libraries as well but it might be more difficult to do so. 
   The main difficulty here is that there needs to be a filesystem mounted 
   early enough in the boot process to load such a library from - bootfs (less 
   attractive as it is part of loader.elf/kernel.elf) or RoFS. 
   3. Create a mechanism to build a smaller kernel "tailored" to a specific 
   app. This would require some sort of ELF analyzer tool that would identify 
   all symbols needed by the given app and its dependencies and create a 
   version script file defining the specific set of symbols to be exported 
   from the kernel. To achieve that we could start by addressing the issue 
   https://github.com/cloudius-systems/osv/issues/97 - "Be more selective 
   on symbols exported from the kernel" - which could deliver such a generic 
   solution.
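
To make the last item more concrete, here is a minimal sketch of the final step only: turning a set of needed symbol names into a GNU ld version script that exports those symbols and hides everything else. The function name make_version_script() is hypothetical, and the symbol set is assumed to come from the ELF analyzer, which does not exist yet.

```cpp
#include <set>
#include <sstream>
#include <string>

// Builds the text of a GNU ld version script that exports only the given
// symbols and makes every other symbol local. The input set is assumed to
// be produced by an ELF analyzer run over the app and its dependencies.
std::string make_version_script(const std::set<std::string>& needed)
{
    std::ostringstream out;
    out << "{\n";
    if (!needed.empty()) {
        out << "  global:\n";
        for (const auto& sym : needed) {
            out << "    " << sym << ";\n";
        }
    }
    out << "  local:\n    *;\n};\n";
    return out.str();
}
```

The resulting file could then be passed to the linker with --version-script when linking the kernel; how that fits into the OSv build is left open here.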

Addressing 3) could help us with another issue - 
https://github.com/cloudius-systems/osv/issues/821 - "Combining 
pre-compiled OSv kernel with pre-compiled executable". To that end, we 
could also consider creating a mechanism that would let us build a 
stripped-down version of the kernel with functionality exposed through the 
SYSCALL instruction only and with no built-in musl or libc (except for the 
dynamic linker functions such as dlopen), and let one mix in the original 
pre-built musl library, which would interact with the kernel through those 
SYSCALL calls. This would probably require exposing more functions as 
SYSCALL than we have now in linux.cc - at least brk and clone. I am not 
sure if that is even feasible, but I think at least one of the unikernels 
does just this - Hermitux.
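
For illustration, this is roughly what "interacting with the kernel through SYSCALL only" means from the library side: each libc function reduces to a syscall number plus arguments. The sketch below runs on Linux and uses the libc syscall() helper rather than a hand-written SYSCALL instruction; raw_getpid() and raw_write() are hypothetical names, and nothing here is taken from linux.cc.

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cstddef>

// Thin wrappers that reach the kernel only through the system call
// interface, the way a pre-built libc would on a syscall-only kernel.
long raw_getpid()
{
    return syscall(SYS_getpid);
}

// Returns the number of bytes written, or -1 with errno set on error.
long raw_write(int fd, const void* buf, std::size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

A kernel built this way would only need to implement the numbered entry points that the mixed-in library actually uses.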

I am also leaning more and more toward hiding the C++ library - this should 
help us with 821, and there is at least one case - dotnet apps - that 
requires an incompatible version of libstdc++.so. This would impact existing 
internal C++ apps like cpiod and httpserver as we would have to add 
libstdc++.so to the 

Re: [osv-dev] Using PERCPU in application or module

2020-05-25 Thread Wonsup Yoon
Thank you for the response.

Yes, dynamic_percpu is perfect for my purpose.

However, I encountered another issue.

If I use dynamic_percpu together with preempt_lock (which I think is a very 
common pattern), it aborts because an assertion fails.
It seems that lazy binding does not work while the preemption lock is held.
So I had to add the -fno-plt option, and then it works.



example code)

#include 
#include 

#include 
#include 

struct counter {
    int x = 0;

    void inc(){
        x += 1;
    }

    int get(){
        return x;
    }
};

dynamic_percpu<counter> c;

int main(int argc, char *argv[])
{
    SCOPE_LOCK(preempt_lock);
    c->inc();

    return 0;
}


Backtrace)

[backtrace]
0x4023875a <__assert_fail+26>
0x4035860c 
0x40358669 
0x4039e2ef 
0x1000f333 
0x4042a47c 
0x40224bd0 
0x4042a628 
0x40462715 
0x403fac86 
0x4039f632 




On Sunday, May 24, 2020 at 5:26:17 PM UTC+9, Nadav Har'El wrote:
>
>
> On Sat, May 23, 2020 at 6:35 PM Wonsup Yoon wrote:
>
>> Hi,
>>
>> I'm trying to use PERCPU macro in application or module.
>>
>
> Hi,
>
> The PERCPU macro does not support this. What it does is to add information 
> about this variable in a special section of the executable (".percpu"), 
> then arch/x64/loader.ld makes sure all these entries will be together 
> between "_percpu_start" and "_percpu_end", and finally sched.cc for every 
> CPU creates (in the cpu::cpu(id) constructor) a copy of this data. So if a 
> loadable module (shared library) contains another per-cpu variable, it never 
> gets added to the percpu area.
>
> However, I believe we do have a mechanism that will suit you: 
> *dynamic_percpu*.
> You can create (and destroy) such an object of type dynamic_percpu at 
> any time, and it does the right thing:  The variable will be allocated on 
> all CPUs when the object is created, will be allocated on new cpus if those 
> happen, and will be freed when the object is destroyed.
> In your case you can have a global dynamic_percpu variable in your 
> loadable module. This object will be created when the module is loaded, and 
> destroyed when the module is unloaded - which is what you want.
>
> Nadav.
>
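
To illustrate the semantics described in the quoted reply, here is a toy model of a per-CPU variable that can be created and destroyed at any time. It is not OSv's dynamic_percpu: percpu_model and on_cpu() are hypothetical names, and the CPU index is passed explicitly so that the example runs as an ordinary user-space program.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a per-CPU variable: one private copy of T per CPU,
// allocated when the object is created and freed when it is destroyed.
// A real implementation selects the copy of the CPU the caller is running
// on; here the caller names the CPU explicitly.
template <typename T>
class percpu_model {
public:
    explicit percpu_model(std::size_t ncpus) : _copies(ncpus) {}
    T& on_cpu(std::size_t cpu) { return _copies.at(cpu); }
    std::size_t ncpus() const { return _copies.size(); }
private:
    std::vector<T> _copies;
};

struct counter {
    int x = 0;
    void inc() { x += 1; }
    int get() const { return x; }
};
```

Because each CPU only touches its own copy, no locking is needed between CPUs. In a real kernel the caller must not migrate to another CPU between selecting its copy and updating it, which is why such accesses are usually wrapped in a preemption lock.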

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/07f76c69-0448-4a97-b587-995f7dbafe58%40googlegroups.com.


[osv-dev] Re: [PATCH v2 4/4] virtio-fs: refactor driver / fs

2020-05-25 Thread Fotis Xenakis

>
> Hi,
>
> I think both parts 3 and 4 of your patches look good. But I guess it would 
> not hurt if Nadav could scrutinize them from the C++ perspective as well.
>
> However, I am having a bit of trouble testing them. I do not think 
> anything is wrong with your patches, but possibly something has changed in 
> the virtiofs implementation on the QEMU side, or something else.
>
> I am on Ubuntu 20.04 and using your QEMU branch, fix-dax-fd.
>
> Whenever I run virtiofsd and QEMU, I get this error:
>
> ./virtiofsd --socket-path=/tmp/vhostqemu -o 
> source=/home/wkozaczuk/projects/osv/apps/native-example -o cache=always -d
> [2240367089675] [ID: 00018845] virtio_session_mount: Waiting for 
> vhost-user socket connection...
> [2249948371337] [ID: 00018845] virtio_session_mount: Received vhost-user 
> socket connection
> [2249949209337] [ID: 0001] tmpdir(virtiofsd-nuZoui): Permission denied
>
> /home/wkozaczuk/projects/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
> -m 2G \
> -smp 4 \
> -vnc :1 \
> -gdb tcp::1234,server,nowait \
> -kernel /home/wkozaczuk/projects/osv/build/last/kernel.elf \
> -append "--mount-fs=virtiofs,/dev/virtiofs1,/tmp/virtiofs /tmp/virtiofs/hello" \
> -device virtio-blk-pci,id=blk0,drive=hd0,scsi=off \
> -drive file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native \
> -chardev socket,id=char0,path=/tmp/vhostqemu \
> -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
> -object memory-backend-file,id=mem,size=2G,mem-path=/dev/shm,share=on \
> -numa node,memdev=mem \
> -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 \
> -device virtio-net-pci,netdev=un0 \
> -device virtio-rng-pci \
> -enable-kvm \
> -cpu host,+x2apic \
> -chardev stdio,mux=on,id=stdio,signal=off \
> -mon chardev=stdio,mode=readline \
> -device isa-serial,chardev=stdio
>
> qemu-system-x86_64: -device 
> vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs: Failed to read 
> msg header. Read 0 instead of 12. Original request 1.
> qemu-system-x86_64: -device 
> vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs: vhost_dev_init 
> failed: Operation not permitted
>
> I know virtiofs is a moving target so I must be missing some settings, it 
> looks like a permission error. Any ideas?
>
Truth is, I am getting the exact same errors, and I haven't yet wrapped my 
head around the security / sandboxing functionality of virtiofsd, so 
unfortunately I have no ideas on this (apart from that it *seems* to 
originate in virtiofsd, and what fails is the mkdtemp call).
Not proud to say this, but I have been running with elevated privileges 
locally. I will reach out to the virtio-fs devs to ask, though, and hopefully 
come back with a proper answer!
Thank you for pointing this out.

>
> Regards,
> Waldek
>
> On Sunday, May 17, 2020 at 5:52:01 AM UTC-4, Fotis Xenakis wrote:
>>
>> Since in virtio-fs the filesystem is very tightly coupled with the 
>> driver, this tries to make clear the dependence of the first on the 
>> second, as well as simplify. 
>>
>> This includes: 
>> - The definition of fuse_request is moved from the fs to the driver, 
>>   since it is part of the interface it provides. Also, it is enhanced 
>>   with methods, somewhat promoting it to a "proper" class. 
>> - fuse_strategy, as a redirection to the driver is removed and instead 
>>   the dependence on the driver is made explicit. 
>> - Last, virtio::fs::fs_req is removed and fuse_request is used in its 
>>   place, since it offered no value with fuse_request now defined in the 
>>   driver. 
>>
>> Signed-off-by: Fotis Xenakis  
>> --- 
>>  drivers/virtio-fs.cc   | 42 +- 
>>  drivers/virtio-fs.hh   | 27 +++--- 
>>  fs/virtiofs/virtiofs_i.hh  | 24 ++- 
>>  fs/virtiofs/virtiofs_vfsops.cc | 17 +++--- 
>>  fs/virtiofs/virtiofs_vnops.cc  | 39 +++ 
>>  5 files changed, 64 insertions(+), 85 deletions(-) 
>>
>> diff --git a/drivers/virtio-fs.cc b/drivers/virtio-fs.cc 
>> index ca9b00fc..e0e090bc 100644 
>> --- a/drivers/virtio-fs.cc 
>> +++ b/drivers/virtio-fs.cc 
>> @@ -28,25 +28,23 @@ 
>>   
>>  using namespace memory; 
>>   
>> -void fuse_req_wait(fuse_request* req) 
>> -{ 
>> -WITH_LOCK(req->req_mutex) { 
>> -req->req_wait.wait(req->req_mutex); 
>> -} 
>> -} 
>> +using fuse_request = virtio::fs::fuse_request; 
>>   
>>  namespace virtio { 
>>   
>> -static int fuse_make_request(void* driver, fuse_request* req) 
>> +// Wait for the request to be marked as completed. 
>> +void fs::fuse_request::wait() 
>>  { 
>> -auto fs_driver = static_cast(driver); 
>> -return fs_driver->make_request(req); 
>> +WITH_LOCK(req_mutex) { 
>> +req_wait.wait(req_mutex); 
>> +} 
>>  } 
>>   
>> -static void fuse_req_done(fuse_request* req) 
>> 
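
The shape of this refactoring, where waiting becomes a method of the request instead of a free function taking a request pointer, can be sketched in portable C++. This is only an illustration under stated assumptions: it uses std::mutex and std::condition_variable instead of the OSv primitives and WITH_LOCK seen in the diff, the class name request is hypothetical, and the _completed flag is an addition of this sketch that does not appear in the quoted patch.

```cpp
#include <condition_variable>
#include <mutex>

// Simplified, hypothetical request type that owns its own synchronization.
class request {
public:
    // Block until done() has been called.
    void wait()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _cond.wait(lock, [this] { return _completed; });
    }

    // Mark the request as completed and wake up any waiter.
    void done()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _completed = true;
        }
        _cond.notify_all();
    }

    // Report whether done() has been called.
    bool completed()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _completed;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cond;
    bool _completed = false;
};
```

Typically the submitting thread would call wait() after queuing the request, and the completion path would call done().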

Re: [osv-dev] OSv boots and runs simple hello world on Firecracker ARM edition ... on Raspberry PI 4

2020-05-25 Thread Nadav Har'El
On Thu, May 21, 2020 at 11:58 PM Waldek Kozaczuk wrote:

>
> ubuntu@ubuntu-rasp4:~$ uname -a
> Linux ubuntu-rasp4 5.3.0-1025-raspi2 #27-Ubuntu SMP Fri May 8 08:32:04 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
>
> ubuntu@ubuntu-rasp4:~$ ./firecracker-v0.21.1-aarch64 --no-api --config-file ./osv-config.json
> OSv v0.55.0-9-g840428ad
> PSCI: version 65536.0 detected.
> setup_arm_clock() ENTERED, lr=800d54e0
> arm_clock(): frequency read as 0337f980
> interrupt_table::interrupt_table() ENTERED, lr=800a8d1c
> gic_driver::init_cpu() ENTERED, lr=80201c80
> CPU interface enabled.
> gic_driver::init_dist() ENTERED, lr=80201c8c
> number of supported IRQs: 0080
> interrupt table: gic driver created.
> registered IRQ id=0004
> registered IRQ id=001b
> registered IRQ id=
> Premain complete!
> smp_launch ENTERED, lr=800d709c
> Booted up in 0.00 ms
> Cmdline: console=ttyS0 --verbose --nomount --maxnic=0 /tools/hello.so earlycon=uart,mmio,0x40001000
> faulting address 1001fea0
> faulting address 1328
> faulting address 1002
> faulting address 10050018
> faulting address 100301c8
> faulting address 1004fe00
> Hello from C code
> 2020-05-21T20:24:04.080765641 [anonymous-instance:ERROR:src/vmm/src/vstate.rs:951] Unexpected exit reason on vcpu run: SystemEvent
>
> In general it took me a bit of research, as I am not really familiar with
> the ARM architecture, and even reading the assembly was a bit of a
> challenge, to say the least. And then there was debugging without a
> debugger or any console (:-( .. so no debug() for a long time. But all in
> all it was not too bad, and the changes that I had to make to OSv are in my
> opinion much smaller and easier compared to x86_64.
>

Very nice!

I'll try to review a few points below, but I'm even less of an ARM expert
(and definitely less of a Firecracker expert) than you, so I'm not sure how
valuable my review will be.
We just need to take care not to break what was already working in the
existing aarch64 code, because I don't know how it was tested, and how
various changes can break other arm variants and hypervisors.


>
> Below you will see the "hack-patch" showing what changes I had to make.
> Logically, the following things had to be changed:
>
> - The most important thing was to move the kernel from 0x40000000 (1GB) to
>   0x80000000 (2GB), which required changing one line in the Makefile (see
>   below) and changing the boot paging table to map the 2GB-3GB area of
>   memory; only then could I actually start debugging :-) I wonder if it
>   will also work on QEMU/KVM - possibly the QEMU boot loader will inspect
>   the ELF and place it accordingly in memory; Firecracker does not read
>   the ELF and simply places it at 2GB
> - To get the console working, I guessed that I needed to create a class
>   equivalent to isa_serial_console but communicating over mmio (see
>   mmio_isa_serial_console.hh/cc), which in essence invokes
>   mmio_set*/mmio_get* (it would be nice to extract common code somehow -
>   suggestions welcome)
>
>
Perhaps you can have class mmio_isa_serial_console inherit from
isa_serial_console, but change the implementation to use a method out()
instead of pci::outb() - and the new mmio_isa_serial_console will just
override the out() function (perhaps out() needs to be virtual for this to
work?).
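
A rough sketch of that idea, not tested against the actual OSv classes: the console logic is written once against a virtual register-write method, and each flavour only decides how a register is reached. The class names, write_reg() and transmitted() are hypothetical, and a plain buffer stands in for the device so that the example can run in user space.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical base class: the UART logic is written once against
// write_reg(), which subclasses map to port I/O or to MMIO.
class serial_console_base {
public:
    virtual ~serial_console_base() = default;

    void write(const std::string& s)
    {
        for (char ch : s) {
            // Register 0 is the transmit holding register on a 16550 UART.
            write_reg(0, static_cast<std::uint8_t>(ch));
        }
    }

protected:
    virtual void write_reg(unsigned reg, std::uint8_t val) = 0;
};

// MMIO flavour. A real driver would store to a mapped device address;
// here the bytes are collected in a vector instead.
class mmio_serial_console : public serial_console_base {
public:
    const std::vector<std::uint8_t>& transmitted() const { return _tx; }

protected:
    void write_reg(unsigned reg, std::uint8_t val) override
    {
        if (reg == 0) {
            _tx.push_back(val);
        }
    }

private:
    std::vector<std::uint8_t> _tx;
};
```

The port I/O flavour would override the same method with a call such as pci::outb(), so the shared code stays in the base class, and yes, the method has to be virtual for the override to take effect.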



>
> - Some more trivial changes - for now mostly disabling things - for
>   now quite disorganized
>
>
> diff --git a/Makefile b/Makefile
> index db3c68cf..ffd570a5 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -354,6 +354,7 @@ tools := tools/mkfs/mkfs.so tools/cpiod/cpiod.so
>  $(out)/tools/%.o: COMMON += -fPIC
>
>  tools += tools/uush/uush.so
> +tools += tools/uush/hello.so
>  tools += tools/uush/ls.so
>  tools += tools/uush/mkdir.so
>
> @@ -451,8 +452,8 @@ endif # x64
>
>  ifeq ($(arch),aarch64)
>
> -kernel_base := 0x40080000
> -kernel_vm_base := 0x40080000
> +kernel_base := 0x80080000
> +kernel_vm_base := 0x80080000
>

I'm a bit worried about these magic numbers... I don't know if 0x40080000
was important previously and things would break with a different number, or
whether it was just picked arbitrarily, and why 0x80080000 is necessary now.


>  app_local_exec_tls_size := 0x0
>
>  include $(libfdt_base)/Makefile.libfdt
> @@ -816,6 +817,7 @@ drivers += drivers/xenplatform-pci.o
>  endif # x64
>
>  ifeq ($(arch),aarch64)
> +drivers += drivers/mmio-isa-serial.o
>  drivers += drivers/pl011.o
>  drivers += drivers/xenconsole.o
>  drivers += drivers/virtio.o
> diff --git a/arch/aarch64/arch-dtb.cc b/arch/aarch64/arch-dtb.cc
> index b59f1dcc..cd0719e8 100644
> --- a/arch/aarch64/arch-dtb.cc
> +++ b/arch/aarch64/arch-dtb.cc
> @@ -225,7 +225,8 @@ bool dtb_get_gic_v2(u64 *dist, size_t *dist_len, u64 *cpu, size_t *cpu_len)
>  if (!dtb)
>  return false;
>
> -node = 

Re: [osv-dev] [PATCH 2/3] lzloader: fix memset() implementation

2020-05-25 Thread Nadav Har'El
On Mon, May 25, 2020 at 7:31 AM Rick Payne wrote:

>
> I think this is also related to 913.


Indeed! :-) I had a sense of deja-vu with this memset() loop, and dug up
the patch I wrote back then and just updated it.


> I'll try without my patch (which
> we've been having to use since then).
>

Do you mean
https://github.com/cloudius-systems/osv/commit/d52cb12546ff2acd5255a2ac8897891e421f07dc
which just turned off optimization?

At the time, I suggested the memset() fix for #913, because it seemed to me
like an "obvious" cause of infinite recursion (which has now happened on
Fedora 32). But you said you tested it and it didn't help - in your case you
saw that memset() wasn't even used (you commented it out and everything
still compiled fine).

So I guess we can't revert the turn-off-optimization patch. Maybe it's not
actually needed for modern compilers (you can check...) but it will still
be needed for the specific compiler which caused you problem #913 in the
first place.
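
For readers unfamiliar with this failure mode: a memset() whose body is __builtin_memset(), or a plain byte loop that the optimizer recognizes as a fill, may be compiled into a call to memset(), which is the function itself. The sketch below shows one portable way to write a fill routine that cannot be turned back into such a call, by storing through a volatile pointer. This is not what the patch in this thread does (it reuses the basic implementation from libc/string/memset.c), and the name fill_bytes() is hypothetical.

```cpp
#include <cstddef>

// Fills n bytes at s with the byte value c, like memset(). The stores go
// through a volatile pointer, so the compiler has to emit them one by one
// and cannot replace the loop with a call to memset().
void* fill_bytes(void* s, int c, std::size_t n)
{
    volatile unsigned char* p = static_cast<unsigned char*>(s);
    for (std::size_t i = 0; i < n; i++) {
        p[i] = static_cast<unsigned char>(c);
    }
    return s;
}
```

The price is a slow byte-at-a-time loop, which should be acceptable in a boot loader where this code is not performance-critical.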


>
> Rick
>
> On Sat, 2020-05-23 at 23:47 +0300, Nadav Har'El wrote:
> > Some compilers apparently optimize code in fastlz/ to call memset(), so
> > the uncompressing boot loader, fastlz/lzloader.cc, needs to implement
> > this function. The current implementation called the "builtin" memset,
> > which, if you look at the compilation result, actually calls memset()
> > and results in endless recursion and a hanging boot... This started
> > happening on Fedora 32 with Gcc 10, for example.
> >
> > So let's implement memset() using the memset_base() we already have in
> > libc/string/memset.c.
> >
> > Fixes #1084.
> >
> > Signed-off-by: Nadav Har'El 
> > ---
> >  fastlz/lzloader.cc | 15 ++-
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/fastlz/lzloader.cc b/fastlz/lzloader.cc
> > index f65fb2be..7eae2191 100644
> > --- a/fastlz/lzloader.cc
> > +++ b/fastlz/lzloader.cc
> > @@ -21,11 +21,16 @@ extern char _binary_loader_stripped_elf_lz_start;
> >  extern char _binary_loader_stripped_elf_lz_end;
> >  extern char _binary_loader_stripped_elf_lz_size;
> >
> > -// std libraries used by fastlz.
> > -extern "C" void *memset(void *s, int c, size_t n)
> > -{
> > -return __builtin_memset(s, c, n);
> > -}
> > +// The code in fastlz.cc does not call memset(), but some version of gcc
> > +// implement some assignments by calling memset(), so we need to implement
> > +// a memset() function. This is not performance-critical so let's stick to
> > +// the basic implementation we have in libc/string/memset.c. To avoid
> > +// compiling this source file a second time (the loader needs different
> > +// compile parameters), we #include it here instead.
> > +extern "C" void *memset(void *s, int c, size_t n);
> > +#define memset_base memset
> > +#include "libc/string/memset.c"
> > +#undef memset_base
> >
> >  extern "C" void uncompress_loader()
> >  {
> > --
> > 2.26.2
> >
>
>
