[osv-dev] Re: [PATCH 6/6] virtio-fs: refactor driver / fs

2020-04-30 Thread Fotis Xenakis

On Wednesday, April 29, 2020 at 9:21:13 PM UTC+3, Waldek Kozaczuk wrote:
>
> I think your patch looks good and I like your simplifications.
>
> Couple of things to make sure we have covered all bases. 
>
> 1) Are we sure none of these changes break any thread-safety? 
>
I had checked for this, both by trying to reason about the code and by testing 
with a program with concurrent read()ers, but with the second version coming 
up I shall check again, paying more attention to these changes.

> 2) Are we certain we do not need to use "*free_phys_contiguous_aligned*" 
> in some places to make sure the host sees contiguous physical memory? 
> Currently, we use new in all virtiofs related code which uses regular 
> malloc behind the scenes. 
>
This is actually a valid point I hadn't given much thought to. I will 
look into it, thank you!
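
For illustration, a minimal sketch of what that could look like, assuming
OSv's contiguous-allocation API (memory::alloc_phys_contiguous_aligned /
memory::free_phys_contiguous_aligned, the same calls the ROFS cache patch
later in this digest uses); the header names, buf_size and the buffer
placement are assumptions:

#include <osv/contiguous_alloc.hh>  // assumed location of the allocator
#include <osv/mmu-defs.hh>          // assumed location of mmu::page_size

// Hypothetical: back a request buffer with physically contiguous,
// page-aligned memory so the host sees one contiguous region; plain
// new/malloc gives no such guarantee for multi-page allocations.
void* buf = memory::alloc_phys_contiguous_aligned(buf_size, mmu::page_size);
// ... describe buf to the device in the virtqueue and wait for completion ...
memory::free_phys_contiguous_aligned(buf);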

>
> On Monday, April 20, 2020 at 5:07:18 PM UTC-4, Fotis Xenakis wrote:
>>
>> Since in virtio-fs the filesystem is very tightly coupled with the 
>> driver, this tries to make clear the dependence of the first on the 
>> second, as well as simplify. 
>>
> Agree. 
>
>>
>> This includes: 
>> - The definition of fuse_request is moved from the fs to the driver, 
>>   since it is part of the interface it provides. Also, it is enhanced 
>>   with methods, somewhat promoting it to a "proper" class. 
>>
> I like this. 
>
>> - fuse_strategy, as a redirection to the driver is removed and instead 
>>   the dependence on the driver is made explicit. 
>> - Last, virtio::fs::fs_req is removed and fuse_request is used in its 
>>   place, since it offered no value with fuse_request now defined in the 
>>   driver. 
>>
>> Signed-off-by: Fotis Xenakis  
>> --- 
>>  drivers/virtio-fs.cc   | 42 +- 
>>  drivers/virtio-fs.hh   | 27 +++--- 
>>  fs/virtiofs/virtiofs_i.hh  | 24 ++- 
>>  fs/virtiofs/virtiofs_vfsops.cc | 16 +++-- 
>>  fs/virtiofs/virtiofs_vnops.cc  | 37 ++ 
>>  5 files changed, 63 insertions(+), 83 deletions(-) 
>>
>> diff --git a/drivers/virtio-fs.cc b/drivers/virtio-fs.cc 
>> index b7363040..af1246c1 100644 
>> --- a/drivers/virtio-fs.cc 
>> +++ b/drivers/virtio-fs.cc 
>> @@ -28,25 +28,23 @@ 
>>   
>>  using namespace memory; 
>>   
>> -void fuse_req_wait(fuse_request* req) 
>> -{ 
>> -WITH_LOCK(req->req_mutex) { 
>> -req->req_wait.wait(req->req_mutex); 
>> -} 
>> -} 
>> +using fuse_request = virtio::fs::fuse_request; 
>>   
>>  namespace virtio { 
>>   
>> -static int fuse_make_request(void* driver, fuse_request* req) 
>> +// Wait for the request to be marked as completed. 
>> +void fs::fuse_request::wait() 
>>  { 
>> -auto fs_driver = static_cast<fs*>(driver); 
>> -return fs_driver->make_request(req); 
>> +WITH_LOCK(req_mutex) { 
>> +req_wait.wait(req_mutex); 
>> +} 
>>  } 
>>   
>> -static void fuse_req_done(fuse_request* req) 
>> +// Mark the request as completed. 
>> +void fs::fuse_request::done() 
>>  { 
>> -WITH_LOCK(req->req_mutex) { 
>> -req->req_wait.wake_one(req->req_mutex); 
>> +WITH_LOCK(req_mutex) { 
>> +req_wait.wake_one(req_mutex); 
>>  } 
>>  } 
>>   
>> @@ -87,7 +85,7 @@ static struct devops fs_devops { 
>>  struct driver fs_driver = { 
>>  "virtio_fs", 
>>  &fs_devops, 
>> -sizeof(struct fuse_strategy), 
>> +sizeof(fs*), 
>>  }; 
>>   
>>  bool fs::ack_irq() 
>> @@ -161,10 +159,7 @@ fs::fs(virtio_device& virtio_dev) 
>>  dev_name += std::to_string(_disk_idx++); 
>>   
>> struct device* dev = device_create(&fs_driver, dev_name.c_str(), 
>> D_BLK); // TODO Should it be really D_BLK? 
>> -auto* strategy = static_cast<fuse_strategy*>(dev->private_data); 
>> -strategy->drv = this; 
>> -strategy->make_request = fuse_make_request; 
>> - 
>> +dev->private_data = this; 
>>  debugf("virtio-fs: Add device instance %d as [%s]\n", _id, 
>>  dev_name.c_str()); 
>>  } 
>> @@ -201,13 +196,12 @@ void fs::req_done() 
>>  while (true) { 
>> virtio_driver::wait_for_queue(queue, &vring::used_ring_not_empty); 
>>   
>> -fs_req* req; 
>> +fuse_request* req; 
>>  u32 len; 
>> -while ((req = static_cast<fs_req*>(queue->get_buf_elem())) != 
>> +while ((req = static_cast<fuse_request*>(queue->get_buf_elem())) != 
>> nullptr) { 
>>   
>> -fuse_req_done(req->fuse_req); 
>> -delete req; 
>> +req->done(); 
>>  queue->get_buf_finalize(); 
>>  } 
>>   
>> @@ -231,11 +225,7 @@ int fs::make_request(fuse_request* req) 
>>  fuse_req_enqueue_input(queue, req); 
>>  fuse_req_enqueue_output(queue, req); 
>>   
>> -auto* fs_request = new (std::nothrow) fs_req(req); 
>> -if (!fs_request) { 
>> -return ENOMEM; 
>> -} 
>> -queue->add_buf_wait(fs_request); 
>> +queue->add_buf_wait(req); 
>>  queue->kick(); 
>>   
>> 
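
To make the refactored flow concrete, a rough sketch of the request
lifecycle (names taken from the patch above; error handling and the actual
call sites in virtiofs_vfsops.cc / virtiofs_vnops.cc are elided, and "drv"
stands for a virtio::fs* obtained from the device's private_data):

// Submitter side, e.g. a vnops operation:
virtio::fs::fuse_request req;
// ... fill in the FUSE in-header and the input/output argument buffers ...
int error = drv->make_request(&req); // enqueue on the virtqueue and kick
if (!error) {
    req.wait(); // block until the device completes the request
}

// Completion side: fs::req_done(), running on the queue's wake-up path,
// pops each finished request off the used ring and calls req->done(),
// which wakes the waiter via req_wait.wake_one(req_mutex).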

Re: [osv-dev] Re: [PATCH 3/6] virtio-fs: update fuse protocol header

2020-04-30 Thread Fotis Xenakis

On Friday, May 1, 2020 at 1:31:00 AM UTC+3, Waldek Kozaczuk wrote:
>
>
>
> On Thu, Apr 30, 2020 at 6:19 PM Fotis Xenakis wrote:
>
>> Indeed, QEMU 5.0 does not support DAX and the virtiofsd in QEMU 5.0 won't 
>> accept any version other than 7.31 as I see here, thus the mount fails.
>> Both on the QEMU and the Linux side, DAX is not close to upstreaming yet. 
>> Although it seems no longer marked as "experimental" here, I think it's 
>> still under development (*not* verified with the devs) and that's the 
>> source of some instability.
>>
>> To summarize:
>>
>>- Upstream QEMU 5.0 includes stable virtio-fs support, with the basic 
>>feature set. It negotiates *FUSE 7.31* (latest in upstream Linux).
>>- Downstream virtio-fs QEMU currently contains:
>>   - The default (thus recommended in the docs) virtio-fs branch. This 
>>   negotiates *FUSE 7.27* and supports DAX. This is the one I have 
>>   based my patches upon, because it is the most stable *with DAX 
>>   support*.
>>   - The development branches, virtio-dev and virtio-fs-dev (don't 
>>   know what distinguishes them TBH). They both negotiate *FUSE 7.31* 
>>   and support DAX (with changed protocol details). These iterate 
>>   quickly, so I haven't used them.
>>
>> I hadn't anticipated this hard constraint upstream, which poses a 
>> problem, since I guess we want to be compatible with it.
>> My plan is to reach out to the virtio-fs devs, asking for the status of 
>> DAX in the dev branches. If they deem it stabilized, I will probably try to 
>> go with those, offering upstream compatibility *and* DAX.
>> Otherwise, we could have a hybrid approach, compatible with upstream for 
>> the stable features, but following the more stale "virtio-fs" downstream 
>> branch as far as DAX is concerned.
>> What do you think?
>>
> I am not sure I 100% understand what you are proposing. Adding some kind 
> of negotiating logic on OSv side that will be able to deal with both 27 and 
> 31 and "advertise" accordingly? Can we simply send 31 if there is no DAX 
> window detected in the driver layer and 27 otherwise?
>

> I guess for we could just keep this header per 31 and 
> add FUSE_SETUPMAPPING AND FUSE_REMOVEMAPPING to our header, no?
>
This is the "hybrid" approach I was thinking of above and the one I will go 
with for now.
Also, I will contact the virtio-fs devs for insight on how the project will 
evolve in the near future.
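
For reference, that hybrid header would presumably amount to keeping the
upstream 7.31 definitions and appending the two mapping opcodes, with the
values the downstream branch uses (48 and 49, also visible in the revert
commit below):

/** Minor version number of this interface */
#define FUSE_KERNEL_MINOR_VERSION 31

enum fuse_opcode {
        /* ... upstream 7.31 opcodes ... */
        FUSE_COPY_FILE_RANGE    = 47,
        /* DAX mapping opcodes from the downstream virtio-fs branch: */
        FUSE_SETUPMAPPING       = 48,
        FUSE_REMOVEMAPPING      = 49,

        /* CUSE specific operations */
        CUSE_INIT               = 4096,
};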

>
> Meanwhile I will roll back this particular patch to make OSv work with 
> stock qemu and virtiofs. 
>
Absolutely, this makes sense. 

>
>> On Wednesday, April 29, 2020 at 7:48:02 PM UTC+3, Waldek Kozaczuk wrote:
>>>
>>> On Monday, April 20, 2020 at 5:04:27 PM UTC-4, Fotis Xenakis wrote:

 Copy from virtiofsd @ 32006c66f2578af4121d7effaccae4aa4fa12e46. This 
 includes the definitions for FUSE_SETUPMAPPING AND FUSE_REMOVEMAPPING. 

 Signed-off-by: Fotis Xenakis  
 --- 
  fs/virtiofs/fuse_kernel.h | 82 ++- 
  1 file changed, 38 insertions(+), 44 deletions(-) 

 diff --git a/fs/virtiofs/fuse_kernel.h b/fs/virtiofs/fuse_kernel.h 
 index 018a00a2..ce46046a 100644 
 --- a/fs/virtiofs/fuse_kernel.h 
 +++ b/fs/virtiofs/fuse_kernel.h 
 @@ -44,7 +44,6 @@ 
   *  - add lock_owner field to fuse_setattr_in, fuse_read_in and 
 fuse_write_in 
   *  - add blksize field to fuse_attr 
   *  - add file flags field to fuse_read_in and fuse_write_in 
 - *  - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in 
   * 
   * 7.10 
   *  - add nonseekable open flag 
 @@ -55,7 +54,7 @@ 
   *  - add POLL message and NOTIFY_POLL notification 
   * 
   * 7.12 
 - *  - add umask flag to input argument of create, mknod and mkdir 
 + *  - add umask flag to input argument of open, mknod and mkdir 
   *  - add notification messages for invalidation of inodes and 
   *directory entries 
   * 
 @@ -120,19 +119,6 @@ 
   * 
   *  7.28 
   *  - add FUSE_COPY_FILE_RANGE 
 - *  - add FOPEN_CACHE_DIR 
 - *  - add FUSE_MAX_PAGES, add max_pages to init_out 
 - *  - add FUSE_CACHE_SYMLINKS 
 - * 
 - *  7.29 
 - *  - add FUSE_NO_OPENDIR_SUPPORT flag 
 - * 
 - *  7.30 
 - *  - add FUSE_EXPLICIT_INVAL_DATA 
 - *  - add FUSE_IOCTL_COMPAT_X32 
 - * 
 - *  7.31 
 - *  - add FUSE_WRITE_KILL_PRIV 

[osv-dev] [COMMIT osv master] Revert "virtio-fs: update fuse protocol header"

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

Revert "virtio-fs: update fuse protocol header"

This reverts commit bfe114878a194a9230686a8c1c38122f55d9524c.

---
diff --git a/fs/virtiofs/fuse_kernel.h b/fs/virtiofs/fuse_kernel.h
--- a/fs/virtiofs/fuse_kernel.h
+++ b/fs/virtiofs/fuse_kernel.h
@@ -44,6 +44,7 @@
  *  - add lock_owner field to fuse_setattr_in, fuse_read_in and fuse_write_in
  *  - add blksize field to fuse_attr
  *  - add file flags field to fuse_read_in and fuse_write_in
+ *  - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in
  *
  * 7.10
  *  - add nonseekable open flag
@@ -54,7 +55,7 @@
  *  - add POLL message and NOTIFY_POLL notification
  *
  * 7.12
- *  - add umask flag to input argument of open, mknod and mkdir
+ *  - add umask flag to input argument of create, mknod and mkdir
  *  - add notification messages for invalidation of inodes and
  *directory entries
  *
@@ -119,6 +120,19 @@
  *
  *  7.28
  *  - add FUSE_COPY_FILE_RANGE
+ *  - add FOPEN_CACHE_DIR
+ *  - add FUSE_MAX_PAGES, add max_pages to init_out
+ *  - add FUSE_CACHE_SYMLINKS
+ *
+ *  7.29
+ *  - add FUSE_NO_OPENDIR_SUPPORT flag
+ *
+ *  7.30
+ *  - add FUSE_EXPLICIT_INVAL_DATA
+ *  - add FUSE_IOCTL_COMPAT_X32
+ *
+ *  7.31
+ *  - add FUSE_WRITE_KILL_PRIV flag
  */
 
 #ifndef _LINUX_FUSE_H
@@ -154,7 +168,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 27
+#define FUSE_KERNEL_MINOR_VERSION 31
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -222,10 +236,14 @@ struct fuse_file_lock {
  * FOPEN_DIRECT_IO: bypass page cache for this open file
  * FOPEN_KEEP_CACHE: don't invalidate the data cache on open
  * FOPEN_NONSEEKABLE: the file is not seekable
+ * FOPEN_CACHE_DIR: allow caching this directory
+ * FOPEN_STREAM: the file is stream-like (no file position at all)
  */
 #define FOPEN_DIRECT_IO(1 << 0)
 #define FOPEN_KEEP_CACHE   (1 << 1)
 #define FOPEN_NONSEEKABLE  (1 << 2)
+#define FOPEN_CACHE_DIR(1 << 3)
+#define FOPEN_STREAM   (1 << 4)
 
 /**
  * INIT request/reply flags
@@ -252,6 +270,10 @@ struct fuse_file_lock {
  * FUSE_HANDLE_KILLPRIV: fs handles killing suid/sgid/cap on write/chown/trunc
  * FUSE_POSIX_ACL: filesystem supports posix acls
  * FUSE_ABORT_ERROR: reading the device after abort returns ECONNABORTED
+ * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages
+ * FUSE_CACHE_SYMLINKS: cache READLINK responses
+ * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir
+ * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request
  */
 #define FUSE_ASYNC_READ(1 << 0)
 #define FUSE_POSIX_LOCKS   (1 << 1)
@@ -275,6 +297,10 @@ struct fuse_file_lock {
 #define FUSE_HANDLE_KILLPRIV   (1 << 19)
 #define FUSE_POSIX_ACL (1 << 20)
 #define FUSE_ABORT_ERROR   (1 << 21)
+#define FUSE_MAX_PAGES (1 << 22)
+#define FUSE_CACHE_SYMLINKS(1 << 23)
+#define FUSE_NO_OPENDIR_SUPPORT (1 << 24)
+#define FUSE_EXPLICIT_INVAL_DATA (1 << 25)
 
 /**
  * CUSE INIT request/reply flags
@@ -304,9 +330,11 @@ struct fuse_file_lock {
  *
  * FUSE_WRITE_CACHE: delayed write from page cache, file handle is guessed
  * FUSE_WRITE_LOCKOWNER: lock_owner field is valid
+ * FUSE_WRITE_KILL_PRIV: kill suid and sgid bits
  */
 #define FUSE_WRITE_CACHE   (1 << 0)
 #define FUSE_WRITE_LOCKOWNER   (1 << 1)
+#define FUSE_WRITE_KILL_PRIV   (1 << 2)
 
 /**
  * Read flags
@@ -321,6 +349,7 @@ struct fuse_file_lock {
  * FUSE_IOCTL_RETRY: retry with new iovecs
  * FUSE_IOCTL_32BIT: 32bit ioctl
  * FUSE_IOCTL_DIR: is a directory
+ * FUSE_IOCTL_COMPAT_X32: x32 compat ioctl on 64bit machine (64bit time_t)
  *
  * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs
  */
@@ -329,6 +358,7 @@ struct fuse_file_lock {
 #define FUSE_IOCTL_RETRY   (1 << 2)
 #define FUSE_IOCTL_32BIT   (1 << 3)
 #define FUSE_IOCTL_DIR (1 << 4)
+#define FUSE_IOCTL_COMPAT_X32  (1 << 5)
 
 #define FUSE_IOCTL_MAX_IOV 256
 
@@ -339,6 +369,13 @@ struct fuse_file_lock {
  */
 #define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0)
 
+/**
+ * Fsync flags
+ *
+ * FUSE_FSYNC_FDATASYNC: Sync data only, not metadata
+ */
+#define FUSE_FSYNC_FDATASYNC   (1 << 0)
+
 enum fuse_opcode {
FUSE_LOOKUP = 1,
FUSE_FORGET = 2,  /* no reply */
@@ -385,11 +422,9 @@ enum fuse_opcode {
FUSE_RENAME2= 45,
FUSE_LSEEK  = 46,
FUSE_COPY_FILE_RANGE= 47,
-FUSE_SETUPMAPPING   = 48,
-FUSE_REMOVEMAPPING  = 49,
 
/* CUSE specific operations */
-   CUSE_INIT   = 4096,
+   CUSE_INIT   = 4096
 };
 
 enum fuse_notify_code {
@@ -399,7 +434,7 @@ enum fuse_notify_code {
FUSE_NOTIFY_STORE = 4,
FUSE_NOTIFY_RETRIEVE = 5,
FUSE_NOTIFY_DELETE = 6,
-   FUSE_NOTIFY_CODE_MAX,
+   

Re: [osv-dev] Re: [PATCH 5/6] virtio-fs: add basic read using the DAX window

2020-04-30 Thread Waldek Kozaczuk
On Thu, Apr 30, 2020 at 6:40 PM Fotis Xenakis wrote:

>> Stock QEMU still does not have DAX support so I used one from
>> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-dev (shall I be using
>> this?) to test the DAX logic.
>>
> The branch you mention is under active development and will *not* work
> with my current patches. Those are based upon the more stable virtio-fs
> branch.
>
>> BTW is there a way to make virtiofsd daemon not to terminate every time
>> after single OSv run?
>>
> I am not aware of a way to make virtiofsd not terminate every time,
> it would sure be nice not having to restart it all the time though...
>
> As I understand this is a temporary solution until we integrate DAX logic
>> with page cache, right? Eventually, pages mapped with FUSE_SETUPMAPPING
>> should stay in page cache until a file gets unmapped? Right now we copy
>> data from DAX window but eventually, we would like not to which is the
>> whole point of DAX, right?
>>
> That would be ideal, but I can't see how we could avoid the copy to the
> user buffers, while retaining proper read() semantics(?). We might be able
> to do it in the case of mmap() though (my only concern is that the DAX
> window is device memory, but that doesn't seem to be a problem).
>
Yes, we cannot avoid it in virtiofs_read(), for which btw it may not make
sense to use DAX over regular FUSE_READ in the long term.
For mmap() I agree. It could be as simple (probably not as simple) as what
we do in ROFS (see rofs_map_cached_page()).
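
For context, the DAX read path under discussion has roughly this shape (a
sketch only, not the actual patch: fuse_setup_mapping(), fuse_remove_mapping()
and dax_window_base() are hypothetical helper names, and the surrounding
variables, nodeid, fh, file_off, intra_page_off, bytes, are assumed):

// Map a page-aligned slice of the file into the DAX window, copy out to
// the user buffer, then drop the mapping. The uiomove() is the copy that
// read() semantics force on us.
u64 win_off = 0; // offset inside the DAX window (allocation policy elided)
int error = fuse_setup_mapping(drv, nodeid, fh, file_off, mmu::page_size, win_off);
if (!error) {
    char* window = static_cast<char*>(drv->dax_window_base());
    error = uiomove(window + win_off + intra_page_off, bytes, uio);
    fuse_remove_mapping(drv, win_off, mmu::page_size);
}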

>
>> Not right now but going forward as we design integration with page cache
>> we should think of a way to have a simple read-ahead cache in virtio fs
>> just like ROFS has so we can optimize reading even if there is no DAX
>> enabled. In other words, eventually, page cache should either point to
>> pages from DAX window (if DAX on) or to pages in a local cache where we
>> would keep data read using regular FUSE_READ. Ideally we should refactor
>> the read-ahead/around cache in ROFS to make it more generic and usable with
>> virtiofs.
>>
>> But all that is the future.
>>
>
>  I totally agree, with both points!
>
> On Wednesday, April 29, 2020 at 8:30:25 PM UTC+3, Waldek Kozaczuk wrote:
>>
>> Let me start with the results of my testing this patch. First I tried
>> with stock QEMU 5.0 just to verify that non-DAX logic still works. In
>> general, it does, however, I encountered that protocol mismatch error which
>> I reported in my other email.
>>
>> Stock QEMU still does not have DAX support so I used one from
>> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-dev (shall I be using
>> this?) to test the DAX logic.
>>
>
>> When I ran a simple example I got this:
>>
>> #In another window
>> ./build/virtiofsd --socket-path=/tmp/vhostqemu -o source=~/projects/osv/
>> apps/native-example -o cache=always -d
>>
>> # Main
>> /home/wkozaczuk/projects/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>> -m 4G \
>> -smp 4 \
>> -vnc :1 \
>> -gdb tcp::1234,server,nowait \
>> -kernel /home/wkozaczuk/projects/osv/build/last/kernel.elf \
>> -append "$1" \
>> -device virtio-blk-pci,id=blk0,drive=hd0,scsi=off \
>> -drive
>> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>> \
>> -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 \
>> -device virtio-net-pci,netdev=un0 \
>> -device virtio-rng-pci \
>> -enable-kvm \
>> -cpu host,+x2apic \
>> -chardev stdio,mux=on,id=stdio,signal=off \
>> -mon chardev=stdio,mode=readline \
>> -device isa-serial,chardev=stdio \
>> -chardev socket,id=char0,path=/tmp/vhostqemu \
>> -device
>> vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,cache-size=64M \
>> -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on
>> -numa node,memdev=mem #do we need that line?
>>
>> OSv v0.54.0-179-g2f92fc91
>> 4 CPUs detected
>> Firmware vendor: SeaBIOS
>> bsd: initializing - done
>> VFS: mounting ramfs at /
>> VFS: mounting devfs at /dev
>> net: initializing - done
>> vga: Add VGA device instance
>> eth0: ethernet address: 52:54:00:12:34:56
>> virtio-blk: Add blk device instances 0 as vblk0, devsize=6470656
>> random: virtio-rng registered as a source.
>> virtio-fs: Detected device with tag: [myfs] and num_queues: 1
>> virtio-fs: Detected DAX window with length 67108864
>> virtio-fs: Add device instance 0 as [virtiofs1]
>> random: intel drng, rdrand registered as a source.
>> random:  initialized
>> VFS: unmounting /dev
>> VFS: mounting rofs at /rofs
>> VFS: mounting devfs at /dev
>> VFS: mounting procfs at /proc
>> VFS: mounting sysfs at /sys
>> VFS: mounting ramfs at /tmp
>> VFS: mounting virtiofs at /virtiofs
>> [virtiofs] Initialized fuse filesystem with version major: 7, minor: 31
>> [I/43 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1369429892]
>> [I/43 dhcp]: Waiting for IP...
>> [I/55 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.122.1
>> regarding offerred IP address: 

[osv-dev] Re: [PATCH 5/6] virtio-fs: add basic read using the DAX window

2020-04-30 Thread Fotis Xenakis

>
> Stock QEMU still does not have DAX support so I used one from 
> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-dev (shall I be using 
> this?) to test the DAX logic.
>
The branch you mention is under active development and will *not* work with 
my current patches. Those are based upon the more stable virtio-fs branch.

> BTW is there a way to make virtiofsd daemon not to terminate every time 
> after single OSv run? 
>
I am not aware of a way to make virtiofsd not terminate every time, it 
would sure be nice not having to restart it all the time though...

> As I understand this is a temporary solution until we integrate DAX logic 
> with page cache, right? Eventually, pages mapped with FUSE_SETUPMAPPING 
> should stay in page cache until a file gets unmapped? Right now we copy 
> data from DAX window but eventually, we would like not to which is the 
> whole point of DAX, right?
>
That would be ideal, but I can't see how we could avoid the copy to the 
user buffers, while retaining proper read() semantics(?). We might be able 
to do it in the case of mmap() though (my only concern is that the DAX 
window is device memory, but that doesn't seem to be a problem).

> Not right now but going forward as we design integration with page cache we 
> should think of a way to have a simple read-ahead cache in virtio fs just 
> like ROFS has so we can optimize reading even if there is no DAX enabled. 
> In other words, eventually, page cache should either point to pages from 
> DAX window (if DAX on) or to pages in a local cache where we would keep 
> data read using regular FUSE_READ. Ideally we should refactor the 
> read-ahead/around cache in ROFS to make it more generic and usable with 
> virtiofs.
>
> But all that is the future.
>
 
I totally agree, with both points!

On Wednesday, April 29, 2020 at 8:30:25 PM UTC+3, Waldek Kozaczuk wrote:
>
> Let me start with the results of my testing this patch. First I tried with 
> stock QEMU 5.0 just to verify that non-DAX logic still works. In general, 
> it does, however, I encountered that protocol mismatch error which I 
> reported in my other email. 
>
> Stock QEMU still does not have DAX support so I used one from 
> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-dev (shall I be using 
> this?) to test the DAX logic.
>

> When I ran a simple example I got this:
>
> #In another window
> ./build/virtiofsd --socket-path=/tmp/vhostqemu -o source=~/projects/osv/
> apps/native-example -o cache=always -d
>
> # Main 
> /home/wkozaczuk/projects/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
> -m 4G \
> -smp 4 \
> -vnc :1 \
> -gdb tcp::1234,server,nowait \
> -kernel /home/wkozaczuk/projects/osv/build/last/kernel.elf \
> -append "$1" \
> -device virtio-blk-pci,id=blk0,drive=hd0,scsi=off \
> -drive 
> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>  
> \
> -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 \
> -device virtio-net-pci,netdev=un0 \
> -device virtio-rng-pci \
> -enable-kvm \
> -cpu host,+x2apic \
> -chardev stdio,mux=on,id=stdio,signal=off \
> -mon chardev=stdio,mode=readline \
> -device isa-serial,chardev=stdio \
> -chardev socket,id=char0,path=/tmp/vhostqemu \
> -device 
> vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs,cache-size=64M \
> -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on 
> -numa node,memdev=mem #do we need that line?
>
> OSv v0.54.0-179-g2f92fc91
> 4 CPUs detected
> Firmware vendor: SeaBIOS
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> vga: Add VGA device instance
> eth0: ethernet address: 52:54:00:12:34:56
> virtio-blk: Add blk device instances 0 as vblk0, devsize=6470656
> random: virtio-rng registered as a source.
> virtio-fs: Detected device with tag: [myfs] and num_queues: 1
> virtio-fs: Detected DAX window with length 67108864
> virtio-fs: Add device instance 0 as [virtiofs1]
> random: intel drng, rdrand registered as a source.
> random:  initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> VFS: mounting sysfs at /sys
> VFS: mounting ramfs at /tmp
> VFS: mounting virtiofs at /virtiofs
> [virtiofs] Initialized fuse filesystem with version major: 7, minor: 31
> [I/43 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1369429892]
> [I/43 dhcp]: Waiting for IP...
> [I/55 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.122.1 
> regarding offerred IP address: 192.168.122.15
> [I/55 dhcp]: Broadcasting DHCPREQUEST message with xid: [1369429892] to 
> SELECT offered IP: 192.168.122.15
> [I/55 dhcp]: Received DHCPACK message from DHCP server: 192.168.122.1 
> regarding offerred IP address: 192.168.122.15
> [I/55 dhcp]: Server acknowledged IP 192.168.122.15 for interface eth0 with 
> time to lease in seconds: 86400
> eth0: 

Re: [osv-dev] Re: [PATCH 3/6] virtio-fs: update fuse protocol header

2020-04-30 Thread Waldek Kozaczuk
On Thu, Apr 30, 2020 at 6:19 PM Fotis Xenakis wrote:

> Indeed, QEMU 5.0 does not support DAX and the virtiofsd in QEMU 5.0 won't
> accept any version other than 7.31 as I see here, thus the mount fails.
> Both on the QEMU and the Linux side, DAX is not close to upstreaming yet.
> Although it seems no longer marked as "experimental" here, I think it's
> still under development (*not* verified with the devs) and that's the
> source of some instability.
>
> To summarize:
>
>- Upstream QEMU 5.0 includes stable virtio-fs support, with the basic
>feature set. It negotiates *FUSE 7.31* (latest in upstream Linux).
>- Downstream virtio-fs QEMU currently contains:
>   - The default (thus recommended in the docs) virtio-fs branch. This
>   negotiates *FUSE 7.27* and supports DAX. This is the one I have
>   based my patches upon, because it is the most stable *with DAX
>   support*.
>   - The development branches, virtio-dev and virtio-fs-dev (don't
>   know what distinguishes them TBH). They both negotiate *FUSE 7.31*
>   and support DAX (with changed protocol details). These iterate
>   quickly, so I haven't used them.
>
> I hadn't anticipated this hard constraint upstream, which poses a problem,
> since I guess we want to be compatible with it.
> My plan is to reach out to the virtio-fs devs, asking for the status of
> DAX in the dev branches. If they deem it stabilized, I will probably try to
> go with those, offering upstream compatibility *and* DAX.
> Otherwise, we could have a hybrid approach, compatible with upstream for
> the stable features, but following the more stale "virtio-fs" downstream
> branch as far as DAX is concerned.
> What do you think?
>
I am not sure I 100% understand what you are proposing. Adding some kind of
negotiating logic on OSv side that will be able to deal with both 27 and 31
and "advertise" accordingly? Can we simply send 31 if there is no DAX
window detected in the driver layer and 27 otherwise?

I guess we could just keep this header per 7.31 and add FUSE_SETUPMAPPING
and FUSE_REMOVEMAPPING to our header, no?

Meanwhile I will roll back this particular patch to make OSv work with
stock qemu and virtiofs.

>
> On Wednesday, April 29, 2020 at 7:48:02 PM UTC+3, Waldek Kozaczuk wrote:
>>
>> On Monday, April 20, 2020 at 5:04:27 PM UTC-4, Fotis Xenakis wrote:
>>>
>>> Copy from virtiofsd @ 32006c66f2578af4121d7effaccae4aa4fa12e46. This
>>> includes the definitions for FUSE_SETUPMAPPING AND FUSE_REMOVEMAPPING.
>>>
>>> Signed-off-by: Fotis Xenakis 
>>> ---
>>>  fs/virtiofs/fuse_kernel.h | 82 ++-
>>>  1 file changed, 38 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/fs/virtiofs/fuse_kernel.h b/fs/virtiofs/fuse_kernel.h
>>> index 018a00a2..ce46046a 100644
>>> --- a/fs/virtiofs/fuse_kernel.h
>>> +++ b/fs/virtiofs/fuse_kernel.h
>>> @@ -44,7 +44,6 @@
>>>   *  - add lock_owner field to fuse_setattr_in, fuse_read_in and
>>> fuse_write_in
>>>   *  - add blksize field to fuse_attr
>>>   *  - add file flags field to fuse_read_in and fuse_write_in
>>> - *  - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in
>>>   *
>>>   * 7.10
>>>   *  - add nonseekable open flag
>>> @@ -55,7 +54,7 @@
>>>   *  - add POLL message and NOTIFY_POLL notification
>>>   *
>>>   * 7.12
>>> - *  - add umask flag to input argument of create, mknod and mkdir
>>> + *  - add umask flag to input argument of open, mknod and mkdir
>>>   *  - add notification messages for invalidation of inodes and
>>>   *directory entries
>>>   *
>>> @@ -120,19 +119,6 @@
>>>   *
>>>   *  7.28
>>>   *  - add FUSE_COPY_FILE_RANGE
>>> - *  - add FOPEN_CACHE_DIR
>>> - *  - add FUSE_MAX_PAGES, add max_pages to init_out
>>> - *  - add FUSE_CACHE_SYMLINKS
>>> - *
>>> - *  7.29
>>> - *  - add FUSE_NO_OPENDIR_SUPPORT flag
>>> - *
>>> - *  7.30
>>> - *  - add FUSE_EXPLICIT_INVAL_DATA
>>> - *  - add FUSE_IOCTL_COMPAT_X32
>>> - *
>>> - *  7.31
>>> - *  - add FUSE_WRITE_KILL_PRIV flag
>>>   */
>>>
>>>  #ifndef _LINUX_FUSE_H
>>> @@ -168,7 +154,7 @@
>>>  #define FUSE_KERNEL_VERSION 7
>>>
>>>  /** Minor version number of this interface */
>>> -#define FUSE_KERNEL_MINOR_VERSION 31
>>> +#define FUSE_KERNEL_MINOR_VERSION 27
>>>
>> I have applied this patch but when I started testing your later patches
>> that enable DAX logic I would get error messages about the wrong protocol
>> version:
>>
>> OSv v0.54.0-179-g2f92fc91
>> 4 CPUs detected
>> Firmware vendor: SeaBIOS
>> bsd: initializing - done
>> VFS: 

[osv-dev] Re: [PATCH 3/6] virtio-fs: update fuse protocol header

2020-04-30 Thread Fotis Xenakis
Indeed, QEMU 5.0 does not support DAX and the virtiofsd in QEMU 5.0 won't 
accept any version other than 7.31 as I see here, thus the mount fails. 
Both on the QEMU and the Linux side, DAX is not close to upstreaming yet. 
Although it seems no longer marked as "experimental" here, I think it's 
still under development (*not* verified with the devs) and that's the 
source of some instability.

To summarize:

   - Upstream QEMU 5.0 includes stable virtio-fs support, with the basic 
   feature set. It negotiates *FUSE 7.31* (latest in upstream Linux).
   - Downstream virtio-fs QEMU currently contains:
  - The default (thus recommended in the docs) virtio-fs branch. This 
  negotiates *FUSE 7.27* and supports DAX. This is the one I have based 
  my patches upon, because it is the most stable *with DAX support*.
  - The development branches, virtio-dev and virtio-fs-dev (don't know 
  what distinguishes them TBH). They both negotiate *FUSE 7.31* and 
  support DAX (with changed protocol details). These iterate quickly, 
  so I haven't used them.
   
I hadn't anticipated this hard constraint upstream, which poses a problem, 
since I guess we want to be compatible with it.
My plan is to reach out to the virtio-fs devs, asking for the status of DAX 
in the dev branches. If they deem it stabilized, I will probably try to go 
with those, offering upstream compatibility *and* DAX.
Otherwise, we could have a hybrid approach, compatible with upstream for 
the stable features, but following the more stale "virtio-fs" downstream 
branch as far as DAX is concerned.
What do you think?

On Wednesday, April 29, 2020 at 7:48:02 PM UTC+3, Waldek Kozaczuk wrote:
>
> On Monday, April 20, 2020 at 5:04:27 PM UTC-4, Fotis Xenakis wrote:
>>
>> Copy from virtiofsd @ 32006c66f2578af4121d7effaccae4aa4fa12e46. This 
>> includes the definitions for FUSE_SETUPMAPPING AND FUSE_REMOVEMAPPING. 
>>
>> Signed-off-by: Fotis Xenakis  
>> --- 
>>  fs/virtiofs/fuse_kernel.h | 82 ++- 
>>  1 file changed, 38 insertions(+), 44 deletions(-) 
>>
>> diff --git a/fs/virtiofs/fuse_kernel.h b/fs/virtiofs/fuse_kernel.h 
>> index 018a00a2..ce46046a 100644 
>> --- a/fs/virtiofs/fuse_kernel.h 
>> +++ b/fs/virtiofs/fuse_kernel.h 
>> @@ -44,7 +44,6 @@ 
>>   *  - add lock_owner field to fuse_setattr_in, fuse_read_in and 
>> fuse_write_in 
>>   *  - add blksize field to fuse_attr 
>>   *  - add file flags field to fuse_read_in and fuse_write_in 
>> - *  - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in 
>>   * 
>>   * 7.10 
>>   *  - add nonseekable open flag 
>> @@ -55,7 +54,7 @@ 
>>   *  - add POLL message and NOTIFY_POLL notification 
>>   * 
>>   * 7.12 
>> - *  - add umask flag to input argument of create, mknod and mkdir 
>> + *  - add umask flag to input argument of open, mknod and mkdir 
>>   *  - add notification messages for invalidation of inodes and 
>>   *directory entries 
>>   * 
>> @@ -120,19 +119,6 @@ 
>>   * 
>>   *  7.28 
>>   *  - add FUSE_COPY_FILE_RANGE 
>> - *  - add FOPEN_CACHE_DIR 
>> - *  - add FUSE_MAX_PAGES, add max_pages to init_out 
>> - *  - add FUSE_CACHE_SYMLINKS 
>> - * 
>> - *  7.29 
>> - *  - add FUSE_NO_OPENDIR_SUPPORT flag 
>> - * 
>> - *  7.30 
>> - *  - add FUSE_EXPLICIT_INVAL_DATA 
>> - *  - add FUSE_IOCTL_COMPAT_X32 
>> - * 
>> - *  7.31 
>> - *  - add FUSE_WRITE_KILL_PRIV flag 
>>   */ 
>>   
>>  #ifndef _LINUX_FUSE_H 
>> @@ -168,7 +154,7 @@ 
>>  #define FUSE_KERNEL_VERSION 7 
>>   
>>  /** Minor version number of this interface */ 
>> -#define FUSE_KERNEL_MINOR_VERSION 31 
>> +#define FUSE_KERNEL_MINOR_VERSION 27 
>>
> I have applied this patch but when I started testing your later patches 
> that enable DAX logic I would get error messages about the wrong protocol 
> version:
>
> OSv v0.54.0-179-g2f92fc91
> 4 CPUs detected
> Firmware vendor: SeaBIOS
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> vga: Add VGA device instance
> eth0: ethernet address: 52:54:00:12:34:56
> virtio-blk: Add blk device instances 0 as vblk0, devsize=6470656
> random: virtio-rng registered as a source.
> virtio-fs: Detected device with tag: [myfs] and num_queues: 1
> virtio-fs: Detected DAX window with length 67108864
> virtio-fs: Add device instance 0 as [virtiofs1]
> random: intel drng, rdrand registered as a source.
> random:  initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> VFS: 

[osv-dev] [COMMIT osv master] docker: refine build docker files

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

docker: refine build docker files

Based on the original patch sent by Fotis Xenakis:

"Changes since v1:
- Don't create redundant /osv directory in builder dockerfiles.
- Remove dependency on wget in runner dockerfiles."

Signed-off-by: Fotis Xenakis 
Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/docker/Dockerfile.builder-fedora-base 
b/docker/Dockerfile.builder-fedora-base
--- a/docker/Dockerfile.builder-fedora-base
+++ b/docker/Dockerfile.builder-fedora-base
@@ -17,20 +17,20 @@ RUN yum install -y git python3 file which
 #
 
 # - prepare directories
-RUN mkdir -p /osv/scripts
+RUN mkdir -p /git-repos/osv/scripts
 
 # - get setup.py
 ARG GIT_ORG_OR_USER=cloudius-systems
 ARG GIT_BRANCH=master
-ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /osv/scripts/
-ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /git-repos/osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /git-repos/osv/scripts/
 
 # - install all required packages and remove scripts
-RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
+RUN python3 /git-repos/osv/scripts/setup.py && rm -rf /git-repos/osv/scripts
 
 # - install Capstan
 ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/capstan
 RUN chmod u+x /usr/local/bin/capstan
 
-WORKDIR /osv
+WORKDIR /git-repos/osv
 CMD /bin/bash
diff --git a/docker/Dockerfile.builder-ubuntu-base 
b/docker/Dockerfile.builder-ubuntu-base
--- a/docker/Dockerfile.builder-ubuntu-base
+++ b/docker/Dockerfile.builder-ubuntu-base
@@ -23,22 +23,22 @@ RUN apt-get update -y && apt-get install -y git python3
 #
 
 # - prepare directories
-RUN mkdir -p /osv/scripts
+RUN mkdir -p /git-repos/osv/scripts
 
 # - get setup.py
 ARG GIT_ORG_OR_USER=cloudius-systems
 ARG GIT_BRANCH=master
-ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /osv/scripts/
-ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /git-repos/osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /git-repos/osv/scripts/
 
 # - install all required packages and remove scripts
-RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
+RUN python3 /git-repos/osv/scripts/setup.py && rm -rf /git-repos/osv/scripts
 
 RUN update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
 
 # - install Capstan
 ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/
 RUN chmod u+x /usr/local/bin/capstan
 
-WORKDIR /osv
+WORKDIR /git-repos/osv
 CMD /bin/bash
diff --git a/docker/Dockerfile.runner-fedora b/docker/Dockerfile.runner-fedora
--- a/docker/Dockerfile.runner-fedora
+++ b/docker/Dockerfile.runner-fedora
@@ -17,7 +17,6 @@ python3 \
 file \
 which \
 curl \
-wget \
 qemu-system-x86 \
 qemu-img
 
@@ -31,7 +30,7 @@ ARG GIT_BRANCH=master
 RUN git clone --depth 1 -b ${GIT_BRANCH} --single-branch 
https://github.com/${GIT_ORG_OR_USER}/osv.git
 
 # - install Capstan
-RUN wget 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan -O 
/usr/local/bin/capstan
+ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/capstan
 RUN chmod u+x /usr/local/bin/capstan
 
 CMD /bin/bash
diff --git a/docker/Dockerfile.runner-ubuntu b/docker/Dockerfile.runner-ubuntu
--- a/docker/Dockerfile.runner-ubuntu
+++ b/docker/Dockerfile.runner-ubuntu
@@ -21,7 +21,6 @@ RUN apt-get update -y && apt-get install -y \
 git \
 python3 \
 curl \
-wget \
 qemu-system-x86 \
 qemu-utils
 
@@ -35,7 +34,7 @@ ARG GIT_BRANCH=master
 RUN git clone --depth 1 -b ${GIT_BRANCH} --single-branch 
https://github.com/${GIT_ORG_OR_USER}/osv.git
 
 # - install Capstan
-RUN wget 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan -O 
/usr/local/bin/capstan
+ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/capstan
 RUN chmod u+x /usr/local/bin/capstan
 
 CMD /bin/bash



[osv-dev] [COMMIT osv master] travis: fix the build

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

travis: fix the build

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/.travis.yml b/.travis.yml
--- a/.travis.yml
+++ b/.travis.yml
@@ -27,7 +27,7 @@ jobs:
 - cp /tmp/osv-version "$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/repository/osv-loader/osv-loader.qemu 
"$ARTIFACTS_DIR"
 - gzip "$ARTIFACTS_DIR"/osv-loader.qemu
-- docker cp 
build:/root/.capstan/repository/osv-loader/kernel-stripped.elf 
"$ARTIFACTS_DIR"/kernel.elf
+- docker cp build:/git-repos/osv/build/release/kernel-stripped.elf 
"$ARTIFACTS_DIR"/kernel.elf
 - gzip "$ARTIFACTS_DIR"/kernel.elf
 - docker cp build:/root/.capstan/repository/osv-loader/index.yaml 
"$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/packages/osv.bootstrap.mpm 
"$ARTIFACTS_DIR"
diff --git a/scripts/build-capstan-mpm-packages 
b/scripts/build-capstan-mpm-packages
--- a/scripts/build-capstan-mpm-packages
+++ b/scripts/build-capstan-mpm-packages
@@ -263,7 +263,7 @@ build_httpserver_html5_cli_package() {
 }
 
 build_httpserver_monitoring_package() {
-  build_osv_image "httpserver-monitoring-api" selected none
+  build_osv_image "httpserver-monitoring-api" all none
   build_mpm "httpserver-monitoring-api"
 }
 



Re: [osv-dev] [COMMIT osv master] docker: update default distros

2020-04-30 Thread Waldek Kozaczuk
Applied wrong patch :-) For whatever reason rolling back and applying V2
does not work. Will have to manually "cherry-pick".

On Thu, Apr 30, 2020 at 2:34 PM Commit Bot wrote:

> From: Fotis Xenakis 
> Committer: Waldemar Kozaczuk 
> Branch: master
>
> docker: update default distros
>
> Signed-off-by: Fotis Xenakis 
> Message-Id: <
> vi1pr03mb43837c53ceece1f8722996f5a6...@vi1pr03mb4383.eurprd03.prod.outlook.com
> >
>
> ---
> diff --git a/docker/Dockerfile.builder b/docker/Dockerfile.builder
> --- a/docker/Dockerfile.builder
> +++ b/docker/Dockerfile.builder
> @@ -8,7 +8,7 @@
>  # This Docker file defines a container intended to build, test and publish
>  # OSv kernel as well as many applications ...
>  #
> -ARG DIST="fedora-29"
> +ARG DIST="fedora-31"
>  FROM osvunikernel/osv-${DIST}-builder-base
>
>  #
> @@ -33,8 +33,8 @@ CMD /bin/bash
>  #
>  # NOTES
>  #
> -# Build the container based on default Fedora 29 base image:
> -# docker build -t osv/builder-fedora-29 -f Dockerfile.builder .
> +# Build the container based on default Fedora 31 base image:
> +# docker build -t osv/builder-fedora-31 -f Dockerfile.builder .
>  #
>  # Build the container based of specific Ubuntu version
>  # docker build -t osv/builder-ubuntu-19.10 -f Dockerfile.builder
> --build-arg DIST="ubuntu-19.10" .
> @@ -43,8 +43,8 @@ CMD /bin/bash
>  # docker build -t osv/builder-fedora-31 -f Dockerfile.builder --build-arg
> DIST="fedora-31" --build-arg GIT_ORG_OR_USER=a_user .
>  #
>  # Run the container FIRST time example:
> -# docker run -it --privileged osv/builder-fedora-29
> -#
> +# docker run -it --privileged osv/builder-fedora-31
> +#
>  # To restart:
>  # docker restart ID (from docker ps -a) && docker attach ID
>  #
> diff --git a/docker/Dockerfile.builder-fedora-base
> b/docker/Dockerfile.builder-fedora-base
> --- a/docker/Dockerfile.builder-fedora-base
> +++ b/docker/Dockerfile.builder-fedora-base
> @@ -7,7 +7,7 @@
>  # This Docker file defines an image based on Ubuntu distribution and
> provides
>  # all packages necessary to build and run kernel and applications.
>  #
> -ARG DIST_VERSION=29
> +ARG DIST_VERSION=31
>  FROM fedora:${DIST_VERSION}
>
>  RUN yum install -y git python3 file which
> @@ -17,20 +17,20 @@ RUN yum install -y git python3 file which
>  #
>
>  # - prepare directories
> -RUN mkdir /git-repos
> +RUN mkdir -p /osv/scripts
>
> -# - clone OSv just to get setup.py
> -WORKDIR /git-repos
> +# - get setup.py
>  ARG GIT_ORG_OR_USER=cloudius-systems
> -RUN git clone --depth 1 -b master --single-branch
> https://github.com/${GIT_ORG_OR_USER}/osv.git
> -WORKDIR /git-repos/osv
> +ARG GIT_BRANCH=master
> +ADD
> https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
> /osv/scripts/
> +ADD
> https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
> /osv/scripts/
>
> -# - install all required packages and remove OSv git repo
> -RUN scripts/setup.py
> -RUN rm -rf /git-repos
> +# - install all required packages and remove scripts
> +RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
>
>  # - install Capstan
> -RUN wget
> https://github.com/cloudius-systems/capstan/releases/latest/download/capstan
> -O /usr/local/bin/capstan
> +ADD
> https://github.com/cloudius-systems/capstan/releases/latest/download/capstan
> /usr/local/bin/capstan
>  RUN chmod u+x /usr/local/bin/capstan
>
> +WORKDIR /osv
>  CMD /bin/bash
> diff --git a/docker/Dockerfile.builder-ubuntu-base
> b/docker/Dockerfile.builder-ubuntu-base
> --- a/docker/Dockerfile.builder-ubuntu-base
> +++ b/docker/Dockerfile.builder-ubuntu-base
> @@ -7,7 +7,7 @@
>  # This Docker file defines an image based on Ubuntu distribution and
> provides
>  # all packages necessary to build and run kernel and applications.
>  #
> -ARG DIST_VERSION=19.04
> +ARG DIST_VERSION=19.10
>  FROM ubuntu:${DIST_VERSION}
>
>  ENV DEBIAN_FRONTEND noninteractive
> @@ -23,22 +23,22 @@ RUN apt-get update -y && apt-get install -y git python3
>  #
>
>  # - prepare directories
> -RUN mkdir /git-repos
> +RUN mkdir -p /osv/scripts
>
> -# - clone OSv
> -WORKDIR /git-repos
> +# - get setup.py
>  ARG GIT_ORG_OR_USER=cloudius-systems
> -RUN git clone --depth 1 -b master --single-branch
> https://github.com/${GIT_ORG_OR_USER}/osv.git
> -WORKDIR /git-repos/osv
> +ARG GIT_BRANCH=master
> +ADD
> https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
> /osv/scripts/
> +ADD
> https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
> /osv/scripts/
>
> -# - install all required packages and delete OSv repo
> -RUN scripts/setup.py
> -RUN rm -rf /git-repos
> +# - install all required packages and remove scripts
> +RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
>
>  RUN update-alternatives --set java
> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
>

[osv-dev] [COMMIT osv master] docker: update default distros

2020-04-30 Thread Commit Bot
From: Fotis Xenakis 
Committer: Waldemar Kozaczuk 
Branch: master

docker: update default distros

Signed-off-by: Fotis Xenakis 
Message-Id: 


---
diff --git a/docker/Dockerfile.builder b/docker/Dockerfile.builder
--- a/docker/Dockerfile.builder
+++ b/docker/Dockerfile.builder
@@ -8,7 +8,7 @@
 # This Docker file defines a container intended to build, test and publish
 # OSv kernel as well as many applications ...
 #
-ARG DIST="fedora-29"
+ARG DIST="fedora-31"
 FROM osvunikernel/osv-${DIST}-builder-base
 
 #
@@ -33,8 +33,8 @@ CMD /bin/bash
 #
 # NOTES
 #
-# Build the container based on default Fedora 29 base image:
-# docker build -t osv/builder-fedora-29 -f Dockerfile.builder .
+# Build the container based on default Fedora 31 base image:
+# docker build -t osv/builder-fedora-31 -f Dockerfile.builder .
 #
 # Build the container based of specific Ubuntu version
 # docker build -t osv/builder-ubuntu-19.10 -f Dockerfile.builder --build-arg 
DIST="ubuntu-19.10" .
@@ -43,8 +43,8 @@ CMD /bin/bash
 # docker build -t osv/builder-fedora-31 -f Dockerfile.builder --build-arg 
DIST="fedora-31" --build-arg GIT_ORG_OR_USER=a_user .
 #
 # Run the container FIRST time example:
-# docker run -it --privileged osv/builder-fedora-29
-# 
+# docker run -it --privileged osv/builder-fedora-31
+#
 # To restart:
 # docker restart ID (from docker ps -a) && docker attach ID
 #
diff --git a/docker/Dockerfile.builder-fedora-base 
b/docker/Dockerfile.builder-fedora-base
--- a/docker/Dockerfile.builder-fedora-base
+++ b/docker/Dockerfile.builder-fedora-base
@@ -7,7 +7,7 @@
 # This Docker file defines an image based on Ubuntu distribution and provides
 # all packages necessary to build and run kernel and applications.
 #
-ARG DIST_VERSION=29
+ARG DIST_VERSION=31
 FROM fedora:${DIST_VERSION}
 
 RUN yum install -y git python3 file which
@@ -17,20 +17,20 @@ RUN yum install -y git python3 file which
 #
 
 # - prepare directories
-RUN mkdir /git-repos
+RUN mkdir -p /osv/scripts
 
-# - clone OSv just to get setup.py
-WORKDIR /git-repos
+# - get setup.py
 ARG GIT_ORG_OR_USER=cloudius-systems
-RUN git clone --depth 1 -b master --single-branch 
https://github.com/${GIT_ORG_OR_USER}/osv.git
-WORKDIR /git-repos/osv
+ARG GIT_BRANCH=master
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /osv/scripts/
 
-# - install all required packages and remove OSv git repo
-RUN scripts/setup.py
-RUN rm -rf /git-repos
+# - install all required packages and remove scripts
+RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
 
 # - install Capstan
-RUN wget 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan -O 
/usr/local/bin/capstan
+ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/capstan
 RUN chmod u+x /usr/local/bin/capstan
 
+WORKDIR /osv
 CMD /bin/bash
diff --git a/docker/Dockerfile.builder-ubuntu-base 
b/docker/Dockerfile.builder-ubuntu-base
--- a/docker/Dockerfile.builder-ubuntu-base
+++ b/docker/Dockerfile.builder-ubuntu-base
@@ -7,7 +7,7 @@
 # This Docker file defines an image based on Ubuntu distribution and provides
 # all packages necessary to build and run kernel and applications.
 #
-ARG DIST_VERSION=19.04
+ARG DIST_VERSION=19.10
 FROM ubuntu:${DIST_VERSION}
 
 ENV DEBIAN_FRONTEND noninteractive
@@ -23,22 +23,22 @@ RUN apt-get update -y && apt-get install -y git python3
 #
 
 # - prepare directories
-RUN mkdir /git-repos
+RUN mkdir -p /osv/scripts
 
-# - clone OSv
-WORKDIR /git-repos
+# - get setup.py
 ARG GIT_ORG_OR_USER=cloudius-systems
-RUN git clone --depth 1 -b master --single-branch 
https://github.com/${GIT_ORG_OR_USER}/osv.git
-WORKDIR /git-repos/osv
+ARG GIT_BRANCH=master
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/linux_distro.py
 /osv/scripts/
+ADD 
https://raw.githubusercontent.com/${GIT_ORG_OR_USER}/osv/${GIT_BRANCH}/scripts/setup.py
 /osv/scripts/
 
-# - install all required packages and delete OSv repo
-RUN scripts/setup.py
-RUN rm -rf /git-repos
+# - install all required packages and remove scripts
+RUN python3 /osv/scripts/setup.py && rm -rf /osv/scripts
 
 RUN update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
 
 # - install Capstan
-RUN wget 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan -O 
/usr/local/bin/capstan
+ADD 
https://github.com/cloudius-systems/capstan/releases/latest/download/capstan 
/usr/local/bin/
 RUN chmod u+x /usr/local/bin/capstan
 
+WORKDIR /osv
 CMD /bin/bash
diff --git a/docker/Dockerfile.runner-fedora b/docker/Dockerfile.runner-fedora
--- a/docker/Dockerfile.runner-fedora
+++ b/docker/Dockerfile.runner-fedora
@@ -8,7 +8,7 @@
 # This Docker file defines a container intended to run and test OSv
 # It comes with capstan that can pull kernel and pre-built MPM packages
 #
-ARG 

[osv-dev] [COMMIT osv master] travis: simplify CIRP publishing and add new artifacts

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

travis: simplify CIRP publishing and add new artifacts

This patch simplifies travis CIRP publishing by collapsing 2 stages
into one. It also adds kernel.elf and httpserver monitoring MPM to
the list of published artifacts.

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/.travis.yml b/.travis.yml
--- a/.travis.yml
+++ b/.travis.yml
@@ -10,23 +10,25 @@ before_install:
   - pushd docker && docker build -t osv/builder -f ./Dockerfile.builder 
--build-arg DIST="ubuntu-19.10" . && popd
   - docker run -it --privileged -d --name build osv/builder
 stages:
-  - build
-  - publish
+  - build_and_publish
 env:
   global:
 - CIRP_GITHUB_REPO_SLUG="osvunikernel/osv-nightly-releases"
 jobs:
   include:
-- stage: build
+- stage: build_and_publish
   script:
 - docker exec build ./scripts/build clean
 - docker exec build ./scripts/build-capstan-mpm-packages kernel
 - docker exec build ./scripts/build-capstan-mpm-packages unit_tests
+- docker exec build ./scripts/build-capstan-mpm-packages monitoring
 - docker exec build ./scripts/osv-version.sh > /tmp/osv-version
 - export ARTIFACTS_DIR="$(mktemp -d)"
 - cp /tmp/osv-version "$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/repository/osv-loader/osv-loader.qemu 
"$ARTIFACTS_DIR"
 - gzip "$ARTIFACTS_DIR"/osv-loader.qemu
+- docker cp 
build:/root/.capstan/repository/osv-loader/kernel-stripped.elf 
"$ARTIFACTS_DIR"/kernel.elf
+- gzip "$ARTIFACTS_DIR"/kernel.elf
 - docker cp build:/root/.capstan/repository/osv-loader/index.yaml 
"$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/packages/osv.bootstrap.mpm 
"$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/packages/osv.bootstrap.yaml 
"$ARTIFACTS_DIR"
@@ -36,17 +38,8 @@ jobs:
 - docker cp build:/root/.capstan/packages/osv.zfs-tests.yaml 
"$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/packages/osv.rofs-tests.mpm 
"$ARTIFACTS_DIR"
 - docker cp build:/root/.capstan/packages/osv.rofs-tests.yaml 
"$ARTIFACTS_DIR"
-- ./.travis/cirp/cleanup1.sh
-- ./.travis/cirp/store.sh "$ARTIFACTS_DIR"
-- ./.travis/cirp/cleanup2.sh
-- stage: publish
-  script:
-- docker exec build ./scripts/osv-version.sh > /tmp/osv-version
-- export ARTIFACTS_DIR="$(mktemp -d)"
-- ./.travis/cirp/collect.sh "$ARTIFACTS_DIR"
+- docker cp 
build:/root/.capstan/packages/osv.httpserver-monitoring-api.mpm "$ARTIFACTS_DIR"
+- docker cp 
build:/root/.capstan/packages/osv.httpserver-monitoring-api.yaml 
"$ARTIFACTS_DIR"
 - ./.travis/cirp/cleanup4.sh
 - ./.travis/cirp/publish.sh "$ARTIFACTS_DIR" $(cat /tmp/osv-version)
 - ./.travis/cirp/cleanup5.sh
-  cache:
-directories:
-  - /opt/cirp
diff --git a/.travis/cirp/cleanup4.sh b/.travis/cirp/cleanup4.sh
--- a/.travis/cirp/cleanup4.sh
+++ b/.travis/cirp/cleanup4.sh
@@ -27,5 +27,5 @@ set -euo pipefail
 . .travis/cirp/install.sh
 
 ci-release-publisher cleanup_publish
-ci-release-publisher cleanup_store --scope current-build 
previous-finished-builds \
-   --release complete incomplete
+#ci-release-publisher cleanup_store --scope current-build 
previous-finished-builds \
+#   --release complete incomplete
diff --git a/.travis/cirp/cleanup5.sh b/.travis/cirp/cleanup5.sh
--- a/.travis/cirp/cleanup5.sh
+++ b/.travis/cirp/cleanup5.sh
@@ -33,5 +33,5 @@ fi
 . .travis/cirp/install.sh
 
 ci-release-publisher cleanup_publish
-ci-release-publisher cleanup_store --scope current-build 
previous-finished-builds \
-   --release complete incomplete
+#ci-release-publisher cleanup_store --scope current-build 
previous-finished-builds \
+#   --release complete incomplete
diff --git a/scripts/build-capstan-mpm-packages 
b/scripts/build-capstan-mpm-packages
--- a/scripts/build-capstan-mpm-packages
+++ b/scripts/build-capstan-mpm-packages
@@ -406,6 +406,9 @@ case "$1" in
   kernel_and_modules)
 echo "Building kernel and standard modules ..."
 build_kernel_and_standard_osv_modules;;
+  monitoring)
+echo "Building httpserver monitoring mpm..."
+build_httpserver_monitoring_package;;
   jdk)
 echo "Building Java 8 and 11 JREs ..."
 build_java_jdk_packages;;



[osv-dev] [COMMIT osv master] rofs: optimize memory utilization by integrating with page cache

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

rofs: optimize memory utilization by integrating with page cache

This patch optimizes memory utilization by integrating with page cache.
In essence it eliminates the second copy of file data in memory when mapping
files using mmap(). For example, a simple Java example needs 9MB less to run.

The crux of the changes involves adding a new vnops function of type VOP_CACHE -
rofs_map_cached_page() - which ensures that the requested page of a file is
loaded from disk into the ROFS cache (by triggering a read from disk if
missing) and eventually registers the page with the page cache by calling
pagecache::map_read_cached_page().
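
A condensed sketch of how those pieces fit together (based on the diff
below; the vnops glue, the rofs_info/mount field names and the exact
pagecache::map_read_cached_page() signature are assumed):

static int rofs_map_cached_page(struct vnode *vp, struct file *fp, struct uio *uio)
{
    // Mount-private data gives us the superblock and the backing device
    // (field names assumed here).
    struct rofs_info *rofs = (struct rofs_info *) vp->v_mount->m_data;
    struct rofs_inode *inode = (struct rofs_inode *) vp->v_data;

    // Ensure the 4096-byte page at uio->uio_offset is resident in the
    // ROFS segment cache, reading it from disk if necessary.
    void *page_addr = nullptr;
    int error = rofs::cache_get_page_address(inode, vp->v_mount->m_dev,
                                             rofs->sb, uio, &page_addr);
    if (error) {
        return error;
    }
    // Hand the page to the page cache so later mmap() faults map it
    // directly instead of keeping a second copy (signature assumed).
    pagecache::map_read_cached_page(vp, uio->uio_offset, page_addr);
    return 0;
}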

This partially addresses #979

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/fs/rofs/rofs.hh b/fs/rofs/rofs.hh
--- a/fs/rofs/rofs.hh
+++ b/fs/rofs/rofs.hh
@@ -128,6 +128,8 @@ struct rofs_info {
 namespace rofs {
 int
 cache_read(struct rofs_inode *inode, struct device *device, struct 
rofs_super_block *sb, struct uio *uio);
+int
+cache_get_page_address(struct rofs_inode *inode, struct device *device, 
struct rofs_super_block *sb, struct uio *uio, void **addr);
 }
 
 int rofs_read_blocks(struct device *device, uint64_t starting_block, uint64_t 
blocks_count, void* buf);
diff --git a/fs/rofs/rofs_cache.cc b/fs/rofs/rofs_cache.cc
--- a/fs/rofs/rofs_cache.cc
+++ b/fs/rofs/rofs_cache.cc
@@ -10,8 +10,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 /*
  * From cache perspective let us divide each file into sequence of contiguous 
32K segments.
@@ -56,20 +58,36 @@ class file_cache_segment {
 this->starting_block = _starting_block;
 this->block_count = _block_count;
 this->data_ready = false;   // Data has to be loaded from disk
-this->data = malloc(_cache->sb->block_size * _block_count);
+auto size = _cache->sb->block_size * _block_count;
+// Only allocate contiguous page-aligned memory if size greater or 
equal a page
+// to make sure page-cache mapping works properly
+if (size >= mmu::page_size) {
+this->data = memory::alloc_phys_contiguous_aligned(size, 
mmu::page_size);
+} else {
+this->data = malloc(size);
+}
 #if defined(ROFS_DIAGNOSTICS_ENABLED)
 rofs_block_allocated += block_count;
 #endif
 }
 
 ~file_cache_segment() {
-free(this->data);
+auto size = this->cache->sb->block_size * this->block_count;
+if (size >= mmu::page_size) {
+memory::free_phys_contiguous_aligned(this->data);
+} else {
+free(this->data);
+}
 }
 
 uint64_t length() {
 return this->block_count * this->cache->sb->block_size;
 }
 
+void* memory_address(off_t offset) {
+return this->data + offset;
+}
+
 bool is_data_ready() {
 return this->data_ready;
 }
@@ -93,12 +111,16 @@ class file_cache_segment {
 blocks_remaining++;
 }
 auto block_count_to_read = std::min(block_count, blocks_remaining);
-print("[rofs] [%d] -> file_cache_segment::write() i-node: %d, starting 
block %d, reading [%d] blocks at disk offset [%d]\n",
+print("[rofs] [%d] -> file_cache_segment::read_from_disk() i-node: %d, 
starting block %d, reading [%d] blocks at disk offset [%d]\n",
   sched::thread::current()->id(), cache->inode->inode_no, 
starting_block, block_count_to_read, block);
 auto error = rofs_read_blocks(device, block, block_count_to_read, 
data);
 this->data_ready = (error == 0);
 if (error) {
-print("! Error reading from disk\n");
+printf("! Error reading from disk\n");
+} else {
+if (bytes_remaining < this->length()) {
+memset(data + bytes_remaining, 0, this->length() - bytes_remaining);
+}
 }
 return error;
 }
@@ -190,8 +212,8 @@ plan_cache_transactions(struct file_cache *cache, struct uio *uio) {
 bytes_to_read -= transaction.bytes_to_read;
 transactions.push_back(transaction);
 }
-//
-// Miss -> read from disk
+//
+// Miss -> read from disk
 else {
 print("[rofs] [%d] -> rofs_cache_get_segment_operations i-node: 
%d, cache segment %d MISS at file offset %d\n",
   sched::thread::current()->id(), cache->inode->inode_no, 
cache_segment_index, file_offset);
@@ -271,4 +293,43 @@ cache_read(struct rofs_inode *inode, struct device *device, struct rofs_super_bl
 return error;
 }
 
+// Ensure a page (4096 bytes) of a file specified by offset is in memory in cache. Otherwise
+// load it from disk and eventually return address of the page in memory.
+int
+cache_get_page_address(struct rofs_inode *inode, struct device *device, struct rofs_super_block *sb, struct uio *uio, void **addr)
+{
+// Find existing one or 

[osv-dev] [COMMIT osv master] pagecache: refactor to allow integration with non-ZFS filesystems

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

pagecache: refactor to allow integration with non-ZFS filesystems

So far the pagecache layer has been pretty tightly integrated with ZFS,
and more specifically with its ARC cache layer. This patch refactors
the pagecache implementation to make it support filesystems other than ZFS.

In essence, we modify all the necessary places, such as retrieving and
releasing cached file pages, to behave slightly differently depending on the
filesystem (ZFS versus non-ZFS) a given vnode belongs to. The changes only
apply to the read cache, where ZFS page caching requires tighter integration
with ARC.

This patch adds a new integration function - map_read_cached_page() -
intended to be used by non-ZFS filesystem implementations to register
cached file pages with the page cache.
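
A minimal sketch of what the non-ZFS registration path boils down to, using
only names visible in the diff below (the actual body of
map_read_cached_page() may differ):

// Sketch, assuming roughly this shape for the new integration point;
// read_lock, read_cache, find_in_cache(), cached_page and add_read_mapping()
// all appear in the diff below.
void map_read_cached_page_sketch(hashkey& key, void* page, mmu::hw_ptep<0> ptep)
{
    SCOPE_LOCK(read_lock);
    cached_page* cp = find_in_cache(read_cache, key);
    if (!cp) {
        // First mapping of this (file, offset): wrap the fs-owned page
        cp = new cached_page(key, page);
        read_cache.emplace(key, cp);
    }
    // Map the cached page into the faulting pte
    add_read_mapping(cp, ptep);
}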

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/core/pagecache.cc b/core/pagecache.cc
--- a/core/pagecache.cc
+++ b/core/pagecache.cc
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -161,7 +162,7 @@ class cached_page {
 public:
 cached_page(hashkey key, void* page) : _key(key), _page(page) {
 }
-~cached_page() {
+virtual ~cached_page() {
 }
 
 void map(mmu::hw_ptep<0> ptep) {
@@ -198,7 +199,7 @@ class cached_page_write : public cached_page {
 _vp = fp->f_dentry->d_vnode;
 vref(_vp);
 }
-~cached_page_write() {
+virtual ~cached_page_write() {
 if (_page) {
 if (_dirty) {
 writeback();
@@ -238,7 +239,7 @@ class cached_page_write : public cached_page {
 
 class cached_page_arc;
 
-unsigned drop_read_cached_page(cached_page_arc* cp, bool flush = true);
+static unsigned drop_arc_read_cached_page(cached_page_arc* cp, bool flush = true);
 
 class cached_page_arc : public cached_page {
 public:
@@ -267,7 +268,7 @@ class cached_page_arc : public cached_page {
 
 public:
 cached_page_arc(hashkey key, void* page, arc_buf_t* ab) : cached_page(key, page), _ab(ref(ab, this)) {}
-~cached_page_arc() {
+virtual ~cached_page_arc() {
 if (!_removed && unref(_ab, this)) {
 arc_unshare_buf(_ab);
 }
@@ -282,7 +283,7 @@ class cached_page_arc : public cached_page {
 std::for_each(it.first, it.second, [](arc_map::value_type& p) {
 auto cp = p.second;
 cp->_removed = true;
-count += drop_read_cached_page(cp, false);
+count += drop_arc_read_cached_page(cp, false);
 });
 arc_cache_map.erase(ab);
 if (count) {
@@ -296,10 +297,14 @@ static bool operator==(const cached_page_arc::arc_map::value_type& l, const cach
 }
 
 std::unordered_multimap<arc_buf_t*, cached_page_arc*> cached_page_arc::arc_cache_map;
-static std::unordered_map<hashkey, cached_page_arc*> read_cache;
+//Map used to store read cache pages for ZFS filesystem interacting with ARC
+static std::unordered_map<hashkey, cached_page_arc*> arc_read_cache;
+//Map used to store read cache pages for non-ZFS filesystems
+static std::unordered_map<hashkey, cached_page*> read_cache;
 static std::unordered_map<hashkey, cached_page_write*> write_cache;
 static std::deque<cached_page_write*> write_lru;
-static mutex arc_lock; // protects against parallel access to the read cache
+static mutex arc_read_lock; // protects against parallel access to the ARC read cache
+static mutex read_lock; // protects against parallel access to the read cache
 static mutex write_lock; // protect against parallel access to the write cache
 
 template<typename T>
@@ -314,38 +319,76 @@ static T find_in_cache(std::unordered_map<hashkey, T>& cache, hashkey& key)
 }
 }
 
+static void add_read_mapping(cached_page *cp, mmu::hw_ptep<0> ptep)
+{
+cp->map(ptep);
+}
+
 TRACEPOINT(trace_add_read_mapping, "buf=%p, addr=%p, ptep=%p", void*, void*, void*);
-void add_read_mapping(cached_page_arc *cp, mmu::hw_ptep<0> ptep)
+static void add_arc_read_mapping(cached_page_arc *cp, mmu::hw_ptep<0> ptep)
 {
 trace_add_read_mapping(cp->arcbuf(), cp->addr(), ptep.release());
-cp->map(ptep);
+add_read_mapping(cp, ptep);
 }
 
-TRACEPOINT(trace_remove_mapping, "buf=%p, addr=%p, ptep=%p", void*, void*, void*);
-void remove_read_mapping(cached_page_arc* cp, mmu::hw_ptep<0> ptep)
+template<typename T>
+static void remove_read_mapping(std::unordered_map<hashkey, T>& cache, cached_page* cp, mmu::hw_ptep<0> ptep)
 {
-trace_remove_mapping(cp->arcbuf(), cp->addr(), ptep.release());
 if (cp->unmap(ptep) == 0) {
-read_cache.erase(cp->key());
+cache.erase(cp->key());
 delete cp;
 }
 }
 
+TRACEPOINT(trace_remove_mapping, "buf=%p, addr=%p, ptep=%p", void*, void*, void*);
+static void remove_arc_read_mapping(cached_page_arc* cp, mmu::hw_ptep<0> ptep)
+{
+trace_remove_mapping(cp->arcbuf(), cp->addr(), ptep.release());
+remove_read_mapping(arc_read_cache, cp, ptep);
+}
+
 void remove_read_mapping(hashkey& key, mmu::hw_ptep<0> ptep)
 {
-SCOPE_LOCK(arc_lock);
-cached_page_arc* cp = find_in_cache(read_cache, key);
+SCOPE_LOCK(read_lock);
+cached_page* cp = find_in_cache(read_cache, 

[osv-dev] [COMMIT osv master] sysfs: add new pseudo-files free_page_ranges and pools to help monitor memory utilization

2020-04-30 Thread Commit Bot
From: Waldemar Kozaczuk 
Committer: Waldemar Kozaczuk 
Branch: master

sysfs: add new pseudo-files free_page_ranges and pools to help monitor memory utilization

This patch enhances sysfs by adding two new pseudo-files aimed
at helping monitor memory utilization over the HTTP API.

One of the new files, located under /sys/osv/memory/free_page_ranges, gives
more detailed insight into free_page_ranges, which holds all registered free
physical memory at a given point in time, as in this example:

huge 0001 002114469888
  03 0002 00040960

Each row shows information about free page ranges for a given size order -
huge (>= 256MB), 16, 15, ..., 1 - where the number (order) is log2() of the
minimum of the corresponding range size in 4K pages. For example, 3 represents
page ranges of size between 16K and 32K. The second column displays the number
of page ranges for the given size order (always 1 for huge) and the last
column displays the total number of bytes for the given range order.

For more information please read https://github.com/cloudius-systems/osv/wiki/Managing-Memory-Pages.
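
As a sketch, rendering this pseudo-file from the new stats API could look
roughly like this (stats::get_page_ranges_stats(), page_ranges_max_order and
the order[].ranges_num/.bytes fields come from the diff below; the formatting
code itself is illustrative):

#include <sstream>
#include <string>

std::string format_free_page_ranges()
{
    stats::page_ranges_stats prs;
    stats::get_page_ranges_stats(prs); // snapshots under free_page_ranges_lock
    std::ostringstream os;
    // The top slot aggregates the huge (>= 256MB) page ranges
    os << "huge " << prs.order[page_ranges_max_order].ranges_num
       << " " << prs.order[page_ranges_max_order].bytes << "\n";
    for (unsigned order = page_ranges_max_order; order-- > 0;) {
        if (prs.order[order].ranges_num == 0) {
            continue; // skip empty orders, as in the example above
        }
        os << order << " " << prs.order[order].ranges_num
           << " " << prs.order[order].bytes << "\n";
    }
    return os.str();
}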

The second new file, located under /sys/osv/memory/pools, gives detailed
information about the per-CPU L1 memory pools and the global L2 memory pool,
as in this example:

global l2 (in batches) 64 16 48 24
cpu 0 l1 (in pages) 512 128 384 158
cpu 1 l1 (in pages) 512 128 384 255
cpu 2 l1 (in pages) 512 128 384 251
cpu 3 l1 (in pages) 512 128 384 000

The last 4 columns show, respectively, the max, low watermark, high watermark
and current number of pages (or batches of pages) for the given L1 or L2 pool.

For more information please read https://github.com/cloudius-systems/osv/wiki/Memory-Management#high-level-layer.
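
Similarly, one row of the pools file can be derived from the new l2::stats()
accessor (the pool_stats fields are from the diff below; obtaining the global
L2 pool instance, called global_l2_pool here, is a placeholder):

// Illustrative only: print the L2 row in the column order described above
// (max, low watermark, high watermark, current).
stats::pool_stats ps;
global_l2_pool.stats(ps); // 'global_l2_pool' is hypothetical
printf("global l2 (in batches) %lu %lu %lu %lu\n",
       (unsigned long)ps._max, (unsigned long)ps._watermark_lo,
       (unsigned long)ps._watermark_hi, (unsigned long)ps._nr);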

Signed-off-by: Waldemar Kozaczuk 

---
diff --git a/core/mempool.cc b/core/mempool.cc
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -541,7 +541,7 @@ void reclaimer::wait_for_memory(size_t mem)
 
 class page_range_allocator {
 public:
-static constexpr unsigned max_order = 16;
+static constexpr unsigned max_order = page_ranges_max_order;
 
 page_range_allocator() : _deferred_free(nullptr) { }
 
@@ -571,6 +571,22 @@ class page_range_allocator {
 return size;
 }
 
+void stats(stats::page_ranges_stats& stats) const {
+stats.order[max_order].ranges_num = _free_huge.size();
+stats.order[max_order].bytes = 0;
+for (auto& pr : _free_huge) {
+stats.order[max_order].bytes += pr.size;
+}
+
+for (auto order = max_order; order--;) {
+stats.order[order].ranges_num = _free[order].size();
+stats.order[order].bytes = 0;
+for (auto& pr : _free[order]) {
+stats.order[order].bytes += pr.size;
+}
+}
+}
+
 private:
 template
 void insert(page_range& pr) {
@@ -822,6 +838,15 @@ void page_range_allocator::for_each(unsigned min_order, Func f)
 }
 }
 
+namespace stats {
+void get_page_ranges_stats(page_ranges_stats &stats)
+{
+WITH_LOCK(free_page_ranges_lock) {
+free_page_ranges.stats(stats);
+}
+}
+}
+
 static void* mapped_malloc_large(size_t size, size_t offset)
 {
 //TODO: For now pre-populate the memory, in future consider doing lazy population
@@ -1123,6 +1148,8 @@ static size_t large_object_size(void *obj)
 
 namespace page_pool {
 
+static std::vector<stats::pool_stats> l1_pool_stats;
+
 // L1-pool (Percpu page buffer pool)
 //
 // if nr < max * 1 / 4
@@ -1137,6 +1164,7 @@ struct l1 {
 : _fill_thread(sched::thread::make([] { fill_thread(); },
 sched::thread::attr().pin(cpu).name(osv::sprintf("page_pool_l1_%d", cpu->id))))
 {
+cpu_id = cpu->id;
 _fill_thread->start();
 }
 
@@ -1160,12 +1188,15 @@ struct l1 {
 void* pop()
 {
 assert(nr);
+l1_pool_stats[cpu_id]._nr = nr - 1;
 return _pages[--nr];
 }
 void push(void* page)
 {
 assert(nr < 512);
 _pages[nr++] = page;
+l1_pool_stats[cpu_id]._nr = nr;
+
 }
 void* top() { return _pages[nr - 1]; }
 void wake_thread() { _fill_thread->wake(); }
@@ -1177,6 +1208,7 @@ struct l1 {
 static constexpr size_t watermark_lo = max * 1 / 4;
 static constexpr size_t watermark_hi = max * 3 / 4;
 size_t nr = 0;
+unsigned int cpu_id;
 
 private:
 std::unique_ptr _fill_thread;
@@ -1266,6 +1298,14 @@ class l2 {
 return true;
 }
 
+void stats(stats::pool_stats &stats)
+{
+stats._nr = get_nr();
+stats._max = _max;
+stats._watermark_lo = _watermark_lo;
+stats._watermark_hi = _watermark_hi;
+}
+
 void fill_thread();
 void refill();
 void unfill();
@@ -1291,6 +1331,7 @@ static sched::cpu::notifier _notifier([] () {
 if (smp_allocator_cnt++ == sched::cpus.size()) {
 smp_allocator = true;
 }
+l1_pool_stats.resize(sched::cpus.size());
 });
 static inline l1& get_l1()
 {
@@ -1469,6 +1510,21 @@ void l2::free_batch(page_batch& batch)
 
 }
 
+namespace stats {
+void 

[osv-dev] Running apps from Linux host in disk-less mode with virtio-fs

2020-04-30 Thread Waldek Kozaczuk
Traditionally, in order to run an app on OSv one would have to build a ZFS 
or ROFS image with a kernel in it and have it exposed as a virtio-blk or 
similar block device. With virtio-fs, a new host filesystem sharing 
virtualization mechanism (
https://fosdem.org/2020/schedule/event/vai_virtio_fs/attachments/slides/3666/export/events/attachments/vai_virtio_fs/slides/3666/virtio_fs_A_Shared_File_System_for_Virtual_Machines_FOSDEM.pdf),
it is possible to run an arbitrary app from the Linux host directly using a 
pure OSv kernel (please note the *kernel.elf artifact below, which is in 
essence loader.elf with an empty bootfs*):

#In another terminal
./build/virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/wkozaczuk/projects/osv/apps/native-example -o cache=always

/home/wkozaczuk/projects/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m 4G \
-smp 4 \
-vnc :1 \
-gdb tcp::1234,server,nowait \
-kernel /home/wkozaczuk/projects/osv/build/last/kernel.elf \
-append "/virtiofs/hello" \
-netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 \
-device virtio-net-pci,netdev=un0 \
-device virtio-rng-pci \
-enable-kvm \
-cpu host,+x2apic \
-chardev stdio,mux=on,id=stdio,signal=off \
-mon chardev=stdio,mode=readline \
-device isa-serial,chardev=stdio \
-chardev socket,id=char0,path=/tmp/vhostqemu \
-device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
-object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa node,memdev=mem

OSv v0.54.0-179-g2f92fc91
Solaris: NOTICE: Cannot find the pool label for '/dev/vblk0.1'
eth0: 192.168.122.15
Booted up in 103.70 ms
Cmdline: /virtiofs/hello
WARNING: application::prepare_argv(): missing libvdso.so -> may prevent shared libraries specifically Golang ones from functioning
Hello from C code

Because bootfs is empty, I had to apply the following hack-patch to explicitly
mount the virtio-fs filesystem:
diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index 46dcb62f..6df78d34 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -2322,6 +2322,13 @@ void pivot_rootfs(const char* path)
 closedir(fs_lib_dir);
 }
 
+// /dev/virtiofs1, /virtiofs, virtiofs
+mkdir("/virtiofs", 0666);
+ret = sys_mount("/dev/virtiofs0", "/virtiofs", "virtiofs", MNT_RDONLY, nullptr);
+if (ret) {
+printf("failed to virtiofs mount, error = %s\n", strerror(ret));
+}
+
 auto ent = setmntent("/etc/fstab", "r");
 if (!ent) {
 return;

In the long run, we could add a new kernel command-line option letting one
pass this mount information at boot time, along the lines of the sketch below.
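
As a rough sketch of that idea (everything here is hypothetical, including the
option format; only the mkdir()/sys_mount() calls mirror the hack-patch above):

// Hypothetical: mount a filesystem described by a boot option such as
// "--mount-fs=virtiofs,/dev/virtiofs0,/virtiofs" during pivot_rootfs().
static void mount_fs_from_boot_param(const std::string& spec)
{
    // spec = "fstype,device,mountpoint"
    auto c1 = spec.find(',');
    auto c2 = spec.find(',', c1 + 1);
    if (c1 == std::string::npos || c2 == std::string::npos) {
        return; // malformed spec
    }
    std::string fstype = spec.substr(0, c1);
    std::string device = spec.substr(c1 + 1, c2 - c1 - 1);
    std::string mountpoint = spec.substr(c2 + 1);

    mkdir(mountpoint.c_str(), 0666);
    int ret = sys_mount(device.c_str(), mountpoint.c_str(), fstype.c_str(),
                        MNT_RDONLY, nullptr);
    if (ret) {
        printf("failed to mount %s, error = %s\n", fstype.c_str(), strerror(ret));
    }
}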

Why am I writing about this, and why do I find virtio-fs so exciting?
0) No need to create images: expose an arbitrary app directory on the Linux
host filesystem and run it on OSv.
1) No duplication of application files when running multiple OSv instances on
a host.
2) More memory efficient: eventually, with DAX, physical pages from the host
could be mapped directly into virtual memory on OSv, with almost no memory
copying.

In this mode, running a Linux app on OSv might be better described as a
"highly isolated" process than as a traditional unikernel. What do you think?

Waldek
