Re: [osv-dev] Re: [PATCH] stdio: fix setvbuf() to not set a weird buf size

2021-11-29 Thread Rick Payne


Thanks! That v2 patch solves the problem for me.

Cheers
Rick

On Mon, 2021-11-29 at 14:33 +0200, Nadav Har'El wrote:
> Oops, forgot a Makefile patch. I'll send a v2.
> 
> --
> Nadav Har'El
> n...@scylladb.com
> 
> 
> On Mon, Nov 29, 2021 at 2:07 PM Nadav Har'El wrote:
> > When we upgraded our Musl version, we got a newer version of
> > setvbuf(), changed in Musl commit 0b80a7b04 in 2018. The old version
> > of setvbuf() silently ignored the user's buffer and continued to use
> > stdio's default buffer - a 1024-byte buffer. The new version tried
> > to do better and use the user-supplied buffer, but created a problem:
> > 
> > Musl's internal implementation uses the first UNGET (=8) bytes of the
> > buffer for ungetc() support. When Musl wants a 1024-byte buffer, they
> > actually allocate 8+1024. In order not to change that implementation
> > detail, their setvbuf() subtracted 8 bytes from the user's supplied
> > buffer and its length. So if the user provided a 131072-byte (128 KB)
> > buffer, only 131064 bytes were actually used as the buffer size. This
> > causes problems for reading from block devices, where the size of the
> > buffer must be a multiple of BSIZE and no longer is, as exposed in
> > issue #1180.
> > 
> > The ugly fix in this patch is to subtract 512, instead of 8, from the
> > user's given buffer size. A better fix would have been to move the
> > 8-byte unget buffer outside the main buffer, or alternatively for
> > setvbuf() to allocate the buffer itself (and remember to free it
> > later), but both fixes would require much more elaborate changes to
> > Musl's stdio, which I didn't want to do.
> > 
> > The code in this patch is a modified version of
> > musl_1.1.24/src/stdio/setvbuf.c.
> > 
> > Fixes #1180.
> > 
> > Signed-off-by: Nadav Har'El 
> > ---
> >  libc/stdio/setvbuf.c | 43 +++
> >  1 file changed, 43 insertions(+)
> >  create mode 100644 libc/stdio/setvbuf.c
> > 
> > diff --git a/libc/stdio/setvbuf.c b/libc/stdio/setvbuf.c
> > new file mode 100644
> > index ..81614307
> > --- /dev/null
> > +++ b/libc/stdio/setvbuf.c
> > @@ -0,0 +1,43 @@
> > +#include "stdio_impl.h"
> > +
> > +/* The behavior of this function is undefined except when it is the first
> > + * operation on the stream, so the presence or absence of locking is not
> > + * observable in a program whose behavior is defined. Thus no locking is
> > + * performed here. */
> > +
> > +int setvbuf(FILE *restrict f, char *restrict buf, int type, size_t size)
> > +{
> > +	f->lbf = EOF;
> > +
> > +	if (type == _IONBF) {
> > +		f->buf_size = 0;
> > +	} else if (type == _IOLBF || type == _IOFBF) {
> > +		// Because of https://github.com/cloudius-systems/osv/issues/1180
> > +		// in OSv we need to subtract BSIZE (512) bytes, where Musl subtracts
> > +		// UNGET (8) bytes from the user's buffer size. This subtraction is
> > +		// ugly and wasteful and we should eventually consider a different
> > +		// approach. Two better but considerably more complex approaches:
> > +		// 1. Move the UNGET buffer to be in the FILE object, not the
> > +		//    buffer, so we could use the full buffer for actual reading.
> > +		// 2. Allocate size bytes here (by the way, POSIX does this when
> > +		//    buf==NULL and we don't support it yet) - instead of using
> > +		//    the user's buffer. We can size the allocation at UNGET+size
> > +		//    so we'll have real buffer size of "size".
> > +		//    The problem with this approach is that we'll need to remember
> > +		//    to free this buffer when the file is closed (or buffer is
> > +		//    changed again) and this code is currently missing.
> > +#define BSIZE_ALIGNED_UNGET 512
> > +		if (buf && size >= BSIZE_ALIGNED_UNGET) {
> > +			f->buf = (void *)(buf + BSIZE_ALIGNED_UNGET);
> > +			f->buf_size = size - BSIZE_ALIGNED_UNGET;
> > +		}
> > +		if (type == _IOLBF && f->buf_size)
> > +			f->lbf = '\n';
> > +	} else {
> > +		return -1;
> > +	}
> > +
> > +	f->flags |= F_SVB;
> > +
> > +	return 0;
> > +}
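The size arithmetic behind #1180 can be checked with a few lines of C. This is a sketch under one assumption: that each refill issues a single two-iovec readv() carrying the caller's request followed by the whole stdio buffer. The gdb session in the original iso-read report further down is consistent with that (len=2048, niov=2, uio_resid = 133112, and 133112 = 2048 + 131064). The helper names are hypothetical; only the constants come from the thread.

```c
#include <assert.h>
#include <stddef.h>

#define BSIZE 512        /* block size that bdev_read() insists on */
#define MUSL_UNGET 8     /* bytes stock Musl setvbuf() reserves for ungetc() */
#define OSV_UNGET 512    /* BSIZE_ALIGNED_UNGET in the patch */

/* Effective stdio buffer after setvbuf() reserves `unget` bytes at the
 * front, i.e. f->buf_size = size - UNGET (only applied when size >= unget). */
static size_t effective_buf_size(size_t user_size, size_t unget)
{
    assert(user_size >= unget);
    return user_size - unget;
}

/* Assumed model: bytes handed to the device by one two-iovec readv(),
 * the caller's request plus a full refill of the stdio buffer. */
static size_t readv_total(size_t request, size_t buf_size)
{
    return request + buf_size;
}
```

Under this model the fix also relies on the caller's own request being a multiple of 512 (2048 here): since the buffer part is now block-aligned, the total is aligned exactly when the request is.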

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/1739610da23601f136f15bf4b794320c13e7d5b5.camel%40rossfell.co.uk.


Re: [osv-dev] iso-read (cloud-init) issue

2021-11-28 Thread Rick Payne


Thanks Nadav, I realise my report was a bit vague (even for me!).

I'm a bit limited on data here at the moment (slow satellite email
only) but I compiled up the most recent OSv tree I have from git. The
commit is 29eb3d53.

Compiled like this:

  scripts/build mode=debug fs_size_mb=500 image=cloud-init

Then I put a test ISO in the current directory (it doesn't seem to matter
what it is; I had an Ubuntu ISO image to hand) and ran:

  scripts/run.py -d --cloud-init-image=test.iso \
 -e /usr/mgmt/cloud-init.so

It hits the same assert:

rickp@clifford:~/src/osv-cloudius$ scripts/run.py -d --cloud-init-image=test.iso -e /usr/mgmt/cloud-init.so
WARNING: Image format was not specified for '/home/rickp/src/osv-cloudius/test.iso' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
OSv v0.56.0-18-g29eb3d53
eth0: 192.168.122.15
Booted up in 250.69 ms
Cmdline: /usr/mgmt/cloud-init.so
Assertion failed: (uio->uio_resid % BSIZE) == 0 (fs/vfs/vfs_bdev.cc: bdev_read: 33)

[backtrace]
0x40224a28 
0x40224a8d <__assert_fail+62>
0x406b99ad 
0x40460008 
0x406c93d0 
0x406c880b 
0x406c4213 
0x406c0e1d 
0x4068d0d2 
0x4068d142 
0x406f243f 
0x4073bc83 
0x106a772a 
0x106a7c0c 
0x106d9a50 
0x106d9a8a 
0x106d9011 
0x106d9077 
0x106d90cd 
0x106d8a03 
0x106d8b07 
0x106d8ad6 
0x1068cc9f 
0x1068d347 
0x4067b38e 
0x4067ace4 
0x4067ab08 
0x40679e97
, std::allocator > const&,
std::vector,
std::allocator >, std::allocator, std::allocator > > > const&, bool,
std::unordered_map, std::allocator >,
std::__cxx11::basic_string,
std::allocator >, std::hash, std::allocator > >,
std::equal_to,
std::allocator > >,
std::allocator, std::allocator > const,
std::__cxx11::basic_string,
std::allocator > > > > const*, waiter*,
std::__cxx11::basic_string,
std::allocator > const&, std:0x40662943
,
std::allocator >, std::vector, std::allocator >,
std::allocator,
std::allocator > > >, int*, bool,
std::unordered_map, std::allocator >,
std::__cxx11::basic_string,
std::allocator >, std::hash, std::allocator > >,
std::equal_to,
std::allocator > >,
std::allocator, std::allocator > const,
std::__cxx11::basic_string,
std::allocator > > > > const*)+147>
0x1015eaef 
0x1015f881 
0x4067b38e 
0x4067ace4 
0x4067a647 
0x4067a66f 
0x406fb1a8 
0x406fde5a 
0x40498c39 ::operator()() const+53>
0x405eb5ab 
0x405e6f32 
0x40493d32 
0x0018f0130018f7ff 
0x405eadf1 
0x7d894848ec834852 

I'd git bisect it, but I'm unable to deal with the submodule stuff
without data. Might be able to do this later this week though.

Cheers,
Rick


On Sun, 2021-11-28 at 11:27 +0200, Nadav Har'El wrote:
> As you noted, I fixed a very similar (apparently...) bug in the past
> (#877) in commit 
> https://github.com/cloudius-systems/osv/commit/0b651428b91663255d8da9a4913663d0cd4cc710
> 
> The patch involved fixing a musl bug, so because Waldek recently did
> a lot of musl-related changes it was reasonable to suspect that this
> bug had somehow returned - but looking at the latest code it doesn't
> seem to me that this is the case - __stdio_read is still my fixed version.
> We also have a unit test tests/tst-fread.cc for this case, which
> doesn't seem to break - so I wonder how your use case is different.
> 
> So I guess this needs more debugging. Can you please create an issue
> on how this error can be reproduced on the master version (what image
> to build and how to run it) and I'll try to debug it - unless you
> plan to? We should add more printouts showing exactly how much the
> caller tried to read.
> 
> Another possibility is that the application was genuinely trying to
> read a part of a block. OSv *could* implement this (reading more and
> then dropping a part of it) but I'm not sure that it should - or why
> this only started to happen recently - so I don't want to go in this
> direction unless we know why.
> 
> --
> Nadav Har'El
> n...@scylladb.com
> 
> 
> On Sat, Nov 27, 2021 at 12:19 AM Rick Payne wrote:
> > Hiya,
> > 
> > Trying to get cloud-init working on a fairly recent OSv (0.55)
> > image
> > using an ISO to provide the YAML file. This used to work fine on my
> > previous OSv image, but now I'm hitting an assert failure:
> > 
> > Assertion failed: 
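Nadav's alternative in the reply above (have OSv read more and then drop part of it) would look roughly like the following. It is a sketch, not OSv code: struct fake_bdev, fake_bdev_read_blocks() and read_unaligned() are hypothetical names, an in-memory array stands in for the device, and its whole-block rule mirrors the (uio->uio_resid % BSIZE) == 0 assert in bdev_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BSIZE 512

/* Hypothetical in-memory stand-in for a block device. Like bdev_read(),
 * it only accepts whole, block-aligned requests. */
struct fake_bdev {
    const unsigned char *data;
    size_t size;                 /* assumed to be a multiple of BSIZE */
};

static int fake_bdev_read_blocks(const struct fake_bdev *dev, size_t off,
                                 size_t len, unsigned char *out)
{
    assert(off % BSIZE == 0 && len % BSIZE == 0);
    if (off > dev->size || len > dev->size - off)
        return -1;               /* request runs past the end of the device */
    memcpy(out, dev->data + off, len);
    return 0;
}

/* Read len bytes at an arbitrary offset: widen the range to block
 * boundaries, read whole blocks into a bounce buffer, then copy out only
 * the requested bytes and drop the rest. */
static int read_unaligned(const struct fake_bdev *dev, size_t off,
                          size_t len, unsigned char *out)
{
    if (len == 0)
        return 0;
    size_t start = off - off % BSIZE;                       /* round down */
    size_t end = (off + len + BSIZE - 1) / BSIZE * BSIZE;   /* round up */
    unsigned char *bounce = malloc(end - start);
    if (bounce == NULL)
        return -1;
    int rc = fake_bdev_read_blocks(dev, start, end - start, bounce);
    if (rc == 0)
        memcpy(out, bounce + (off - start), len);
    free(bounce);
    return rc;
}
```

The price is an extra allocation and copy for every unaligned read; a request that is already aligned could bypass the bounce buffer entirely.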

[osv-dev] iso-read (cloud-init) issue

2021-11-26 Thread Rick Payne
Hiya,

Trying to get cloud-init working on a fairly recent OSv (0.55) image
using an ISO to provide the YAML file. This used to work fine on my
previous OSv image, but now I'm hitting an assert failure:

Assertion failed: (uio->uio_resid % BSIZE) == 0 (fs/vfs/vfs_bdev.cc: bdev_read: 33)

[backtrace]
0x4023acac <__assert_fail+28>
0x404459e5 
0x4044e62d 
0x4044babe 
0x404491ba 
0x4043ef10 
0x40464b4d <__stdio_read+61>
0x404994cc 
0x106a772a 
0x106a7c0c 
0x106d9a50 
0x106d9a8a 
0x106d9011 
0x106d9077 
0x106d90cd 
0x106d8a03 
0x106d8b07 
0x106d8ad6 
0x1068cc9f 
0x1068d347 
0x404377a0 
0x40223523 

The ISO hasn't changed and mounts fine on a loopback device under Linux.
Building a new ISO gives the same error. The GDB backtrace is as follows:

#0  0x40399f32 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x4023ac82 in abort (fmt=fmt@entry=0x406535f0 "Assertion failed: %s (%s: %s: %d)\n")
    at runtime.cc:140
#4  0x4023acad in __assert_fail (expr=expr@entry=0x406a46d1 "(uio->uio_resid % BSIZE) == 0",
    file=file@entry=0x406a46a6 "fs/vfs/vfs_bdev.cc", line=line@entry=33,
    func=func@entry=0x406a469c "bdev_read") at runtime.cc:147
#5  0x404459e6 in bdev_read (dev=, uio=0x212ff480, ioflags=)
    at fs/vfs/vfs_bdev.cc:42
#6  0x4044e62e in device_read (dev=0xa00100a18080, uio=0x212ff480, ioflags=0)
    at fs/devfs/device.cc:387
#7  0x4044babf in vfs_file::read (this=0xa00101545900, uio=0x212ff480, flags=)
    at fs/vfs/vfs_fops.cc:54
#8  0x404491bb in sys_read (fp=0xa00101545900, iov=iov@entry=0x212ff530, niov=niov@entry=2,
    offset=offset@entry=-1, count=count@entry=0x212ff4f8) at fs/vfs/vfs_syscalls.cc:275
#9  0x4043ef11 in preadv (fd=, iov=iov@entry=0x212ff530, iovcnt=iovcnt@entry=2,
    offset=offset@entry=-1) at fs/vfs/main.cc:422
#10 0x4043ef80 in readv (fd=, iov=iov@entry=0x212ff530, iovcnt=iovcnt@entry=2)
    at fs/vfs/main.cc:436
#11 0x40464b4e in __stdio_read (f=0x900101577000, buf=, len=2048)
    at libc/stdio/__stdio_read.c:28
#12 0x404994cd in fread (destv=, size=1, nmemb=2048, f=0x900101577000)
    at musl/src/stdio/fread.c:26
#13 0x106a772b in _stdio_read ()
#14 0x106a7c0d in cdio_stream_read ()
#15 0x106d9a51 in iso9660_seek_read_framesize ()
#16 0x106d9a8b in iso9660_iso_seek_read ()
#17 0x106d9012 in iso9660_ifs_read_pvd_loglevel ()
#18 0x106d9078 in iso9660_ifs_read_pvd ()

#5  0x404459e6 in bdev_read (dev=, uio=0x212ff480, ioflags=)
    at fs/vfs/vfs_bdev.cc:42
42  while (uio->uio_resid > 0) {
(gdb) p uio->uio_resid
$8 = 133112

(gdb) up
#6  0x4044e62e in device_read (dev=0xa00100a18080, uio=0x212ff480, ioflags=0)
    at fs/devfs/device.cc:387
387 error = (*ops->read)(dev, uio, ioflags);
(gdb) p dev->name
$9 = "vblk1\000\000\000\000\000\000"

It seems to be reading the correct block device:

virtio-blk: Add blk device instances 0 as vblk0, devsize=524288000
virtio-blk: Add blk device instances 1 as vblk1, devsize=378880

I see there was a similar assert that Justin hit a while back (#877). I
made the same change (replace fread with read and change the args), and
it no longer asserts, but it doesn't read the ISO either (as Justin found).

Any ideas?

Cheers
Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/e3c276c404ba713bad5a87c13eddcfb837a64b88.camel%40rossfell.co.uk.


Re: [osv-dev] shm_open / shm_unlink

2021-10-12 Thread Rick Payne
Hi Greg,

Yes - it would be great to get support for Elixir. I did try to engage with 
the Elixir community before to see how we could use the same tooling - but I 
didn’t get much back. I’ll contact you off list…

Cheers
Rick

> On 13 Oct 2021, at 06:53, Gregory Burd  wrote:
> 
> 
> Rick,
> 
> I've been following your work with rebar3 and am considering adapting it into 
> distillery (https://github.com/bitwalker/distillery).  I'd love to be able to 
> generate an image (AMI, or whatever format) that is the combination of OSv, 
> Elixir, BEAM, etc. in a single easy step.  Thanks for digging into the 
> missing pieces in OTP-24, if you want help or at least a reviewer I'm happy 
> to do what I can.
> 
> -greg
> 
>> On Mon, Oct 4, 2021 at 6:30 AM Rick Payne  wrote:
>> On Mon, 2021-10-04 at 10:37 +0300, Nadav Har'El wrote:
>> > You're welcome. If you have patches that might be useful to others as
>> > well, please post them.
>> 
>> Will do. Not there yet - but once I can verify things, I'll send
>> patches..
>> 
>> Cheers
>> Rick
>> 

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/1C3B2E45-0C60-4F4E-AD7C-B52B800A4F7D%40rossfell.co.uk.


Re: [osv-dev] shm_open / shm_unlink

2021-10-04 Thread Rick Payne
On Mon, 2021-10-04 at 10:37 +0300, Nadav Har'El wrote:
> You're welcome. If you have patches that might be useful to others as
> well, please post them.

Will do. Not there yet - but once I can verify things, I'll send
patches..

Cheers
Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/eea05cd62db7700330ac2a86e6794e7f9d6e0ec9.camel%40rossfell.co.uk.


Re: [osv-dev] shm_open / shm_unlink

2021-10-04 Thread Rick Payne
Hi,

On Mon, 2021-10-04 at 09:38 +0300, Nadav Har'El wrote:
> 
> So the good (?) news is that the shm_* code didn't matter at all.
> Maybe it isn't even getting used... The problem is something
> completely different:

Ah, I think shm_open is still relevant. For now I'm including the
musl file and will see where that ends up. We're using zfs and I have
/dev/shm created.

> There is one minor side-bug here - that dlsym() apparently fails to
> set errno which is why we get the silly "No error information"
> message there.
> 
> The main problem here is we are missing an alias __sigaction for
> sigaction, which this code seems to be looking for. You can trivially
> add such an alias in libc/aliases.ld and see if it solves the
> problem.

Yup, that's what I tried today. Making progress - I now seem to get to
the point where some Erlang code is being run, so this is encouraging.

> OSv's signal handlers *do* support sigaltstack() and SA_ONSTACK, so
> that should work.

I saw that.

> I'm not sure their "overriding sigaction()" hack will work as
> expected in OSv - whether or not the "libraries" (?) will see the
> modified sigaction() or OSv's one also depends on the load order - I
> guess you can check that.

Yes, it's going to be somewhat interesting. I'll let you know how I get
on...

Thanks for all the pointers...

Cheers
Rick


To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/f5af1e520186c8a12d71cbee803ed37b68c9bcb3.camel%40rossfell.co.uk.


Re: [osv-dev] shm_open / shm_unlink

2021-10-03 Thread Rick Payne


On Sun, 2021-10-03 at 11:04 +0300, Nadav Har'El wrote:
> I'm curious where -
> Maybe we have a bug in our /dev (fs/devfs/*) implementation? It
> should generate good errors when trying to open /dev/shm/something -
> not assertion failures and crashes.

Well, this is one of those rabbit holes... The assert was happening in
the abort code. Not sure I understand why yet, but the root cause seems
to be that the Erlang runtime is looking for __sigaction. It wants to
play games with the signal handler (for reasons outlined below).

#if !(defined(__GLIBC__) || defined(__DARWIN__) || defined(__NetBSD__) ||  \
      defined(__FreeBSD__) || defined(__sun__))
/*
 * Unknown libc -- assume musl, which does not allow safe signals
 */
#error "beamasm requires a libc that can guarantee that sigaltstack works"
#endif /* !(__GLIBC__ || __DARWIN__ || __NetBSD__ || __FreeBSD__ ||  \
        * __sun__)                                                  \
        */

Now, because of the way erts is being built, it actually thinks it is on
glibc. So we end up in this function:


static int (*next_sigaction)(int, const struct sigaction *, struct sigaction *);

static void do_init(void) {
    next_sigaction = dlsym(RTLD_NEXT, NEXT_SIGACTION);

    if (next_sigaction != 0) {
        return;
    }

    perror("dlsym");
    abort();
}

NEXT_SIGACTION is set to '__sigaction' as it would be for glibc. So the
perror("dlsym") is responsible for the message "dlsym: No error
information" we get. Then we abort().

So before I start to put more effort into shm_open / shm_unlink, I
need to understand a bit more about the signal requirements. The
relevant comment seems to be:

/*
 * Erlang code compiled to x86 native code uses RSP as its stack pointer. This
 * improves performance in several ways:
 *
 * - It permits the use of the x86 call and ret instructions, which
 *   reduces code volume and improves branch prediction.
 * - It avoids stealing a gp register to act as a stack pointer.
 *
 * Unix signal handlers are by default delivered onto the current stack, i.e.
 * RSP. This is a problem since our native-code stacks are small and may not
 * have room for the Unix signal handler.
 *
 * There is a way to redirect signal handlers to an "alternate" signal stack by
 * using the SA_ONSTACK flag with the sigaction() library call. Unfortunately,
 * this has to be specified explicitly for each signal, and it is difficult to
 * enforce given the presence of libraries.
 *
 * Our solution is to override the C library's signal handler setup procedure
 * with our own which enforces the SA_ONSTACK flag.
 */
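The policy this comment describes (every real handler gets SA_ONSTACK, so it runs on a sigaltstack() stack instead of the small native-code stack) can be sketched as a plain wrapper. This is not Erlang's code: BEAM interposes sigaction() itself and forwards through dlsym(RTLD_NEXT, ...), which is what needs the __sigaction symbol. The wrapper name sigaction_onstack() and the 64 KiB stack size are assumptions for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Alternate signal stack; 64 KiB is an assumed, generous size. */
static char alt_stack[64 * 1024];
static volatile sig_atomic_t ran_on_alt_stack = 0;

static void handler(int sig)
{
    char marker; /* a local, so its address shows which stack we run on */
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;
    (void)sig;
    ran_on_alt_stack = (p >= lo && p < lo + sizeof alt_stack);
}

/* Hypothetical wrapper: behaves like sigaction() but forces SA_ONSTACK
 * for real handlers (anything other than SIG_DFL or SIG_IGN). */
static int sigaction_onstack(int sig, const struct sigaction *act,
                             struct sigaction *old)
{
    struct sigaction copy;
    if (act == NULL)
        return sigaction(sig, NULL, old);
    copy = *act;
    if (copy.sa_handler != SIG_DFL && copy.sa_handler != SIG_IGN)
        copy.sa_flags |= SA_ONSTACK;
    return sigaction(sig, &copy, old);
}

/* Register alt_stack as the calling thread's alternate signal stack. */
static void install_alt_stack(void)
{
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    int rc = sigaltstack(&ss, NULL);
    assert(rc == 0);
    (void)rc;
}
```

Whether the overriding trick itself behaves the same way under OSv depends on symbol lookup and load order, as discussed in the thread; the wrapper only shows the SA_ONSTACK policy in isolation.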

Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/697e46243f653f65057ae75ce9d6b46677a8f7bd.camel%40rossfell.co.uk.


Re: [osv-dev] shm_open / shm_unlink

2021-10-03 Thread Rick Payne
On Sun, 2021-10-03 at 10:38 +0300, Nadav Har'El wrote:
> 
> Shared memory support was never a big priority because its main use
> case is multiple processes that want to share memory, and those
> (multiple processes) were never a thing in OSv. However, you're right
> that it's a shame to lose compatibility with an existing application
> just because it wants to use shared memory even if it didn't have to.
> 
> OSv does support the System V shared memory API (see libc/shm.cc) but
> not the Posix one. I think it should be fairly easy to implement the
> Posix API (shm_open(), shm_unlink()) almost exactly the same as the
> code in libc/shm.cc. That existing code already has a "shm_file"
> implementation - a file descriptor that can be mmapped, so I think
> the implementation should be very easy and straightforward (might be
> even easier than what's already in the existing libc/shm.cc). Let me
> know if you need any help doing it.
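Nadav's point is that what shm_open() callers need is essentially a file descriptor that can be mmapped. A minimal sketch of that usage pattern: create an object, drop its name straight away (as shm_unlink() would), size it with ftruncate(), and map it twice so a write through one view shows up in the other. To keep it runnable outside OSv, mkstemp() in /tmp stands in for shm_open(); that substitution, the helper names and the /tmp path are assumptions, and the thread does not confirm that asmjit uses exactly this sequence. The second view is read-only rather than executable because /tmp is often mounted noexec.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an mmap-able object of `size` bytes with no name left behind.
 * mkstemp() + unlink() stand in for shm_open() + shm_unlink() here. */
static int make_unnamed_object(size_t size)
{
    char path[] = "/tmp/shm-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* like shm_unlink(): the fd stays valid */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the same object twice: one writable view and one read-only view. */
static int map_twice(size_t size, unsigned char **rw, const unsigned char **ro)
{
    assert(size > 0);                /* mmap() rejects zero-length mappings */
    int fd = make_unnamed_object(size);
    if (fd < 0)
        return -1;
    void *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mappings keep the object alive */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, size);
        if (b != MAP_FAILED)
            munmap(b, size);
        return -1;
    }
    *rw = a;
    *ro = b;
    return 0;
}
```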

Thanks, I'll take a look. My first attempt was to try and add the musl
mman/shm_open.c file but that just resulted in me hitting the
page_fault assert 'assert(ef->rflags & processor::rflags_if);'

I'll take a look at the libc/shm.cc code and see what I can puzzle
out...

Cheers
Rick


To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/1de1271cf9b937e627cce71450beca3403dc0d92.camel%40rossfell.co.uk.


[osv-dev] shm_open / shm_unlink

2021-10-03 Thread Rick Payne


I've been playing around with my rebar3_osv tool (which turns an erlang
application into an OSv image). I'm trying to move everything to the
latest erlang release (24.1).

OTP-24 comes with asmjit which can give quite a performance boost in
some cases. However, it requires the use of shm_open and shm_unlink.

I can't find this in OSv - what's the status? Is it something that
could be done under OSv, or am I going to find this is a blocking issue?
Currently this is where I'm at:

/otp/erts-12.1.1/bin/beam.smp: ignoring missing symbol shm_unlink
/otp/erts-12.1.1/bin/beam.smp: ignoring missing symbol shm_open
dlsym: No error information
Aborted


Cheers,
Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/491e69e8dd9f8e748d96c545cf8159556da1ba47.camel%40rossfell.co.uk.


Re: [osv-dev] ISR and schd::preemptable problems in resolve_pltgot

2020-08-06 Thread Rick Payne


> On 6 Aug 2020, at 21:00, Nadav Har'El  wrote:
> 
> By the way, if this problem really bothered us, it should be possible with
> relatively small effort to make symbol-resolution lock-free. We actually
> have the list of objects protected by RCU, not a mutex (see commit
> 68afb68ee84769db064949839ee50bff08145c6e), so it is already possible to do
> most of the lookup in a lock-free manner, and we just need to replace the
> last remaining mutexes in the resolve_pltgot code, which might not be
> difficult.

Noted, but as you say, I suspect it’s not an issue for what I’m doing. If it
becomes an issue, I’ll come back to you for pointers on how to progress this
fix ;)

Cheers,
Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/A199DA1E-AF69-4DCD-A82E-AC195865C772%40rossfell.co.uk.


Re: [osv-dev] ISR and schd::preemptable problems in resolve_pltgot

2020-08-06 Thread Rick Payne
On Thu, 2020-08-06 at 11:02 +0300, Nadav Har'El wrote:

> We have a trick that will do both the on-load symbol resolution and
> loading the entire object
> and not page by page: If you add:
> 
> asm(".pushsection .note.osv-mlock, \"a\"; .long 0, 0, 0; .popsection");
> 
> To your source code, both things will auto-magically happen :-)
> We have in  a macro doing this, OSV_ELF_MLOCK_OBJECT(),
> but you don't really need that macro, you can just copy this line.
> 
> I suggest you use this trick, instead of the "z,now" thing.

Thanks for the great explanation - and yes, that does indeed sort the
issue, thanks ;)

> Unless your executable is huge, I don't think you'll have problems
> doing this for the entire executable (you only need this "section"
> thing in one place in your source code), but if that bothers you, you
> can always put the problematic code in a separate shared object, and
> only that object will be marked with this special section.

Understood. I'll see how we get on, but we have your backup plan
anyway.

Thanks again. I'll try and get my patch into something that could be
applied, as I know there are others looking to do similar things.

cheers,
Rick


To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/bdc13eabd2a9c974e88699eaef5481dc11b25dad.camel%40rossfell.co.uk.


[osv-dev] ISR and schd::preemptable problems in resolve_pltgot

2020-08-05 Thread Rick Payne


I've been noodling with the old 'assigned virtio' code, trying to make
it work again so I can use this method to get raw packets. See the
discussion on the 'raw sockets' thread.

I'm mostly there (though I will admit, I'm far from a C++ programmer,
so the code is pretty hideous), but I ran into an issue with the
interrupt routine crashing. It turns out that if the interrupt routine
is in my .so file, then the symbol resolution is not done at load time,
rather, it's done on the first call.

I was hitting an assert in resolve_pltgot. I remember having a similar
issue a while ago (2016!) - back then it was a bug in
pthread_spin_lock. In this case, because the symbols aren't resolved on
loading of the .so file, the first time the ISR is called, we try to
resolve the missing symbols. However, we are not sched::preemptable() at
that point, so the assert fires.

I tried the existing ISR code, copying it into my code. The wait part
looks like this:

void receiver(virtio::vring *vq)
{
    while (1) {
        sched::thread::wait_until([&] {
            bool have_elements = vq->used_ring_not_empty();
            if (!have_elements) {
                vq->enable_interrupts();

                // we must check that the ring is not empty *after*
                // we enable interrupts to avoid a race where a packet
                // may have been delivered between queue->used_ring_not_empty()
                // and queue->enable_interrupts() above
                have_elements = vq->used_ring_not_empty();
                if (have_elements) {
                    vq->disable_interrupts();
                }
            }
            return have_elements;
        });
        ...
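The enable-then-recheck step in the predicate above is a guard against lost wakeups. A self-contained C11 sketch of the same ordering follows; struct ring_model and its fields are a hypothetical simplification, not OSv's virtio::vring, and a real driver also has the device side and an ISR that disables interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the consumer side of a ring: the "device" bumps
 * used_idx when it delivers a packet and only raises an interrupt while
 * irq_enabled is true. */
struct ring_model {
    atomic_uint used_idx;     /* written by the producer */
    unsigned last_used;       /* consumer's last processed index */
    atomic_bool irq_enabled;  /* read by the producer */
};

static bool used_ring_not_empty(struct ring_model *r)
{
    /* seq_cst load, so it cannot be ordered before the seq_cst store below */
    return atomic_load(&r->used_idx) != r->last_used;
}

/* Same shape as the wait_until() predicate: enables interrupts only when
 * it finds no work, and turns them back off if the re-check finds some. */
static bool have_elements(struct ring_model *r)
{
    bool have = used_ring_not_empty(r);
    if (!have) {
        atomic_store(&r->irq_enabled, true);
        /* A packet may have arrived after the first check but before
         * interrupts were enabled; it would raise no interrupt, so look again. */
        have = used_ring_not_empty(r);
        if (have)
            atomic_store(&r->irq_enabled, false);
    }
    return have;
}
```

Both the store that enables interrupts and the re-check load are sequentially consistent; with weaker ordering the re-check could effectively happen before the enable is visible to the producer, reopening the race the comment warns about.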


sched::thread::wait_until and sched::thread::stop_wait are not
resolved. I changed resolve_pltgot to what was suggested back in 2016:

diff --git a/core/elf.cc b/core/elf.cc
index ffb16004..5cdf1ec2 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -816,7 +816,7 @@ void object::relocate_pltgot()
 
 void* object::resolve_pltgot(unsigned index)
 {
-    assert(sched::preemptable());
+    // assert(sched::preemptable());
     auto rel = dynamic_ptr(DT_JMPREL);
     auto slot = rel[index];
     auto info = slot.r_info;
@@ -826,6 +826,14 @@ void* object::resolve_pltgot(unsigned index)
     void *addr = _base + slot.r_offset;
     auto sm = symbol(sym);
 
+    if (!sched::preemptable()) {
+        auto nameidx = (dynamic_ptr(DT_SYMTAB) + sym)->st_name;
+        auto name = dynamic_ptr(DT_STRTAB) + nameidx;
+        std::cerr << "resolve_pltgot " << demangle(name) <<
+            " in " << pathname() << " found in '" <<
+            sm.obj->pathname() << "'\n";
+    }
+
     if (sm.obj != this) {
         WITH_LOCK(_used_by_resolve_plt_got_mutex) {
             _used_by_resolve_plt_got.insert(sm.obj->shared_from_this());

This no longer asserts, and shows the symbols being resolved:

resolve_pltgot _ZN5sched6thread9stop_waitEv (sched::thread::stop_wait()) in /tests/tst-assign-virtio.so found in ''
Handling packets
resolve_pltgot _ZN6virtio5vring17enable_interruptsEv (virtio::vring::enable_interrupts()) in /tests/tst-assign-virtio.so found in ''
resolve_pltgot _ZN5sched6thread4waitEv (sched::thread::wait()) in /tests/tst-assign-virtio.so found in ''

So what is the appropriate solution here? I could try and arrange for
my code to be linked with "-Wl,-z,now" but that feels like quite a big
hammer (as that could then apply to the whole app?)

OTOH, it feels unnatural to be relying on symbol lookups in an ISR. If
resolve_pltgot really does not need the assert check (which was
suggested to be the case in 2016), maybe there is nothing to do here
other than comment out an assert?

Probably I should arrange the code so that I can call
virtio_driver::wait_for_queue, but at the moment, I can't. I don't
think we should assume that the user's code would want that - or do we?

Thoughts?

Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/7fa151d59007a824119e83d6906f64ea1f94fe3d.camel%40rossfell.co.uk.


Re: [osv-dev] Re: raw socket support ?

2020-07-24 Thread Rick Payne
On Fri, 2020-07-24 at 08:38 +0300, Pekka Enberg wrote:
> 
> I think Nadav is thinking of "OSv memcached", which bypasses the
> TCP/IP stack.

It uses the pfil_add_hook feature, which means it gets to see the
packets before they're passed to the netstack. You get the full packet,
but only if it's already destined for the IP stack (i.e. you can't
intercept L2 packets).

Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/6f1799349bc4bfd6888c4457f52905b0ab51d8ef.camel%40rossfell.co.uk.


Re: [osv-dev] Re: raw socket support ?

2020-07-23 Thread Rick Payne
On Thu, 2020-07-23 at 12:38 +0300, Nadav Har'El wrote:
> We also had until recently something more general, "assigned virtio",
> where the application gets access directly to the virtio rings (and
> needs to work with them - the kernel doesn't touch them any more).
> Waldek recently removed it but I guess you can see how it worked
> before that, and if needed we can bring it back.

Ah yes, I sort of remember that, and I see commit 65c558ce17f89d4ae.
I'll have a look, thanks.

Rick

To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/055b189f64d1886a15f4b764bfa00ca2e3c477b1.camel%40rossfell.co.uk.


Re: [osv-dev] Re: raw socket support ?

2020-07-23 Thread Rick Payne


Another alternative (for me and maybe others) would be to have a
standard way to hook packets directly from the virtio interface,
especially if we could ensure that we don't run DHCP on that interface.

For instance, set up two interfaces from the host - one for OSv-y stuff,
and one just for an application which takes over the interface at a very
low level.

Rick

On Mon, 2020-07-20 at 21:32 -0700, Waldek Kozaczuk wrote:
> Honestly, I am not familiar enough with this subject domain, so I cannot
> give any authoritative advice. But it seems to me that the latter
> idea of introducing and supporting LINUX_AF_PACKET may be less effort
> and more in line with what Charles did. I am not familiar with BPF, so
> I do not know if it would even make sense for OSv or how much effort
> it would be to support it.
> 
> Waldek
> 
> On Thursday, July 16, 2020 at 9:43:14 AM UTC-4 vfu...@inf.ufsm.br
> wrote:
> > Hi,
> > 
> > I think that raw sockets (SOCK_RAW) work well in OSv; it is
> > possible to hook L3 packets (assuming an INET domain) with no
> > problems. I guess that BSD does not support the PACKET domain, so
> > OSv does not support it either. If I remember well, we can hook
> > frames in BSD by using libpcap, which in turn uses bpf
> > (alternatively, we can use bpf directly). However, bsd/sys/net
> > in OSv does not have FreeBSD's bpf implementation. Maybe one
> > interesting way to enable L2 packet hooking is by migrating bpf
> > from FreeBSD's sys/net to OSv's bsd/sys/net (what do you
> > think about that, guys?). Another possibility is to create a kind of
> > PACKET domain support (similar to the NETLINK support provided by
> > Charles Meyers in the Spirent fork), but in that case, we will need
> > to hook the frame bytes directly from the mbufs.
> > 
> > Regards,
> > Vinicius
> > 
> > On Thursday, July 16, 2020 at 02:57:52 UTC-3, Pekka Enberg
> > wrote:
> > > Hi,
> > > 
> > > On Wed, Jul 15, 2020 at 6:13 PM Waldek Kozaczuk <
> > > jwkoz...@gmail.com> wrote:
> > > > Hi,
> > > > 
> > > > Unfortunately, I have no idea what it would take to add raw
> > > > sockets support. Please be aware that we maintain another IPV6
> > > > branch - https://github.com/cloudius-systems/osv/tree/ipv6 -
> > > > which besides IPV6 might have better networking support but I
> > > > doubt it supports raw sockets.
> > > > 
> > > > I am also adding Charles Meyers from Spirent, who wrote the
> > > > original IPV6 support, to this thread. He may have some thoughts
> > > > on this matter. Also, Spirent has its own OSv fork -
> > > > https://github.com/SpirentOrion/osv - which has extra
> > > > stuff/fixes to the networking stack (I would like to port some of
> > > > those to the mainline OSv at some point).
> > > > 
> > > 
> > > The TCP/IP stack supports raw sockets (it's the FreeBSD stack
> > > after all).
> > > 
> > > One potential issue is that the Linux socket()
> > > compatibility layer is incorrect. I see that linux_socket()
> > > (called by the socket() function) has some support for raw sockets:
> > > 
> > > https://github.com/cloudius-systems/osv/blob/master/bsd/sys/compat/linux/linux_socket.cc#L619
> > > 
> > > However, it's a bit picky about the "domain" and "protocol" and will
> > > ignore what Frederic attempted to do:
> > > 
> > >    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> > > 
> > > I assume if you fix up linux_socket() to do what Linux does,
> > > things will work fine.
> > > 
> > > - Pekka
> 
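Pekka's point is that linux_socket() is strict about the domain and protocol arguments, so the exact call Frederic tried is not accepted. A minimal C probe such as the one below can report whether a platform accepts that call. The helper name probe_af_packet is hypothetical. On Linux, an unprivileged process normally gets EPERM, because AF_PACKET sockets require CAP_NET_RAW. The thread does not say what OSv returns for this call, so treat the result there as something to observe rather than a known outcome.

```c
#include <arpa/inet.h>      /* htons() */
#include <errno.h>
#include <linux/if_ether.h> /* ETH_P_ALL */
#include <sys/socket.h>     /* socket(), AF_PACKET, SOCK_RAW */
#include <unistd.h>         /* close() */

/* Hypothetical probe for the call discussed in the thread:
 *     socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))
 * Returns 0 if the socket could be created (it is closed again right away),
 * otherwise the errno value reported by socket(). The AF_PACKET protocol
 * argument is given in network byte order, hence htons(). */
static int probe_af_packet(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}
```

Passing a nonzero result to strerror() gives a readable reason. For example, glibc reports "Operation not permitted" for EPERM when the probe runs without CAP_NET_RAW.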



Re: [osv-dev] Problems with socat

2020-07-22 Thread Rick Payne


Thanks, but the question was more 'why does the OSv socketpair() only
support SOCK_STREAM, and not SOCK_DGRAM?' I guess there has to be a
reason why that decision was made.

As another data point, I removed the assert in OSv and things 'worked'
(though I can't tell if that was luck or because that code wasn't
really exercised).

Rick

On Wed, 2020-07-22 at 18:24 -0700, Dor Laor wrote:
> The best approach is to set a breakpoint, start single-stepping, and
> read those variable values.
> 
> On Wed, Jul 22, 2020 at 5:57 PM Rick Payne 
> wrote:
> > Trying to characterise some performance issues, I thought I'd run
> > socat under OSv; however, it panics:
> > 
> > $ sudo scripts/run.py -n -e 'socat tcp4-listen:6971 open:/dev/null'
> > OSv v0.55.0
> > eth0: 192.168.122.76
> > Booted up in 3245.70 ms
> > Cmdline: socat tcp4-listen:6971 open:/dev/null
> > Assertion failed: type == SOCK_STREAM (libc/af_local.cc: socketpair_af_local: 101)
> > 
> > [backtrace]
> > 0x402228ae 
> > 0x40222917 <__assert_fail+64>
> > 0x406fb1d0 
> > 0x40246c37 
> > 0x10034587 
> > 0x657473696c2d346f 
> > 
> > The particular function in socat which triggers this is:
> > 
> > static int diag_sock_pair(void) {
> >    int handlersocks[2];
> > 
> >    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, handlersocks) < 0) {
> >       diag_sock_send = -1;
> >       diag_sock_recv = -1;
> >       return -1;
> >    }
> >    diag_sock_send = handlersocks[1];
> >    diag_sock_recv = handlersocks[0];
> >    return 0;
> > }
> > 
> > And in OSv:
> > 
> > #5  0x406fb1d1 in socketpair_af_local (type=2, proto=0, sv=0x20700b20) at libc/af_local.cc:101
> > 101         assert(type == SOCK_STREAM);
> > 
> > Not sure why this restriction exists, and I've changed my local
> > socat to use SOCK_STREAM - but I just thought we should note a
> > difference between Linux and OSv here...
> > 
> > Rick
> > 



[osv-dev] Problems with socat

2020-07-22 Thread Rick Payne


Trying to characterise some performance issues, I thought I'd run socat
under OSv; however, it panics:

$ sudo scripts/run.py -n -e 'socat tcp4-listen:6971 open:/dev/null'
OSv v0.55.0
eth0: 192.168.122.76
Booted up in 3245.70 ms
Cmdline: socat tcp4-listen:6971 open:/dev/null
Assertion failed: type == SOCK_STREAM (libc/af_local.cc: socketpair_af_local: 101)

[backtrace]
0x402228ae 
0x40222917 <__assert_fail+64>
0x406fb1d0 
0x40246c37 
0x10034587 
0x657473696c2d346f 

The particular function in socat which triggers this is:

static int diag_sock_pair(void) {
   int handlersocks[2];

   if (socketpair(AF_UNIX, SOCK_DGRAM, 0, handlersocks) < 0) {
      diag_sock_send = -1;
      diag_sock_recv = -1;
      return -1;
   }
   diag_sock_send = handlersocks[1];
   diag_sock_recv = handlersocks[0];
   return 0;
}

And in OSv:

#5  0x406fb1d1 in socketpair_af_local (type=2, proto=0, sv=0x20700b20) at libc/af_local.cc:101
101         assert(type == SOCK_STREAM);

Not sure why this restriction exists, and I've changed my local socat
to use SOCK_STREAM - but I just thought we should note a difference
between Linux and OSv here...

Rick
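Rick's local workaround switches socat's diag_sock_pair() from SOCK_DGRAM to SOCK_STREAM. That swap changes one behaviour worth keeping in mind: an AF_UNIX datagram pair preserves message boundaries, while a stream pair does not. The sketch below uses plain POSIX calls to make the difference visible on Linux. The helper name first_recv_len is hypothetical. This thread does not establish whether socat's diagnostic channel actually depends on message boundaries.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_UNIX socketpair of the given type,
 * send two 2-byte messages from one end, and return how many bytes the
 * first recv() on the other end delivers (or -1 on error). */
static ssize_t first_recv_len(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    /* Two separate writes: "ab" followed by "cd". */
    if (send(sv[1], "ab", 2, 0) != 2 || send(sv[1], "cd", 2, 0) != 2) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* SOCK_DGRAM: one recv() returns exactly one message (2 bytes).
     * SOCK_STREAM: no message boundaries, so one recv() may return
     * bytes from both writes (up to 4 bytes). */
    n = recv(sv[0], buf, sizeof buf, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first recv() returns 2. With SOCK_STREAM it may return all 4 bytes at once, which is exactly the coalescing a datagram-oriented reader would not expect.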



Re: [osv-dev] [PATCH 2/3] lzloader: fix memset() implementation

2020-05-25 Thread Rick Payne
On Mon, 2020-05-25 at 09:15 +0300, Nadav Har'El wrote:
> 
> Do you mean 
> https://github.com/cloudius-systems/osv/commit/d52cb12546ff2acd5255a2ac8897891e421f07dc
> which just turned off optimization?

Yes.

> At the time, I suggested the memset() fix for #913, because it seemed to
> me like an "obvious" cause of infinite recursion (which happened
> now in Fedora 32). But you said you tested it and it didn't help - in
> your case you saw that memset() wasn't even used (you commented it
> out and everything still compiled fine).

Right, I recall at the time that the memset fix looked like it should
do it, but didn't.

> So I guess we can't revert the turn-off-optimization patch. Maybe
> it's not actually needed for modern compilers (you can check...) but
> it will still be needed for the specific compiler which caused your
> problem #913 in the first place.

I suspect it was a compiler issue, as I can't reproduce it on Ubuntu
19.10, using gcc 9.2.1. Compiling that file with -O2 works for me now.

Rick




Re: [osv-dev] [PATCH 2/3] lzloader: fix memset() implementation

2020-05-24 Thread Rick Payne


I think this is also related to #913. I'll try without my patch (which
we've had to use since then).

Rick

On Sat, 2020-05-23 at 23:47 +0300, Nadav Har'El wrote:
> Some compilers apparently optimize code in fastlz/ to call memset(), so
> the uncompressing boot loader, fastlz/lzloader.cc, needs to implement
> this function. The current implementation called the "builtin" memset,
> which, if you look at the compilation result, actually calls memset()
> and results in endless recursion and a hanging boot... This started
> happening on Fedora 32 with Gcc 10, for example.
> 
> So let's implement memset() using the base_memset() we already have in
> libc/string/memset.c.
> 
> Fixes #1084.
> 
> Signed-off-by: Nadav Har'El 
> ---
>  fastlz/lzloader.cc | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/fastlz/lzloader.cc b/fastlz/lzloader.cc
> index f65fb2be..7eae2191 100644
> --- a/fastlz/lzloader.cc
> +++ b/fastlz/lzloader.cc
> @@ -21,11 +21,16 @@ extern char _binary_loader_stripped_elf_lz_start;
>  extern char _binary_loader_stripped_elf_lz_end;
>  extern char _binary_loader_stripped_elf_lz_size;
>  
> -// std libraries used by fastlz.
> -extern "C" void *memset(void *s, int c, size_t n)
> -{
> -    return __builtin_memset(s, c, n);
> -}
> +// The code in fastlz.cc does not call memset(), but some version of gcc
> +// implement some assignments by calling memset(), so we need to implement
> +// a memset() function. This is not performance-critical so let's stick to
> +// the basic implementation we have in libc/string/memset.c. To avoid
> +// compiling this source file a second time (the loader needs different
> +// compile parameters), we #include it here instead.
> +extern "C" void *memset(void *s, int c, size_t n);
> +#define memset_base memset
> +#include "libc/string/memset.c"
> +#undef memset_base
>  
>  extern "C" void uncompress_loader()
>  {
> -- 
> 2.26.2
> 
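Some context on why the loader needs a memset() at all: GCC documents that even a freestanding environment must provide memcpy, memmove, memset and memcmp, because the compiler may emit calls to them. The patch solves this by reusing libc/string/memset.c.

A different approach, sketched below as an assumption rather than what OSv does, is a byte loop that writes through a volatile pointer. That keeps GCC from turning the loop back into a memset() call, the same kind of self-call that caused the recursion described in the patch. The function is named loader_memset here, a hypothetical name, so it can be tested next to the C library's memset; in the loader it would be the extern "C" memset. GCC's -fno-tree-loop-distribute-patterns option is another way to suppress the loop-to-memset transformation.

```c
#include <stddef.h>

/* Hypothetical loader-local memset (not OSv's implementation).
 * Stores go through a volatile pointer, so the compiler must perform each
 * byte write itself and cannot replace the loop with a call to memset(). */
void *loader_memset(void *s, int c, size_t n)
{
    volatile unsigned char *p = (volatile unsigned char *)s;

    while (n--) {
        *p++ = (unsigned char)c;
    }
    return s;
}
```

As the patch comment notes, this path is not performance-critical, so giving up vectorised stores costs little here.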



Re: [osv-dev] Re: NMI crash in memcpy() between memory areas allocated with mmu::map_anon()

2020-03-28 Thread Rick Payne


With your latest 2 patches, our production box, which was having
problems, has run fine for the last 48 hours. Thanks for working so hard
on fixing it! It has been quite the pain point for us.

Are bugs 784 and 1077 something we should worry about?

Rick

On Thu, 2020-03-26 at 12:50 -0700, Waldek Kozaczuk wrote:
> This was actually caused by a bug in one of the older versions of the
> "mempool: use map_anon() for large allocations or when memory is
> fragmented" patch. It turns out I forgot that object_size() also
> needs to account for map_anon()-based allocations and do it properly ;-)
> My latest version of this patch (version 4) should work better, plus I
> added a unit test around it. But it still needs to be reviewed.
> 
> On Wednesday, March 25, 2020 at 11:48:52 AM UTC-4, Waldek Kozaczuk
> wrote:
> > This is really related to the "OOM query" thread, but I wanted to
> > send a new email as the other thread has gotten quite long.
> > 
> > In any case, we are troubleshooting an app crash which happens almost
> > immediately after boot, and one of the thread stack traces looks
> > like this:
> > 
> > (gdb) bt
> > #0  0x403a7bea in processor::cli_hlt () at arch/x64/processor.hh:247
> > #1  nmi (ef=0x80003fa1c068) at arch/x64/exceptions.cc:306
> > #2  
> > #3  0x403940a3 in memcpy_repmov_ssse3 (dest=0x2000415014c0, src=0x20004e7851d4, n=16) at /usr/include/c++/9/array:185
> > #4  0x11756a5b in ?? ()
> > #5  0x in ?? ()
> > 
> > Also, this is with the last 2 patches - "[PATCH V2 1/2] mempool: fix
> > a bug in page_range_allocator() when handling worst case O(n)
> > scenario" and "[PATCH V2 2/2] mempool: use map_anon() for large
> > allocations or when memory is fragmented" - applied to address
> > fragmentation; they make malloc_large() use mmu::map_anon() in
> > certain cases.
> > 
> > So as you can tell, memcpy() (specifically memcpy_repmov_ssse3())
> > triggers an NMI (non-maskable interrupt) exception when copying
> > between memory areas allocated with mmu::map_anon() (see
> > dest=0x2000415014c0, src=0x20004e7851d4, n=16). I really have no
> > idea why, but I have a hunch that it possibly happens because the
> > mapping tables are not being refreshed/flushed properly. Possibly the
> > allocation is requested on one CPU and then memcpy() is called on
> > another one which does not see the mapping yet. Or maybe the TLB
> > needs to be flushed. From a cursory reading, it looks like
> > mmu::map_anon() might be doing it (somewhere downstream), but I am
> > not 100% sure.
> > 
> > Or maybe this NMI is caused by a misaligned memory allocation (I had
> > a question in my patch about whether it really addresses this
> > properly). Or maybe it is a bug in my patch? Or maybe there is
> > something fundamentally different between memory allocated with
> > map_anon() and allocations using contiguous physical memory.
> > 
> > Does anybody have other smart ideas?
> > 
> > Waldek
> 



Re: [osv-dev] Re: OOM query

2020-03-24 Thread Rick Payne


I backed out the original patch and applied the other two. Do I still
need the first one? We're not on master, but we're pretty close: we last
synced on March 3rd, at commit 92eb26f3a645.

scripts/build check runs fine:

OK (131 tests run, 274.753 s)

Rick

On Tue, 2020-03-24 at 17:15 -0400, Waldek Kozaczuk wrote:
> Is it with the exact same code as on master, with the latest 2 patches
> I sent applied? Does 'scripts/build check' pass for you?
> 
> On Tue, Mar 24, 2020 at 16:56 Rick Payne 
> wrote:
> > I tried the patches, but it's crashing almost instantly...
> > 
> > page fault outside application, addr: 0x56c0
> > [registers]
> > RIP: 0x403edd23
> > RFL: 0x00010206  CS:  0x0008  SS:  0x0010
> > RAX: 0x56c0  RBX: 0x200056c00040  RCX: 0x004c  RDX: 0x0008
> > RSI: 0x004c  RDI: 0x200056c00040  RBP: 0x200041501740  R8:  0x
> > R9:  0x5e7a7333  R10: 0x  R11: 0x  R12: 0x004c
> > R13: 0x5e7a7333  R14: 0x  R15: 0x  RSP: 0x2000415016f8
> > Aborted
> > 
> > [backtrace]
> > 0x40343779 
> > 0x4034534d  > exception_frame*)+397>
> > 0x403a667b 
> > 0x403a54c6 
> > 0x1174c2b0 
> > 0x 
> > 
> > (gdb) osv heap
> > 0x8e9ad000 0x22a53000
> > 0x8e9a2000 0x4000
> > 
> > Rick
> > 
> > 
> > 
> > On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> > > I have sent a more complete patch that should also address
> > > fragmentation issue with requests >= 4K and < 2MB.
> > > 
> > > On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk wrote:
> > > > I have just sent a new patch to the mailing list. I am hoping it
> > > > will address the OOM crash if my theory of heavy memory
> > > > fragmentation is right. It would be nice if Nadav could review it.
> > > > 
> > > > Regardless, if you have another crash in production and are able to
> > > > connect with gdb, could you run 'osv heap'? It should show
> > > > free_page_ranges. If memory is heavily fragmented we should see a
> > > > long list.
> > > > 
> > > > It would be nice to recreate that load in a dev env and capture the
> > > > memory trace data (BTW you do not need to enable backtrace to have
> > > > enough useful information). It would help us better understand how
> > > > memory is allocated by the app. I saw you sent me one trace, but it
> > > > does not seem to reveal anything interesting.
> > > > 
> > > > Waldek
> > > > 
> > > > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> > > > > > 
> > > > > > 
> > > > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp wrote:
> > > > > > > Looks to me like it's trying to allocate 40MB but the available
> > > > > > > memory is 10GB, surely? 10933128KB is 10,933MB
> > > > > > > 
> > > > > > 
> > > > > > I misread the number - forgot about 1K.
> > > > > > 
> > > > > > Any chance you could run the app outside of production with memory
> > > > > > tracing enabled -
> > > > > > https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > > > > (without --trace-backtrace) for a while? And then we can have a better
> > > > > > sense of what kind of allocations it makes. The output of trace
> > > > > > memory-analyzer would be really helpful.
> > > > > 
> > > > > I can certainly run that locally with locally generated workloads,
> > > > > which should be close enough - but we've never managed to trigger the
> > > > > OOM condition that way (other than by really constraining the memory
> > > > > artificially). It should be close enough though - let me see what I
> > > > > can do.
> > > > > 
> > > > > Rick
> > > > > 
> > > > > 
> > > 
> > 



Re: [osv-dev] Re: OOM query

2020-03-24 Thread Rick Payne


I tried the patches, but it's crashing almost instantly...

page fault outside application, addr: 0x56c0
[registers]
RIP: 0x403edd23
RFL: 0x00010206  CS:  0x0008  SS:  0x0010
RAX: 0x56c0  RBX: 0x200056c00040  RCX: 0x004c  RDX: 0x0008
RSI: 0x004c  RDI: 0x200056c00040  RBP: 0x200041501740  R8:  0x
R9:  0x5e7a7333  R10: 0x  R11: 0x  R12: 0x004c
R13: 0x5e7a7333  R14: 0x  R15: 0x  RSP: 0x2000415016f8
Aborted

[backtrace]
0x40343779 
0x4034534d 
0x403a667b 
0x403a54c6 
0x1174c2b0 
0x 

(gdb) osv heap
0x8e9ad000 0x22a53000
0x8e9a2000 0x4000

Rick



On Mon, 2020-03-23 at 22:06 -0700, Waldek Kozaczuk wrote:
> I have sent a more complete patch that should also address
> fragmentation issue with requests >= 4K and < 2MB.
> 
> On Monday, March 23, 2020 at 6:12:51 PM UTC-4, Waldek Kozaczuk wrote:
> > I have just sent a new patch to the mailing list. I am hoping it
> > will address the OOM crash if my theory of heavy memory
> > fragmentation is right. It would be nice if Nadav could review it.
> > 
> > Regardless, if you have another crash in production and are able to
> > connect with gdb, could you run 'osv heap'? It should show
> > free_page_ranges. If memory is heavily fragmented we should see a
> > long list.
> > 
> > It would be nice to recreate that load in a dev env and capture the
> > memory trace data (BTW you do not need to enable backtrace to have
> > enough useful information). It would help us better understand how
> > memory is allocated by the app. I saw you sent me one trace, but it
> > does not seem to reveal anything interesting.
> > 
> > Waldek 
> > 
> > On Monday, March 23, 2020 at 1:19:18 AM UTC-4, rickp wrote:
> > > On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> > > > 
> > > > 
> > > > On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp wrote:
> > > > > Looks to me like it's trying to allocate 40MB but the available
> > > > > memory is 10GB, surely? 10933128KB is 10,933MB
> > > > > 
> > > > 
> > > > I misread the number - forgot about 1K.
> > > > 
> > > > Any chance you could run the app outside of production with memory
> > > > tracing enabled -
> > > > https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> > > > (without --trace-backtrace) for a while? And then we can have a better
> > > > sense of what kind of allocations it makes. The output of trace
> > > > memory-analyzer would be really helpful.
> > > 
> > > I can certainly run that locally with locally generated workloads,
> > > which should be close enough - but we've never managed to trigger the
> > > OOM condition that way (other than by really constraining the memory
> > > artificially). It should be close enough though - let me see what I
> > > can do.
> > > 
> > > Rick
> > > 
> > > 
> 



Re: [osv-dev] Re: OOM query

2020-03-22 Thread Rick Payne
On Sun, 2020-03-22 at 22:08 -0700, Waldek Kozaczuk wrote:
> 
> 
> On Monday, March 23, 2020 at 12:36:52 AM UTC-4, rickp wrote:
> > Looks to me like it's trying to allocate 40MB but the available
> > memory is 10GB, surely? 10933128KB is 10,933MB
> > 
> 
> I misread the number - forgot about 1K.
> 
> Any chance you could run the app outside of production with memory
> tracing enabled -
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-memory-allocations
> (without --trace-backtrace) for a while? And then we can have a better
> sense of what kind of allocations it makes. The output of trace
> memory-analyzer would be really helpful.

I can certainly run that locally with locally generated workloads, which
should be close enough - but we've never managed to trigger the OOM
condition that way (other than by really constraining the memory
artificially). It should be close enough though - let me see what I can
do.

Rick




Re: [osv-dev] Re: OOM query

2020-03-22 Thread Rick Payne


Looks to me like it's trying to allocate 40MB but the available memory
is 10GB, surely? 10933128KB is 10,933MB

It's probably Erlang garbage collection, but it seems very strange to me,
as sometimes it happens very quickly after we start the system. There's
no way we're allocating more memory than the system ever claims to have,
so if it is fragmentation, why do we do so poorly compared to the Linux
kernel?

Is anyone else running VMs based on OSv for more than a few days?

Rick
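It helps to put the logged numbers side by side. The failing request in the OOM log quoted below ("page_range_allocator: no ranges found for size 39849984 and exact order: 14") is about 38 MiB. The 10933128 Kb of reported memory is about 10.4 GiB.

The sketch below reproduces the printed order under two assumptions: 4 KiB pages, and an "exact order" equal to log2 of the page count rounded up. Both assumptions were chosen because they match the log, not taken from mempool.cc. The helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed page size (4 KiB). */
#define PAGE_SIZE_BYTES ((size_t)4096)

/* Smallest k such that 2^k >= pages, i.e. log2 rounded up (0 for pages <= 1).
 * The k bound keeps the shift in range for very large inputs. */
static unsigned ilog2_roundup_pages(size_t pages)
{
    unsigned k = 0;

    while (k < sizeof(size_t) * 8 - 1 && ((size_t)1 << k) < pages) {
        k++;
    }
    return k;
}

/* Hypothetical: the "exact order" for a request of `bytes` bytes,
 * assuming the request is a whole number of pages. */
static unsigned exact_order_for(size_t bytes)
{
    return ilog2_roundup_pages(bytes / PAGE_SIZE_BYTES);
}
```

Working through the numbers: 39849984 / 4096 = 9729 pages. Since 2^13 = 8192 < 9729 <= 2^14 = 16384, exact_order_for(39849984) is 14, which matches the log. The earlier 4984832-byte waiters work out to 1217 pages, which is order 11.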

On Sun, 2020-03-22 at 18:01 -0700, Waldek Kozaczuk wrote:
> Either way, it looks like the app is trying to allocate almost 40MB
> and the remaining memory is ~10MB. So, in this case, there is no
> surprise, is there?
> 
> In the previous case, you reported there were two threads each
> requesting ~5MB and there was plenty of memory. If I understand the
> code correctly, the remaining memory must have been fragmented and
> there was no free page range big enough.
> 
> On Sunday, March 22, 2020 at 8:31:01 PM UTC-4, Waldek Kozaczuk wrote:
> > It does for me. Are you using the latest version of this script
> > from master? I think there were a couple of things I changed there
> > recently as part of the wider upgrade to Python 3.
> > 
> > On Sun, Mar 22, 2020 at 20:27 Rick Payne 
> > wrote:
> > > Does that command work for you? For me I get:
> > > 
> > > (gdb) osv heap
> > > Python Exception  %x format: an integer is required, not gdb.Value:
> > > Error occurred in Python command: %x format: an integer is required, not gdb.Value
> > > 
> > > Rick
> > > 
> > > On Sun, 2020-03-22 at 19:37 -0400, Waldek Kozaczuk wrote:
> > > > Can you run ‘osv heap’ from gdb at this point?
> > > > 
> > > > On Sun, Mar 22, 2020 at 19:31 Rick Payne 
> > > > wrote:
> > > > > OK, so I applied the patch and the printf, and this is what I see:
> > > > > 
> > > > > page_range_allocator: no ranges found for size 39849984 and exact order: 14
> > > > > Waiter: 1_scheduler, bytes: 39849984
> > > > > Out of memory: could not reclaim any further. Current memory: 10933128 Kb, target: 0 Kb
> > > > > 
> > > > > Rick
> > > > > 
> > > > > On Sat, 2020-03-21 at 21:16 -0700, Waldek Kozaczuk wrote:
> > > > > > 
> > > > > > 
> > > > > > I think that OSv might be hitting some sort of fragmentation scenario
> > > > > > (another question is why we get to this point) which should be
> > > > > > handled but is not due to the bug this patch fixes:
> > > > > > 
> > > > > > diff --git a/core/mempool.cc b/core/mempool.cc
> > > > > > index d902eea8..11fd1456 100644
> > > > > > --- a/core/mempool.cc
> > > > > > +++ b/core/mempool.cc
> > > > > > @@ -702,10 +702,13 @@ page_range* page_range_allocator::alloc(size_t size)
> > > > > >          for (auto&& pr : _free[exact_order - 1]) {
> > > > > >              if (pr.size >= size) {
> > > > > >                  range = &pr;
> > > > > > +                remove_list(exact_order - 1, *range);
> > > > > >                  break;
> > > > > >              }
> > > > > >          }
> > > > > > -        return nullptr;
> > > > > > +        if (!range) {
> > > > > > +            return nullptr;
> > > > > > +        }
> > > > > >      } else if (order == max_order) {
> > > > > >          range = &*_free_huge.rbegin();
> > > > > >          if (range->size < size) {
> > > > > > 
> > > > > > Can you give it a try and see what happens with this patch?
> > > > > > 
> > > > > > Waldek
> > > > > > 
> > > > > > PS. You might also add this printout to verify if you are hitting the
> > > > > > hard fragmentation case:
> > > > > > 
> > > > > >      page_range* range = nullptr;
> > > > > >      if (!bitset) {
> > > > > >          if (!exact_order || _free[exact_order -
>

Re: [osv-dev] Re: OOM query

2020-03-22 Thread Rick Payne


Does that command work for you? For me I get:

(gdb) osv heap
Python Exception  %x format: an integer is required, not gdb.Value:
Error occurred in Python command: %x format: an integer is required, not gdb.Value

Rick

On Sun, 2020-03-22 at 19:37 -0400, Waldek Kozaczuk wrote:
> Can you run ‘osv heap’ from gdb at this point?
> 
> On Sun, Mar 22, 2020 at 19:31 Rick Payne 
> wrote:
> > OK, so I applied the patch and the printf, and this is what I see:
> > 
> > page_range_allocator: no ranges found for size 39849984 and exact order: 14
> > Waiter: 1_scheduler, bytes: 39849984
> > Out of memory: could not reclaim any further. Current memory: 10933128 Kb, target: 0 Kb
> > 
> > Rick
> > 
> > On Sat, 2020-03-21 at 21:16 -0700, Waldek Kozaczuk wrote:
> > > 
> > > 
> > > I think that OSv might be hitting some sort of fragmentation scenario
> > > (another question is why we get to this point) which should be
> > > handled but is not due to the bug this patch fixes:
> > > 
> > > diff --git a/core/mempool.cc b/core/mempool.cc
> > > index d902eea8..11fd1456 100644
> > > --- a/core/mempool.cc
> > > +++ b/core/mempool.cc
> > > @@ -702,10 +702,13 @@ page_range* page_range_allocator::alloc(size_t size)
> > >          for (auto&& pr : _free[exact_order - 1]) {
> > >              if (pr.size >= size) {
> > >                  range = &pr;
> > > +                remove_list(exact_order - 1, *range);
> > >                  break;
> > >              }
> > >          }
> > > -        return nullptr;
> > > +        if (!range) {
> > > +            return nullptr;
> > > +        }
> > >      } else if (order == max_order) {
> > >          range = &*_free_huge.rbegin();
> > >          if (range->size < size) {
> > > 
> > > Can you give it a try and see what happens with this patch?
> > > 
> > > Waldek
> > > 
> > > PS. You might also add this printout to verify if you are hitting the
> > > hard fragmentation case:
> > > 
> > >      page_range* range = nullptr;
> > >      if (!bitset) {
> > >          if (!exact_order || _free[exact_order - 1].empty()) {
> > > +            printf("page_range_allocator: no ranges found for size %ld and exact order: %d\n", size, exact_order);
> > >              return nullptr;
> > >          }
> > > 
> > > On Saturday, March 21, 2020 at 7:17:16 PM UTC-4, rickp wrote:
> > > > Ok, with the waiter_print, we see: 
> > > > 
> > > > Waiter: 2_scheduler, bytes: 4984832 
> > > > Waiter: 5_scheduler, bytes: 4984832 
> > > > Out of memory: could not reclaim any further. Current memory: 
> > > > 10871576Kb, target: 0 Kb 
> > > > 
> > > > 'osv mem' in gdb reports loads of free memory: 
> > > > 
> > > > (gdb) osv mem 
> > > > Total Memory: 12884372480 Bytes 
> > > > Mmap Memory:  1610067968 Bytes (12.50%) 
> > > > Free Memory:  11132493824 Bytes (86.40%) 
> > > > 
> > > > So why is it failing to allocate 5MB when it claims to have 11GB
> > > > free?
> > > > 
> > > > Any more debug I can provide? 
> > > > 
> > > > Rick 
> > > > 
> > > > On Sat, 2020-03-21 at 15:32 +1000, Rick Payne wrote: 
> > > > > And indeed, back on the 'release' image, and we hit an oom: 
> > > > > 
> > > > > (gdb) b memory::oom 
> > > > > Breakpoint 1 at 0x403ee300: file core/mempool.cc, line 505. 
> > > > > (gdb) c 
> > > > > Continuing. 
> > > > > 
> > > > > Thread 1 hit Breakpoint 1, memory::oom (target=target@entry=0) at
> > > > > core/mempool.cc:505
> > > > > 505 core/mempool.cc: No such file or directory. 
> > > > > (gdb) osv memory 
> > > > > Total Memory: 12884372480 Bytes 
> > > > > Mmap Memory:  1541332992 Bytes (11.96%) 
> > > > > Free Memory:  11490983936 Bytes (89.19%) 
> > > > > 
> > > > > I don't have the _waiters list print, unfortunately (as I hadn't
> > > > > included it in this image). It looks like this though:
> > > > > 
> > > > > (

Re: [osv-dev] Re: OOM query

2020-03-22 Thread Rick Payne


OK, so I applied the patch and the printf, and this is what I see:

page_range_allocator: no ranges found for size 39849984 and exact order: 14
Waiter: 1_scheduler, bytes: 39849984
Out of memory: could not reclaim any further. Current memory: 10933128 Kb, target: 0 Kb

Rick
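
A quick cross-check of the printout above, as a hedged sketch: assuming 4 KiB pages and that the exact order is ceil(log2(pages)) (an assumption, not confirmed in this thread), 39849984 bytes is exactly 9729 pages, and the smallest power of two not below 9729 is 2^14 = 16384, which agrees with "exact order: 14". The 4984832-byte waiters from the 2020-03-21 run come to 1217 pages, i.e. order 11. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 4 KiB pages.
constexpr std::size_t page_size = 4096;

// Whole pages needed to hold `bytes`, rounded up.
constexpr std::size_t pages_for(std::size_t bytes) {
    return (bytes + page_size - 1) / page_size;
}

// Smallest k with 2^k >= n, i.e. ceil(log2(n)) for n >= 1.
// Assumption: this is what "exact order" in the debug printf reports.
inline unsigned ceil_log2(std::size_t n) {
    assert(n >= 1);
    unsigned k = 0;
    while ((std::size_t{1} << k) < n) {
        ++k;
    }
    return k;
}
```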

On Sat, 2020-03-21 at 21:16 -0700, Waldek Kozaczuk wrote:
> 
> 
> I think that OSv might be hitting some sort of fragmentation scenario
> (another question is why we get to this point) which should be
> handled but is not due to the bug this patch fixes:
> 
> diff --git a/core/mempool.cc b/core/mempool.cc
> index d902eea8..11fd1456 100644
> --- a/core/mempool.cc
> +++ b/core/mempool.cc
> @@ -702,10 +702,13 @@ page_range* page_range_allocator::alloc(size_t size)
>  for (auto&& pr : _free[exact_order - 1]) {
>  if (pr.size >= size) {
>  range = &pr;
> +remove_list(exact_order - 1, *range);
>  break;
>  }
>  }
> -return nullptr;
> +if (!range) {
> +return nullptr;
> +}
>  } else if (order == max_order) {
>  range = &*_free_huge.rbegin();
>  if (range->size < size) {
> 
> Can you give it a try and see what happens with this patch?
> 
> Waldek
> 
> PS. You might also add this printout to verify if you are hitting the
> hard fragmentation case:
> 
>  page_range* range = nullptr;
>  if (!bitset) {
>  if (!exact_order || _free[exact_order - 1].empty()) {
> +printf("page_range_allocator: no ranges found for size %ld and exact order: %d\n", size, exact_order);
>  return nullptr;
>  }
> 
> On Saturday, March 21, 2020 at 7:17:16 PM UTC-4, rickp wrote:
> > Ok, with the waiter_print, we see: 
> > 
> > Waiter: 2_scheduler, bytes: 4984832 
> > Waiter: 5_scheduler, bytes: 4984832 
> > Out of memory: could not reclaim any further. Current memory: 10871576Kb, target: 0 Kb
> > 
> > 'osv mem' in gdb reports loads of free memory: 
> > 
> > (gdb) osv mem 
> > Total Memory: 12884372480 Bytes 
> > Mmap Memory:  1610067968 Bytes (12.50%) 
> > Free Memory:  11132493824 Bytes (86.40%) 
> > 
> > So why is it failing to allocate 5MB when it claims to have 11GB
> > free? 
> > 
> > Any more debug I can provide? 
> > 
> > Rick 
> > 
> > On Sat, 2020-03-21 at 15:32 +1000, Rick Payne wrote: 
> > > And indeed, back on the 'release' image, and we hit an oom: 
> > > 
> > > (gdb) b memory::oom 
> > > Breakpoint 1 at 0x403ee300: file core/mempool.cc, line 505. 
> > > (gdb) c 
> > > Continuing. 
> > > 
> > > Thread 1 hit Breakpoint 1, memory::oom (target=target@entry=0) at
> > > core/mempool.cc:505
> > > 505 core/mempool.cc: No such file or directory. 
> > > (gdb) osv memory 
> > > Total Memory: 12884372480 Bytes 
> > > Mmap Memory:  1541332992 Bytes (11.96%) 
> > > Free Memory:  11490983936 Bytes (89.19%) 
> > > 
> > > I don't have the _waiters list print unfortunately (as I hadn't 
> > > included it in this image). It looks like this though: 
> > > 
> > > (gdb) p _oom_blocked._waiters 
> > > $4 = 
> > >
> > { > cl 
> > > ai 
> > > mer_waiters::wait_node,
> > boost::intrusive::list_node_traits, 
> > > (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag,
> > 1>, 
> > > unsigned long, false, void>> = {static constant_time_size =
> > false, 
> > > static stateful_value_traits = , 
> > > static has_container_from_iterator = , static 
> > > safemode_or_autounlink = true, 
> > > data_ = 
> > >
> > { > > boost::intrusive::list_node_traits, 
> > > (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag,
> > 1>> = 
> > >
> > { > od 
> > > e, 
> > > boost::intrusive::list_node*, boost::intrusive::dft_tag,
> > 1>> = 
> > > {}, static link_mode =
> > boost::intrusive::safe_link}, 
> > >   root_plus_size_ = 
> > > { > > unsigned long, void>> = { 
> > >   static constant_time_size = }, m_header
> > = 
> > > {> = { 
> > > next_ = 0x200041305580, prev_ = 0x200041305580},  > data 
> > > fields>, } 
> > > 
> > > 'osv waiters' fails, annoyingly: 
> > > 
> > > (gdb) osv waiters 
>
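
In the patch quoted above, the original loop found a large-enough range but then fell through to an unconditional return nullptr, so the allocation failed even though a suitable range existed, and the range was never taken off its free list. A simplified sketch of the corrected first-fit step, assuming a hypothetical `range` struct and a std::list in place of OSv's page_range and intrusive free lists (not OSv's actual types):

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a free page range; only its size matters here.
struct range {
    std::size_t size;
};

// First-fit allocation from one free list, mirroring the patched loop:
// on success the chosen range is removed from the list and copied to `out`.
// Returns false only when no range in the list is large enough.
bool take_first_fit(std::list<range>& free_list, std::size_t size, range& out) {
    assert(size > 0);
    for (auto it = free_list.begin(); it != free_list.end(); ++it) {
        if (it->size >= size) {
            out = *it;
            free_list.erase(it);  // the step the patch adds via remove_list()
            return true;          // hand the range back to the caller
        }
    }
    return false;  // nothing fits: the only case that should yield nullptr
}
```

Returning false only after the whole list has been scanned corresponds to the patched `if (!range) { return nullptr; }`.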

Re: [osv-dev] Re: OOM query

2020-03-21 Thread Rick Payne


Ok, with the waiter_print, we see:

Waiter: 2_scheduler, bytes: 4984832
Waiter: 5_scheduler, bytes: 4984832
Out of memory: could not reclaim any further. Current memory: 10871576Kb, target: 0 Kb

'osv mem' in gdb reports loads of free memory:

(gdb) osv mem
Total Memory: 12884372480 Bytes
Mmap Memory:  1610067968 Bytes (12.50%)
Free Memory:  11132493824 Bytes (86.40%)

So why is it failing to allocate 5MB when it claims to have 11GB free?

Any more debug I can provide?

Rick

On Sat, 2020-03-21 at 15:32 +1000, Rick Payne wrote:
> And indeed, back on the 'release' image, and we hit an oom:
> 
> (gdb) b memory::oom
> Breakpoint 1 at 0x403ee300: file core/mempool.cc, line 505.
> (gdb) c
> Continuing.
> 
> Thread 1 hit Breakpoint 1, memory::oom (target=target@entry=0) at
> core/mempool.cc:505
> 505 core/mempool.cc: No such file or directory.
> (gdb) osv memory
> Total Memory: 12884372480 Bytes
> Mmap Memory:  1541332992 Bytes (11.96%)
> Free Memory:  11490983936 Bytes (89.19%)
> 
> I don't have the _waiters list print unfortunately (as I hadn't
> included it in this image). It looks like this though:
> 
> (gdb) p _oom_blocked._waiters
> $4 =
> { ai
> mer_waiters::wait_node, boost::intrusive::list_node_traits,
> (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1>,
> unsigned long, false, void>> = {static constant_time_size = false,
> static stateful_value_traits = , 
> static has_container_from_iterator = , static
> safemode_or_autounlink = true, 
> data_ =
> { boost::intrusive::list_node_traits,
> (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1>> =
> { e,
> boost::intrusive::list_node*, boost::intrusive::dft_tag, 1>> =
> {}, static link_mode = boost::intrusive::safe_link}, 
>   root_plus_size_ =
> { unsigned long, void>> = {
>   static constant_time_size = }, m_header =
> {> = {
> next_ = 0x200041305580, prev_ = 0x200041305580},  fields>, }
> 
> 'osv waiters' fails, annoyingly:
> 
> (gdb) osv waiters
> waiters:
> Python Exception  Cannot access memory at
> address 0x42c00020982ac010: 
> Error occurred in Python command: Cannot access memory at address
> 0x42c00020982ac010
> 
> Rick
> 
> On Sat, 2020-03-21 at 06:42 +1000, Rick Payne wrote:
> > So we've been trying with a debug image (with the tcp kassert
> > disabled). We do not get an OOM crash, but after some time (24hrs or
> > more) all cores just hit idle and never recover. It stops responding
> > to TCP connections, and when you attach gdb, you see this:
> > 
> > Thread 1 received signal SIGINT, Interrupt.
> > processor::sti_hlt () at arch/x64/processor.hh:252
> > 252 arch/x64/processor.hh: No such file or directory.
> > (gdb) info thread
> >   Id   Target Id Frame 
> > * 1Thread 1 (CPU#0 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> >   2Thread 2 (CPU#1 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> >   3Thread 3 (CPU#2 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> >   4Thread 4 (CPU#3 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> >   5Thread 5 (CPU#4 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> >   6Thread 6 (CPU#5 [halted ]) processor::sti_hlt () at
> > arch/x64/processor.hh:252
> > 
> > The threads themselves look like this:
> > 
> > (gdb) bt
> > #0  processor::sti_hlt () at arch/x64/processor.hh:252
> > #1  0x405e4016 in arch::wait_for_interrupt () at
> > arch/x64/arch.hh:43
> > #2  0x405d7f10 in sched::cpu::do_idle
> > (this=0x8001b040)
> > at core/sched.cc:404
> > #3  0x405d7fc1 in sched::cpu::idle
> > (this=0x8001b040)
> > at
> > core/sched.cc:423
> > #4  0x405d73ef in sched::cpuoperator()(void)
> > const (__closure=0x800100156070)
> > at core/sched.cc:165
> > #5  0x405e0b57 in std::_Function_handler > sched::cpu::init_idle_thread():: >::_M_invoke(const
> > std::_Any_data &) (__functor=...) at
> > /usr/include/c++/9/bits/std_function.h:300
> > #6  0x40496206 in std::function::operator()()
> > const
> > (this=0x800100156070)
> > at /usr/include/c++/9/bits/std_function.h:690
> > #7  0x405db386 in sched::thread::main
> > (this=0x800100156040)
> > at core/sched.cc:1210
> > #8  0x405d7173 in sched::thread_main_c
> > (t=0x800100156040)
> > at arch/x64/arch-s

Re: [osv-dev] Re: OOM query

2020-03-20 Thread Rick Payne


And indeed, back on the 'release' image, and we hit an oom:

(gdb) b memory::oom
Breakpoint 1 at 0x403ee300: file core/mempool.cc, line 505.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, memory::oom (target=target@entry=0) at
core/mempool.cc:505
505 core/mempool.cc: No such file or directory.
(gdb) osv memory
Total Memory: 12884372480 Bytes
Mmap Memory:  1541332992 Bytes (11.96%)
Free Memory:  11490983936 Bytes (89.19%)

I don't have the _waiters list print unfortunately (as I hadn't
included it in this image). It looks like this though:

(gdb) p _oom_blocked._waiters
$4 =
{,
(boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1>,
unsigned long, false, void>> = {static constant_time_size = false,
static stateful_value_traits = , 
static has_container_from_iterator = , static
safemode_or_autounlink = true, 
data_ =
{,
(boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 1>> =
{*, boost::intrusive::dft_tag, 1>> =
{}, static link_mode = boost::intrusive::safe_link}, 
  root_plus_size_ = {> = {
  static constant_time_size = }, m_header =
{> = {
next_ = 0x200041305580, prev_ = 0x200041305580}, , }

'osv waiters' fails, annoyingly:

(gdb) osv waiters
waiters:
Python Exception  Cannot access memory at
address 0x42c00020982ac010: 
Error occurred in Python command: Cannot access memory at address
0x42c00020982ac010

Rick

On Sat, 2020-03-21 at 06:42 +1000, Rick Payne wrote:
> So we've been trying with a debug image (with the tcp kassert
> disabled). We do not get an OOM crash, but after some time (24hrs or
> more) all cores just hit idle and never recover. It stops responding
> to TCP connections, and when you attach gdb, you see this:
> 
> Thread 1 received signal SIGINT, Interrupt.
> processor::sti_hlt () at arch/x64/processor.hh:252
> 252 arch/x64/processor.hh: No such file or directory.
> (gdb) info thread
>   Id   Target Id Frame 
> * 1Thread 1 (CPU#0 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
>   2Thread 2 (CPU#1 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
>   3Thread 3 (CPU#2 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
>   4Thread 4 (CPU#3 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
>   5Thread 5 (CPU#4 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
>   6Thread 6 (CPU#5 [halted ]) processor::sti_hlt () at
> arch/x64/processor.hh:252
> 
> The threads themselves look like this:
> 
> (gdb) bt
> #0  processor::sti_hlt () at arch/x64/processor.hh:252
> #1  0x405e4016 in arch::wait_for_interrupt () at
> arch/x64/arch.hh:43
> #2  0x405d7f10 in sched::cpu::do_idle
> (this=0x8001b040)
> at core/sched.cc:404
> #3  0x405d7fc1 in sched::cpu::idle (this=0x8001b040)
> at
> core/sched.cc:423
> #4  0x405d73ef in sched::cpuoperator()(void)
> const (__closure=0x800100156070)
> at core/sched.cc:165
> #5  0x405e0b57 in std::_Function_handler sched::cpu::init_idle_thread():: >::_M_invoke(const
> std::_Any_data &) (__functor=...) at
> /usr/include/c++/9/bits/std_function.h:300
> #6  0x40496206 in std::function::operator()() const
> (this=0x800100156070)
> at /usr/include/c++/9/bits/std_function.h:690
> #7  0x405db386 in sched::thread::main
> (this=0x800100156040)
> at core/sched.cc:1210
> #8  0x405d7173 in sched::thread_main_c (t=0x800100156040)
> at arch/x64/arch-switch.hh:321
> #9  0x404911b3 in thread_main () at arch/x64/entry.S:113
> 
> I tried the 'osv waiters' command but just get:
> 
> (gdb) osv waiters
> waiters:
> Python Exception  Cannot access memory at
> address 0x42c00020880a9010: 
> Error occurred in Python command: Cannot access memory at address
> 0x42c00020880a9010
> 
> I think we'll go back to a 'release' image and see if we get the oom
> with a few more clues...
> 
> Rick
> 
> 
> On Wed, 2020-03-11 at 08:07 -0700, Waldek Kozaczuk wrote:
> > 
> > On Tuesday, March 10, 2020 at 10:53:17 PM UTC-4, rickp wrote:
> > > I've not found a way to reproduce this other than in production
> > > yet, which is annoying. I've built an image with this patch, and will
> > > see if we can run it with gdb too.
> > > 
> > > What should I be looking for if it hits? 
> > 
> > In case of oom() this patch should show a list of threads (waiters)
> > along with the amount of memory requested. I am hoping it will give
> > better clues as to where the problem is.
> > > Note, the 'osv pagetable walk' co

Re: [osv-dev] Re: OOM query

2020-03-20 Thread Rick Payne
> > > > > 1029 // Wake up all waiters that are waiting and now have a chance to succeed.
> > > > > 1030 // If we could not wake any, there is nothing really we can do.
> > > > > 1031 if (!_oom_blocked.wake_waiters()) { 
> > > > > 1032 oom(); 
> > > > > 1033 } 
> > > > > 1034 } 
> > > > > 1035 
> > > > > 1036 if (balloon_api) { 
> > > > > 1037 balloon_api->voluntary_return(); 
> > > > > 1038 } 
> > > > > 1039 } 
> > > > > 1040 } 
> > > > > 1041 } 
> > > > > 
> > > > > We got oom() because target was '>= 0'. Now the target is 
> > > > > calculated as the result of  bytes_until_normal(). 
> > > > > 
> > > > >  495 ssize_t reclaimer::bytes_until_normal(pressure curr) 
> > > > >  496 { 
> > > > >  497 assert(mutex_owned(&free_page_ranges_lock)); 
> > > > >  498 if (curr == pressure::PRESSURE) { 
> > > > >  499 return watermark_lo - stats::free(); 
> > > > >  500 } else { 
> > > > >  501 return 0; 
> > > > >  502 } 
> > > > >  503 } 
> > > > > 
> > > > > which seems to indicate that when 0 is returned there is no need to
> > > > > reclaim any memory.
> > > > > 
> > > > > So here are two things that might be wrong: 
> > > > > 
> > > > > 1. Shouldn't if (target >= 0) be changed to if (target > 0) {?
> > > > > 
> > > > > 2. Shouldn't we re-read the target in the second WITH_LOCK instead of
> > > > > comparing the original value in the beginning of the body of the
> > > > > loop? The line before - _shrinker_loop(target, [this] { return
> > > > > _oom_blocked.has_waiters(); }); - might have just released enough
> > > > > memory to bring target below 0, right?
> > > > > 
> > > > > In any case it would be useful to print the value of the target
> > > > > before oom():
> > > > > 
> > > > >  if (!_oom_blocked.wake_waiters()) { 
> > > > > 
> > > > >  printf("--> Target: %ld\n", target); 
> > > > > 
> > > > >  oom(); 
> > > > > 
> > > > >  } 
> > > > > 
> > > > > 
> > > > > On Monday, March 9, 2020 at 10:51:02 PM UTC-4, rickp wrote: 
> > > > > > We're pretty close to current on OSv, but it also happens on an
> > > > > > older image. We have changed some stuff in our app, but I think
> > > > > > that may just be provoking the bug. Certainly from gdb, I can see
> > > > > > that both mmapped and normal memory fluctuate up and down but
> > > > > > everything looks sane.
> > > > > > 
> > > > > > More debug in wake_waiters would be useful, but I'm losing the
> > > > > > argument to continue with OSv at the moment, which makes testing
> > > > > > this a bit 'political'.
> > > > > > 
> > > > > > btw - when we do run the system out of memory, it seems to hang
> > > > > > rather than generate an oom. Have you tried it?
> > > > > > 
> > > > > > The tcp_do_segment one has been mentioned before (by someone
> > > > > > else). The issue is that the kassert only has effect in the debug
> > > > > > build. I'd guess that the socket is being closed, but still has
> > > > > > segments that have not been processed, or something like that.
> > > > > > I'll try and narrow it down a bit if

Re: [osv-dev] Re: OOM query

2020-03-10 Thread Rick Payne
) {
> > > 1019 if (_oom_blocked.wake_waiters()) {
> > > 1020 continue;
> > > 1021 }
> > > 1022 }
> > > 1023 }
> > > 1024 
> > > 1025 _shrinker_loop(target, [this] { return
> > > _oom_blocked.has_waiters(); });
> > > 1026 
> > > 1027 WITH_LOCK(free_page_ranges_lock) {
> > > 1028 if (target >= 0) {
> > > 1029 // Wake up all waiters that are waiting and now have a chance to succeed.
> > > 1030 // If we could not wake any, there is nothing really we can do.
> > > 1031 if (!_oom_blocked.wake_waiters()) {
> > > 1032 oom();
> > > 1033 }
> > > 1034 }
> > > 1035 
> > > 1036 if (balloon_api) {
> > > 1037 balloon_api->voluntary_return();
> > > 1038 }
> > > 1039 }
> > > 1040 }
> > > 1041 }
> > > 
> > > We got oom() because target was '>= 0'. Now the target is
> > > calculated as the result of  bytes_until_normal(). 
> > > 
> > >  495 ssize_t reclaimer::bytes_until_normal(pressure curr)
> > >  496 {
> > >  497 assert(mutex_owned(&free_page_ranges_lock));
> > >  498 if (curr == pressure::PRESSURE) {
> > >  499 return watermark_lo - stats::free();
> > >  500 } else {
> > >  501 return 0;
> > >  502 }
> > >  503 }
> > > 
> > > which seems to indicate that when 0 is returned there is no need to
> > > reclaim any memory.
> > > 
> > > So here are two things that might be wrong:
> > > 
> > > 1. Shouldn't if (target >= 0) be changed to if (target > 0) {? 
> > > 
> > > 2. Shouldn't we re-read the target in the second WITH_LOCK instead of
> > > comparing the original value in the beginning of the body of the
> > > loop? The line before - _shrinker_loop(target, [this] { return
> > > _oom_blocked.has_waiters(); }); - might have just released enough
> > > memory to bring target below 0, right?
> > > 
> > > In any case it would be useful to print the value of the target
> > > before oom():
> > > 
> > >  if (!_oom_blocked.wake_waiters()) {
> > > 
> > >  printf("--> Target: %ld\n", target);
> > > 
> > >  oom();
> > > 
> > >  }
> > > 
> > > 
> > > On Monday, March 9, 2020 at 10:51:02 PM UTC-4, rickp wrote:
> > > > We're pretty close to current on OSv, but it also happens on an
> > > > older image. We have changed some stuff in our app, but I think that
> > > > may just be provoking the bug. Certainly from gdb, I can see that both
> > > > mmapped and normal memory fluctuate up and down but everything looks
> > > > sane.
> > > > 
> > > > More debug in wake_waiters would be useful, but I'm losing the
> > > > argument to continue with OSv at the moment, which makes testing this
> > > > a bit 'political'.
> > > > 
> > > > btw - when we do run the system out of memory, it seems to hang
> > > > rather than generate an oom. Have you tried it?
> > > > 
> > > > The tcp_do_segment one has been mentioned before (by someone
> > > > else). The issue is that the kassert only has effect in the debug
> > > > build. I'd guess that the socket is being closed, but still has
> > > > segments that have not been processed, or something like that. I'll
> > > > try and narrow it down a bit if I get time.
> > > > 
> > > > Rick 
> > > > 
> > > > On Mon, 2020-03-09 at 22:32 -0400, Waldek Kozaczuk wrote: 
> > > > > Does it happen with the very latest OSv code? Did it start happening
> > > > > at some point more often?
> > > > > 
> > > > > I wonder if we could add some helpful printouts in wake_waiters().
> > > > > 
> > > > > Btw that assert() failure in tcp_do_segment() rings a 
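
Both questions about _do_reclaim() can be made concrete with a small model. This is a hedged sketch using free-standing functions and 64-bit byte counts (assumptions; OSv's reclaimer is a class working under free_page_ranges_lock, and every function name here other than bytes_until_normal is hypothetical). It reproduces the quoted bytes_until_normal() logic, contrasts `target >= 0` with `target > 0`, and recomputes the target after the shrinkers have run:

```cpp
#include <cassert>
#include <cstdint>

enum class pressure { NORMAL, PRESSURE };

// Modeled on the quoted reclaimer::bytes_until_normal(): under pressure,
// the number of bytes needed to climb back to the low watermark; otherwise
// zero, meaning there is nothing to reclaim.
std::int64_t bytes_until_normal(pressure curr, std::int64_t watermark_lo,
                                std::int64_t free_bytes) {
    assert(watermark_lo >= 0 && free_bytes >= 0);
    if (curr == pressure::PRESSURE) {
        return watermark_lo - free_bytes;
    }
    return 0;
}

// Question 1: the quoted code tests `target >= 0`, so a target of exactly
// zero ("nothing to reclaim") still reaches the wake_waiters()/oom() branch.
bool oom_branch_as_quoted(std::int64_t target) { return target >= 0; }
bool oom_branch_proposed(std::int64_t target) { return target > 0; }

// Question 2: recompute the target after the shrinkers have run, instead of
// reusing the value computed at the top of the loop.
bool oom_branch_after_shrink(pressure now, std::int64_t watermark_lo,
                             std::int64_t free_now) {
    return oom_branch_proposed(bytes_until_normal(now, watermark_lo, free_now));
}
```

For example, with no pressure the target is 0, which passes the quoted `target >= 0` check and so can end in oom() if wake_waiters() returns false.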

Re: [osv-dev] Re: OOM query

2020-03-09 Thread Rick Payne


We're pretty close to current on OSv, but it also happens on an older
image. We have changed some stuff in our app, but I think that may just
be provoking the bug. Certainly from gdb, I can see that both mmapped and
normal memory fluctuate up and down but everything looks sane.

More debug in wake_waiters would be useful, but I'm losing the argument
to continue with OSv at the moment, which makes testing this a bit
'political'.

btw - when we do run the system out of memory, it seems to hang rather
than generate an oom. Have you tried it?

The tcp_do_segment one has been mentioned before (by someone else). The
issue is that the kassert only has effect in the debug build. I'd guess
that the socket is being closed, but still has segments that have not
been processed, or something like that. I'll try and narrow it down a
bit if I get time.
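
The point that the kassert only bites in the debug build is the usual pattern of a check that is compiled out of release images. A minimal sketch, assuming the standard NDEBUG switch as the debug/release toggle (OSv's BSD-derived kassert is controlled by its own build configuration; the macro and function names are hypothetical), with BSD's state numbering, where TCPS_LISTEN is 1, so a failed tp->get_state() > 1 means the socket was still in LISTEN:

```cpp
#include <cassert>

// Assumption: debug vs. release is modeled with the standard NDEBUG switch.
// Built without NDEBUG, a failed check aborts; built with -DNDEBUG it is
// compiled out and execution silently continues.
#define SKETCH_KASSERT(cond, msg) assert((cond) && (msg))

// BSD TCP state numbering: TCPS_CLOSED = 0, TCPS_LISTEN = 1, ...
enum tcp_state { TCPS_CLOSED = 0, TCPS_LISTEN = 1, TCPS_SYN_SENT = 2 };

// The condition from the reported failure: tp->get_state() > 1.
inline bool state_past_listen(tcp_state s) { return s > TCPS_LISTEN; }

// Hypothetical segment handler guarded the same way as tcp_do_segment().
inline bool handle_segment(tcp_state s) {
    SKETCH_KASSERT(state_past_listen(s), "tcp_do_segment: TCPS_LISTEN");
    return true;  // in a release build a LISTEN socket would get here too
}
```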

Rick

On Mon, 2020-03-09 at 22:32 -0400, Waldek Kozaczuk wrote:
> Does it happen with the very latest OSv code? Did it start happening
> at some point more often? 
> 
> I wonder if we could add some helpful printouts in wake_waiters(). 
> 
> Btw that assert() failure in tcp_do_segment() rings a bell. 
> 
> On Mon, Mar 9, 2020 at 22:25 Rick Payne  wrote:
> > I can't add much other than I doubt it's fragmentation. Sometimes this
> > happens within a few minutes of the system starting. At no point do I
> > think we're using more than 2GB of RAM (of the 12GB) either.
> > 
> > I did compile up a debug version of OSv and built the system with that,
> > but I've been unable to trigger the oom(). Worse, I hit a kassert in
> > the netchannel code that seems to be ignored in the 'release' build,
> > but panics in the debug build:
> > 
> > [E/384 bsd-kassert]: tcp_do_segment: TCPS_LISTEN
> > Assertion failed: tp->get_state() > 1 (bsd/sys/netinet/tcp_input.cc:
> > tcp_do_segment: 1076)
> > 
> > [backtrace]
> > 0x40221330 
> > 0x40221399 <__assert_fail+64>
> > 0x402a4798 
> > 0x402a97c2 
> > 0x402a98a1 
> > 0x402aa448 
> > 0x40656a9a ::operator()(mbuf*)
> > const+76>
> > 0x40655855 
> > 0x4023b165 
> > 0x4023b4d7 
> > 0x4024cd21 
> > 0x406a6a10 
> > 0x406a64f7 
> > 0x4067cd42 
> > 
> > So at the moment, I'm a bit stuck with getting any more info...
> > 
> > Rick
> > 
> > On Mon, 2020-03-09 at 08:52 -0700, Waldek Kozaczuk wrote:
> > > As I understand this stack trace the oom() was called here as part of
> > > _do_reclaim():
> > > 
> > > 1025 WITH_LOCK(free_page_ranges_lock) {
> > > 1026 if (target >= 0) {
> > > 1027 // Wake up all waiters that are waiting and now have a chance to succeed.
> > > 1028 // If we could not wake any, there is nothing really we can do.
> > > 1029 if (!_oom_blocked.wake_waiters()) {
> > > 1030 oom();
> > > 1031 }
> > > 1032 }
> > > 1033 
> > > 1034 if (balloon_api) {
> > > 1035 balloon_api->voluntary_return();
> > > 1036 }
> > > 1037 }
> > > 
> > > so it seems wake_waiters() returned false. I wonder if the memory was
> > > heavily fragmented or there is some logical bug in there. This method
> > > is called from two places and I wonder if this part of wake_waiters()
> > > is correct:
> > > 
> > >  921 if (!_waiters.empty()) {
> > >  922 reclaimer_thread.wake();
> > >  923 }
> > >  924 return woken;
> > > 
> > > 
> > > should this if also set woken to true?
> > > 
> > > Also, could we enhance the oom() logic to print out more useful
> > > information if this happens once again?
> > > 
> > > On Tuesday, March 3, 2020 at 2:21:40 AM UTC-5, rickp wrote:
> > > > Had a crash on a system that I don't understand. It's a VM with
> > > > 12GB allocated; we were running with about 10.5GB free according
> > > > to the API.
> > > > 
> > > > Out of the blue, we had a panic: 
> > > > 
> > > > Out of memory: could not reclaim any further. Current memory:
> > > > 109

Re: [osv-dev] Re: OOM query

2020-03-09 Thread Rick Payne


I can't add much other than I doubt it's fragmentation. Sometimes this
happens within a few minutes of the system starting. At no point do I
think we're using more than 2GB of RAM (of the 12GB) either.

I did compile up a debug version of OSv and built the system with that,
but I've been unable to trigger the oom(). Worse, I hit a kassert in
the netchannel code that seems to be ignored in the 'release' build,
but panics in the debug build:

[E/384 bsd-kassert]: tcp_do_segment: TCPS_LISTEN
Assertion failed: tp->get_state() > 1 (bsd/sys/netinet/tcp_input.cc:
tcp_do_segment: 1076)

[backtrace]
0x40221330 
0x40221399 <__assert_fail+64>
0x402a4798 
0x402a97c2 
0x402a98a1 
0x402aa448 
0x40656a9a ::operator()(mbuf*)
const+76>
0x40655855 
0x4023b165 
0x4023b4d7 
0x4024cd21 
0x406a6a10 
0x406a64f7 
0x4067cd42 

So at the moment, I'm a bit stuck with getting any more info...

Rick

On Mon, 2020-03-09 at 08:52 -0700, Waldek Kozaczuk wrote:
> As I understand this stack trace the oom() was called here as part of
> _do_reclaim():
> 
> 1025 WITH_LOCK(free_page_ranges_lock) {
> 1026 if (target >= 0) {
> 1027 // Wake up all waiters that are waiting and now have a chance to succeed.
> 1028 // If we could not wake any, there is nothing really we can do.
> 1029 if (!_oom_blocked.wake_waiters()) {
> 1030 oom();
> 1031 }
> 1032 }
> 1033 
> 1034 if (balloon_api) {
> 1035 balloon_api->voluntary_return();
> 1036 }
> 1037 }
> 
> so it seems wake_waiters() returned false. I wonder if the memory was
> heavily fragmented or there is some logical bug in there. This method
> is called from two places and I wonder if this part of wake_waiters()
> is correct:
> 
>  921 if (!_waiters.empty()) {
>  922 reclaimer_thread.wake();
>  923 }
>  924 return woken;
> 
> 
> should this if also set woken to true?
> 
> Also, could we enhance the oom() logic to print out more useful
> information if this happens once again?
> 
> On Tuesday, March 3, 2020 at 2:21:40 AM UTC-5, rickp wrote:
> > Had a crash on a system that I don't understand. It's a VM with
> > 12GB allocated; we were running with about 10.5GB free according to
> > the API.
> > 
> > Out of the blue, we had a panic: 
> > 
> > Out of memory: could not reclaim any further. Current memory: 10954988 Kb
> > [backtrace] 
> > 0x403f6320  
> > 0x403f71cc  
> > 0x403f722f  
> > 0x4040f29b  
> > 0x403ae412  
> > 
> > The 'Out of memory' message seems to print stats::free() and that 
> > number suggests we have plenty of free ram. 
> > 
> > Have I misunderstood, or is there something I need to be looking
> > at? 
> > 
> > Cheers, 
> > Rick 
> > 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to osv-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/osv-dev/8f7e00a5-edfe-4487-aa5a-5072a560c6e3%40googlegroups.com
> .
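
The question about the end of wake_waiters() can be explored with a toy model. In this hedged sketch (hypothetical types, not OSv's), each waiter needs one contiguous free range, which is how a fragmented heap can refuse a request while the total free count is in gigabytes. The function reports woken only when some waiter was actually satisfied; leftover waiters merely re-wake the reclaimer, as in the quoted lines 921-924, and in _do_reclaim() a false result leads to oom().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical model: each blocked waiter needs one contiguous run of
// `bytes`, and free memory is a set of contiguous free ranges (sizes only).
struct waiter {
    std::size_t bytes;
};

struct wake_result {
    bool woken = false;             // at least one waiter was satisfied
    bool reclaimer_kicked = false;  // waiters remain blocked
};

wake_result wake_waiters(std::list<waiter>& waiters,
                         std::vector<std::size_t>& free_ranges) {
    wake_result r;
    for (auto it = waiters.begin(); it != waiters.end();) {
        const std::size_t need = it->bytes;
        assert(need > 0);
        // First free range large enough for this waiter, if any.
        auto fit = std::find_if(free_ranges.begin(), free_ranges.end(),
                                [need](std::size_t sz) { return sz >= need; });
        if (fit != free_ranges.end()) {
            *fit -= need;            // carve the request out of that range
            it = waiters.erase(it);  // waiter satisfied and dequeued
            r.woken = true;
        } else {
            ++it;                    // still blocked
        }
    }
    // Mirrors the quoted tail of wake_waiters(): leftover waiters re-wake the
    // reclaimer, but that alone does not make the function report success.
    r.reclaimer_kicked = !waiters.empty();
    return r;
}
```

For instance, 100 MiB of free memory split into 1 MiB ranges cannot satisfy a 4984832-byte waiter (the size reported in the 2020-03-21 printout), so the call reports "not woken" while still kicking the reclaimer.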



[osv-dev] OOM query

2020-03-02 Thread Rick Payne


Had a crash on a system that I don't understand. It's a VM with 12GB
allocated; we were running with about 10.5GB free according to the
API.

Out of the blue, we had a panic:

Out of memory: could not reclaim any further. Current memory: 10954988 Kb
[backtrace]
0x403f6320 
0x403f71cc 
0x403f722f 
0x4040f29b 
0x403ae412 

The 'Out of memory' message seems to print stats::free() and that
number suggests we have plenty of free ram.

Have I misunderstood, or is there something I need to be looking at?

Cheers,
Rick



Re: [osv-dev] aarch64 resurrected

2020-01-01 Thread Rick Payne
On Wed, 2020-01-01 at 21:17 -0800, Waldek Kozaczuk wrote:
> I had to manually add uush.so to usr_ramfs.manifest. For whatever
> reason, the image built on Ubuntu would never boot at all. I wonder
> if that has to do with some mixed up gcc path on Ubuntu and maybe we
> end up compiling and/or linking against some wrong artifacts
> (external/host maybe related).  

When I last looked at this, it was something to do with the
instantiation of the early console. There's a thread where Nadav and I
discussed it. His build hit the dtb issue you have now fixed, whereas I
got what you see.

I have still never got that part working (though I've not put much
effort in, to be fair). I assume it's some compiler difference. For any
aarch64 stuff I've been doing, I've resorted to a Fedora 29 VM :(

Rick



Re: [osv-dev] Re: [PATCH 14/16] cloud-init: Added support for Network v1 and ConfigDrive data source

2019-12-10 Thread Rick Payne
Hi,

> If you are interested and have time to help me, I am willing to
> create ipv6 branch and apply the remaining patches in this series to
> it. That way we can independently test it before we apply them to the
> master branch. But I would need some help. I do not have expertise in
> networking code that you seem to have.

I'm happy to test. When I asked about v6 support before, I did not have
the time to contribute the huge amount of code needed. A side effect of
adopting the netchannels design is that OSv diverges from the FreeBSD
code, which increases the porting effort.

> I do not think we are NOT interested. It is just we have a very
> limited number of volunteers contributing to the project. I myself
> (who contributes solely on its own personal time) have only so much
> capacity, time and more importantly knowledge in relevant areas of
> the code to review or contribute. Having said that it is my
> impression that in the last couple of months we started receiving
> contributions from more people and organizations. Which seems to be
> an encouraging sign.

Understood - and I too am working on OSv on my own dime, so it's
disappointing when I (or others) put in efforts to solve issues that
end up going nowhere. I'll point out that I'm not a C++ programmer,
which makes it harder for me to ensure I'm doing the right thing.

> Now regarding the clock issue, if I remember correctly you sent a
> patch that Nadav had some reservations related to potential
> performance degradation. Have you had a chance to somehow measure the
> performance impact of your code changes? 

I saw no impact on the performance of the code at the time, or I
wouldn't have suggested the patch. We have this running in production
on customer sites.

I dug out bpftrace and measured an increase from 200 MSR_WRITE calls on
an unaltered OSv instance to about 1500 when we're stressing our
setup. It's in the noise of the CPU usage on our setup. Anyway, to us
the correct time is more important, and we cannot run NTP of course (as
we would on a regular VM).

And for us - that's what this comes down to. We could give up on OSv and
just go back to a regular VM and it would probably be easier (we've had
some pushback already).

> Could you please elaborate on the "multicast mac addresses on the
> interfaces" issue?

If you want to do multicast protocols, you need to add the computed MAC
address to the tables in the kernel (i.e. the SIOCADDMULTI ioctl). The
ioctl should map into VIRTIO_NET_CTRL_MAC_TABLE_SET to update the
filtering in the host. I implemented this, but it's more C than C++.

Admittedly, it's probably not many people that want that - but it's an
area where we are not compatible with Linux.
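
The "computed" MAC address here is the standard mapping from a multicast group to an Ethernet group address: 01:00:5e plus the low 23 bits of an IPv4 group (RFC 1112), or 33:33 plus the last 32 bits of an IPv6 group (RFC 2464). A minimal C++ sketch of that computation, assuming host-byte-order IPv4 input; the function names are hypothetical, and the resulting address would still have to be handed to SIOCADDMULTI (not shown).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using mac_addr = std::array<std::uint8_t, 6>;

// IPv4 group -> Ethernet MAC (RFC 1112): 01:00:5e followed by the low 23
// bits of the group address. `group` is in host byte order.
mac_addr ipv4_multicast_mac(std::uint32_t group) {
    assert((group >> 28) == 0xE && "not a 224.0.0.0/4 address");
    return {0x01, 0x00, 0x5e,
            static_cast<std::uint8_t>((group >> 16) & 0x7f),
            static_cast<std::uint8_t>((group >> 8) & 0xff),
            static_cast<std::uint8_t>(group & 0xff)};
}

// IPv6 group -> Ethernet MAC (RFC 2464): 33:33 followed by the last four
// bytes of the 16-byte group address.
mac_addr ipv6_multicast_mac(const std::array<std::uint8_t, 16>& group) {
    return {0x33, 0x33, group[12], group[13], group[14], group[15]};
}
```

For example, 224.0.0.251 maps to 01:00:5e:00:00:fb, and ff02::1 maps to 33:33:00:00:00:01.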

> Let me try to apply this patch as-is. Regarding the "program options"
> part could you please send me your version of the patch so that I can
> extract the relevant parts and apply it.

I'm trying to track down a bug that I'm seeing with Charles' network-module
where we need an inbound packet before packets flow outbound.
Once that's done I can let you have the patch.

Rick


-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/90823163932b68524baccb75ae5cadea027c393f.camel%40rossfell.co.uk.


Re: [osv-dev] Re: [PATCH 14/16] cloud-init: Added support for Network v1 and ConfigDrive data source

2019-12-10 Thread Rick Payne
Hi,

On Tue, 2019-12-10 at 05:43 -0800, Waldek Kozaczuk wrote:
> Hi,
> 
> I think that Nadav had some code review comments he hoped to get
> resolved before he could apply the entire series. I think he did
> apply a couple of the very first and trivial ones though. I wonder if
> some of the individual patches like this one can be applied piece by
> piece. All in all, I would love to finally merge this code. I think
> that Nadav mentioned some of the bug fixes that are part of it as
> well that would be nice to merge.

Well, IPv6 support is required to be taken seriously, I guess ;) At
the moment we have to accept v6 connections on the host and bounce them
to a v4 port on OSv. Horrible.

Certainly some parts of Charles' patch set look like they could be
applied independently. Nadav didn't have any objections to this one,
but then it just languished.

> As I understand it, Spirent (so you and Charles, right?) maintains a 

I have nothing to do with Spirent - not sure why you have that impression.

> separate fork of OSv - https://github.com/SpirentOrion/osv - which
> has many fixes in the networking stack beyond the IPv6 patches that
> Charles sent, right? I would love to bring them on to the original
> repo. Now we have some sort of an automated test framework that could
> let us test the networking stack against 20 or more workloads - 
> https://github.com/cloudius-systems/osv/wiki/Automated-Testing-Framework

I do maintain my own separate tree because we have a slew of things that
we need fixed (multicast MAC addresses on the interfaces, the
long-running clock problem). After the clock issue, I just wanted to
press on with our stuff, and if you're not interested in fixing things,
so be it.

Certainly I can see why Charles didn't come back.

> Lastly, as far as the cloud-init and program options fallout goes: indeed,
> cloud-init was the only app from which I did not remove program_options in
> favor of our custom, lighter core/options.cc (boost and this specific
> library were giving us a lot of grief). If you have a patch handy
> with those fixes to cloud-init, please send it to me and I will happily
> apply it.

Understood. I'd missed the implications when you did your initial post
on the subject.

My patch is intertwined with the network-module stuff, which is why I
was asking the question. It's easier to submit one patch than to unroll
them.

Rick




[osv-dev] Re: [PATCH 14/16] cloud-init: Added support for Network v1 and ConfigDrive data source

2019-12-09 Thread Rick Payne


It doesn't look like this was applied? The reason I ask is that we're using
cloud-init to set the IP addresses on multiple interfaces inside OSv
(to separate database traffic from the protocols). I also have a secondary
patch, which we use, to turn off DHCP.

I have my own network-module.cc, as I missed this one on the list. I've
just fixed up the cloud-init code in my tree to deal with the
libboost_program_options fallout and *then* I discover this code.

So how to proceed - guidance please? Charles' code is more complete
than mine, so I could try to merge that and get it working on the
current codebase - would that be acceptable?

Rick

On Mon, 2018-08-13 at 12:32 +0300, Nadav Har'El wrote:
> All looks good. Thanks.
> 
> 
> --
> Nadav Har'El
> n...@scylladb.com
> 
> On Tue, Aug 7, 2018 at 5:49 AM, Charles Myers <
> charles.my...@spirent.com> wrote:
> > https://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v1.html
> > 
> > Currently only interface naming, static IP, routes and DNS are
> > supported.
> > 
> > Signed-off-by: Charles Myers 
> > ---
> >  modules/cloud-init/Makefile  |   2 +-
> >  modules/cloud-init/main.cc   |  77 +---
> >  modules/cloud-init/network-module.cc | 369 +++
> >  modules/cloud-init/network-module.hh |  43 
> >  4 files changed, 465 insertions(+), 26 deletions(-)
> >  create mode 100644 modules/cloud-init/network-module.cc
> >  create mode 100644 modules/cloud-init/network-module.hh
> > 
> > diff --git a/modules/cloud-init/Makefile b/modules/cloud-init/Makefile
> > index daee8c3..ae45b24 100644
> > --- a/modules/cloud-init/Makefile
> > +++ b/modules/cloud-init/Makefile
> > @@ -13,7 +13,7 @@ INCLUDES += -I$(HTTPSERVER_API_DIR)
> > 
> >  # the build target executable:
> >  TARGET = cloud-init
> > -CPP_FILES := client.cc cloud-init.cc data-source.cc main.cc template.cc cassandra-module.cc json.cc
> > +CPP_FILES := client.cc cloud-init.cc data-source.cc main.cc template.cc network-module.cc cassandra-module.cc json.cc
> >  OBJ_FILES := $(addprefix obj/,$(CPP_FILES:.cc=.o))
> >  DEPS := $(OBJ_FILES:.o=.d)
> > 
> > diff --git a/modules/cloud-init/main.cc b/modules/cloud-init/main.cc
> > index 5af23f0..a6674ef 100644
> > --- a/modules/cloud-init/main.cc
> > +++ b/modules/cloud-init/main.cc
> > @@ -9,6 +9,7 @@
> >  #include 
> >  #include 
> >  #include "cloud-init.hh"
> > +#include "network-module.hh"
> >  #include "files-module.hh"
> >  #include "server-module.hh"
> >  #include "cassandra-module.hh"
> > @@ -24,59 +25,84 @@ using namespace std;
> >  using namespace init;
> >  namespace po = boost::program_options;
> > 
> > -// config_disk() allows to use NoCloud VM configuration method - see
> > +// config_disk() allows to use NoCloud and ConfigDrive VM configuration method - see
> >  // http://cloudinit.readthedocs.io/en/0.7.9/topics/datasources/nocloud.html.
> > +// http://cloudinit.readthedocs.io/en/0.7.9/topics/datasources/configdrive.html
> > +//
> >  // NoCloud method provides two files with cnfiguration data (/user-data and
> >  // /meta-data) on a disk. The disk is required to have label "cidata".
> >  // It can contain ISO9660 or FAT filesystem.
> >  //
> > +// ConfigDrive (version 2) method uses an unpartitioned VFAT or ISO9660 disk 
> > +// with files.
> > +// openstack/
> > +//  - 2012-08-10/ or latest/
> > +//    - meta_data.json
> > +//    - user_data (not mandatory)
> > +//  - content/
> > +//    -  (referenced content files)
> > +//    - 0001
> > +//    - 
> > +// ec2
> > +//  - latest/
> > +//    - meta-data.json (not mandatory)
> > +//    - user-data
> > +//
> >  // config_disk() checks whether we have a second disk (/dev/vblkX) with
> >  // ISO image, and if there is, it copies the configuration file from
> > -// /user-data to the given file.
> > +// the user user-data file to the given file.
> >  // config_disk() returns true if it has successfully read the configuration
> > -// into the requested file. It triest to get configuratioe from first few
> > +// into the requested file. It tries to get configuration from first few
> >  // vblk devices, namely vblk1 to vblk10.
> >  //
> >  // OSv implementation limitations:
> >  // The /meta-data file is currently ignored.
> >  // Only ISO9660 filesystem is supported.
> > -// The mandatory "cidata" volume label is not checked.
> > +// The mandatory "cidata" (NoCloud) and "config-2" (ConfigDrive) volume labels are not checked.
> >  //
> >  // Example ISO image can be created by running
> >  // cloud-localds cloud-init.img cloud-init.yaml
> >  // The cloud-localds command is provided by cloud-utils package (fedora).
> >  static bool config_disk(const char* outfile) {
> > +    const char * userdata_file_paths[] {
> > +        "/user-data",  // NoCloud
> > +        "/openstack/latest/user_data", // ConfigDrive OpenStack
> > +        "/ec2/latest/user-d

Re: [osv-dev] [PATCH] memory: enforce physical free memory ranges do not start at 0

2019-08-23 Thread Rick Payne


Great, thanks for that! This patch certainly solved my issue - it's no
longer crashing at that point and is making much better progress...

Rick

On Sat, 2019-08-24 at 00:14 -0400, Waldemar Kozaczuk wrote:
> Most of the time the kernel code references memory using virtual
> addresses. However, some allocated system structures like page tables
> use physical addresses. For that reason it is critical that physical
> addresses are never 0, which, for example, in the case of a page table
> would mean that the given entry is empty.
> 
> This patch enforces that free physical memory ranges registered during
> initial memory setup do NOT start at address 0. This can only be
> enforced before the physical range address gets translated to a
> virtual one. To that effect we slightly modify the method
> mmu::free_initial_memory_range() to detect if the passed-in range
> start address is 0 and adjust it accordingly, so that the very first
> page of the range is skipped in this case. We also remove similar
> unnecessary code from mempool.cc:free_initial_memory_range() that
> operates on virtual addresses.
> 
> Fixes #1049
> Fixes #1050
> 
> Signed-off-by: Waldemar Kozaczuk 
> ---
>  core/mempool.cc | 4 
>  core/mmu.cc | 4 
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/core/mempool.cc b/core/mempool.cc
> index 9b1a19d5..070f8c92 100644
> --- a/core/mempool.cc
> +++ b/core/mempool.cc
> @@ -1626,10 +1626,6 @@ void free_initial_memory_range(void* addr, size_t size)
>      if (!size) {
>          return;
>      }
> -    if (addr == nullptr) {
> -        ++addr;
> -        --size;
> -    }
>      auto a = reinterpret_cast(addr);
>      auto delta = align_up(a, page_size) - a;
>      if (delta > size) {
> diff --git a/core/mmu.cc b/core/mmu.cc
> index dc39ddb1..f6036771 100644
> --- a/core/mmu.cc
> +++ b/core/mmu.cc
> @@ -1852,6 +1852,10 @@ void linear_map(void* _virt, phys addr, size_t size,
>  
>  void free_initial_memory_range(uintptr_t addr, size_t size)
>  {
> +    if (!addr) {
> +        ++addr;
> +        --size;
> +    }
>      memory::free_initial_memory_range(phys_cast(addr), size);
>  }
>  
> -- 
> 2.20.1
> 
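To make the "skip the first page" arithmetic concrete, here is a small model of the two steps in the patch above: the mmu-side nudge of a zero start address by one byte, followed by the page alignment that mempool.cc applies to the range start. It is a sketch under stated assumptions: 4 KiB pages, helper names that are hypothetical, an extra size check (not in the patch) to avoid underflow, and no attempt to model whatever mempool.cc does after the `delta > size` check, since that part is cut off in the quote.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed small-page size (4 KiB, as on x86-64).
constexpr uintptr_t page_size = 4096;

// Round v up to a multiple of a (a must be a power of two).
uintptr_t align_up(uintptr_t v, uintptr_t a)
{
    assert((a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

struct phys_range {
    uintptr_t start;
    size_t size;
};

// Step 1, modelled on the patched mmu::free_initial_memory_range():
// a range starting at physical address 0 is moved forward by one byte.
// (The size check is an addition in this sketch to avoid underflow.)
phys_range skip_zero_start(uintptr_t addr, size_t size)
{
    if (!addr && size) {
        ++addr;
        --size;
    }
    return {addr, size};
}

// Step 2, modelled on the start alignment in memory::free_initial_memory_range():
// rounding the start up to a page boundary drops the rest of page 0.
// A size of 0 in the result means nothing usable is left.
phys_range align_start_to_page(phys_range r)
{
    uintptr_t delta = align_up(r.start, page_size) - r.start;
    if (delta > r.size) {
        return {r.start, 0};
    }
    return {r.start + delta, r.size - delta};
}
```

So a range [0, 8 KiB) ends up contributing only [4 KiB, 8 KiB): the single-byte nudge is enough, because the later alignment discards the rest of page 0, and ranges that do not start at 0 pass through unchanged.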



Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-22 Thread Rick Payne
Hi Waldek,

Our tree is up to date with f7b6bee552b41f56a55, plus I manually
applied your patch from this thread (as at the time it wasn't
committed).

The error seems to happen regardless of the memory size I specify - but we
can't go below 2GB for memory reasons (btw - we seem to just freeze
when we run out of memory?).

It happens whether we use SMP or not.

We do have some modifications to OSv (mainly: do not use DHCP, as we're
specifying the IP addresses in the cloud-init file, plus some other hooks
into the network code that we're not actually using in this instance).
We've been using those changes for quite a while now, and I doubt this
is related. Again, it's hard for me to try with a stock OSv due to the
complexity of the setup (multiple interfaces, talking to a database
server, etc.).

We're keen to keep moving forward with the OSv version due to the
number of very useful fixes you've found (the underlying bugs may be
causing some of the crashes we see in production).

Cheers,
Rick



On Thu, 2019-08-22 at 16:02 -0400, Waldek Kozaczuk wrote:
> Rick,
> 
> Does this error happen with a specific memory configuration? Or is it more
> generic? I have lost track of whether in this email thread we are still
> talking about the error related to the change I made in memory allocation
> to use memory below the kernel. Also, are you using the latest master or
> 0.53 specifically?
> 
> I thought we were talking about the error when one passes 1.01 or 1.02 GB
> as the memory size. Is that true?
> 
> I understand we have found a slew of other possible bugs. 
> 
> Sorry I am a bit confused,
> Waldek
> 
> On Thu, Aug 22, 2019 at 15:53 Rick Payne 
> wrote:
> > On Thu, 2019-08-22 at 21:49 +1100, Rick Payne wrote:
> > > On Thu, 2019-08-22 at 12:30 +0300, Nadav Har'El wrote:
> > > 
> > > > Please run "osv syms" to allow gdb to find your application
> > object
> > > > files, and show lines there. Perhaps it's a segfault inside
> > your
> > > > application, not the kernel?
> > > 
> > > I had, but I had forgotten to add our stuff to the usr.manifest
> > so
> > > the
> > > tool could find them. I think this is better (from a different
> > run,
> > > apologies):
> > 
> > Ok, with a debug build of the ERTS, it seems to be failing in the
> > garbage collector for the beam. At this point it's probably
> > allocating
> > memory and moving objects around - so I'm a bit suspicious of the
> > changes in OSv in this area:
> > 
> > #44 
> > #45 0x15bfa75e in move_boxed (ptr=0x20006c2016e0, hdr=128,
> > hpp=0x200040f55738, 
> > orig=0x20005fbffcc0) at beam/erl_gc.h:91
> > #46 0x15c00014 in sweep (src_size=0, src=0x0, ohsz=0,
> > oh=0x20005f28 "@\002", 
> > type=ErtsSweepNewHeap, n_htop=0x20005fb0,
> > n_hp=0x20005fbffcc8)
> > at beam/erl_gc.c:2184
> > ---Type  to continue, or q  to quit---
> > #47 sweep_new_heap (n_hp=0x20005f28, n_htop=0x20005f001a58, 
> > old_heap=0x20005f28 "@\002", old_heap_size=0) at
> > beam/erl_gc.c:2237
> > #48 0x15bff060 in do_minor (p=0x20004943dd78,
> > live_hf_end=0xfff8, 
> > mature=0x20006b600028 "\200", mature_size=14266416,
> > new_sz=2072833, 
> > objv=0x20004943de28, nobj=3) at beam/erl_gc.c:1678
> > #49 0x15bfe1b4 in minor_collection (p=0x20004943dd78, 
> > live_hf_end=0xfff8, need=0, objv=0x20004943de28,
> > nobj=3, 
> > ygen_usage=1835980, recl=0x200040f55cf8) at beam/erl_gc.c:1426
> > #50 0x15bfc2cf in garbage_collect (p=0x20004943dd78, 
> > live_hf_end=0xfff8, need=0, objv=0x20004943de28,
> > nobj=3, fcalls=4000, 
> > max_young_gen_usage=0) at beam/erl_gc.c:746
> > #51 0x15bfc937 in erts_garbage_collect_nobump
> > (p=0x20004943dd78, need=0, 
> > objv=0x20004943de28, nobj=3, fcalls=4000) at beam/erl_gc.c:882
> > #52 0x15a8ecda in erts_execute_dirty_system_task
> > (c_p=0x20004943dd78)
> > at beam/erl_process.c:10543
> > #53 0x15a714bf in erts_dirty_process_main
> > (esdp=0x80007fc75d00)
> > at beam/beam_emu.c:1201
> > #54 0x15a8ac04 in sched_dirty_cpu_thread_func
> > (vesdp=0x80007fc75d00)
> > at beam/erl_process.c:8512
> > #55 0x15d0c7e8 in thr_wrapper (vtwd=0x202fea50) at
> > pthread/ethread.c:118
> > #56 0x40461c96 in
> > pthread_private::pthreadoperator() (
> > __closure=0xa0007f896a00) at libc/pthread.cc:114
> > #57 std::_Function_handler > pthread_private::pthread::pt

Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-22 Thread Rick Payne
On Thu, 2019-08-22 at 21:49 +1100, Rick Payne wrote:
> On Thu, 2019-08-22 at 12:30 +0300, Nadav Har'El wrote:
> 
> > Please run "osv syms" to allow gdb to find your application object
> > files, and show lines there. Perhaps it's a segfault inside your
> > application, not the kernel?
> 
> I had, but I had forgotten to add our stuff to the usr.manifest so
> the
> tool could find them. I think this is better (from a different run,
> apologies):

OK, with a debug build of the ERTS, it seems to be failing in the
garbage collector for the BEAM. At this point it's probably allocating
memory and moving objects around, so I'm a bit suspicious of the
changes in OSv in this area:

#44 
#45 0x15bfa75e in move_boxed (ptr=0x20006c2016e0, hdr=128,
hpp=0x200040f55738, 
orig=0x20005fbffcc0) at beam/erl_gc.h:91
#46 0x15c00014 in sweep (src_size=0, src=0x0, ohsz=0,
oh=0x20005f28 "@\002", 
type=ErtsSweepNewHeap, n_htop=0x20005fb0, n_hp=0x20005fbffcc8)
at beam/erl_gc.c:2184
---Type  to continue, or q  to quit---
#47 sweep_new_heap (n_hp=0x20005f28, n_htop=0x20005f001a58, 
old_heap=0x20005f28 "@\002", old_heap_size=0) at
beam/erl_gc.c:2237
#48 0x15bff060 in do_minor (p=0x20004943dd78,
live_hf_end=0xfff8, 
mature=0x20006b600028 "\200", mature_size=14266416,
new_sz=2072833, 
objv=0x20004943de28, nobj=3) at beam/erl_gc.c:1678
#49 0x15bfe1b4 in minor_collection (p=0x20004943dd78, 
live_hf_end=0xfff8, need=0, objv=0x20004943de28,
nobj=3, 
ygen_usage=1835980, recl=0x200040f55cf8) at beam/erl_gc.c:1426
#50 0x15bfc2cf in garbage_collect (p=0x20004943dd78, 
live_hf_end=0xfff8, need=0, objv=0x20004943de28,
nobj=3, fcalls=4000, 
max_young_gen_usage=0) at beam/erl_gc.c:746
#51 0x15bfc937 in erts_garbage_collect_nobump
(p=0x20004943dd78, need=0, 
objv=0x20004943de28, nobj=3, fcalls=4000) at beam/erl_gc.c:882
#52 0x15a8ecda in erts_execute_dirty_system_task
(c_p=0x20004943dd78)
at beam/erl_process.c:10543
#53 0x15a714bf in erts_dirty_process_main
(esdp=0x80007fc75d00)
at beam/beam_emu.c:1201
#54 0x15a8ac04 in sched_dirty_cpu_thread_func
(vesdp=0x80007fc75d00)
at beam/erl_process.c:8512
#55 0x15d0c7e8 in thr_wrapper (vtwd=0x202fea50) at
pthread/ethread.c:118
#56 0x40461c96 in
pthread_private::pthreadoperator() (
__closure=0xa0007f896a00) at libc/pthread.cc:114
#57 std::_Function_handler >::_M_invoke(const
std::_Any_data---Type  to continue, or q  to quit---
 &) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
#58 0x403f9647 in sched::thread_main_c (t=0x83579040)
at arch/x64/arch-switch.hh:271
#59 0x4039a793 in thread_main () at arch/x64/entry.S:113
(gdb) 

Like I said, it could be the Erlang ERTS, but I think that's pretty
unlikely.

Rick



Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-22 Thread Rick Payne
On Thu, 2019-08-22 at 12:30 +0300, Nadav Har'El wrote:

> Please run "osv syms" to allow gdb to find your application object
> files, and show lines there. Perhaps it's a segfault inside your
> application, not the kernel?

I had, but I had forgotten to add our stuff to the usr.manifest so the
tool could find them. I think this is better (from a different run,
apologies):

#44 
#45 0x15a67675 in process_main ()
#46 0x15a714e9 in sched_thread_func ()
#47 0x15cbab5d in thr_wrapper ()
#48 0x40461c96 in
pthread_private::pthreadoperator() (
__closure=0xa0007fea4200) at libc/pthread.cc:114
#49 std::_Function_handler >::_M_invoke(const
std::_Any_data &) (__functor=...) at
/usr/include/c++/7/bits/std_function.h:316
#50 0x403f9647 in sched::thread_main_c (t=0x80007f4a3040)
at arch/x64/arch-switch.hh:271
#51 0x4039a793 in thread_main () at arch/x64/entry.S:113

process_main is something inside the BEAM/ERTS (which unfortunately isn't
compiled with debug info). I'd guess it's unlikely to be a bug in
there, as the code is very widely tested.

Rick



Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-22 Thread Rick Payne
On Thu, 2019-08-22 at 10:33 +0300, Nadav Har'El wrote:
> You're right, it seems there should be a "return" in the recursive
> case! 
> That being said, I think the spurious wakeup doesn't cause any harm,
> because the wait code rwlock::writer_wait_lockable() loops, and if a
> thread is woken while the lock is still taken, it just goes to sleep
> again. It will just lose its good spot on the queue :-(

I wasn't sure that was the case. I put an assert in
writer_wait_lockable() (see below) and I was able to trigger it by
having one thread take the write lock twice, then a second thread attempt
to take the write lock. When the first thread released it, the second
thread hit the assert.

void rwlock::writer_wait_lockable()
{
    while (true) {
        if (write_lockable()) {
            return;
        }

        _write_waiters.wait(_mtx);
        assert((_wowner == sched::thread::current()) ||
               (_wowner == nullptr));

    }
}

Rick



Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-22 Thread Rick Payne
On Thu, 2019-08-22 at 10:47 +0300, Nadav Har'El wrote:
> Do you know how to get a backtrace from gdb?

Yes, see below. It wasn't running out of memory:

(gdb)  osv mem
Total Memory: 8589392896 Bytes
Mmap Memory:  2231259136 Bytes (25.98%)
Free Memory:  7464316928 Bytes (86.90%)

This is the most recent one:

(gdb) bt
#0  0x4022987e in abort (fmt=fmt@entry=0x406738f2 "exception
nested too deeply") at runtime.cc:121
#1  0x40397bf6 in sched::arch_cpu::enter_exception
(this=) at arch/x64/arch-cpu.cc:19
#2  sched::exception_guard::exception_guard (this=) at
arch/x64/arch-cpu.cc:37
#3  0x4039a97c in page_fault (ef=0x8001c048) at
arch/x64/mmu.cc:22
#4  
#5  0x40397d3b in safe_load (data=@0x8001032b13d0:
0x0, potentially_bad_pointer=0x2000af80) at arch/x64/safe-ptr.hh:33
#6  backtrace_safe (pc=pc@entry=0x8001032b1330, nr=nr@entry=128) at
arch/x64/backtrace.cc:26
#7  0x402295a6 in print_backtrace () at runtime.cc:79
#8  0x4022987c in abort (fmt=fmt@entry=0x40644c28 "Assertion
failed: %s (%s: %s: %d)\n") at runtime.cc:121
#9  0x402298eb in __assert_fail (expr=expr@entry=0x40658734
"current_id < rings.size()", file=file@entry=0x40658810
"include/lockfree/unordered_ring_mpsc.hh", 
line=line@entry=111, 
func=func@entry=0x40658840 ::emplace(unsigned long&, void const*&, unsigned int&, unsigned
int&, esource&)::__func__> "emplace") at runtime.cc:139
#10 0x4029e027 in unordered_ring_mpsc::emplace (this=)
at include/lockfree/unordered_ring_mpsc.hh:111
#11 random_harvestq_internal (somecounter=,
entropy=, count=, bits=,
origin=)
at bsd/sys/dev/random/random_harvestq.cc:164
#12 0x4039c02b in harvest_interrupt_randomness
(frame=0x8001032b2068, irq=32) at include/osv/intr_random.hh:22
#13 interrupt (frame=0x8001032b2068) at arch/x64/exceptions.cc:259
#14 
#15 console::isa_serial_console::putchar (ch=97 'a') at drivers/isa-serial.cc:108
#16 console::isa_serial_console::write (this=, 
str=0x409c00c3 
"ge_range_allocator::alloc_aligned(unsigned long, unsigned long,
unsigned long, bool)+550>\n", len=) at drivers/isa-serial.cc:79
#17 0x403431d8 in console::console_multiplexer::drivers_write
(len=1, 
str=0x409c00c2 
"age_range_allocator::alloc_aligned(unsigned long, unsigned long,
unsigned long, bool)+550>\n", this=0x409a97e0 )
at drivers/console-multiplexer.cc:49
#18 console::console_multiplexeroperator() (len=1, 
str=0x409c00c2 
"age_range_allocator::alloc_aligned(unsigned long, unsigned long,
unsigned long, bool)+550>\n", __closure=)
at drivers/console-multiplexer.cc:36
#19 std::_Function_handler
>::_M_invoke(const std::_Any_data &, const char *&&, unsigned long &&)
(__functor=..., __args#0=, __args#1=) at
/usr/include/c++/7/bits/std_function.h:316
#20 0x40343d33 in std::function::operator()(char const*, unsigned long) const
(__args#1=, __args#0=, 
this=0x409a9840 ) at
/usr/include/c++/7/bits/std_function.h:706
#21 console::LineDiscipline::write(char const*, unsigned long,
std::function&)
(this=0xa001017d9900, 
str=0x409c00c3 
"ge_range_allocator::alloc_aligned(unsigned long, unsigned long,
unsigned long, bool)+550>\n", len=, writer=...)
at drivers/line-discipline.cc:179
#22 0x40343571 in console::console_multiplexer::write_ll (
this=this@entry=0x409a97e0 , str=str@entry=0x409c00a0
 "\n[backtrace]\n", 
len=) at drivers/console-multiplexer.cc:71
#23 0x40342e33 in console::write_ll (msg=msg@entry=0x409c00a0
 "\n[backtrace]\n", len=) at
drivers/console.cc:63
#24 0x403d9b68 in debug_ll (fmt=fmt@entry=0x406740d1 "RIP:
0x%016lx <%s>\n") at core/debug.cc:250
#25 0x4039e752 in dump_registers (ef=ef@entry=0x8001b048) at arch/x64/dump.cc:20
#26 0x40334a21 in mmu::vm_sigsegv (addr=,
ef=0x8001b048) at core/mmu.cc:1314
#27 0x4033723a in mmu::vm_fault (addr=, 
addr@entry=18446603337326395384, ef=ef@entry=0x8001b048) at
core/mmu.cc:1337
#28 0x4039a9c1 in page_fault (ef=0x8001b048) at
arch/x64/mmu.cc:38
#29 
#30 memory::page_range_allocator::insert (pr=..., this=0x409c1300
) at core/mempool.cc:578
#31
memory::page_range_allocatoroperator()
(header=..., __closure=) at core/mempool.cc:751
#32
memory::page_range_allocator::for_each > (f=..., min_order=, 
this=0x409c1300 ) at core/mempool.cc:809
#33 memory::page_range_allocator::alloc_aligned (this=this@entry=0x409c1300 , size=size@entry=2097152, 
offset=offset@entry=0, alignment=alignment@entry=2097152, 
fill=fill@entry=true) at core/mempool.cc:736
#34 0x403e7414 in memory::alloc_huge_page (N=N@entry=2097152)
at core/mempool.cc:1601
#35 0x4033c5ee in
mmu::uninitialized_anonymous_page_provider::map (this=0x40930030
, offset=56623104, ptep=..., pte=...,
write=)
---Type  to continue, or q  to quit---
at core/mmu.cc:1037
#36 0x4033b919 in mmu::populate<(mmu::account_opt)0>::page<1>
(offset=56623104, 

Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-21 Thread Rick Payne
On Wed, 2019-08-21 at 13:21 +0300, Nadav Har'El wrote:
> 
> This is often not the problem itself, but rather a result of an
> earlier bug, which caused
> us to want to print an error message and that generated another
> error, and so on.

Understood.

Still working on testing 0.53, and I'm now seeing another page fault
issue:

page fault outsAssertion failed: sched::exception_depth <= 1
(core/sched.cc: reschedule_from_interrupt: 236)

[backtrace]


I get nothing more than that - no backtrace. I'll work on getting a bit
more later. It could well be that I'm running this out of memory too...

Rick



Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-21 Thread Rick Payne
On Wed, 2019-08-21 at 12:22 +0300, Nadav Har'El wrote:
> I am guessing (need to verify...) that our rwlock implementation is
> not recursive - a thread already holding the write lock needs to wait
> (forever) for the read lock. If this is true, this is an rwlock bug.

So I was puzzled a bit by the rwlock code. It handles recursive write
lock acquisitions by incrementing a counter:

// recursive write lock
if (_wowner == sched::thread::current()) {
    _wrecurse++;
}

On the unlock side though, it does decrement the counter but then goes
on to wake a write_waiter - which seems wrong. Probably I am missing
something - but why is the second part not inside the else clause where
_wowner is set to nullptr?

void rwlock::wunlock()
{
    WITH_LOCK(_mtx) {
        assert(_wowner == sched::thread::current());

        if (_wrecurse > 0) {
            _wrecurse--;
        } else {
            _wowner = nullptr;
        }

        if (!_write_waiters.empty()) {
            _write_waiters.wake_one(_mtx);
        } else {
            _read_waiters.wake_all(_mtx);
        }
    }
}
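A minimal, single-threaded model of the state above can make the question concrete. It is a sketch, not OSv code: thread ids are plain ints (0 meaning no owner), waiters are just a counter, and only the write-waiter path is modelled. wunlock_original() mirrors the quoted logic; wunlock_fixed() is the hypothetical variant the question suggests, where waiters are only woken once the lock is really released.

```cpp
#include <cassert>

// Hypothetical single-threaded model of the rwlock state quoted above.
// Thread ids are ints, 0 means "no owner"; waiters are only counted.
struct rwlock_model {
    int wowner = 0;
    int wrecurse = 0;
    int write_waiters = 0;     // writers currently queued
    int spurious_wakeups = 0;  // writers woken while someone still owns the lock

    void wlock(int tid)
    {
        if (wowner == tid) {   // recursive write lock
            ++wrecurse;
            return;
        }
        assert(wowner == 0);   // the model does not simulate blocking
        wowner = tid;
    }

    // Mirrors the quoted wunlock(): wakes a writer after every unlock,
    // even if only the recursion count was decremented.
    void wunlock_original(int tid)
    {
        assert(wowner == tid);
        if (wrecurse > 0) {
            --wrecurse;
        } else {
            wowner = 0;
        }
        wake_one_writer();
    }

    // Hypothetical variant: a recursive unlock just returns.
    void wunlock_fixed(int tid)
    {
        assert(wowner == tid);
        if (wrecurse > 0) {
            --wrecurse;
            return;
        }
        wowner = 0;
        wake_one_writer();
    }

    // A woken writer re-checks the lock: if it is still owned it goes back to
    // sleep (stays queued), otherwise it leaves the queue (taking the lock is
    // not modelled). Read-waiter wakeups are not modelled either.
    void wake_one_writer()
    {
        if (write_waiters == 0) {
            return;
        }
        if (wowner != 0) {
            ++spurious_wakeups;
        } else {
            --write_waiters;
        }
    }
};
```

With the lock taken twice by thread 1 and one queued writer, a single wunlock_original(1) wakes that writer while thread 1 still owns the lock, which is exactly the state the assert in writer_wait_lockable() catches; wunlock_fixed() only wakes it on the final release.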

I think you're right that if you hold the write lock and then try to
read-lock it, it will wait forever:

bool rwlock::read_lockable()
{
    return ((!_wowner) && (_write_waiters.empty()));
}

Rick



Re: [osv-dev] Resurrecting ARM support

2019-08-21 Thread Rick Payne
On Tue, 2019-07-30 at 05:22 -0700, claudio.font...@gmail.com wrote:
> Another issue might be with relocations and thread-local variables;
> if I remember correctly there are a few things to fix there, which will
> impact applications that use thread-local variables a lot. Possibly
> new relocations have been added which need support.
> 
> I hope somebody picks the work up, unfortunately I cannot justify
> working on it myself anymore.
> 
> Ciao good luck!

Thanks :)

So, returning to poking at this: I had a build issue between Ubuntu and
Fedora, where my build on Ubuntu 18.04 using gcc 8.3.0 would fail to
run, but Nadav's build on Fedora 29 (using 8.1.1) worked.

However, after updating my tree to the latest OSv, I get a build failure
on a relocation issue:

  AS bootfs.S
  LINK loader.elf
build/debug.aarch64/arch/aarch64/boot.o: In function `start_elf':
/home/rickp/src/osv/arch/aarch64/boot.S:40:(.text+0x50): relocation truncated to fit: R_AARCH64_LDST64_ABS_LO12_NC against symbol `__loader_argc' defined in .bss section in build/debug.aarch64/loader.o
/home/rickp/src/osv/arch/aarch64/boot.S:40: warning: One possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined.
make: *** [Makefile:1886: build/debug.aarch64/loader.elf] Error 1

The particular code in boot.S looks like:

    adrp    x3, __loader_argc
    ldr     x0, [x3, #:lo12:__loader_argc]
    adrp    x3, __loader_argv
    ldr     x1, [x3, #:lo12:__loader_argv]
    bl      main

So the relocation is specified there. Is this a result of the code
movement that was done, or has the codebase grown enough in size to trip
this up now?

Anyone handy with ARM assembler?
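One plausible explanation, assuming __loader_argc is declared as a 4-byte int in loader.cc: `ldr x0, [x3, #:lo12:sym]` is a 64-bit load, so its 12-bit offset is encoded divided by 8, and R_AARCH64_LDST64_ABS_LO12_NC can only be resolved when the low 12 bits of the symbol's address are a multiple of 8. An int that is merely 4-byte aligned and happens to land at an address that is 4 mod 8 (for example after the .bss layout shifts as the code grows) would produce exactly this "truncated to fit" error together with the alignment warning. If that is the cause, loading argc with a 32-bit `ldr w0, [x3, #:lo12:__loader_argc]` (which uses the LDST32 relocation, scaled by 4) should avoid it. The helper below is hypothetical and only models the encoding constraint; it is not a toolchain API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the constraint behind
// R_AARCH64_LDST{8,16,32,64,128}_ABS_LO12_NC: the low 12 bits of the symbol
// address are stored in the load's imm12 field divided by the access size,
// so they must be a multiple of that size (1, 2, 4, 8 or 16 bytes).
bool lo12_encodable(uint64_t sym_addr, unsigned access_size)
{
    assert(access_size == 1 || access_size == 2 || access_size == 4 ||
           access_size == 8 || access_size == 16);
    uint64_t lo12 = sym_addr & 0xfff;   // the part selected by :lo12:
    return lo12 % access_size == 0;     // the scaled value always fits in 12 bits
}
```

For instance, a symbol at an address ending in 0x00c passes the 4-byte (`ldr w0`) check but fails the 8-byte (`ldr x0`) one.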

Rick



Re: [osv-dev] README

2019-08-21 Thread Rick Payne
On Sun, 2019-08-18 at 21:00 -0700, Waldek Kozaczuk wrote:
> Hi,
> 
> I have just pushed a commit to update the main README page to make it
> better reflect the current state of OSv.
> 
> If you see anything you want to be changed/improved/added/removed or
> if anything can be phrased better or is misspelled please let me
> know. If you are a committer, feel free to make the changes on the
> spot.

A slight nit: OSv can't run the Erlang BEAM runtime unmodified, as it
relies on fork() to spawn 'port' processes. Both the setup in the OSv apps
directory and my modified environment used by rebar3_osv use a patched
ERTS that either does not build, or stubs out, the parts requiring
fork().

Not a huge issue, but worth fixing I think.

Rick




Re: [osv-dev] Re: [PATCH] Fix bug in arch_setup_free_memory

2019-08-21 Thread Rick Payne


Thanks for this. I was hitting a weird page fault issue in our
application since we recently moved from 0.52 to the latest OSv.
Something like this, which occurs early in startup:

Assertion failed: ef->rflags & processor::rflags_if (arch/x64/mmu.cc: page_fault: 34)

[backtrace]
0x402298ea <__assert_fail+26>
0x4039aa30 
0x40399826 
0x4039c0a8 
0x4039a779 
0x40214ca1 
0x403f9646 
0x4039a7a2 

Fixing the guest memory at 2GB in virsh made the problem go away.
Applying your patch also seems to fix it.

Rick

On Tue, 2019-08-20 at 20:04 -0700, Waldek Kozaczuk wrote:
> This patch definitely fixes an apparent bug I introduced myself in
> the past. I have tested that issue #1048 goes away with 4,5,6, 7 or
> 8GB of memory. I have also verified using cli module that free memory
> is reported properly now.
> 
> However, there is still 1 question and 1 issue outstanding:
> 1. I do not understand how this bug in arch_setup_free_memory() would
> lead to a page fault reported by issue 1048 or other "read errors"
> with higher memory (8GB and so on). I would expect this bug to lead to
> OSv not using the memory above 1GB in the e820 block but still being
> able to operate properly without the page fault. Is there another
> underlying bug that this patch actually covers?
> 
> 2. After this patch the tst-huge.so does not pass - actually hangs or
> never completes. I have played with it a bit and discovered that it
> passes if I run it with the right amount of memory - 128M < m <= 1G,
> but fails with anything above 1GB (the default is 2GB). It could be
> that the test is flaky and has to have right amount of free memory to
> pass (?).
> 
> Here is the stacktrace of where it was stuck:
> 
> sched::thread::switch_to (this=this@entry=0x801ba040) at
> arch/x64/arch-switch.hh:108
> #1  0x403ff794 in sched::cpu::reschedule_from_interrupt
> (this=0x8001d040, called_from_yield=called_from_yield@entry=f
> alse, 
> preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
> #2  0x403ffc8c in sched::cpu::schedule () at
> include/osv/sched.hh:1310
> #3  0x40400372 in sched::thread::wait (this=this@entry=0x
> 814a1040) at core/sched.cc:1214
> #4  0x40428072 in sched::thread::do_wait_for sched::wait_object > (mtx=...) at include/osv/mutex.h:41
> #5  sched::thread::wait_for (mtx=...) at
> include/osv/sched.hh:1220
> #6  waitqueue::wait (this=this@entry=0x408ec550
> , mtx=...) at core/waitqueue.cc:56
> #7  0x403e2d83 in rwlock::reader_wait_lockable
> (this=) at core/rwlock.cc:174
> #8  rwlock::rlock (this=this@entry=0x408ec520 )
> at core/rwlock.cc:29
> #9  0x4034ad98 in rwlock_for_read::lock (this=0x408ec520
> ) at include/osv/rwlock.h:113
> #10 std::lock_guard::lock_guard (__m=...,
> this=) at /usr/include/c++/8/bits/std_mutex.h:162
> #11
> lock_guard_for_with_lock::lock_guard_for_with_lock
> (lock=..., this=) at include/osv/mutex.h:89
> #12 mmu::vm_fault (addr=18446603337326391296, addr@entry=184466033373
> 26395384, ef=ef@entry=0x814a6068) at core/mmu.cc:1334
> #13 0x403a746e in page_fault (ef=0x814a6068) at
> arch/x64/mmu.cc:38
> #14 
> #15 0x403f2114 in memory::page_range_allocator::insert
> (this=this@entry=0x40904300 , pr=...)
> at core/mempool.cc:575
> #16 0x403ef83c in
> memory::page_range_allocatoroperator
> () (header=..., __closure=)
> at core/mempool.cc:751
> #17
> memory::page_range_allocatoroperator
> () (header=..., __closure=) at core/mempool.cc:736
> #18
> memory::page_range_allocator::for_each alloc_aligned(size_t, size_t, size_t,
> bool):: > (f=..., min_order= out>, this=0x40904300 ) at
> core/mempool.cc:809
> #19 memory::page_range_allocator::alloc_aligned (this=this@entry=0x40
> 904300 , size=size@entry=2097152, 
> offset=offset@entry=0, alignment=alignment@entry=2097152, 
> fill=fill@entry=true) at core/mempool.cc:736
> #20 0x403f0164 in memory::alloc_huge_page (N=N@entry=2097152)
> at core/mempool.cc:1601
> #21 0x4035030e in
> mmu::uninitialized_anonymous_page_provider::map (this=0x40873150
> , offset=83886080, 
> ptep=..., pte=..., write=) at include/osv/mmu-
> defs.hh:219
> #22 0x40355b94 in mmu::populate<(mmu::account_opt)1>::page<1>
> (offset=83886080, ptep=..., this=0x201ffd70)
> at include/osv/mmu-defs.hh:235
> #23 mmu::page, 1> (ptep=..., offset=83886080,
> pops=...) at core/mmu.cc:311
> #24 mmu::map_level, 2>::operator()
> (base_virt=35185397596160, parent=..., this=)
> at core/mmu.cc:437
> #25 mmu::map_level,
> 3>::map_range<2> (this=, ptep=...,
> base_virt=35184372088832, 
> slop=4096, page_mapper=..., size=132120576, vcur=)
> at core/mmu.cc:399
> #26 mmu::map_level, 3>::operator()
> (base_virt=35184372088832, parent=..., this=)
> at core/mmu.cc:449
> #27 mmu::map_level,
> 4>::map_range<3> (this=, ptep=...,
> base_virt=35184372088832, 
> slop=4096, page

Re: [osv-dev] Modernizing and cleaning build system

2019-03-31 Thread Rick Payne
On Sun, 2019-03-31 at 11:54 -0700, Waldek Kozaczuk wrote:
> The second group's need should be addressed by Capstan. So we should
> avoid duplication between what Capstan does well (and hopefully will
> do even better in future) and OSv build system. Now capstan packages
> are often generated using osv-apps and OSv build system.

As I've mentioned before, we're using a plugin I wrote for the Erlang
build tool 'rebar3', which takes an Erlang/OTP application and an OSv
base image and builds a unikernel image.

The reason for this approach is that making the image is somewhat
complex. We're emulating what the OSv build system does, using cpio
to push the components into the filesystem over a TCP port.
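For readers unfamiliar with the cpio step: the archive being pushed is a simple stream of header, name and data records. The sketch below builds entries in the SVR4 "newc" format (magic `070701`, thirteen 8-digit hex fields, 4-byte padding). That OSv's receiving side expects exactly this variant is an assumption to check against the upload scripts in the OSv tree; the TCP transfer is left out, and `cpio_newc_append`/`cpio_newc_finish` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Pad the archive with NULs up to the next multiple of 4 bytes.
static void pad4(std::string& out) {
    while (out.size() % 4 != 0) {
        out.push_back('\0');
    }
}

// Append one "newc" entry: 110-byte ASCII header, NUL-terminated name,
// padding, file data, padding. uid/gid/mtime/device fields are 0, nlink is 1.
void cpio_newc_append(std::string& out, const std::string& name,
                      const std::string& data, unsigned mode, unsigned ino) {
    char hdr[111];  // 110 header characters + the NUL written by snprintf
    std::snprintf(hdr, sizeof hdr,
                  "070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
                  ino, mode, 0u, 0u, 1u, 0u,
                  static_cast<unsigned>(data.size()),
                  0u, 0u, 0u, 0u,
                  static_cast<unsigned>(name.size() + 1), 0u);
    out.append(hdr, 110);
    out.append(name);
    out.push_back('\0');
    pad4(out);
    out.append(data);
    pad4(out);
}

// Every archive ends with a zero-length entry named "TRAILER!!!".
void cpio_newc_finish(std::string& out) {
    cpio_newc_append(out, "TRAILER!!!", "", 0, 0);
}
```

A real uploader would stream these bytes over the socket and would likely need real modes, directory entries and timestamps; they are fixed values here to keep the format visible.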

Is that what capstan does too? Or is there a simpler way that I've
missed?

Rick



Re: [osv-dev] Re: Problems to pass arguments to a JAR in run.yaml

2019-03-31 Thread Rick Payne


> On 31 Mar 2019, at 19:56, roberto battistoni  wrote:
> 
> Sorry but the DHCP offers the IP both in the NAT and BRIDGE configuration. I 
> think that the "forward" does not work in the bridge configuration. 

Why would it be forwarding in ‘bridge’ mode? I think you’re slightly confused. 
In bridge mode, you wouldn’t use a port forward as you could talk directly to 
the device.

Perhaps you can look at how QEMU is actually invoked by the two different
options?

> For example:
> 
> capstan run -n "nat" -f "8000:8000" -e "--verbose /cli/cli.so" uni ==> THIS 
> WORKS and "curl http://localhost:8000/os/version"; returns correctly the 
> version "0.53"

I’m not sure what capstan does, but I think for NAT it adds a port forward
to the QEMU options, so port 8000 on localhost is forwarded to OSv port 8000.

> capstan run -n "bridge" -f "8000:8000" -e "--verbose /cli/cli.so" uni ==> 
> THIS DOES NOT WORK and "curl http://localhost:8000/os/version"; returns "(7) 
> Failed to connect to localhost port 8000: Connection refused"

I believe in bridge mode it does not; you should be using
http://<guest-ip>:8000/ (which I think was 192.168.122.168 in your
example).

Rick



Re: [osv-dev] Re: Problems to pass arguments to a JAR in run.yaml

2019-03-30 Thread Rick Payne
On Thu, 2019-03-28 at 14:20 +0100, roberto battistoni wrote:
> [I/211 dhcp]: Received DHCPACK message from DHCP server:
> 192.168.122.1 regarding offerred IP address: 192.168.122.168

This is the subnet libvirt/qemu typically uses, and it is usually only
reachable locally on the host unless you do something to route
incoming traffic to your machine.

Are you trying to get a 'public' IP address? What do you expect to
assign the IP address?

Rick



Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-25 Thread Rick Payne


Interesting on the Pi front. I have a couple of spares I can play with
(just not with me this week).

I'd suggest opening an issue for running on AWS arm64 instances,
separately from the Pi one, for sure.

Perhaps a second one for the ENA driver, as that would be useful for
both arm64 and x86 on AWS.

Rick

On Mon, 2019-03-25 at 13:44 -0700, Waldek Kozaczuk wrote:
> I wonder if we should create a dedicated issue to track it. There is
> an existing one - https://github.com/cloudius-systems/osv/issues/717
> - which seems very related. BTW per this issue somebody reported being able 
> to (almost) boot OSv on Raspberry Pi! 
> 
> There is also another issue tracking new KVM based EC2 instances
> support - https://github.com/cloudius-systems/osv/issues/924.
> 
> Finally I also wanted to point out that Firecracker team is actively
> working on ARM support as well - 
> https://github.com/firecracker-microvm/firecracker/issues/648 and 
> https://github.com/firecracker-microvm/firecracker/milestone/3. This
> would be another alternative to run OSv on and possibly easier as we
> already support virtio-mmio drivers which x64 firecracker exposes and
> it is quite likely the arm version would support it as well. Just
> saying...
> 
> Waldek
> 
> On Monday, March 25, 2019 at 3:43:40 PM UTC-4, rickp wrote:
> > On Mon, 2019-03-25 at 20:19 +0100, Rick Payne wrote: 
> > > 
> > > The amazon A1 instances has these devices. I guess we need ENA 
> > > support 
> > > for Amazon's new hypervisors anyway... (SR-IOV). 
> > 
> > Another thing we'd need to do - the A1 instances use a GICv3,
> > whereas 
> > OSv supports GICv2. Whether we can just ignore the extra features
> > I'm 
> > not sure - but we'd have to do some work, as with the Amazon DTB, I
> > get 
> > this: 
> > 
> > arch-setup: failed to get GICv2 information from dtb. 
> > 
> > So GICv3 work, ENA driver (and NVMe if we don't have it) and we may
> > be 
> > on our way... 
> > 
> > Rick 
> > 
> > 
> 



Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-25 Thread Rick Payne
On Mon, 2019-03-25 at 20:19 +0100, Rick Payne wrote:
> 
> The amazon A1 instances has these devices. I guess we need ENA
> support
> for Amazon's new hypervisors anyway... (SR-IOV).

Another thing we'd need to do: the A1 instances use a GICv3, whereas
OSv supports only GICv2. I'm not sure whether we can just ignore the
extra features, but we'd have to do some work; with the Amazon DTB, I
get this:

arch-setup: failed to get GICv2 information from dtb.

So with GICv3 work, an ENA driver (and NVMe, if we don't have it) we
may be on our way...
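To illustrate why that lookup fails: interrupt controllers are identified in the device tree by their `compatible` strings, for example `arm,cortex-a15-gic` or `arm,gic-400` for a GICv2 and `arm,gic-v3` for a GICv3, so a DTB describing a GICv3 has no node matching the GICv2 strings. The helper below is a hypothetical sketch, not OSv's dtb code; it classifies a raw `compatible` property, which is a sequence of NUL-terminated strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

enum class gic_version { unknown, v2, v3 };

// Walk a device-tree "compatible" property (NUL-separated strings) and
// classify the interrupt controller it describes.
gic_version classify_gic(const char* compat, std::size_t len) {
    std::size_t pos = 0;
    while (pos < len) {
        const void* nul = std::memchr(compat + pos, '\0', len - pos);
        std::size_t n = nul ? static_cast<std::size_t>(
                                  static_cast<const char*>(nul) - (compat + pos))
                            : len - pos;
        std::string s(compat + pos, n);
        if (s == "arm,gic-v3") {
            return gic_version::v3;
        }
        if (s == "arm,cortex-a15-gic" || s == "arm,gic-400") {
            return gic_version::v2;
        }
        pos += n + 1;  // skip the string and its terminating NUL
    }
    return gic_version::unknown;
}
```

Supporting the A1 instances would mean handling the `v3` case with a separate driver, since the GICv3 register interface differs from GICv2.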

Rick




Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-25 Thread Rick Payne
On Mon, 2019-03-25 at 12:05 +0100, Rick Payne wrote:
> Also once I get my console working, we hit the DTB issue as we're not
> specifying a device tree, which will be the next issue to work on.

I built the dtb for the Amazon instance (alpine), and get a bit
further. It feels like the wrong thing to do with the qemu setup, as we
should probably have one which describes the emulated hardware. I'm a
little surprised that qemu-system-aarch64 doesn't have this, so I'm
probably still missing something in my understanding.

The Amazon A1 instances have these devices. I guess we need ENA support
for Amazon's new hypervisors anyway... (SR-IOV).

00:00.0 Host bridge: Amazon.com, Inc. Device 0200
	Physical Slot: 0
	Flags: fast devsel

00:01.0 Serial controller: Amazon.com, Inc. Device 8250 (prog-if 03 [16650])
	Physical Slot: 1
	Flags: fast devsel, IRQ 6
	Memory at 80118000 (32-bit, non-prefetchable) [size=4K]
	Kernel driver in use: serial

00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061 (prog-if 02 [NVM Express])
	Subsystem: Amazon.com, Inc. Device 
	Physical Slot: 4
	Flags: bus master, fast devsel, latency 0, IRQ 4, NUMA node 0
	Memory at 8011 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [bc] Power Management version 2
	Kernel driver in use: nvme

00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
	Subsystem: Amazon.com, Inc. Elastic Network Adapter (ENA)
	Physical Slot: 5
	Flags: bus master, fast devsel, latency 0, IRQ 5
	Memory at 80114000 (32-bit, non-prefetchable) [size=16K]
	Memory at 8000 (32-bit, prefetchable) [size=1M]
	Memory at 8010 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [b0] MSI-X: Enable+ Count=2 Masked-
	Capabilities: [bc] Power Management version 2
	Kernel driver in use: ena
	Kernel modules: ena

Rick



Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-25 Thread Rick Payne


On Sun, 2019-03-24 at 18:39 +0100, Rick Payne wrote:
> On Sun, 2019-03-24 at 18:08 +0200, Nadav Har'El wrote:
> > 
> > $ qemu-system-aarch64 --version
> > QEMU emulator version 3.0.0 (qemu-3.0.0-3.fc29)
> 
> Hmm, I'm on:
> 
> QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.10)

So the update on this, after some offline work with Nadav, is that I
can build fine on Fedora 29, but not on Ubuntu 18.04.2. Not sure why
yet.

Also, once I get my console working, we hit the DTB issue since we're
not specifying a device tree; that will be the next issue to work on.

Rick




Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-24 Thread Rick Payne
On Sun, 2019-03-24 at 18:08 +0200, Nadav Har'El wrote:
> 
> $ qemu-system-aarch64 --version
> QEMU emulator version 3.0.0 (qemu-3.0.0-3.fc29)

Hmm, I'm on:

QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.10)

I wonder if that's the issue. I'll try updating (though there's no Ubuntu
package of it for 18.04, so I'll see what I can do).

Thanks for the datapoints - very useful.

Cheers,
Rick




Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-24 Thread Rick Payne
On Sun, 2019-03-24 at 17:36 +0200, Nadav Har'El wrote:
>  $ aarch64-linux-gnu-gcc --version
> aarch64-linux-gnu-gcc (GCC) 8.1.1 20180626 (Red Hat Cross 8.1.1-3)

I tried this version:

aarch64-linux-gnu-gcc (Ubuntu 8.2.0-1ubuntu2~18.04) 8.2.0

Same issue.

What qemu are you using?

Rick



Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-24 Thread Rick Payne
Hi Nadav,

On Sun, 2019-03-24 at 12:11 +0200, Nadav Har'El wrote:
> 
> Why the "-S"? It causes the machine not really to start.

Oh sorry, I was using the debugger as it just crashes every time, so
there's no point not using it.

> OSv v0.53.0-3-g8cd7d8aa

So that's interesting: you're getting past the point I can't. That
makes me wonder if it's a compiler version issue. Which version of the
arm compiler are you using? I assume you're cross compiling?

> Which is a different problem than you reported.

Indeed, most odd...

> I don't know anything about "pl011", but maybe qemu doesn't emulate
> it,
> or our driver doesn't know how to recognize qemu's version of it?
> (just a wild guess).

No on both counts: qemu supports pl011 and we detect it.

> +console::aarch64_console.pl011.write("a", 1);

This line succeeds, I get an 'a' on the console.

> >  console::arch_early_console = console::aarch64_console.pl011;
> > +console::arch_early_console.write("b", 1);

We crash at this point as the assignment didn't happen and there is no
write method.

Cheers,
Rick



Re: [osv-dev] Re: AWS EC2 ARM instance support?

2019-03-23 Thread Rick Payne
On Fri, 2019-03-22 at 14:24 -0700, Waldek Kozaczuk wrote:
> I would be interested except I have literally zero experience with
> ARM (except I know how to spell it ;-))

 :)

> The only ARM machine I have access to is Raspberry PI 3 (
> https://www.raspberrypi.org/products/raspberry-pi-3-model-b/). It
> might be fun to get OSv either directly boot on it. I think it might
> be even possible to run Linux with KVM on Raspberry PI. 

So I'm not actually using hardware for now. I'm using qemu-system-aarch64
as shown on the wiki page:

  qemu-system-aarch64 -S -s -nographic -machine virt \
      -kernel build/release.aarch64/loader.img -cpu cortex-a57 -m 1024M \
      -append "--nomount /tools/uush.so"

The problem I'm having is that the assignment of the pl011 serial
driver to the early console port isn't working, and thus we end up with
no method for console::arch_early_console.write. I made this quick
hack:

diff --git a/arch/aarch64/arch-setup.cc b/arch/aarch64/arch-setup.cc
index 4f4be836..24a4e6a8 100644
--- a/arch/aarch64/arch-setup.cc
+++ b/arch/aarch64/arch-setup.cc
@@ -185,7 +185,9 @@ void arch_init_early_console()
 }
 
 new (&console::aarch64_console.pl011) console::PL011_Console();
+console::aarch64_console.pl011.write("a", 1);
 console::arch_early_console = console::aarch64_console.pl011;
+console::arch_early_console.write("b", 1);
 int irqid;
 u64 addr = dtb_get_uart(&irqid);
 if (!addr) {

and I get 'a' printed, but the call to
console::arch_early_console.write fails.

Why can't we assign there?

Rick




[osv-dev] Re: AWS EC2 ARM instance support?

2019-03-22 Thread Rick Payne
On Mon, 2018-12-03 at 10:14 +0200, Nadav Har'El wrote:
> 
> Unfortunately, I haven't heard from anyone in the two groups who
> previously contributed the ARM support to OSv, so it's not making any
> progress.
> If I understand correctly, OSv still builds correctly for ARM (make
> arch=aarch64) but although the kernel supposedly worked, it didn't
> have disk drivers, so you can only build the example application into
> the kernel (i.e., a ramdisk, scripts/build image=... fs=ramfs).
> There's definitely no support for anything new which just came out.
> But if you're interested to work on it, you are very welcome to adopt
> OSv's ARM support.

I'm also interested in aarch64 support. I can get it to build the
release (though it tries to build the disk image using qemu-system-x86,
which probably isn't going to help), but it fails to start, hitting
entry_invalid in arch/aarch64/entry.S.

(gdb) bt
#0  entry_invalid () at arch/aarch64/entry.S:132
#1  0x402c6f8c in debug_early (msg=0x23c5 "", 
msg@entry=0x404f93a8 "OSv v0.53.0-3-g8cd7d8aa\n") at
core/debug.cc:271
#2  0x400d6898 in premain () at loader.cc:101
#3  0x400c004c in start_elf () at arch/aarch64/boot.S:37

The debug build fails entirely, like this:

...
  AS bootfs.S
  LINK loader.elf
build/debug.aarch64/arch/aarch64/boot.o: In function `start_elf':
/home/rickp/src/osv-armtest/arch/aarch64/boot.S:40:(.text+0x50):
relocation truncated to fit: R_AARCH64_LDST64_ABS_LO12_NC against
symbol `__loader_argc' defined in .bss section in
build/debug.aarch64/loader.o
/home/rickp/src/osv-armtest/arch/aarch64/boot.S:40: warning: One
possible cause of this error is that the symbol is being referenced in
the indicated code as if it had a larger alignment than was declared
where it was defined.
make: *** [build/debug.aarch64/loader.elf] Error 1
Makefile:1873: recipe for target 'build/debug.aarch64/loader.elf'
failed
make failed. Exiting from build script

If anyone else is interested in making some progress, let me know.
It'd be great to get this booting on the new AWS instances (or if you
have hardware, that'd be interesting too!)

Cheers,
Rick



Re: Hitting the assert in lockfree::mutex::unlock()

2019-01-24 Thread Rick Payne
On Thu, 2019-01-24 at 16:28 +0200, Nadav Har'El wrote:
> Before I invest too much more time into trying to think if there's a
> case you missed, can you please confirm or deny that this bug happens
> only in the futex-with-timeout use case?

Unfortunately, I can't at the moment. It's an Erlang-based application,
so I've no idea what the ERTS (the virtual machine runtime) is doing
beneath our code. As I've not got it trapped in gdb yet, I don't know
how that futex is being used.

The ERTS/BEAM code is pretty complex to follow, but looking at
erts/include/internal/pthread/ethr_event.h, I can see that it may be
calling futex with a timeout. Still investigating though.
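For reference, the "futex with a timeout" case at the syscall level looks roughly like the sketch below. It uses the standard Linux futex(2) interface, which OSv emulates for Linux binaries; it is not code taken from ethr_event.h, and `futex_wait_timeout` is a hypothetical helper name. `FUTEX_WAIT` blocks only while the word still holds the expected value, and its timeout is relative.

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Block while *word == expected, for at most `ms` milliseconds.
// Returns 0 if woken by FUTEX_WAKE, otherwise the errno value, e.g.
// ETIMEDOUT if the timeout expired or EAGAIN if *word != expected on entry.
int futex_wait_timeout(int* word, int expected, long ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    // For FUTEX_WAIT the timeout is relative, not an absolute deadline.
    long rc = syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected, &ts, nullptr, 0);
    return rc == 0 ? 0 : errno;
}
```

The timeout path is one where a waiter leaves the wait without being woken, so it plausibly exercises different wait-queue bookkeeping than a plain wait/wake pair, which is why isolating it is worthwhile.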

Rick



Hitting the assert in lockfree::mutex::unlock()

2019-01-23 Thread Rick Payne

Anyone seen a crash like this?

[backtrace]
0x0022926a <__assert_fail+26>
0x003d3884 
0x0041e889 
0x003e7aeb 
0x003e8086 <__syscall+1254>

The particular assert is this one:

    while(true) {
        wait_record *other = waitqueue.pop();
        if (other) {
            assert(other->thread() != sched::thread::current()); // this thread isn't waiting, we know that :(
            other->wake();
            return;
        }
        // Some concurrent lock() is in progress (we know this because of
        // count) but it hasn't yet put itself on the wait queue.
        if (++sequence == 0U) ++sequence;  // pick a number, but not 0
        auto ourhandoff = sequence;
        handoff.store(ourhandoff);
        // If the queue is empty, the concurrent lock() is before adding
        // itself, and therefore will definitely find our handoff later.
        if (waitqueue.empty())
            return;
        // A thread already appeared on the queue, let's try to take the
        // handoff ourselves and awaken it. If somebody else already took
        // the handoff, great, we're done - they are responsible now.
        if (!handoff.compare_exchange_strong(ourhandoff, 0U))
            return;
    }

Unfortunately, I don't have this trapped in gdb yet, so it's hard to do
any further debugging. We're working on that, but I just wondered if
anyone had tripped over something similar.

Rick



Re: [PATCH] Handle wall clock MSR correctly

2018-11-20 Thread Rick Payne
On Tue, 2018-11-20 at 21:27 +0200, Nadav Har'El wrote:
> On Wed, Nov 14, 2018 at 10:19 AM Nadav Har'El 
> wrote:
> > On Tue, Oct 23, 2018 at 2:47 AM Rick Payne 
> > wrote:
> > Glauber, I'd love to get your opinion about this patch. Clearly our
> > original assumption that we can only call the wallclock MSR once,
> > was wrong. If I understand correctly, Linux guest does the same
> > thing as this patch - use the wallclock MSR every time the wall
> > clock is required - so this patch is the right way to go. But isn't
> > this slow, and perhaps pointless, to call this MSR every time the
> > wall clock is needed? Was this the intention when the pv clock ABI
> > was designed?
> > 
> 
> So, I consulted with Glauber in private, and his opinion was that as
> I noted later in this thread - Linux actually does *not* do what this
> patch does (write the wallclock MSR every time the wallclock is
> read). Glauber said that I was rightly worried that the approach in
> this patch - doing the MSR write on every wall clock read - is too costly
> (remember that every MSR write requires an exit to the hypervisor)

Thanks for that.

> A much more efficient approach could be to run this MSR every once in
> a while (e.g., once each second). One possibility could be to have a
> thread (or re-use some existing system thread) which periodically
> refreshes the wallclock offset. Another possibility is to have the
> wallclock function check if enough monotonic has passed since the
> last wallclock offset refresh, and if enough time has passed
> (remember to check in a thread-safe way!), refresh it. Each of these
> approaches has ugly sides, so I wonder what Linux does for this.
> Would be great if you or someone could try to understand what a Linux
> guest does here.

Do you have any idea how often the host updates this value? That
would seem to be the critical factor. It seemed to change quite a bit
when I was doing some debugging, but I must admit this area is a bit of
a black box to me, so I was just happy when I found the obvious
mistake.

For instance, would refreshing it at most once a second, on demand, be
sufficient? I.e., have kvmclock::wall_clock_boot() note when it was
last updated, and only write the MSR again once sufficient time has
passed? The problem is, what is sufficient time?
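One way to sketch the "refresh at most every N" idea with the thread-safety mentioned above: keep the timestamp of the last refresh in an atomic and let only the caller that wins a compare-and-swap perform the expensive MSR write. Everything here is hypothetical: `rate_limited_refresh` is not an OSv class, the refresh step is a callback standing in for `processor::wrmsr(...)`, and the caller supplies a monotonic timestamp (for example derived from `osv::clock::uptime::now()`).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: lets at most one caller per interval run an
// expensive refresh, such as rewriting the wall-clock MSR.
class rate_limited_refresh {
public:
    explicit rate_limited_refresh(int64_t interval_ns) : _interval(interval_ns) {}

    // `now_ns` must come from a monotonic clock. Returns true if this call
    // performed the refresh.
    bool maybe_refresh(int64_t now_ns, const std::function<void()>& refresh) {
        int64_t last = _last.load(std::memory_order_acquire);
        if (last != never && now_ns - last < _interval) {
            return false;  // refreshed recently enough; reuse the old value
        }
        // Only one of several concurrent callers can move _last forward.
        if (!_last.compare_exchange_strong(last, now_ns)) {
            return false;  // another thread won and is refreshing
        }
        refresh();
        return true;
    }

private:
    static constexpr int64_t never = INT64_MIN;
    const int64_t _interval;
    std::atomic<int64_t> _last{never};
};
```

Callers that lose the race keep using the previously read wall-clock value. The interval is left as a parameter, since choosing it is exactly the open question above.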

Cheers,
Rick




Re: Next (0.53.0 and onwards) OSv releases roadmap proposal

2018-11-13 Thread Rick Payne
On Tue, 2018-11-13 at 10:54 -0800, Waldek Kozaczuk wrote:
> COMPLETE BUT NOT COMMITTED PATCHES

It would be nice if we could get the timing fix either committed or
commented on (I sent a second patch, but heard nothing). This is our
major pain point at the moment.

Rick



Re: [PATCH] Handle wall clock MSR correctly

2018-10-22 Thread Rick Payne


Something like this?

diff --git a/drivers/kvmclock.cc b/drivers/kvmclock.cc
index 68389dfb..4af229d5 100644
--- a/drivers/kvmclock.cc
+++ b/drivers/kvmclock.cc
@@ -31,6 +31,8 @@ protected:
 private:
 static bool _new_kvmclock_msrs;
 pvclock_wall_clock* _wall;
+u64 _wall_phys;
+msr _wall_time_msr;
 static percpu _sys;
 pvclock _pvclock;
 };
@@ -50,11 +52,11 @@ static u8 get_pvclock_flags()
 kvmclock::kvmclock()
 : _pvclock(get_pvclock_flags())
 {
-auto wall_time_msr = (_new_kvmclock_msrs) ?
- msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
+_wall_time_msr = (_new_kvmclock_msrs) ?
+ msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
 _wall = new pvclock_wall_clock;
 memset(_wall, 0, sizeof(*_wall));
-processor::wrmsr(wall_time_msr, mmu::virt_to_phys(_wall));
+_wall_phys = mmu::virt_to_phys(_wall);
 }
 
 void kvmclock::init_on_cpu()
@@ -79,6 +81,7 @@ bool kvmclock::probe()
 
 u64 kvmclock::wall_clock_boot()
 {
+processor::wrmsr(_wall_time_msr, _wall_phys);
 return _pvclock.wall_clock_boot(_wall);
 }
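For context on what `_pvclock.wall_clock_boot(_wall)` consumes after the MSR write: per Linux's Documentation/virtual/kvm/msr.txt, the wall-clock area holds three 32-bit fields (`version`, `sec`, `nsec`), and the guest must see the same even `version` before and after reading the time, since an odd value means an update is in progress. The reader below is a hedged sketch of that protocol using a local stand-in struct (`pvclock_wall_clock_sketch` is hypothetical), not OSv's `pvclock` implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for the wall-clock area described in msr.txt: three packed
// 32-bit fields written by the hypervisor.
struct pvclock_wall_clock_sketch {
    uint32_t version;
    uint32_t sec;
    uint32_t nsec;
} __attribute__((__packed__));

// Return the boot wall-clock time in nanoseconds, retrying until the same
// even version number is seen before and after reading sec/nsec.
uint64_t read_wall_clock_boot_ns(const pvclock_wall_clock_sketch* w) {
    const volatile pvclock_wall_clock_sketch* v = w;
    uint32_t before, after, sec, nsec;
    do {
        before = v->version;
        std::atomic_thread_fence(std::memory_order_acquire);
        sec = v->sec;
        nsec = v->nsec;
        std::atomic_thread_fence(std::memory_order_acquire);
        after = v->version;
    } while ((before & 1) != 0 || before != after);
    return uint64_t(sec) * 1000000000ull + nsec;
}
```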
 
Cheers,
Rick

On Mon, 2018-10-22 at 16:13 +0300, Nadav Har'El wrote:
> 
> On Mon, Oct 22, 2018 at 1:53 AM Rick Payne 
> wrote:
> > We need to write the wall clock MSR every time we use it, to ensure
> > we
> > get the updated value. This allows the guest OSv to track the time
> > of
> > the host correctly.
> 
> After this patch, the host NTP keeps the time accurate on the guest
> as well?
> 
> Wow, I don't know how we missed this... Glauber even wrote in Linux's
> virtual/kvm/msr.txt:
> 
>MSR_KVM_WALL_CLOCK_NEW:
>   
>  The hypervisor is only guaranteed to update this data at the
> moment of MSR write.
>  Users that want to reliably query this information more than
> once have
>  to write more than once to this MSR.
> 
> So it seems you're indeed right!
> 
> It seems that Linux as a guest also does what you did in this patch -
> use the MSR on every invocation.
> This is sad because it will slow down every call of
> osv::clock::wall::now(), although it won't
> slow down osv::clock::uptime::now() which is much more important
> (e.g., for the scheduler).
> 
> Glauber, do you know if we have any other option?
> 
> It's surprising for me that KVM cannot write to the boot wallclock
> address when it is changed (by Linux),
> instead relying on polling from the guest side.
> 
> Another thing we could do is to reread the wall clock periodically
> (e.g., once each second)
> but this will be ugly and cause probably not very desirable clock
> skips. 
> 
> 
> > Signed-off-by: Rick Payne 
> > ---
> >  drivers/kvmclock.cc | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/kvmclock.cc b/drivers/kvmclock.cc
> > index 68389dfb..2e182954 100644
> > --- a/drivers/kvmclock.cc
> > +++ b/drivers/kvmclock.cc
> > @@ -50,11 +50,8 @@ static u8 get_pvclock_flags()
> >  kvmclock::kvmclock()
> >  : _pvclock(get_pvclock_flags())
> >  {
> > -auto wall_time_msr = (_new_kvmclock_msrs) ?
> > - msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
> >  _wall = new pvclock_wall_clock;
> >  memset(_wall, 0, sizeof(*_wall));
> > -processor::wrmsr(wall_time_msr, mmu::virt_to_phys(_wall));
> >  }
> > 
> >  void kvmclock::init_on_cpu()
> > @@ -79,6 +76,9 @@ bool kvmclock::probe()
> > 
> >  u64 kvmclock::wall_clock_boot()
> >  {
> > +auto wall_time_msr = (_new_kvmclock_msrs) ?
> > + msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
> > +processor::wrmsr(wall_time_msr, mmu::virt_to_phys(_wall));
> 
> Perhaps we can shave off a few more nanoseconds if we saved
> wall_time_msr and mmu::virt_to_phys(_wall) and didn't need to
> recalculate them every time?
> 
> >  return _pvclock.wall_clock_boot(_wall);
> >  }
> > 
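Nadav's suggestion of computing wall_time_msr and mmu::virt_to_phys(_wall) once, while still writing the MSR on every query, can be sketched as below. This is a hypothetical standalone class, not OSv's kvmclock: the fake::wrmsr() and fake::virt_to_phys() helpers are stand-ins so it runs outside a guest, and the MSR numbers are the ones listed in the KVM MSR documentation.

```cpp
#include <cassert>
#include <cstdint>

// KVM paravirtual wall-clock MSR numbers (from the KVM MSR documentation).
constexpr std::uint32_t MSR_KVM_WALL_CLOCK     = 0x11;
constexpr std::uint32_t MSR_KVM_WALL_CLOCK_NEW = 0x4b564d00;

// Layout as quoted from the KVM docs; the hypervisor fills it in.
struct pvclock_wall_clock {
    std::uint32_t version;
    std::uint32_t sec;
    std::uint32_t nsec;
} __attribute__((__packed__));

// Hypothetical stand-ins for processor::wrmsr() and mmu::virt_to_phys().
namespace fake {
int wrmsr_calls = 0;
std::uint32_t last_msr = 0;
std::uint64_t last_value = 0;
inline void wrmsr(std::uint32_t msr, std::uint64_t value) {
    ++wrmsr_calls;
    last_msr = msr;
    last_value = value;
}
inline std::uint64_t virt_to_phys(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);  // identity map: an assumption
}
}  // namespace fake

// Hypothetical class: the MSR index and physical address are computed once
// in the constructor; refresh() only performs the MSR write the docs require.
class wall_clock_sketch {
public:
    explicit wall_clock_sketch(bool new_kvmclock_msrs)
        : _wall_time_msr(new_kvmclock_msrs ? MSR_KVM_WALL_CLOCK_NEW
                                           : MSR_KVM_WALL_CLOCK),
          _wall{},
          _wall_phys(fake::virt_to_phys(&_wall)) {}

    // Copying would leave _wall_phys pointing at the wrong object.
    wall_clock_sketch(const wall_clock_sketch&) = delete;
    wall_clock_sketch& operator=(const wall_clock_sketch&) = delete;

    const pvclock_wall_clock& refresh() {
        fake::wrmsr(_wall_time_msr, _wall_phys);  // host refills _wall here
        return _wall;
    }
    std::uint32_t msr() const { return _wall_time_msr; }
    std::uint64_t phys() const { return _wall_phys; }

private:
    std::uint32_t _wall_time_msr;
    alignas(4) pvclock_wall_clock _wall;  // docs require 4-byte alignment
    std::uint64_t _wall_phys;
};
```

The saving is modest: the MSR write itself (a VM exit) still happens on every call, only the branch and the address translation are hoisted out.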

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [PATCH] Handle wall clock MSR correctly

2018-10-22 Thread Rick Payne
On Mon, 2018-10-22 at 16:13 +0300, Nadav Har'El wrote:
> 
> After this patch, the host NTP keeps the time accurate on the guest
> as well?

Not verified yet - should be able to do that soon though. I suspect it
totally sorts the issue.

> Another thing we could do is to reread the wall clock periodically
> (e.g., once each second)
> but this will be ugly and cause probably not very desirable clock
> skips. 

That was my thinking too.

> 
> Perhaps we can shave off a few more nanoseconds if we saved
> wall_time_msr and mmu::virt_to_phys(_wall) and didn't need to
> recalculate them every time?

I'll resubmit with those changes - though I was hoping I'd missed
something with respect to how it all worked :) It does seem somewhat
suboptimal to have to write the MSR each time :(

Cheers,
Rick




[PATCH] Handle wall clock MSR correctly

2018-10-21 Thread Rick Payne


We need to write the wall clock MSR every time we use it, to ensure we
get the updated value. This allows the guest OSv to track the time of
the host correctly.

Signed-off-by: Rick Payne 
---
 drivers/kvmclock.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/kvmclock.cc b/drivers/kvmclock.cc
index 68389dfb..2e182954 100644
--- a/drivers/kvmclock.cc
+++ b/drivers/kvmclock.cc
@@ -50,11 +50,8 @@ static u8 get_pvclock_flags()
 kvmclock::kvmclock()
 : _pvclock(get_pvclock_flags())
 {
-auto wall_time_msr = (_new_kvmclock_msrs) ?
- msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
 _wall = new pvclock_wall_clock;
 memset(_wall, 0, sizeof(*_wall));
-processor::wrmsr(wall_time_msr, mmu::virt_to_phys(_wall));
 }
 
 void kvmclock::init_on_cpu()
@@ -79,6 +76,9 @@ bool kvmclock::probe()
 
 u64 kvmclock::wall_clock_boot()
 {
+auto wall_time_msr = (_new_kvmclock_msrs) ?
+ msr::KVM_WALL_CLOCK_NEW : msr::KVM_WALL_CLOCK;
+processor::wrmsr(wall_time_msr, mmu::virt_to_phys(_wall));
 return _pvclock.wall_clock_boot(_wall);
 }
 
-- 
2.17.1




Re: OSv time drifting when running under KVM

2018-10-19 Thread Rick Payne


Ok, so circling back to this as I'm still having issues on the deployed
boxes. I suspect from the e-mails below that this is caused by changes
in the host time not making it into OSv.

This seems to be related to KVM_WALL_CLOCK_NEW not being updated. The
docs say:

    data: 4-byte alignment physical address of a memory area which must
    be in guest RAM. This memory is expected to hold a copy of the
    following structure:

    struct pvclock_wall_clock {
        u32   version;
        u32   sec;
        u32   nsec;
    } __attribute__((__packed__));

    whose data will be filled in by the hypervisor. The hypervisor is
    only guaranteed to update this data at the moment of MSR write.
    Users that want to reliably query this information more than once
    have to write more than once to this MSR.
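The version field in that structure follows the usual pvclock convention, as far as I understand it (Linux's pvclock_read_wallclock() uses the same loop): an odd value means the host is mid-update, and a value that changed across the read means we raced with an update. A minimal sketch of a consistent reader, with std::atomic_thread_fence standing in for whatever read barriers the real guest code uses:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct pvclock_wall_clock {
    std::uint32_t version;
    std::uint32_t sec;
    std::uint32_t nsec;
} __attribute__((__packed__));

struct wall_time {
    std::uint32_t sec;
    std::uint32_t nsec;
};

// Retry until the version is even and unchanged around the sec/nsec reads.
inline wall_time read_wall_clock(const volatile pvclock_wall_clock* w) {
    std::uint32_t before, after;
    wall_time t{};
    do {
        before = w->version;
        std::atomic_thread_fence(std::memory_order_acquire);
        t.sec = w->sec;
        t.nsec = w->nsec;
        std::atomic_thread_fence(std::memory_order_acquire);
        after = w->version;
    } while ((before & 1) != 0 || before != after);
    return t;
}
```

Note that this loop only guarantees a torn-free snapshot of what the host last wrote; per the docs above, the host only writes it when the guest writes the MSR.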

OSv sets the MSR once, and then uses the result repeatedly - in fact it
barrier()s it which is odd as we know it shouldn't change unless we
write the MSR - which makes me think there was some misunderstanding
here (which could easily be on my part!).

Indeed, when I use 'date' to grossly change the host time, the OSv
guest is not seeing this. However, if I modify pvclock::wall_clock_boot
to write the MSR each time, then it gets the change and my OSv guest
stays in-sync with the host.
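Why re-writing the MSR fixes the drift can be seen with a simplified model. It assumes (as the name wall_clock_boot() suggests) that the wall-clock area holds the host's wall time at guest boot, and that the guest's current wall time is that value plus its own uptime. The helpers below are hypothetical, for illustration only:

```cpp
#include <cassert>
#include <cstdint>

// Convert the sec/nsec pair from the wall-clock area into nanoseconds.
constexpr std::uint64_t to_ns(std::uint32_t sec, std::uint32_t nsec) {
    return std::uint64_t{sec} * 1000000000ull + nsec;
}

// Guest "now" = host wall time at guest boot + guest uptime.
constexpr std::uint64_t wall_now_ns(std::uint64_t boot_wall_ns,
                                    std::uint64_t uptime_ns) {
    return boot_wall_ns + uptime_ns;
}
```

If the host clock is stepped (or slewed) after the single MSR write at startup, the cached boot value never changes, so every later "now" carries the full error; re-reading after an MSR write picks up the host's corrected boot value.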

So do I make the changes to do this each time - not sure of the
cost/penalty of doing this. Or am I missing something else?

How does ntp slew time - does it just set the host time incrementally,
or is there another interface it's using (can't easily check, offshore
at present)?

Cheers,
Rick

On Mon, 2018-09-17 at 11:17 +0300, Nadav Har'El wrote:
> 
> On Mon, Sep 17, 2018 at 4:20 AM, Rick Payne 
> wrote:
> > On Mon, 2018-09-17 at 01:41 +0300, Nadav Har'El wrote:
> > > I have a wild guess, but I'm not a big clock expert, and I'm CCing
> > > Glauber who might have better ideas.
> > > 
> > > My guess is that you have ntp running in the *host*, but not in the
> > > guest (we don't have an ntp client for OSv), and somehow this is
> > > causing this drift in wall-clock time between the two. My guess (and
> > > again, this is a guess, I don't know if that's true), that the
> > > adjtime / adjtimex / ntp_adjtime or whatever system call that ntp
> > > uses to gradually adjust the time in the host, doesn't cause the same
> > > adjustment to be propagated to the guest by the paravirtual clock
> > > mechanism (which probably relies on the clock frequency being fixed,
> > > while adjtime tweaks it a bit).
> > 
> > So I'm on a boat, and no NTP server - however you seem to have nailed
> > it.
> 
> On one hand, it's great that we understand this now, but on the other
> hand it's very sad that although the host keeps accurate time (or at
> least thinks that it does), it cannot just pass it to the guest (via
> kvm-clock) as we always implicitly assumed that it does.
> 
> It seems to me (but again, I'm not an expert on this), that if QEMU or
> KVM is unable to track the host ntp's adjtime() modifications, it needs
> to modify the *wall clock* value periodically to track the host's
> changing notion of how long ago the epoch was.
> 
> I suspect that this issue is not specific to OSv guests, and will also
> occur on Linux guests which do not run ntpd inside them. If this is
> indeed the case (and it would be great if you could verify this), I
> think we should ask for advice from the KVM experts on the KVM mailing
> list about what can be done. Popular wisdom on the web suggests that you
> must run an ntp client on your guest as well, and with some effort we
> can get some ntp client (e.g., chrony) working on OSv.
> But in the long run, that would be sad for KVM - if KVM has the
> opportunity to pass the guest a perfectly accurate clock (based on ntp
> running in the host) it would be a waste not to seize that opportunity,
> and I wonder if there's a reason why not.
> 
>  
> > > 
> > > You can verify this guess by stopping the ntpd/chronyd daemon in the
> > > host and seeing if the drift remains or goes away.
> > 
> > I turned off the systemd timesync (timedatectl set-ntp off) and now it
> > all works perfectly.
> 
> Wow, it's so sad to see that systemd took over yet another stand-alone
> daemon. Yet another nail in the coffin of the Unix philosophy :-(
> 
> 



Re: ELF linker woes with GraalVM

2018-10-19 Thread Rick Payne
On Fri, 2018-10-19 at 10:58 -0700, Waldek Kozaczuk wrote:
> Recently I have been playing with GraalVM (
> https://github.com/oracle/graal) to see if it is possible to run it
> on OSv. To that extent I created new OSv app - 
> https://github.com/cloudius-systems/osv-apps/tree/master/graalvm-example
> . As you can see it has a simple bootstrap main.so that loads a
> shared library libhello.so generated by GraalVM (similar to golang).
> The checked in line to build main.so is actually incorrect and should
> be 
> $(CC) -pie -o $@ $(CFLAGS) -I. main.c -L. -lhello -ldl
> 
> In any case the app crashes like so:
> page fault outside application, addr: 0x10d3f000
> [registers]

This looks very similar to the issue I had with GNU_RELRO sections. In
my case the getenv symbol was the cause of failure. I tried a few of
Nadav's suggestions but got no closer to solving it - then the problem
went away for me. Sorry I can't be much help.

(See my thread 'Page fault outside of application').

Rick




Re: OSv time drifting when running under KVM

2018-09-16 Thread Rick Payne
On Mon, 2018-09-17 at 01:41 +0300, Nadav Har'El wrote:
> I have a wild guess, but I'm not a big clock expert, and I'm CCing
> Glauber who might have better ideas.
> 
> My guess is that you have ntp running in the *host*, but not in the
> guest (we don't have an ntp client for OSv), and somehow this is
> causing this drift in wall-clock time between the two. My guess (and
> again, this is a guess, I don't know if that's true), that the
> adjtime / adjtimex / ntp_adjtime or whatever system call that ntp
> uses to gradually adjust the time in the host, doesn't cause the same
> adjustment to be propagated to the guest by the paravirtual clock
> mechanism (which probably relies on the clock frequency being fixed,
> while adjtime tweaks it a bit).

So I'm on a boat, and no NTP server - however you seem to have nailed
it.

> 
> You can verify this guess by stopping the ntpd/chronyd daemon in the
> host and seeing if the drift remains or goes away.

I turned off the systemd timesync (timedatectl set-ntp off) and now it
all works perfectly.

> I tried to look on Google if my guess has any merit, and something
> which surprised me is that a lot of people suggest running ntpd on
> *both* host and guest. But if running ntpd on the host alone would
> magically cause the guest's clock to also be accurate, why would
> anyone recommend running ntpd on the guest? So maybe the guest indeed
> misses the ntpd adjustments from the host? I couldn't find anyone
> discussing this. Maybe Glauber remembers something on this.

So now I need to find out what on earth the systemd service is doing -
as clearly it's the root cause of my problem. Secondly, I need to find
out what I can run on the host to keep the time in sync in a sane way
such that the guest OSv processes do not drift.

I'll check to see what the deployed service is using in terms of time
synchronisation...

Rick



Re: OSv time drifting when running under KVM

2018-09-16 Thread Rick Payne
On Fri, 2018-09-14 at 12:12 -0700, Waldek Kozaczuk wrote:
> Stupid question: are you sure the KVM acceleration is actually enabled
> and kvm-clock actually used? If not OSv would fall back to hpet which
> we know we have some problems with.

The processor flags are: "sse3 cmpxchg16b x2apic clflush kvmclock
kvmclock2 kvm_pv_eoi kvmclock_stable"

I'm pretty sure that means it's using the kvmclock, so I'm at a bit of a
loss to understand why it's drifting (and it's a second or more an hour,
so quite significant).

This has been running a few hours:

# curl http://192.168.x.x/os/date && TZ=UTC date
"Sun Sep 16 22:17:39 UTC 2018"
 Sun Sep 16 22:17:57 UTC 2018
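To compare the drift with figures NTP tooling reports, it helps to express it in parts per million (seconds of error per second elapsed, times 1e6). A small helper, hypothetical and for illustration:

```cpp
#include <cassert>
#include <cmath>

// Drift rate in ppm: accumulated offset divided by elapsed time, scaled by 1e6.
inline double drift_ppm(double offset_seconds, double elapsed_seconds) {
    return offset_seconds / elapsed_seconds * 1e6;
}
```

"A second or more an hour" works out to roughly 278 ppm, and the first report (3 seconds after under 24 hours of uptime) implies at least about 35 ppm.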

Rick



Re: OSv time drifting when running under KVM

2018-09-14 Thread Rick Payne
On Fri, 2018-09-14 at 16:25 -0700, Dor Laor wrote:
> Need to look at the guest log and the restapi

The log says:

/usr/bin/qemu-system-x86_64 -name xxx -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 ...

Rick



Re: OSv time drifting when running under KVM

2018-09-14 Thread Rick Payne
On Fri, 2018-09-14 at 12:12 -0700, Waldek Kozaczuk wrote:
> Stupid question: are you sure the KVM acceleration is actually enabled
> and kvm-clock actually used? If not OSv would fall back to hpet which
> we know we have some problems with.

My turn for a stupid question - how would I know?

I do 'virsh edit ...' to edit the XML and the first line is:



so I assumed it's KVM-based. I tried to ensure that kvmclock was used by
changing the timer settings to:

  

  

Cheers,
Rick



OSv time drifting when running under KVM

2018-09-13 Thread Rick Payne


We have a problem with OSv's wall clock drifting away from the
hypervisor's. For example, this VM has been running for under 24hrs,
and when I compare the hypervisor time, with that retrieved from the
httpserver-api, I get this:

$ curl http://192.168.x.x/os/date && TZ=UTC date
"Thu Sep 13 19:17:25 UTC 2018" Thu Sep 13 19:17:28 UTC 2018

The OSv images are being run under virsh control, using KVM. We've
tried a few configuration options for the clock on KVM but it's not
helping. Any ideas what we're doing wrong?

Cheers,
Rick



Re: Exodus: Lightweight relocation engine. Useful?

2018-02-02 Thread Rick Payne
On Fri, 2018-02-02 at 16:28 -0800, Dan Kaminsky wrote:
> That's where I hit my wall, but it seems pretty much a bullseye on
> your wheelhouse.  Suggestions?

I saw the exodus announcement too - very nice. I wondered if you could
alter it to emit a C program that dlopen()s the binary, looks up main()
and calls it? That's what has been done in several places (for some of
the Erlang 'port' functionality, for instance).

The executable and libraries would still need to be PIE, but with many
Linux distros using that by default, it may not be so limiting...
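The dlopen()-and-call-main() launcher idea can be sketched like this. The function name run_main_from is hypothetical. One caveat, as far as I know: recent glibc releases refuse to dlopen() a PIE executable on a Linux host, so there this only works with shared objects; the pattern is aimed at OSv-style loaders, which do accept PIEs.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

using main_fn = int (*)(int, char**);

// Load an object, look up its main() and run it.
// Returns -1 if the object cannot be loaded or has no "main" symbol.
inline int run_main_from(const char* path, int argc, char** argv) {
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    void* sym = dlsym(handle, "main");
    if (!sym) {
        std::fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    int rc = reinterpret_cast<main_fn>(sym)(argc, argv);
    dlclose(handle);
    return rc;
}
```

RTLD_NOW resolves every symbol up front, which avoids lazy PLT resolution inside the loaded program.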

Rick



Re: Page fault outside of application

2018-01-30 Thread Rick Payne
On Tue, 2018-01-30 at 11:47 +0200, Nadav Har'El wrote:
> I have a vague feeling that fix_permissions() cannot just work on the
> whole object it needs to know which of the PT_LOAD segments (see
> file::load_segment()) the RELRO falls in, but I'm hazy on the
> details. Maybe even file::load_segment() maps the segment with the
> wrong alignment? But unfortunately, I cannot even reproduce the
> problem you are seeing (even though I do have gcc 7.2.1), let alone
> fix it.

Thanks for the insights. I should be able to make our erlang/otp repo
public shortly, then you can try and reproduce it with the code base
I'm using.

I'll try and get to it this week...

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 11:43 +0200, Nadav Har'El wrote:
> 1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro)
> but for some reason object::relocate_pltgot() doesn't recognize the
> bind_now.

FWIW, on both working and non-working builds, I see '-pie -z now -z
relro' being passed to the linker stage for erlexec. I see very little
difference between the two :(
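Since "-z now" is on the link line, Nadav's first hypothesis comes down to whether the loader notices it. Per the ELF gABI, eager binding can be requested by a DT_BIND_NOW entry, by DF_BIND_NOW in DT_FLAGS, or (GNU extension) by DF_1_NOW in DT_FLAGS_1, and modern linkers often emit only the flag forms. A sketch of a check covering all three; whether OSv's object::relocate_pltgot() checks the same set is not established in this thread.

```cpp
#include <cassert>
#include <elf.h>
#include <vector>

// True if the dynamic section requests eager (bind-now) symbol binding.
inline bool wants_bind_now(const std::vector<Elf64_Dyn>& dyn) {
    for (const auto& d : dyn) {
        switch (d.d_tag) {
        case DT_BIND_NOW:
            return true;
        case DT_FLAGS:
            if (d.d_un.d_val & DF_BIND_NOW) return true;
            break;
        case DT_FLAGS_1:
            if (d.d_un.d_val & DF_1_NOW) return true;
            break;
        case DT_NULL:
            return false;  // end of the dynamic section
        }
    }
    return false;
}
```

Running "readelf -d" on both erlexec builds and comparing the FLAGS and FLAGS_1 lines would show which form each binary actually carries.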

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 12:27 +0200, Nadav Har'El wrote:
> Both versions used "-pie", not "-shared"?

Should be, yes. It's exactly the same build setup and the Makefile shows
'-pie' for LDFLAGS.

I don't think gcc 7.2 contains any of the -mindirect-branch changes, so
that's a red herring. I'll continue poking at this tomorrow (it's getting
late here).

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 11:43 +0200, Nadav Har'El wrote:
> 
> Hmm, I don't know, I wasn't aware anything like that changed.
> We usually change parts of the object marked by PT_GNU_RELRO to read-
> only in object::fix_permissions(), I'm guessing (but didn't check)
> this what caused the read-only page you're seeing.

I'll take a look.

> The compiler usually does NOT mark the .GOT.PLT section - for
> function lookup - as RELRO, because this needs to be modified after
> startup, every time a function is used for the first time;
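The lazy-binding behaviour Nadav describes can be modelled in a few lines. This is a toy with hypothetical names, not OSv's resolver: a slot starts out pointing at a stub, and the first call through it resolves the target and stores the real address back into the slot. That store is exactly what faults if the slot's page has already been made read-only.

```cpp
#include <cassert>

using target_fn = int (*)(int);

static int real_target(int x) { return 2 * x; }
static int resolver_runs = 0;

static int lazy_stub(int x);
static target_fn got_slot = lazy_stub;  // unresolved slot points at the stub

// Stand-in for the dynamic linker's resolver: patch the slot, then forward.
static int lazy_stub(int x) {
    ++resolver_runs;
    got_slot = real_target;  // the write that needs a writable GOT page
    return got_slot(x);
}

// Every call goes through the slot, like a call through the PLT.
static int call_through_plt(int x) { return got_slot(x); }
```

With bind-now, every slot is filled before RELRO protection is applied, so no store like this ever happens afterwards.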

Maybe I'm not following. The GNU_RELRO sections look the same between
the 2 versions of erlexec. The first one (-ubuntu17.10) fails, the second
one is fine:

rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv-ubuntu17.10/erts-9.0.5/bin/erlexec | grep -2 RELRO
  GNU_STACK  0x 0x 0x
             0x 0x  RW 0x10
  GNU_RELRO  0xebe8 0x0020ebe8 0x0020ebe8
             0x0418 0x0418  R  0x1

rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv/erts-9.0.5/bin/erlexec | grep -2 RELRO
  GNU_STACK  0x 0x 0x
             0x 0x  RW 0x10
  GNU_RELRO  0xec08 0x0020ec08 0x0020ec08
             0x03f8 0x03f8  R  0x1
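Those two headers translate into write-protected pages as follows, assuming glibc-style rounding (both ends rounded down to a page boundary). Whether OSv's fix_permissions() rounds the same way is an assumption here; with these values the end is page-aligned anyway.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t page_size = 0x1000;

constexpr std::uint64_t page_down(std::uint64_t a) { return a & ~(page_size - 1); }

struct page_range {
    std::uint64_t begin;
    std::uint64_t end;  // exclusive
};

// Pages made read-only for a PT_GNU_RELRO segment at [vaddr, vaddr + memsz).
constexpr page_range relro_pages(std::uint64_t vaddr, std::uint64_t memsz) {
    return {page_down(vaddr), page_down(vaddr + memsz)};
}
```

Both builds protect the same single page, offsets 0x20e000 to 0x20f000. Added to the load base, that matches the 4 kB "perm=r" mapping at offset 0xe000 in the "osv mmap" output, so the RELRO segments alone do not explain the difference.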

> Only when "-z now" is used during linking (DT_BIND_NOW object flag)
> do we do all the function lookups on startup (see
> object::relocate_pltgot()) and then, it's ok that the .GOT.PLT is
> also marked RELRO and made read-only.
> 
> I'm *guessing* (with no evidence) that one of the following happened:
> 1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro)
> but for some reason object::relocate_pltgot() doesn't recognize the
> bind_now.

So there is definitely a difference in the binaries. In the one that
fails, getenv is defined like this, in the .rela.plt section:

0020ee30  00010007 R_X86_64_JUMP_SLO  getenv@GLIBC_2.2.5 + 0

But in the one that works, it's like this, in the .rela.dyn section:

0020ee28  00010006 R_X86_64_GLOB_DAT  getenv@GLIBC_2.2.5 + 0

I see LDFLAGS being set to '-pie' so I don't really understand why the
first one is a jump slot, vs what I'd expect (GLOB_DAT).
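Putting the relocation entries next to the RELRO headers makes the failure mechanical. The sketch assumes the printed Info values 00010007 and 00010006 are readelf's 12-digit 000100000007 and 000100000006 with zeros lost in this archive. That gives symbol index 1, which matches sym=1 in the backtraces. Both getenv slots sit inside RELRO, but only the failing build's slot is a JUMP_SLOT that a lazy resolver writes after startup. The decimal addr=17592355974704 from the gdb session is 0x10000a20ee30, which is 0x20ee30 past a 0x10000a000000 base (the p/x output 0x1a20ee30 appears to have lost its 0000 runs in the same way).

```cpp
#include <cassert>
#include <cstdint>
#include <elf.h>

// True if offset lies within [vaddr, vaddr + memsz).
constexpr bool in_range(std::uint64_t off, std::uint64_t vaddr, std::uint64_t memsz) {
    return off >= vaddr && off < vaddr + memsz;
}

struct reloc_view {
    std::uint32_t sym;
    std::uint32_t type;
    bool jump_slot;  // resolved on first call unless the object is bound eagerly
    bool in_relro;   // slot lies inside PT_GNU_RELRO
};

inline reloc_view inspect(std::uint64_t r_offset, std::uint64_t r_info,
                          std::uint64_t relro_vaddr, std::uint64_t relro_memsz) {
    const auto type = static_cast<std::uint32_t>(ELF64_R_TYPE(r_info));
    return {static_cast<std::uint32_t>(ELF64_R_SYM(r_info)), type,
            type == R_X86_64_JUMP_SLOT,
            in_range(r_offset, relro_vaddr, relro_memsz)};
}
```

A JUMP_SLOT inside a write-protected RELRO page is only safe if the loader resolves it before protecting the page, which is what bind-now handling is supposed to guarantee.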

> 2. Somehow the loop in object::relocate_pltgot() missed some of the
> functions - like getenv() 

I think it's suspicious that getenv() is the first thing to be fixed up,
so I suspect it's more fundamental.

> 3. Something in the new compiler changed the meaning of PT_GNU_RELRO
> or added other flags which confused object::fix_permissions() and
> caused it to make a page read-only when it shouldn't have.

Ok. I think I need to do some more reading on ELF...

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 10:54 +0200, Nadav Har'El wrote:
> This all seems reasonable.
> Maybe we somehow got the PLT becoming read-only, so we are getting a
> pagefault trying to write to it?
> Can you please try in gdb "osv mmap" and look at the mapping which
> includes the faulting address (0x1aa0fe28), is it read-write or
> read-only?

New build, so a slightly different address, but in the same range (and
it's the same crash). I think you've nailed it though:

(gdb) up
#6  0x003c451c in mmu::vm_fault
    (addr=17592355974704, ef=0x83d82068) at core/mmu.cc:1330
1330            vm_sigsegv(addr, ef);
(gdb) p/x addr
$1 = 0x1a20ee30

0x1a00     0x1a00f000 [60.0 kB] flags=fmF  perm=rx  offset=0x     path=/otp/erts-9.0.5/bin/erlexec
0x1a20e000 0x1a20f000 [4.0 kB]  flags=fmF  perm=r   offset=0xe000 path=/otp/erts-9.0.5/bin/erlexec
0x1a20f000 0x1a21     [4.0 kB]  flags=fmF  perm=rw  offset=0xf000 path=/otp/erts-9.0.5/bin/erlexec
That address is in the second segment, and thus marked 'r'. Is gcc 7
doing something different that's incompatible with the ELF loader in
OSv? Related to the Intel fiasco?

Cheers,
Rick



Re: Using stock PIE executables from standard distributions?

2018-01-28 Thread Rick Payne


> On 28 Jan 2018, at 22:23, Nadav Har'El  wrote:
> 
> However, sadly, we still do have bugs with PIE support that need to be fixed 
> before running PIEs on OSv becomes a hassle-free experience:
> 
> A bug which relatively-recently became relevant (as gcc changed) is 
> https://github.com/cloudius-systems/osv/issues/689, which prevents PIEs which 
> use getopt() with "optarg" from working.
> A harder bug is https://github.com/cloudius-systems/osv/issues/352 which I 
> think is still partially relevant - I think we still have problems with 
> thread-local variables in PIEs (but not in shared libraries).
> 
> Please check the PIE which interests you, and see if one of these bugs 
> affects you, or if there are any other bugs.
> Both the aforementioned bug reports contain also ideas on how to fix them, if 
> you're looking for 

Aha, and maybe my ERTS issue was another bug?

Rick



Re: Page fault outside of application

2018-01-28 Thread Rick Payne


> On 24 Jan 2018, at 22:07, Rick Payne  wrote:
> 
> I don't believe so. I think this is right where erlexec is being started. 
> I'll work on verifying that now.

I fixed the problem by recompiling my Erlang ERTS system using gcc 6.2. Ubuntu 
17.10 has 7.2 which seems to be the issue. I did try 6.4 on Ubuntu 17.10 but 
that also failed.

So for now, my fix is to use gcc-6.2…

Rick



Re: Page fault outside of application

2018-01-24 Thread Rick Payne

On 24/01/18 17:09, Rick Payne wrote:

Hi Geraldo,

On 23/01/18 19:58, Geraldo Netto wrote:

Hello Rick,

Rick, could you please, provide the full output with the -V ?
eg: scripts/run.py  -V


It's a custom build, I'm running it via qemu direct.


Here it is:

qemu-system-x86_64: -mon chardev=stdio,mode=readline,default: option 
'default' does nothing and is deprecated

OSv v0.24-497-gbb2c4e4e-rebar3
2 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
eth0: ethernet address: 00:11:11:11:11:01
virtio-blk: Add blk device instances 0 as vblk0, devsize=524288000
random: virtio-rng registered as a source.
random: intel drng, rdrand registered as a source.
random:  initialized
VFS: unmounting /dev
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
random: device unblocked.
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
program zpool.so returned 1
BSD shrinker: event handler list found: 0xa0010092a300
BSD shrinker found: 1
BSD shrinker: unlocked, running
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1711999488]
[I/32 dhcp]: Waiting for IP...
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [507118272]
[I/32 dhcp]: Waiting for IP...
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1416187925]
[I/32 dhcp]: Waiting for IP...
[I/212 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.122.1 
regarding offerred IP address: 192.168.122.61
[I/212 dhcp]: Broadcasting DHCPREQUEST message with xid: [1416187925] to 
SELECT offered IP: 192.168.122.61

[I/212 dhcp]: DHCP received hostname: osv

[I/212 dhcp]: Received DHCPACK message from DHCP server: 192.168.122.1 
regarding offerred IP address: 192.168.122.61
[I/212 dhcp]: Server acknowledged IP 192.168.122.61 for interface eth0 
with time to lease in seconds: 3600

eth0: 192.168.122.61
[I/212 dhcp]: Configuring eth0: ip 192.168.122.61 subnet mask 
255.255.255.0 gateway 192.168.122.1 MTU 1500

[I/212 dhcp]: Set hostname to: osv
Running from /init/00-cmdline: /usr/mgmt/cloud-init.so;

Running from /init/30-auto-02: /libhttpserver-api.so &!
httpserver: loaded plugin from path: 
/usr/mgmt/plugins/libhttpserver-api_fs.so

page fault outside application, addr: 0x1a20fe28
[registers]
RIP: 0x00492c7b int, void*, long)+67>

RFL: 0x00010206  CS:  0x0008  SS:  0x0010
RAX: 0x1a20fe28  RBX: 0xa00104529530  RCX: 0x65746567  RDX: 0x006c38bb
RSI: 0x1a0011d6  RDI: 0x00dd8290  RBP: 0x202fe690  R8:  0x0010
R9:  0x900100735000  R10: 0x202feb20  R11: 0x0001e200  R12: 0x0009
R13: 0x900100735000  R14: 0x202feb20  R15: 0x0001e200  RSP: 0x202fe660

Aborted


Cheers,
Rick



Re: Page fault outside of application

2018-01-24 Thread Rick Payne

Hi Geraldo,

On 23/01/18 19:58, Geraldo Netto wrote:

Hello Rick,

Rick, could you please, provide the full output with the -V ?
eg: scripts/run.py  -V


It's a custom build, I'm running it via qemu direct.


I may be wrong but erlexec may not work in OSv
because OSv does not provide fork(), execXX(), ...
also, if I'm not mistaken, ELF support is incomplete which means you
can only load native software in a dlopen() fashion.
Other friends from this list may provide more information/details/fix
any misinformation I might have said


This is all taken care of already. FWIW, I have a rebar plugin which
generates an image from the standard build (plus custom OTP to handle
the issues you mention) - which I'll be releasing soon. Just need to get
it working again after I updated both the OTP release and OSv...


Rick



Re: Page fault outside of application

2018-01-24 Thread Rick Payne

Hi,

On 23/01/18 20:16, Nadav Har'El wrote:
I don't have any bright ideas, but just a few small comments below, 
hopefully (?) they will help something...


Appreciated...

This writes in "addr", which seems a reasonable address (doesn't seem
like junk).
In object::resolve_pltgot() you can see the addr is _base +
slot.r_offset; maybe you can print them and see with "nm"/"readelf" of
the object being loaded if this offset address makes sense (in the PLT
section)?


So that made sense as far as I can see:

(gdb)
#9  0x00492c7b in elf::object::arch_relocate_jump_slot (
this=0xa0010327b400, sym=1, addr=0x1aa0fe28, addend=0)
at arch/x64/arch-elf.cc:109
109 *static_cast(addr) = symsym.relocated_addr();
(gdb) p symsym.obj._base
$1 = (void *) 0x0
(gdb) up
#10 0x003fdfd7 in elf::object::resolve_pltgot (
this=0xa0010327b400, index=0) at core/elf.cc:692
692 if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
(gdb) p slot.r_offset
$2 = 2162216
(gdb) p/x slot.r_offset
$3 = 0x20fe28
(gdb)

$ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec | grep 20fe28
0020fe28  00010007 R_X86_64_JUMP_SLO  getenv@GLIBC_2.2.5 + 0



If the address is correct, maybe we have some sort of TLB flush problem or
something - we mapped the new area but some CPUs don't see it yet, e.g.,
from something like 
https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
Can you reproduce this bug? If you can, you can confirm (or rule out) 
this wild guess by changing in

arch/x64/mmu.cc, flush_tlb_all(), the line

if (sched::thread::current()->is_app())

to if(false).

If the bug goes away, it can be related. If it doesn't go away, then
it's not related.


I tried that, same crash.

But this is just a wild guess - probably wrong... I can't think of a 
better explanation now.



#10 0x003fdfd7 in elf::object::resolve_pltgot (
     this=0xa001042d9e00, index=0) at core/elf.cc:692
#11 0x004021ca in elf_resolve_pltgot (index=0,
obj=0xa001042d9e00)
     at core/elf.cc:1538
#12 0x0048727d in __elf_resolve_pltgot () at
arch/x64/elf-dl.S:47
#13 0xa001042d9e00 in ?? ()


This is strange, it's running dynamically-generated code, which calls 
getenv()?


I don't believe so. I think this is right where erlexec is being 
started. I'll work on verifying that now.


I have a start-otp.so which loads the erlexec and sets off a pthread to 
run it, so my hypothesis is that this is at the point that start-otp is 
loading up the erlexec library.


Cheers,
Rick



Page fault outside of application

2018-01-23 Thread Rick Payne


A few moving parts, so not sure what is causing this - but trying to 
start an erlang application I'm seeing this:


eth0: 192.168.122.61
page fault outside application, addr: 0x1a60fe28
[registers]
RIP: 0x00492dd1 int, void*, long)+67>

C
(gdb) bt
#0  processor::cli_hlt () at arch/x64/processor.hh:248
#1  0x00209ac4 in arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  0x00499033 in osv::halt () at arch/x64/power.cc:24
#3  0x0022c65f in abort (fmt=0xa23855 "Aborted\n") at runtime.cc:132
#4  0x0022c522 in abort () at runtime.cc:98
#5  0x003c4b26 in mmu::vm_sigsegv (addr=17592360173096, ef=0x800104713068)
    at core/mmu.cc:1316
#6  0x003c4bc2 in mmu::vm_fault (addr=17592360173096, ef=0x800104713068)
    at core/mmu.cc:1330
#7  0x004887fd in page_fault (ef=0x800104713068) at arch/x64/mmu.cc:38
#8
#9  0x00492dd1 in elf::object::arch_relocate_jump_slot (
    this=0xa001042d9e00, sym=1, addr=0x1a60fe28, addend=0)
    at arch/x64/arch-elf.cc:109
#10 0x003fdfd7 in elf::object::resolve_pltgot (
    this=0xa001042d9e00, index=0) at core/elf.cc:692
#11 0x004021ca in elf_resolve_pltgot (index=0, obj=0xa001042d9e00)
    at core/elf.cc:1538
#12 0x0048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
#13 0xa001042d9e00 in ?? ()
#14 0x042d9e00 in ?? ()
#15 0x in ?? ()

Any pointers as to how to debug this further? It seems to be trying to 
resolve symbols in 'erlexec' - specifically getenv.
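In frames #5 and #6, gdb prints the faulting address (the addr argument of mmu::vm_sigsegv()) in decimal, which makes it hard to line up with the hex pointers elsewhere in the trace. A minimal C sketch of the conversion follows; addr_to_hex is a made-up helper name, and inside gdb itself, selecting frame 5 and running p/x addr should give the same result.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h> /* strcmp, for checking the result in the usage below */

/* Hypothetical helper: format a 64-bit address as 0x-prefixed lowercase
 * hex, similar to gdb's p/x. Returns the number of characters written. */
int addr_to_hex(uint64_t addr, char *buf, size_t bufsz)
{
    assert(bufsz >= 19); /* "0x" + up to 16 hex digits + NUL */
    return snprintf(buf, bufsz, "0x%" PRIx64, addr);
}
```

For example, addr_to_hex(17592360173096, buf, sizeof buf) fills buf with "0x10000a60fe28", which shares its low digits (a60fe28) with the fault address printed at the top of the log.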


Cheers
Rick



[PATCH] lzloader miscompiles with gcc 6.2

2017-11-16 Thread Rick Payne
-O2 optimisation on lzloader when compiled with gcc 6.2.0 causes
the resulting image to fail to boot. Reducing the optimisation
resolves this problem.

Fixes #913

Signed-off-by: Rick Payne 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0850f7c..8dd2371 100644
--- a/Makefile
+++ b/Makefile
@@ -458,7 +458,7 @@ $(out)/loader-stripped.elf.lz.o: $(out)/loader-stripped.elf $(out)/fastlz/lz
 
 $(out)/fastlz/lzloader.o: fastlz/lzloader.cc | generated-headers
$(makedir)
-   $(call quiet, $(CXX) $(CXXFLAGS) -O2 -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)
+   $(call quiet, $(CXX) $(CXXFLAGS) -O0 -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)
 
 $(out)/lzloader.elf: $(out)/loader-stripped.elf.lz.o $(out)/fastlz/lzloader.o arch/x64/lzloader.ld \
$(out)/fastlz/fastlz.o
-- 
2.7.4



Re: Memory limit?

2017-11-07 Thread Rick Payne
Hi,

> I am not sure how this will help, as the later malloc() can still fail when 
> it wants to allocate physically-contiguous memory.
> 
> One hack you can try to fix 
> https://github.com/cloudius-systems/osv/issues/854 and hopefully your issue 
> is to change in core/mempool.cc, the function std_malloc(), replace the
> 
> } else {
> ret = memory::malloc_large(size, alignment);
> }
> 
> by something like (completely untested!)

I was just writing when yours arrived. I worked around it like this:

diff --git a/core/mmu.cc b/core/mmu.cc
index f929412..2296612 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -1454,7 +1454,7 @@ static initialized_anonymous_page_provider page_allocator_init;
 static page_allocator *page_allocator_noinitp = &page_allocator_noinit, *page_allocator_initp = &page_allocator_init;

 anon_vma::anon_vma(addr_range range, unsigned perm, unsigned flags)
-: vma(range, perm, flags, true, (flags & mmap_uninitialized) ? page_allocator_noinitp : page_allocator_initp)
+: vma(range, perm, flags | mmap_small, true, (flags & mmap_uninitialized) ? page_allocator_noinitp : page_allocator_initp)
 {
 }

I don't understand why file_vma is mmap_small whereas anon_vma isn't, but this 
prevents the later assert. Whether it's correct or not is another matter :)

> Unfortunately, I'm not familiar with this complex templated code, only Gleb 
> is (CC'ed).
> Gleb, in commit 1b31de0e one of the changes you made was
> 
> -inline u64 pt_element_common::next_pt_pfn() const { return pfn(false); }
> +inline u64 pt_element_common::next_pt_pfn() const {
> +assert(!large());
> +return pfn();
> +}
> 
> Can you try to recall why you added this assert here (and in a couple of 
> other places too). If this assert is really justified, do you have any guess 
> what sort of bug may cause it to trigger?

I can get a better stack trace from the debug image, but it just shows slightly 
more detail. I'd need to stare at the mmu code for quite a bit longer before I 
get closer to understanding it - but the hack above has certainly allowed me to 
get more memory in use.

Cheers,
Rick



Re: Memory limit?

2017-11-06 Thread Rick Payne (Offshore)
> Out of memory: could not reclaim any further. Current memory: 5122256 Kb
> 
> This suggests there was 5GB free while the allocation failed.
> This *can* be a fragmentation issue (e.g., you asked for a 1 GB allocation, 
> but we couldn't free a 1GB consecutive area), but can also be a malloc() of a 
> ridiculous amount. Since commit 7ea953ca7d6533c025e535be49ee5bd2567fc8f3 a 
> malloc() of over the amount of memory we have prints a different error 
> message, but perhaps you still have some very large (but less than 10GB) 
> single allocation?
> 
> The sad thing is that since we fail in the memory reclaimer, not in the 
> malloc(), you don't know which malloc() failed. This is 
> https://github.com/cloudius-systems/osv/issues/585.
> One ad-hoc thing you can try is to connect with gdb, and see which OSv thread 
> is waiting in malloc - and see what malloc() it is trying to do.

In an attempt to work around this, I've been trying to get the BEAM VM to 
pre-allocate memory, which it does via mmap. However, this didn't help 
initially, as the memory wasn't being populated. I altered the mmap calls to 
include MAP_POPULATE to get them filled at startup, and now I get this crash. 
The debug output is from the Erlang runtime system's os_mmap function. It seems 
to return from the first call to mmap for a 2GB chunk, but asserts shortly after 
that (and the following is all I get):

Attempting to mmap 2147483648 bytes to 0
mmaped 2147483648 bytes to address 2040
Assertion failed: !large() (arch/x64/arch-mmu.hh: next_pt_addr: 82)

[backtrace]
0x002281da <__assert_fail+26>
0x00331a35 
0x0033da0c , 
1>::operator()(mmu::hw_ptep<1>, unsigned long)+76>
0x0033dc4a , 
2>::operator()(mmu::hw_ptep<2>, unsigned long)+314>
0x0033debc , 
3>::operator()(mmu::hw_ptep<3>, unsigned long)+284>
0x0033e11d  
>(unsigned long, unsigned long, unsigned long, 
mmu::populate<(mmu::account_opt)1>&, unsigned long)+413>
0x0033ee0f (mmu::vma*, void*, unsigned long, 
bool)+1231>
0x00337521 
0x00459345 

Any clues?
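For reference, a minimal sketch of the kind of mmap change described above: an anonymous mapping requested with MAP_POPULATE, so the pages are faulted in during mmap() rather than on first touch. The helper name prefault_region is made up, this is not the actual ERTS os_mmap code, and MAP_POPULATE is a Linux extension, so exactly how OSv handles it is an assumption to check against its mmu code.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS and MAP_POPULATE are extensions */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map 'size' bytes of zero-filled anonymous memory
 * and ask for every page to be populated during mmap() itself, instead
 * of lazily on first access. Returns NULL on failure. */
void *prefault_region(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

With a 2GB request, as in the log above, the populate step has to walk the page tables for the whole range up front; the backtrace shows the assertion firing inside OSv's populate path.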

Cheers,
Rick



Tracepoints

2017-11-04 Thread Rick Payne

Is it possible to add tracepoints at runtime? It would be very nice to be able 
to add tracepoints to code other than the OSv code, i.e. to be able to create 
them in a Go app, or from Erlang (via a NIF). I’d be happy to write the 
Erlang NIF if someone could point me to how to create them, if it’s even 
possible?

Cheers,
Rick



Re: Memory limit?

2017-11-02 Thread Rick Payne
> This suggests there was 5GB free while the allocation failed.
> This *can* be a fragmentation issue (e.g., you asked for a 1 GB allocation, 
> but we couldn't free a 1GB consecutive area), but can also be a malloc() of a 
> ridiculous amount. Since commit 7ea953ca7d6533c025e535be49ee5bd2567fc8f3 a 
> malloc() of over the amount of memory we have prints a different error 
> message, but perhaps you still have some very large (but less than 10GB) 
> single allocation?

I don’t think so. I’m running the Erlang/OTP BEAM virtual machine, though.

> The sad thing is that since we fail in the memory reclaimer, not in the 
> malloc(), you don't know which malloc() failed. This is 
> https://github.com/cloudius-systems/osv/issues/585.
> One ad-hoc thing you can try is to connect with gdb, and see which OSv thread 
> is waiting in malloc - and see what malloc() it is trying to do.

It fails allocating 24117248 bytes (23 MiB):

#9  0x003dde15 in std_malloc (size=size@entry=24117248, alignment=alignment@entry=16)
    at core/mempool.cc:1559
1559            ret = memory::malloc_large(size, alignment);

The size surprises me a little - so I need to understand what's going on. 
However, as you say, there is plenty of memory available, so I guess it is 
fragmentation.

> We also have an issue about huge malloc() calls not actually needing 
> consecutive memory: https://github.com/cloudius-systems/osv/issues/854 - if 
> this is fixed it will be easier to allocate large amounts of contiguous 
> memory with malloc().

Fixing that is probably beyond my understanding of OSv at the moment…

Cheers,
Rick



Memory limit?

2017-11-01 Thread Rick Payne

I’m stressing OSv a bit, and though I start the VM with 10G of memory, it seems 
to fail after just over 5GB. Is there a limit that I’m hitting, or perhaps my 
memory usage is fragmenting things too much?

Out of memory: could not reclaim any further. Current memory: 5122256 Kb
[backtrace]
0x003d9fcc 
0x003da861 
0x003da8eb 
0x003f09d6 
0x00391de2 

Cheers,
Rick



Re: OSv image under ProxMox VE5.1

2017-10-30 Thread Rick Payne

> On 31 Oct 2017, at 01:45, Player, Timmons  wrote:
> 
> I’ve run into the same issue under VMware on VM’s that lack a serial port.  
> I’ve used the following patch locally with success…

Ah yes, that works too, thanks!

Cheers,
Rick



Re: OSv image under ProxMox VE5.1

2017-10-30 Thread Rick Payne


> On 30 Oct 2017, at 17:40, Rick Payne  wrote:
> 
> 
> Anyone tried this? I took one of my apps, converted it to a raw image and dd 
> that into the ProxMox disk container. It boots, but seems to get horribly 
> confused by the console - continuously receiving something and taking 100% of 
> the CPU.

If anyone else gets stuck on this, I found the solution was to do this on the 
ProxMox server:

  qm set 101 -serial0 socket

where 101 is the VM ID.

Rick



OSv image under ProxMox VE5.1

2017-10-29 Thread Rick Payne

Anyone tried this? I took one of my apps, converted it to a raw image and dd 
that into the ProxMox disk container. It boots, but seems to get horribly 
confused by the console - continuously receiving something and taking 100% of 
the CPU.

I tried with just a very simple image (cli,http-server) and that boots and 
stops at the prompt, but I can’t type anything. Again, it’s consuming 100% of 
CPU.

Any suggestions, or anyone else tried this?

Cheers,
Rick



Re: lzloader compile issue

2017-10-01 Thread Rick Payne (Offshore)
> Thanks, I missed your original mail on this last year. I now opened an issue 
> about this, with your suggested workaround:
> https://github.com/cloudius-systems/osv/issues/913

I saw, thanks.

> Given the vast number of compiler versions that people use, it's not 
> surprising that once in a while we get a miscompiled
> and non-working OSv from one of them. Recently I worked around (see commit 
> c38806f49cd2d0610227b62ee7170983684c6987)
> a bug which caused the round() function to hang in an infinite loop (which 
> also caused the VM to hang while using all the CPU).

Understood - and I did see your round() fix and was hoping it fixed something 
for me - but alas not.

> It would be nice if you could try whether other optimization options (e.g., 
> -O1) are enough to prevent this bug or -O0 is really
> needed. With some experimentation/printouts/debugging I guess you can also 
> figure out what hangs, but I don't know how difficult
> it would be.
> 
> In your original patch (see issue 913) you changed only one source file to be 
> compiled without -O2, but in your latest patch, you
> changed more source files - was this necessary? In your original patch, you 
> only changed compilation of lzloader.cc; This is a trivial
> source file, and I don't mind at all to permanently compile this specific 
> file without optimization. However, the newer patch also changes the 
> compilation of
> the uncompression implementation - this can have effect in boot performance, 
> so if needed to change that too, we would need to
> measure its effect on boot speed.

It does indeed seem that the original patch is sufficient. It wasn’t at one 
time, I’m sure - but for now I can use the single change for both debug and 
release images.

$(call quiet, $(CXX) $(CXXFLAGS) -O0 -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)

It’s such a small file, but my assembler-fu is very rusty. Attached are 
objdumps of the two resulting compiles of lzloader.o: one with -O0, which 
results in a working image, and one with -O1, which results in a hang as the VM 
starts.


build/release.x64/fastlz/lzloader.o: file format elf32-i386


Disassembly of section .text:

00000000 <memset>:
extern char _binary_loader_stripped_elf_lz_end;
extern char _binary_loader_stripped_elf_lz_size;

// std libraries used by fastlz.
extern "C" void *memset(void *s, int c, size_t n)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 04                sub    $0x4,%esp
   7:   e8 fc ff ff ff          call   8
   c:   05 01 00 00 00          add    $0x1,%eax
return __builtin_memset(s, c, n);
  11:   83 ec 04                sub    $0x4,%esp
  14:   ff 75 10                pushl  0x10(%ebp)
  17:   ff 75 0c                pushl  0xc(%ebp)
  1a:   ff 75 08                pushl  0x8(%ebp)
  1d:   89 c3                   mov    %eax,%ebx
  1f:   e8 fc ff ff ff          call   20
  24:   83 c4 10                add    $0x10,%esp
}
  27:   8b 5d fc                mov    -0x4(%ebp),%ebx
  2a:   c9                      leave
  2b:   c3                      ret

0000002c <uncompress_loader>:

extern "C" void uncompress_loader()
{
  2c:   55                      push   %ebp
  2d:   89 e5                   mov    %esp,%ebp
  2f:   53                      push   %ebx
  30:   83 ec 04                sub    $0x4,%esp
  33:   e8 fc ff ff ff          call   34
  38:   05 01 00 00 00          add    $0x1,%eax
// pass a the last (maxout) parameter of fastlz_decompress. Let it
// uncompress as much as it has input. The Makefile already verifies
// that the uncompressed kernel doesn't overwrite this uncompression code.
// Sadly, "INT_MAX" is the largest number we can pass. If we ever need
// more than 2GB here, it won't work.
fastlz_decompress(&_binary_loader_stripped_elf_lz_start,
  3d:   8b 90 00 00 00 00       mov    0x0(%eax),%edx
(size_t) &_binary_loader_stripped_elf_lz_size,
BUFFER_OUT, INT_MAX);
  43:   68 ff ff ff 7f          push   $0x7fffffff
  48:   68 00 00 20 00          push   $0x200000
  4d:   52                      push   %edx
  4e:   8b 90 00 00 00 00       mov    0x0(%eax),%edx
  54:   52                      push   %edx
  55:   89 c3                   mov    %eax,%ebx
  57:   e8 fc ff ff ff          call   58
  5c:   83 c4 10                add    $0x10,%esp
}
  5f:   90                      nop
  60:   8b 5d fc                mov    -0x4(%ebp),%ebx
  63:   c9                      leave
  64:   c3                      ret

Disassembly of section .text.__x86.get_pc_thunk.ax:

00000000 <__x86.get_pc_thunk.ax>:
{
   0:   8b 

lzloader compile issue

2017-09-30 Thread Rick Payne
Hi,

I’m still having to apply this patch to get OSv to work on my Ubuntu 16.10 box. 
Without it, the produced image fails to start even for the initial cpio phase. 
It’s clearly some optimisation issue with gcc 6.2 (gcc version 6.2.0 20161005 
(Ubuntu 6.2.0-5ubuntu12)), so just in case anyone else is having a problem, 
this is what I use to fix it:

Cheers,
Rick

diff --git a/Makefile b/Makefile
index 8372cd8..50fb396 100644
--- a/Makefile
+++ b/Makefile
@@ -446,11 +446,11 @@ $(out)/arch/x64/boot32.o: $(out)/loader.elf

$(out)/fastlz/fastlz.o:
   $(makedir)
-   $(call quiet, $(CXX) $(CXXFLAGS) -O2 -m32 -fno-instrument-functions -o $@ -c fastlz/fastlz.cc, CXX fastlz/fastlz.cc)
+   $(call quiet, $(CXX) $(CXXFLAGS) -O0 -g -m32 -fno-instrument-functions -o $@ -c fastlz/fastlz.cc, CXX fastlz/fastlz.cc)

$(out)/fastlz/lz: fastlz/fastlz.cc fastlz/lz.cc | generated-headers
   $(makedir)
-   $(call quiet, $(CXX) $(CXXFLAGS) -O2 -o $@ $(filter %.cc, $^), CXX $@)
+   $(call quiet, $(CXX) $(CXXFLAGS) -O0 -g -o $@ $(filter %.cc, $^), CXX $@)

$(out)/loader-stripped.elf.lz.o: $(out)/loader-stripped.elf $(out)/fastlz/lz
   $(call quiet, $(out)/fastlz/lz $(out)/loader-stripped.elf, LZ loader-stripped.elf)
@@ -459,7 +459,7 @@ $(out)/loader-stripped.elf.lz.o: $(out)/loader-stripped.elf $(out)/fastlz/lz
$(out)/fastlz/lzloader.o: fastlz/lzloader.cc | generated-headers
   $(makedir)
   #$(call quiet, $(CXX) $(CXXFLAGS) -O2 -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)
-   $(call quiet, $(CXX) $(CXXFLAGS) -g -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)
+   $(call quiet, $(CXX) $(CXXFLAGS) -O0 -g -m32 -fno-instrument-functions -o $@ -c fastlz/lzloader.cc, CXX $<)

$(out)/lzloader.elf: $(out)/loader-stripped.elf.lz.o $(out)/fastlz/lzloader.o arch/x64/lzloader.ld \
   $(out)/fastlz/fastlz.o



Re: SO_BINDTODEVICE

2017-08-28 Thread Rick Payne
Hi,

> If you come across such differences, please open issues in the OSv bug 
> tracker, so even if we don't fix them immediately, we'll fix them eventually.
> 
> The intention of OSv is to be compatible with Linux's ABI, not BSD, so such 
> differences should be - eventually - eliminated.

Understood. I’ll start to open issues. There may be a few of them.

> Maybe, I'm not familiar with this ioctl so I don't really know if it's needed 
> or not. You can send the patch and we'll see. You can also open an issue on 
> the bug tracker and attach a patch, for future consideration if anybody will 
> want these ioctls in the future.

It’s not really the ioctl I’m talking about - it’s the command to the hypervisor 
to adjust the MAC address filter tables. I will probably want to be able to add 
more unicast addresses to that (in the VRRP case, we have to receive frames on 
our MAC address and the ‘virtual IP’ MAC address).

> Are you running qemu in bridged mode (run.py -n, or similar)? If so, I think 
> that yes, IGMP snooping happens by default on the bridge. E.g.,
> $ cat /sys/class/net/virbr0/bridge/multicast_snooping
> 1

Yes, that’s why I can receive the VRRP multicast frames, it seems. Internally, 
the BSD stack is adding the multicast MAC address to the filter list, which 
ends up in the virtio-net if_ioctl code, which doesn’t actually do anything. I 
fixed that so it builds the list of MACs to send to the hypervisor via the 
virtio-net control channel. As I’ll ultimately need it for the unicast ones, I 
needed the code done anyway. I’ll submit it in a bug in case anyone else wants it.
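A sketch of the payload such a patch would have to build, assuming the MAC filtering layout from the virtio specification: the VIRTIO_NET_CTRL_MAC class with the VIRTIO_NET_CTRL_MAC_TABLE_SET command (available when VIRTIO_NET_F_CTRL_RX is negotiated) carries two tables, unicast first and multicast second, each a 32-bit entry count (little-endian in virtio 1.x, which is also native order on x86) followed by 6-byte addresses. The function names are hypothetical and this is not OSv's driver code; the class/command header, the ack byte and queueing on the control virtqueue are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Class and command numbers for MAC filtering on the virtio-net control
 * virtqueue; they go in the command header, not in the payload built here. */
#define VIRTIO_NET_CTRL_MAC           1
#define VIRTIO_NET_CTRL_MAC_TABLE_SET 0

/* Write one table: 32-bit little-endian entry count, then the addresses. */
static size_t put_mac_table(uint8_t *out, const uint8_t (*macs)[MAC_LEN],
                            uint32_t n)
{
    out[0] = (uint8_t)(n & 0xff);
    out[1] = (uint8_t)((n >> 8) & 0xff);
    out[2] = (uint8_t)((n >> 16) & 0xff);
    out[3] = (uint8_t)((n >> 24) & 0xff);
    if (n > 0)
        memcpy(out + 4, macs, (size_t)n * MAC_LEN);
    return 4 + (size_t)n * MAC_LEN;
}

/* Hypothetical helper: build the MAC_TABLE_SET payload (unicast table
 * first, multicast table second) in 'out'. Returns the payload length,
 * or 0 if 'cap' is too small. */
size_t build_mac_table_set(uint8_t *out, size_t cap,
                           const uint8_t (*uc)[MAC_LEN], uint32_t n_uc,
                           const uint8_t (*mc)[MAC_LEN], uint32_t n_mc)
{
    size_t need = 8 + ((size_t)n_uc + n_mc) * MAC_LEN;
    if (cap < need)
        return 0;
    size_t off = put_mac_table(out, uc, n_uc);
    off += put_mac_table(out + off, mc, n_mc);
    assert(off == need);
    return off;
}
```

For the VRRP case, the unicast table would hold the interface's own MAC (e.g. QEMU's default 52:54:00:12:34:56) plus the VRRP virtual MAC for the router's VRID (00:00:5e:00:01:01 for VRID 1), and the multicast table would hold 01:00:5e:00:00:12 for the VRRP group 224.0.0.18, giving a 26-byte payload. As far as I read the spec, each TABLE_SET replaces the previous tables, so a driver would presumably resend both tables on every change.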

Cheers,
Rick



Re: SO_BINDTODEVICE

2017-08-26 Thread Rick Payne (Offshore)

> I saw that. I also noticed that multicast won’t work because the SIOCADDMULTI 
> call does nothing in the hypervisor drivers (so the multicast mac addresses 
> are never added to the filter tables for the interface concerned). I’m part 
> way through the code to use the controlq to handle that for virtio-net. I can 
> probably do the IFF_PROMISC flag at the same time.

So a quick update on this. I think I can do without the SO_BINDTODEVICE as the 
likelihood is that there is only a single interface involved. At least, I have 
this working without the SO_BINDTODEVICE now. I suspect if I had more than one 
interface, I couldn’t determine easily which interface the packet really 
arrived on though, and possibly would have issues sending to more than 1 
interface due to the routing.

However, it has highlighted how different the net stack is compared to Linux. 
There are a number of other gotchas. For instance, when receiving a message from 
a SOCK_RAW socket, Linux gives you the raw packet as is. The OSv/BSD stack 
gives you the full packet, but the length in the IP header is in host byte order 
and has had the IP header length subtracted. When sending raw frames, the OSv 
stack overwrites the TTL field (at least for multicast frames) regardless of 
what you set in the raw header (even with IP_HDRINCL set).
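A minimal C sketch contrasting the two ip_len conventions described above, plus a convention-independent alternative that derives the payload length from the byte count recv() returned. It assumes, per the description above, that OSv reports ip_len in host byte order with the header length already subtracted, and that the buffer holds a complete IPv4 packet starting at the header; the function names are made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IPv4 header length in bytes: the IHL field counts 32-bit words. */
size_t ipv4_hdr_len(const uint8_t *pkt)
{
    return (size_t)(pkt[0] & 0x0f) * 4;
}

/* Linux convention: ip_len is the total length (header + payload) in
 * network byte order. */
size_t payload_len_linux(const uint8_t *pkt)
{
    uint16_t v;
    memcpy(&v, pkt + 2, sizeof v); /* ip_len lives at byte offset 2 */
    return (size_t)ntohs(v) - ipv4_hdr_len(pkt);
}

/* Convention described above for OSv: host byte order, header length
 * already subtracted, so the field is the payload length as-is. */
size_t payload_len_bsd(const uint8_t *pkt)
{
    uint16_t v;
    memcpy(&v, pkt + 2, sizeof v);
    return v;
}

/* Convention-independent: trust the byte count recv() returned instead. */
size_t payload_len_from_recv(const uint8_t *pkt, size_t received)
{
    size_t hl = ipv4_hdr_len(pkt);
    assert(received >= hl);
    return received - hl;
}
```

Code that has to run on both Linux and OSv is probably safest using the last form and not reading ip_len at all.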

I guess we should probably note on the ABI Compatibility page that the network 
stack is not 100% compatible with Linux, just to avoid surprises.

One final thing: I wrote the code for virtio-net to handle SIOCADDMULTI and 
SIOCDELMULTI, but it seems that they may not be needed after all. Maybe 
QEMU/KVM does IGMP snooping or something, as I receive the multicast frames in 
OSv without my changes (wasted effort!). Is there any interest in the patch 
just for completeness?

Cheers,
Rick




Re: SO_BINDTODEVICE

2017-08-21 Thread Rick Payne (Offshore)

> On 22 Aug 2017, at 01:33, Nadav Har'El  wrote:
> 
> 
> I don't remember what libpcap uses on Linux. Please try :-)

Haven’t had a chance yet.

> If I remember correctly (I have't touched this stuff in two decades ;-)), 
> libpcap was implemented on BSD using a "BPF" (Berkeley Packet Filter) 
> bytecode language. I don't think we ever made any effort to make this work on 
> OSv, and looking now at the code, it seems to be #if 0'ed out, so it will 
> take some work to revive.

I saw that. I also noticed that multicast won’t work because the SIOCADDMULTI 
call does nothing in the hypervisor drivers (so the multicast MAC addresses are 
never added to the filter tables for the interface concerned). I’m part way 
through the code to use the controlq to handle that for virtio-net. I can 
probably do the IFF_PROMISC flag at the same time.
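For reference, the multicast MAC addresses that SIOCADDMULTI asks a driver to program follow the standard IPv4 mapping from RFC 1112: 01:00:5e followed by the low 23 bits of the group address, so the VRRP group 224.0.0.18 maps to 01:00:5e:00:00:12. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an IPv4 multicast group (given in host byte
 * order) to its Ethernet MAC: 01:00:5e + low 23 bits of the group. */
void ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
    assert((group >> 28) == 0xe); /* must be in 224.0.0.0/4 */
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((group >> 16) & 0x7f); /* top bit of this byte dropped */
    mac[4] = (uint8_t)((group >> 8) & 0xff);
    mac[5] = (uint8_t)(group & 0xff);
}
```

Because only 23 of the 28 variable group bits survive, 32 different groups share each MAC, so a hardware or hypervisor filter built this way can still pass some unwanted groups, which the IP layer then has to drop.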

I don’t know about vmxnet3 driver as I’m not familiar with the hypervisor that 
provides that. Can you point me at the code for the ‘other side’ for that?

Cheers,
Rick


