On Thu, Aug 6, 2020 at 1:00 AM Rick Payne <[email protected]> wrote:

>
> I've been noodling with the old 'assigned virtio' code, trying to make
> it work again so I can use this method to get raw packets. See
> discusion on the 'raw sockets' thread.
>
> I'm mostly there (though I will admit, I'm far from a C++ programmer,
> so the code is pretty hideous), but I ran into an issue with the
> interrupt routine crashing. It turns out that if the interrupt routine
> is in my .so file, then the symbol resolution is not done at load time,
> rather its done on first call.
>
> I was hitting an assert in resolve_pltgot. I remember having a similar
> issue a while ago (2016!) - back then it was a bug in
> pthread_spin_lock. In this case, because the symbols aren't resolved on
> loading of the .so file, the first time the isr is called, we try to
> solve the missing symbols. However, we are not sched::preemptable() at
> that point, so the assert fires.
>
> I tried the existing isr code, copying it into my code. The wait part
> looks like this:
>
> void receiver(virtio::vring *vq)
> {
>   while (1) {
>     sched::thread::wait_until([&] {
>       bool have_elements = vq->used_ring_not_empty();
>       if (!have_elements) {
>         vq->enable_interrupts();
>
>         // we must check that the ring is not empty *after*
>         // we enable interrupts to avoid a race where a packet
>         // may have been delivered between queue->used_ring_not_empty()
>         // and queue->enable_interrupts() above
>         have_elements = vq->used_ring_not_empty();
>         if (have_elements) {
>           vq->disable_interrupts();
>         }
>       }
>       return have_elements;
>     });
>   ...
>
>
> sched::thread::wait_until and sched::thread::stop_wait are not
> resolved. I changed resolve_pltgot to what was suggested back in 2016:
>
> diff --git a/core/elf.cc b/core/elf.cc
> index ffb16004..5cdf1ec2 100644
> --- a/core/elf.cc
> +++ b/core/elf.cc
> @@ -816,7 +816,7 @@ void object::relocate_pltgot()
>
>  void* object::resolve_pltgot(unsigned index)
>  {
> -    assert(sched::preemptable());
> +    // assert(sched::preemptable());
>      auto rel = dynamic_ptr<Elf64_Rela>(DT_JMPREL);
>      auto slot = rel[index];
>      auto info = slot.r_info;
> @@ -826,6 +826,14 @@ void* object::resolve_pltgot(unsigned index)
>      void *addr = _base + slot.r_offset;
>      auto sm = symbol(sym);
>
> +    if (!sched::preemptable()) {
> +        auto nameidx = (dynamic_ptr<Elf64_Sym>(DT_SYMTAB) + sym)-
> >st_name;
> +        auto name = dynamic_ptr<const char>(DT_STRTAB) + nameidx;
> +        std::cerr << "resolve_pltgot " << demangle(name) <<
> +          " in " << pathname() << " found in '" <<
> +          sm.obj->pathname() << "'\n";
> +    }
> +
>      if (sm.obj != this) {
>          WITH_LOCK(_used_by_resolve_plt_got_mutex) {
>              _used_by_resolve_plt_got.insert(sm.obj-
> >shared_from_this());
>
> This no longer asserts, and shows the symbols being resolved:
>
> resolve_pltgot _ZN5sched6thread9stop_waitEv
> (sched::thread::stop_wait()) in /tests/tst-assign-virtio.so found in ''
> Handling packets
> resolve_pltgot _ZN6virtio5vring17enable_interruptsEv
> (virtio::vring::enable_interrupts()) in /tests/tst-assign-virtio.so
> found in ''
> resolve_pltgot _ZN5sched6thread4waitEv (sched::thread::wait()) in
> /tests/tst-assign-virtio.so found in ''
>
> So what is the appropriate solution here? I could try and arrange for
> my code to be linked with "-Wl,-z,now" but that feels like quite a big
> hammer (as that could then apply to the whole app?)
>
> OTOH, it feels unnatural to be relying on symbol lookups in an isr. If
> resolve_pltgot really does not need the assert check (whch was
> suggested to be the case in 2016), maybe there is nothing to do here
> other than comment out an assert?
>

The assert was there to make a rare random crash into a reliable crash. The
thing is
that symbol resolution needs to take a mutex, because in very rare cases it
might be
running in parallel to someone loading an additional shared object. This
taking of a mutex
may block the thread, and when in non-preemptable state, will cause a crash
at that
point. So the idea of the assert is to make you aware of this problem much
earlier,
before you start seeing it once a year in the field.

This was explained in the commit that added this assert, back in 2014:
930cb83d9.

The "z,now" solution will work, but there is another problem you should be
aware of:
When user code is running in non-preemptable state, both the code's stack,
and the code
itself, must already be in memory and cannot be swapped-in. The latter in
particular means
that the executable must be entirely loaded into memory, not page by page
("demand paging")
as we normally do.

We have a trick that will do *both *the on-load symbol resolution and
loading the entire object
and not page by page: If you add:

asm(".pushsection .note.osv-mlock, \"a\"; .long 0, 0, 0; .popsection");

To your source code, both things will auto-magically happen :-)
We have in <osv/elf.hh> a macro doing this, OSV_ELF_MLOCK_OBJECT(), but you
don't really need that macro, you can just copy this line.

I suggest you use this trick, instead of the "z,now" thing.

Unless you executable is huge, I don't think you'll have problems doing
this for the entire executable (you only need this "section" thing in one
place in your source code), but if that bothers you, you can always put the
problematic code in a separate shared object, and only that object will be
marked with this special section.


>
> Probably I should arrange the code so that I can call
> virtio_driver::wait_for_queue, but at the moment, I can't. I don't
> think we should assume that the user's code would want that - or do we?
>
> Thoughts?
>
> Rick
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/7fa151d59007a824119e83d6906f64ea1f94fe3d.camel%40rossfell.co.uk
> .
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjsrSKm4MGECMBi92jG_KqFmV382XkLA4ZsRueM4obd1AQ%40mail.gmail.com.

Reply via email to