On Thu, Aug 18, 2016 at 11:04 AM, Antti Kantee <[email protected]> wrote:
> On 18/08/16 16:41, Myungho Jung wrote:
>
>>> re: dlsym(). At some point I thought we could "emulate" dlsym(), at least
>>> as good as what you've done, by building a symbol address table using the
>>> toolchain and linking that table into the finished image. Now that there's
>>> a use case, it might be worth doing and cutting down the patches that way.
>>>
>> Yes, I just generated it from the source code using grep and sed, but
>> that's not a good way. I referenced the patch below, which supports
>> static builds on Mac OS X.
>>
>> http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/55573c377d64
>>
>> I found that if the first argument of dlopen is NULL, it returns a handle
>> to the main program. But this did not work in Rump. Is it possible to add
>> a dl library to rump that works like this instead of returning a NULL
>> pointer?
>>
>
> What do you specifically mean by "this"? Yes, we can add a reasonably
> correct approximation of dl*(). That said, I'm wary of adding completely
> bogus emulation, since if those routines are present, people (and
> especially programs/configure scripts) tend to assume they work the same as
> on every other platform.

I mean 'dlopen(NULL, RTLD_FIRST)' is supposed to return a handle to the
running binary itself. So, if that is also possible in rump, we can just
replace dlopen("filename", flags) with dlopen(NULL, flags), though we should
take care about name conflicts between libraries. The problem is that on
NetBSD the dl* functions are not in a library but in ld.so, and are linked at
runtime. If ld.so could be built as a static library, the ugly lookup table
in my patch would be unnecessary. I tried to extract ld.so from the NetBSD
source in the rumprun repo, but failed to build it as a static library
because of complicated dependencies. Alternatively, dlsym might be emulated
using the BFD library (https://sourceware.org/binutils/docs/bfd/). But in
that case the java binary would have to be included in the file system
image, and I also failed to build the library in rump.

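
As a rough sketch of the dlopen(NULL, flags) idea on a platform with a
working dynamic linker (not on rump, where it currently returns NULL): the
helper name lookup_in_self is made up for illustration, and RTLD_LAZY is
used instead of the macOS-specific RTLD_FIRST so the sketch stays portable.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper: resolve `name` through a handle to the running
 * program itself. Returns NULL if the handle or the symbol is unavailable.
 */
void *lookup_in_self(const char *name)
{
    assert(name != NULL);

    /* A NULL filename asks the dynamic linker for the main program. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;

    void *addr = dlsym(self, name);

    /* Only drops the reference taken above; the program stays loaded. */
    dlclose(self);
    return addr;
}
```

With this, a call such as lookup_in_self("strlen") would normally return a
non-NULL address on a dynamically linked system, because the search covers
the libraries loaded along with the program. Symbols defined in the
executable itself are usually only found if they were exported (for
example with -rdynamic on GNU toolchains).
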
>
>>> re: libffi patch: I wonder if there was a reason for originally writing
>>> the code in that order, doesn't really seem to make a difference. However,
>>> doesn't the same memory corruption problem exist in the same file at least
>>> for functions returning a struct?
>>>
>>
>>
>> Actually, I'm not sure what causes the problem. It only happens randomly
>> when running jetty, but you can definitely find the bug with gdb. You may
>> need to compile libffi and OpenJDK in debug mode using the -g flag. First,
>> set a breakpoint at line 78 of unix64.S in libffi: leaq 24(%rbp), %rsp.
>> Then, if you run next and print *(int*)$rbp, it will show 0xe02b, with
>> garbage values at the following addresses, like *(int*)($rbp+8*x). That
>> makes it jump to a wrong address and causes a page fault. It is weird
>> because I could not see any other threads in gdb with info threads. I'm
>> guessing that it may be caused by a bug in netbsd or rump.
>>
>
> The problem is, like the unchanged comment in the patch says, the amd64
> red zone. On amd64, programs are allowed to use 128 bytes below the stack
> pointer. On a normal operating system with multiple privilege levels,
> handling an interrupt while executing usermode code switches to another
> stack. However, that does not happen on Rumprun, since everything already
> runs in ring0. C code is explicitly compiled with -mno-red-zone, which
> causes the compiler to never generate code which uses the 128-byte area.
> However, that compiler flag does not affect handwritten assembly.
>
> So, essentially, if your code is using the red zone, and an interrupt
> occurs, whatever the interrupt handler pushes onto the stack will corrupt
> the application stack. One way to fix the problem is to adjust the code to
> not use the red zone. Another way (easier?) is to cli/sti around the red
> zone use.
>
> It's a bit unfortunate that the whole thing leaks through into application
> code causing lossage like the one you saw. I'm not really aware of an easy
> workaround, but then again I don't know anything about x86 anyway.
>
>
Now I understand what caused the problem, though I need to think more about
how to fix it. The page fault I found in the server build may have the same
cause as well.

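
To make the red zone failure concrete, here is a small simulation in plain
C. It is not libffi code and does not touch the real stack: the array, the
5-word interrupt frame, and the names fake_interrupt and local_survives are
all assumptions made for illustration. It only models the case where the
interrupt frame is pushed onto the same stack the code is running on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

static uint64_t stack[STACK_WORDS]; /* simulated stack, grows toward index 0 */
static size_t sp;                   /* simulated stack pointer (word index) */

/*
 * Simulated interrupt with no stack switch: the frame is pushed directly
 * below the current stack pointer (5 words here, an illustrative size).
 */
static void fake_interrupt(void)
{
    assert(sp >= 5);
    size_t isp = sp;
    for (int i = 0; i < 5; i++)
        stack[--isp] = 0xdeadbeefULL;
}

/* Returns 1 if a local stored by a leaf routine survives the interrupt. */
int local_survives(int use_red_zone)
{
    sp = STACK_WORDS / 2;
    size_t slot = sp - 1;   /* first word below the entry stack pointer */

    if (!use_red_zone)
        sp = slot;          /* reserve the slot by moving sp first */

    stack[slot] = 42;       /* with the red zone, this sits below sp */
    fake_interrupt();
    return stack[slot] == 42;
}
```

Here local_survives(1) models code that keeps a local in the red zone, and
the interrupt frame overwrites it. local_survives(0) models code that
lowers the stack pointer before storing, so the frame lands below the
local and the value is preserved.
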
>>> It's been building for a while now -- the hg checkout alone took several
>>> minutes. I assume it'll be building for a while longer. I'll let you know
>>> what happens ;)
>>>
>> So, I'm considering uploading the full source to github.
>>
>
> Whatever you decide to do, do *NOT* include sources in rumprun-packages.
> The idea is to keep the rumprun-packages repo slim, and people who are
> interested in e.g. rust should not have to fetch half a gig of java sources
> to get to the rust wrapper Makefiles.
>
>
I got it. Then I'll look for a better way to shorten the patch by emulating
dlsym.
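
A minimal sketch of what a table-based dlsym emulation could look like,
assuming the table is produced at build time and linked into the image. The
names sym_entry, sym_table, table_dlsym, demo_add and demo_neg are all
hypothetical; a real table would list the JDK's native entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of the hypothetical build-time symbol table. */
struct sym_entry {
    const char *name;
    void *addr;
};

/* Stand-ins for the functions that would normally be looked up. */
static int demo_add(int a, int b) { return a + b; }
static int demo_neg(int a) { return -a; }

/*
 * In a real build this array would be generated by the toolchain.
 * Storing function addresses as void * follows the dlsym() convention.
 */
static const struct sym_entry sym_table[] = {
    { "demo_add", (void *)demo_add },
    { "demo_neg", (void *)demo_neg },
};

/* dlsym()-like lookup: returns the address for `name`, or NULL. */
void *table_dlsym(const char *name)
{
    assert(name != NULL);

    size_t n = sizeof(sym_table) / sizeof(sym_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(sym_table[i].name, name) == 0)
            return sym_table[i].addr;
    }
    return NULL;
}
```

A caller casts the result back to the expected function type, just as with
dlsym(), for example
int (*add)(int, int) = (int (*)(int, int))table_dlsym("demo_add");
A linear scan is enough for a sketch; a generated table could be sorted
and searched with bsearch() if it grows large.
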
Thanks!
Myungho