On 18/08/16 16:41, Myungho Jung wrote:
re: dlsym(). At some point I thought we could "emulate" dlsym(), at least
as good as what you've done, by building a symbol address table using the
toolchain and linking that table into the finished image. Now that there's
a use case, it might be worth doing and cutting down the patches that way.
Yes, I just generated it from the source code using grep and sed, but
that's not a good way to do it. I referred to the patch below, which adds
support for static builds on Mac OS X.
http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/55573c377d64
I found that if the first argument of dlopen() is NULL, it returns a handle
for the main program. However, this did not work on Rump. Is it possible
to add a dl library that works like this to Rump, instead of one that
returns a NULL pointer?
What do you specifically mean by "this"? Yes, we can add a reasonably
correct approximation of dl*(). That said, I'm wary of adding
completely bogus emulation, since if those routines are present, people
(and especially programs/configure scripts) tend to assume they work the
same as on every other platform.
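For illustration, the symbol address table idea could look roughly like
the sketch below. All names in it (static_dlsym, static_symtab, demo_add,
demo_answer) are hypothetical and not part of rumprun or any existing
patch. The table is written by hand here; the assumption is that a real
one would be generated at build time and linked into the image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbols standing in for whatever the application looks up. */
static int demo_add(int a, int b) { return a + b; }
static int demo_answer = 42;

/* One table entry: symbol name and its address in the linked image. */
struct static_sym {
    const char *name;
    void *addr;
};

/*
 * Hand-written here for the sketch; assumed to be generated by the build.
 * Storing a function pointer as void * is not strict ISO C, but it is the
 * same conversion that POSIX requires for dlsym().
 */
static const struct static_sym static_symtab[] = {
    { "demo_add",    (void *)demo_add },
    { "demo_answer", &demo_answer },
};

/* Look a name up in the static table; returns NULL when it is not listed. */
void *static_dlsym(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(static_symtab) / sizeof(static_symtab[0]); i++) {
        if (strcmp(static_symtab[i].name, name) == 0)
            return static_symtab[i].addr;
    }
    return NULL;
}
```

A caller would cast the result back to the expected type, just as with
the real dlsym(). Note that this only resolves names listed in the table,
which is exactly the kind of partial behaviour that could mislead
configure scripts.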
re: libffi patch: I wonder if there was a reason for originally writing
the code in that order; it doesn't really seem to make a difference. However,
doesn't the same memory corruption problem exist in the same file at least
for functions returning a struct?
Actually, I'm not sure what causes the problem. It only happens randomly
when running Jetty, but you can definitely find the bug with gdb. You may
need to compile libffi and OpenJDK in debug mode using the -g flag. First,
set a breakpoint at line 78 of unix64.S in libffi: leaq 24(%rbp), %rsp.
Then, if you run next and print *(int*)$rbp, it will show 0xe02b, with
garbage values at the following addresses such as *(int*)($rbp+8*x). This
causes a jump to a wrong address and a page fault. It is weird because I
could not see any other threads in gdb with info threads. I'm guessing that
it may be caused by a bug in NetBSD or Rump.
The problem is, like the unchanged comment in the patch says, the amd64
red zone. On amd64, programs are allowed to use the 128 bytes below the stack
pointer. On a normal operating system with multiple privilege levels,
handling an interrupt while executing usermode code switches to another
stack. However, that does not happen on Rumprun, since everything
already runs in ring0. C code is explicitly compiled with
-mno-red-zone, which causes the compiler to never generate code which
uses the 128-byte area. However, that compiler flag does not affect
handwritten assembly.
So, essentially, if your code is using the red zone, and an interrupt
occurs, whatever the interrupt handler pushes onto the stack will
corrupt the application stack. One way to fix the problem is to adjust
the code to not use the red zone. Another way (easier?) is to cli/sti
around the red zone use.
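The failure mode can be modelled in plain C. This is only a toy
simulation, not real stack manipulation: the struct sim_cpu type and all
function names are made up for illustration, the stack is an array of
8-byte words, and the "interrupt" simply writes five words below the
stack pointer, which is the size of the frame the CPU pushes in 64-bit
mode (SS, RSP, RFLAGS, CS, RIP). The stack alignment the CPU performs
before pushing is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_STACK_WORDS 64  /* size of the modelled stack in 8-byte words */

/* Toy CPU state: the stack grows towards lower indices, sp is its top. */
struct sim_cpu {
    uint64_t stack[SIM_STACK_WORDS];
    size_t sp;
};

void sim_cpu_init(struct sim_cpu *cpu)
{
    memset(cpu, 0, sizeof(*cpu));
    cpu->sp = SIM_STACK_WORDS;  /* empty stack */
}

/*
 * Model an interrupt taken without a stack switch: the frame lands
 * directly below sp, and sp is back where it was once the handler returns.
 */
static void sim_interrupt(struct sim_cpu *cpu, size_t frame_words)
{
    assert(cpu->sp >= frame_words);
    for (size_t i = 1; i <= frame_words; i++)
        cpu->stack[cpu->sp - i] = 0xdeadbeefULL;
}

/* Leaf routine keeping a local in the red zone: below sp, sp not moved. */
uint64_t leaf_using_red_zone(struct sim_cpu *cpu, uint64_t value)
{
    cpu->stack[cpu->sp - 1] = value;
    sim_interrupt(cpu, 5);           /* interrupt arrives mid-routine */
    return cpu->stack[cpu->sp - 1];  /* local has been overwritten */
}

/* Same routine, but reserving the slot by moving sp first. */
uint64_t leaf_without_red_zone(struct sim_cpu *cpu, uint64_t value)
{
    cpu->sp -= 1;
    cpu->stack[cpu->sp] = value;
    sim_interrupt(cpu, 5);           /* frame now lands below the local */
    uint64_t out = cpu->stack[cpu->sp];
    cpu->sp += 1;
    return out;
}
```

In this model the first routine reads back garbage while the second
reads back what it stored, which corresponds to the first suggested fix
(adjust the code to not use the red zone).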
It's a bit unfortunate that the whole thing leaks through into
application code causing lossage like the one you saw. I'm not really
aware of an easy workaround, but then again I don't know anything about
x86 anyway.
It's been building for a while now -- the hg checkout alone took several
minutes. I assume it'll be building for a while longer. I'll let you know
what happens ;)
So, I'm considering uploading the full source to GitHub.
Whatever you decide to do, do *NOT* include sources in rumprun-packages.
The idea is to keep the rumprun-packages repo slim, and people who are
interested in e.g. rust should not have to fetch half a gig of java
sources to get to the rust wrapper Makefiles.