On Thu, 15 Oct 2020 17:08:28 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
>> This patch contains the changes associated with the first incubation round of the foreign linker
>> access API (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access
>> support (see JEP 393 [2] and associated pull request [3]).
>>
>> The main goal of this API is to provide a way to call native functions from Java code without the
>> need for intermediate JNI glue code. In order to do this, native calls are modeled through the
>> MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which
>> illustrates what the foreign linker support is, and how it should be used by clients.
>>
>> Disclaimer: the pull request mechanism isn't great at managing *dependent* reviews. For this
>> reason, I'm attaching a webrev which contains only the differences between this PR and the memory
>> access PR. I will be periodically uploading new webrevs, as new iterations come out, to try and
>> make the life of reviewers as simple as possible.
>>
>> A big thanks to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot
>> changes you see here, and without their help, the foreign linker support wouldn't be what it is
>> today. As usual, a big thanks to Paul Sandoz, who provided many insights (often by trying the bits
>> first hand).
>>
>> Thanks
>> Maurizio
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev
>>
>> Javadoc:
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html
>>
>> Specdiff (relative to [3]):
>> http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html
>>
>> CSR:
>> https://bugs.openjdk.java.net/browse/JDK-8254232
>>
>> ### API Changes
>>
>> The API changes are actually rather slim:
>>
>> * `LibraryLookup`
>>   * This class allows clients to look up symbols in native libraries; the interface is fairly
>>     simple; you can load a library by name, or absolute path, and then look up symbols in that
>>     library.
>> * `FunctionDescriptor`
>>   * This is an abstraction that is very similar, in spirit, to `MethodType`; it is, at its core,
>>     an aggregate of memory layouts for the function arguments/return type. A function descriptor
>>     is used to describe the signature of a native function.
>> * `CLinker`
>>   * This is the real star of the show. A `CLinker` has two main methods: `downcallHandle` and
>>     `upcallStub`; the first takes a native symbol (as obtained from `LibraryLookup`), a
>>     `MethodType` and a `FunctionDescriptor` and returns a `MethodHandle` instance which can be
>>     used to call the target native symbol. The second takes an existing method handle and a
>>     `FunctionDescriptor` and returns a new `MemorySegment` corresponding to a code stub allocated
>>     by the VM which acts as a trampoline from native code to the user-provided method handle.
>>     This is very useful for implementing upcalls.
>>   * This class also contains the various layout constants that should be used by clients when
>>     describing native signatures (e.g.
>>     `C_LONG` and friends); these layouts contain additional ABI classification information (in
>>     the form of layout attributes) which is used by the runtime to *infer* how Java arguments
>>     should be shuffled for the native call to take place.
>>   * Finally, this class provides some helper functions, e.g. so that clients can convert Java
>>     strings into C strings and back.
>> * `NativeScope`
>>   * This is a helper class which allows clients to group together logically related allocations;
>>     that is, rather than allocating separate memory segments using separate *try-with-resources*
>>     constructs, a `NativeScope` allows clients to use a _single_ block, and allocate all the
>>     required segments there. This is not only a usability boost, but also a performance boost,
>>     since not all allocation requests will be turned into `malloc` calls.
>> * `MemorySegment`
>>   * Only one method added here - namely `handoff(NativeScope)`, which allows a segment to be
>>     transferred onto an existing native scope.
>>
>> ### Safety
>>
>> The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native
>> method handle. For instance, the description of the native signature might be wrong (e.g. have
>> too many arguments) - and the runtime has, in the general case, no way to detect such mismatches.
>> For these reasons, obtaining a `CLinker` instance is a *restricted* operation, which can be
>> enabled by specifying the usual JDK property `-Dforeign.restricted=permit` (as is the case for
>> other restricted methods in the foreign memory API).
>>
>> ### Implementation changes
>>
>> The Java changes associated with `LibraryLookup` are relatively straightforward; the only
>> interesting thing to note here is that library loading does _not_ depend on class loaders, so
>> `LibraryLookup` is not subject to the same restrictions which apply to JNI library loading (e.g.
>> the same library cannot be loaded by different classloaders).
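To picture why grouping allocations in a `NativeScope` can avoid per-request `malloc` calls, here is a minimal sketch of the general technique (a bump allocator that hands out slices of one shared block). It is only an illustration under stated assumptions: `ArenaSketch`, `allocate` and `bytesUsed` are hypothetical names, the backing store is a plain direct `ByteBuffer`, and none of this is the `jdk.incubator.foreign` implementation.

```java
import java.nio.ByteBuffer;

/** Hypothetical toy arena; NOT the jdk.incubator.foreign NativeScope implementation. */
public class ArenaSketch implements AutoCloseable {
    private final ByteBuffer block; // one up-front allocation backs every request
    private boolean open = true;

    public ArenaSketch(int capacityBytes) {
        this.block = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Returns a slice of the shared block; alignment must be a power of two. */
    public ByteBuffer allocate(int size, int alignment) {
        if (!open) throw new IllegalStateException("arena is closed");
        // Round the bump pointer up to the requested alignment.
        int start = (block.position() + alignment - 1) & -alignment;
        if (start + size > block.capacity()) throw new OutOfMemoryError("arena exhausted");
        block.position(start);
        block.limit(start + size);
        ByteBuffer slice = block.slice();   // cheap view, no new native allocation
        block.limit(block.capacity());
        block.position(start + size);       // advance the bump pointer
        return slice;
    }

    /** Number of bytes consumed so far, including alignment padding. */
    public int bytesUsed() {
        return block.position();
    }

    /** Only marks the arena closed; this sketch leaves the memory to the GC. */
    @Override
    public void close() {
        open = false;
    }

    public static void main(String[] args) {
        try (ArenaSketch arena = new ArenaSketch(64)) {
            ByteBuffer a = arena.allocate(4, 4);
            ByteBuffer b = arena.allocate(8, 8);
            a.putInt(0, 42);
            b.putLong(0, 7L);
            // prints "42 7 used=16"
            System.out.println(a.getInt(0) + " " + b.getLong(0) + " used=" + arena.bytesUsed());
        }
    }
}
```

Both requests are served from the same 64-byte block inside a single try-with-resources statement, which is the usability and performance point the `NativeScope` bullet makes.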
>>
>> As for `NativeScope`, the changes are again relatively straightforward; it is an API which sits
>> neatly on top of the foreign memory access API, providing some kind of allocation service which
>> shares the same underlying memory segment(s), and turns an allocation request into a segment
>> slice, which is a much less expensive operation. `NativeScope` comes in two variants: there are
>> native scopes for which the allocation size is known a priori, and native scopes which can grow -
>> these two schemes are implemented by two separate subclasses of `AbstractNativeScopeImpl`.
>>
>> Of course the bulk of the changes are to support the `CLinker` downcall/upcall routines. These
>> changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes -
>> for further details, Jorn has put together a detailed writeup which explains the rationale behind
>> the VM support, with some references to the code [5].
>>
>> The main idea behind the foreign linker is to infer, given a Java method type (expressed as a
>> `MethodType` instance) and the description of the signature of a native function (expressed as a
>> `FunctionDescriptor` instance), a _recipe_ that can be used to turn a Java call into the
>> corresponding native call targeting the requested native function.
>>
>> This inference scheme can be defined in a pretty straightforward fashion by looking at the
>> various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on
>> Linux/Mac). The various `CallArranger` classes, of which we have a flavor for each supported
>> platform, do exactly that kind of inference.
>>
>> For the inference process to work, we need to attach extra information to memory layouts; it is
>> no longer sufficient to know e.g.
>> that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point
>> value, or an integral value; this knowledge is required because floating point values are passed
>> in different registers by most ABIs. For this reason, `CLinker` offers a set of pre-baked,
>> platform-dependent layout constants which contain the required classification attributes (e.g. a
>> `CLinker.TypeKind` enum value). The runtime extracts this attribute, and performs classification
>> accordingly.
>>
>> A native call is decomposed into a sequence of basic, primitive operations, called `Binding`
>> (see the great javadoc on the `Binding.java` class for more info). There are many such bindings -
>> for instance the `Move` binding is used to move a value into a specific machine register/stack
>> slot. So, the main job of the various `CallArranger` classes is to determine, given a Java
>> `MethodType` and `FunctionDescriptor`, what is the set of bindings associated with the
>> downcall/upcall.
>>
>> At the heart of the foreign linker support is the `ProgrammableInvoker` class. This class
>> effectively generates a `MethodHandle` which follows the steps described by the various bindings
>> obtained by `CallArranger`. There are actually various strategies to interpret these bindings -
>> listed below:
>>
>> * basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine
>>   written in Java (see `BindingInterpreter`), except for the `Move` bindings. For these bindings,
>>   the move is implemented by allocating a *buffer* (whose size is ABI specific) and by moving all
>>   the lowered values into positions within this buffer. The buffer is then passed to a piece of
>>   assembly code inside the VM which takes values from the buffer and moves them into their
>>   expected registers/stack slots (note that each position in the buffer corresponds to a
>>   different register).
>>   This is the most general invocation mode, the most "customizable" one, but also the slowest -
>>   since for every call there is some extra allocation which takes place.
>>
>> * specialized interpreted mode; same as before, but instead of interpreting the bindings with a
>>   stack-based interpreter, we generate a method handle chain which effectively interprets all the
>>   bindings (again, except `Move` ones).
>>
>> * intrinsified mode; this is typically used in combination with the specialized interpreted mode
>>   described above (although it can also be used with the Java-based binding interpreter). The
>>   goal here is to remove the buffer allocation and copy by introducing an additional JVM
>>   intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is
>>   probably the case if the native method handle is stored in a `static`, `final` field), then the
>>   VM can generate specialized assembly code which interprets the `Move` binding without the need
>>   to go through an intermediate buffer. This gives us back performance that is on par with JNI.
>>
>> For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is
>> available there. We plan to add support for intrinsified modes there as well, which should
>> considerably boost performance (probably well beyond what JNI can offer at the moment, since the
>> upcall support in JNI is not very well optimized).
>>
>> Again, for further reading on the internals of the foreign linker support, please refer to [5].
>>
>> #### Test changes
>>
>> Many new tests have been added to validate the foreign linker support; we have high level tests
>> (see `StdLibTest`) which aim at testing the linker from the perspective of code that clients
>> could write. But we also have deeper combinatorial tests (see `TestUpcall` and `TestDowncall`)
>> which are meant to stress every corner of the ABI implementation.
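As a rough illustration of the recipe inference described above, the sketch below assigns a "move" target to each scalar argument following the System V x86-64 convention (integer values in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`; floating point values in `xmm0` to `xmm7`; the rest on the stack). `ToyArranger`, `Kind` and `arrange` are hypothetical names, structs and variadic calls are ignored, and this is not how the JDK's `CallArranger` classes are written.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy arranger for scalar arguments only; NOT the JDK's CallArranger. */
public class ToyArranger {
    /** Stand-in for the classification attribute attached to a layout. */
    public enum Kind { INTEGER, FLOAT }

    // System V x86-64 argument registers for scalar values, in assignment order.
    private static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    private static final String[] FLOAT_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    /** Returns one move target per argument: a register name or a stack slot. */
    public static List<String> arrange(Kind... args) {
        List<String> moves = new ArrayList<>();
        int nextInt = 0;
        int nextFloat = 0;
        int stackOffset = 0;
        for (Kind kind : args) {
            if (kind == Kind.INTEGER && nextInt < INT_REGS.length) {
                moves.add(INT_REGS[nextInt++]);
            } else if (kind == Kind.FLOAT && nextFloat < FLOAT_REGS.length) {
                moves.add(FLOAT_REGS[nextFloat++]);
            } else {
                // Registers of this class are exhausted: spill to the stack.
                moves.add("stack+" + stackOffset);
                stackOffset += 8; // each scalar occupies one 8-byte stack slot
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Models a native signature shaped like f(int, double, long).
        System.out.println(arrange(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER));
        // prints "[rdi, xmm0, rsi]"
    }
}
```

Because the output is plain data, the assignment can be checked on any host platform without making a native call.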
>>
>> There are also some great tests (see the `callarranger` folder) which test the various
>> `CallArranger`s for all the possible platforms; these tests adopt more of a white-box approach -
>> that is, instead of treating the linker machinery as a black box and verifying that the support
>> works by checking that the native call returned the results we expected, these tests aim at
>> checking that the set of bindings generated by the call arranger is correct. This also means
>> that we can test the classification logic for Windows, Mac and Linux regardless of the platform
>> we're executing on.
>>
>> Some additional microbenchmarks have been added to compare the performance of downcall/upcall
>> with JNI.
>>
>> [1] - https://openjdk.java.net/jeps/389
>> [2] - https://openjdk.java.net/jeps/393
>> [3] - https://git.openjdk.java.net/jdk/pull/548
>> [4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
>> [5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since
> the last revision:
>
>   Re-add file erroneously deleted (detected as rename)

I looked through some Hotspot runtime code and that looks ok. I saw a couple of strange things on my way through the code. See comments.

src/hotspot/cpu/x86/foreign_globals_x86.cpp line 2:

> 1: /*
> 2:  * Copyright (c) 2018, Oracle and/or its affiliates. All rights reserved.

Copyright should be 2020. All the new files should have 2020 as the copyright; a bunch don't.

src/hotspot/cpu/x86/foreign_globals_x86.cpp line 56:

> 54: }
> 55:
> 56: const ABIDescriptor parseABIDescriptor(JNIEnv* env, jobject jabi) {

I don't know if you care about performance, but all of these env-> calls transition into the VM and back out again. You should prefix all the code that comes from Java to native with JNI_ENTRY and just use native JVM code to implement these.

src/hotspot/cpu/x86/foreign_globals_x86.hpp line 32:

> 30: #define __ _masm->
> 31:
> 32: struct VectorRegister {

Why are these structs and not classes?

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 3885:

> 3883:
> 3884:   __ flush();
> 3885: }

I think as a future RFE we should refactor this function and generate_native_wrapper since they're similar (this is nicer to read). If I can remove is_critical_native code they will be more similar.

-------------

Changes requested by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/634