Hi swift-evolution, 

For the last few weeks, I've been working on introducing some Swift in a pure-C 
codebase. While the Clang importer makes the process quite smooth, there are 
still some rough edges.

Here is a (lengthy) proposal resulting from that experience.
Rendered version: 
https://gist.github.com/Fruneau/fa83fe87a316514797c1eeaaaa2e5012

Introduction
=======

Directly importing C APIs is a core feature of the Swift compiler. In that 
process, C pointers are systematically imported as `Unsafe*Pointer` Swift 
values. However, in C we distinguish between pointers that reference 
a single object and those that point to an array of objects. In the case of a 
single object of type `T`, the Swift compiler should be able to import the 
parameter `T *` as an `inout T`, and `T const *` as `T`. Since the compiler 
cannot make the distinction between these kinds of pointers by itself, we 
propose to add a C pointer attribute for that purpose.

Motivation
=======

Let us consider the following C API:

```c
typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;

/** Append the string \p str to \p sb. */
void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);

/** Append the content of \p other to \p sb. */
void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);

/** Returns the amount of available memory of \p sb. */
int sb_avail(const sb_t * _Nonnull sb);
```

This is imported in Swift as follows:

```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32
}

func sb_adds(_ sb: UnsafeMutablePointer<sb_t>, _ str: UnsafePointer<Int8>)
func sb_addsb(_ sb: UnsafeMutablePointer<sb_t>, _ other: UnsafePointer<sb_t>)
func sb_avail(_ sb: UnsafePointer<sb_t>) -> Int32
```

`sb_adds()` takes two pointers: the first one is supposed to point to a single 
object named `sb` that will be mutated in order to add the content of `str`, 
which points to a C string. So we have two kinds of pointers: the first points 
to a single object, the second to a buffer. But both are represented using 
`Unsafe*Pointer`. Swift cannot actually tell those two kinds of pointers apart 
since the C language provides no way to express the difference.

`sb_addsb()` takes two objects of type `sb_t`. The first is mutated by the 
function by appending the content of the second one, which is `const`. The 
constness is properly reflected in Swift. However, the usage of the imported 
API in Swift might be surprising since Swift requires an `inout` argument 
(the `&object` syntax) in order to build an `Unsafe*Pointer` value:

```swift
var sb = sb_t(...)
let sb2 = sb_t(...)

// error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
sb_addsb(&sb, &sb2)

// error: cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
sb_addsb(&sb, sb2)

var sb3 = sb_t(...)
sb_addsb(&sb, &sb3) // works
```

```swift
// cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
sb_avail(&sb2)
```


However, Swift also provides the `swift_name()` attribute that allows remapping 
a C function to a Swift method, which includes mapping one of the parameters to 
`self:`:

```c 
__attribute__((swift_name("sb_t.add(self:string:)")))
void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);
__attribute__((swift_name("sb_t.add(self:other:)")))
void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);
__attribute__((swift_name("sb_t.avail(self:)")))
int sb_avail(const sb_t * _Nonnull sb);
```

```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32

    mutating func add(string: UnsafePointer<Int8>)
    mutating func add(other: UnsafePointer<sb_t>)
    func avail() -> Int32
}
```

With that attribute used, there is no need to convert the parameter mapped to 
`self:` to an `Unsafe*Pointer`. As a consequence, we have an improved API:

```swift
sb2.avail() // This time it works!
```

But we also have some inconsistent behavior since only `self:` is affected by 
this:

```swift
// error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
sb.add(other: &sb2)

// error: cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
sb.add(other: sb2)
```


What we observe here is that mapping an argument to `self:` is enough for the 
compiler to be able to change its semantics. As soon as it knows the pointer is 
actually the pointer to a single object, it can deal with it without exposing 
it as an `Unsafe*Pointer`, making the API safer and less surprising.


Proposed solution
================

A new qualifier could be added to inform the compiler that a pointer points to 
a single object. The Swift compiler could then use that new piece of 
information to generate APIs that use the object type directly instead of the 
pointer type. We propose the introduction of a new qualifier named `_Ref`, 
semantically similar to a C++ reference. That is:

* `_Ref` is applied with the same grammar as the `_Nonnull`/`_Nullable` 
family.
* A pointer tagged `_Ref` cannot be used to access more than the single pointed 
object.
* A pointer tagged `_Ref` is non-owning.

Parameters qualified with `_Ref` would then be imported in Swift as follows:

* `T * _Ref _Nonnull` is imported as `inout T`
* `T * _Ref _Nullable` is imported as `inout T?`
* `T const * _Ref _Nonnull` is imported as `T`
* `T const * _Ref _Nullable` is imported as `T?`
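
The four combinations above can be sketched on the C side. This is only an illustration under stated assumptions: no compiler implements `_Ref` today, so the sketch defines it as an empty macro, and the nullability qualifiers are stubbed out for compilers that lack them. The `counter_t` type and its functions are hypothetical names that are not part of the proposal. The comments describe what the importer would produce if the proposal were adopted.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: `_Ref` is not implemented anywhere, so it expands to nothing. */
#ifndef _Ref
#  define _Ref
#endif

/* Nullability qualifiers are a Clang extension; stub them out elsewhere. */
#ifndef __has_feature
#  define __has_feature(x) 0
#endif
#if !__has_feature(nullability)
#  ifndef _Nonnull
#    define _Nonnull
#  endif
#  ifndef _Nullable
#    define _Nullable
#  endif
#endif

/* Hypothetical type used only for this illustration. */
typedef struct counter_t {
    int value;
} counter_t;

/* `T * _Ref _Nonnull`: would be imported as `inout counter_t`. */
void counter_incr(counter_t * _Ref _Nonnull c)
{
    c->value++;
}

/* `T * _Ref _Nullable`: would be imported as `inout counter_t?`. */
void counter_reset(counter_t * _Ref _Nullable c)
{
    if (c != NULL) {
        c->value = 0;
    }
}

/* `T const * _Ref _Nonnull`: would be imported as `counter_t`. */
int counter_get(const counter_t * _Ref _Nonnull c)
{
    return c->value;
}

/* `T const * _Ref _Nullable`: would be imported as `counter_t?`. */
int counter_get_or(const counter_t * _Ref _Nullable c, int fallback)
{
    return c != NULL ? c->value : fallback;
}
```

Every function only dereferences the single pointed object and never stores the pointer, which is the contract `_Ref` is meant to express.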

Example
=======

In the context of the provided example from the motivation section:

```c
typedef struct sb_t {
    char * _Nonnull data;
    int len;
    int size;
} sb_t;

/** Append the string \p str to \p sb. */
void sb_adds(sb_t * _Ref _Nonnull sb, const char * _Nonnull str);

/** Append the content of \p other to \p sb. */
void sb_addsb(sb_t * _Ref _Nonnull sb, const sb_t * _Ref _Nonnull other);

/** Returns the amount of available memory of \p sb. */
int sb_avail(const sb_t * _Ref _Nonnull sb);
```

Would be imported as follows:

```swift
struct sb_t {
    var data: UnsafeMutablePointer<Int8>
    var len: Int32
    var size: Int32
}

func sb_adds(_ sb: inout sb_t, _ str: UnsafePointer<Int8>)
func sb_addsb(_ sb: inout sb_t, _ other: sb_t)
func sb_avail(_ sb: sb_t) -> Int32
```

Impact on existing code
=================

This proposal has no impact on existing code since it proposes additive changes 
only. However, opting in to the `_Ref` qualifier on APIs already exposed to 
Swift will change the imported signatures.

* For `const` pointers, the change is always source-incompatible.
* For non-`const` pointers, the change will be source-compatible everywhere we 
use the `&object` syntax to pass the argument from a plain object, but will 
break sources that pass an `Unsafe*Pointer` as the argument.


Alternatives considered
===================

We considered using two families of qualifiers instead of `_Ref`:

- one family to specify the kind of pointer: single object or array
- one family to declare the ownership

This approach has the clear advantage of being more flexible; however, we found 
it to be less expressive. Since C APIs should already use nullability 
qualifiers on every single pointer, forcing two additional qualifiers on every 
pointer would be painful and would hurt the readability of the C APIs.

`_Ref`, on the other hand, is short and leverages a concept already known by 
developers, but is also more specific to a particular use case.


Discussion
========

* Safety: won't this make developers think they are calling safe APIs from 
Swift while the API is actually unsafe?

There is certainly a risk that a C API makes improper use of `_Ref` (in 
particular, that it breaks the non-owning part of the contract). However, this 
kind of safety issue is already present when using the `swift_name()` attribute 
on a function and mapping one of its pointer parameters to `self:`, or when 
using the nullability qualifiers.

* What about pointers stored in structures? or pointers returned by functions?

As a qualifier, `_Ref` could also be used on pointers that are not arguments of 
a function:

```c
typedef struct {
    sb_t * _Ref obj;
} sb_ptr_t;

sb_t * _Ref sb_get_singleton(void);
```

Swift, however, cannot import those as `sb_t` but will still be forced to use 
`Unsafe*Pointer<sb_t>` since `sb_t` is a structure and as such is not stored by 
reference.

We could also imagine a standard `Reference<T>` type that would wrap a pointer 
to a `T` (and could expose the API of `T` on it).

* What about function pointers that take a `_Ref` object?

When an API takes a function pointer whose type includes a `_Ref` qualified 
parameter, the qualifier applies:

```c
void take_cb(int (*a)(sb_t const * _Ref _Nonnull sb,
                      sb_t * _Ref _Nonnull other));
```

```swift
func cb(sb: sb_t, other: inout sb_t) {
    ...
}

take_cb(cb)
```

Swift guarantees we cannot break the non-owning contract and that we respect 
the constness of the parameter. This is safer than using the 
`Unsafe*Pointer`-based alternative.
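
For illustration, here is a rough sketch of the C side of such a callback API. It assumes `_Ref` is shimmed as an empty macro (it is not implemented by any compiler), omits the nullability qualifiers for brevity, and uses two hypothetical names: `call_cb()`, a variant of `take_cb()` that receives the objects and returns the callback's result so the effect is observable, and `copy_len()`, an example callback.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: `_Ref` is not implemented anywhere, so it expands to nothing. */
#ifndef _Ref
#  define _Ref
#endif

typedef struct sb_t {
    char *data;
    int len;
    int size;
} sb_t;

/* Callback type with the same shape as the parameter of take_cb(). */
typedef int (*sb_cb_t)(sb_t const * _Ref sb, sb_t * _Ref other);

/* Hypothetical variant of take_cb(): forwards two single objects to the
 * callback and returns its result. */
int call_cb(sb_cb_t cb, sb_t const * _Ref sb, sb_t * _Ref other)
{
    return cb(sb, other);
}

/* Example callback: reads `sb`, mutates `other`, and keeps neither pointer
 * after returning, as the non-owning contract requires. */
int copy_len(sb_t const * _Ref sb, sb_t * _Ref other)
{
    other->len = sb->len;
    return other->len;
}
```

The callback never indexes past the pointed objects and never retains them, which matches what a Swift closure of type `(sb_t, inout sb_t) -> Int32` would be able to do.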

* Other use cases than Swift's?

The `_Ref` qualifier could be used by static analysis to check that functions 
don't access memory they shouldn't: as long as some code manipulates 
memory through a `_Ref`-qualified pointer, it shouldn't access memory addresses 
below that pointer or above that pointer plus the stride of the type (an 
exception remains for types ending with a zero-length array).
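
The bound such an analyzer would enforce can be sketched as a runtime predicate. This is only an illustration: `ref_access_ok()`, `point_t` and `point_sum()` are hypothetical names, `_Ref` is shimmed as an empty macro, and the zero-length array exception is not handled. In C, `sizeof` already includes trailing padding, so it plays the role of the stride here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: `_Ref` is not implemented anywhere, so it expands to nothing. */
#ifndef _Ref
#  define _Ref
#endif

typedef struct point_t {
    int x;
    int y;
} point_t;

/* Returns 1 when the `len` bytes starting at `addr` lie entirely inside the
 * single object of `size` bytes located at `obj`, and 0 otherwise. */
int ref_access_ok(const void *obj, size_t size, const void *addr, size_t len)
{
    uintptr_t base = (uintptr_t)obj;
    uintptr_t start = (uintptr_t)addr;

    if (start < base || len > size) {
        return 0;
    }
    /* Written as a subtraction to avoid overflowing `start + len`. */
    return start - base <= size - len;
}

/* A function that respects the contract: it only reads fields of `*p`. */
int point_sum(const point_t * _Ref p)
{
    return p->x + p->y;
}
```

An access such as `p[1].x` inside `point_sum()` would fall outside the range accepted by `ref_access_ok()` and is the kind of code the analysis could reject.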

* What about pointers to arrays of objects?

This is another topic. We could imagine a `_Array` qualifier that could take an 
optional length.

```c
/* The number of elements is statically known or passed as argument */
int main(int argc, char ** _Array(argc) argv);

/* The number of elements is unknown. */
int puts(const char * _Array str);
```
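
As a rough sketch of how the length-carrying form could be tried out today, `_Array(len)` can be shimmed as a function-like macro that expands to nothing. This is an assumption for illustration only: the qualifier does not exist, `sum_ints()` is a hypothetical function, and this shim does not cover the bare `_Array` form, since a function-like macro is not expanded without its argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: `_Array(len)` is not implemented anywhere; expand to nothing. */
#ifndef _Array
#  define _Array(len)
#endif

/* Sums the `count` elements of `values`. The annotation documents that
 * `values` points to an array whose length is passed as `count`. */
long sum_ints(size_t count, const int * _Array(count) values)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

With the length attached to the pointer, an importer would have enough information to expose the pair as a single buffer-like value instead of a raw pointer plus a count.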