[I’ve seen the WWDC 2016 Keynote and Platforms State of the Union videos.  I
haven’t seen any others, so as not to spoil myself before typing my ideas down.
Apologies if this has already been covered.]

Is there any problem with adding fixed-size arrays?  From what I’ve glanced at
here, it seems more that no one has gotten around to it than that the very idea
is hated.

Obviously, the main advantages are that object storage (or pointer storage for
reference types) is on the stack and that the shape (i.e. the number of extents
and the range of each extent) is fixed at compile time.  This type of, well,
type can be used when the baggage of length-changing isn’t needed.  Such arrays
have also been a classic type since the beginning of systems-programming
languages.  (I was surprised by their absence in Swift when I first read the
books.)  They can be mapped to a vector processing unit’s built-ins.

This:

> struct ArrayOf5<T> {
> 
>     let count = 5
> 
>     var first: T
>     var second: T
>     var third: T
>     var fourth: T
>     var fifth: T
> 
>     init(array: [T]) {
>         first = array[0]
>         second = array[1]
>         third = array[2]
>         fourth = array[3]
>         fifth = array[4]
>     }
> 
>     subscript(index: Int) -> T {
>         get {
>             switch index {
>             case 0:
>                 return first
>             case 1:
>                 return second
>             case 2:
>                 return third
>             case 3:
>                 return fourth
>             case 4:
>                 return fifth
>             default:
>                 preconditionFailure("Index out of range")
>             }
>         }
> 
>         set {
>             switch index {
>             case 0:
>                 first = newValue
>             case 1:
>                 second = newValue
>             case 2:
>                 third = newValue
>             case 3:
>                 fourth = newValue
>             case 4:
>                 fifth = newValue
>             default:
>                 preconditionFailure("Index out of range")
>             }
>         }
>     }
> 
> }


shouldn’t be better supported than directly having built-in fixed-size arrays.
(I’ve wondered whether we should have a library type or built-in arrays.  I’ve
decided that even if we add a library type, we still need the built-in.)
Although I parameterized the element type, altering the shape (right now one
dimension of five elements) doesn’t scale, since it requires lexical changes.
Internally, a built-in array type would be something like this, modulo
optimizations and not having to come up with a name for each element.
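
For reference, a quick sketch of how the struct above reads at the use site
(nothing here beyond what the definition already implies):

> var buffer = ArrayOf5(array: [10, 20, 30, 40, 50])
> buffer[2] = 33         // goes through the subscript setter, updating `third`
> print(buffer[2])       // prints "33"
> print(buffer.count)    // prints "5"; the shape is baked into the type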

—————

[The following sections are my ideas about implementation; we don’t have to use 
them.]

What should array type expressions look like in code?

Since generics (currently) only use type-based parameters, we can’t do
something C++-ish like “array<T, 1, 4, 3>”.  I did have a thought of “[MyType;
5]”, but that seemed too close to current arrays.  I was shocked to discover
that Rust uses that syntax.  However, it’s still too problematic to use now,
because “[MyType]” would mean something different than it does currently (a
zero-dimensional array of a single element instead of a one-dimensional array
of an arbitrary number of elements).

Since arrays are product types (I think), let’s use multiplication syntax:

> var myArray: MyType * [6]         // An array of six elements, each of `MyType`
> var myArray2: MyType * [6] * [5]  // An array of five elements, each a six-element array of `MyType`
> var myArray3: MyType * [3, 4]     // A three-by-four array, each element of `MyType`

The “[ SHAPE ]” syntax affects the type expression to its left.  An empty list
is not allowed since, although array-literal syntax allows empty lists, the
subscript operation does not.  Note that, to keep the syntax regular, the first
subscript when using “myArray2” covers the five elements, i.e. “0..<5”, and the
second (if any) covers the six, the reverse of the lexical order of
declaration.  This is because we don’t have C’s cute “declaration equals usage”
type syntax.  Also note that “myArray3” requires a single subscript call with
two coordinates in it to dereference an element.  Supplying only one coordinate
is nonsensical.  (“myArray2” works the other way when you want to dereference a
“MyType” value.)

Of course, not using a shape expression gives you a zero-dimensional array
(i.e. a regular variable) of your base type by default.  (So it would actually
be "0 + extents(MyType.self)" dimensions.)

Or we could just do it the same way C does (i.e. “MyType[8]”), and preserve the
same lexical order for nested arrays in declaration and dereference.  We could
even forbid multi-dimensional arrays and just do nested single-dimensional
arrays like C (and move multi-dimensionality to wrapper structs), but I feel
that would be giving up on improving the abstraction.

——

What about row vs. column order?

If you stick with chaining one-dimensional arrays, you get only row-order, just 
like C (which can only do nested one-dimensional arrays).  If you use a 
multi-extent shape expression, you can override the order:

> var myArray1: MyType * [3, 4] * #rank[0, 1]  // Default; row-order
> var myArray2: MyType * [3, 4] * #rank[1, 0]  // Reversed; column-order
> var myArray3: MyType * [3, 4, 5] * #rank[0, 2, 1]  // Scrambled

The array literal attached to the rank command has to contain every value in
0..<count exactly once.  The extent whose rank is “0” gets the longest stride
between consecutive indices; the extent whose rank is “count - 1” has its
elements adjacent.
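
If I’m stating that rule right, the strides fall out mechanically from the
extents and the rank permutation.  A small sketch of the arithmetic in today’s
Swift (plain functions, no proposed syntax; names are just illustrative):

> /// Given the declared extents and a rank permutation (every value in
> /// 0..<count exactly once), compute each extent's stride in elements.
> /// Rank 0 gets the longest stride; rank count-1 strides by 1.
> func elementStrides(extents: [Int], rank: [Int]) -> [Int] {
>     precondition(rank.sorted() == Array(0..<extents.count))
>     var result = Array(repeating: 1, count: extents.count)
>     var span = 1
>     // Walk from the innermost rank outward, multiplying extents as we go.
>     for r in stride(from: extents.count - 1, through: 0, by: -1) {
>         let position = rank.firstIndex(of: r)!
>         result[position] = span
>         span *= extents[position]
>     }
>     return result
> }
> 
> elementStrides(extents: [3, 4], rank: [0, 1])       // [4, 1]  (row order)
> elementStrides(extents: [3, 4], rank: [1, 0])       // [1, 3]  (column order)
> elementStrides(extents: [3, 4, 5], rank: [0, 2, 1]) // [20, 1, 4]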

A rank command can only appear directly after a shape expression or an element
type expression, and not after another rank command.  The command’s literal
must have the same length as the number of extents of the corresponding shape
expression.  A rank command following an element type expression, whether it’s
a non-static-array type or a static-array type hidden behind a type alias, is
treated as applying to a zero-dimensional array, so “#rank[]” is the only legal
rank command there.

The rank command does not pierce type aliases; “#rank[]” is the only valid
command for them.  The alias itself can have a rank command, which affects all
objects declared with that alias, and cannot be overridden.  (If the alias
doesn’t have a rank command, the index storage priority is fixed to the
default.)  Should a rank-stripping command be added?  Should a rank command
that pierces aliases and/or overrides previous ranks be added?

Types that differ in the number of shape commands, the number of extents within
corresponding commands, the sizes of corresponding extents, the index storage
priorities, or the element type are distinct types.  We should have a function
that can reshape an array as long as the total number of outermost
non-static-array objects is the same.  (The function could reshape a
non-static-array type to a static array of one element with any nesting.)
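
A reshape like that only needs the element counts to line up before the new
shape bookkeeping takes over; a minimal sketch over flat storage, assuming the
built-in would do the same check without any copying (the function name is just
illustrative):

> /// A toy "reshape": same flat elements, new extents, as long as the
> /// total element count matches.
> func reshaped<T>(_ elements: [T], to extents: [Int]) -> [T]? {
>     let total = extents.reduce(1, *)
>     guard total == elements.count else { return nil }
>     return elements   // identical storage; only the shape changes
> }
> 
> let flat = Array(1...12)
> reshaped(flat, to: [3, 4])   // succeeds: 12 elements either way
> reshaped(flat, to: [5, 3])   // nil: 15 != 12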

(If we give up on improving the abstraction compared to C, this rank order 
would be simulated in a wrapper struct that rearranges the indices.)

——

What about initialization?

I didn’t include initialization expressions earlier because they’re the hard
part.  We’ll have to work on this.

Zero-dimensional arrays (i.e. a regular non-static-array type) are initialized
as usual.

A one-dimensional static array gets initialized like a current dynamic array,
but the syntax given below will also be accepted (with just one coordinate, of
course).

For higher-dimensional arrays, we need something like:

> [(0, 0, 0): myFirstValue, (2, 4, 1): mySecondValue, …]
> [(0, 0): myFirstValue, (0, 1): mySecondValue, default: myDefaultValue]

and/or:

> var myArray: MyType * [3, 4] = { return 2 * $0 + 7 * $1 }

(We can use “myParam1, myParam2 in” syntax for that last one too.)  Could we
have an array expression that is part “(XX, YY):” entries and part $0/$etc
formula?  Maybe with “_” and/or “case” and/or “where” to differentiate which
elements get the $0/$etc formula and which ones get a “default:” phrase?  Maybe
the _/case/where could be used within incomplete “(XX, YY)”-style indexing?
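
The formula flavor, at least, is easy to prototype on a wrapper today, which is
part of why I like it.  A sketch that builds the flat, row-major contents of a
rows-by-columns array from a per-coordinate formula (the helper name is just
illustrative):

> func makeGrid<T>(rows: Int, columns: Int, _ formula: (Int, Int) -> T) -> [T] {
>     var result: [T] = []
>     result.reserveCapacity(rows * columns)
>     for i in 0..<rows {
>         for j in 0..<columns {
>             result.append(formula(i, j))
>         }
>     }
>     return result
> }
> 
> let grid = makeGrid(rows: 3, columns: 4) { 2 * $0 + 7 * $1 }
> // [0, 7, 14, 21,  2, 9, 16, 23,  4, 11, 18, 25]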

——

Should there be flexible-ending arrays?  Other value types?

Let’s say that at most one extent in a shape expression can be “any” instead of
a compile-time integer expression.  Let’s restrict the “any” extent to the
rank-0 one, using “#rank” to override if you didn’t put the “any” first.  Let’s
call these flexible-ending static arrays.

Flexible-ending static arrays could be used like the “MyType[]” expression in
C.  Frequently they would be used as function parameters, avoiding having to
specialize for each possible array length.  In other words, a given
static-array type “MyType * […, N, …]” can be passed to a function parameter
declared as “MyType * […, any, …]”.  Could there be an automatic way to pass in
the value of “N”, or would we have to resort to another function parameter or
some other out-of-band information and trust the user to keep them
synchronized?

It should be possible to pass differently shaped arrays to such function
parameters with a reshape call, as long as the total number of elements is a
multiple of the span across the “any”-ed extent.  Should such a function
parameter also take a “[MyInnerType]” dynamic array, where “MyInnerType” is
what you get after stripping the outermost, “any”-ed extent?
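
For what it’s worth, the closest thing Swift has today to “N gets passed
automatically” is a buffer view that carries its count along with its base
address, like the standard library’s UnsafeBufferPointer; the callee never
needs a separate length parameter.  A sketch of that existing convention (not
proposed syntax):

> func total(of values: UnsafeBufferPointer<Double>) -> Double {
>     return values.reduce(0, +)   // `values.count` travels with the buffer
> }
> 
> let measurements = [1.0, 2.0, 3.0, 4.0]   // dynamic today…
> let sum = measurements.withUnsafeBufferPointer { total(of: $0) }
> // …a built-in fixed-size array could presumably vend the same kind of view.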

A flexible-ending struct is one with a flexible-ending type as its last stored
property.  A flexible-ending enum is an enum with payload cases where at least
one case ends its tuple with a flexible-ending type.  Flexible-ending types
cannot appear anywhere else as a stored property in a struct or enum.  Should
we use “final” to mark a struct’s flexible-ending property?  Somehow adapt this
for enums too?  (Right now, “final” is only used for classes.)  Flexible-ending
structs and enums can be used as function parameters too, allowing different
lengths for the same parameter.  Like static arrays, a struct or enum that
matches a flexible function parameter except that its would-be flexible part is
fixed can be passed through said parameter.  (It’s the same rule, since all
flexible structs and enums have to have a nested flexible array at their end.)

Besides appearing as function parameters, a flexible-ending object can be
declared inside a function/method or as a global.  I’m not sure how to do heap
versions yet.  If the object is immutable (i.e. “let”), the initialization
block has to cover all the elements without using $0/$etc or “default:” so the
size can be determined.  I’m not sure how mutable objects are supposed to work
here.  I’ll need your ideas the most here, assuming we keep this feature.  It
was inspired by seeing code in Ada (discriminated records) and C (“MyType[1]”
as the last field of a struct, then allocating with malloc) doing this.
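
For the heap version, the standard library already has a primitive shaped like
that C idiom (a fixed header followed by a tail of elements whose count is
chosen at allocation time), namely ManagedBuffer.  A sketch of faking a
heap-allocated flexible-ending object with it today; its class-based reference
semantics are exactly what a value-type version would need to improve on:

> // Header = the fixed leading stored properties; Element = the flexible tail.
> let object = ManagedBuffer<(name: String, count: Int), Double>.create(
>     minimumCapacity: 8,
>     makingHeaderWith: { _ in (name: "samples", count: 8) })
> 
> object.withUnsafeMutablePointerToElements { tail in
>     tail.initialize(repeating: 0.0, count: 8)   // the flexible part
> }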

Oh yeah, a flexible-ending static array, including those that are part of
flexible-ending structs and enums, must have an inflexible base type.
(Otherwise lies madness.)

(If we keep this feature, maybe it should be limited to one-dimensional arrays.)

— 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com 
