> On Apr 4, 2017, at 2:43 PM, Itai Ferber <[email protected]> wrote:
> I like the separation between keyed and unkeyed containers (and I think
> "unkeyed" is a good name, though not perfect), but I'm not quite happy with
> the unkeyed container API. Encoding a value into an unkeyed container appends
> it to the container's end; decoding a value from an unkeyed container removes
> it from the container's front. These are very important semantics that the
> method names in question do not imply at all.
>
> I think that consistency of phrasing is really important here, and the action
> words "encode" and "decode" are even more important to connote than the
> semantics of pushing and popping.
> (Note that there need not be specific directionality to an unkeyed container
> as long as the ordering of encoded items is eventually maintained on decode.)
> But on a practical note, names like encodeAtEnd and decodeFromFront (or
> similar) don't feel like they communicate anything much more useful than the
> current encode/decode.
>
Yeah; I stopped short of suggesting specific names because I wasn't totally
happy with the ones I could think of. (`appendEncoded` and `decodeNext` were my
best ideas, but they don't make a matched pair.)
> Certain aspects of `UnkeyedDecodingContainer` also feel like they do the same
> things as `Sequence` and `IteratorProtocol`, but in different and
> incompatible ways. And I certainly think that the `encode(contentsOf:)`
> methods on `UnkeyedEncodingContainer` could use equivalents on the
> `UnkeyedDecodingContainer`. Still, the design in this area is much improved
> compared to the previous iteration.
>
> Which aspects of Sequence and IteratorProtocol do you feel like you're
> missing on UnkeyedDecodingContainer? Keep in mind that methods on
> UnkeyedDecodingContainer must be able to throw, and an
> UnkeyedDecodingContainer can hold heterogeneous items whose type is not
> known, two things that Sequence and IteratorProtocol do not do.
>
Yeah, that's true. One possibility is to use a closure-scoped block which gets
passed a sequence that terminates early if it encounters an error:
    self.pets = try decoder.withDecoded(Pet.self) { seq in
        return Array(seq)
        // If there is an error, `seq` terminates early, and once we return
        // control from this closure, `withDecoded` will throw it.
    }
But that's awkward in a couple of different ways.
> In terms of an equivalent to encode(contentsOf:), keep in mind that this
> would only work if the collection you're decoding is homogeneous, in which
> case, you would likely prefer to decode an Array over getting an unkeyed
> container, no? (As soon as conditional conformance arrives in Swift, we will
> be able to express extension Array : Decodable where Element : Decodable {
> ... } making decoding homogeneous arrays trivial.)
>
That's true (and I assumed that `Array`s and friends would be `Codable`—we
don't even need to wait for conditional conformances), but it's hardly unheard
of to write your own `Collection` types when you need different semantics.
Swift has two mutation protocols that are important for this purpose:
`RangeReplaceableCollection` and `SetAlgebra`. You could provide methods on
`UnkeyedDecodingContainer` that work with them:
    func decodeAll<C: RangeReplaceableCollection>(_ type: C.Type) throws -> C
        where C.Iterator.Element: Decodable {
        var collection = C()
        if let capacity = self.count {
            collection.reserveCapacity(capacity)
        }
        while !self.isAtEnd {
            collection.append(try self.decode(C.Iterator.Element.self))
        }
        return collection
    }

    func decodeAll<S: SetAlgebra>(_ type: S.Type) throws -> S
        where S.Element: Decodable {
        var set = S()
        while !self.isAtEnd {
            set.insert(try self.decode(S.Element.self))
        }
        return set
    }

    // Usage:
    let array = try container.decodeAll(ContiguousArray<Pet>.self)
    let set = try container.decodeAll(Set<Pet>.self)
Alternatively, you could take a more informal approach, and simply take a
function which takes a sequence and returns a value constructed from it. Most
collection types have an initializer like that. (Interestingly, this is
basically the same as `withDecoded` above, just with a different signature.)
    // I'm showing this as public, but you could hide it behind an
    // AnyIterator instead.
    public final class DecodingIterator<Element: Decodable>: IteratorProtocol, Sequence {
        var container: UnkeyedDecodingContainer
        var decodingError: Error?

        init(_ container: UnkeyedDecodingContainer) {
            self.container = container
        }

        public func next() -> Element? {
            if decodingError != nil || container.isAtEnd {
                return nil
            }
            do {
                return try container.decode(Element.self)
            }
            catch {
                decodingError = error
                return nil
            }
        }
    }

    public func decodeAll<T: Decodable, Result>(
        _ makeResult: (DecodingIterator<T>) throws -> Result
    ) throws -> Result {
        let iterator = DecodingIterator<T>(self)
        let result = try makeResult(iterator)
        if let error = iterator.decodingError {
            throw error
        }
        return result
    }
    // Usage:
    let array = try container.decodeAll(ContiguousArray<Pet>.init)
    let set = try container.decodeAll(Set<Pet>.init)
    // I'm imagining here a future version of Swift where it's possible to
    // make tuples of `Codable` types `Codable` themselves.
    let dictionary = try container.decodeAll(Dictionary<Pet, Person>.init)
> (Tiny nitpick: I keep finding myself saying "encode into", not "encode to" as
> the API name suggests. Would that be a better parameter label?)
>
> On a personal note here — I agree with you, and had originally used "into".
> However, we've reviewed our APIs and more often have balanced from:/to:
> rather than from:/into: on read/write/streaming calls. We'd like to rein
> these in a bit and keep them consistent within our naming guidelines, as much
> as possible.
>
That's fair. Oh well.
> Should there be a way for an `init(from:)` implementation to determine the
> type of container in the encoder it's just been handed? Or perhaps the better
> question is, do we want to promise users that all decoders can tell the
> difference?
>
> I think it would be very rare to need this type of information. If a type
> wants to encode as an array or as a dictionary conditionally, the context for
> that would likely be present in userInfo.
>
Yeah, that makes sense. To make that work, though, you might end up having a
top-level `Decodable` type read a `version` field or something and write it
into the `userInfo` for other types to examine. Nothing necessarily wrong with
that, I suppose.
(Hmm...coming back after looking at your `JSONDecoder`, it looks like it might
not be possible to do this, or perhaps that it could only be done by storing a
reference type into the `userInfo` before you start decoding and then modifying
it as decoding progresses. Is that intentional?)
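In other words, something like the following. This is just a sketch, all the
names in it are made up (`DecodingVersionBox`, the key string, `Document`,
`Payload`), and it assumes `userInfo` behaves as in your current draft: a
reference-typed box goes into `userInfo` up front, the top-level type fills it
in, and nested types read it.
    import Foundation

    // Hypothetical: a reference type stored into `userInfo` before decoding
    // starts, so the top-level type can fill it in and nested types can read it.
    final class DecodingVersionBox {
        var version: Int = 1
    }

    let versionKey = CodingUserInfoKey(rawValue: "com.example.documentVersion")!

    struct Payload: Decodable {
        var value: String

        init(from decoder: Decoder) throws {
            value = try decoder.singleValueContainer().decode(String.self)
            // A nested type consults the box the top-level type filled in.
            if let box = decoder.userInfo[versionKey] as? DecodingVersionBox,
                box.version >= 2 {
                // ...interpret `value` differently for newer documents...
            }
        }
    }

    struct Document: Decodable {
        var payload: Payload

        private enum CodingKeys: String, CodingKey { case version, payload }

        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)
            // The top-level type reads the version field and mutates the shared box...
            if let box = decoder.userInfo[versionKey] as? DecodingVersionBox {
                box.version = try container.decode(Int.self, forKey: .version)
            }
            // ...before decoding the rest of the document.
            payload = try container.decode(Payload.self, forKey: .payload)
        }
    }

    let decoder = JSONDecoder()
    decoder.userInfo[versionKey] = DecodingVersionBox()
    // let document = try decoder.decode(Document.self, from: jsonData)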
> If you really must try to decode regardless, you can always try to grab one
> container type from the decoder, and if it fails, attempt to grab the other
> container type.
>
But I assume that some decoders might just misinterpret whatever is present?
For instance, I could imagine an encoder which doesn't natively support keyed
containers, so it fakes it by alternating key and value fields in a flat list;
if you wrote with a keyed container and read with an unkeyed container, or vice
versa, it would then happily misinterpret whatever was present. So it seems
like this would probably be a pattern we'd want to discourage.
> * I think it may make sense to class-constrain some of these protocols.
> `Encodable` and its containers seem to inherently have reference
> semantics—otherwise data could never be communicated from all those `encode`
> calls out to the ultimate caller of the API. Class-constraining would clearly
> communicate this to both the implementer and the compiler. `Decoder` and its
> containers don't *inherently* have reference semantics, but I'm not sure it's
> a good idea to potentially copy around a lot of state in a value type.
>
> I don't think class constraints are necessary. You can take a look at the
> current implementation of JSONEncoder and JSONDecoder here
> <https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift>
> (note that this is still a rough implementation and will be updated soon).
> The model I've followed there is that the encoder itself (_JSONEncoder) has
> reference semantics, but the containers (_JSONKeyedEncodingContainer,
> _JSONUnkeyedEncodingContainer) are value-type views into the encoder itself.
>
> Keep in mind that during the encoding process, the entities created most
> often will be containers. Without some additional optimizations in place, you
> end up with a lot of small, short-lived class allocations as containers are
> brought into and out of scope.
> By not requiring the class constraints, it's at least possible to make all
> those containers value types with references to the shared encoder.
>
Okay, I see what you're getting at. But that still makes me think that:
* `Encoder` and `Decoder` should be class-constrained. An `Encoder` *must*
share some kind of reference with its containers, and while it would certainly
be possible to hide that reference away somewhere inside a value-typed
`Encoder`, I can't imagine it's worth penalties like having to pass a larger
existential container (`class`-constrained protocol existentials are 2+ words
instead of 5+) and having to virtualize reference-counting operations through a
value witness table. (See
https://github.com/apple/swift/blob/master/docs/ABI.rst#class-existential-containers
for details here.)
* There is no need for the `encode` calls to be marked `mutating`. An encoding
container is just a view on state within a reference type, so it doesn't mutate
the container itself, only the portion of the encoder the container refers to.
(It is not possible for an encoding container to hold value-typed state,
either, because the encoder never gets a chance to grab that state once you're
finished with it.) While these calls do mutate *some* state *somewhere*, they
don't mutate the state of the container.
Basically, what I'm saying is that `Encoder` should always be a reference
*type* and encoding containers, even if implemented with a value *type*, should
always have reference *semantics*.
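To illustrate the shape I mean (with hypothetical toy types, not the proposal's
actual protocols): a class-constrained encoder owns the storage, and the
container is a struct that merely forwards into it, so its `encode` method has
no need to be `mutating`.
    // Hypothetical: a reference-typed encoder with a value-typed container "view".
    final class ToyEncoder {
        var storage: [String: String] = [:]

        func container<Key: CodingKey>(keyedBy type: Key.Type) -> ToyKeyedContainer<Key> {
            return ToyKeyedContainer(encoder: self)
        }
    }

    struct ToyKeyedContainer<Key: CodingKey> {
        let encoder: ToyEncoder   // shared reference back to the encoder

        // Not `mutating`: this writes through the reference; the struct itself
        // never changes, only the encoder state it points to.
        func encode(_ value: String, forKey key: Key) {
            encoder.storage[key.stringValue] = value
        }
    }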
(The `decode` calls on an `UnkeyedDecodingContainer` are a harder case, because
it really *is* plausible that the "next element" pointer might be stored in the
container rather than the decoder. I'm not totally sure what to do there.)
(Oh, one other thing: Does `UnkeyedDecodingContainer.count` return the total
number of elements, if known, or the number of elements *remaining*, if known?
The latter sounds more useful to me.)
> Having implemented these myself multiple times, I agree — it can be a pain to
> repeat these implementations, and if you look at the linked implementations
> above, funneling to one method from all of those is exactly what I do (and in
> fact, this can be shortened significantly, which I plan on doing soon).
>
> There is a tradeoff here between ease of use for the end consumer of the API,
> and ease of coding for the writer of a new Encoder/Decoder, and my argument
> will always be for the benefit of the end consumer. (There will be orders of
> magnitude more end consumers of this API than those writing new Encoders and
> Decoders 😉)
>
This is true—we must put the API's consumer first.
> Think of the experience for the consumer of this API, especially someone
> learning it for the first time. It can already be somewhat of a hurdle to
> figure out what kind of container you need, but even once you get a keyed
> container (which is what we want to encourage), then what? You start typing
> container.enc... and in the autocomplete list in Xcode, the only thing that
> shows up is one autocomplete result: encode(value: Encodable, forKey: ...)
> Consider the clarity (or lack thereof) of that, as opposed to seeing
> encode(value: Int, forKey: ...), encode(value: String, forKey: ...), etc.
> Given a list of types that users are already familiar with helps immensely
> with pushing them in the right direction and reducing cognitive load. When
> you see String in that list, you don't have to question whether it's possible
> to encode a string or not, you just pick it. I have an Int8, can I encode it?
> Ah, it's in the list, so I can.
>
This is a very interesting argument that I hadn't considered. But I think it
still fails, because the list of primitive types doesn't convey what you want
developers to understand about the types they can encode.
Imagine being new to the API. You start writing your first `encode(to:)`
method, and you get to this point:
    func encode(to encoder: Encoder) throws {
        encoder.
You expect to see some way to start putting types in the encoder, but instead
you get this list:
[CodingKey?] codingPath
KeyedEncodingContainer<Key> container(keyedBy: CodingKey.Protocol)
SingleValueEncodingContainer singleValueContainer()
[CodingUserInfoKey : Any] userInfo
UnkeyedEncodingContainer unkeyedContainer()
Okay, two of these are obviously just properties, and the other three ask you
to choose a "container". Maybe you read some documentation or tutorials, maybe
you just guess, but either way, you figure out that you want a keyed container
and that you need a CodingKey enum to key it with.
So you make a keyed container, assign it to a variable, and get to this point:
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        container.
When you ask autocomplete what you can do with your shiny new container, you
get this:
[CodingKey?] codingPath
Void encode(value: Bool?, forKey: MyModel.CodingKeys) throws
Void encode(value: Data?, forKey: MyModel.CodingKeys) throws
Void encode(value: Double?, forKey: MyModel.CodingKeys) throws
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encode(value: Float?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int16?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int32?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int64?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int8?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int?, forKey: MyModel.CodingKeys) throws
Void encode(value: String?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt16?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt32?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt64?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt8?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
I think this list would give a new developer an *entirely* incorrect impression
of how you're supposed to encode your type. The list contains *ten* integer
types, *two* floating-point types, `Bool`, `Data`, `String`, and, oh yeah, the
overload that handles literally everything else. The impression I would get is
that this supports only the listed types—why else would we list every variety
of integer, for instance?—and the types *users* conform to `Encodable`, if I
noticed that overload at all. Maybe the nested containers are for coding nested
arrays and dictionaries—there's certainly no reason this list would make me
believe I can encode those directly.
If we wanted the overload list to help beginners understand which types
`encode(_:forKey:)` supports, I would want it to look more like:
[CodingKey?] codingPath
Void encode(value: [Encodable]?, forKey: MyModel.CodingKeys) throws
Void encode(value: [Encodable & Hashable: Encodable]?, forKey: MyModel.CodingKeys) throws
Void encode(value: Data?, forKey: MyModel.CodingKeys) throws
Void encode(value: Date?, forKey: MyModel.CodingKeys) throws
Void encode(value: Double?, forKey: MyModel.CodingKeys) throws
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int?, forKey: MyModel.CodingKeys) throws
Void encode(value: Set<Encodable & Hashable>?, forKey: MyModel.CodingKeys) throws
Void encode(value: String?, forKey: MyModel.CodingKeys) throws
Void encode(value: URL?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
These ten overloads more fairly represent the kinds of Swift- and
Foundation-provided encodable types people actually use most often. The
presence of `Array`, `Dictionary`, and `Set` conveys that encoded types may have
a complex inner structure. The absence of all the `Int`/`UInt` varieties does
mean it's not immediately obvious that it supports every single one, but Swift
guidelines highly discourage use of those types anyway unless you *really* need
those semantics for some reason.
Or, alternatively, just offer the one overload:
[CodingKey?] codingPath
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
And make sure the little two-sentence blurb at the bottom of the autocomplete
list mentions some of the compatible types:
Encodes the given value for the given key. Built-in Encodable types
include most basic value types in the standard library and Foundation, such as
Int, Double, String, Date, and URL, plus Array, Dictionary, and Set when their
elements are Encodable.
> Even for advanced users of the API, though, there's something to be said for
> static guarantees in the overloading. As someone familiar with the API (who
> might even know all the primitives by heart), I might wonder if the Encoder
> I'm using has correctly switched on the generic type. (It would have to be a
> dynamic switch.) Did the implementer remember to switch on Int16 correctly?
> Or did they forget it and will I be falling into a generic case which is not
> appropriate here?
>
I get that, which is why I'm *not* suggesting going to `switch`; I'm suggesting
that primitives all be handled through single-value containers. An advanced
user would know that all of the single-value container's types were explicitly
implemented by every encoder.
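For example (a minimal sketch of what I mean, with a made-up `Temperature`
type): the primitive is written through a single-value container, which is
where the full set of primitive overloads lives.
    struct Temperature: Encodable {
        var celsius: Double

        func encode(to encoder: Encoder) throws {
            // The primitive goes through the single-value container, whose
            // explicitly-listed overloads every encoder must implement.
            var container = encoder.singleValueContainer()
            try container.encode(celsius)
        }
    }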
> Let's take a step back, though. This is mostly annoying to implement because
> of the repetition, right? If someone were to put together a proposal for a
> proper macro system in Swift, which is really what we want here, I wouldn't
> be sad. 😉
>
Don't tempt me!
> This is already unfortunately a no-go. As mentioned in other emails, you
> cannot override an associatedtype in a subclass of a class, which means that
> you cannot require a different container type than your superclass. This is
> especially problematic in the default case where we'd want to encourage types
> to use keyed containers — every type should have its own keys, and you'd need
> to have a different keyed container than your parent, keyed on your keys.
>
Ugh, that's too bad. Class inheritance ruins everything, as usual.
> This might address my last point above, but then what useful interface would
> EncodingSink and DecodingSource have if a type conforming to EncodingSink
> could be any one of the containers or even a whole encoder itself?
>
`EncodingSink`'s members are used by the encoder to convert itself into
whatever type a given `Encodable` type's `encode(to:)` method wants to write
itself into. The `encode(to:)` method itself knows which concrete type it wants
to use, and uses the interface of that concrete type to do its work.
(But as you said, the associatedtype issue prevents this approach.)
> I am curious, though, about your comment above on preconditions being
> unenforceable, because this is certainly something we would want to hammer
> out before releasing. What cases are you thinking of that are unenforceable?
>
I'm just imagining somebody thinking they can construct several different
containers from the same encoder—for instance, making a keyed and an unkeyed
container that are peers to one another, or making two keyed containers with
different coding key types (with the encoder preventing collisions by including
the coding key type alongside the stringValue). When I was first reading the
proposal, I was under the impression that coding key types would prevent
collisions between, for instance, superclass and subclass keys; I realized a
little later in my read-through that you weren't taking that approach, but I
could see someone else making the same mistake and not realizing it until
they'd already written a lot of code.
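Concretely, I'm imagining someone writing something like this (a hypothetical
model type; the point is that the proposal treats it as a precondition
violation rather than catching it in the type system):
    struct Person: Encodable {
        var name: String
        var nicknames: [String]

        private enum CodingKeys: String, CodingKey { case name }

        func encode(to encoder: Encoder) throws {
            var keyed = encoder.container(keyedBy: CodingKeys.self)
            // Asking the same encoder for a second, "peer" top-level container:
            // exactly the unenforceable precondition I'm worried about.
            var unkeyed = encoder.unkeyedContainer()
            try keyed.encode(name, forKey: .name)
            try unkeyed.encode(contentsOf: nicknames)
        }
    }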
* * *
Apropos of none of this discussion so far, one other thing I just noticed:
`JSONEncoder` and `JSONDecoder` have a lot of parallel "strategy" properties,
but they use different types for them. Have you considered making a single
`JSONCodingStrategy` type which expresses bidirectional rules? That would allow
a developer to make a constant containing their strategy and use it as a single
source of truth.
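Something like this, purely hypothetically (the type, its cases, and the
`JSONEncoder`/`JSONDecoder` initializers in the comments are all invented):
    import Foundation

    // Hypothetical: one bidirectional strategy type shared by both coders.
    struct JSONCodingStrategy {
        enum DateStrategy {
            case secondsSince1970
            case iso8601
            case custom(encode: (Date, Encoder) throws -> Void,
                        decode: (Decoder) throws -> Date)
        }

        var dateStrategy: DateStrategy = .secondsSince1970
        // ...similarly bidirectional strategies for Data, non-conforming floats, etc.
    }

    // A single constant serving as the source of truth in both directions.
    let strategy = JSONCodingStrategy(dateStrategy: .iso8601)
    // let encoder = JSONEncoder(strategy: strategy)   // hypothetical initializer
    // let decoder = JSONDecoder(strategy: strategy)   // hypothetical initializer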
--
Brent Royal-Gordon
Architechies
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution