> On 5 Apr 2017, at 19:04, Tony Parker <[email protected]> wrote:
>
> Hi David,
>
>> On Apr 4, 2017, at 10:33 PM, David Hart via swift-evolution
>> <[email protected]> wrote:
>>
>> Very interesting discussion below. Here are a few more points:
>>
>> Sent from my iPhone
>> On 4 Apr 2017, at 23:43, Itai Ferber via swift-evolution
>> <[email protected]> wrote:
>>
>>> Hi Brent,
>>>
>>> Thanks for your comments and thorough review! :)
>>> Responses inline.
>>>
>>> On 4 Apr 2017, at 1:57, Brent Royal-Gordon wrote:
>>>
>>>
>>> On Apr 3, 2017, at 1:31 PM, Itai Ferber via swift-evolution
>>> <[email protected]> wrote:
>>> Hi everyone,
>>>
>>> With feedback from swift-evolution and additional internal review, we've
>>> pushed updates to this proposal, and to the Swift Archival & Serialization
>>> proposal.
>>> Changes to here mostly mirror the ones made to Swift Archival &
>>> Serialization, but you can see a specific diff of what's changed here. Full
>>> content below.
>>>
>>> We'll be looking to start the official review process very soon, so we're
>>> interested in any additional feedback.
>>>
>>> Thanks!
>>>
>>> — Itai
>>>
>>> This is a good revision to a good proposal.
>>>
>>> I'm glad `CodingKey`s now require `stringValue`s; I think the intended
>>> semantics are now a lot clearer, and key behavior will be much more
>>> reliable.
>>>
>>> Agreed.
>>>
>>>
>>> I like the separation between keyed and unkeyed containers (and I think
>>> "unkeyed" is a good name, though not perfect), but I'm not quite happy with
>>> the unkeyed container API. Encoding a value into an unkeyed container
>>> appends it to the container's end; decoding a value from an unkeyed
>>> container removes it from the container's front. These are very important
>>> semantics that the method names in question do not imply at all.
>>>
>>> I think that consistency of phrasing is really important here, and the
>>> action words "encode" and "decode" are even more important to connote than
>>> the semantics of pushing and popping.
>>> (Note that there need not be specific directionality to an unkeyed
>>> container as long as the ordering of encoded items is eventually maintained
>>> on decode.) But on a practical note, names like encodeAtEnd and
>>> decodeFromFront (or similar) don't feel like they communicate anything much
>>> more useful than the current encode/decode.
>>>
>>>
>>> Certain aspects of `UnkeyedDecodingContainer` also feel like they do the
>>> same things as `Sequence` and `IteratorProtocol`, but in different and
>>> incompatible ways. And I certainly think that the `encode(contentsOf:)`
>>> methods on `UnkeyedEncodingContainer` could use equivalents on the
>>> `UnkeyedDecodingContainer`. Still, the design in this area is much improved
>>> compared to the previous iteration.
>>>
>>> Which aspects of Sequence and IteratorProtocol do you feel like you're
>>> missing on UnkeyedDecodingContainer? Keep in mind that methods on
>>> UnkeyedDecodingContainer must be able to throw, and an
>>> UnkeyedDecodingContainer can hold heterogeneous items whose type is not
>>> known, two things that Sequence and IteratorProtocol do not do.
>>>
>>> In terms of an equivalent to encode(contentsOf:), keep in mind that this
>>> would only work if the collection you're decoding is homogeneous, in which
>>> case, you would likely prefer to decode an Array over getting an unkeyed
>>> container, no? (As soon as conditional conformance arrives in Swift, we
>>> will be able to express extension Array : Decodable where Element :
>>> Decodable { ... } making decoding homogeneous arrays trivial.)
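The homogeneous case described here can be illustrated concretely with the JSONDecoder prototype linked later in this thread (a minimal sketch, assuming the API as proposed):

```swift
import Foundation

// Decoding a homogeneous array in a single call, rather than walking an
// unkeyed container by hand. (Assumes the proposal's JSONDecoder and the
// eventual conditional conformance of Array to Decodable.)
let json = "[1, 2, 3]".data(using: .utf8)!
let numbers = try JSONDecoder().decode([Int].self, from: json)
print(numbers) // [1, 2, 3]
```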
>>>
>>>
>>> (Tiny nitpick: I keep finding myself saying "encode into", not "encode to"
>>> as the API name suggests. Would that be a better parameter label?)
>>>
>>> On a personal note here — I agree with you, and had originally used "into".
>>> However, we've reviewed our APIs and more often have balanced from:/to:
>>> rather than from:/into: on read/write/streaming calls. We'd like to rein
>>> these in a bit and keep them consistent within our naming guidelines, as
>>> much as possible.
>>>
>>>
>>> I like the functionality of the `userInfo` dictionary, but I'm still not
>>> totally satisfied casting out of `Any` all the time. I might just have to
>>> get over that, though.
>>>
>>> I think this is the closest we can get to a pragmatic balance between
>>> dynamic needs and static guarantees. :)
>>>
>>>
>>> I wonder if `CodingKey` implementations might ever need access to the
>>> `userInfo`. I suppose you can just switch to a different set of
>>> `CodingKeys` if you do.
>>>
>>> I don't think CodingKey should ever know about userInfo — CodingKeys should
>>> be inert data. If you need to, use the userInfo to switch to a different
>>> set of keys, as you mention.
>>>
>>>
>>> Should there be a way for an `init(from:)` implementation to determine the
>>> type of container in the encoder it's just been handed? Or perhaps the
>>> better question is, do we want to promise users that all decoders can tell
>>> the difference?
>>>
>>> I think it would be very rare to need this type of information. If a type
>>> wants to encode as an array or as a dictionary conditionally, the context
>>> for that would likely be present in userInfo.
>>> If you really must try to decode regardless, you can always try to grab one
>>> container type from the decoder, and if it fails, attempt to grab the other
>>> container type.
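That fallback can be sketched as follows (assuming the proposal's API; the Pet type and its array form are made up for illustration):

```swift
import Foundation

// A type that accepts either a keyed (dictionary) or an unkeyed (array)
// representation by trying one container type, then the other.
struct Pet: Decodable {
    var name: String
    private enum CodingKeys: String, CodingKey { case name }

    init(from decoder: Decoder) throws {
        if let keyed = try? decoder.container(keyedBy: CodingKeys.self) {
            // Dictionary representation: {"name": "Rex"}
            name = try keyed.decode(String.self, forKey: .name)
        } else {
            // Array representation: ["Rex"]
            var unkeyed = try decoder.unkeyedContainer()
            name = try unkeyed.decode(String.self)
        }
    }
}
```

With a JSON decoder, both `{"name": "Rex"}` and `["Rex"]` would then decode successfully.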
>>>
>>>
>>> * * *
>>>
>>> I went ahead and implemented a basic version of `Encoder` and `Encodable`
>>> in a Swift 3 playground, just to get a feel for this system in action and
>>> experiment with a few things. A few observations:
>>>
>>> Lots to unpack here; let's go one by one. :)
>>>
>>>
>>> * I think it may make sense to class-constrain some of these protocols.
>>> `Encodable` and its containers seem to inherently have reference
>>> semantics—otherwise data could never be communicated from all those
>>> `encode` calls out to the ultimate caller of the API. Class-constraining
>>> would clearly communicate this to both the implementer and the compiler.
>>> `Decoder` and its containers don't *inherently* have reference semantics,
>>> but I'm not sure it's a good idea to potentially copy around a lot of state
>>> in a value type.
>>>
>>> I don't think class constraints are necessary. You can take a look at the
>>> current implementation of JSONEncoder and JSONDecoder here
>>> <https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift>
>>> (note that this is still a rough implementation and will be updated soon).
>>> The model I've followed there is that the encoder itself (_JSONEncoder) has
>>> reference semantics, but the containers (_JSONKeyedEncodingContainer,
>>> _JSONUnkeyedEncodingContainer) are value-type views into the encoder itself.
>>>
>>> Keep in mind that during the encoding process, the entities created most
>>> often will be containers. Without some additional optimizations in place,
>>> you end up with a lot of small, short-lived class allocations as containers
>>> are brought into and out of scope.
>>> By not requiring the class constraints, it's at least possible to make all
>>> those containers value types with references to the shared encoder.
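The shape being described reduces to a small sketch (names here are hypothetical, not the proposal's): a class-based encoder owns the storage, while containers are cheap structs holding a reference to it.

```swift
// Reference-semantics encoder, value-semantics container "views".
final class SketchEncoder {
    var storage: [String: String] = [:]
}

struct SketchKeyedContainer {
    // The struct is cheap to create and copy; all copies share the same
    // underlying encoder, so every write lands in one place.
    let encoder: SketchEncoder

    func encode(_ value: String, forKey key: String) {
        encoder.storage[key] = value
    }
}

let encoder = SketchEncoder()
let container = SketchKeyedContainer(encoder: encoder)
container.encode("value", forKey: "key")
print(encoder.storage) // ["key": "value"]
```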
>>>
>>>
>>> * I really think that including overloads for every primitive type in all
>>> three container types is serious overkill. In my implementation, the
>>> primitive types' `Encodable` conformances simply request a
>>> `SingleValueEncodingContainer` and write themselves into it. I can't
>>> imagine any coder doing anything in their overloads that wouldn't be
>>> compatible with that, especially since they can never be sure when someone
>>> will end up using the `Encodable` conformance directly instead of the
>>> primitive. So what are all these overloads buying us? Are they just
>>> avoiding a generic dispatch and the creation of a new `Encoder` and perhaps
>>> a `SingleValueEncodingContainer`? I don't think that's worth the increased
>>> API surface, the larger overload sets, or the danger that an encoder might
>>> accidentally implement one of the duplicative primitive encoding calls
>>> inconsistently with the others.
>>>
>>> To be clear: In my previous comments, I suggested that we should radically
>>> reduce the number of primitive types. That is not what I'm saying here. I'm
>>> saying that we should always use a single value container to encode and
>>> decode primitives, and the other container types should always use
>>> `Encodable` or `Decodable`. This doesn't reduce the capabilities of the
>>> system at all; it just means you only have to write the code to handle a
>>> given primitive type one time instead of three.
>>>
>>> Having implemented these myself multiple times, I agree — it can be a pain
>>> to repeat these implementations, and if you look at the linked
>>> implementations above, funneling to one method from all of those is exactly
>>> what I do (and in fact, this can be shortened significantly, which I plan
>>> on doing soon).
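The funneling pattern mentioned here can be sketched like so (hypothetical names, not the proposal's): every typed overload is a one-line trampoline into a single private method, so the overload set costs the implementer very little.

```swift
// Each overload forwards to one internal boxing method; only the
// public surface repeats, not the implementation.
struct SketchUnkeyedContainer {
    private(set) var boxed: [String] = []

    private mutating func box(_ value: CustomStringConvertible) {
        boxed.append(value.description)
    }

    mutating func encode(_ value: Bool)   { box(value) }
    mutating func encode(_ value: Int)    { box(value) }
    mutating func encode(_ value: String) { box(value) }
}

var container = SketchUnkeyedContainer()
container.encode(42)
container.encode("hi")
print(container.boxed) // ["42", "hi"]
```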
>>>
>>> There is a tradeoff here between ease of use for the end consumer of the
>>> API, and ease of coding for the writer of a new Encoder/Decoder, and my
>>> argument will always be for the benefit of the end consumer. (There will be
>>> orders of magnitude more end consumers of this API than those writing new
>>> Encoders and Decoders 😉)
>>> Think of the experience for the consumer of this API, especially someone
>>> learning it for the first time. It can already be somewhat of a hurdle to
>>> figure out what kind of container you need, but even once you get a keyed
>>> container (which is what we want to encourage), then what? You start typing
>>> container.enc... and in Xcode's autocomplete list, the only thing that
>>> shows up is a single result: encode(value: Encodable, forKey: ...)
>>> Consider the clarity (or lack thereof) of that, as opposed to seeing
>>> encode(value: Int, forKey: ...), encode(value: String, forKey: ...), etc.
>>> Giving users a list of types they are already familiar with helps immensely
>>> in pushing them in the right direction and reducing cognitive load. When
>>> you see String in that list, you don't have to question whether it's
>>> possible to encode a string or not, you just pick it. I have an Int8, can I
>>> encode it? Ah, it's in the list, so I can.
>>>
>>> Even for advanced users of the API, though, there's something to be said
>>> for static guarantees in the overloading. As someone familiar with the API
>>> (who might even know all the primitives by heart), I might wonder if the
>>> Encoder I'm using has correctly switched on the generic type. (It would
>>> have to be a dynamic switch.) Did the implementer remember to switch on
>>> Int16 correctly? Or did they forget it, leaving me to fall into a generic
>>> case that is not appropriate here?
>>>
>>> When it comes to overloads vs. dynamically switching on a generic type, I
>>> think we would generally prefer the static type safety. As a consumer of
>>> the API, I want to be sure that the implementer of the Encoder I'm using
>>> was aware of these primitive types in some way, and that the compiler
>>> helped make sure they didn't, say, forget to switch on Data.self. As a
>>> writer of Encoders, yes, this is a pain, but a sacrifice I'm willing to
>>> make for the sake of the consumer.
>>>
>>> Let's take a step back, though. This is mostly annoying to implement
>>> because of the repetition, right? If someone were to put together a
>>> proposal for a proper macro system in Swift, which is really what we want
>>> here, I wouldn't be sad. 😉
>>>
>> There's also an argument of API surface area. As a user or implementer of
>> the API, it's much less intimidating to load the documentation for a
>> protocol and see one central function than many overloads.
>>
>> I've used many serialization third-party frameworks in Swift. None of them
>> defined all those overloads, and more importantly, I never saw any user of
>> those APIs post an issue to GitHub where the cause could be traced back to
>> the lack of those overloads.
>>
>> These overloads look to me like remnants of Codable's NSCoding influences
>> instead of an API reimagined for Swift.
>>
>> For the same reasons, I continue to believe that decode functions should
>> overload on the return type. If we follow the arguments in favor of
>> providing a type argument, then why don't we also have type arguments for
>> encoders: encode(_ value: T?, forKey key: Key, as type: T.Type)? I'm not
>> advocating that: I'm just pushing the argument to its logical conclusion to
>> explain why I don't understand it.
>
> I don’t see a way for a call to encode to become ambiguous by omitting the
> type argument, whereas the same is not true for a return value from decode.
> The two seem fundamentally different.
When decoding to a property, there will be no ambiguity. And for other cases,
Swift developers are already quite used to handling that kind of ambiguity,
like for literals:
let x: UInt = 10
let y = 20 as CGFloat
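A sketch of the return-type-overloaded decode being argued for (hypothetical API, not the proposal's):

```swift
struct SketchError: Error {}

// decode is overloaded purely on return type; the call site's declared
// type picks the overload, just as with the literal examples above.
struct SketchContainer {
    let storage: [String: Any]

    func decode(forKey key: String) throws -> Int {
        guard let value = storage[key] as? Int else { throw SketchError() }
        return value
    }

    func decode(forKey key: String) throws -> String {
        guard let value = storage[key] as? String else { throw SketchError() }
        return value
    }
}

let container = SketchContainer(storage: ["age": 3, "name": "Rex"])
let age: Int = try container.decode(forKey: "age")
let name: String = try container.decode(forKey: "name")
print(age, name) // 3 Rex
```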
> - Tony
>
>>>
>>> * And then there's the big idea: Changing the type of the parameter to
>>> `encode(to:)` and `init(from:)`.
>>>
>>> ***
>>>
>>> While working with the prototype, I realized that the vast majority of
>>> conformances will immediately make a container and then never use the
>>> `encoder` or `decoder` again. I also noticed that it's illegal to create
>>> more than one container from the same coder, and there are unenforceable
>>> preconditions to that effect. So I'm wondering if it would make sense to
>>> not pass the coder at all, but instead have the conforming type declare
>>> what kind of container it wants:
>>>
>>> extension Pet: Codable {
>>>     init(from container: KeyedDecodingContainer<CodingKeys>) throws {
>>>         name = try container.decode(String.self, forKey: .name)
>>>         age = try container.decode(Int.self, forKey: .age)
>>>     }
>>>
>>>     func encode(to container: KeyedEncodingContainer<CodingKeys>) throws {
>>>         try container.encode(name, forKey: .name)
>>>         try container.encode(age, forKey: .age)
>>>     }
>>> }
>>>
>>> extension Array: Codable where Element: Codable {
>>>     init(from container: UnkeyedDecodingContainer) throws {
>>>         self.init()
>>>         while !container.isAtEnd {
>>>             append(try container.decode(Element.self))
>>>         }
>>>     }
>>>
>>>     func encode(to container: UnkeyedEncodingContainer) throws {
>>>         try container.encode(contentsOf: self)
>>>     }
>>> }
>>>
>>> I think this could be implemented by doing the following:
>>>
>>> 1. Adding an associated type to `Encodable` and `Decodable` for the type
>>> passed to `encode(to:)`/`init(from:)`.
>>>
>>> This is already unfortunately a no-go. As mentioned in other emails, you
>>> cannot override an associatedtype in a subclass of a class, which means that
>>> you cannot require a different container type than your superclass. This is
>>> especially problematic in the default case where we'd want to encourage
>>> types to use keyed containers — every type should have its own keys, and
>>> you'd need to have a different keyed container than your parent, keyed on
>>> your keys.
>>>
>>> Along with that, since the typealias would have to be at least as visible
>>> as your type (potentially public), it would necessitate that your key type
>>> would be at least as public as your type as well. This would expose your
>>> type's coding keys, which is prohibitive. (Consider what this would mean
>>> for frameworks, for instance.)
>>>
>>> Finally, this also means that you could not request different container
>>> types based on context — a type could not offer both a dictionary
>>> representation and a more efficient array representation, since it can only
>>> statically request one container type.
>>>
>>>
>>> 2. Creating protocols for the types that are permitted there. Call them
>>> `EncodingSink` and `DecodingSource` for now.
>>>
>>> 3. Creating *simple* type-erased wrappers for the `Unkeyed*Container` and
>>> `SingleValue*Container` protocols and conforming them to `EncodingSink` and
>>> `DecodingSource`. These wouldn't need the full generic-subclass dance
>>> usually used for type-erased wrappers; they just exist so you can strap
>>> initializers to them. In a future version of Swift which allowed
>>> initializers on existentials, we could probably get rid of them.
>>>
>>> (Incidentally, if our APIs always return a type-erased wrapper around the
>>> `Keyed*ContainerProtocol` types, there's no actual need for the underlying
>>> protocols to have a `Key` associated type; they can use `CodingKey`
>>> existentials and depend on the wrapper to enforce the strong key typing.
>>> That would allow us to use a simple type-erased wrapper for
>>> `Keyed*Container`, too.)
>>>
>>> 4. For advanced use cases where you really *do* need to access the encoder
>>> in order to decide which container type to use, we would also need to
>>> create a simple type-erased wrapper around `Encoder` and `Decoder`
>>> themselves, conforming them to the `Sink`/`Source` protocols.
>>>
>>> This might address my last point above, but then what useful interface
>>> would EncodingSink and DecodingSource have if a type conforming to
>>> EncodingSink could be any one of the containers or even a whole encoder
>>> itself?
>>>
>>>
>>> 5. The Source/Sink parameter would need to be `inout`, unless we *do* end
>>> up class-constraining things. (My prototype didn't.)
>>>
>>> There are lots of little details that change too, but these are the broad
>>> strokes.
>>>
>>> Although this technically introduces more types, I think it actually
>>> simplifies the design for people who are just using the `Codable` protocol.
>>> All they have to know about is the `Codable` protocol, the magic
>>> `CodingKeys` type, the three container types (realistically, probably just
>>> the `KeyedEncoding/DecodingContainer`), and the top-level encoders they
>>> want to use. Most users should never need to know about the members of the
>>> `Encoder` protocol; few even need to know about the other two container
>>> types. They don't need to do the "create a container" dance. The thing
>>> would just work with a minimum of fuss.
>>>
>>> Meanwhile, folks who write encoders *do* deal with a bit more complexity,
>>> but only because they have to be aware of more type-erased wrappers. In
>>> other respects, it's simpler for them, too. Keyed containers don't need to
>>> be generic, and they have a layer of Foundation-provided wrappers above
>>> them that can help enforce good behavior and (probably) hide the
>>> implementation a little bit more. I think that overall, it's probably
>>> better for them, too.
>>>
>>> Thoughts?
>>>
>>> For what it's worth, the way to introduce these three different types of
>>> encoding without the use of associated types is to split the Codable
>>> protocol up into three protocols, which we've tried in the past
>>> <https://github.com/itaiferber/swift-evolution/blob/swift-archival-serialization/proposals/XXXX-swift-archival-serialization.md#alternatives-considered>
>>> (bullet #4). Unfortunately, the results are not great — an even bigger
>>> explosion of types, overloads, etc.
>>>
>>> While I agree that the current approach of dynamically requesting
>>> containers is, well, dynamic, the benefit of not exposing your keys
>>> publicly and allowing encoding of classes is a big win in comparison, I
>>> think.
>>>
>>> I am curious, though, about your comment above on preconditions being
>>> unenforceable, because this is certainly something we would want to hammer
>>> out before releasing. What cases are you thinking of that are unenforceable?
>>>
>>>
>>> --
>>> Brent Royal-Gordon
>>> Architechies
>>>
>>> Again, thanks for your thorough review! Looking forward to further
>>> comments. :)
>>>
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution