Re: [swift-evolution] [Proposal] Foundation Swift Encoders

Tony Parker via swift-evolution Wed, 05 Apr 2017 10:04:55 -0700

Hi David,

> On Apr 4, 2017, at 10:33 PM, David Hart via swift-evolution 
> <[email protected]> wrote:
> 
> Very interesting discussion below. Here are a few more points:
> 
> Sent from my iPhone
> On 4 Apr 2017, at 23:43, Itai Ferber via swift-evolution 
> <[email protected] <mailto:[email protected]>> wrote:
> 
>> Hi Brent,
>> 
>> Thanks for your comments and thorough review! :)
>> Responses inline.
>> 
>> On 4 Apr 2017, at 1:57, Brent Royal-Gordon wrote:
>> 
>> 
>> On Apr 3, 2017, at 1:31 PM, Itai Ferber via swift-evolution 
>> <[email protected] <mailto:[email protected]>> wrote:
>> Hi everyone,
>> 
>> With feedback from swift-evolution and additional internal review, we've 
>> pushed updates to this proposal, and to the Swift Archival & Serialization 
>> proposal.
>> Changes to here mostly mirror the ones made to Swift Archival & 
>> Serialization, but you can see a specific diff of what's changed here. Full 
>> content below.
>> 
>> We'll be looking to start the official review process very soon, so we're 
>> interested in any additional feedback.
>> 
>> Thanks!
>> 
>> — Itai
>> 
>> This is a good revision to a good proposal.
>> 
>> I'm glad `CodingKey`s now require `stringValue`s; I think the intended 
>> semantics are now a lot clearer, and key behavior will be much more reliable.
>> 
>> Agreed
>> 
>> 
>> I like the separation between keyed and unkeyed containers (and I think 
>> "unkeyed" is a good name, though not perfect), but I'm not quite happy with 
>> the unkeyed container API. Encoding a value into an unkeyed container 
>> appends it to the container's end; decoding a value from an unkeyed 
>> container removes it from the container's front. These are very important 
>> semantics that the method names in question do not imply at all.
>> 
>> I think that consistency of phrasing is really important here, and the 
>> action words "encode" and "decode" are even more important to connote than 
>> the semantics of pushing and popping.
>> (Note that there need not be specific directionality to an unkeyed container 
>> as long as the ordering of encoded items is eventually maintained on 
>> decode.) But on a practical note, names like encodeAtEnd and decodeFromFront 
>> (or similar) don't feel like they communicate anything much more useful than 
>> the current encode/decode.
>> 
>> 
>> Certain aspects of `UnkeyedDecodingContainer` also feel like they do the 
>> same things as `Sequence` and `IteratorProtocol`, but in different and 
>> incompatible ways. And I certainly think that the `encode(contentsOf:)` 
>> methods on `UnkeyedEncodingContainer` could use equivalents on the 
>> `UnkeyedDecodingContainer`. Still, the design in this area is much improved 
>> compared to the previous iteration.
>> 
>> Which aspects of Sequence and IteratorProtocol do you feel like you're 
>> missing on UnkeyedDecodingContainer? Keep in mind that methods on 
>> UnkeyedDecodingContainer must be able to throw, and an 
>> UnkeyedDecodingContainercan hold heterogeneous items whose type is not 
>> known, two things that Sequence and IteratorProtocol do not do.
>> 
>> In terms of an equivalent to encode(contentsOf:), keep in mind that this 
>> would only work if the collection you're decoding is homogeneous, in which 
>> case, you would likely prefer to decode an Array over getting an unkeyed 
>> container, no? (As soon as conditional conformance arrives in Swift, we will 
>> be able to express extension Array : Decodable where Element : Decodable { 
>> ... } making decoding homogeneous arrays trivial.)
>> 
>> 
>> (Tiny nitpick: I keep finding myself saying "encode into", not "encode to" 
>> as the API name suggests. Would that be a better parameter label?)
>> 
>> On a personal note here — I agree with you, and had originally used "into". 
>> However, we've reviewed our APIs and more often have balanced from:/to: 
>> rather than from:/into: on read/write/streaming calls. We'd like to rein 
>> these in a bit and keep them consistent within our naming guidelines, as 
>> much as possible.
>> 
>> 
>> I like the functionality of the `userInfo` dictionary, but I'm still not 
>> totally satisfied casting out of `Any` all the time. I might just have to 
>> get over that, though.
>> 
>> I think this is the closest we can get to a pragmatic balance between 
>> dynamic needs and static guarantees. :)
>> 
>> 
>> I wonder if `CodingKey` implementations might ever need access to the 
>> `userInfo`. I suppose you can just switch to a different set of `CodingKeys` 
>> if you do.
>> 
>> I don't think CodingKey should ever know about userInfo — CodingKeys should 
>> be inert data. If you need to, use the userInfo to switch to a different set 
>> of keys, as you mention.
>> 
>> 
>> Should there be a way for an `init(from:)` implementation to determine the 
>> type of container in the encoder it's just been handed? Or perhaps the 
>> better question is, do we want to promise users that all decoders can tell 
>> the difference?
>> 
>> I think it would be very rare to need this type of information. If a type 
>> wants to encode as an array or as a dictionary conditionally, the context 
>> for that would likely be present in userInfo.
>> If you really must try to decode regardless, you can always try to grab one 
>> container type from the decoder, and if it fails, attempt to grab the other 
>> container type.
>> 
>> 
>> * * *
>> 
>> I went ahead and implemented a basic version of `Encoder` and `Encodable` in 
>> a Swift 3 playground, just to get a feel for this system in action and 
>> experiment with a few things. A few observations:
>> 
>> Lots to unpack here, let's go one by one. :)
>> 
>> 
>> * I think it may make sense to class-constrain some of these protocols. 
>> `Encodable` and its containers seem to inherently have reference 
>> semantics—otherwise data could never be communicated from all those `encode` 
>> calls out to the ultimate caller of the API. Class-constraining would 
>> clearly communicate this to both the implementer and the compiler. `Decoder` 
>> and its containers don't *inherently* have reference semantics, but I'm not 
>> sure it's a good idea to potentially copy around a lot of state in a value 
>> type.
>> 
>> I don't think class constraints are necessary. You can take a look at the 
>> current implementation of JSONEncoder and JSONDecoder here 
>> <https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift>
>>  (note that this is still a rough implementation and will be updated soon). 
>> The model I've followed there is that the encoder itself (_JSONEncoder) has 
>> reference semantics, but the containers (_JSONKeyedEncodingContainer, 
>> _JSONUnkeyedEncodingContainer) are value-type views into the encoder itself.
>> 
>> Keep in mind that during the encoding process, the entities created most 
>> often will be containers. Without some additional optimizations in place, 
>> you end up with a lot of small, short-lived class allocations as containers 
>> are brought into and out of scope.
>> By not requiring the class constraints, it's at least possible to make all 
>> those containers value types with references to the shared encoder.
>> 
>> 
>> * I really think that including overloads for every primitive type in all 
>> three container types is serious overkill. In my implementation, the 
>> primitive types' `Encodable` conformances simply request a 
>> `SingleValueEncodingContainer` and write themselves into it. I can't imagine 
>> any coder doing anything in their overloads that wouldn't be compatible with 
>> that, especially since they can never be sure when someone will end up using 
>> the `Encodable` conformance directly instead of the primitive. So what are 
>> all these overloads buying us? Are they just avoiding a generic dispatch and 
>> the creation of a new `Encoder` and perhaps a 
>> `SingleValueEncodingContainer`? I don't think that's worth the increased API 
>> surface, the larger overload sets, or the danger that an encoder might 
>> accidentally implement one of the duplicative primitive encoding calls 
>> inconsistently with the others.
>> 
>> To be clear: In my previous comments, I suggested that we should radically 
>> reduce the number of primitive types. That is not what I'm saying here. I'm 
>> saying that we should always use a single value container to encode and 
>> decode primitives, and the other container types should always use 
>> `Encodable` or `Decodable`. This doesn't reduce the capabilities of the 
>> system at all; it just means you only have to write the code to handle a 
>> given primitive type one time instead of three.
>> 
>> Having implemented these myself multiple times, I agree — it can be a pain 
>> to repeat these implementations, and if you look at the linked 
>> implementations above, funneling to one method from all of those is exactly 
>> what I do (and in fact, this can be shortened significantly, which I plan on 
>> doing soon).
>> 
>> There is a tradeoff here between ease of use for the end consumer of the 
>> API, and ease of coding for the writer of a new Encoder/Decoder, and my 
>> argument will always be for the benefit of the end consumer. (There will be 
>> orders of magnitude more end consumers of this API than those writing new 
>> Encoders and Decoders 😉)
>> Think of the experience for the consumer of this API, especially someone 
>> learning it for the first time. It can already be somewhat of a hurdle to 
>> figure out what kind of container you need, but even once you get a keyed 
>> container (which is what we want to encourage), then what? You start typing 
>> container.enc... and in the autocomplete list in Xcode, the only thing that 
>> shows up is one autocomplete result: encode(value: Encodable, forKey: ...) 
>> Consider the clarity (or lack thereof) of that, as opposed to seeing 
>> encode(value: Int, forKey: ...), encode(value: String, forKey: ...), etc. 
>> Given a list of types that users are already familiar with helps immensely 
>> with pushing them in the right direction and reducing cognitive load. When 
>> you see String in that list, you don't have to question whether it's 
>> possible to encode a string or not, you just pick it. I have an Int8, can I 
>> encode it? Ah, it's in the list, so I can.
>> 
>> Even for advanced users of the API, though, there's something to be said for 
>> static guarantees in the overloading. As someone familiar with the API (who 
>> might even know all the primitives by heart), I might wonder if the Encoder 
>> I'm using has correctly switched on the generic type. (It would have to be a 
>> dynamic switch.) Did the implementer remember to switch on Int16 correctly? 
>> Or did they forget it and will I be falling into a generic case which is not 
>> appropriate here?
>> 
>> When it comes to overloads vs. dynamically switching on a generic type, I 
>> think we would generally prefer the static type safety. As a consumer of the 
>> API I want to be sure that the implementer of the Encoder I'm using was 
>> aware of these primitive types in some way, and that the compiler helped 
>> them too to make sure they didn't, say, forget to switch on Data.self. As a 
>> writer of Encoders, yes, this is a pain, but a sacrifice I'm willing to make 
>> for the sake of the consumer.
>> 
>> Let's take a step back, though. This is mostly annoying to implement because 
>> of the repetition, right? If someone were to put together a proposal for a 
>> proper macro system in Swift, which is really what we want here, I wouldn't 
>> be sad. 😉
>> 
> There's also an argument of API surface area. As a user or implementer of the 
> API, it's much less intimidating to load the documentation for a protocol and 
> see one central function than many overloads.
> 
> I've used many serialization third-party frameworks in Swift. None of them 
> defined all those overloads, and more importantly, I never saw any user of 
> those APIs post an issue to GitHub where the cause could be traced back to 
> the lack of those overloads.
> 
> These overloads look to me like remnants of Codable's NSCoding influences 
> instead of an API reimagined for Swift.
> 
> For the same reasons, I continue to believe that decode functions should 
> overload on the return type. If we follow the arguments in favor of providing 
> a type argument, then why don't we also have type arguments for encoders: 
> encode(_ value: T?, forKey key: Key, as type: T.self)? I'm not advocating 
> that: I'm just pushing the argument to its logical conclusion to explain why 
> I don't understand it.


I don’t see a way for a call to encode to become ambiguous by omitting the type 
argument, whereas the same is not true for a return value from decode. The two 
seem fundamentally different.

- Tony

>> 
>> * And then there's the big idea: Changing the type of the parameter to 
>> `encode(to:)` and `init(from:)`.
>> 
>> ***
>> 
>> While working with the prototype, I realized that the vast majority of 
>> conformances will immediately make a container and then never use the 
>> `encoder` or `decoder` again. I also noticed that it's illegal to create 
>> more than one container from the same coder, and there are unenforceable 
>> preconditions to that effect. So I'm wondering if it would make sense to not 
>> pass the coder at all, but instead have the conforming type declare what 
>> kind of container it wants:
>> 
>> extension Pet: Codable {
>> init(from container: KeyedDecodingContainer<CodingKeys>) throws {
>> name = try container.decode(String.self, forKey: .name)
>> age = try container.decode(Int.self, forKey: .age)
>> }
>> 
>> func encode(to container: KeyedEncodingContainer<CodingKeys>) throws {
>> try container.encode(name, forKey: .name)
>> try container.encode(age, forKey: .age)
>> }
>> }
>> 
>> extension Array: Encodable where Element: Encodable {
>> init(from container: UnkeyedDecodingContainer) throws {
>> self.init()
>> while !container.isAtEnd {
>> append(try container.decode(Element.self))
>> }
>> }
>> 
>> func encode(to container: UnkeyedEncodingContainer) throws {
>> container.encode(contentsOf: self)
>> }
>> }
>> 
>> I think this could be implemented by doing the following:
>> 
>> 1. Adding an associated type to `Encodable` and `Decodable` for the type 
>> passed to `encode(to:)`/`init(from:)`.
>> 
>> This is already unfortunately a no-go. As mentioned in other emails, you 
>> cannot override an associatetype in a subclass of a class, which means that 
>> you cannot require a different container type than your superclass. This is 
>> especially problematic in the default case where we'd want to encourage 
>> types to use keyed containers — every type should have its own keys, and 
>> you'd need to have a different keyed container than your parent, keyed on 
>> your keys.
>> 
>> Along with that, since the typealias would have to be at least as visible as 
>> your type (potentially public), it would necessitate that your key type 
>> would be at least as public as your type as well. This would expose your 
>> type's coding keys, which is prohibitive. (Consider what this would mean for 
>> frameworks, for instance.)
>> 
>> Finally, this also means that you could not request different container 
>> types based on context — a type could not offer both a dictionary 
>> representation and a more efficient array representation, since it can only 
>> statically request one container type.
>> 
>> 
>> 2. Creating protocols for the types that are permitted there. Call them 
>> `EncodingSink` and `DecodingSource` for now.
>> 
>> 3. Creating *simple* type-erased wrappers for the `Unkeyed*Container` and 
>> `SingleValue*Container` protocols and conforming them to `EncodingSink` and 
>> `DecodingSource`. These wouldn't need the full generic-subclass dance 
>> usually used for type-erased wrappers; they just exist so you can strap 
>> initializers to them. In a future version of Swift which allowed 
>> initializers on existentials, we could probably get rid of them.
>> 
>> (Incidentally, if our APIs always return a type-erased wrapper around the 
>> `Keyed*ContainerProtocol` types, there's no actual need for the underlying 
>> protocols to have a `Key` associated type; they can use `CodingKey` 
>> existentials and depend on the wrapper to enforce the strong key typing. 
>> That would allow us to use a simple type-erased wrapper for 
>> `Keyed*Container`, too.)
>> 
>> 4. For advanced use cases where you really *do* need to access the encoder 
>> in order to decide which container type to use, we would also need to create 
>> a simple type-erased wrapper around `Encoder` and `Decoder` themselves, 
>> conforming them to the `Sink`/`Source` protocols.
>> 
>> This might address my last point above, but then what useful interface would 
>> EncodingSink and DecodingSource have if a type conforming to EncodingSink 
>> could be any one of the containers or even a whole encoder itself?
>> 
>> 
>> 5. The Source/Sink parameter would need to be `inout`, unless we *do* end up 
>> class-constraining things. (My prototype didn't.)
>> 
>> There are lots of little details that change too, but these are the broad 
>> strokes.
>> 
>> Although this technically introduces more types, I think it actually 
>> simplifies the design for people who are just using the `Codable` protocol. 
>> All they have to know about is the `Codable` protocol, the magic 
>> `CodingKeys` type, the three container types (realistically, probably just 
>> the `KeyedEncoding/DecodingContainer`), and the top-level encoders they want 
>> to use. Most users should never need to know about the members of the 
>> `Encoder` protocol; few even need to know about the other two container 
>> types. They don't need to do the "create a container" dance. The thing would 
>> just work with a minimum of fuss.
>> 
>> Meanwhile, folks who write encoders *do* deal with a bit more complexity, 
>> but only because they have to be aware of more type-erased wrappers. In 
>> other respects, it's simpler for them, too. Keyed containers don't need to 
>> be generic, and they have a layer of Foundation-provided wrappers above them 
>> that can help enforce good behavior and (probably) hide the implementation a 
>> little bit more. I think that overall, it's probably better for them, too.
>> 
>> Thoughts?
>> 
>> For what it's worth, the way to introduce these three different types of 
>> encoding without the use of associated types is to split the Codable 
>> protocol up into three protocols, which we've tried in the past 
>> <https://github.com/itaiferber/swift-evolution/blob/swift-archival-serialization/proposals/XXXX-swift-archival-serialization.md#alternatives-considered>
>>  (bullet #4). Unfortunately, the results are not great — an even bigger 
>> explosion of types, overloads, etc.
>> 
>> While I agree that the current approach of dynamically requesting containers 
>> is, well, dynamic, the benefit of not exposing your keys publicly and 
>> allowing encoding of classes is a big win in comparison, I think.
>> 
>> I am curious, though, about your comment above on preconditions being 
>> unenforceable, because this is certainly something we would want to hammer 
>> out before releasing. What cases are you thinking of that are unenforceable?
>> 
>> 
>> -- 
>> Brent Royal-Gordon
>> Architechies
>> 
>> Again, thanks for your thorough review! Looking forward to further comments. 
>> :)
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-evolution 
>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
> _______________________________________________
> swift-evolution mailing list
> [email protected] <mailto:[email protected]>
> https://lists.swift.org/mailman/listinfo/swift-evolution 
> <https://lists.swift.org/mailman/listinfo/swift-evolution>

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] [Proposal] Foundation Swift Encoders

Reply via email to