> On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <[email protected]> wrote:
>
> On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
>
> > Another issue of scale - I had to switch to a native mail client, as replying inline severely broke my webmail client. ;-)
>
> Again, lots of love here. Responses inline.
>
> > On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <[email protected]> wrote:
> >
> > > Proposed solution
> > > We will be introducing the following new types:
> > >
> > > protocol Codable: Adopted by types to opt into archival. Conformance may be automatically derived in cases where all properties are also Codable.
> >
> > FWIW, I think this is an acceptable compromise. If the happy path is derived conformances, only-decodable or only-encodable types feel like a lazy way out on the part of a user of the API, and build a barrier to proper testing.
> >
> > [snip]
> >
> > > Structured types (i.e. types which encode as a collection of properties) encode and decode their properties in a keyed manner. Keys may be String-convertible or Int-convertible (or both), and user types which have properties should declare semantic key enums which map keys to their properties. Keys must conform to the CodingKey protocol:
> > >
> > > public protocol CodingKey { <##snip##> }
> >
> > A few things here:
> >
> > The protocol leaves open the possibility of having both a String or Int representation, or neither. What should a coder do in either case? Are the representations intended to be mutually exclusive, or not? The protocol design doesn’t seem to particularly match the flavor of Swift; I’d expect something along the lines of a CodingKey enum and the protocol CodingKeyRepresentable. It’s also possible that the concerns of the two are orthogonal enough that they deserve separate container(keyedBy:) requirements.
>
> The general answer to "what should a coder do" is "what is appropriate for its format".
> For a format that uses exclusively string keys (like JSON), the string representation (if present on a key) will always be used. If the key has no string representation but does have an integer representation, the encoder may choose to stringify the integer. If the key has neither, it is appropriate for the Encoder to fail in some way.
>
> On the flip side, for totally flat formats, an Encoder may choose to ignore keys altogether, in which case it doesn’t really matter. The choice is up to the Encoder and its format.
>
> The string and integer representations are not meant to be mutually exclusive at all, and in fact, where relevant, we encourage providing both types of representations for flexibility.
>
> As for the possibility of having neither representation, this question comes up often. I’d like to summarize the thought process here by quoting some earlier review (apologies for the poor formatting from my mail client):
>
> > If there are two options, each of which is itself optional, we have 4 possible combinations. But! At the same time we prohibit one combination by what? Runtime error? Why not use a 3-case enum for it? Even further down the rabbit hole there might be a CodingKey<> specialized for a concrete combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>, but I’m not sure whether our type system will make it useful or possible…
> >
> > public enum CodingKeyValue {
> >     case integer(value: Int)
> >     case string(value: String)
> >     case both(intValue: Int, stringValue: String)
> > }
> > public protocol CodingKey {
> >     init?(value: CodingKeyValue)
> >     var value: CodingKeyValue { get }
> > }
>
> I agree that this certainly feels suboptimal. We explored other possibilities before sticking to this one, so let me try to summarize here:
>
> * Having a concrete 3-case CodingKey enum would preclude the possibility of having neither a stringValue nor an intValue.
> However, there is a lot of value in having the key types belong to the type being encoded (more safety, impossible to accidentally mix key types, private keys, etc.); if the CodingKey type itself is an enum (which cannot be inherited from), then this prevents differing key types.
>
> * Your solution as presented is better: CodingKey itself is still a protocol, and the value itself is the 3-case enum. However, since CodingKeyValue is not literal-representable, user keys cannot be enums RawRepresentable by CodingKeyValue. That means that the values must either be dynamically returned, or (for attaining the benefits that we want to give users — easy representation, autocompletion, etc.) the type has to be a struct with static lets on it giving the CodingKeyValues. This certainly works, but is likely not what a developer would have in mind when working with the API; the power of enums in Swift makes them very easy to reach for, and I’m thinking most users would expect their keys to be enums. We’d like to leverage that where we can, especially since RawRepresentable enums are appropriate in the vast majority of use cases.
>
> * Three separate CodingKey protocols (one for Strings, one for Ints, and one for both). You could argue that this is the most correct version, since it most clearly represents what we’re looking for. However, this means that every method now accepting a CodingKey must be converted into 3 overloads, each accepting a different type. This explodes the API surface, is confusing for users, and also makes it impossible to use CodingKey as an existential (unless it’s an empty 4th protocol which makes no static guarantees and from which the others inherit).
>
> * [The current] approach. On the one hand, this allows for the accidental representation of a key with neither a stringValue nor an intValue.
> On the other, we want to make it really easy to use autogenerated keys, or autogenerated key implementations if you provide the cases and values yourself. The nil value possibility is only a concern when writing stringValue and intValue yourself, which the vast majority of users should not have to do.
>
> * Additionally, a key word in that sentence bolded above is “generally”. As part of making this API more generalized, we push a lot of decisions to Encoders and Decoders. For many formats, it’s true that having a key with no value is an error, but this is not necessarily true for all formats; for a linear, non-keyed format, it is entirely reasonable to ignore the keys in the first place, or replace them with fixed-format values. The decision of how to handle this case is left up to Encoders and Decoders; for most formats (and for our implementations), this is certainly an error, and we would likely document this and either throw or preconditionFailure. But this is not always the case.
>
> * In terms of syntax, there’s another approach that would be really nice (but is currently not feasible) — if enums were RawRepresentable in terms of tuples, it would be possible to give implementations for String, Int, (Int, String), (String, Int), etc., making this condition harder to represent by default unless you really mean to.
>
> Hope that gives some helpful background on this decision. FWIW, the only way to end up with a key having neither an intValue nor a stringValue is manually implementing the CodingKey protocol (which should be exceedingly rare) and implementing the methods by not switching on self, or in some other way that would allow you to accidentally give a key neither value.
>
> > Speaking of the mutually exclusive representations - what about serializations that don’t code as one of those two things? YAML can have anything be a “key”, and despite that being not particularly sane, it is a use case.
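The hand-written CodingKey implementation mentioned a few paragraphs up can be sketched as follows. Note that the real protocol body was snipped earlier in the thread, so the shape used here (both representations optional) is my reading of the draft, not the confirmed API; `DraftCodingKey` and `FruitKeys` are hypothetical names for illustration.

```swift
// Sketch of the draft CodingKey shape as I read it (the real protocol body
// was snipped above); defined locally so the example is self-contained.
protocol DraftCodingKey {
    var stringValue: String? { get }
    var intValue: Int? { get }
    init?(stringValue: String)
    init?(intValue: Int)
}

// A hand-written key type providing *both* representations. Switching on
// self exhaustively guarantees no case is left with neither value; only a
// manual implementation that skips this can produce a valueless key.
enum FruitKeys: Int, DraftCodingKey {
    case name = 1
    case color = 2

    var stringValue: String? {
        switch self {
        case .name:  return "name"
        case .color: return "color"
        }
    }
    var intValue: Int? { return rawValue }

    init?(stringValue: String) {
        switch stringValue {
        case "name":  self = .name
        case "color": self = .color
        default:      return nil
        }
    }
    init?(intValue: Int) { self.init(rawValue: intValue) }
}
```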
> We’ve explored this, but at the end of the day, it’s not possible to generalize this to the point where we could represent all possible options on all possible formats, because you cannot make any promises as to what’s possible and what’s not statically.
>
> We’d like to strike a balance here between strong static guarantees on one end (the extreme end of which introduces a new API for every single format, since you can almost perfectly statically express what’s possible and what isn’t) and generalization on the other (the extreme end of which is an empty protocol, because there really are encoding formats which are mutually exclusive). So in this case, this API would support producing and consuming YAML with string or integer keys, but not arbitrary YAML.
>
> > > For most types, String-convertible keys are a reasonable default; for performance, however, Int-convertible keys are preferred, and Encoders may choose to make use of Ints over Strings. Framework types should provide keys which have both for flexibility and performance across different types of Encoders. It is generally an error to provide a key which has neither a stringValue nor an intValue.
>
> > Could you speak a little more to using Int-convertible keys for performance? I get the feeling int-based keys parallel the legacy of NSCoder’s older design, and I don’t really see anyone these days supporting non-keyed archivers. They strike me as fragile. What other use cases are envisioned for ordered archiving than that?
>
> We agree that integer keys are fragile, and from years (decades) of experience with NSArchiver, we are aware of the limitations that such encoding offers. For this reason, we will never synthesize integer keys on your behalf. This is something you must put thought into, if using an integer key for archival.
>
> However, there are use cases (both in archival and in serialization, but especially so in serialization) where integer keys are useful.
> Ordered encoding is one such possibility (when the format supports it, integer keys are sequential, etc.), and is helpful for, say, marshaling objects in an XPC context (where both sides are aware of the format, are running the same version of the same code, on the same device) — keys waste time and bandwidth unnecessarily in some cases.
>
> Integer keys don’t necessarily imply ordered encoding, however. There are binary encoding formats which support integer-keyed dictionaries (read: serialized hash maps) which are more efficient to encode and decode than similar string-keyed ones. In that case, as long as integer keys are chosen with care, the end result is more performant.
>
> But again, this depends on the application and use case. Defining integer keys requires manual effort because we want thought put into defining them; they are indeed fragile when used carelessly.
>
> [snip]
>
> > > Keyed Encoding Containers
> > >
> > > Keyed encoding containers are the primary interface that most Codable types interact with for encoding and decoding. Through these, Codable types have strongly-keyed access to encoded data by using keys that are semantically correct for the operations they want to express.
> > >
> > > Since semantically incompatible keys will rarely (if ever) share the same key type, it is impossible to mix up key types within the same container (as is possible with String keys), and since the type is known statically, keys get autocompletion by the compiler.
> > >
> > > open class KeyedEncodingContainer<Key : CodingKey> {
>
> > Like others, I’m a little bummed about this part of the design. Your reasoning up-thread is sound, but I chafe a bit on having to reabstract, and a little more on this having to be a reference type. Particularly knowing that it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can simply push some state and pass itself as the next encoding container down the stack.
> There’s not much more to be said about why this is a class that I haven’t covered; if it were possible to do otherwise at the moment, then we would.

It is possible using a manually written type-erased wrapper along the lines of AnySequence and AnyCollection. I don’t recall seeing a rationale for why you don’t want to go this route. I would still like to hear more on this topic.
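For reference, the AnySequence-style erasure alluded to here looks roughly like this. It is a minimal sketch with hypothetical names, reduced to a single requirement standing in for the container's full API:

```swift
// Minimal sketch of AnySequence-style type erasure applied to a keyed
// container. One requirement stands in for the container's full surface.
protocol KeyedEncoding {
    associatedtype Key
    func encode(_ value: String, forKey key: Key) throws
}

// The erasing struct: generic over Key only, forwarding every call to a
// captured concrete base.
struct AnyKeyedEncoding<Key>: KeyedEncoding {
    private let _encode: (String, Key) throws -> Void

    init<Base: KeyedEncoding>(_ base: Base) where Base.Key == Key {
        _encode = { value, key in try base.encode(value, forKey: key) }
    }

    func encode(_ value: String, forKey key: Key) throws {
        try _encode(value, key)
    }
}
```

A concrete container would then wrap itself in `AnyKeyedEncoding` before handing itself down the stack, instead of being forced into a class hierarchy.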
> As for why we do this — this is the crux of the whole API. We not only want to make it easy to use a custom key type that is semantically correct for your type, we want to make it difficult to do the easy but incorrect thing. From experience with NSKeyedArchiver, we’d like to move away from unadorned string (and integer) keys, where typos and accidentally reused keys are common and impossible to catch statically.
>
> encode<T : Codable>(_: T?, forKey: String) unfortunately not only encourages code like encode(foo, forKey: "foi") // whoops, typo; it also makes it more difficult to use a semantic key type: encode(foo, forKey: CodingKeys.foo.stringValue). The additional typing and lack of autocompletion make it an active disincentive. encode<T : Codable>(_: T?, forKey: Key) reverses both of these — it makes it impossible to use unadorned strings or accidentally use keys from another type, and nets shorter code with autocompletion: encode(foo, forKey: .foo)
>
> The side effect of this (keyed containers being classes) is suboptimal, I agree, but necessary.
>
> > > open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
>
> > Does this win anything over taking a Codable?
>
> Taking the concrete type over an existential allows for static dispatch on the type within the implementation, and is a performance win in some cases.
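For illustration, the semantic-key pattern described above, written against the Codable API as it eventually shipped in Swift 4 (the `Person` type is hypothetical):

```swift
import Foundation

// Hypothetical type demonstrating the semantic-key pattern: keys are a
// private enum belonging to the type, so unadorned strings cannot be used.
struct Person: Codable {
    var name: String
    var age: Int

    private enum CodingKeys: String, CodingKey {
        case name
        case age
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name) // autocompleted, typo-proof
        try container.encode(age, forKey: .age)
        // try container.encode(name, forKey: "nmae") // does not compile:
        // a String is not a CodingKeys value, so the typo is caught statically
    }
}
```

Because the struct declares Codable conformance and only `encode(to:)` is written out, the compiler still synthesizes `init(from:)` using the same key enum.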
> > > open func encode(_ value: Bool?, forKey key: Key) throws
> > > open func encode(_ value: Int?, forKey key: Key) throws
> > > open func encode(_ value: Int8?, forKey key: Key) throws
> > > open func encode(_ value: Int16?, forKey key: Key) throws
> > > open func encode(_ value: Int32?, forKey key: Key) throws
> > > open func encode(_ value: Int64?, forKey key: Key) throws
> > > open func encode(_ value: UInt?, forKey key: Key) throws
> > > open func encode(_ value: UInt8?, forKey key: Key) throws
> > > open func encode(_ value: UInt16?, forKey key: Key) throws
> > > open func encode(_ value: UInt32?, forKey key: Key) throws
> > > open func encode(_ value: UInt64?, forKey key: Key) throws
> > > open func encode(_ value: Float?, forKey key: Key) throws
> > > open func encode(_ value: Double?, forKey key: Key) throws
> > > open func encode(_ value: String?, forKey key: Key) throws
> > > open func encode(_ value: Data?, forKey key: Key) throws
>
> > What is the motivation behind abandoning the idea of “primitives” from the Alternatives Considered? Performance? Being unable to close the protocol?
>
> Being unable to close the protocol is the primary reason. Not being able to tell at a glance what the concrete types belonging to this set are is related, and also a top reason.

Looks like we have another strong motivating use case for closed protocols. I hope that will be in scope for Swift 5. It would be great for the auto-generated documentation and “headers” to provide a list of all public or open types inheriting from a closed class or conforming to a closed protocol (when we get them). This would go a long way towards addressing your second reason.

> > What ways is encoding a value envisioned to fail? I understand wanting to allow maximum flexibility, and being symmetric to `decode` throwing, but there are plenty of “conversion” patterns that are asymmetric in the ways they can fail (Date formatters, RawRepresentable, LosslessStringConvertible, etc.).
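One concrete failure mode, sketched against JSONEncoder as it eventually shipped (the `.convertToString` strategy shown is the shipped Foundation API for opting out of this error):

```swift
import Foundation

// JSON has no native representation for NaN or infinity, so encoding
// Double.nan throws EncodingError.invalidValue by default.
let strict = JSONEncoder()
do {
    _ = try strict.encode([Double.nan])
    print("unexpectedly succeeded")
} catch {
    print("threw as expected: \(error)")
}

// Opting in to a string representation makes the same value encodable.
let lenient = JSONEncoder()
lenient.nonConformingFloatEncodingStrategy =
    .convertToString(positiveInfinity: "inf", negativeInfinity: "-inf", nan: "nan")
let data = try! lenient.encode([Double.nan])
print(String(data: data, encoding: .utf8)!) // ["nan"]
```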
> Different formats support different concrete values, even of primitive types. For instance, you cannot natively encode Double.nan in JSON, but you can in plist. Without additional options on JSONEncoder, encode(Double.nan, forKey: …) will throw.
>
> > > /// For `Encoder`s that implement this functionality, this will only encode the given object and associate it with the given key if it is encoded unconditionally elsewhere in the archive (either previously or in the future).
> > > open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
>
> > Is this correct that if I send a Cocoa-style object graph (with weak backrefs), an encoder could infinitely recurse? Or is a coder supposed to detect that?
>
> encodeWeak has a default implementation that calls the regular encode<T : Codable>(_: T, forKey: Key); only formats which actually support weak backreferencing should override this implementation, so it should always be safe to call (it will simply unconditionally encode the object by default).
>
> > > open var codingKeyContext: [CodingKey]
> > > }
>
> > [snippity snip]
> >
> > Alright, those are just my first thoughts. I want to spend a little time marinating in the code from PR #8124 before I comment further. Cheers! I owe you, Michael, and Tony a few drinks for sure.
>
> Hehe, thanks :)
>
> > Zach Waldowski
> > [email protected]
>
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution
