Re: [swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Itai Ferber via swift-evolution Fri, 17 Mar 2017 12:47:31 -0700

Do you mean versions of the format, or versions of your type?

If the latter, this can be done on a case-by-case basis, as needed. You can always do something like


```swift
struct Foo : Codable {
    // Name this as appropriate
    private let jsonVersion = 1.1
}
```

and have it encode as well.
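To flesh out the per-type option, here is a minimal sketch written against the Codable API as it eventually shipped in Swift 4. The type name `VersionedFoo`, its `version` key, and the older "name" layout it migrates from are all hypothetical. The point is only that a type can write an explicit version and branch on it when decoding.

```swift
import Foundation

// Hypothetical type: stores an explicit version and migrates an older layout.
struct VersionedFoo: Codable {
    // Bump this whenever the encoded representation of the type changes.
    static let currentVersion = 2

    var version = VersionedFoo.currentVersion
    var title: String

    private enum CodingKeys: String, CodingKey {
        case version
        case title
        case name  // key used by the (hypothetical) version 1 layout
    }

    init(title: String) {
        self.title = title
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Payloads written before the version field existed count as version 1.
        let encodedVersion = try container.decodeIfPresent(Int.self, forKey: .version) ?? 1
        if encodedVersion == 1 {
            title = try container.decode(String.self, forKey: .name)
        } else {
            title = try container.decode(String.self, forKey: .title)
        }
        // In memory, the value is always upgraded to the current version.
        version = VersionedFoo.currentVersion
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(VersionedFoo.currentVersion, forKey: .version)
        try container.encode(title, forKey: .title)
    }
}
```

Note that with synthesized conformance, a `let` property that has an initial value is encoded but not decoded, so acting on a stored version generally needs a hand-written `init(from:)` like this one.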

On 17 Mar 2017, at 11:51, T.J. Usiyan wrote:

Is there any sense of encoding versions (as in, changes to the JSON
representation, for instance)? I don't know that it is necessarily a good
idea overall, but now is the time to consider it.

On Fri, Mar 17, 2017 at 2:27 PM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:
On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <swift-evolution@swift.org> wrote:

On 15 Mar 2017, at 22:58, Zach Waldowski wrote:

Another issue of scale - I had to switch to a native mail client as
replying inline severely broke my webmail client. ;-)

Again, lots of love here. Responses inline.

On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <swift-evolution@swift.org> wrote:
Proposed solution
We will be introducing the following new types:
protocol Codable: Adopted by types to opt into archival. Conformance may be automatically derived in cases where all properties are also Codable.
FWIW, I think this is an acceptable compromise. If the happy path is derived conformances, only-decodable or only-encodable types feel like a lazy way
out on the part of a user of the API, and build a barrier to proper
testing.

[snip]
Structured types (i.e. types which encode as a collection of properties)
encode and decode their properties in a keyed manner. Keys may be
String-convertible or Int-convertible (or both), and user types which have
properties should declare semantic key enums which map keys to their
properties. Keys must conform to the CodingKey protocol:
public protocol CodingKey { <##snip##> }
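The protocol body is elided in the quote above. As an illustration only, here is what a semantic key enum looks like against the CodingKey protocol as it later shipped in Swift 4 (where `stringValue` is non-optional and `intValue` is optional). `Person` is a hypothetical type.

```swift
import Foundation

// Hypothetical type with a semantic key enum.
struct Person: Codable {
    var name: String
    var emailAddress: String

    // One case per encoded property. With a String raw type, the compiler
    // derives `stringValue` from the raw value, and `intValue` is nil.
    enum CodingKeys: String, CodingKey {
        case name
        case emailAddress = "email"  // encoded under a different key name
    }
}
```

Because the key cases are named after the properties, conformance can still be synthesized while the raw value controls the on-disk name.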

A few things here:
The protocol leaves open the possibility of having a String representation, an Int representation, both, or neither. What should a coder do in each case? Are the representations intended to be mutually exclusive, or not? The protocol design doesn’t seem to match the flavor of Swift particularly well; I’d
expect something along the lines of a CodingKey enum and a
CodingKeyRepresentable protocol. It’s also possible that the concerns of the two are
orthogonal enough that they deserve separate container(keyedBy:)
requirements.
The general answer to "what should a coder do" is "what is appropriate for its format". For a format that uses exclusively string keys (like JSON), the string representation (if present on a key) will always be used. If the key has no string representation but does have an integer representation, the encoder may choose to stringify the integer. If the key has neither, it
is appropriate for the Encoder to fail in some way.

On the flip side, for totally flat formats, an Encoder may choose to
ignore keys altogether, in which case it doesn’t really matter. The choice
is up to the Encoder and its format.

The string and integer representations are not meant to be mutually
exclusive at all, and in fact, where relevant, we encourage providing both
types of representations for flexibility.
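The key-selection rule for a string-keyed format can be sketched as follows. This mirrors the draft under discussion, where both representations are optional; `DraftCodingKey`, `DraftKey`, `KeyChoiceError`, and `jsonKey(for:)` are hypothetical names, not proposal API.

```swift
// Hypothetical: mirrors the draft CodingKey in this thread, where both
// representations are optional. (The CodingKey that shipped in Swift 4
// has a non-optional `stringValue`.)
protocol DraftCodingKey {
    var stringValue: String? { get }
    var intValue: Int? { get }
}

enum KeyChoiceError: Error {
    case noRepresentation
}

// What a string-keyed format such as JSON might do with a key.
func jsonKey(for key: DraftCodingKey) throws -> String {
    if let string = key.stringValue { return string }  // prefer the string form
    if let int = key.intValue { return String(int) }   // else stringify the integer
    throw KeyChoiceError.noRepresentation              // neither: fail
}

// A trivial key type for trying the rule out.
struct DraftKey: DraftCodingKey {
    var stringValue: String?
    var intValue: Int?
}
```

A flat format could skip this function entirely and never look at the key.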
As for the possibility of having neither representation, this question comes up often. I’d like to summarize the thought process here by quoting some earlier review (apologies for the poor formatting from my mail client):
If there are two options, each of which is itself optional, we have 4
possible combinations. But! At the same time we prohibit one combination by what? Runtime error? Why not use a 3-case enum for it? Even further down the rabbit hole there might be a CodingKey<> specialized for a concrete combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>, but I’m not sure whether our type system will make it useful or possible…
```swift
public enum CodingKeyValue {
    case integer(value: Int)
    case string(value: String)
    case both(intValue: Int, stringValue: String)
}

public protocol CodingKey {
    init?(value: CodingKeyValue)
    var value: CodingKeyValue { get }
}
```
I agree that this certainly feels suboptimal. We’ve certainly explored other possibilities before sticking to this one, so let me try to summarize
here:
* Having a concrete 3-case CodingKey enum would preclude the possibility of having neither a stringValue nor an intValue. However, there is a lot of
value in having the key types belong to the type being encoded (more
safety, impossible to accidentally mix key types, private keys, etc.); if the CodingKey type itself is an enum (which cannot be inherited from), then
this prevents differing key types.
* Your solution as presented is better: CodingKey itself is still a
protocol, and the value itself is the 3-case enum. However, since
CodingKeyValue is not literal-representable, user keys cannot be enums RawRepresentable by CodingKeyValue. That means that the values must either be dynamically returned, or (for attaining the benefits that we want to give users: easy representation, autocompletion, etc.) the type has to be a struct with static lets on it giving the CodingKeyValues. This certainly works, but is likely not what a developer would have in mind when working with the API; the power of enums in Swift makes them very easy to reach for, and I’m thinking most users would expect their keys to be enums. We’d like to leverage that where we can, especially since RawRepresentable enums
are appropriate in the vast majority of use cases.
* Three separate CodingKey protocols (one for Strings, one for Ints, and one for both). You could argue that this is the most correct version, since it most clearly represents what we’re looking for. However, this means that every method that now accepts a CodingKey must be converted into 3 overloads, each accepting a different type. This explodes the API surface, is confusing for users, and also makes it impossible to use CodingKey as an existential (unless it’s an empty 4th protocol which makes no static guarantees and which the
others inherit from).
* [The current] approach. On the one hand, this allows for the accidental representation of a key with neither a stringValue nor an intValue. On the
other, we want to make it really easy to use autogenerated keys, or
autogenerated key implementations if you provide the cases and values
yourself. The nil value possibility is only a concern when writing
stringValue and intValue yourself, which the vast majority of users should
not have to do.
* Additionally, a key word in that sentence bolded above is “generally”. As part of making this API more generalized, we push a lot of decisions to Encoders and Decoders. For many formats, it’s true that having a key with no value is an error, but this is not necessarily true for all formats; for a linear, non-keyed format, it is entirely reasonable to ignore the keys in the first place, or replace them with fixed-format values. The decision of
how to handle this case is left up to Encoders and Decoders; for most
formats (and for our implementations), this is certainly an error, and we would likely document this and either throw or preconditionFailure. But
this is not always the case.
* In terms of syntax, there’s another approach that would be really nice (but is currently not feasible): if enums were RawRepresentable in terms of tuples, it would be possible to give implementations for String, Int,
(Int, String), (String, Int), etc., making this condition harder to
represent by default unless you really mean to.
Hope that gives some helpful background on this decision. FWIW, the only way to end up with a key having no intValue or stringValue is manually implementing the CodingKey protocol (which should be *exceedingly* rare)
and implementing the methods without switching on self, or some other
approach that would allow you to forget to give a key either value.

Speaking of the mutually exclusive representations: what about
serializations that don’t code keys as one of those two things? YAML can have anything be a “key”, and despite that not being particularly sane, it is a
use case.
We’ve explored this, but at the end of the day, it’s not possible to generalize this to the point where we could represent all possible options on all possible formats, because you cannot make any promises statically as to what’s
possible and what’s not.
We’d like to strike a balance here between strong static guarantees on one end (the extreme end of which introduces a new API for every single format, since you can almost perfectly statically express what’s possible and what isn’t) and generalization on the other (the extreme end of which is an empty
protocol, because there really are encoding formats which are mutually
exclusive). So in this case, this API would support producing and consuming
YAML with string or integer keys, but not arbitrary YAML.

For most types, String-convertible keys are a reasonable default; for
performance, however, Int-convertible keys are preferred, and Encoders may choose to make use of Ints over Strings. Framework types should provide keys which have both for flexibility and performance across different types of Encoders. It is generally an error to provide a key which has neither a
stringValue nor an intValue.
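For a key type that offers both representations, an `Int`-backed enum is enough under the CodingKey protocol as shipped in Swift 4: the compiler derives `intValue` from the raw value and `stringValue` from the case name. A minimal sketch, with `Point` as a hypothetical type:

```swift
import Foundation

// Hypothetical type whose keys carry both representations.
struct Point: Codable {
    var x: Int
    var y: Int

    // Int raw values give `intValue`; case names give `stringValue`.
    enum CodingKeys: Int, CodingKey {
        case x = 1
        case y = 2
    }
}
```

A string-keyed encoder such as JSONEncoder uses the case names, while an integer-keyed format could use 1 and 2 from the same declaration.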

Could you speak a little more to using Int-convertible keys for
performance? I get the feeling int-based keys parallel the legacy of
NSCoder’s older design, and I don’t really see anyone these days supporting non-keyed archivers. They strike me as fragile. What other use cases are
envisioned for ordered archiving than that?

We agree that integer keys are fragile, and from years (decades) of
experience with NSArchiver, we are aware of the limitations that such
encoding offers. For this reason, we will never synthesize integer keys on
your behalf. This is something you must put thought into, if using an
integer key for archival.
However, there are use-cases (both in archival and in serialization, but especially so in serialization) where integer keys are useful. Ordered encoding is one such possibility (when the format supports it, integer keys are sequential, etc.), and is helpful for, say, marshaling objects in an XPC context (where both sides are aware of the format, are running the same
version of the same code, on the same device); keys waste time and
bandwidth unnecessarily in some cases.
Integer keys don’t necessarily imply ordered encoding, however. There are binary encoding formats which support integer-keyed dictionaries (read: serialized hash maps) which are more efficient to encode and decode than similar string-keyed ones. In that case, as long as integer keys are chosen
with care, the end result is more performant.
But again, this depends on the application and use case. Defining integer keys requires manual effort because we want thought put into defining them;
they are indeed fragile when used carelessly.
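A small illustration of why integer keys need deliberate thought: with implicit raw values, inserting a case renumbers every case after it, silently changing the keys an older archive was written with. The enums below are hypothetical.

```swift
// Hypothetical key enums for the same type at two points in time.
// Version 2 inserts `depth` first; implicit raw values start at 0,
// so every existing case is renumbered.
enum ShapeKeysV1: Int, CodingKey { case width, height }
enum ShapeKeysV2: Int, CodingKey { case depth, width, height }

// Pinning explicit values keeps old integer keys stable;
// new cases take new numbers.
enum StableShapeKeys: Int, CodingKey {
    case width = 0
    case height = 1
    case depth = 2
}
```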

[snip]

Keyed Encoding Containers

Keyed encoding containers are the primary interface that most Codable
types interact with for encoding and decoding. Through these, Codable types
have strongly-keyed access to encoded data by using keys that are
semantically correct for the operations they want to express.
Since semantically incompatible keys will rarely (if ever) share the same key type, it is impossible to mix up key types within the same container (as is possible with String keys), and since the type is known statically,
keys get autocompletion by the compiler.
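For a sense of what this looks like from the adopting type's side, here is a sketch against the keyed-container API as it shipped in Swift 4, which differs in detail from the draft quoted here (for example, `container(keyedBy:)` returns a struct rather than a class). `Location` and its keys are hypothetical.

```swift
import Foundation

// Hypothetical type encoding itself through a keyed container.
struct Location: Codable {
    var latitude: Double
    var longitude: Double

    // Semantic keys, private to this type.
    private enum CodingKeys: String, CodingKey {
        case latitude = "lat"
        case longitude = "lon"
    }

    init(latitude: Double, longitude: Double) {
        self.latitude = latitude
        self.longitude = longitude
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        // `.latitude` is checked and autocompleted by the compiler; a typo
        // such as `.lattitude` fails to compile instead of writing a bad key.
        try container.encode(latitude, forKey: .latitude)
        try container.encode(longitude, forKey: .longitude)
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        latitude = try container.decode(Double.self, forKey: .latitude)
        longitude = try container.decode(Double.self, forKey: .longitude)
    }
}
```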

```swift
open class KeyedEncodingContainer<Key : CodingKey> {
```
Like others, I’m a little bummed about this part of the design. Your reasoning up-thread is sound, but I chafe a bit on having to reabstract, and a little more on having to be a reference type, particularly knowing that it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can simply push some state and pass itself as the next encoding container down
the stack.
There’s not much more to be said about why this is a class that I haven’t covered; if it were possible to do otherwise at the moment, then we would.
It is possible using a manually written type-erased wrapper along the
lines of AnySequence and AnyCollection. I don’t recall seeing a rationale for why you don’t want to go this route. I would still like to hear more
on this topic.
As for *why* we do this: this is the crux of the whole API. We not only want to make it easy to use a custom key type that is semantically correct for your type, we want to make it difficult to do the easy but incorrect thing. From experience with NSKeyedArchiver, we’d like to move away from unadorned string (and integer) keys, where typos and accidentally reused
keys are common, and impossible to catch statically.
`encode<T : Codable>(_: T?, forKey: String)` unfortunately not only
encourages code like `encode(foo, forKey: "foi") // whoops, typo`, it also makes it *more
difficult* to use a semantic key type: `encode(foo, forKey:
CodingKeys.foo.stringValue)`. The additional typing and lack of
autocompletion make it an active disincentive. `encode<T : Codable>(_: T?, forKey: Key)` reverses both of these: it makes it impossible to use unadorned strings or to accidentally use keys from another type, and nets
shorter code with autocompletion: `encode(foo, forKey: .foo)`.
The resulting side effect, that keyed containers are classes,
is suboptimal, I agree, but necessary.
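For comparison, the type-erased wrapper suggested above can be sketched in a few lines: a protocol with an associated key type, a private class box, and a struct generic only over the key. Every name below is hypothetical, and the sink only writes strings to keep the pattern visible. (For what it's worth, the API that shipped in Swift 4 did end up with a struct `KeyedEncodingContainer` wrapping a `KeyedEncodingContainerProtocol` value in this way.)

```swift
// Hypothetical protocol standing in for a keyed container.
protocol KeyedSink {
    associatedtype Key: CodingKey
    mutating func write(_ value: String, forKey key: Key)
}

// Abstract box, generic only over the key type.
private class AnyKeyedSinkBoxBase<Key: CodingKey> {
    func write(_ value: String, forKey key: Key) {
        fatalError("must be overridden")
    }
}

// Concrete box that owns and forwards to a specific sink.
private final class AnyKeyedSinkBox<Base: KeyedSink>: AnyKeyedSinkBoxBase<Base.Key> {
    private var base: Base
    init(_ base: Base) { self.base = base }
    override func write(_ value: String, forKey key: Base.Key) {
        base.write(value, forKey: key)
    }
}

// The public face: a struct whose only generic parameter is the key type.
struct AnyKeyedSink<Key: CodingKey> {
    private let box: AnyKeyedSinkBoxBase<Key>

    init<S: KeyedSink>(_ sink: S) where S.Key == Key {
        box = AnyKeyedSinkBox(sink)
    }

    func write(_ value: String, forKey key: Key) {
        box.write(value, forKey: key)
    }
}

// A sample concrete sink that records writes into a shared log.
final class WriteLog {
    var entries: [String] = []
}

struct LoggingSink<K: CodingKey>: KeyedSink {
    typealias Key = K
    let log: WriteLog
    mutating func write(_ value: String, forKey key: K) {
        log.entries.append("\(key.stringValue)=\(value)")
    }
}
```

Calls through the wrapper still go through one dynamic dispatch on the box class, so this removes the `open class` from the public API rather than the indirection itself.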
```swift
open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
```
Does this win anything over taking a Codable?
Taking the concrete type over an existential allows for static dispatch on the type within the implementation, and is a performance win in some cases.
```swift
open func encode(_ value: Bool?, forKey key: Key) throws
open func encode(_ value: Int?, forKey key: Key) throws
open func encode(_ value: Int8?, forKey key: Key) throws
open func encode(_ value: Int16?, forKey key: Key) throws
open func encode(_ value: Int32?, forKey key: Key) throws
open func encode(_ value: Int64?, forKey key: Key) throws
open func encode(_ value: UInt?, forKey key: Key) throws
open func encode(_ value: UInt8?, forKey key: Key) throws
open func encode(_ value: UInt16?, forKey key: Key) throws
open func encode(_ value: UInt32?, forKey key: Key) throws
open func encode(_ value: UInt64?, forKey key: Key) throws
open func encode(_ value: Float?, forKey key: Key) throws
open func encode(_ value: Double?, forKey key: Key) throws
open func encode(_ value: String?, forKey key: Key) throws
open func encode(_ value: Data?, forKey key: Key) throws
```
What is the motivation behind abandoning the idea of “primitives” from the Alternatives Considered? Performance? Being unable to close the protocol?
Being unable to close the protocol is the primary reason. Not being able to tell at a glance what the concrete types belonging to this set are is
related, and also a top reason.

Looks like we have another strong motivating use case for closed
protocols. I hope that will be in scope for Swift 5.
It would be great for the auto-generated documentation and “headers” to provide a list of all public or open types inheriting from a closed class or conforming to a closed protocol (when we get them). This would go a
long way towards addressing your second reason.
In what ways is encoding a value envisioned to fail? I understand wanting to allow maximum flexibility, and being symmetric to `decode` throwing, but there are plenty of “conversion” patterns that are asymmetric in the ways
they can fail (Date formatters, RawRepresentable,
LosslessStringConvertible, etc.).
Different formats support different concrete values, even of primitive types. For instance, you cannot natively encode Double.nan in JSON, but you can in plist. Without additional options on JSONEncoder, encode(Double.nan,
forKey: …) will throw.
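The NaN case can be seen with JSONEncoder as it shipped: its `nonConformingFloatEncodingStrategy` defaults to `.throw`, and opting into `.convertToString` makes the same value encodable. `Reading` is a hypothetical wrapper type; wrapping the value in an array of structs avoids depending on top-level fragment support.

```swift
import Foundation

// Hypothetical wrapper so the Double is not a top-level JSON fragment.
struct Reading: Codable {
    var value: Double
}

let nanReadings = [Reading(value: .nan)]

// Default strategy is `.throw`: NaN has no JSON representation.
let strictEncoder = JSONEncoder()
let strictResult = try? strictEncoder.encode(nanReadings)

// Opting in to a string stand-in makes the same value encodable.
let lenientEncoder = JSONEncoder()
lenientEncoder.nonConformingFloatEncodingStrategy =
    .convertToString(positiveInfinity: "+Infinity", negativeInfinity: "-Infinity", nan: "NaN")
let lenientJSON = String(data: try! lenientEncoder.encode(nanReadings), encoding: .utf8)!
```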

```swift
/// For `Encoder`s that implement this functionality, this will only encode
/// the given object and associate it with the given key if it is encoded
/// unconditionally elsewhere in the archive (either previously or in the
/// future).
open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
```

Is it correct that if I send a Cocoa-style object graph (with weak
backrefs), an encoder could infinitely recurse? Or is a coder supposed to
detect that?
encodeWeak has a default implementation that calls the regular encode<T :
Codable>(_: T, forKey: Key); only formats which actually support weak
backreferencing should override this implementation, so it should always be safe to call (it will simply unconditionally encode the object by default).
```swift
open var codingKeyContext: [CodingKey]
}
```
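The default-implementation pattern described for `encodeWeak` might look like the following sketch. `ObjectSink`, `FlatSink`, and `GraphSink` are hypothetical and simplified: keys are plain strings, and `GraphSink` only honors objects encoded *earlier*, not ones encoded later. In Swift 4 as shipped, this requirement appears as `encodeConditional(_:forKey:)`, with a default that encodes unconditionally.

```swift
// Hypothetical, simplified container protocol.
protocol ObjectSink {
    mutating func encode(_ object: AnyObject, forKey key: String)
    mutating func encodeWeak(_ object: AnyObject, forKey key: String)
}

extension ObjectSink {
    // Default: formats without back-references just encode unconditionally,
    // so calling encodeWeak is always safe.
    mutating func encodeWeak(_ object: AnyObject, forKey key: String) {
        encode(object, forKey: key)
    }
}

// A format with no back-reference support relies on the default.
struct FlatSink: ObjectSink {
    var keys: [String] = []
    mutating func encode(_ object: AnyObject, forKey key: String) {
        keys.append(key)
    }
}

// A format with back-references overrides encodeWeak. Simplification: it
// only honors objects already encoded unconditionally, not later ones.
struct GraphSink: ObjectSink {
    var strongObjects: Set<ObjectIdentifier> = []
    var keys: [String] = []

    mutating func encode(_ object: AnyObject, forKey key: String) {
        strongObjects.insert(ObjectIdentifier(object))
        keys.append(key)
    }

    mutating func encodeWeak(_ object: AnyObject, forKey key: String) {
        if strongObjects.contains(ObjectIdentifier(object)) {
            keys.append(key)
        }
    }
}
```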
[snippity snip]
Alright, those are just my first thoughts. I want to spend a little time marinating in the code from PR #8124 before I comment further. Cheers! I
owe you, Michael, and Tony a few drinks for sure.

Hehe, thanks :)

Zach Waldowski
[email protected]

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution


