On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <[email protected]> wrote:
On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
Another issue of scale - I had to switch to a native mail client as
replying inline severely broke my webmail client. ;-)
Again, lots of love here. Responses inline.
On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <[email protected]> wrote:
Proposed solution
We will be introducing the following new types:
protocol Codable: Adopted by types to opt into archival. Conformance may be automatically derived in cases where all properties are also Codable.
FWIW I think this is an acceptable compromise. If the happy path is derived conformances, only-decodable or only-encodable types feel like a lazy way out on the part of a user of the API, and build a barrier to proper testing.
[snip]
Structured types (i.e. types which encode as a collection of properties) encode and decode their properties in a keyed manner. Keys may be String-convertible or Int-convertible (or both), and user types which have properties should declare semantic key enums which map keys to their properties. Keys must conform to the CodingKey protocol:
public protocol CodingKey { <##snip##> }
A few things here:
The protocol leaves open the possibility of having both a String and an Int representation, or neither. What should a coder do in either case? Are the representations intended to be mutually exclusive, or not? The protocol design doesn’t seem to match the flavor of Swift; I’d expect something along the lines of a CodingKey enum and the protocol CodingKeyRepresentable. It’s also possible that the concerns of the two are orthogonal enough that they deserve separate container(keyedBy:) requirements.
The general answer to "what should a coder do" is "what is appropriate for its format". For a format that uses exclusively string keys (like JSON), the string representation (if present on a key) will always be used. If the key has no string representation but does have an integer representation, the encoder may choose to stringify the integer. If the key has neither, it is appropriate for the Encoder to fail in some way.
On the flip side, for totally flat formats, an Encoder may choose to ignore keys altogether, in which case it doesn’t really matter. The choice is up to the Encoder and its format.
The string and integer representations are not meant to be mutually exclusive at all, and in fact, where relevant, we encourage providing both types of representations for flexibility.
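A minimal sketch of the key-selection policy just described, for a string-keyed format. It assumes a hypothetical `DraftCodingKey` protocol that mirrors the draft discussed in this thread (both `stringValue` and `intValue` optional); `resolveStringKey`, `KeyResolutionError`, and `SampleKey` are illustrative names, not part of the proposal.

```swift
/// Hypothetical stand-in for the draft CodingKey discussed in this thread,
/// in which both representations are optional.
protocol DraftCodingKey {
    var stringValue: String? { get }
    var intValue: Int? { get }
}

/// Illustrative error for a key with no usable representation.
enum KeyResolutionError: Error {
    case noRepresentation
}

/// Chooses the key a string-keyed format (such as JSON) would write:
/// the string form if present, else the stringified integer, else an error.
func resolveStringKey(_ key: any DraftCodingKey) throws -> String {
    if let string = key.stringValue { return string }
    if let number = key.intValue { return String(number) }
    throw KeyResolutionError.noRepresentation
}

/// A hypothetical key that may carry either, both, or neither representation.
struct SampleKey: DraftCodingKey {
    let stringValue: String?
    let intValue: Int?
}

let both = SampleKey(stringValue: "name", intValue: 1)
let intOnly = SampleKey(stringValue: nil, intValue: 7)
let neither = SampleKey(stringValue: nil, intValue: nil)

try print(resolveStringKey(both))    // prints "name"
try print(resolveStringKey(intOnly)) // prints "7"
// resolveStringKey(neither) throws KeyResolutionError.noRepresentation
```

A flat, non-keyed format would simply never call such a function, which is why the policy is left to each Encoder rather than fixed by the protocol.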
As for the possibility of having neither representation, this question comes up often. I’d like to summarize the thought process here by quoting some earlier review (apologies for the poor formatting from my mail client):
If there are two options, each of which is itself optional, we have 4 possible combinations. But! At the same time we prohibit one combination by what? Runtime error? Why not use a 3-case enum for it? Even further down the rabbit hole there might be a CodingKey<> specialized for a concrete combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>, but I’m not sure whether our type system will make it useful or possible…
public enum CodingKeyValue {
case integer(value: Int)
case string(value: String)
case both(intValue: Int, stringValue: String)
}
public protocol CodingKey {
init?(value: CodingKeyValue)
var value: CodingKeyValue { get }
}
I agree that this certainly feels suboptimal. We’ve explored other possibilities before sticking to this one, so let me try to summarize here:
* Having a concrete 3-case CodingKey enum would preclude the possibility of having neither a stringValue nor an intValue. However, there is a lot of value in having the key types belong to the type being encoded (more safety, impossible to accidentally mix key types, private keys, etc.); if the CodingKey type itself is an enum (which cannot be inherited from), then this prevents differing key types.
* Your solution as presented is better: CodingKey itself is still a protocol, and the value itself is the 3-case enum. However, since CodingKeyValue is not literal-representable, user keys cannot be enums RawRepresentable by CodingKeyValue. That means that the values must either be dynamically returned, or (for attaining the benefits that we want to give users — easy representation, autocompletion, etc.) the type has to be a struct with static lets on it giving the CodingKeyValues. This certainly works, but is likely not what a developer would have in mind when working with the API; the power of enums in Swift makes them very easy to reach for, and I’m thinking most users would expect their keys to be enums. We’d like to leverage that where we can, especially since RawRepresentable enums are appropriate in the vast majority of use cases.
* Three separate CodingKey protocols (one for Strings, one for Ints, and one for both). You could argue that this is the most correct version, since it most clearly represents what we’re looking for. However, this means that every method now accepting a CodingKey must be converted into 3 overloads each accepting different types. This explodes the API surface, is confusing for users, and also makes it impossible to use CodingKey as an existential (unless it’s an empty 4th protocol which makes no static guarantees and the others inherit from).
* [The current] approach. On the one hand, this allows for the accidental representation of a key with neither a stringValue nor an intValue. On the other, we want to make it really easy to use autogenerated keys, or autogenerated key implementations if you provide the cases and values yourself. The nil value possibility is only a concern when writing stringValue and intValue yourself, which the vast majority of users should not have to do.
* Additionally, a key word in the sentence “It is generally an error to provide a key which has neither a stringValue nor an intValue” is “generally”. As part of making this API more generalized, we push a lot of decisions to Encoders and Decoders. For many formats, it’s true that having a key with no value is an error, but this is not necessarily true for all formats; for a linear, non-keyed format, it is entirely reasonable to ignore the keys in the first place, or to replace them with fixed-format values. The decision of how to handle this case is left up to Encoders and Decoders; for most formats (and for our implementations), this is certainly an error, and we would likely document this and either throw or preconditionFailure. But this is not always the case.
* In terms of syntax, there’s another approach that would be really nice (but is currently not feasible) — if enums were RawRepresentable in terms of tuples, it would be possible to give implementations for String, Int, (Int, String), (String, Int), etc., making this condition harder to represent by default unless you really mean to.
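To make the second bullet concrete, here is a sketch assuming the 3-case `CodingKeyValue` enum suggested above and a hypothetical `ValueCodingKey` protocol built on it (neither is part of the proposal). Because `CodingKeyValue` is not expressible by a literal, it cannot serve as an enum raw type, so the keys end up as static members of a struct.

```swift
/// The 3-case key value suggested earlier in the thread (hypothetical).
enum CodingKeyValue: Equatable {
    case integer(value: Int)
    case string(value: String)
    case both(intValue: Int, stringValue: String)
}

/// Hypothetical key protocol built on that enum.
protocol ValueCodingKey {
    init?(value: CodingKeyValue)
    var value: CodingKeyValue { get }
}

// `enum PersonKey: CodingKeyValue { ... }` does not compile: a raw type
// must be expressible by a literal. Keys become a struct with static lets.
struct PersonKey: ValueCodingKey {
    let value: CodingKeyValue

    /// Accepts only the values declared as static members below.
    init?(value: CodingKeyValue) {
        guard PersonKey.all.contains(where: { $0.value == value }) else { return nil }
        self.value = value
    }

    private init(_ value: CodingKeyValue) { self.value = value }

    static let name = PersonKey(.string(value: "name"))
    static let age = PersonKey(.both(intValue: 1, stringValue: "age"))
    static let all = [name, age]
}

let ageKey = PersonKey.age // still autocompletes, but needs more boilerplate than an enum
```

This works, but it is noticeably more ceremony than `enum CodingKeys: String`, which is the trade-off the bullet describes.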
Hope that gives some helpful background on this decision. FWIW, the only way to end up with a key having no intValue or stringValue is manually implementing the CodingKey protocol (which should be *exceedingly* rare) and implementing the methods by not switching on self, or some other method that would allow you to forget to give a key either value.
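A sketch of a hand-written key that switches on `self`, so the compiler's exhaustiveness check forces every case to be given its representations. Note that it uses the `CodingKey` protocol as it later shipped in Swift 4, where `stringValue` is non-optional and only `intValue` is optional, rather than the draft discussed here; `MessageKeys` is a hypothetical key type.

```swift
/// Hypothetical key type providing both representations for every case.
enum MessageKeys: CodingKey {
    case id
    case body

    // Exhaustive switches: adding a case without updating these is a compile error.
    var stringValue: String {
        switch self {
        case .id: return "id"
        case .body: return "body"
        }
    }

    var intValue: Int? {
        switch self {
        case .id: return 1
        case .body: return 2
        }
    }

    init?(stringValue: String) {
        switch stringValue {
        case "id": self = .id
        case "body": self = .body
        default: return nil
        }
    }

    init?(intValue: Int) {
        switch intValue {
        case 1: self = .id
        case 2: self = .body
        default: return nil
        }
    }
}
```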
Speaking of the mutually exclusive representations: what about serializations that don’t code as one of those two things? YAML can have anything be a “key”, and though that is not particularly sane, it is a use case.
We’ve explored this, but at the end of the day, it’s not possible to generalize this to the point where we could represent all possible options on all possible formats because you cannot make any promises as to what’s possible and what’s not statically.
We’d like to strike a balance here between strong static guarantees on one end (the extreme end of which introduces a new API for every single format, since you can almost perfectly statically express what’s possible and what isn’t) and generalization on the other (the extreme end of which is an empty protocol because there really are encoding formats which are mutually exclusive). So in this case, this API would support producing and consuming YAML with string or integer keys, but not arbitrary YAML.
For most types, String-convertible keys are a reasonable default; for performance, however, Int-convertible keys are preferred, and Encoders may choose to make use of Ints over Strings. Framework types should provide keys which have both for flexibility and performance across different types of Encoders. It is generally an error to provide a key which has neither a stringValue nor an intValue.
Could you speak a little more to using Int-convertible keys for performance? I get the feeling int-based keys parallel the legacy of NSCoder’s older design, and I don’t really see anyone these days supporting non-keyed archivers. They strike me as fragile. What other use cases are envisioned for ordered archiving than that?
We agree that integer keys are fragile, and from years (decades) of experience with NSArchiver, we are aware of the limitations that such encoding offers. For this reason, we will never synthesize integer keys on your behalf. This is something you must put thought into if you use integer keys for archival.
However, there are use cases (both in archival and in serialization, but especially so in serialization) where integer keys are useful. Ordered encoding is one such possibility (when the format supports it, integer keys are sequential, etc.), and is helpful for, say, marshaling objects in an XPC context (where both sides are aware of the format, are running the same version of the same code, on the same device) — keys waste time and bandwidth unnecessarily in some cases.
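A toy illustration of the bandwidth argument, under loudly stated assumptions: `ToyField`, `encodeKeyed`, and `encodeOrdered` are hypothetical, and the "format" is just a semicolon-separated string. The ordered form sorts fields by integer key and drops key names entirely, which is only safe when both sides run the same code and agree on the layout.

```swift
/// Hypothetical field carrying both key representations and a value.
struct ToyField {
    let intKey: Int
    let stringKey: String
    let value: String
}

/// Keyed form: every field carries its key name.
func encodeKeyed(_ fields: [ToyField]) -> String {
    fields.map { "\($0.stringKey)=\($0.value)" }.joined(separator: ";")
}

/// Ordered form: sort by integer key and emit values only.
func encodeOrdered(_ fields: [ToyField]) -> String {
    fields.sorted { $0.intKey < $1.intKey }.map(\.value).joined(separator: ";")
}

let fields = [
    ToyField(intKey: 2, stringKey: "displayName", value: "Ada"),
    ToyField(intKey: 1, stringKey: "identifier", value: "42"),
]

print(encodeKeyed(fields))   // prints "displayName=Ada;identifier=42"
print(encodeOrdered(fields)) // prints "42;Ada"
```

Renumbering `intKey` silently changes the meaning of previously encoded data, which is the fragility the reply acknowledges.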
Integer keys don’t necessarily imply ordered encoding, however. There are binary encoding formats which support integer-keyed dictionaries (read: serialized hash maps) which are more efficient to encode and decode than similar string-keyed ones. In that case, as long as integer keys are chosen with care, the end result is more performant.
But again, this depends on the application and use case. Defining integer keys requires manual effort because we want thought put into defining them; they are indeed fragile when used carelessly.
[snip]
Keyed Encoding Containers
Keyed encoding containers are the primary interface that most Codable types interact with for encoding and decoding. Through these, Codable types have strongly-keyed access to encoded data by using keys that are semantically correct for the operations they want to express.
Since semantically incompatible keys will rarely (if ever) share the same key type, it is impossible to mix up key types within the same container (as is possible with String keys), and since the type is known statically, keys get autocompletion by the compiler.
open class KeyedEncodingContainer<Key : CodingKey> {
Like others, I’m a little bummed about this part of the design. Your reasoning up-thread is sound, but I chafe a bit on having to reabstract and a little more on having to be a reference type. Particularly knowing that it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can simply push some state and pass itself as the next encoding container down the stack.
There’s not much more to be said about why this is a class that I haven’t covered; if it were possible to do otherwise at the moment, then we would.
It is possible using a manually written type-erased wrapper along the lines of AnySequence and AnyCollection. I don’t recall seeing a rationale for why you don’t want to go this route. I would still like to hear more on this topic.
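For reference, a minimal sketch of the kind of manually written type-erased wrapper meant here, in the style of `AnySequence`. `StringKeyedSink`, `AnyStringKeyedSink`, `DictionarySink`, and `Storage` are hypothetical names; the wrapper is a struct that forwards to a boxed concrete sink through a closure, so callers see only the `Key` type. (The API that shipped in Swift 4 did end up with a struct `KeyedEncodingContainer` wrapping a `KeyedEncodingContainerProtocol` conformer, though its internals differ from this closure-based sketch.)

```swift
/// Hypothetical container protocol with an associated key type.
protocol StringKeyedSink {
    associatedtype Key: CodingKey
    mutating func encode(_ value: String, forKey key: Key) throws
}

/// Type-erased wrapper: hides the concrete sink, keeps the Key type.
struct AnyStringKeyedSink<Key: CodingKey>: StringKeyedSink {
    private let _encode: (String, Key) throws -> Void

    init<S: StringKeyedSink>(_ sink: S) where S.Key == Key {
        var box = sink // captured copy that the closure mutates
        _encode = { value, key in try box.encode(value, forKey: key) }
    }

    mutating func encode(_ value: String, forKey key: Key) throws {
        try _encode(value, key)
    }
}

/// Shared reference storage so results are observable after erasure.
final class Storage {
    var entries: [String: String] = [:]
}

/// A concrete sink that writes into `Storage` by each key's stringValue.
struct DictionarySink<Key: CodingKey>: StringKeyedSink {
    let storage: Storage
    mutating func encode(_ value: String, forKey key: Key) throws {
        storage.entries[key.stringValue] = value
    }
}

enum UserKeys: String, CodingKey { case name }

let storage = Storage()
var erased = AnyStringKeyedSink<UserKeys>(DictionarySink<UserKeys>(storage: storage))
try erased.encode("Ada", forKey: .name)
print(storage.entries) // prints ["name": "Ada"]
```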
As for *why* we do this — this is the crux of the whole API. We not only want to make it easy to use a custom key type that is semantically correct for your type, we want to make it difficult to do the easy but incorrect thing. From experience with NSKeyedArchiver, we’d like to move away from unadorned string (and integer) keys, where typos and accidentally reused keys are common, and impossible to catch statically.
encode<T : Codable>(_: T?, forKey: String) unfortunately not only encourages code like encode(foo, forKey: "foi") (whoops, a typo), but also makes it *more difficult* to use a semantic key type: encode(foo, forKey: CodingKeys.foo.stringValue). The additional typing and lack of autocompletion make it an active disincentive. encode<T : Codable>(_: T?, forKey: Key) reverses both of these — it makes it impossible to use unadorned strings or accidentally use keys from another type, and nets shorter code with autocompletion: encode(foo, forKey: .foo)
The side effect of this, that keyed containers are classes, is suboptimal, I agree, but necessary.
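A minimal sketch of why a `Key`-typed parameter helps, using Swift's `CodingKey` protocol as it shipped (with synthesized conformance for a `String`-backed enum). `ToyKeyedContainer`, `PersonKeys`, and `AddressKeys` are hypothetical; the commented-out lines show the two mistakes that the typed parameter turns into compile errors.

```swift
/// Hypothetical container that stores values under each key's stringValue.
struct ToyKeyedContainer<Key: CodingKey> {
    private(set) var storage: [String: String] = [:]

    mutating func encode(_ value: String, forKey key: Key) {
        storage[key.stringValue] = value
    }
}

enum PersonKeys: String, CodingKey {
    case name
    case nickname
}

enum AddressKeys: String, CodingKey {
    case street
}

var container = ToyKeyedContainer<PersonKeys>()
container.encode("Ada Lovelace", forKey: .name) // short, and autocompleted
container.encode("Ada", forKey: .nickname)
// container.encode("x", forKey: "nmae")             // error: a String is not a PersonKeys
// container.encode("x", forKey: AddressKeys.street) // error: wrong key type
```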
open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
Does this win anything over taking a Codable?
Taking the concrete type over an existential allows for static dispatch on the type within the implementation, and is a performance win in some cases.
open func encode(_ value: Bool?, forKey key: Key) throws
open func encode(_ value: Int?, forKey key: Key) throws
open func encode(_ value: Int8?, forKey key: Key) throws
open func encode(_ value: Int16?, forKey key: Key) throws
open func encode(_ value: Int32?, forKey key: Key) throws
open func encode(_ value: Int64?, forKey key: Key) throws
open func encode(_ value: UInt?, forKey key: Key) throws
open func encode(_ value: UInt8?, forKey key: Key) throws
open func encode(_ value: UInt16?, forKey key: Key) throws
open func encode(_ value: UInt32?, forKey key: Key) throws
open func encode(_ value: UInt64?, forKey key: Key) throws
open func encode(_ value: Float?, forKey key: Key) throws
open func encode(_ value: Double?, forKey key: Key) throws
open func encode(_ value: String?, forKey key: Key) throws
open func encode(_ value: Data?, forKey key: Key) throws
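A sketch of how a fixed set of concrete overloads sits alongside a generic one. `ToyEncoder` is hypothetical and drops the proposal's optional parameters and keys for brevity; Swift's overload resolution prefers a non-generic overload when one matches, so only types outside the listed set reach the generic path.

```swift
/// Hypothetical encoder that records which overload handled each call.
struct ToyEncoder {
    private(set) var log: [String] = []

    // Concrete "primitive" overloads.
    mutating func encode(_ value: Int) { log.append("Int") }
    mutating func encode(_ value: Bool) { log.append("Bool") }

    // Generic fallback for everything else.
    mutating func encode<T: CustomStringConvertible>(_ value: T) {
        log.append("generic \(T.self)")
    }
}

var encoder = ToyEncoder()
encoder.encode(42)   // non-generic Int overload wins
encoder.encode(true) // non-generic Bool overload wins
encoder.encode(3.5)  // no Double overload here, so the generic one is used
print(encoder.log)   // prints ["Int", "Bool", "generic Double"]
```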
What is the motivation behind abandoning the idea of “primitives” from the Alternatives Considered? Performance? Being unable to close the protocol?
Being unable to close the protocol is the primary reason. A related, and also major, reason is not being able to tell at a glance which concrete types belong to this set.
Looks like we have another strong motivating use case for closed
protocols. I hope that will be in scope for Swift 5.
It would be great for the auto-generated documentation and “headers” to provide a list of all public or open types inheriting from a closed class or conforming to a closed protocol (when we get them). This would go a long way towards addressing your second reason.
In what ways is encoding a value envisioned to fail? I understand wanting to allow maximum flexibility, and being symmetric to `decode` throwing, but there are plenty of “conversion” patterns that are asymmetric in the ways they can fail (Date formatters, RawRepresentable, LosslessStringConvertible, etc.).
Different formats support different concrete values, even of primitive types. For instance, you cannot natively encode Double.nan in JSON, but you can in plist. Without additional options on JSONEncoder, encode(Double.nan, forKey: …) will throw.
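A toy sketch of that failure mode: the JSON grammar (RFC 8259) has no literal for NaN or the infinities, so a strict JSON number writer has to reject them. `jsonNumber` and `ToyJSONError` are hypothetical; the `JSONEncoder` that later shipped in Foundation exposes a `nonConformingFloatEncodingStrategy` option for exactly this case.

```swift
/// Hypothetical error for values JSON cannot represent.
enum ToyJSONError: Error {
    case nonConformingFloat(String)
}

/// Renders a Double as a JSON number, rejecting NaN and the infinities.
func jsonNumber(_ value: Double) throws -> String {
    guard value.isFinite else {
        throw ToyJSONError.nonConformingFloat(String(describing: value))
    }
    return String(describing: value)
}

try print(jsonNumber(1.5)) // prints "1.5"

do {
    _ = try jsonNumber(.nan)
} catch {
    print("rejected:", error) // reports the non-conforming value
}
```

A plist-style writer would accept the same value, which is why encoding is allowed to throw even though it is not a lossy conversion in general.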
/// For `Encoder`s that implement this functionality, this will only encode the given object and associate it with the given key if it is encoded unconditionally elsewhere in the archive (either previously or in the future).
open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
Is it correct that if I send a Cocoa-style object graph (with weak backrefs), an encoder could infinitely recurse? Or is a coder supposed to detect that?
encodeWeak has a default implementation that calls the regular encode<T : Codable>(_: T, forKey: Key); only formats which actually support weak backreferencing should override this implementation, so it should always be safe to call (it will simply unconditionally encode the object by default).
open var codingKeyContext: [CodingKey]
}
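A sketch of the default-implementation pattern described above for encodeWeak, with hypothetical names throughout (`ToyObjectEncoder`, `FlatEncoder`, `GraphEncoder`, `Node`) and plain string names standing in for keys. The protocol extension supplies an `encodeWeak` that simply encodes unconditionally; a format with back-reference support overrides it and keeps a weak reference only if the same object is also encoded unconditionally somewhere in the archive. It does not attempt cycle detection.

```swift
/// Hypothetical encoder protocol for this sketch only.
protocol ToyObjectEncoder {
    mutating func encode(_ object: AnyObject, named name: String)
    mutating func encodeWeak(_ object: AnyObject, named name: String)
}

extension ToyObjectEncoder {
    /// Default for formats without back-references: encode unconditionally.
    mutating func encodeWeak(_ object: AnyObject, named name: String) {
        encode(object, named: name)
    }
}

/// A flat format that relies on the default `encodeWeak`.
struct FlatEncoder: ToyObjectEncoder {
    private(set) var written: [String] = []

    mutating func encode(_ object: AnyObject, named name: String) {
        written.append(name)
    }
}

/// A graph format that keeps a weak reference only if the same object is
/// also encoded unconditionally, whether before or after the weak call.
struct GraphEncoder: ToyObjectEncoder {
    private var strongIDs: Set<ObjectIdentifier> = []
    private var weakRefs: [(id: ObjectIdentifier, name: String)] = []

    mutating func encode(_ object: AnyObject, named name: String) {
        strongIDs.insert(ObjectIdentifier(object))
    }

    mutating func encodeWeak(_ object: AnyObject, named name: String) {
        weakRefs.append((id: ObjectIdentifier(object), name: name))
    }

    /// Resolved once the whole archive is known.
    func keptWeakReferences() -> [String] {
        weakRefs.filter { strongIDs.contains($0.id) }.map { $0.name }
    }
}

final class Node {}
let root = Node()
let orphan = Node()

var flat = FlatEncoder()
flat.encodeWeak(root, named: "parent")
print(flat.written) // prints ["parent"]

var graph = GraphEncoder()
graph.encodeWeak(root, named: "child.parent")    // strong encode happens later
graph.encodeWeak(orphan, named: "child.sibling") // never encoded strongly
graph.encode(root, named: "root")
print(graph.keptWeakReferences()) // prints ["child.parent"]
```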
[snippity snip]
Alright, those are just my first thoughts. I want to spend a little time marinating in the code from PR #8124 before I comment further. Cheers! I owe you, Michael, and Tony a few drinks for sure.
Hehe, thanks :)
Zach Waldowski
[email protected]
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution