On Wednesday, September 3, 2014 12:40:45 PM UTC-5, henrik lindberg wrote:
> On 2014-02-09 22:23, John Bollinger wrote:
> > On Monday, September 1, 2014 3:55:03 AM UTC-5, henrik lindberg wrote:
> > > Hi,
> > > Recently I have been looking into serialization of various kinds, and
> > > the issue of how we represent and serialize/deserialize numbers has
> > > come up.
> > >
> > > [...]
> > >
> > > Proposal
> > > ========
> > > I would like to cap a Puppet Integer to be a 64 bit signed value when
> > > used as a resource attribute, or anywhere in external formats. This
> > > means a value range of -2^63 to 2^63-1, which is in Exabyte range
> > > (1 exabyte = 2^60).
> > >
> > > I would like to cap a Puppet Float to be a 64 bit (IEEE 754 binary64)
> > > value when used as a resource attribute or anywhere in external
> > > formats.
> > >
> > > With respect to intermediate results, I propose that we specify that
> > > values are of arbitrary size and that it is an error to store a value
> >
> > What, specifically, does it mean to "store a value"? Does that mean to
> > assign it to a resource attribute?
>
> It was vague on purpose since I cannot currently enumerate the places
> where this should take place, but I was thinking resource attributes at
> least.

Surely there is a medium between "vague" and "enumerating all
possibilities". Or, in the alternative, a minimum set of places where Big
values must be allowed could be given. Otherwise the proposal is
insufficiently defined to reason about, much less implement.
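That said, for concreteness, I read the proposed cap as implying roughly
the following check at the point where a value is bound to a resource
attribute. This is only a sketch of my understanding, in plain Ruby terms;
the method and constant names are made up, and it is not actual Puppet
code:

    # Hypothetical illustration of the proposed 64-bit caps; not Puppet source.
    INT64_MIN = -(2**63)
    INT64_MAX = 2**63 - 1

    def check_stored_value(value)
      case value
      when Integer
        unless value.between?(INT64_MIN, INT64_MAX)
          raise ArgumentError, "Integer #{value} does not fit in a signed 64-bit value"
        end
      when Float
        # Ruby Floats are already IEEE 754 binary64, so an overflowing
        # intermediate result would show up here as Infinity rather than as
        # a wider value.
        raise ArgumentError, "Float overflows binary64" if value.infinite?
      end
      value
    end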
> > > that is too big for the typed representation Integer (64 bit signed).
> > > For Float (64 bit) representation there is no error, but it loses
> > > precision.
> >
> > What about numbers that overflow or underflow a 64-bit Float?
>
> That would also be an error (when it cannot lose more precision).

IEEE floating-point underflow occurs not when a number cannot lose more
precision, but rather when it is nonzero but so small that it does not
have a normalized representation in the chosen floating-point format.
Among IEEE 64-bit doubles, these are nonzero numbers having absolute value
less than 2^-1022. Almost all such subnormal representations *can* lose
more precision, in the sense that there are even less precise subnormals,
but they already have less precision than is usual for the format.

> > > When specifying an attribute to have Number type, automatic
> > > conversion to Float (with loss of precision) takes place

This happens only when the value is "stored", I presume?

> > > if an internal integer number is too big for the Integer
> > > representation.
> > >
> > > (Note, by default, attributes are typed as Any, which means that they
> > > by default would store a Float if the integer value representation
> > > overflows).
> >
> > And if BigDecimal (and maybe BigInteger) were added to the type system,
> > then I presume the expectation would be that over/underflowing Floats
> > would go there? And maybe that overflowing integers would go there if
> > necessary to avoid loss of precision?
>
> If we add them, then the runtime should be specified to gracefully
> choose the required size while calculating

I thought the whole reason for the proposal and discussion was that Ruby
already does handle these gracefully, hence Puppet already has Big values.

> and that the types Any and Number mean that they are accepted, but that
> Integer and Float do not accept them (when they have values that are
> outside the valid range). (I have not thought this through completely at
> this point, I must say.)

Clarification: I have no objection to limiting the values allowed for the
types Integer and Float, as specified in the proposal. What I am concerned
about is Puppet pulling back from supporting the full range of numeric
values it supports now (near-arbitrary range and precision).

> > 1) If you have BigDecimal then you don't need BigInteger.
>
> True, but BigInteger specifies that a fraction is not allowed.

Supposing that I persuaded you that the type system should include a
BigDecimal type in some form, I would be completely satisfied to leave it
to you to decide whether it should also include a BigInteger type.

> > 2) Why would allowing one or both of the Bigs prevent Number from being
> > allowed as a serializable type?
>
> Not sure I said that. The problem is that if something is potentially
> Big... then a database must be prepared to deal with it, and it has a
> high cost.

*Every* Puppet value is potentially a Big *now*. What new cost is
involved? I'm having trouble seeing how a database can deal efficiently
with Puppet's current implicit typing anyway, Big values notwithstanding.
Without additional type information, it must be prepared for any given
value to be a boolean, an Integer, a float, or a 37kb string (among other
possibilities). Why do Big values present an especial problem in that
regard?
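To be concrete about "potentially Big now": on the Ruby runtimes Puppet
currently runs on, integers are promoted silently, with no overflow error,
so nothing stops a manifest from producing one today. An illustrative irb
session (nothing Puppet-specific; the exact Fixnum boundary shown is for
64-bit MRI and is platform-dependent):

    (2**62 - 1).class   # => Fixnum  (machine-word integer on 64-bit MRI)
    (2**62).class       # => Bignum  (silently promoted, still exact)
    2**64 + 1           # => 18446744073709551617, no error, no precision loss

Any of those values could end up in a resource attribute right now.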
> since everything is basically untyped now (which we translate to the
> type Any), this means that PuppetDB must be changed to use BigDecimal
> instead of integer 64 and float. That is a loose^3; it is lots of work
> to implement, bad performance, and everyone needs to type everything.

Well, *some*one needs to type everything, somehow. Typing is an inherent
aspect of any representation of any value. Indeed, it is loose to call
Puppet values "untyped"; they are definitely typed (try
inline_template('<%= type($my_variable) %>') some time), but the type is
not necessarily known *a priori*. It is also loose to call Puppet 3
expressions "untyped" -- it is more precise to say that expressions,
including variable dereferences, are implicitly typed.

But yes, for efficient numeric storage representations to be used, the
types of the values to be stored must be among those for which an
efficient representation is available. MOREOVER, unless the storage
mechanism is prepared to adapt dynamically to the types of the values
presented to it, the specific types of those values must be known in
advance, and they must be consistent. In that sense everyone *does* need
to type everything, regardless of whether any Big types are among the
possibilities.

If you do suppose a type-adaptive storage mechanism (so that people don't
need to type everything), then the mere possibility of Big values does not
impose any additional inefficiency. The actual appearance of Big values
might be costly, but if such a value is in fact presented for storage,
then is it not better to faithfully store it than to fail?

> > I think disallowing Bigs in the serialization formats will present its
> > own problems, only some of which you have touched on so far. I think
> > the type system should offer /opportunities/ for greater efficiency in
> > numeric handling, rather than serving as an excuse to limit numeric
> > representations.
>
> I don't quite get the point here - the proposed cap is not something
> that the type system needs. As an example, MsgPack does not have
> standard Big types, thus a serialization will need to be special and it
> is not possible to just use something like "readInt" to get what you
> know should be an integer value. The other example is PuppetDB, where a
> decision has to be made how to store integers: the slower Big types, or
> a more efficient 64 bit value? This is not just about storage, but also
> about indexing speed and query/comparison - and if some values are
> stored as 64 bits and others as a Big type for the same entity, that
> would be even slower to query for.

As I said already, I am not against the proposed caps. Rather, I am
urging you to not categorically forbid serialization of Big values.
Possibly you could allow serialization to some formats -- such as
MsgPack -- to fail on Bigs, but it's not clear to me even in that case
why failure/nothing is better than something. The issue is the data, not
the format -- Puppet (currently) supports Big values, so if it needs to
serialize values then it needs to serialize Bigs.
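Even for a format with no native Big types, "something" can be as simple
as an explicit fallback representation on the way in and out. A
hypothetical sketch in Ruby (the key name and helper names are made up for
illustration; this is not a proposal for the actual wire format):

    # Encode numbers for a format that natively supports only 64-bit ints.
    INT64_RANGE = (-(2**63))..(2**63 - 1)

    def encode_number(n)
      if n.is_a?(Integer) && !INT64_RANGE.cover?(n)
        # Slow path, taken only when a Big actually shows up: exact, just
        # carried as a tagged decimal string instead of a native int.
        { 'puppet_big' => n.to_s }
      else
        n  # fast path: native 64-bit integer or binary64 float
      end
    end

    def decode_number(v)
      v.is_a?(Hash) && v.key?('puppet_big') ? Integer(v['puppet_big'], 10) : v
    end

The fast path stays exactly as efficient as it would be under the proposed
cap; the slow path costs something only when a Big value is actually
present, and it preserves the value instead of raising an error.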
As for PuppetDB in particular, you have storage, indexing, and comparison
problems (or else fidelity problems) for any number that is not
specifically typed Integer or Float. Number is not specific enough, even
without Bigs, and Any certainly isn't. If PuppetDB is type-adaptive, then
Bigs shouldn't present any special problem. If it isn't, then it needs
explicit typing (as Integer or Float) for efficiency anyway, so Bigs
shouldn't present any special problem.

> So - idea, make it safe and efficient for the normal cases. Only when
> there is a special case (if indeed we do need the big types) then take
> the less efficient route.

OK, but I don't see how making it an error to "store" a Big value serves
that principle. Safe and efficient storage of Integers and Floats allows
use of native numeric formats; safe and efficient storage for Number or
Any does not (even if storing a Big were an error). Numbers that can be
represented only as Bigs will not be typed Integer or Float. If such
numbers (or such formal types) constitute a special case then fine, but
let there be a "less efficient route" (with full fidelity) for that.


John