On Wednesday, September 3, 2014 12:40:45 PM UTC-5, henrik lindberg wrote:
>
> On 2014-09-02 22:23, John Bollinger wrote: 
> > 
> > 
> > On Monday, September 1, 2014 3:55:03 AM UTC-5, henrik lindberg wrote: 
> > 
> >     Hi, 
> >     Recently I have been looking into serialization of various kinds, and 
> >     the issue of how we represent and serialize/deserialize numbers has 
> >     come up. 
> > 
> > 
> > [...] 
> > 
> > 
> >     Proposal 
> >     ======== 
> >     I would like to cap a Puppet Integer to be a 64-bit signed value when 
> >     used as a resource attribute, or anywhere in external formats. This 
> >     means a value range of -2^63 to 2^63-1, which is in exabyte range 
> >     (1 exabyte = 2^60). 
> > 
> >     I would like to cap a Puppet Float to be a 64 bit (IEEE 754 binary64) 
> >     when used as a resource attribute or anywhere in external formats. 
> > 
> >     With respect to intermediate results, I propose that we specify that 
> >     values are of arbitrary size and that it is an error to store a value 
> > 
> > 
> > 
> > What, specifically, does it mean to "store a value"?  Does that mean to 
> > assign it to a resource attribute? 
>
> It was vague on purpose since I cannot currently enumerate the places 
> where this should take place, but I was thinking resource attributes at 
> least. 
>
>

Surely there is a medium between "vague" and "enumerating all 
possibilities".  Or in the alternative, a minimum set of places where Big 
values must be allowed could be given.  Otherwise the proposal is 
insufficiently defined to reason about, much less implement.

 

> > 
> >     that is too big for the typed representation Integer (64 bit signed). 
> >     For Float (64 bit) representation there is no error, but it loses 
> >     precision. 
> > 
> > 
> > 
> > What about numbers that overflow or underflow a 64-bit Float? 
> > 
>
>  

> That would also be an error (when it cannot lose more precision). 



IEEE floating-point underflow occurs not when a number cannot lose more 
precision, but rather when it is nonzero but so small that it does not have 
a normalized representation in the chosen floating-point format.  Among 
IEEE 64-bit doubles, these are nonzero numbers having absolute value less 
than 2^-1022.  Almost all such subnormal representations *can* lose more 
precision in the sense that there are even less precise subnormals, but 
they already have less precision than is usual for the format.
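For concreteness, this boundary is easy to observe in plain Ruby (whose Float is an IEEE 754 binary64 double):

```ruby
# Observing binary64 underflow from Ruby, whose Float is an IEEE 754 double.
smallest_normal = Float::MIN        # 2.0**-1022, the smallest normalized double
subnormal = smallest_normal / 2     # below the normal range, yet not zero

puts subnormal > 0.0                    # true: a subnormal, not underflow to zero
puts subnormal < smallest_normal        # true: no normalized representation exists
puts (smallest_normal / 2**60).zero?    # true: gradual underflow does reach zero
```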



> >     When specifying an attribute to have Number type, automatic 
> >     conversion to Float (with loss of precision) takes place



This happens only when the value is "stored", I presume?

 

> >     if an internal 
> >     integer number is too big for the Integer representation. 
> > 
> >     (Note, by default, attributes are typed as Any, which means that 
> >     they by 
> >     default would store a Float if the integer value representation 
> >     overflows). 
> > 
> > 
> > 
> > And if BigDecimal (and maybe BigInteger) were added to the type system, 
> > then I presume the expectation would be that over/underflowing Floats 
> > would go there?  And maybe that overflowing integers would go there if 
> > necessary to avoid loss of precision? 
> > 
>
> If we add them, then the runtime should be specified to gracefully 
> choose the required size while calculating



I thought the whole reason for the proposal and discussion was that Ruby 
already does handle these gracefully, hence Puppet already has Big values.
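(For reference, this is what plain Ruby does today -- integers promote to arbitrary precision instead of overflowing:)

```ruby
# Ruby integers silently promote past the 64-bit boundary, so Puppet's
# runtime already yields "Big" values with no opt-in from the user.
a = 2**62          # fits comfortably in a signed 64-bit word
b = a * 4          # 2**64 -- would overflow int64
puts b             # 18446744073709551616, exact
puts b - 2**64     # 0: no wraparound, no precision loss
```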

 

> and that the types Any and 
> Number mean that they are accepted, but that Integer and Float do not 
> accept them (when they have values that are outside the valid range). (I 
> have not thought this through completely at this point, I must say). 
>


Clarification: I have no objection to limiting the values allowed for types 
Integer and Float, as specified in the proposal.  What I am concerned about 
is Puppet pulling back from supporting the full range of numeric values it 
supports now (near-arbitrary range and precision).
 

> 
> > 1) If you have BigDecimal then you don't need BigInteger. 
> > 
> True, but BigInteger specifies that a fraction is not allowed. 
>


Supposing that I persuaded you that the type system should include a 
BigDecimal type in some form, I would be completely satisfied to leave it 
to you to decide whether it should also include a BigInteger type.

 

>
> > 2) Why would allowing one or both of the Bigs prevent Number from being 
> > allowed as a serializable type? 
> > 
> Not sure I said that. The problem is that if something is potentially 
> Big... then a database must be prepared to deal with it and it has a 
> high cost.



*Every* Puppet value is potentially a Big *now*.  What new cost is 
involved?  I'm having trouble seeing how a database can deal efficiently 
with Puppet's current implicit typing anyway, Big values notwithstanding.  
Without additional type information, it must be prepared for any given 
value to be a boolean, an Integer, a float, or a 37kb string (among other 
possibilities).  Why do Big values present an especial problem in that 
regard?

 

> since everything is 
> basically untyped now (which we translate to the type Any), this means 
> that PuppetDB must be changed to use BigDecimal instead of integer 64 
> and float. That is a lose^3; it is lots of work to implement, bad 
> performance, and everyone needs to type everything. 
>
>

Well, *some*one needs to type everything, somehow.  Typing is an inherent 
aspect of any representation of any value.  Indeed, it is loose to call 
Puppet values "untyped"; they are definitely typed (try 
inline_template('<%= type($my_variable) %>') some time), but the type is not 
necessarily known *a priori*.  It is also loose to call Puppet 3 
expressions "untyped" -- it is more precise to say that expressions, 
including variable dereferences, are implicitly typed.

But yes, for efficient numeric storage representations to be used, the 
types of the values to be stored must be among those for which an efficient 
representation is available.  *Moreover*, unless the storage mechanism is 
prepared to adapt dynamically to the types of the values presented to it, 
the specific types of those values must be known in advance, and they must 
be consistent.  In that sense everyone *does* need to type everything, 
regardless of whether any Big types are among the possibilities.

If you do suppose a type-adaptive storage mechanism (so that people don't 
need to type everything) then the mere possibility of Big values does not 
impose any additional inefficiency.  The actual appearance of Big values 
might be costly, but if such a value is in fact presented for storage then 
is it not better to faithfully store it than to fail?

 

> > I think disallowing Bigs in the serialization formats will present its 
> > own problems, only some of which you have touched on so far.  I think 
> > the type system should offer /opportunities/ for greater efficiency in 
> > numeric handling, rather than serving as an excuse to limit numeric 
> > representations. 
> > 
>
> I don't quite get the point here - the proposed cap is not something 
> that the type system needs. As an example MsgPack does not have standard 
> Big types, thus a serialization will need to be special and it is not 
> possible to just use something like "readInt" to get what you know 
> should be an integer value. The other example is PuppetDB, where a 
> decision has to be made how to store integers; the slower Big types, or 
> a more efficient 64 bit value? This is not just about storage, also 
> about indexing speed and query/comparison - and if thinking that some 
> values are stored as 64 bits and others as a big type for the same entity 
> that would be even slower to query for. 
>


As I said already, I am not against the proposed caps.  Rather, I am urging 
you to not categorically forbid serialization of Big values.

Possibly you could allow serialization to some formats -- such as MsgPack 
-- to fail on Bigs, but it's not clear to me even in that case why 
failure/nothing is better than something.  The issue is the data, not the 
format -- Puppet (currently) supports Big values, so if it needs to 
serialize values then it needs to serialize Bigs.

As for PuppetDB in particular, you have storage, indexing, and comparison 
problems (or else fidelity problems) for any number that is not 
specifically typed Integer or Float.  Number is not specific enough, even 
without Bigs, and Any certainly isn't.  If PuppetDB is type-adaptive then 
Bigs shouldn't present any special problem.  If it isn't, then it needs 
explicit typing (as Integer or Float) for efficiency anyway, so Bigs 
shouldn't present any special problem.



> So - idea, make it safe and efficient for the normal cases. Only when 
> there is a special case (if indeed we do need the big types) then take 
> the less efficient route. 
>
>

Ok, but I don't see how making it an error to "store" a Big value serves 
that principle.  Safe and efficient for storage of Integers and Floats 
allows use of native numeric formats; safe and efficient storage for Number 
or Any does not (even if storing a Big were an error).  Numbers that can be 
represented only as Bigs will not be typed Integer or Float.  If such 
numbers (or such formal types) constitute a special case then fine, but let 
there be a "less efficient route" (with full fidelity) for that.
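To sketch what I mean (the names and shape here are mine, not part of the proposal): a store could dispatch on whether a value fits the efficient native representation, and take the full-fidelity route only when it does not.

```ruby
# Hypothetical dispatch between an efficient native column and a
# full-fidelity "Big" fallback; the INT64 bounds match the proposed cap.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def storage_type_for(n)
  if n.is_a?(Integer) && n.between?(INT64_MIN, INT64_MAX)
    :int64   # safe and efficient for the normal case
  else
    :big     # the less efficient route, with full fidelity
  end
end

puts storage_type_for(42)          # int64
puts storage_type_for(2**64)       # big
puts storage_type_for(-(2**63))    # int64: the proposed lower bound
```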


John

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/60117603-35dc-4160-bd97-600eeb5bad63%40googlegroups.com.