On 2014-04-07 14:30-0400 Hazen Babcock wrote:

> On 4/6/2014 10:16 PM, Alan W. Irwin wrote:
>> On 2014-04-06 17:35-0400 Hazen Babcock wrote:
>>> In the process of converting some of the PLplot demos to Lisp as part of
>>> the cl-plplot project I found that example 29 triggers a floating point
>>> exception. This is not something that a C compiler will normally report,
>>> but you can make it (or at least gcc) more strict about this. I believe
>>> that the problem occurs in the plP_wcpcx() function in src/plcvt.c,
>>> where we attempt to convert -3.917103e+15 to an integer resulting in an
>>> overflow error.
>> 
>> Obviously, the attempted conversion should generate an integer
>> overflow, but I frankly don't understand why it generates a
>> _floating-point_ exception since clearly -3.917103e+15 is a valid
>> floating-point number.  Do you have some mental model for why there
>> was a generated floating-point exception in this case?
>
> Sorry, I did not explain that very well. The exception that is triggered is 
> FE_INVALID, which I think in this case is caused by the floating point number 
> being too large to convert to an integer.

Interesting.  I am surprised that such a conversion would raise that
exception, but I have verified (with your modified x29c.c) that the
exception occurs on my platform as well, and I am glad it is possible
(at least in this case) to detect integer overflows this way.  More
importantly, this verification puts me in a position to follow up on
this bug, which I will do since the time transformations used in
example 29 are largely my responsibility.

>
>> That question is just to satisfy my curiosity about how you found the
>> issue, and the much more important question is why in the world
>> numbers like -3.917103e+15 are being generated by example 29?
>> 
>> One possibility is some variable is uninitialized so -3.917103e+15
>> is just random numerical garbage, but I checked that possibility
>> with valgrind and got the following absolutely clean result:
>> 
>> ==19527== Memcheck, a memory error detector
>> ==19527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
>> ==19527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
>> ==19527== Command: examples/c/x29c -dev psc -o test.psc
>> ==19527==
>> ==19527==
>> ==19527== HEAP SUMMARY:
>> ==19527==     in use at exit: 0 bytes in 0 blocks
>> ==19527==   total heap usage: 2,481 allocs, 2,481 frees, 351,494 bytes allocated
>> ==19527==
>> ==19527== All heap blocks were freed -- no leaks are possible
>> ==19527==
>> ==19527== For counts of detected and suppressed errors, rerun with: -v
>> ==19527== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
>> 
>> plcvt.c contains code for converting between various plot coordinate
>> systems.  Please indicate what device you are using since that
>> should affect some of those transformations.
>
> I tested the xwin, xcairo and qtwidget drivers. They all give slightly 
> different numbers but they are all around -4.0e15.
>
>> plP_wcpcx converts world coordinates to physical device
>> coordinates using the PLINT value of the following transformation
>> 
>> plsc->wpxoff + plsc->wpxscl * x
>
> The exception is triggered by the value of plsc->wpxoff, which is ~4.0e15. 
> However plsc->wpxscl is also suspiciously large at ~2.0e12, but it gets 
> multiplied by x which is 0.0 in the call that causes the exception.
>
>> where all those values being combined together are PLFLT.
>> 
>> It is hard to figure out how an integer physical device coordinate (e.g.,
>> pixels) could correspond to -3.917103e+15 so despite the good valgrind
>> result I am still wondering whether you are dealing with numerical
>> garbage of some kind.
>
> It looks that way, but I'm not having much luck figuring out where this 
> garbage is coming from. The crash occurs on the call to plbox() when creating 
> page 5, which is the first page of this example that just shows a single 
> step.

Time transformations such as those in example 29 are notorious for
loss of significance when the epoch is badly chosen. For example, if
the epoch chosen is the zero of Julian dates, then the number of
seconds since that epoch is roughly 2.e11.  I am pretty sure the
default epoch chosen for the PLplot time transformations is
considerably better than that bad choice, but I may have missed
something when I set that up so I will take a look.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel