With regard to data spanning more than 4 orders of magnitude, this is handled
by making q user selectable.

With regard to setting q to include most of the data, using a q at all is user
selectable.  If you turn it off, Rescale to Data Range would just set min and
max to include all data.  If the data includes 0, which is illegal on a log
scale, you have to do something.  Setting the min to be q orders of magnitude
smaller than max is better than setting it to 1!  Then, if that isn't what you
wanted, you can always reset it manually.  My goal is to create a smarter way
of automatically dealing with min and max for log scaling.
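
To make the rule above concrete, here is a rough Python sketch.  It is an
illustration only, not ParaView code: the function name rescale_log_range and
its use_q flag are made up for this example, and it assumes the data has at
least one positive value.

```python
import math

def rescale_log_range(values, q=4, use_q=True):
    """Return (lo, hi) for a log color scale (hypothetical helper)."""
    # Drop NaN/inf first; invalid values are a separate problem.
    finite = [v for v in values if math.isfinite(v)]
    if not finite or max(finite) <= 0:
        raise ValueError("need at least one positive finite value")
    hi = max(finite)
    data_min = min(finite)
    floor = hi * 10.0 ** (-q)  # q orders of magnitude below the maximum
    if use_q or data_min <= 0:
        # Either the user asked for the q rule, or the data contains
        # zero/negative values that cannot sit on a log axis.
        lo = floor
    else:
        # q turned off and all data positive: plain data range.
        lo = data_min
    return lo, hi
```

With q turned off, rescale_log_range([2.0, 5.0, 100.0], use_q=False) returns
the plain data range (2.0, 100.0).  If the data contains a 0, the minimum
falls back to max*10^-q instead of 1.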

With regard to data that is an "invalid value", this is an orthogonal problem.
Invalid data is already handled (e.g., NaNs), and maybe we should expand this
to include huge positive and negative numbers (1e38 and -1e38).  I believe it
is still outside of this discussion.

Regarding negative values, my bad - I was not clear.  A normal vertical log
scale bar, going from 1e8 to 1e4, may have minor labels (working down) of 1e8,
1e7, 1e6, 1e5, and 1e4.  I.e., each tick towards the top represents a change
of tens of millions of units, and each tick towards the bottom represents a
change of thousands of units.  Now, for the negative case, this would be
reversed.  The color bar remains colored as before (for the default color map,
red on top, blue on bottom).  Since more positive numbers are always on top,
the minor labels would run as follows (working down): -1e4, -1e5, -1e6, -1e7
and -1e8.  Units change by the thousands at the top of the color legend, and
by the millions towards the bottom.
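
The tick ordering described above, together with the Index = -(log(abs(Var)))
mapping from the original proposal, can be sketched in Python as follows.
This is only an illustration; decade_ticks and negative_log_position are
hypothetical names, not existing ParaView functions, and log is taken to be
base 10.

```python
import math

def decade_ticks(lo_exp, hi_exp, negative=False):
    """Decade labels listed from the top of the legend to the bottom."""
    exps = range(lo_exp, hi_exp + 1)
    if negative:
        # Least negative value on top: -1e4, -1e5, ..., -1e8
        return [-float(10 ** e) for e in exps]
    # Largest value on top: 1e8, 1e7, ..., 1e4
    return [float(10 ** e) for e in reversed(exps)]

def negative_log_position(value, lo_exp, hi_exp):
    """Legend position in [0, 1] for an all-negative range.

    1.0 is the top (-10**lo_exp), 0.0 is the bottom (-10**hi_exp).
    """
    if value >= 0:
        raise ValueError("expected a negative value")
    index = -math.log10(abs(value))  # Index = -(log(abs(Var)))
    # index runs from -hi_exp at the bottom to -lo_exp at the top
    return (index + hi_exp) / (hi_exp - lo_exp)
```

For the example range, decade_ticks(4, 8, negative=True) lists -1e4 first
(top) and -1e8 last (bottom), and negative_log_position(-1e6, 4, 8) lands in
the middle of the legend.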

Once again, you could always manually set your min and max.  But I am arguing
that what we currently do (set min to 1) is arbitrary, crude and wrong.

Alan



-----Original Message-----
From: David Thompson [mailto:[email protected]] 
Sent: Tuesday, July 01, 2014 3:44 PM
To: Scott, W Alan
Cc: [email protected]
Subject: [EXTERNAL] Re: [Paraview] Color legend and log scaling

Hi Alan,

> I would like to propose a few changes in our log scaling algorithm for 
> painting a dataset by a variable.   I discussed this with Utkarsh, and he 
> asked that I bounce it off the e-mail list.  So, here goes.
>  
> Currently, when a user log scales a variable, if all data is positive, 
> ParaView just uses the normal min and max.  There are times when this is not 
> proper - for instance when looking at the temperature or density of material 
> in a supernova, or velocity of outbound gas.  Another example is large data, 
> with noise around zero.  I would like to propose that we have a user 
> selectable option to set the minimum at maximum*10^-q, where q is user 
> defined but defaults to 4.  In other words, the minimum would be set to 
> 1*10^-4 of what the maximum is.

Avoiding a "window" around 0 in the initial view sounds good to me. However, I
can imagine some cases where the data spans more than 4 orders of magnitude.
One thing I've seen (debatably bad, but something ParaView must deal with) is
simulations/datasets that use large numbers (e.g., 1e+38) to mark invalid
values. (The LDAV climate data does this.) Showing a plot with the initial view
set to [1e34,1e38] would not be useful, since it would only show invalid
values. Another case is chemical reaction simulations where concentrations
span much more than 4 orders of magnitude (I've seen some span 11 or 12 orders
of magnitude, but 5 or 6 can be common).

What about choosing q to ensure that a significant fraction (say 90%?) of the
data is actually on-screen? It is not terribly hard (even in parallel) to
extract a fixed-size sample that approximates a histogram to within a few
percent. We could use that to determine where the bulk of the data resides and
ensure that the q-value does not leave more than 10% offscreen.
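
A minimal serial sketch of that idea in Python, under a few assumptions: the
name choose_q, the sample size, and the 90% default are illustrative only; the
data is assumed to be positive with invalid markers already filtered out; and
a real implementation would have to gather the sample in parallel.

```python
import math
import random

def choose_q(values, coverage=0.9, sample_size=1000, seed=0):
    """Pick the smallest integer q so that [max*10**-q, max] holds at
    least `coverage` of a fixed-size sample (hypothetical helper)."""
    positives = [v for v in values if math.isfinite(v) and v > 0]
    if not positives:
        raise ValueError("need at least one positive finite value")
    hi = max(positives)
    if len(positives) > sample_size:
        # A fixed-size random sample stands in for the histogram.
        sample = random.Random(seed).sample(positives, sample_size)
    else:
        sample = list(positives)
    sample.sort()
    # Allow at most k of the smallest sample values to fall offscreen.
    k = len(sample) - math.ceil(coverage * len(sample))
    cutoff = sample[max(k, 0)]
    q = math.ceil(math.log10(hi / cutoff))
    return max(q, 1)
```

If most values lie within three decades of the maximum and a few percent are
tiny outliers, this returns q = 3 rather than stretching the scale to cover
the outliers.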

> If all of the user's data is negative, ParaView grumbles, and then seg faults 
> using a current master git pull - not optimal behavior.  In PV 4.1, it just 
> sets min and max to 0.  I would like to propose that ParaView calculate the 
> log of the data, as follows:  Index= -(log(abs(Var))).  Then, just draw the 
> color legend as normal - for instance, red at top, white in the middle and 
> blue at bottom.  Tick marks will be the reverse of positive log scaling - 
> with the dense numbers, more negative numbers at the bottom and less dense, 
> less negative numbers at the top.

I'm not sure I understand this, especially your use of "dense". It sounds like 
you have a particular dataset in mind where the probability density is low near 
zero. Are you saying you want the color scale to be different in the case of 
data that is all negative numbers? Or that log plots in general should have 
colors reversed?

> The problem arises with data that spans positive and negative numbers.  Since 
> the log of 0 is negative infinity, we have to deal with very small numbers in a 
> special way.  I propose that we find maxVal = max(maximum, abs(minimum)).  
> Then, we set the color bar to run from maxVal to -maxVal.  We log scale the 
> top half of the color legend, running from maxVal to maxVal*10^-4, and we 
> reverse log scale the bottom half of the color legend, running from 
> -maxVal*10^-4 to -maxVal.  We calculate this negative range the same as the 
> all negative data section above.  All data between maxVal*10^-4 and 
> -maxVal*10^-4 would remain white by default, or user selectable black.

I like the idea of having a custom color for the range
[-maxVal*10^-q, maxVal*10^-q] and I think I understand and agree with the
simplified bounds/ticks for the case where the data crosses zero -- at least
when it firmly crosses the origin. However, those chemistry simulations I
mentioned above would occasionally have very slightly negative concentrations.
Physically they can never be negative but floating-point precision meant that
some were. In this case, we might have a
distribution of concentrations that went from -1e-16 to 1 with the bulk in the 
range [1e-10,1]. Using the algorithm above for the axes would assign half of 
the color palette to the range [-1,-1e-4], half to [1e-4,1] and draw a 
significant fraction of the data as "black". If instead the histogram was used 
to choose q, then we might be able to decide that all of the "naughty" data 
were outliers and select a log color-scale of [1e-10,1].
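
For reference, here is how I read the proposed mixed-sign mapping, as a rough
Python sketch. The function name symmetric_log_position is made up for
illustration; it returns None for values that would be painted with the
"too small" color.

```python
import math

def symmetric_log_position(value, max_val, q=4):
    """Legend position in [0, 1], or None for the 'too small' band.

    The top half is log scaled from max_val*10**-q up to max_val, the
    bottom half is reverse log scaled down to -max_val.
    """
    threshold = max_val * 10.0 ** (-q)
    mag = abs(value)
    if mag < threshold:
        return None  # inside (-threshold, threshold): default color
    mag = min(mag, max_val)
    # 0.0 at the threshold, 1.0 at max_val
    frac = (math.log10(mag) - math.log10(threshold)) / q
    frac = min(max(frac, 0.0), 1.0)
    if value > 0:
        return 0.5 + 0.5 * frac
    return 0.5 - 0.5 * frac
```

With maxVal = 1 and q = 4, both a concentration of 1e-6 and the stray -1e-16
return None, which is the problem described above: most of the interesting
range [1e-10,1] ends up in the "too small" band.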

        David

> User selectable functionality would be as follows:
>       * to allow/ not allow negative numbers (default allow)
>       * to be able to change the q exponent (i.e., 4 above) (default 4)
>       * to be able to change the painting color that is too small (default 
> white)
>       * to clamp minimum to some number (such as q == 4 above).  (default on).
>  
> Thoughts?
>  
> Alan
>  
>  
> _______________________________________________
> Powered by www.kitware.com
> 
> Visit other Kitware open-source projects at 
> http://www.kitware.com/opensource/opensource.html
> 
> Please keep messages on-topic and check the ParaView Wiki at: 
> http://paraview.org/Wiki/ParaView
> 
> Follow this link to subscribe/unsubscribe:
> http://public.kitware.com/mailman/listinfo/paraview
