> On May 14, 2020, at 12:21 AM, Willem Ferguson via subsurface 
> <[email protected]> wrote:
> I must admit that I do not like any of these three representations. They are 
> inappropriate and inaccurate, leading to misinterpretation.
> 
> The top graph is normally used to indicate trends in three *independent* 
> variables that may or may not be correlated. In the dive the data represent a 
> *single* variable with its min and max values.
> 
> The middle graph is a histogram that would normally also represent three 
> *independent* variables that have been sampled on the same x-axis scale. 
> Again, in the dive case the min and max values represent the *same* variable.
> 
> The bottom graph is normally used to indicate the proportion of a total that 
> is formed by a specific component. In the case of this specific graph, the 
> median would be indicated by the height of the orange bar (i.e. vertical 
> distance between the grey-orange border and the orange/blue border). The max 
> would be indicated by the height of the blue part of the graph, etc. Clearly 
> this is not what is meant.
> 

I agree that the middle and bottom options aren't adequate for the purpose.

> I want to make a call that, if we are dealing with representing statistics, 
> we actually use the proper statistics representations that we are all used 
> to. Most likely that is either some variant of a box and whiskers diagram or 
> a vertical bar chart with error bars. If these diagrams have been shown once 
> to an uninformed person, the interpretation will always be easy. Let's use 
> diagrams for what they are meant to convey and not use a sports car to drive 
> offroad. We do not want any statistics related to Subsurface to be presented 
> in an unprofessional and inappropriate way.
> 

I think we have a couple of choices here: build the right tool for the 
statistics professional, or build something that helps make the statistics 
accessible to most of our users.
The more I think about these options, the more I think that the statistics 
professional is best served by using R and creating the views that they are 
looking for, because otherwise this will become a never-ending "bring me 
another rock because I want to see things THIS way".

So box and whiskers are out, because the vast majority of our audience has a 
hard time understanding the difference between a mean and a median, and between 
naive gas pressure calculations and actually accurate math (I get at least two 
emails a month stating that our SAC rates are wrong).

Now as for which specific graph to use and which one is easier for users 
WITHOUT A BACKGROUND IN STATISTICS to grasp, I am certainly open to more input 
here. Ideally input that is based on actual feedback from such users or 
presentations about data accessibility and visualization. I found the video 
that Pedro shared rather compelling (especially if played at 1.25x speed 
because the presenter is taking his time). Which is why I am leaning towards a 
line graph, but I certainly could see floating bars with a marker for the mean.

> As far as the horizontal graphs are concerned, they have a place, but we need 
> to understand where they come from, and that is from the old days when we 
> tried to print graphs on a mainframe line printer that could not print 
> characters vertically. The conventional way to represent histograms or bar 
> charts is in the vertical way *unless there is good reason to do otherwise*. 
> These days there is no problem in printing labels vertically. To have a 
> horizontal bar graph with depth measurements along the vertical axis is just 
> totally unorthodox and not up to modern standards.
> 

Willem, those are some very strong statements that initially provoked a rather 
negative reaction in me. Calling someone else's proposal "not up to modern 
standards" feels borderline insulting.
As a matter of fact, yes we can show vertical labels. They are also a complete 
pain to read. I would argue that the readability of a horizontal chart is 
actually much better than the vertical one that you so strongly argue for.
I did a quick survey of some of the other dive logs that have screenshots of 
their statistics pages up on their websites, and they seem to be about equally 
split between the two approaches.

To me in the end this doesn't really matter. I don't think I'd ever use this 
other than to test that it works. Which is true for two thirds, actually, more 
likely 80% of the features in Subsurface.
What I do care about is that we continue to build something that stays 
maintainable, stays usable, and serves the needs of a broad user base. That's 
why I refuse the frequent attempts to turn Subsurface into an asset management 
tool. And that's why I will gently push back on attempts to turn Subsurface 
into a tool for statisticians. There are great tools for those purposes. Use them.

/D
