-------- Forwarded Message --------
Subject:        Re: RFC: Statistics in Subsurface
Date:   Tue, 12 May 2020 21:41:08 +0200
From:   Willem Ferguson <[email protected]>
Reply-To:       [email protected]
Organization:   University of Pretoria
To:     Dirk Hohndel <[email protected]>



On 2020/05/12 20:49, Dirk Hohndel wrote:
Hi Willem,

Thanks for responding... I wish more people got involved in these conversations. But usually topics like this get two or three of the 300+ people here to respond. And then ten more will complain after we have done the next release and they notice for the first time that we added a feature...

On May 12, 2020, at 12:59 AM, Willem Ferguson <[email protected]> wrote:

I understand Berthold's request with respect to temporal sequences. When developing such a temporal facility there is an important caveat. Emphatically, such temporal representations do not provide any clear *explanation* of anything: they just show a temporal pattern. For instance, a decrease in SAC rate over time does not necessarily imply any improvement in physiological ability but may reflect the adoption of new equipment or a change in dive sites. Any explanation of a temporal trend depends on the understanding of the USER, not on the SOFTWARE. So, when dealing with temporal trends, one needs to consider carefully how they are intended to be used. I think Berthold is more concerned with continuous variables such as temperature, SAC, dive duration, depth, etc., which could probably be implemented reasonably easily. Representing categorical variables such as tags, dive mode, people and suit (one could even add dive site) is a totally different issue requiring a totally different type of visual representation.


I was in complete agreement until the very last sentence. I don't understand why this 'per se' requires a "totally different type of visual representation". Let's say I am charting SAC over my criteria. Let's assume I'm using box and whiskers charts to easily show the quartiles. The values on the x-axis have implications for the interpretation, of course, but whether the x-axis is months of the year, the suit worn, the maximum depth of the dive, or the tags present on the dive (e.g., teaching dive or non-teaching dive) has absolutely no impact on how this should be visually represented...

It would help, in this discussion, if one were to distinguish between the filtering aspect and the statistics display aspect and state that with respect to the argument. In Dirk's artwork above, I am not sure how the constraints will be used. Are we talking of the filtering process or the stats display mechanism? Let's say "Suit" is a constraint and two dates are provided. I am not sure what the expected result of the operation would be. Ahh, the problems of communication.


What I was trying to describe was a way to create criteria that can be used for columns in the visualization. You go through this filter process, name the result, and that name becomes one of the available labels over which you can chart the values.
Again, as I said before, I may simply be over-engineering this.

In general, in my opinion, the existing filter layout is a good starting point (I would add the variables of dive depth and dive duration because they are the two variables that fundamentally define a dive). As a filtering mechanism the current implementation is ultra-flexible.


While I respect your opinion, let me politely state that personally I believe that the current filter widget is a disaster and extremely unintuitive to use. That's not a criticism of the original author, nor of the people who have added to it - but yeah, that thing is a mess.

As far as the UI for filter sets is concerned, the minimum set of components would include: a combobox of existing named filters within the set, and a button to add the current filter to the filter set. These could potentially reside at the top right of the current filter panel. But there might be a need to give the filter set a name as well. That would need a text box.


Making the current widget more convoluted and more confusing was not a direction that I was envisioning us to go.


Maybe we need to rein in the crazy German and go back to something much more basic. Something like ten predefined sets of criteria. And only apply them to the filtered dive list.

So.
(1) per month
(2) per year
(3) per trip
(4) by max depth in 10m increments
(5) by duration in 10min increments
(6) by min temperature in 10F / 5C increments
(7) by type (for people who track more than SCUBA)
(8) by suit (that's likely a fairly small set for most people)
(9) by tags (that one I'm unclear about - would likely need some more ability to influence how this is drawn - but the straightforward approach would be to draw them in pairs, the left one representing dives with the tag, the right one dives without the tag)
(10) by people? (no idea how / why)
(11) by full text? (no idea how / why)

If we drop the last three, it seems fairly obvious how to do this.
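
As a rough sketch of criteria (4) and (5), fixed-width binning over an already filtered dive list could look like the following. The record layout and the names bin_label and group_by_bin are hypothetical and only serve the illustration; they are not taken from the Subsurface code base.

```python
from collections import defaultdict

def bin_label(value, width):
    """Return the half-open bin [lo, lo + width) that contains value."""
    lo = int(value // width) * width
    return (lo, lo + width)

def group_by_bin(dives, key, width):
    """Group dive records by a fixed-width bin of one numeric field."""
    groups = defaultdict(list)
    for dive in dives:
        groups[bin_label(dive[key], width)].append(dive)
    return dict(groups)

# Hypothetical, made-up dive records (field names are assumptions).
dives = [
    {"max_depth_m": 8.5, "duration_min": 42},
    {"max_depth_m": 17.9, "duration_min": 38},
    {"max_depth_m": 12.0, "duration_min": 55},
    {"max_depth_m": 31.2, "duration_min": 29},
]

by_depth = group_by_bin(dives, "max_depth_m", 10)        # criterion (4)
by_duration = group_by_bin(dives, "duration_min", 10)    # criterion (5)

for (lo, hi), members in sorted(by_depth.items()):
    print(f"{lo}-{hi} m: {len(members)} dive(s)")
```

Each resulting group would then become one column of the chart; the per-month, per-year and per-trip criteria would differ only in how the label is derived.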

Next comes the question of visualization. That might depend on the data (so the columns of the yearly statistics). At first glance I thought that box and whiskers charts might be useful, or more simplified min / avg / max charts (so floating bar with a circle for the average)


[Image: the 'candlesticks' plot, "Make an Avg-Max-Min Chart in Microsoft Excel"]

Are there any columns that couldn't be visualized with that?

/D


I am comfortable with your points of view, above. The 10m or 10min increments could easily be made configurable. For instance, a person with OW certification (dives to 18m only, with almost all dives in the 10-18m range) would probably want at most a 5m increment in depth. Unless I misunderstand you (again). Normally with statistical software (like R) the default increment is determined by the (max-min) range of the data as well as the number of data points being plotted. Of course I would not like an increment of 3.674 m of depth, as might be the case when the increment is calculated automatically by machine. My only point is that a single fixed increment is possibly restrictive and it would help if there were a simple rule to do some adjustment of the increment.
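
One possible "simple rule" of this kind, sketched here purely as an illustration, is to divide the data range by a target number of bins and round the result up to the next value in the 1-2-5 series. The function name nice_increment and the default of five bins are assumptions; this is not a description of what R does internally.

```python
import math

def nice_increment(lo, hi, target_bins=5):
    """Round (hi - lo) / target_bins up to the next 1, 2, 5 or 10 times a power of ten."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / target_bins
    magnitude = 10 ** math.floor(math.log10(raw))
    for factor in (1, 2, 5, 10):
        if raw <= factor * magnitude:
            return factor * magnitude
    return 10 * magnitude  # defensive fallback against rounding at the boundary

# A raw increment of 3.674 m is rounded up to a readable 5 m.
print(nice_increment(0, 18.37))
# Dives spread over 0-40 m fall back to the familiar 10 m increment.
print(nice_increment(0, 40))
```

With a rule like this the fixed 10m / 10min values become just the outcome for typical data, and a diver whose dives all lie above 18m gets narrower bins without any configuration.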

As far as specifying categories like tags is concerned, I like the present UI where one can specify a number of tags to be included in the filter, giving great flexibility. Again, my impression of such a plot possibly differs from yours. I like your binary set idea (a set including the tag compared to a set excluding it). But more realistically I would often want to compare tags with each other (e.g. SAC for the two tags "air" and "nitrox"), a use case which is not necessarily a binary comparison because it could compare 3 or 4 tags. Does this make sense at all?
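
To make the difference between the two ideas concrete, here is a minimal sketch of both groupings. The dive records, the field names and the two helper functions are hypothetical and exist only for this example.

```python
from statistics import mean

# Hypothetical, made-up dive records (field names are assumptions).
dives = [
    {"sac_l_min": 14.0, "tags": {"air", "shore"}},
    {"sac_l_min": 16.0, "tags": {"air"}},
    {"sac_l_min": 13.0, "tags": {"nitrox", "boat"}},
    {"sac_l_min": 12.0, "tags": {"nitrox"}},
    {"sac_l_min": 18.0, "tags": {"trimix", "boat"}},
]

def sac_with_without(dives, tag):
    """Binary split: SAC values of dives with the tag versus without it."""
    with_tag = [d["sac_l_min"] for d in dives if tag in d["tags"]]
    without_tag = [d["sac_l_min"] for d in dives if tag not in d["tags"]]
    return with_tag, without_tag

def sac_by_tags(dives, tags):
    """One group of SAC values per requested tag; a dive may land in several groups."""
    return {tag: [d["sac_l_min"] for d in dives if tag in d["tags"]] for tag in tags}

# Comparing three tags side by side instead of one tag against "everything else".
groups = sac_by_tags(dives, ["air", "nitrox", "trimix"])
for tag, values in groups.items():
    print(tag, len(values), mean(values))
```

The binary split always yields exactly two columns, while the per-tag grouping yields one column per selected tag, so the same chart type can serve both cases.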

Lastly, I do not like candlestick graphs because the application in econometrics does not include the equivalent of a mean value. It is meant to indicate the limits and sometimes direction of change within a specific time period giving rise to the candle forming the central part of the graph. In my opinion a minimal box and whisker approach is more readily interpretable.
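
For comparison, the numbers behind the two chart types under discussion can be sketched as follows. The sample values are made up, the whiskers are simply taken as minimum and maximum, and the quartiles use the "inclusive" method of Python's statistics module; all of these are assumptions for the sake of the example.

```python
from statistics import mean, quantiles

def min_avg_max(values):
    """Values for a min / avg / max chart (floating bar with a marker for the mean)."""
    return min(values), mean(values), max(values)

def box_summary(values):
    """Values for a minimal box and whisker plot: min, Q1, median, Q3, max."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up SAC values (l/min) for one column of the chart.
sac = [10.0, 12.0, 14.0, 16.0, 18.0]

print(min_avg_max(sac))
print(box_summary(sac))
```

The box summary carries the spread of the middle half of the dives in addition to the extremes, which is the extra information a min / avg / max bar does not show.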

I am very excited that this discussion is actually happening and that a window of opportunity exists with people like Tomaz and Berthold interested in being involved.

Kind regards,

willem





--
This message and attachments are subject to a disclaimer.

Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details.
_______________________________________________
subsurface mailing list
[email protected]
http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface
