Fwd: Re: RFC: Statistics in Subsurface

Willem Ferguson via subsurface Tue, 12 May 2020 12:44:18 -0700



-------- Forwarded Message --------
Subject:        Re: RFC: Statistics in Subsurface
Date:   Tue, 12 May 2020 21:41:08 +0200
From:   Willem Ferguson <[email protected]>
Reply-To:       [email protected]
Organization:   University of Pretoria
To:     Dirk Hohndel <[email protected]>



On 2020/05/12 20:49, Dirk Hohndel wrote:

Hi Willem,
Thanks for responding... I wish more people got involved in these conversations. But usually topics like this get two or three of the 300+ people here to respond. And then ten more will complain after we have done the next release and they notice for the first time that we added a feature...
On May 12, 2020, at 12:59 AM, Willem Ferguson <[email protected]> wrote:
I understand Berthold's request with respect to temporal sequences. When developing such a temporal facility there is an important caveat. Emphatically, such temporal representations do not provide any clear *explanation* of anything: it is just a temporal pattern. For instance a decrease in SAC rate over time does not necessarily imply any improvement in physiological ability but may reflect adoption of new equipment or a change in dive sites. Any explanation of a temporal trend is dependent on the understanding of the USER, not on the SOFTWARE. So, when dealing with temporal trends, one needs to consider carefully the intended type of use of it. I think Berthold is more concerned with continuous variables such as temperature, SAC, dive duration, depth, etc., which could probably be reasonably easily implemented. To represent categorical variables such as tags, dive mode, people and suit (one could even add dive site) is a totally different issue requiring a totally different type of visual representation.
I was in complete agreement until the very last sentence. I don't understand why this 'per se' requires a "totally different type of visual representation". Let's say I am charting SAC over my criteria. Let's assume I'm using box and whiskers charts to easily show the quartiles. The values on the x-axis have implications for the interpretation, of course, but whether the x-axis is months of the year, the suit worn, the maximum depth of the dive, the tags present on the dive (e.g., teaching dive or non-teaching dive) has absolutely no impact on how this should be visually represented...
It would help, in this discussion, if one were to distinguish between the filtering aspect and the statistics display aspect and state that with respect to the argument. In Dirk's artwork above, I am not sure how the constraints will be used. Are we talking of the filtering process or the stats display mechanism? Let's say "Suit" is a constraint and two dates are provided. I am not sure what the expected result of the operation would be. Ahh, the problems of communication.
What I was trying to describe was a way to create criteria that can be used for columns in the visualization. You go through this filter process, name the result, and that name becomes one of the available labels over which you can chart the values.
Again, as I said before, I may simply be over-engineering this.
In general, in my opinion, the existing filter layout is a good starting point (I would add the variables of dive depth and dive duration because they are the two variables that fundamentally define a dive). As a filtering mechanism the current implementation is ultra-flexible.
While I respect your opinion, let me politely state that personally I believe that the current filter widget is a disaster and extremely unintuitive to use. That's not a criticism of the original author, nor of the people who have added to it - but yeah, that thing is a mess.
As far as the UI for filter sets is concerned, the minimum component count would include: Combobox of existing named filters within the set. Button: add current filter to filter set. These could potentially reside at the top right of the current filter panel. But there might be a need to give the filter set a name as well. That would need a text box.
Making the current widget more convoluted and more confusing was not a direction that I was envisioning us to go.
Maybe we need to rein in the crazy German and go back to something much more basic. Something like ten predefined sets of criteria. And only apply them to the filtered dive list.
So.
(1) per month
(2) per year
(3) per trip
(4) by max depth in 10m increments
(5) by duration in 10min increments
(6) by min temperature in 10F / 5C increments
(7) by type (for people who track more than SCUBA)
(8) by suit (that's likely a fairly small set for most people)
(9) by tags (that one I'm unclear about - would likely need some more ability to influence how this is drawn - but straightforward would be to draw them in pairs, left one represents with the tag, right one without the tag)
(10) by people? (no idea how / why)
(11) by full text? (no idea how / why)

If we drop the last three this seems fairly obvious how to do.
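
A minimal sketch of the fixed-increment grouping in criteria (4) to (6), in Python. The dive records and field names (`max_depth_m`, `sac_l_min`) are hypothetical and are not Subsurface's data model; the point is only that a bin is the half-open range [lo, lo + width) that contains the value.

```python
from collections import defaultdict

def bin_label(value, width):
    """Return the half-open bin [lo, hi) of the given width that contains value."""
    lo = (value // width) * width
    return (lo, lo + width)

def group_by_bin(dives, key, width):
    """Group dive records by the fixed-width bin of dive[key]."""
    groups = defaultdict(list)
    for dive in dives:
        groups[bin_label(dive[key], width)].append(dive)
    return dict(groups)

# Hypothetical, already-filtered dive list (made-up values).
dives = [
    {"max_depth_m": 8.5, "sac_l_min": 14.2},
    {"max_depth_m": 12.0, "sac_l_min": 15.1},
    {"max_depth_m": 18.0, "sac_l_min": 16.8},
    {"max_depth_m": 31.4, "sac_l_min": 19.5},
]

# Criterion (4): by max depth in 10 m increments.
by_depth = group_by_bin(dives, "max_depth_m", 10)
for (lo, hi), members in sorted(by_depth.items()):
    print(f"{lo:g}-{hi:g} m: {len(members)} dive(s)")
```

The same helper would cover duration or temperature by passing a different key and width; per month, per year and per trip would need a different labelling function.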
Next comes the question of visualization. That might depend on the data (so the columns of the yearly statistics). At first glance I thought that box and whiskers charts might be useful, or more simplified min / avg / max charts (so floating bar with a circle for the average)
the 'candlesticks' plot: "Make an Avg-Max-Min Chart in Microsoft Excel"

Are there any columns that couldn't be visualized with that?

/D

I am comfortable with your points of view, above. The 10m or 10min increments could easily be configurable. For instance a person with OW certification (dives to 18m only, with almost all dives in the 10-18m range) would probably want at most a 5m increment in depth. Unless I misunderstand you (again). Normally with statistical software (like R) the default increment is determined by the (max-min) range of the data as well as the number of data points being plotted. Of course I would not like an increment of 3.674 m of depth, as might be the case when the increment is automatically calculated by machine. My only point is that a single fixed increment is possibly restrictive and it would help if there were a simple rule to do some adjustment of the increment.
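
One possible "simple rule", sketched in Python: divide the data range by a target number of bins and round the result up to the next value of the form 1, 2 or 5 times a power of ten. This is an illustration only, not what R does; the function name and the `target_bins` parameter are assumptions, and unlike the R defaults mentioned above it ignores the number of data points.

```python
import math

def nice_increment(lo, hi, target_bins=5):
    """Round (hi - lo) / target_bins up to 1, 2 or 5 times a power of ten."""
    span = hi - lo
    if span <= 0 or target_bins <= 0:
        raise ValueError("need hi > lo and target_bins > 0")
    raw = span / target_bins
    magnitude = 10 ** math.floor(math.log10(raw))
    for step in (1, 2, 5, 10):
        if step * magnitude >= raw:
            return step * magnitude
    return 10 * magnitude  # fallback; normally reached only through rounding error

# A raw increment of about 3.674 m is rounded up to 5 m.
print(nice_increment(0, 18.37))
# Dives between 10 m and 18 m, split into about two bins.
print(nice_increment(10, 18, target_bins=2))
```

A rule like this keeps the increment adjustable to the data while avoiding values such as 3.674 m.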

As far as specifying categories like tags, I like the present UI where one could specify a number of tags to be included in the filter, giving great flexibility. Again my impression of such a plot possibly differs from yours. I like your binary set idea (a set including compared to a set excluding). But more realistically I would often want to compare several tags (e.g. SAC when comparing the two tags "air" and "nitrox"), a use case which does not necessarily imply a binary comparison because it could compare 3 or 4 tags. Does this make sense at all?
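
To make the multi-tag use case concrete, here is a small Python sketch that collects SAC values for each of several chosen tags. The record layout (`tags`, `sac_l_min`) and the values are hypothetical. A dive carrying more than one of the chosen tags is counted in each group, which is one possible choice and not necessarily the right one.

```python
def sac_by_tag(dives, tags):
    """Collect SAC values per chosen tag; a dive may land in several groups."""
    groups = {tag: [] for tag in tags}
    for dive in dives:
        for tag in tags:
            if tag in dive["tags"]:
                groups[tag].append(dive["sac_l_min"])
    return groups

# Hypothetical dive records (made-up values).
tagged_dives = [
    {"tags": {"air", "shore"}, "sac_l_min": 16.0},
    {"tags": {"nitrox", "boat"}, "sac_l_min": 14.5},
    {"tags": {"nitrox", "teaching"}, "sac_l_min": 18.0},
    {"tags": {"air"}, "sac_l_min": 15.0},
]

groups = sac_by_tag(tagged_dives, ["air", "nitrox", "teaching"])
for tag, values in groups.items():
    print(tag, values)
```

Each group would then become one column in the plot, so two, three or four tags can be compared side by side.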

Lastly, I do not like candlestick graphs because the application in econometrics does not include the equivalent of a mean value. It is meant to indicate the limits and sometimes direction of change within a specific time period, giving rise to the candle forming the central part of the graph. In my opinion a minimal box and whisker approach is more readily interpretable.
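
For reference, the numbers behind a minimal box and whisker plot can be sketched in Python with the standard `statistics` module (Python 3.8 or later). The sample SAC values are made up, and the "inclusive" quartile method is just one of several common conventions.

```python
import statistics

def box_summary(values):
    """Return min, quartiles, max and mean for one column of a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }

# Made-up SAC values (l/min) for one group of dives.
summary = box_summary([12.0, 14.0, 15.0, 17.0, 22.0])
print(summary)
```

The min / avg / max chart uses only three of these values, while the box and whisker plot adds the quartiles, and the mean can still be drawn as a marker on top.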

I am very excited that this discussion is actually happening and that a window of opportunity exists with people like Tomaz and Berthold interested in being involved.


Kind regards,

willem





--
This message and attachments are subject to a disclaimer.

Please refer to

http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf for full details.

_______________________________________________
subsurface mailing list
[email protected]
http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface
