-------- Forwarded Message --------
Subject: Re: RFC: Statistics in Subsurface
Date: Tue, 12 May 2020 21:41:08 +0200
From: Willem Ferguson <[email protected]>
Reply-To: [email protected]
Organization: University of Pretoria
To: Dirk Hohndel <[email protected]>
On 2020/05/12 20:49, Dirk Hohndel wrote:
Hi Willem,
Thanks for responding... I wish more people got involved in these
conversations. But usually topics like this get two or three of the
300+ people here to respond. And then ten more will complain after we
have done the next release and they notice for the first time that we
added a feature...
On May 12, 2020, at 12:59 AM, Willem Ferguson
<[email protected]> wrote:
I understand Berthold's request with respect to temporal sequences.
When developing such a temporal facility there is an important
caveat. Emphatically, such temporal representations do not provide
any clear *explanation* of anything: it is just a temporal pattern.
For instance a decrease in SAC rate over time does not necessarily
imply any improvement in physiological ability but may reflect
adoption of new equipment or change in dive sites. Any explanation of
a temporal trend is dependent on the understanding of the USER, not
on the SOFTWARE. So, when dealing with temporal trends, one needs to
consider carefully the intended type of use of it. I think Berthold
is more concerned with continuous variables such as temperature, SAC,
dive duration, depth, etc., which could probably be reasonably easily
implemented. To represent categorical variables such as tags, dive
mode, people and suit (one could even add dive site) is a totally
different issue requiring a totally different type of visual
representation.
I was in complete agreement until the very last sentence. I don't
understand why this 'per se' requires a "totally different type of
visual representation".
Let's say I am charting SAC over my criteria. Let's assume I'm using
box and whiskers charts to easily show the quartiles. The values on
x-axis have implications for the interpretation, of course, but
whether the x-axis is months of the year, the suit worn, the maximum
depth of the dive, the tags present on the dive (e.g., teaching dive
or non-teaching dive) has absolutely no impact on how this should be
visually represented...
It would help, in this discussion, if one were to distinguish between
the filtering aspect and the statistics display aspect and state that
with respect to the argument. In Dirk's artwork above, I am not sure
how the constraints will be used. Are we talking of the filtering
process or the stats display mechanism? Let's say "Suit" is a
constraint and two dates are provided. I am not sure what the
expected result of the operation would be. Ahh, the problems of
communication.
What I was trying to describe was a way to create criteria that can be
used for columns in the visualization. You go through this filter
process, name the result, and that name becomes one of the available
labels over which you can chart the values.
Again, as I said before, I may simply be over-engineering this.
In general, in my opinion, the existing filter layout is a good
starting point (I would add the variables of dive depth and dive
duration because they are the two variables that fundamentally define
a dive). As a filtering mechanism the current implementation is
ultra-flexible.
While I respect your opinion, let me politely state that personally I
believe that the current filter widget is a disaster and extremely
unintuitive to use. That's not a criticism of the original author, nor
of the people who have added to it - but yeah, that thing is a mess.
As far as the UI for filter sets is concerned, the minimum component
count would include: a combobox of existing named filters within the
set, and a button to add the current filter to the filter set. These
could potentially reside at the top right of the current filter panel.
But there might be a need to give the filter set a name as well. That
would need a text box.
Making the current widget more convoluted and more confusing was not a
direction that I was envisioning us to go.
Maybe we need to rein in the crazy German and go back to something
much more basic. Something like ten predefined sets of criteria. And
only apply them to the filtered dive list.
So.
(1) per month
(2) per year
(3) per trip
(4) by max depth in 10m increments
(5) by duration in 10min increments
(6) by min temperature in 10F / 5C increments
(7) by type (for people who track more than SCUBA)
(8) by suit (that's likely a fairly small set for most people)
(9) by tags (that one I'm unclear about - would likely need some more
ability to influence how this is drawn - but straightforward would be
to draw them in pairs, left one represents with the tag, right
one without the tag)
(10) by people? (no idea how / why)
(11) by full text? (no idea how / why)
If we drop the last three this seems fairly obvious how to do.
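
To make the fixed-increment criteria above concrete, here is a minimal
Python sketch of grouping dives into bins such as "max depth in 10m
increments". It is an illustration only, not Subsurface code: the dive
records are plain dicts, and the field names (max_depth_m, sac) and
sample values are made up for the example.

```python
from collections import defaultdict

def bin_label(value, width):
    """Return the half-open bin (lo, lo + width) that contains value."""
    lo = int(value // width) * width
    return (lo, lo + width)

def group_by_bin(dives, key, width):
    """Group dive records (plain dicts here) by a fixed-width bin of dive[key]."""
    groups = defaultdict(list)
    for dive in dives:
        groups[bin_label(dive[key], width)].append(dive)
    return dict(groups)

# Hypothetical sample data: max depth in metres, SAC in l/min.
dives = [
    {"max_depth_m": 8.5, "sac": 17.0},
    {"max_depth_m": 12.0, "sac": 15.5},
    {"max_depth_m": 17.9, "sac": 14.0},
    {"max_depth_m": 31.2, "sac": 13.0},
]
by_depth = group_by_bin(dives, "max_depth_m", 10)
```

The same helper would cover duration or temperature by changing the key
and the width; per month, per year and per trip would group on a
label rather than a numeric bin.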
Next comes the question of visualization. That might depend on the
data (so the columns of the yearly statistics). At first glance I
thought that box and whiskers charts might be useful, or more
simplified min / avg / max charts (so floating bar with a circle for
the average)
the 'candlesticks' plot ("Make an Avg-Max-Min Chart in Microsoft Excel")
Are there any columns that couldn't be visualized with that?
/D
I am comfortable with your points of view, above. The 10m or 10min
increments could easily be configurable. For instance a person with OW
certification (dives to 18m only with almost all dives in the 10-18m
range) would probably want at most a 5m increment in depth. Unless I
misunderstand you (again). Normally with statistical software (like
R) the default increment is determined by the (max-min) range of the
data as well as the number of data points being plotted. Of course I
would not like an increment of 3.674 m of depth, as might be the case
when the increment is calculated automatically. My only point is
that a single fixed increment is possibly restrictive and it would help
if there were a simple rule to do some adjustment of the increment.
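
One possible "simple rule" of this kind, sketched in Python: take the
raw increment (range divided by a target number of bins) and round it
to the nearest "nice" value of the form 1, 2 or 5 times a power of ten.
This is only an illustration of the idea, not how R or Subsurface
actually picks increments; the function name, the default of 5 target
bins and the rounding thresholds are my own assumptions.

```python
import math

def nice_increment(lo, hi, target_bins=5):
    """Pick a bin width from {1, 2, 5} x 10^k close to (hi - lo) / target_bins."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / target_bins
    magnitude = 10 ** math.floor(math.log10(raw))
    fraction = raw / magnitude  # roughly in [1, 10)
    # Thresholds are an arbitrary but common choice for rounding to 1, 2, 5, 10.
    if fraction < 1.5:
        nice = 1
    elif fraction < 3:
        nice = 2
    elif fraction < 7:
        nice = 5
    else:
        nice = 10
    return nice * magnitude
```

With these assumptions, a depth range of 0 to 18 m gives a raw
increment of 3.6 m, which is rounded to 5 m, so the awkward 3.674 m
case never shows up on the axis.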
As far as specifying categories like tags I like the present UI where
one could specify a number of tags to be included in the filter, giving
great flexibility. Again my impression of such a plot possibly differs
from yours. I like your binary set idea (a set including the tag
compared to a set excluding it). But more realistically I would often
want to compare several tags (e.g. SAC for the two tags "air" and
"nitrox"), a use case which does not necessarily imply a binary
comparison because it could involve 3 or 4 tags. Does this make sense
at all?
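
To illustrate the multi-tag comparison I have in mind, here is a small
Python sketch that collects SAC values for each of an arbitrary number
of selected tags, one chart column per tag. It is a sketch under my own
assumptions: dives are plain dicts with hypothetical "tags" and "sac"
fields, and a dive carrying several of the selected tags is counted in
each of those columns.

```python
def sac_by_tags(dives, tags):
    """Collect SAC values per requested tag; a dive with several tags counts in each."""
    groups = {tag: [] for tag in tags}
    for dive in dives:
        for tag in tags:
            if tag in dive["tags"]:
                groups[tag].append(dive["sac"])
    return groups

# Hypothetical sample data.
dives = [
    {"tags": {"air"}, "sac": 16.0},
    {"tags": {"nitrox", "deep"}, "sac": 14.5},
    {"tags": {"nitrox"}, "sac": 15.0},
    {"tags": {"teaching"}, "sac": 18.0},
]
columns = sac_by_tags(dives, ["air", "nitrox", "deep"])
```

The binary "with tag / without tag" pair is then just the special case
of one selected tag plus its complement.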
Lastly, I do not like candlestick graphs because the application in
econometrics does not include the equivalent of a mean value. It is
meant to indicate the limits and sometimes direction of change within a
specific time period giving rise to the candle forming the central part
of the graph. In my opinion a minimal box and whisker approach is more
readily interpretable.
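
For reference, the numbers behind such a minimal box and whisker column
are easy to compute. The Python sketch below uses the standard
library's statistics module; the choice of whiskers at min and max
(rather than, say, 1.5 times the interquartile range) and of the
"inclusive" quartile method are my assumptions, and the mean is added
only because the min / avg / max proposal asks for it.

```python
import statistics

def box_summary(values):
    """Return min, quartiles, max and mean for one box-and-whisker column."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
    }

# Hypothetical SAC values (l/min) for one column of the chart.
summary = box_summary([12, 14, 15, 17, 20])
```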
I am very excited that this discussion is actually happening and that a
window of opportunity exists with people like Tomaz and Berthold
interested in being involved.
Kind regards,
willem