To summarize a dataset of many values as a dataset of fewer values, use this
program:
summary =. [ 4 : 0 ,@] ^/ i.@>:@[
y=.x*(}.%{.)+/y
e=.1
for_i. 1+i.x do.
e=.e,~i%~+/(_1^i.i)*e*i{.y
end.
_1&x:|.->{:p. e
)
1 summary 1 2 2 3 3 NB. mean value
2.2
2 summary 1 2 2 3 3 NB. mean give or take std.dev
1.45167 2.94833
2 summary 2 summary 1 2 2 3 3 NB. it is idempotent
1.45167 2.94833
3 summary 1 2 2 3 3 NB. showing asymmetry
1.20749 2.37816 3.01435
2 summary 3 summary 1 2 2 3 3 NB. simpler data is intact
1.45167 2.94833
5 summary 1 2 2 3 3 NB. no reduction
1 2 2 3 3
NB. This variant takes a histogram as input
summary_histogram =. [ 4 : 0 ] * i.@#@,@] ^/ i.@>:@[
y=.x*(}.%{.)+/y
e=.1
for_i. 1+i.x do.
e=.e,~i%~+/(_1^i.i)*e*i{.y
end.
_1&x:|.->{:p. e
)
5 summary_histogram 0 1 2 2
1 2 2 3 3
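One possible reading of the program, offered as an interpretation rather than as the author's stated intent: k summary d appears to return k equally weighted values whose first k raw moments equal those of d (power sums, then Newton's identities, then polynomial roots). The sketch below checks the k = 2 case by hand, where the two values are the mean plus and minus the population standard deviation. The names moments, mean, sd, d and s are made up for this illustration.

```j
NB. Illustration only: every name below is hypothetical, not from the post.
moments =: 4 : '(+/ % #) y ^/ 1 + i. x'   NB. first x raw moments of y
mean    =: +/ % #                          NB. arithmetic mean
sd      =: %:@mean@:*:@(- mean)            NB. population standard deviation

d =: 1 2 2 3 3
s =: (mean d) + _1 1 * sd d                NB. two-point summary built by hand

2 moments d                                NB. moments of the full data
2 moments s                                NB. moments of the two-point summary
```

Both calls to moments should print 2.2 5.4, and s should agree with the 2 summary 1 2 2 3 3 result shown above.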
Thanks.
Bo.
On Saturday, 25 July 2020 at 03:16:11 CEST, Skip Cave
<[email protected]> wrote:
Raul,
I was thinking more along the lines of:
hsbc=:[:(]{"1~[:\:1{])~.,:#/.~
hsbc n
13 9 14 6 10 7
3 2 2 1 1 1
box hsbc n
┌──┬─┬──┬─┬──┬─┐
│13│9│14│6│10│7│
│3 │2│2 │1│1 │1│
└──┴─┴──┴─┴──┴─┘
How could the box verb be made?
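One possible answer, as a sketch: transpose the table so that each value and its count form a row, turn each row into a one-column table, and box it. The definition of box below is an assumption about what is wanted, not something from the thread; hsbc and n are copied from Raul's message quoted below.

```j
n    =: 9 13 6 9 13 13 10 14 7 14
hsbc =: [: (] {"1~ [: \: 1 { ]) ~. ,: #/.~   NB. histogram sorted by count
box  =: <@,."1@|:                             NB. hypothetical: one box per column

box hsbc n
```

Numbers are right-justified inside each box, so each count lines up under the last digit of its value rather than the first.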
Skip Cave
Cave Consulting LLC
On Fri, Jul 24, 2020 at 6:20 PM Raul Miller <[email protected]> wrote:
> Well... first off, there's a trivial conversion to tacit available here:
>
> 13 :'h{"1~\:1{h=.(~.,:#/.~)n=.y'
> [: (] {"1~ [: \: 1 { ]) ~. ,: #/.~
>
> And I guess that's good enough, so let's just name it:
>
> hsbc=: 13 :'h{"1~\:1{h=.(~.,:#/.~)n=.y'
> n=: 9 13 6 9 13 13 10 14 7 14
>
> As for vertical boxes, maybe something like this would be close enough
> to what you want?
>
> vbox=: ]each@{.,:'#' ,.@#each~ {:
> vbox hsbc n
> +--+-+--+-+--+-+
> |13|9|14|6|10|7|
> +--+-+--+-+--+-+
> |# |#|# |#|# |#|
> |# |#|# | | | |
> |# | | | | | |
> +--+-+--+-+--+-+
>
> Though personally, I might be more inclined towards horizontal boxes
> (partially because the proportional font I see used in email contexts
> messes up vertical boxes so badly, but also see my hbar suggestion)
>
> hbox=: ]each@{.,.'#' #each~ {:
> hbox hsbc n
>
> Or, getting rid of the boxes:
>
> hbar=: ":@,.@{. ,.' ' ,. '#' #every~ {:
> hbar hsbc n
> 13 ###
> 9 ##
> 14 ##
> 6 #
> 10 #
> 7 #
>
> (This sort of thing still looks better with a fixed width font, in my
> opinion.)
>
> Thanks,
>
>
> --
> Raul
>
> On Fri, Jul 24, 2020 at 6:45 PM Skip Cave <[email protected]> wrote:
> >
> > I find that a histogram of the data, sorted by count, is useful in many
> > cases, in place of the mode:
> >
> > ]n=.?10#15
> >
> > 9 13 6 9 13 13 10 14 7 14
> >
> > h{"1~\:1{h=.(~.,:#/.~)n
> >
> > 13 9 14 6 10 7
> >
> > 3 2 2 1 1 1
> >
> >
> > I'm sure there are more concise ways to express this, and it would be nice
> > to have vertical boxes for each quantity, but this gets what I need done.
> >
> >
> > Skip
> >
> > Skip Cave
> > Cave Consulting LLC
> >
> >
> > On Fri, Jul 24, 2020 at 1:47 PM Raul Miller <[email protected]> wrote:
> >
> > > https://en.wikipedia.org/wiki/Mode_(statistics)#Uniqueness_and_definedness
> > >
> > > "Finally, as said before, the mode is not necessarily unique. Certain
> > > pathological distributions (for example, the Cantor distribution) have
> > > no defined mode at all."
> > >
> > > That said, just as we can redefine median to be the mean of the two
> > > median values when the length of the sequence is even, we could
> > > redefine mode as the median of the candidate mode values when there is
> > > more than one "most frequently occurring value".
> > >
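The redefinition suggested above can be sketched in J as follows. The names modes, median and mode2 are hypothetical; median is the usual tacit definition with the midpoint helper written inline.

```j
NB. Illustration only: modes, median and mode2 are hypothetical names.
modes  =: ~. #~ (= >./)@(#/.~)       NB. every value that has the maximal count
median =: -:@(+/)@((<. , >.)@-:@<:@# { /:~)   NB. mean of the middle item(s)
mode2  =: median@modes               NB. median of the tied mode candidates

mode2 1 2 2 3 3
mode2 1 3 3 2 2
```

Both orderings from Devon's example give 2.5 with this definition, so the result no longer depends on the order of the data.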
> > > Thanks,
> > >
> > > --
> > > Raul
> > >
> > > On Fri, Jul 24, 2020 at 2:36 PM Devon McCormick <[email protected]> wrote:
> > > >
> > > > Hi - I've started reading "Fun Q" which is a book on machine learning using
> > > > the q language. Early on, the author points out that his "mode" function -
> > > > where "mode" is stats-talk for "the most frequent observation" - is
> > > > order-dependent.
> > > >
> > > > I checked my own "mode" and found that this is true of mine as well:
> > > > mode
> > > > ~. {~ [: (i. >./) #/.~
> > > > mode 1 2 2 3 3
> > > > 2
> > > > mode 1 3 3 2 2
> > > > 3
> > > >
> > > > This might be an ill-defined statistical concept but does anyone have any
> > > > insight based on practice? Is this order-dependence just a weakness of the
> > > > definition of "mode"?
> > > >
> > > > I could not find "mode" defined in any of the J standard libraries.
> > > >
> > > > Thanks,
> > > >
> > > > Devon
> > > >
> > > > --
> > > >
> > > > Devon McCormick, CFA
> > > >
> > > > Quantitative Consultant
> > > >
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm