Ralph, et al. -
in the example I gave, 6 6 7 7 and 0 2 each ended up in their own boxes
because those values sit next to each other in that particular sorted
sequence. The 0 2 box is smaller than average because the run of 4s had to
be kept together, since those values are exactly equal. The 4s are in a
separate box from 0 2 as a consequence of how I ensure that equal values
remain together: I use dyadic i. to move the initial breakpoint back to the
start of the run of equal values. I could do this some other way, but the
method I chose was the simplest I could think of.
In the context of the larger problem, it may not really matter whether
equal values end up in adjacent partitions.
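A minimal sketch of that breakpoint adjustment on the sorted example data.
The name srt is introduced here for illustration, and 3 6 9 are the initial
breakpoints for 4 groups of 12 items:

```j
NB. Sorted example data from the original message.
srt=: /:~ 7 4 0 2 7 6 8 4 6 9 8 4   NB. 0 2 4 4 4 6 6 7 7 8 8 9

NB. Values found at the initial, evenly spaced breakpoints.
vals=: 3 6 9 { srt                   NB. 4 6 8

NB. Dyadic i. returns the index of the FIRST occurrence of each value,
NB. so a breakpoint landing inside a run of equal values moves back
NB. to the start of that run.
brk=: srt i. vals                    NB. 2 5 9
```

The first breakpoint moves from 3 to 2 and the second from 6 to 5, which is
why 0 2 is a short box and 6 6 7 7 is a long one.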
The larger problem of which this is a part: I'm analyzing a lot of (usually
non-integer) numbers by grouping together similar values and looking for
characteristics that apply to the groups.
For instance, I might have P/E ratios for S&P 500 stocks. To test the
hypothesis that low-PE stocks do better than high-PE stocks, and to quantify
the difference in performance, I group the PEs into, say, deciles. I then
look at the total returns of the deciles to see if there's any pattern
relating to the lower versus the higher deciles.
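As a rough illustration of the per-group step, key (/.) can summarize a
second vector by group number. Here ix is the 4 ncileix result from the
original message; ret and mean are names introduced for this sketch, and the
ret values are made up purely for illustration, not real returns:

```j
NB. Group number of each item, as returned by 4 ncileix rnn.
ix=: 2 1 0 0 2 2 3 1 2 3 3 1

NB. Hypothetical per-item values (made up for this sketch).
ret=: 2 1 3 5 4 6 7 2 8 8 9 3

mean=: +/ % #

NB. Key applies mean to the items of ret in each group; results come
NB. out in order of each group's first appearance in ix (2 1 0 3).
bygrp=: ix mean/. ret                NB. 5 2 4 8

NB. Reorder so the results run from the lowest group to the highest.
sorted=: (/: ~. ix) { bygrp          NB. 4 2 5 8
```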
The grouping into deciles has a number of advantages. For one, it's easier
to work with 10 numbers than with 500. Also, deciles are relative rankings
at any point in time. This is helpful because my study will cover periods
when PEs are higher than average and periods when they're lower than
average, but deciles, in a sense, normalize these historical fluctuations.
The use of deciles is arbitrary, which is why I have a left argument: if I
find a relation that works with deciles, I can test its robustness by seeing
if it holds up when I look at quintiles or 11-iles or 9-iles.
I'm vaguely dissatisfied with the functions I included in my original
message because the two of them substantially duplicate each other and
somehow seem to be doing too much work. Looking at them again, maybe they're
not so bad - perhaps I should assign "grd{y" instead of using it three
times, and, in the second-to-last line of "ncileix", there are other ways to
count the number of elements per partition than "#&>ptn<;.1]1$~#ptn".
I must be taking J too much for granted when I think that five- and six-line
functions are too long. It's just that sometimes, if someone else has looked
at a similar problem, they've come up with a completely different way to do
it that can be illuminating.
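One possible tidy-up along those lines, offered as a sketch rather than a
definitive version. The names srt, brk and ncileix2 are introduced here, and
roundNums is not shown anywhere in this thread, so the definition below
(round half up) is only an assumed stand-in:

```j
NB. Partition-start flags for the example: groups begin at 0 2 5 9.
ptn=: 1 0 1 0 0 1 0 0 0 1 0 0

NB. Count the items per partition by applying # to each cut piece.
cnt=: ptn #;.1 ptn                   NB. 2 3 4 3

NB. Partition index of each sorted item: running count of flags, less 1.
pix=: <: +/\ ptn                     NB. 0 0 1 1 1 2 2 2 2 3 3 3

NB. ASSUMPTION: stand-in for the roundNums helper, which is not shown.
roundNums=: <.@(0.5&+)

NB. Variant of ncileix that assigns the sorted vector once and takes
NB. the partition indexes directly from the running sum of ptn.
ncileix2=: 4 : 0"(0 1)
grd=. /:y
srt=. grd{y
brk=. roundNums (>:i.<:x)*x%~#y
brk=. srt i. brk{srt
ptn=. (1) (0,brk)}0$~#y
(<:+/\ptn){~/:grd
)
```

With the example data, 4 ncileix2 7 4 0 2 7 6 8 4 6 9 8 4 should give
2 1 0 0 2 2 3 1 2 3 3 1, the same as ncileix in the original message.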
Thanks for the responses,
Devon
On 5/28/07, Ralph G Selfridge <[EMAIL PROTECTED]> wrote:
I do have one question. How do you decide not to split 6 6 7 7 into 2 boxes?
Look at 0 2. Thus I wonder about 'well-defined' (other than by your
solution).
Ralph Selfridge
On Sun, 27 May 2007, Devon McCormick wrote:
> Members of the Forum -
>
> I have this function:
>
> NB.* ncile: break vector into x equally-sized pieces based on ascending values.
> ncile=: 4 : 0"(0 1)
> grd=. /:y
> brkptixs=. roundNums (>:i.<:x)*x%~#y NB. Internal breakpoints only
> brkptixs=. (grd{y) i. brkptixs{grd{y NB. Adjust for breaks across =values.
> ptn=. (1) (0,brkptixs)}0$~#y
> ptn<;.1 grd{y
> )
>
> This is supposed to break up a vector into (roughly) equal-sized pieces,
> for example:
>
> ]rnn=. ?12$10
> 7 4 0 2 7 6 8 4 6 9 8 4
> 4 ncile rnn
> +---+-----+-------+-----+
> |0 2|4 4 4|6 6 7 7|8 8 9|
> +---+-----+-------+-----+
>
> The pieces are not equal-sized because it's more important that the
> divisions be non-overlapping, so all three "4"s have to be in the same
> group.
>
> The companion to this returns a list of partition indexes:
>
> NB.* ncileix: index vector by ncile into which it falls.
> ncileix=: 4 : 0"(0 1)
> grd=. /:y
> brkptixs=. roundNums (>:i.<:x)*x%~#y NB. Internal breakpoints only
> brkptixs=. (grd{y) i. brkptixs{grd{y NB. Adjust for breaks across =values.
> ptn=. (1) (0,brkptixs)}0$~#y
> ptnix=. (#&>ptn<;.1]1$~#ptn)#i.+/ptn NB. Partition indexes/graded values
> ptnix{~/:grd NB. Back in original input order
> )
>
> So, for example (matching the divisions above):
>
> rnn,:4 ncileix rnn
> 7 4 0 2 7 6 8 4 6 9 8 4
> 2 1 0 0 2 2 3 1 2 3 3 1
>
> However, my feeling is this code is a bit clumsy. Can anyone think of
> a neater pair of solutions?
>
> P.S. I don't really care about this behavior that fails to return "x"
> groups, as my right arguments will probably be hundreds of nearly distinct
> values and I'll usually want no more than 10 partitions:
>
> 4 ncile 20$1
> +---------------------------------------+
> |1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1|
> +---------------------------------------+
> 4 ncileix 20$1
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>
> Thanks,
>
> Devon McCormick, CFA
> ^me^ at acm.
> org is my
> preferred e-mail
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>