You need to think about whether you want to sort this by the numeric values or by the labels. If you sort it by the numeric values, your bins will change position as your values change relative to each other; sorting by the labels keeps the bins in the same place. It depends on how you want to present it.
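As a rough sketch of the two presentations, here is one way it could look, assuming the duplicate-safe reordering from Roger's g quoted below. The names runsumpct2, labels, counts and tbl are made up for illustration; the counts are the ones from Joe's second table.

```j
NB. Sketch only; runsumpct2, labels, counts and tbl are illustrative names.
runsumpct2 =: 3 : 0
i =. \: y                   NB. permutation that sorts y in descending order
((+/\ % +/) i { y) /: i     NB. running share of the total, put back in the original order
)

labels =: ;: 'Public Weather Oversight Emergency Traffic ChildCare XYZ'
counts =: 47 28 18 12 5 57 5
pct    =: runsumpct2 counts
tbl    =: labels ,. (<"0 counts) ,. <"0 pct

(/: pct) { tbl       NB. rows ordered by cumulative share: positions move as the counts change
(/: labels) { tbl    NB. rows ordered by label: each cause stays in the same place
```

Because the grade keeps equal items in their original order, Traffic and XYZ (both 5) should get distinct shares here, 167%172 and 1, rather than the single shared value that (sorted i. y) produces.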
On Tue, Jun 3, 2014 at 10:41 AM, Roger Hui <[email protected]> wrote:

> The treatment of an argument with duplicate entries obtain as follows:
>
> f =: 3 : 0
> sorted=: \:~ y
> vals =: (+/\ % +/) sorted
> (sorted i. y) { vals
> )
>
> g =: 3 : 0
> i /:~ (+/\ % +/) i{y [ i=. \:y
> )
>
> When the argument does not have duplicate entries, f and g are identical:
>
>    (f -: g) 40?60
> 1
>    (f -: g) 40?60
> 1
>    (f -: g) 40?60
> 1
>
> But when there are duplicate entries:
>
>    (f -: g) y=: 3 4 3 3 3
> 0
>    f y
> 0.4375 0.25 0.4375 0.4375 0.4375
>    g y
> 0.4375 0.25 0.625 0.8125 1
>
> g on the original y is seen to be "correct" when applied to a slightly
> perturbed y :
>
>    y1=: y+1e_10*i.#y
>    f y1
> 1 0.25 0.8125 0.625 0.4375
>    g y1
> 1 0.25 0.8125 0.625 0.4375
>
> The difference between f and g is that g re-orders the cumulated ratios
> using the inverse permutation of i that sorted the argument. When y has no
> duplicates, the inverse permutation i is the same as (sorted i. y); in
> general, the inverse permutation is /:i. Note: i/:~blah ←→ blah/:i ←→
> (/:i){blah.
>
>
> On Tue, Jun 3, 2014 at 6:49 AM, Joe Bogner <[email protected]> wrote:
>
> > Thanks Pascal - good solution to my incorrect approach.
> >
> > Roger, I am using this with a pareto chart to identify what bin each record
> > would fall under.
> >
> > http://en.wikipedia.org/wiki/Pareto_chart
> >
> > I don't need to draw the chart, I just need to know what each record would
> > be classified as:
> >
> > causes =: > ' ' cut each LF cut(0 : 0)
> > Public 47
> > Weather 28
> > Oversight 18
> > Emergency 12
> > Traffic 5
> > ChildCare 57
> > )
> >
> > vals=. ". > 1}"1 causes
> >
> > runsumpct =: 3 : 0
> > sorted=. \:~ y
> > vals =. (+/\ % +/) sorted
> > (sorted i. y) { vals
> > )
> >
> > pct=:runsumpct vals
> > causes,.>each pct
> >
> > ┌─────────┬──┬────────┐
> > │Public   │47│0.622754│
> > ├─────────┼──┼────────┤
> > │Weather  │28│0.790419│
> > ├─────────┼──┼────────┤
> > │Oversight│18│0.898204│
> > ├─────────┼──┼────────┤
> > │Emergency│12│0.97006 │
> > ├─────────┼──┼────────┤
> > │Traffic  │5 │1       │
> > ├─────────┼──┼────────┤
> > │ChildCare│57│0.341317│
> > └─────────┴──┴────────┘
> >
> > (/: pct) { (causes,.>each pct)
> >
> > ┌─────────┬──┬────────┐
> > │ChildCare│57│0.341317│
> > ├─────────┼──┼────────┤
> > │Public   │47│0.622754│
> > ├─────────┼──┼────────┤
> > │Weather  │28│0.790419│
> > ├─────────┼──┼────────┤
> > │Oversight│18│0.898204│
> > ├─────────┼──┼────────┤
> > │Emergency│12│0.97006 │
> > ├─────────┼──┼────────┤
> > │Traffic  │5 │1       │
> > └─────────┴──┴────────┘
> >
> > I suppose I could sort the data before providing it to the function if that
> > helps.
> >
> > You are right that dupes cause problems with using i. to locate the record.
> > Thank you for pointing that out. I don't know how to fix it yet and would
> > welcome any suggestions.
> >
> > causes =: > ' ' cut each LF cut(0 : 0)
> > Public 47
> > Weather 28
> > Oversight 18
> > Emergency 12
> > Traffic 5
> > ChildCare 57
> > XYZ 5
> > )
> >
> > (/: pct) { (causes,.>each pct)
> > ┌─────────┬──┬────────┐
> > │ChildCare│57│0.331395│
> > ├─────────┼──┼────────┤
> > │Public   │47│0.604651│
> > ├─────────┼──┼────────┤
> > │Weather  │28│0.767442│
> > ├─────────┼──┼────────┤
> > │Oversight│18│0.872093│
> > ├─────────┼──┼────────┤
> > │Emergency│12│0.94186 │
> > ├─────────┼──┼────────┤
> > │Traffic  │5 │0.97093 │
> > ├─────────┼──┼────────┤
> > │XYZ      │5 │0.97093 │
> > └─────────┴──┴────────┘
> >
> >
> > On Tue, Jun 3, 2014 at 9:10 AM, Roger Hui <[email protected]> wrote:
> >
> > >    y
> > > 1 100 5 10
> > >    y,.runsumpct y
> > >   1        1
> > > 100 0.862069
> > >   5 0.991379
> > >  10 0.948276
> > >
> > > Please provide an English description of the problem being solved. In
> > > particular, I don't understand how the result is "in the original order".
> > > In addition, won't you have a problem if the argument has duplicate
> > > entries?
> > >
> > >    t,.runsumpct t=: y,1 1 1
> > >   1  0.97479
> > > 100 0.840336
> > >   5 0.966387
> > >  10  0.92437
> > >   1  0.97479
> > >   1  0.97479
> > >   1  0.97479
> > >
> > >
> > > On Tue, Jun 3, 2014 at 4:32 AM, Joe Bogner <[email protected]> wrote:
> > >
> > > > Is there a cleaner way to write this or is this a reasonable
> > > > implementation?
> > > >
> > > > runsumpct =: 3 : 0
> > > > sorted=: \:~ y
> > > > vals =: (+/\ % +/) sorted
> > > > (sorted i. y) { vals
> > > > )
> > > >
> > > >    runsumpct 1 100 5 10
> > > > 1 0.862069 0.991379 0.948276
> > > >
> > > > I'm interested if there's a cleaner approach to sorting, operating, and
> > > > then returning the result in the original order.

--
Devon McCormick, CFA
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
