Ric,

Here's a response to an old thread, which I took from my "things to think about" collection:
   group2=:(+/\@(* #) I. i.@#@]) </. ]
   0.35 0.3 0.3 group2 i.14
┌─────────┬─────────┬───────────┐
│0 1 2 3 4│5 6 7 8 9│10 11 12 13│
└─────────┴─────────┴───────────┘
   f=: 13 : '(x (] I.~ [: +/\ [ * [: # ]) y)</. y'
   0.35 0.3 0.3 f i.14
┌─────────┬─────────┬───────────┐
│0 1 2 3 4│5 6 7 8 9│10 11 12 13│
└─────────┴─────────┴───────────┘
   group2
(+/\@(* #) I. i.@#@]) </. ]
   f
] </.~ ] I.~ [: +/\ [ * [: # ]

I was surprised at how neat f turned out when I wrote an explicit definition of my version of your way to group the numbers.

Linda

-----Original Message-----
From: Programming [mailto:[email protected]] On Behalf Of Ric Sherlock
Sent: Saturday, October 21, 2017 3:21 PM
To: Programming JForum <[email protected]>
Subject: Re: [Jprogramming] Splitting an Array into several arrays

Jon,
The following assigns each of the data points to a group, then boxes those groups using key.

   assignGrp=: +/\@(* #) I. i.@#@]
   group=: assignGrp </. ]
   0.35 0.35 0.3 group i.14
┌─────────┬─────────┬───────────┐
│0 1 2 3 4│5 6 7 8 9│10 11 12 13│
└─────────┴─────────┴───────────┘

On 21/10/2017 02:14, "'Jon Hough' via Programming" <[email protected]> wrote:

> Hi,
>
> What I am really after is a verb that splits by percentage. To give a
> concrete use case:
> I have a dataset, which I wish to split into training set, validation
> set, and testing set.
>
> I want 35% of the datapoints to go in the training set, 35% go in the
> validation set, the rest go in the test set. (Just example numbers).
>
> No need to worry about shuffling, randomizing etc, I am assuming the
> data is sufficiently random.
> As Raul said, I can simplify slightly by just using the size of the
> dataset as the right argument.
>
> --------------------------------------------
> On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
>
> Subject: Re: [Jprogramming] Splitting an Array into several arrays
> To: [email protected]
> Date: Friday, October 20, 2017, 10:06 PM
>
> Hi all!
>
> A splitSubs with CutN could possibly look like this:
>
>    splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
>
>    (i.0) splitSubsE i.0
>
>    (,55) splitSubsE ,5
> ┌─┐
> │5│
> └─┘
>    split splitSubsE i.0
> ┌┬┬┐
> ││││
> └┴┴┘
>    split splitSubsE i.1
> ┌┬┬─┐
> │││0│
> └┴┴─┘
>    split splitSubsE i.2
> ┌─┬─┬┐
> │0│1││
> └─┴─┴┘
>    split splitSubsE i.3
> ┌─┬─┬─┐
> │0│1│2│
> └─┴─┴─┘
>    split splitSubsE i.4
> ┌─┬─┬───┐
> │0│1│2 3│
> └─┴─┴───┘
>
> Cheers
>
> Erling Hellenäs
>
>
> On 2017-10-20 at 14:11, Erling Hellenäs wrote:
> > Hi all!
> >
> > I looked for a version of Cut which takes the number of items in each
> > group as left argument. I didn't find one. I think it is what you most
> > often need, because it allows groups with zero length content.
> >
> > I made CutN as an illustration:
> >
> >    CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
> >
> >    (i.0) CutN i.0
> >
> >    (,0) CutN i.0
> > ┌┐
> > ││
> > └┘
> >    (,1) CutN 10+i.1
> > ┌──┐
> > │10│
> > └──┘
> >    0 2 CutN 10+i.2
> > ┌┬─────┐
> > ││10 11│
> > └┴─────┘
> >    2 5 0 CutN 10+i.7
> > ┌─────┬──────────────┬┐
> > │10 11│12 13 14 15 16││
> > └─────┴──────────────┴┘
> >    0 7 0 CutN 10+i.7
> > ┌┬────────────────────┬┐
> > ││10 11 12 13 14 15 16││
> > └┴────────────────────┴┘
> >
> > Cheers,
> >
> > Erling Hellenäs
> >
> >
> > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
> >> The problem:
> >> Let X be an array.
> >>    X=: i. 50 NB. example
> >>
> >> Let 'split' be the percentages that each subarray takes from X,
> >> sequentially
> >> e.g
> >>    split =: 0.35 0.35 0.3 NB. first array takes 35%, second subarray takes 35%, third takes 30%
> >> So in the end
> >>
> >> My solution
> >>
> >>    splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
> >>
> >>    split splitSubs X
> >>
> >>
> >> This gives 3 boxed arrays. Each array holds the indices to take from X.
> >>
> >> There is a slight problem in that the first and second subarrays
> >> have different length, due to rounding error. I am not too
> >> bothered about that since, depending on the size of X and the
> >> percentages, this is unavoidable.
> >>
> >> Any more succinct, nicer solutions?
> >>
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
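
For anyone who wants to check the grouping outside J, the idea behind group and group2 above (scale the fractions by the item count, take running sums, look up each position among those boundaries, then collect items by group number) can be roughly sketched in Python. This is only an illustration and not code from the thread: the name group_by_fraction is made up, and bisect_left is assumed to be a fair stand-in for I. when the boundaries are ascending.

```python
from bisect import bisect_left
from itertools import accumulate


def group_by_fraction(fractions, items):
    """Split items into consecutive groups sized by the given fractions.

    Hypothetical helper, loosely mirroring  (+/\\@(* #) I. i.@#@]) </. ]
    """
    n = len(items)
    # Running sums of fraction * count, e.g. roughly 4.9 9.8 14 for 0.35 0.35 0.3 and n = 14.
    bounds = list(accumulate(f * n for f in fractions))
    groups = {}
    for position, item in enumerate(items):
        # Index of the first boundary that is >= position (assumed analogue of I.).
        key = bisect_left(bounds, position)
        # Collect by key in order of first appearance, as key (/.) does.
        groups.setdefault(key, []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    print(group_by_fraction([0.35, 0.35, 0.3], list(range(14))))
```

As with key in J, a group that receives no positions does not appear in the result at all, which is the limitation Erling's CutN works around by taking explicit counts.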
