Hi,

What I am really after is a verb that splits by percentage. To give a concrete use case: I have a dataset that I wish to split into a training set, a validation set, and a test set.
I want 35% of the data points to go in the training set, 35% in the validation set, and the rest in the test set (just example numbers). There is no need to worry about shuffling or randomizing; I am assuming the data is already sufficiently random. As Raul said, I can simplify slightly by just using the size of the dataset as the right argument.

--------------------------------------------
On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:

Subject: Re: [Jprogramming] Splitting an Array into several arrays
To: [email protected]
Date: Friday, October 20, 2017, 10:06 PM

Hi all !

A splitSubs with CutN could possibly look like this:

splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]

   (i.0) splitSubsE i.0

   (,55) splitSubsE ,5
┌─┐
│5│
└─┘
   split splitSubsE i.0
┌┬┬┐
││││
└┴┴┘
   split splitSubsE i.1
┌┬┬─┐
│││0│
└┴┴─┘
   split splitSubsE i.2
┌─┬─┬┐
│0│1││
└─┴─┴┘
   split splitSubsE i.3
┌─┬─┬─┐
│0│1│2│
└─┴─┴─┘
   split splitSubsE i.4
┌─┬─┬───┐
│0│1│2 3│
└─┴─┴───┘

Cheers

Erling Hellenäs

On 2017-10-20 at 14:11, Erling Hellenäs wrote:
> Hi all!
>
> I looked for a version of Cut which takes the number of items in each
> group as left argument. I didn't find one. I think it is what you most
> often need, because it allows groups with zero length content.
>
> I made CutN as an illustration:
>
> CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
>
>    (i.0) CutN i.0
>
>    (,0) CutN i.0
> ┌┐
> ││
> └┘
>    (,1) CutN 10+i.1
> ┌──┐
> │10│
> └──┘
>    0 2 CutN 10+i.2
> ┌┬─────┐
> ││10 11│
> └┴─────┘
>    2 5 0 CutN 10+i.7
> ┌─────┬──────────────┬┐
> │10 11│12 13 14 15 16││
> └─────┴──────────────┴┘
>    0 7 0 CutN 10+i.7
> ┌┬────────────────────┬┐
> ││10 11 12 13 14 15 16││
> └┴────────────────────┴┘
>
> Cheers,
>
> Erling Hellenäs
>
>
> On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
>> The problem:
>> Let X be an array.
>> X=: i. 50 NB. example
>>
>> Let 'split' be the percentages that each subarray takes from X,
>> sequentially
>> e.g
>> split =: 0.35 0.35 0.3 NB. first array takes 35% , second sub array takes 35%, third takes 30%
>> So in the end
>>
>> My solution
>>
>> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
>>
>> split splitSubs X
>>
>>
>> This gives 3 boxed arrays. Each array holds the indices to take from X.
>>
>> There is a slight problem in that the first and second subarrays
>> have different length, due to rounding error. I am not too bothered
>> about that since, depending on the size of X and the percentages,
>> this is unavoidable.
>>
>> Any more succinct, nicer solutions?

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
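
For comparison with the tacit definitions quoted in this thread, the percentage split requested at the top can also be spelled out as an explicit verb. What follows is only a sketch, not anything posted by the participants: the name splitPct is made up for illustration, the left argument is assumed to hold one non-negative fraction per group with a total of at most 1, each cumulative boundary is rounded to the nearest integer, and the last group simply receives whatever is left, so the final fraction only serves as a placeholder.

```j
NB. splitPct is a hypothetical name; x = fractions (one per group), y = data
splitPct =: 4 : 0
  n      =. # y                              NB. number of items in y
  ends   =. (<. 0.5 + n * +/\ }: x) , n      NB. rounded cumulative end positions; last group ends at n
  starts =. 0 , }: ends                      NB. each group starts where the previous one ended
  starts <@:({&y)@:(+ i.)"0 ends - starts    NB. box the items at start + i. size, group by group
)
```

If it behaves as intended, 0.5 0.25 0.25 splitPct i. 8 should return three boxes holding 0 1 2 3, then 4 5, then 6 7. Because the group sizes are differences of rounded cumulative positions, they always add up to the number of items, and a group of size zero comes out as an empty box rather than being dropped.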
