This looks pretty similar to what others have proposed:
X=: i. 50 NB. example
split=: 0.35 0.35 0.3 NB. 35% in first, 35% in second, 30% in third
(3 : 'assert. 1=+/y') split
(3 : 'assert. 1=+/y') 0.5 0.4
|assertion failure
| 1 =+/y
split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) X
+---------...--------------+-----------...--------+-----...--------------+
|0 1 2 3 4...13 14 15 16 17|18 19 20 21...33 34 35|36 37...45 46 47 48 49|
+---------...--------------+-----------...--------+-----...--------------+
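
For readers who find the tacit phrase hard to scan, here is a minimal sketch of the same steps as an explicit verb. The names splitExplicit, n, frac, cum and key are illustrative only and do not appear elsewhere in this thread; the intent is to mirror the phrase above, not to replace it.

```j
NB. Explicit restatement of the tacit phrase above (a sketch).
NB. splitExplicit, n, frac, cum and key are hypothetical names.
splitExplicit=: 4 : 0
  n=. # y                 NB. number of items to partition
  frac=. (i. n) % n       NB. position of each item as a fraction: 0, 1%n, ... (n-1)%n
  cum=. +/\ x             NB. cumulative boundaries, e.g. 0.35 0.7 1
  key=. +/ cum </ frac    NB. per item: how many boundaries lie strictly below it
  key </. y               NB. box together the items that share a key
)

X=: i. 50
split=: 0.35 0.35 0.3
#&> split splitExplicit X     NB. group sizes: 18 18 14
# split splitExplicit i. 1    NB. 1 (only one box comes back)
```

The last line shows one consequence of grouping with Key: a share that receives no items yields no box at all, whereas the CutN-based version quoted below returns an empty box for it.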
NB. Simpler to verify at a glance:
split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) i.100
+---------...-----------+--------------...-----------+--------...-----------+
|0 1 2 3 4...32 33 34 35|36 37 38 39 40...67 68 69 70|71 72 73...96 97 98 99|
+---------...-----------+--------------...-----------+--------...-----------+
ptn=. split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) i.100
({.,{:)&.>ptn
+----+-----+-----+
|0 35|36 70|71 99|
+----+-----+-----+
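
As a quick check on the endpoints above, the group sizes can be counted directly. This is a sketch that gives the phrase a name, part, purely for convenience (the name is an assumption, not something used earlier in this thread).

```j
NB. Count the items per group for the i.100 case (a sketch).
NB. 'part' is a hypothetical name for the tacit phrase shown above.
split=: 0.35 0.35 0.3
part=: ] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]
ptn=: split part i. 100
#&> ptn              NB. 36 35 29
(% +/) #&> ptn       NB. realized shares: 0.36 0.35 0.29
```

The first group ends up with 36 items rather than 35 because the comparison is a strict less-than: item 35 sits exactly on the 0.35 boundary, so it stays in the first group. The same happens at 0.7 with item 70, which is why the last group gets 29.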
On Fri, Oct 20, 2017 at 9:14 AM, 'Jon Hough' via Programming <[email protected]> wrote:
> Hi,
>
> What I am really after is a verb that splits by percentage. To give a
> concrete use case:
> I have a dataset, which I wish to split into training set, validation set,
> and testing set.
>
> I want 35% of the datapoints to go in the training set,
> 35% go in the validation set,
> the rest go in the test set. (Just example numbers).
>
>
> No need to worry about shuffling, randomizing etc, I am assuming the data
> is sufficiently random.
> As Raul said, I can simplify slightly by just using the size of the
> dataset as the right argument.
>
> --------------------------------------------
> On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
>
> Subject: Re: [Jprogramming] Splitting an Array into several arrays
> To: [email protected]
> Date: Friday, October 20, 2017, 10:06 PM
>
> Hi all !
>
> A splitSubs with CutN could possibly look like this:
>
> splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
>
> (i.0) splitSubsE i.0
>
> (,55) splitSubsE ,5
> ┌─┐
> │5│
> └─┘
> split splitSubsE i.0
> ┌┬┬┐
> ││││
> └┴┴┘
> split splitSubsE i.1
> ┌┬┬─┐
> │││0│
> └┴┴─┘
> split splitSubsE i.2
> ┌─┬─┬┐
> │0│1││
> └─┴─┴┘
> split splitSubsE i.3
> ┌─┬─┬─┐
> │0│1│2│
> └─┴─┴─┘
>
> split splitSubsE i.4
> ┌─┬─┬───┐
> │0│1│2 3│
> └─┴─┴───┘
>
> Cheers
>
> Erling Hellenäs
>
>
> On 2017-10-20 at 14:11, Erling Hellenäs wrote:
> > Hi all!
> >
> > I looked for a version of Cut which takes the number of items in each
> > group as left argument. I didn't find one. I think it is what you most
> > often need, because it allows groups with zero length content.
> >
> > I made CutN as an illustration:
> >
> > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
> >
> > (i.0) CutN i.0
> >
> > (,0) CutN i.0
> > ┌┐
> > ││
> > └┘
> > (,1) CutN 10+i.1
> > ┌──┐
> > │10│
> > └──┘
> > 0 2 CutN 10+i.2
> > ┌┬─────┐
> > ││10 11│
> > └┴─────┘
> > 2 5 0 CutN 10+i.7
> > ┌─────┬──────────────┬┐
> > │10 11│12 13 14 15 16││
> > └─────┴──────────────┴┘
> > 0 7 0 CutN 10+i.7
> > ┌┬────────────────────┬┐
> > ││10 11 12 13 14 15 16││
> > └┴────────────────────┴┘
> >
> > Cheers,
> >
> > Erling Hellenäs
> >
> >
> >
> > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
> >> The problem:
> >> Let X be an array.
> >> X=: i. 50 NB. example
> >>
> >> Let 'split' be the percentages that each subarray takes from X,
> >> sequentially
> >> e.g
> >> split =: 0.35 0.35 0.3 NB. first array takes 35% , second sub array takes 35%, third takes 30%
> >> So in the end
> >>
> >> My solution
> >>
> >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
> >>
> >> split splitSubs X
> >>
> >>
> >> This gives 3 boxed arrays. Each array holds the indices to take from X.
> >>
> >>
> >> There is a slight problem in that the first and second subarrays
> >> have different length, due to rounding error. I am not too bothered
> >> about that since, depending on the size of X and the percentages, this
> >> is unavoidable.
> >>
> >> Any more succinct, nicer solutions?
> >>
> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >
> >
--
Devon McCormick, CFA
Quantitative Consultant