This looks pretty similar to what others have proposed:

   X=: i. 50             NB.  example
   split=: 0.35 0.35 0.3 NB. 35% in first, 35% in second, 30% in third
   (3 : 'assert. 1=+/y') split
   (3 : 'assert. 1=+/y') 0.5 0.4
|assertion failure
|   1    =+/y

   split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) X
+---------...--------------+-----------...--------+-----...--------------+
|0 1 2 3 4...13 14 15 16 17|18 19 20 21...33 34 35|36 37...45 46 47 48 49|
+---------...--------------+-----------...--------+-----...--------------+
   NB. Simpler to verify at a glance:
   split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) i.100
+---------...-----------+--------------...-----------+--------...-----------+
|0 1 2 3 4...32 33 34 35|36 37 38 39 40...67 68 69 70|71 72 73...96 97 98 99|
+---------...-----------+--------------...-----------+--------...-----------+
   ptn=. split (] </.~ [: +/ ([: +/\ [) </ [: ([: }: ] %~ [: i.@:>: ]) [: # ]) i.100
   ({.,{:)&.>ptn
+----+-----+-----+
|0 35|36 70|71 99|
+----+-----+-----+
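
If the tacit train above is hard to read, the same steps can be spelled out
as an explicit verb. This is only a sketch: the name splitByFrac and the
local names cum, pos and grp are mine and appear nowhere else in this
thread, and it assumes the fractions on the left sum to 1.

```j
NB. splitByFrac is a hypothetical name; x is the fractions, y is the data
splitByFrac=: 4 : 0
cum=. +/\ x              NB. running totals of the fractions
pos=. (i. # y) % # y     NB. relative position of each item: 0, 1%n, 2%n, ...
grp=. +/ cum </ pos      NB. group number = count of running totals below the position
grp </. y                NB. box together the items that share a group number
)
```

With split as defined above, split splitByFrac i.100 should give the same
three boxes as the tacit version. Because the comparison is a strict "less
than", an item sitting exactly on a boundary stays in the earlier group,
which is why the first box ends at 35 rather than 34. Also note that Key
only creates boxes for group numbers that actually occur, so empty groups
are dropped rather than shown as empty boxes.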


On Fri, Oct 20, 2017 at 9:14 AM, 'Jon Hough' via Programming <
[email protected]> wrote:

> Hi,
>
> What I am really after is a verb that splits by percentage. To give a
> concrete use case:
> I have a dataset, which I wish to split into training set, validation set,
> and testing set.
>
> I want 35% of the datapoints to go in the training set,
> 35% go in the validation set,
> the rest go in the test set. (Just example numbers).
>
>
> No need to worry about shuffling, randomizing etc, I am assuming the data
> is sufficiently random.
> As Raul said, I can simplify slightly by just using the size of the
> dataset as the right argument.
>
> --------------------------------------------
> On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
>
>  Subject: Re: [Jprogramming] Splitting an Array into several arrays
>  To: [email protected]
>  Date: Friday, October 20, 2017, 10:06 PM
>
>  Hi all !
>
>  A splitSubs with CutN could possibly look like
>  this:
>
>  splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
>
>      (i.0) splitSubsE i.0
>
>      (,55) splitSubsE ,5
>  ┌─┐
>  │5│
>  └─┘
>      split splitSubsE i.0
>  ┌┬┬┐
>  ││││
>  └┴┴┘
>      split splitSubsE i.1
>  ┌┬┬─┐
>  │││0│
>  └┴┴─┘
>      split splitSubsE i.2
>  ┌─┬─┬┐
>  │0│1││
>  └─┴─┴┘
>      split splitSubsE i.3
>  ┌─┬─┬─┐
>  │0│1│2│
>  └─┴─┴─┘
>
>      split splitSubsE i.4
>  ┌─┬─┬───┐
>  │0│1│2 3│
>  └─┴─┴───┘
>
>  Cheers
>
>  Erling Hellenäs
>
>
>  On 2017-10-20 at 14:11, Erling Hellenäs wrote:
>  > Hi all!
>  >
>  > I looked for a version of Cut which takes the number of items in each
>  > group as left argument. I didn't find one. I think it is what you most
>  > often need, because it allows groups with zero length content.
>  >
>  > I made CutN as an illustration:
>  >
>  > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
>  >
>  >    (i.0) CutN i.0
>  >
>  >    (,0) CutN i.0
>  > ┌┐
>  > ││
>  > └┘
>  >    (,1) CutN 10+i.1
>  > ┌──┐
>  > │10│
>  > └──┘
>  >    0 2 CutN 10+i.2
>  > ┌┬─────┐
>  > ││10 11│
>  > └┴─────┘
>  >    2 5 0 CutN 10+i.7
>  > ┌─────┬──────────────┬┐
>  > │10 11│12 13 14 15 16││
>  > └─────┴──────────────┴┘
>  >    0 7 0 CutN 10+i.7
>  > ┌┬────────────────────┬┐
>  > ││10 11 12 13 14 15 16││
>  > └┴────────────────────┴┘
>  >
>  > Cheers,
>  >
>  > Erling Hellenäs
>  >
>  >
>  >
>  > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
>  >> The problem:
>  >> Let X be an array.
>  >> X=: i. 50 NB.  example
>  >>
>  >> Let 'split' be the percentages that each subarray takes from X,
>  >> sequentially
>  >> e.g
>  >> split =: 0.35 0.35 0.3 NB. first array takes 35%, second sub array takes 35%, third takes 30%
>  >> So in the end
>  >>
>  >> My solution
>  >>
>  >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
>  >>
>  >> split splitSubs X
>  >>
>  >>
>  >> This gives 3 boxed arrays. Each array holds the indices to take from X.
>  >>
>  >> There is a slight problem in that the first and second subarrays
>  >> have different length, due to rounding error. I am not too bothered
>  >> about that since, depending on the size of X and the percentages, this
>  >> is unavoidable.
>  >>
>  >> Any more succinct, nicer solutions?
>  >>
>  ----------------------------------------------------------------------
>  >> For information about J forums see http://www.jsoftware.com/forums.htm
>



-- 

Devon McCormick, CFA

Quantitative Consultant