Hi,

What I am really after is a verb that splits by percentage. To give a concrete 
use case:
I have a dataset, which I wish to split into a training set, a validation set, 
and a testing set.

I want 35% of the datapoints to go in the training set,
35% go in the validation set,
the rest go in the test set. (Just example numbers).


No need to worry about shuffling, randomizing, etc.; I am assuming the data is 
already sufficiently random.
As Raul said, I can simplify slightly by just using the size of the dataset as 
the right argument.
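
To make the request concrete, here is a minimal sketch of the kind of verb I 
mean. The name splitPct is hypothetical. It assumes the left argument is a list 
of fractions whose last entry just means "the rest", and it rounds the 
cumulative boundaries rather than each group size, so the sizes can differ from 
what per-group rounding would give. Unlike the simplification mentioned above, 
it takes the data itself as the right argument.

```j
NB. splitPct (hypothetical name): x is a list of fractions, y is the data.
NB. Assumption: the last fraction is not used directly; the last group
NB. simply takes whatever items remain.
splitPct =: 4 : 0
  n      =. # y
  ends   =. n <. <. 0.5 + n * +/\ }: x    NB. rounded cumulative boundaries, capped at n
  starts =. 0 , ends                      NB. index where each group begins
  sizes  =. (ends , n) - starts           NB. items per group (may be 0)
  (starts +&.> i.&.> sizes) {&.> < y      NB. box the items of each group
)

X     =: i. 50
split =: 0.35 0.35 0.3
# &> split splitPct X    NB. group sizes: 18 17 15
```

Because the boundaries are cumulative and capped at the number of items, no 
group can end up with a negative length, and razing the result gives back the 
original data.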

--------------------------------------------
On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:

 Subject: Re: [Jprogramming] Splitting an Array into several arrays
 To: [email protected]
 Date: Friday, October 20, 2017, 10:06 PM
 
 Hi all!
 
 A splitSubs with CutN could possibly look like this:
 
 splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
 
     (i.0) splitSubsE i.0
 
     (,55) splitSubsE ,5
 ┌─┐
 │5│
 └─┘
     split splitSubsE i.0
 ┌┬┬┐
 ││││
 └┴┴┘
     split splitSubsE i.1
 ┌┬┬─┐
 │││0│
 └┴┴─┘
     split splitSubsE i.2
 ┌─┬─┬┐
 │0│1││
 └─┴─┴┘
     split splitSubsE i.3
 ┌─┬─┬─┐
 │0│1│2│
 └─┴─┴─┘
     split splitSubsE i.4
 ┌─┬─┬───┐
 │0│1│2 3│
 └─┴─┴───┘
 
 Cheers
 
 Erling Hellenäs
 
 
 On 2017-10-20 at 14:11, Erling Hellenäs wrote:
 > Hi all!
 >
 > I looked for a version of Cut which takes the number of items in each 
 > group as left argument. I didn't find one. I think it is what you most 
 > often need, because it allows groups with zero length content.
 >
 > I made CutN as an illustration:
 >
 > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
 >
 >    (i.0) CutN i.0
 >
 >    (,0) CutN i.0
 > ┌┐
 > ││
 > └┘
 >    (,1) CutN 10+i.1
 > ┌──┐
 > │10│
 > └──┘
 >    0 2 CutN 10+i.2
 > ┌┬─────┐
 > ││10 11│
 > └┴─────┘
 >    2 5 0 CutN 10+i.7
 > ┌─────┬──────────────┬┐
 > │10 11│12 13 14 15 16││
 > └─────┴──────────────┴┘
 >    0 7 0 CutN 10+i.7
 > ┌┬────────────────────┬┐
 > ││10 11 12 13 14 15 16││
 > └┴────────────────────┴┘
 >
 > Cheers,
 >
 > Erling Hellenäs
 >
 >
 >
 On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
 >> The problem:
 >> Let X be an array.
 >> X=: i. 50 NB.  example
 >>
 >> Let 'split' be the percentages that each subarray takes from X, 
 >> sequentially
 >> e.g
 >> split =: 0.35 0.35 0.3 NB. first array takes 35%, second subarray takes 35%, third takes 30%
 >> So in the end
 >>
 >> My solution
 >>
 >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
 >>
 >> split splitSubs X
 >>
 >>
 >> This gives 3 boxed arrays. Each array holds the indices to take from X.
 >>
 >> There is a slight problem in that the first and second subarrays have
 >> different length, due to rounding error. I am not too bothered about
 >> that since, depending on the size of X and the percentages, this is
 >> unavoidable.
 >>
 >> Any more succinct, nicer solutions?
 >> ----------------------------------------------------------------------
 >> For information about J forums see http://www.jsoftware.com/forums.htm
