Hi all!
Do you want a trick to make the percentage split as good as possible?
Cheers,
Erling Hellenäs
On 2017-10-20 at 15:14, 'Jon Hough' via Programming wrote:
Hi,
What I am really after is a verb that splits by percentage. To give a concrete
use case:
I have a dataset, which I wish to split into training set, validation set, and
testing set.
I want 35% of the datapoints to go in the training set,
35% go in the validation set,
the rest go in the test set. (Just example numbers).
No need to worry about shuffling, randomizing etc, I am assuming the data is
sufficiently random.
As Raul said, I can simplify slightly by just using the size of the dataset as
the right argument.
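
One possible way to do this, as a rough sketch and not something taken from the thread: round the running totals of the percentages instead of each percentage on its own, so that rounding error cannot accumulate, and let the last group end at the size of the dataset. The names pctCounts, cutByCounts and pctSplit are made up for illustration. pctCounts assumes at least one fraction is given, and cutByCounts assumes the group sizes sum to the number of items.

```j
NB. Hypothetical helper: group sizes for the items of y, given fractions x.
NB. The running totals are rounded, and the last group always ends at #y.
pctCounts =: 4 : 0
  bounds =. (<. 0.5 + (#y) * }: +/\ x) , #y   NB. rounded end position of each group
  bounds - }: 0 , bounds                       NB. differences of end positions = sizes
)

NB. Hypothetical helper: box consecutive groups of y whose sizes are x.
NB. Zero-length groups are kept as empty boxes.
cutByCounts =: 4 : 0
  starts =. }: 0 , +/\ x                       NB. start offset of each group
  starts <@({&y)@:(+ i.)"0 x                   NB. box the items at start + i. size
)

NB. Fork: (x pctCounts y) cutByCounts y
pctSplit =: pctCounts cutByCounts ]

   0.35 0.35 0.3 pctCounts i.7
2 3 2
```

With 7 items, rounding each share separately and giving the rest to the last group yields sizes 2 2 3, while rounding the running totals yields 2 3 2. The phrase 0.35 0.35 0.3 pctSplit i.7 then boxes the items as 0 1, then 2 3 4, then 5 6.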
--------------------------------------------
On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
Subject: Re: [Jprogramming] Splitting an Array into several arrays
To: [email protected]
Date: Friday, October 20, 2017, 10:06 PM
Hi all!
A splitSubs with CutN could possibly look like this:
   splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
   (i.0) splitSubsE i.0
   (,55) splitSubsE ,5
┌─┐
│5│
└─┘
   split splitSubsE i.0
┌┬┬┐
││││
└┴┴┘
   split splitSubsE i.1
┌┬┬─┐
│││0│
└┴┴─┘
   split splitSubsE i.2
┌─┬─┬┐
│0│1││
└─┴─┴┘
   split splitSubsE i.3
┌─┬─┬─┐
│0│1│2│
└─┴─┴─┘
   split splitSubsE i.4
┌─┬─┬───┐
│0│1│2 3│
└─┴─┴───┘
Cheers
Erling Hellenäs
On 2017-10-20 at 14:11, Erling Hellenäs wrote:
> Hi all!
>
> I looked for a version of Cut which takes the number of items in each
> group as left argument. I didn't find one. I think it is what you most
> often need, because it allows groups with zero length content.
>
> I made CutN as an illustration:
>
>    CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
>
>    (i.0) CutN i.0
>
>    (,0) CutN i.0
> ┌┐
> ││
> └┘
>    (,1) CutN 10+i.1
> ┌──┐
> │10│
> └──┘
>    0 2 CutN 10+i.2
> ┌┬─────┐
> ││10 11│
> └┴─────┘
>    2 5 0 CutN 10+i.7
> ┌─────┬──────────────┬┐
> │10 11│12 13 14 15 16││
> └─────┴──────────────┴┘
>    0 7 0 CutN 10+i.7
> ┌┬────────────────────┬┐
> ││10 11 12 13 14 15 16││
> └┴────────────────────┴┘
>
> Cheers,
>
> Erling Hellenäs
>
> On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
>> The problem:
>> Let X be an array.
>>    X=: i. 50 NB. example
>>
>> Let 'split' be the percentages that each subarray takes from X,
>> sequentially
>> e.g
>>    split =: 0.35 0.35 0.3 NB. first array takes 35%, second sub array takes 35%, third takes 30%
>> So in the end
>>
>> My solution
>>
>>    splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
>>
>>    split splitSubs X
>>
>>
>> This gives 3 boxed arrays. Each array holds the indices to take from X.
>>
>> There is a slight problem in that the first and second subarrays
>> have different length, due to rounding error. I am not too bothered
>> about that since, depending on the size of X and the percentages,
>> this is unavoidable.
>>
>> Any more succinct, nicer solutions?
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm