Hi all!
Here is a new splitSubsE, built from Distribute and CutN.
Project:
CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
Distribute=: 4 : 'rr+(/:s){($s=.\:r-rr){.(d=.y-+/rr=.<.!.0 r=.x*y)$1'
splitSubsE=: ([ Distribute #@:]) CutN ]
split =: 0.35 0.35 0.3
(i.0) splitSubsE i.0
split splitSubsE i.0
split splitSubsE i.1
split splitSubsE i.2
split splitSubsE i.3
split splitSubsE i.4
Output:
(i.0) splitSubsE i.0
split splitSubsE i.0
┌┬┬┐
││││
└┴┴┘
split splitSubsE i.1
┌─┬┬┐
│0│││
└─┴┴┘
split splitSubsE i.2
┌─┬─┬┐
│0│1││
└─┴─┴┘
split splitSubsE i.3
┌─┬─┬─┐
│0│1│2│
└─┴─┴─┘
split splitSubsE i.4
┌───┬─┬─┐
│0 1│2│3│
└───┴─┴─┘
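The one-line Distribute above appears to implement largest-remainder rounding: each group first gets the floor of its share x*y, and the d items still unassigned go, one each, to the groups with the largest fractional parts. A step-by-step sketch of the same idea follows. The name distributeLR and its local names are illustrative only and do not come from the thread.

```j
NB. Sketch only: largest-remainder rounding, spelled out step by step.
NB. x: fractions (assumed to sum to 1), y: total number of items.
NB. distributeLR is a hypothetical name, not part of the original post.
distributeLR =: 4 : 0
  r  =. x * y                        NB. ideal, possibly fractional, group sizes
  rr =. <.!.0 r                      NB. exact (non-tolerant) floor of each size
  d  =. y - +/ rr                    NB. number of items not yet assigned
  rr + (i. # x) e. d {. \: r - rr    NB. add 1 to the d largest remainders
)
```

With split =: 0.35 0.35 0.3, the expression split distributeLR 4 should give 2 1 1, which agrees with the group sizes in the i.4 output above.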
Cheers,
Erling Hellenäs
On 2017-10-21 14:09, 'Jon Hough' via Programming wrote:
Hi Skip,
I am not doing machine learning in J per se, only toy programs. But I am
doing some data prep for some kind of data analysis, which will probably be
done in Python. So I am just looking to slice and dice the data in J.
I think J has an ML lab, which you might find interesting. Doing any kind of
NLP means you should use a battle-tested library, and the only one I am aware
of is NLTK http://www.nltk.org/ .
Thanks,
Jon
--------------------------------------------
On Sat, 10/21/17, 'Skip Cave' via Programming <[email protected]> wrote:
Subject: Re: [Jprogramming] Splitting an Array into several arrays
To: "[email protected]" <[email protected]>
Date: Saturday, October 21, 2017, 7:03 AM
Jon,
"training set", "validation set", "test set". Sounds like you are working
on a machine learning project to me.
Are there any demos, labs, or articles on machine learning techniques in J?
Things like gradient descent, word2vec, paragraph vectors, etc? I have a big
project that calls for these kinds of NLP ML tools, and I hate that I will
have to learn Python, or worse yet JavaScript, in order to do the project.
Skip
Skip Cave
Cave Consulting LLC
On Fri, Oct 20, 2017 at 8:14 AM, 'Jon Hough' via Programming <[email protected]> wrote:
> Hi,
>
> What I am really after is a verb that splits by percentage. To give a
> concrete use case:
> I have a dataset, which I wish to split into training set, validation set,
> and testing set.
>
> I want 35% of the datapoints to go in the training set,
> 35% go in the validation set,
> the rest go in the test set. (Just example numbers).
>
> No need to worry about shuffling, randomizing etc, I am assuming the data
> is sufficiently random.
> As Raul said, I can simplify slightly by just using the size of the
> dataset as the right argument.
>
> --------------------------------------------
> On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
>
> Subject: Re: [Jprogramming] Splitting an Array into several arrays
> To: [email protected]
> Date: Friday, October 20, 2017, 10:06 PM
>
> Hi all!
>
> A splitSubs with CutN could possibly look like this:
>
> splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
>
> (i.0) splitSubsE i.0
>
> (,55) splitSubsE ,5
> ┌─┐
> │5│
> └─┘
> split splitSubsE i.0
> ┌┬┬┐
> ││││
> └┴┴┘
> split splitSubsE i.1
> ┌┬┬─┐
> │││0│
> └┴┴─┘
> split splitSubsE i.2
> ┌─┬─┬┐
> │0│1││
> └─┴─┴┘
> split splitSubsE i.3
> ┌─┬─┬─┐
> │0│1│2│
> └─┴─┴─┘
> split splitSubsE i.4
> ┌─┬─┬───┐
> │0│1│2 3│
> └─┴─┴───┘
>
> Cheers
>
> Erling Hellenäs
>
>
> On 2017-10-20 at 14:11, Erling Hellenäs wrote:
> > Hi all!
> >
> > I looked for a version of Cut which takes the number of items in each
> > group as left argument. I didn't find one. I think it is what you most
> > often need, because it allows groups with zero length content.
> >
> > I made CutN as an illustration:
> >
> > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
> >
> > (i.0) CutN i.0
> >
> > (,0) CutN i.0
> > ┌┐
> > ││
> > └┘
> > (,1) CutN 10+i.1
> > ┌──┐
> > │10│
> > └──┘
> > 0 2 CutN 10+i.2
> > ┌┬─────┐
> > ││10 11│
> > └┴─────┘
> > 2 5 0 CutN 10+i.7
> > ┌─────┬──────────────┬┐
> > │10 11│12 13 14 15 16││
> > └─────┴──────────────┴┘
> > 0 7 0 CutN 10+i.7
> > ┌┬────────────────────┬┐
> > ││10 11 12 13 14 15 16││
> > └┴────────────────────┴┘
> >
> > Cheers,
> >
> > Erling Hellenäs
> >
> >
> >
> > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
> >> The problem:
> >> Let X be an array.
> >> X=: i. 50 NB. example
> >>
> >> Let 'split' be the percentages that each subarray takes from X,
> >> sequentially
> >> e.g
> >> split =: 0.35 0.35 0.3 NB. first array takes 35% , second sub array takes 35%, third takes 30%
> >> So in the end
> >>
> >> My solution
> >>
> >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
> >>
> >> split splitSubs X
> >>
> >>
> >> This gives 3 boxed arrays. Each array holds the indices to take from X.
> >>
> >> There is a slight problem in that the first and second subarrays
> >> have different length, due to rounding error. I am not too bothered
> >> about that since, depending on the size of X and the percentages,
> >> this is unavoidable.
> >>
> >> Any more succinct, nicer solutions?
> >>
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm