Hi Skip,

I am not doing machine learning in J per se, only toy programs. But I am doing
some data prep for some kind of data analysis, which will probably be done in
Python, so I am just looking to slice and dice the data in J.

I think J has an ML lab, which you might find interesting. Doing any kind of
NLP means you should use a battle-tested library, and the only one I am aware
of is NLTK: http://www.nltk.org/

Thanks,
Jon
--------------------------------------------
On Sat, 10/21/17, 'Skip Cave' via Programming <[email protected]> wrote:

 Subject: Re: [Jprogramming] Splitting an Array into several arrays
 To: "[email protected]" <[email protected]>
 Date: Saturday, October 21, 2017, 7:03 AM
 
 Jon,
 
 "Training set", "validation set", "test set": sounds like you are working
 on a machine learning project to me.
 
 Are there any demos, labs, or articles on machine learning techniques in J?
 Things like gradient descent, word2vec, paragraph vectors, etc.? I have a big
 project that calls for these kinds of NLP ML tools, and I hate that I will
 have to learn Python, or worse yet JavaScript, in order to do the project.
 
 Skip
 
 Skip Cave
 Cave Consulting LLC
 
 On Fri, Oct 20, 2017 at 8:14 AM, 'Jon Hough' via Programming <[email protected]> wrote:
 
 > Hi,
 >
 > What I am really after is a verb that splits by percentage. To give a
 > concrete use case: I have a dataset, which I wish to split into a training
 > set, a validation set, and a testing set.
 >
 > I want 35% of the datapoints to go in the training set, 35% to go in the
 > validation set, and the rest to go in the test set. (Just example numbers.)
 >
 > No need to worry about shuffling, randomizing, etc.; I am assuming the data
 > is sufficiently random.
 > As Raul said, I can simplify slightly by just using the size of the
 > dataset as the right argument.
 >
 >
 > --------------------------------------------
 > On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
 >
 >  Subject: Re: [Jprogramming] Splitting an Array into several arrays
 >  To: [email protected]
 >  Date: Friday, October 20, 2017, 10:06 PM
 >
 >  Hi all!
 >
 >  A splitSubs built on CutN could look like this:
 >
 >  splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
 >
 >     (i.0) splitSubsE i.0
 >
 >     (,55) splitSubsE ,5
 >  ┌─┐
 >  │5│
 >  └─┘
 >     split splitSubsE i.0
 >  ┌┬┬┐
 >  ││││
 >  └┴┴┘
 >     split splitSubsE i.1
 >  ┌┬┬─┐
 >  │││0│
 >  └┴┴─┘
 >     split splitSubsE i.2
 >  ┌─┬─┬┐
 >  │0│1││
 >  └─┴─┴┘
 >     split splitSubsE i.3
 >  ┌─┬─┬─┐
 >  │0│1│2│
 >  └─┴─┴─┘
 >     split splitSubsE i.4
 >  ┌─┬─┬───┐
 >  │0│1│2 3│
 >  └─┴─┴───┘
 >
 >  Cheers
 >
 >  Erling Hellenäs
 >
 >
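For readers less comfortable with tacit trains, here is a hedged explicit (non-tacit) sketch of what splitSubsE appears to compute: round every share except the last to the nearest whole item, let the last group take whatever remains, and cut the list into consecutive groups with take ({.) and drop (}.), keeping empty groups. The name splitSubsX is hypothetical and not from the thread; it assumes split is 0.35 0.35 0.3 as in Jon's original message, and is intended to reproduce the i.0 to i.4 results above.

```j
NB. Hypothetical explicit sketch of a splitSubsE-style verb (not from the thread).
NB. x: fractions, one per group; y: the list to split.
NB. Result: one box per group, in order; empty groups are kept.
splitSubsX =: 4 : 0
 n =. # y
 c =. <. 0.5 + n * }: x            NB. round every share except the last
 c =. (# x) {. c , n - +/ c        NB. last group gets the remainder
 s =. (+/\ c) - c                  NB. start offset of each group
 (s ,. c) <@({:@[ {. {.@[ }. ])"1 _ y   NB. (size {. start }. y), boxed
)

split =: 0.35 0.35 0.3
split splitSubsX i.4               NB. boxes: 0 | 1 | 2 3
```

Because every group is cut by an explicit start and size, a size of zero simply yields an empty box rather than disappearing.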
 >  On 2017-10-20 at 14:11, Erling Hellenäs wrote:
 >  > Hi all!
 >  >
 >  > I looked for a version of Cut that takes the number of items in each
 >  > group as its left argument, and didn't find one. I think it is what you
 >  > most often need, because it allows groups with zero-length content.
 >  >
 >  > I made CutN as an illustration:
 >  >
 >  > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
 >  >
 >  >    (i.0) CutN i.0
 >  >
 >  >    (,0) CutN i.0
 >  > ┌┐
 >  > ││
 >  > └┘
 >  >    (,1) CutN 10+i.1
 >  > ┌──┐
 >  > │10│
 >  > └──┘
 >  >    0 2 CutN 10+i.2
 >  > ┌┬─────┐
 >  > ││10 11│
 >  > └┴─────┘
 >  >    2 5 0 CutN 10+i.7
 >  > ┌─────┬──────────────┬┐
 >  > │10 11│12 13 14 15 16││
 >  > └─────┴──────────────┴┘
 >  >    0 7 0 CutN 10+i.7
 >  > ┌┬────────────────────┬┐
 >  > ││10 11 12 13 14 15 16││
 >  > └┴────────────────────┴┘
 >  >
 >  > Cheers,
 >  >
 >  > Erling Hellenäs
 >  >
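CutN's train builds one boxed index list per group (start + i. length, with the starts taken from a running sum of the lengths) and then selects each group from the right argument. The following is a hedged explicit sketch of that same idea, assuming the left argument holds non-negative group lengths whose total does not exceed the length of the right argument; the name cutNx is hypothetical and not part of the J library.

```j
NB. Hypothetical explicit sketch of the CutN idea (cutNx is not from the thread).
NB. x: non-negative group lengths, total at most # y; y: the list to cut.
cutNx =: 4 : 0
 s =. (+/\ x) - x                 NB. start offset of each group (exclusive running sum)
 idx =. s <@(+ i.)"0 x            NB. one box per group: start + i. length
 idx {&.> < y                     NB. select each group's items from y
)

2 5 0 cutNx 10+i.7                NB. boxes: 10 11 | 12 13 14 15 16 | (empty)
```

A zero length produces an empty index list, so the corresponding box is empty rather than missing, which is the property Erling points out.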
 >  > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
 >  >> The problem:
 >  >> Let X be an array.
 >  >> X=: i. 50 NB.  example
 >  >>
 >  >>
 >  >> Let 'split' be the percentages that each subarray takes from X,
 >  >> sequentially, e.g.
 >  >> split =: 0.35 0.35 0.3 NB. first subarray takes 35%, second takes 35%, third takes 30%
 >  >>
 >  >> So, in the end, my solution is:
 >  >>
 >  >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
 >  >>
 >  >> split splitSubs X
 >  >>
 >  >> This gives 3 boxed arrays. Each array holds the indices to take from X.
 >  >>
 >  >> There is a slight problem in that the first and second subarrays have
 >  >> different lengths, due to rounding error. I am not too bothered about
 >  >> that since, depending on the size of X and the percentages, this is
 >  >> unavoidable.
 >  >>
 >  >> Any more succinct, nicer solutions?
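One possibly more succinct variant is sketched here under two assumptions: the fractions sum to 1, and the right argument is the dataset size (as suggested later in the thread). It rounds the cumulative boundaries rather than the individual shares, forces the last boundary to the exact size, and turns consecutive boundaries into index ranges, so each group should stay within one item of its exact share. The name pctIdx is hypothetical. For 0.35 0.35 0.3 and 50 items it gives group sizes 18 17 15, the same as splitSubs; the uneven first two groups remain, since that part of the rounding is unavoidable.

```j
NB. Hypothetical sketch (pctIdx is not from the thread).
NB. x: fractions summing to 1, one per group; y: number of items.
NB. Result: one box of indices per group, in order.
pctIdx =: 4 : 0
 b =. (<. 0.5 + y * }: +/\ x) , y   NB. rounded cumulative boundaries; last is exactly y
 s =. 0 , }: b                       NB. start index of each group
 s <@(+ i.)"0 b - s                  NB. start + i. size, one box per group
)

split =: 0.35 0.35 0.3
X =: i. 50
#&> split pctIdx # X                 NB. group sizes: 18 17 15
'train valid test' =: (split pctIdx # X) {&.> < X
```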
 >  >>
 >  >> ----------------------------------------------------------------------
 >  >> For information about J forums see http://www.jsoftware.com/forums.htm