Hi all!

A new splitSubsE, built from Distribute and CutN.

Code:

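NB. x CutN y: cut y into consecutive groups whose sizes are given by x (groups may be empty)
CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]

NB. x Distribute n: split n items into whole counts proportional to the fractions x,
NB. flooring each share and giving the leftover items to the largest remainders
Distribute=: 4 : 'rr+(/:s){($s=.\:r-rr){.(d=.y-+/rr=.<.!.0 r=.x*y)$1'

NB. x splitSubsE y: split the items of y by the fractions x
splitSubsE=: ([ Distribute #@:]) CutN ]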
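CutN is tacit, so its steps are easy to miss. Here is a hedged sketch that unpacks it into named steps for the 2 5 0 CutN 10+i.7 example from Erling's earlier CutN message. The names counts, data, starts, idx and groups are hypothetical, and this is my reading of the definition rather than Erling's own code. A group with a count of 0 gets the index list start + i.0, which is empty, so it comes out as an empty box.

```j
NB. Hypothetical step-by-step version of 2 5 0 CutN 10+i.7
counts =: 2 5 0
data =: 10 + i.7
NB. start of each group: 0 followed by the running totals, trimmed to # counts
NB. (the trim also keeps the result empty when counts is empty)
starts =: (# counts) {. 0 , }: +/\ counts
NB. one boxed index list per group: start + i. count (empty when count is 0)
idx =: starts <@(+ i.)"0 counts
NB. select each index list from the data, keeping the results boxed
groups =: idx {&.> < data
groups
```

The last line displays the same three boxes as Erling's 2 5 0 CutN 10+i.7 session: 10 11, then 12 13 14 15 16, then an empty box.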

split =: 0.35 0.35 0.3

(i.0) splitSubsE i.0

split splitSubsE i.0

split splitSubsE i.1

split splitSubsE i.2

split splitSubsE i.3

split splitSubsE i.4


Output:

   (i.0) splitSubsE i.0

   split splitSubsE i.0
┌┬┬┐
││││
└┴┴┘
   split splitSubsE i.1
┌─┬┬┐
│0│││
└─┴┴┘
   split splitSubsE i.2
┌─┬─┬┐
│0│1││
└─┴─┴┘
   split splitSubsE i.3
┌─┬─┬─┐
│0│1│2│
└─┴─┴─┘
   split splitSubsE i.4
┌───┬─┬─┐
│0 1│2│3│
└───┴─┴─┘
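The 2 1 1 split of i.4 above comes from Distribute. It floors each exact share, then hands out the leftover items one each to the groups with the largest fractional remainders (the largest remainder method). Below is a step-by-step sketch of split Distribute 4. The intermediate names are hypothetical and chosen to mirror the locals in Distribute. The counts add up to the item count only when the fractions in split sum to 1.

```j
NB. Hypothetical trace of split Distribute 4, one step per line
split =: 0.35 0.35 0.3
total =: 4
r =: split * total                   NB. exact shares: 1.4 1.4 1.2
rr =: <.!.0 r                        NB. floor without comparison tolerance: 1 1 1
d =: total - +/ rr                   NB. items left over after flooring: 1
s =: \: r - rr                       NB. groups ordered by largest remainder first: 0 1 2
extra =: (/: s) { (# s) {. d $ 1     NB. one extra item for the d largest remainders: 1 0 0
counts =: rr + extra                 NB. 2 1 1
```

Ties are broken by position because \: is a stable grade. That is why the first of the two equal 0.4 remainders gets the extra item.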

Cheers,


Erling Hellenäs





On 2017-10-21 14:09, 'Jon Hough' via Programming wrote:
Hi Skip,

I am not doing machine learning in J per se, only toy programs. But I am doing
some data prep for some kind of data analysis, which will probably be done in
Python. So I am just looking to slice and dice the data in J.

I think J has an ML lab, which you might find interesting. Doing any kind of
NLP means you should use a battle-tested library, and the only one I am aware
of is NLTK: http://www.nltk.org/

Thanks,
Jon
--------------------------------------------
On Sat, 10/21/17, 'Skip Cave' via Programming <[email protected]> wrote:

  Subject: Re: [Jprogramming] Splitting an Array into several arrays
  To: "[email protected]" <[email protected]>
  Date: Saturday, October 21, 2017, 7:03 AM

  Jon, "training set", "validation set", "test set". Sounds like you are
  working on a machine learning project to me.

  Are there any demos, labs, or articles on machine learning techniques in J?
  Things like gradient descent, word2vec, paragraph vectors, etc.? I have a big
  project that calls for these kinds of NLP ML tools, and I hate that I will
  have to learn Python, or worse yet JavaScript, in order to do the project.

  Skip

  Skip Cave
  Cave Consulting LLC

  On Fri, Oct 20, 2017 at 8:14 AM, 'Jon Hough' via Programming <[email protected]> wrote:
  > Hi,
  >
  > What I am really after is a verb that splits by percentage. To give a
  > concrete use case:
  > I have a dataset, which I wish to split into a training set, a validation
  > set, and a testing set.
  >
  > I want 35% of the datapoints to go in the training set, 35% to go in the
  > validation set, and the rest to go in the test set. (Just example numbers.)
  >
  > No need to worry about shuffling, randomizing, etc.; I am assuming the data
  > is sufficiently random.
  > As Raul said, I can simplify slightly by just using the size of the
  > dataset as the right argument.
  >
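Applied to the use case above, here is a sketch with a 50-item stand-in for the dataset. It reuses Erling's CutN and Distribute exactly as posted at the top of the thread. X, sizes and parts are illustrative assumptions. As suggested, only the item count # X is needed to compute the group sizes.

```j
NB. Definitions as posted at the top of the thread
CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
Distribute=: 4 : 'rr+(/:s){($s=.\:r-rr){.(d=.y-+/rr=.<.!.0 r=.x*y)$1'

split =: 0.35 0.35 0.3
X =: i. 50                       NB. hypothetical stand-in dataset of 50 items
sizes =: split Distribute # X    NB. group sizes from the item count alone: 18 17 15
parts =: sizes CutN X            NB. training ; validation ; test
```

CutN selects whole items, so the same sizes would also cut any array with 50 items along its first axis, such as a 50-row table.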
  > --------------------------------------------
  > On Fri, 10/20/17, Erling Hellenäs <[email protected]> wrote:
  >
  >  Subject: Re: [Jprogramming] Splitting an Array into several arrays
  >  To: [email protected]
  >  Date: Friday, October 20, 2017, 10:06 PM
  >
  >  Hi all!
  >
  >  A splitSubs with CutN could possibly look like this:
  >
  >  splitSubsE=: ([ (([: # [) {. ]) ([: <. 0.5 + [: }: [ * [: # ]) ( [ , ([: # ]) - [: +/ [) ]) CutN ]
  >
  >     (i.0) splitSubsE i.0
  >
  >     (,55) splitSubsE ,5
  >  ┌─┐
  >  │5│
  >  └─┘
  >     split splitSubsE i.0
  >  ┌┬┬┐
  >  ││││
  >  └┴┴┘
  >     split splitSubsE i.1
  >  ┌┬┬─┐
  >  │││0│
  >  └┴┴─┘
  >     split splitSubsE i.2
  >  ┌─┬─┬┐
  >  │0│1││
  >  └─┴─┴┘
  >     split splitSubsE i.3
  >  ┌─┬─┬─┐
  >  │0│1│2│
  >  └─┴─┴─┘
  >     split splitSubsE i.4
  >  ┌─┬─┬───┐
  >  │0│1│2 3│
  >  └─┴─┴───┘
  >
  >  Cheers
  >
  >  Erling Hellenäs
  >
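To compare with the later Distribute-based version, here are the counts this earlier splitSubsE hands to CutN, computed on their own. Every share except the last is rounded to the nearest integer, and the last group takes whatever is left. The sketch below uses split and 4 items with hypothetical names, and it reproduces the 1 1 2 split shown above for i.4.

```j
NB. Hypothetical unpacking of the counts in the earlier splitSubsE, for 4 items
split =: 0.35 0.35 0.3
total =: 4                                NB. # y
rounded =: <. 0.5 + }: split * total      NB. round all shares but the last: 1 1
NB. the last group gets the rest; (# split) {. keeps the result empty for an empty split
counts =: (# split) {. rounded , total - +/ rounded   NB. 1 1 2
```

Because the leading shares are rounded independently, the last group absorbs all of the rounding error. That is why i.1 lands in the last box in this version, while the Distribute version puts it in the first.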
  >
  >  On 2017-10-20 at 14:11, Erling Hellenäs wrote:
  >  > Hi all!
  >  >
  >  > I looked for a version of Cut which takes the number of items in each
  >  > group as left argument. I didn't find one. I think it is what you most
  >  > often need, because it allows groups with zero-length content.
  >  >
  >  > I made CutN as an illustration:
  >  >
  >  > CutN=:((# {. 0 , [: }: [: +/\ ])([: < [ + [: i. ])"0 ])@:[ {&.>/ [: < ]
  >  >
  >  >    (i.0) CutN i.0
  >  >
  >  >    (,0) CutN i.0
  >  > ┌┐
  >  > ││
  >  > └┘
  >  >    (,1) CutN 10+i.1
  >  > ┌──┐
  >  > │10│
  >  > └──┘
  >  >    0 2 CutN 10+i.2
  >  > ┌┬─────┐
  >  > ││10 11│
  >  > └┴─────┘
  >  >    2 5 0 CutN 10+i.7
  >  > ┌─────┬──────────────┬┐
  >  > │10 11│12 13 14 15 16││
  >  > └─────┴──────────────┴┘
  >  >    0 7 0 CutN 10+i.7
  >  > ┌┬────────────────────┬┐
  >  > ││10 11 12 13 14 15 16││
  >  > └┴────────────────────┴┘
  >  >
  >  > Cheers,
  >  >
  >  > Erling Hellenäs
  >  >
  >  > On 2017-10-20 at 10:42, 'Jon Hough' via Programming wrote:
  >  >> The problem:
  >  >> Let X be an array.
  >  >> X=: i. 50 NB.  example
  >  >>
  >  >> Let 'split' be the percentages that each subarray takes from X,
  >  >> sequentially, e.g.
  >  >> split =: 0.35 0.35 0.3 NB. first subarray takes 35%, second takes 35%, third takes 30%
  >  >>
  >  >> So, in the end, my solution is:
  >  >>
  >  >> splitSubs =: -.~&.>/\@:(i.&.>"0@:<"0)@:}.@:>.@:((+/\ - ])@:[ (* , ]) #@:])
  >  >>
  >  >> split splitSubs X
  >  >>
  >  >> This gives 3 boxed arrays. Each array holds the indices to take from X.
  >  >>
  >  >> There is a slight problem in that the first and second subarrays
  >  >> have different lengths, due to rounding error. I am not too bothered
  >  >> about that since, depending on the size of X and the percentages, this is
  >  >> unavoidable.
  >  >>
  >  >> Any more succinct, nicer solutions?
  >  >>
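Jon's splitSubs works with cumulative cut points rather than per-group counts. Below is a hedged sketch that unpacks the arithmetic inside the tacit definition for the 50-item X; the names starts, cuts and sizes are hypothetical. The first cut falls at 0.35 * 50 = 17.5, which the ceiling turns into 18. So the first subarray gets 18 items and the second only 17. splitSubs then builds i. cut for each cut point and removes the indices already taken by earlier groups.

```j
NB. Hypothetical unpacking of the arithmetic inside Jon's splitSubs for # X = 50
split =: 0.35 0.35 0.3
total =: 50
starts =: (+/\ split) - split            NB. fraction before each group: 0 0.35 0.7
cuts =: }. >. (starts * total) , total   NB. rounded-up end of each group: 18 35 50
sizes =: cuts - 0 , }: cuts              NB. items per group: 18 17 15
```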
  >  >> ----------------------------------------------------------------------
  >  >> For information about J forums see http://www.jsoftware.com/forums.htm