Joe, you suggest that every player should appear in the master table only
once. If that is the case, I think you should be able to simply use the
primitive i. (Index Of):
m1=: mids i. sids
This gives the same answer as your
m=:(3 : 'search y') each Ssids
except that m1 isn't boxed, i.e.
m1 -: ; m
20 (6!:2) 'mids i. sids'
0.00967119
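For readers coming from other languages: one `mids i. sids` call hands all the keys to the interpreter together, so it can prepare the master keys once (for example, by building a lookup table) and reuse that work for every salary key. Calling i. once per key does a separate search of the master list for each key. The Python sketch below shows the idea with a dict. It is an analogy, not how J implements i. internally. It mimics i.'s conventions (the first occurrence wins, and a missing key maps to the length of the list). The IDs are toy data and `index_of` is an illustrative name.

```python
def index_of(master, keys):
    """Mimic J's dyadic i.: first position of each key in master, len(master) if absent."""
    first = {}
    for pos, k in enumerate(master):
        first.setdefault(k, pos)  # keep only the first occurrence of each key
    missing = len(master)
    return [first.get(k, missing) for k in keys]

mids = ["aardsda01", "baezda01", "baezda01", "ruthba01"]
sids = ["ruthba01", "baezda01", "nobody01"]
print(index_of(mids, sids))  # [3, 1, 4]
```

The lookup table is built in one pass over the master keys, and each salary key then costs a single dict lookup.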
The only reason these give a different answer from your idx verb is that
there are two entries in the master table for the playerID <'baezda01'.
This appears to be an error in the data, because they are in fact different
players:
0 1 4 16 17 {"1 mcols, 458 459 { mdata
┌────────┬────────┬─────────┬─────────┬────────┐
│lahmanID│playerID│birthYear│nameFirst│nameLast│
├────────┼────────┼─────────┼─────────┼────────┤
│459 │baezda01│1977 │Danys │Baez │
├────────┼────────┼─────────┼─────────┼────────┤
│460 │baezda01│1953 │Jose │Baez │
└────────┴────────┴─────────┴─────────┴────────┘
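Because i. returns only the first match while idx returns every match, a duplicated key is exactly where the two disagree. Here is a quick Python sketch for checking a key column for duplicates before relying on i. (toy IDs; `duplicate_keys` is an illustrative name):

```python
from collections import defaultdict

def duplicate_keys(ids):
    """Map every key that occurs more than once to the positions where it occurs."""
    positions = defaultdict(list)
    for pos, k in enumerate(ids):
        positions[k].append(pos)
    return {k: p for k, p in positions.items() if len(p) > 1}

mids = ["aardsda01", "baezda01", "baezda01", "ruthba01"]
print(duplicate_keys(mids))  # {'baezda01': [1, 2]}
```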
HTH
On Wed, Oct 9, 2013 at 3:34 PM, Joe Bogner <[email protected]> wrote:
> Thanks everyone, and thank you for the feedback, Dan. Yes, I've been at it a
> little over a month. I'm working with it nearly every day, for a few hours
> most days. I spend probably 50% of my days in R, Excel and SQL, so I'm
> always looking for a new trick or approach, even if it's not vastly
> superior.
>
> I have one more example to share that rounds out the slicing and dicing
> theme. Like many things I try for the first time in J, I thought it would
> take 15 minutes, and it ended up taking several hours.
>
> Earlier I defined an idx verb: idx=:13 : '(; > L:0 I. each <"1 (x =/ y))',
> which Pascal and Ric offered improvements to.
>
> I was excited to use it to join tables, and it worked very well on small
> tables. Merging tables is a common task in my daily data analysis, so the
> baseball example was perfect for this.
>
>
> idx=:13 : '(; > L:0 I. each <"1 (x =/ y))'
>
> 'mcols mdata'=. split readcsv '/temp/Master.csv'
> 'scols sdata'=. split readcsv '/temp/Salaries.csv'
>
> mids=. (mcols i. <'playerID') {"1 mdata
> sids=. (scols i. <'playerID') {"1 sdata
>
>
> NB. Ouch, way too slow. It ends up building a 23141 by 18125 array
> NB. (6!:2) '(sids idx mids)'
> NB. 54.3933
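The outer table x =/ y compares every salary key with every master key. For a 23141 by 18125 table, that is roughly 419 million comparisons before I. even runs. You can get the same result as idx (all matching positions in y for each item of x, run together) without building that table by grouping y's positions by key first. Here is a Python sketch of that idea. `idx` here is a hypothetical re-implementation mirroring the J verb, not a J translation.

```python
from collections import defaultdict

def idx(x, y):
    """All positions in y that match each item of x, run together in x's order
    (like the idx verb), without building the len(x)-by-len(y) equality table."""
    positions = defaultdict(list)  # key -> every position where it occurs in y
    for pos, k in enumerate(y):
        positions[k].append(pos)
    out = []
    for k in x:
        out.extend(positions.get(k, []))  # .get does not add missing keys
    return out

print(idx(["a", "b", "q"], ["a", "b", "z", "c", "a", "q"]))  # [0, 4, 1, 5]
```

Each list is scanned once, so the work grows with len(x) + len(y) rather than their product.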
>
>
> NB. Also too slow.
> NB. (6!:2) '(3 : ''mids I. @:= <y'') each sids'
> NB. 43.321
>
>
> NB. Getting better. I don't need every index in the master since a player
> NB. should only be there once. Still too slow.
> NB. (6!:2) '(3 : ''mids i. <y'') each sids'
> NB. 18.0614
>
>
> NB. Try a fast search http://www.jsoftware.com/help/release/midot.htm
> NB. search =: mids&i.
> NB. (6!:2) ']m=.(3 : ''search <y'') each sids'
> NB. 9.76146
>
>
> NB. Can we do better by sorting?
> NB. sorttbl=: ]/:{"1 NB. http://www.jsoftware.com/jwiki/JPhrases/Sorting
> NB. midxtbl =. 1 sorttbl (> (3 : 'y;(y{"1 mids)' ) each i. # mids)
> NB. sidxtbl =. 1 sorttbl (> (3 : 'y;(y{"1 sids)' ) each i. # sids)
>
> NB. sortedMids=. 1{"1 midxtbl
> NB. sortedSids=. 1{"1 sidxtbl
>
> NB. That was a lot of work for about 2 seconds, not to mention another
> NB. level of indirection
> NB. 6!:2 'm=.(3 : ''search <y'') each sortedSids'
> NB. 7.2621
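Sorting tends to pay off when the sorted order is used for binary search, rather than for another i. search. Below is a Python sketch of that alternative using the standard bisect module. It is not what the J code above does. In J, dyadic I. (Interval Index) is the primitive for searching a sorted left argument. `sorted_lookup` is an illustrative name, and the stable sort makes bisect_left land on the first occurrence, matching i.'s convention.

```python
from bisect import bisect_left

def sorted_lookup(master, keys):
    """Sort master once, then binary-search each key.

    Returns the original position of the first matching master entry,
    or len(master) for a missing key (i.'s not-found convention)."""
    order = sorted(range(len(master)), key=master.__getitem__)  # stable sort
    sorted_keys = [master[i] for i in order]
    missing = len(master)
    result = []
    for k in keys:
        j = bisect_left(sorted_keys, k)  # leftmost slot where k could go
        if j < len(sorted_keys) and sorted_keys[j] == k:
            result.append(order[j])
        else:
            result.append(missing)
    return result

master = ["ruthba01", "aardsda01", "baezda01", "baezda01"]
print(sorted_lookup(master, ["baezda01", "ruthba01", "nobody01"]))  # [2, 0, 4]
```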
>
> This is where I went off the rails. I started to look into hash tables and
> sparse arrays. Lesson re-learned: if a language is older than X years, it
> probably doesn't need the extra machinery you think it does to solve the
> problem.
>
> Manually implementing hash tables is no fun, and stealing the hash key from
> a symbol table isn't very fast either.
>
> That reminded me of symbols though, which I had experimented with already.
>
> Final solution: use symbols. Boxed strings are slow to compare. I knew
> that, but it didn't occur to me.
>
> Smids=. s:mids
> Ssids=. s:sids
> search =: Smids&i.
>
>
>
> NB. I had to run it a few times to make sure I wasn't missing something
> NB. BLAZING FAST
> 6!:2 'm=.(3 : ''search y'') each Ssids'
> NB. 0.031005
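Conceptually, s: stores each distinct string once and hands back a symbol, so comparing symbols means comparing small fixed-size values instead of whole strings. Here is a rough Python analogue, assuming we simply number the distinct strings with integers. `make_symbols` is a hypothetical helper, and in Python itself the speed-up is small because str objects cache their hashes. The sketch only shows the encoding idea.

```python
def make_symbols(*columns):
    """Number each distinct string once (shared across all columns), loosely like J's s:."""
    table = {}  # string -> integer id, one entry per distinct string
    encoded = [[table.setdefault(s, len(table)) for s in col] for col in columns]
    return table, encoded

mids = ["ruthba01", "baezda01"]
sids = ["baezda01", "ruthba01", "baezda01"]
table, (smids, ssids) = make_symbols(mids, sids)
print(smids, ssids)  # [0, 1] [1, 0, 1]

# Lookups now compare integers instead of strings.
first = {}
for pos, sym in enumerate(smids):
    first.setdefault(sym, pos)
print([first.get(sym, len(smids)) for sym in ssids])  # [1, 0, 1]
```

Both columns must be encoded with the same table. Otherwise equal strings could get different ids.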
>
>
> Now that we have our merge index, we can do some awesome things:
>
> msd=. (m { mdata),.sdata
> msc=. mcols,scols
>
> playerSalaries=.(('playerID';'nameFirst';'yearID';'salary') idx msc) {"1 msd
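(m { mdata),.sdata pairs each salary row with the master row that m selects for it, and mcols,scols builds the matching header. Here is the same join as a short Python sketch with made-up rows (`merge_by_index` is an illustrative name). Like { in J, it fails if an index in m points past the end of the master table, which is what i. returns for a player missing from the master.

```python
def merge_by_index(master_rows, detail_rows, m):
    """Prefix each detail row with the master row m selects for it (like (m { mdata),.sdata)."""
    if len(m) != len(detail_rows):
        raise ValueError("need one master index per detail row")
    return [master_rows[i] + detail_rows[j] for j, i in enumerate(m)]

master_cols = ["playerID", "nameFirst"]
salary_cols = ["playerID", "yearID", "salary"]
master_rows = [["p1", "Ann"], ["p2", "Bob"]]
salary_rows = [["p2", "2001", "100"], ["p1", "2001", "200"], ["p2", "2002", "300"]]
m = [1, 0, 1]  # e.g. the result of looking up each salary playerID in the master

msc = master_cols + salary_cols
msd = merge_by_index(master_rows, salary_rows, m)
print(msd[0])  # ['p2', 'Bob', 'p2', '2001', '100']
```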
>
> hankOrTommy=.(('Hank';'Tommy') idx ((msc i.<'nameFirst') {"1 msd)) { msd
>
> vert =: |:@:(<"_1&>)
>
> sumit=:13 : '(~.x);(i.~x) +/ /. y'
>
> ]vert (> (msc i.<'nameFirst'){"1 hankOrTommy) sumit (".>((msc i.<'salary'){"1 hankOrTommy))
>
> ┌─────┬────────┐
> │Hank │22366500│
> ├─────┼────────┤
> │Tommy│12724010│
> └─────┴────────┘
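sumit pairs the nub of the keys (~.x) with one total per key. Key (/.) groups y in order of each key's first appearance, which is the same order ~. uses, so the two boxes line up. Here is a Python sketch of the same group-and-sum with toy numbers. It relies on Python dicts keeping insertion order (3.7+) to play the role of that first-appearance order.

```python
def sumit(keys, values):
    """Total values per distinct key, keys in order of first appearance."""
    totals = {}
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return list(totals), list(totals.values())

names, sums = sumit(["Hank", "Tommy", "Hank"], [100, 50, 25])
print(names, sums)  # ['Hank', 'Tommy'] [125, 50]
```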
>
>
>
> NB. Sum by Name and Year
>
> yearkey=: > ((msc i.<'nameFirst'){"1 hankOrTommy) , each ((msc i.<'yearID'){"1 hankOrTommy)
>
> 3 {. vert yearkey sumit (".>((msc i.<'salary'){"1 hankOrTommy))
>
> ┌─────────┬──────┐
> │Hank2002 │200000│
> ├─────────┼──────┤
> │Hank2003 │302500│
> ├─────────┼──────┤
> │Hank2004 │550000│
> └─────────┴──────┘
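yearkey builds a composite key by appending the year string to the first name, so each name/year pair becomes one key. An alternative is to keep the parts separate as a tuple. Then the name and year never have to be split apart again later. Here is a Python sketch of that variant with toy values (`sum_by` is an illustrative name).

```python
def sum_by(key_columns, values):
    """Total values per combination of key columns, in order of first appearance."""
    totals = {}
    for key, v in zip(zip(*key_columns), values):
        totals[key] = totals.get(key, 0) + v
    return totals

names = ["Hank", "Hank", "Tommy", "Hank"]
years = ["2002", "2003", "2002", "2002"]
salaries = [10, 20, 30, 40]
print(sum_by([names, years], salaries))
# {('Hank', '2002'): 50, ('Hank', '2003'): 20, ('Tommy', '2002'): 30}
```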
>
>
>
> Sidenote: symbols are great for keeping memory down too. By default, each
> string is a separate copy in memory. Symbols keep only a single copy of each
> distinct string and reference it through a hash table, like R's string
> table (http://cran.r-project.org/doc/manuals/R-ints.html#The-CHARSXP-cache).
> You can use symbols instead of strings when reading from files if you read
> them manually using cut, fread, etc. I'll write that up some time.
>
>
> On Tue, Oct 8, 2013 at 9:38 AM, Pascal Jasmin <[email protected]> wrote:
>
> > The only hacky part is that you have included a list inside your verb.
> >
> > If you just wanted indices:
> >
> > findI =. (I.leaf @:<@:E.)"0 1
> >
> > ('a';'b';'q') findI ('a';'b';'z';'c';'a';'q')
> > ┌───┬─┬─┐
> > │0 4│1│5│
> > └───┴─┴─┘
> >
> > If that seems noisy to you, consider:
> > ('a';'b';'q') (I.)"0 1 ('a';'b';'z';'c';'a';'q')
> > 0 1 1 1 0 1
> > 1 0 1 1 1 1
> > 1 1 1 1 1 0
> >
> > E. is basically -.@:I, here, if you haven't used it before.
> >
> >
> > ----- Original Message -----
> >
> > Is there a better way to find indices of a subset within a larger list?
> > This is my hacky solution:
> >
> >
> > find=.('a';'b';'q')
> >
> > list=.('a';'b';'z';'c';'a';'q')
> >
> >
> > ] (; > L:0 (3 : '(<y) ([: I. =) list' ) each find) { list
> >
> > ┌─┬─┬─┬─┐
> > │a│a│b│q│
> > └─┴─┴─┴─┘
> >
> >
> > find xxx list
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >