How about defining get as

   get=:13 : '(,words)i.boxopen y'
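
For illustration only, here is a minimal sketch of that definition on toy data. The three-word list and the 3 by 4 integer table are made-up stand-ins for the real words and vec, and boxopen is assumed to come from the standard library that the usual J profile loads. Since get returns an index rather than the vector, the last line selects from vec with it, in the same way as R.E. Boss's expression quoted below.

```j
   words =: ;: 'adultly adults adulty'      NB. toy stand-in for the 417195 boxed words
   vec   =: i. 3 4                          NB. toy stand-in for the 417195 by 300 table
   get   =: 13 : '(,words) i. boxopen y'    NB. index of each word of y in words
   get 'adults'                             NB. a single unboxed word
1
   get 'adults';'adulty'                    NB. a boxed list of words
1 2
   vec {~ get 'adults'                      NB. the row of vec for that word
4 5 6 7
```

A word that is not in words gets the index #words, so vec {~ get would then signal an index error.
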
Then it can take a single word unboxed or a boxed list.

On Wed, Feb 21, 2018 at 10:18 AM, Raul Miller <rauldmil...@gmail.com> wrote:

> That's an interesting point.
>
> That said, if you give get a large list of words to look up, it's the
> sort of issue which might be buried in everything else that's going on
> (the cost per word gets divided by the number of words being looked up
> at once).
>
> Thanks,
>
> --
> Raul
>
>
> On Wed, Feb 21, 2018 at 12:08 PM, Don Guinn <dongu...@gmail.com> wrote:
> > Defining a verb get to retrieve the index of the desired word as tacit
> > does make get pretty much unreadable; however, there is a possible
> > performance gain as the hash table for i. gets built only once when get
> > is defined. If you will be running get many times this could result in
> > a significant performance gain.
> >
> > Of course, once read in words must not be modified without rebuilding
> > get. But if it turns out that you don't need words for anything else
> > than in get then you could erase words after get is defined so storage
> > used by a big verb is offset by not having words around any more.
> >
> > On Wed, Feb 21, 2018 at 9:31 AM, R.E. Boss <r.e.b...@outlook.com> wrote:
> >
> >>    vec {~ (<'adults') i.~ words
> >> is perhaps what you are looking for
> >>
> >>
> >> R.E. Boss
> >>
> >>
> >> > -----Original Message-----
> >> > From: Programming [mailto:programming-boun...@forums.jsoftware.com]
> >> > On Behalf Of Skip Cave
> >> > Sent: Wednesday, 21 February 2018 17:09
> >> > To: programm...@jsoftware.com
> >> > Subject: Re: [Jprogramming] File Cleanup
> >> >
> >> > Thanks to Raul and Mike for the suggestions.
> >> >
> >> > I read in the data:
> >> >
> >> >    nb =: <'C:\numberbatch-en.txt'
> >> >    nbs =. fread nb
> >> >
> >> > Then I tried to clean it up:
> >> >
> >> > Mike's method ran out of memory:
> >> >
> >> >    nbs4 =. ( i.&' ' ({.;0 ". }.)] ) every nbs
> >> > |out of memory
> >> >
> >> > When I tried to run it on a smaller set:
> >> >
> >> >    nbs4=: (i.&' '({.;0".}.)])every 100000{. nbs
> >> >    nbs4
> >> > ...
> >> > │0││
> >> > ├─┼┤
> >> > │0││
> >> > ├─┼┤
> >> > │3││
> >> > ├─┼┤
> >> > │5││
> >> > ├─┼┤
> >> > │ ││
> >> > ├─┼┤
> >> > │0││
> >> > ├─┼┤
> >> > │.││
> >> > ├─┼┤
> >> > │0││
> >> > ├─┼┤
> >> > │7││
> >> > ├─┼┤
> >> > │8││
> >> > ├─┼┤
> >> > │2││
> >> > ├─┼┤
> >> >
> >> > So that wasn't working for me.
> >> >
> >> > I tried Raul's suggestion:
> >> >
> >> >    words=. <@({.~ i.&' ');._2 nbs
> >> >    vec =. 0 1 }. _&".;._2 nbs
> >> >
> >> >    $words
> >> > 417195
> >> >
> >> > Looking good....
> >> >
> >> >    ,.20{. 6000}. words
> >> > ┌────────────┐
> >> > │adultly     │
> >> > ├────────────┤
> >> > │adultness   │
> >> > ├────────────┤
> >> > │adultoid    │
> >> > ├────────────┤
> >> > │adultress   │
> >> > ├────────────┤
> >> > │adults      │
> >> > ├────────────┤
> >> > │adultship   │
> >> > ├────────────┤
> >> > │adulty      │
> >> > ├────────────┤
> >> > │adumbral    │
> >> > ├────────────┤
> >> > │adumbrant   │
> >> > ├────────────┤
> >> > │adumbrate   │
> >> > ├────────────┤
> >> > │adumbrated  │
> >> > ├────────────┤
> >> > │adumbrates  │
> >> > ├────────────┤
> >> > │adumbrating │
> >> > ├────────────┤
> >> > │adumbration │
> >> > ├────────────┤
> >> > │adumbrations│
> >> > ├────────────┤
> >> > │adumbrative │
> >> > ├────────────┤
> >> > │adunation   │
> >> > ├────────────┤
> >> > │adunc       │
> >> > ├────────────┤
> >> > │aduncate    │
> >> > ├────────────┤
> >> > │aduncity    │
> >> > └────────────┘
> >> >
> >> >    $vec
> >> > 417195 300
> >> >
> >> >    3 {. }.vec
> >> > _0.0264 0.0468 _0.0099 _0.0242 _0.0762 0.0562 0.0863 0.0115 _0.0471
> >> > 0.0442 _0.0875 0.0376 _0.0404 _0.0086 0.0161 _0.1689 0.1485 _0.0201
> >> > 0.1021 _0.0635 _0.0317 0.0142 0.0588 _0.1299 _0.0905 0.0389 _0.0452
> >> > 0.1352 0.0731 0.0648 0.1309 0.0493 0.0785 0.015...
> >> >
> >> > _0.0096 0.0318 _0.0095 _0.042 _0.0831 0.1103 0.075 0.024 _0.0237
> >> > 0.0398 _0.1274 _0.0299 _0.0209 _0.0195 _0.0043 _0.1033 0.1378 _0.0499
> >> > 0.0517 _0.0958 _0.0651 0.0214 0.0096 _0.0855 _0.1049 0.036 _0.0562
> >> > 0.043 0.0616 0.1124 0.152 0.0418 0.0628 _0.018...
> >> >
> >> > _0.0364 0.0254 _0.0448 _0.0327 _0.0712 0.1548 0.1004 0.0033 _0.039
> >> > 0.0635 _0.1179 _0.0703 _0.0359 0.0296 _0.0594 _0.0954 0.1904 _0.0301
> >> > 0.0078 _0.0607 _0.0344 0.034 _0.0059 _0.1453 _0.0429 _0.0061 _0.05
> >> > 0.0377 0.0959 0.1313 0.1238 0.0302 0.0043 _0.038...
> >> >
> >> > So this looks good!
> >> >
> >> > Now I need a verb that will let me specify a word, and it will return
> >> > the associated vector.
> >> >
> >> > Here's how it should work:
> >> >
> >> >    tst =. get 'adults'
> >> >    tst
> >> > 0.1144 0.0444 0.0574 0.0387 0.082 _0.0271 0.209 _0.006 _0.1896 0.1038
> >> > _0.0257 0.0646 0.0488 _0.0065 0.0486 0.0422 0.0239 _0.1006 _0.0541
> >> > 0.0511 _0.0254 _0.0121 0.0216 0.0324 _0.1349 0.0237 0.0049 0.0061
> >> > 0.0349 _0.0264 0.0086 0.0919 _0.0174 0.0645 ...
> >> >
> >> > To build the 'get' verb we need to try to find the location of the
> >> > word 'adults' in the boxed words array:
> >> >
> >> >    'adults' = each words
> >> > |length error
> >> > |   'adults' =each words
> >> >
> >> > Nope, that didn't work... Do I need to box the word?
> >> >
> >> >    (<'adults')=each words
> >> > |length error
> >> > |   (<'adults') =each words
> >> >
> >> > Nope! How do I find a specific word in the boxed word array?
> >> >
> >> > Skip Cave
> >> > Cave Consulting LLC
> >> >
> >> > On Wed, Feb 21, 2018 at 2:36 AM, Skip Cave <s...@caveconsulting.com>
> >> > wrote:
> >> >
> >> > > I read in a text file of word vectors using fread. The format looks
> >> > > like this:
> >> > >
> >> > > bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...
> >> > >
> >> > > bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...
> >> > >
> >> > > belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...
> >> > >
> >> > > belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...
> >> > >
> >> > > Everything is literal text.
> >> > >
> >> > > The basic layout for each line is:
> >> > >
> >> > > word(s) (could contain multiple words separated by underscores)
> >> > > space number (positive or negative) in text format space number
> >> > > (positive or negative) in text format space
> >> > > ...... repeat for 300 numbers (in text)
> >> > >
> >> > > the last number is followed by a line feed for the next line
> >> > >
> >> > > I need to:
> >> > > 1. Convert all the high minuses (-) to J's low minus (_)
> >> > > 2. Extract the word(s) up to the first space into a separate array
> >> > >    (words)
> >> > > 3. Convert the text numbers into a 2D array of ? x 300 floating
> >> > >    point numbers
> >> > >
> >> > > I know how to do #1 (string replace), and #3 (".) once I get rid of
> >> > > the words, but I don't know how to strip out the initial word on
> >> > > each line and put them in a separate array. Any help is appreciated.
> >> > >
> >> > > Skip
> >> > >
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see http://www.jsoftware.com/forums.htm