So words should be a list instead of a one-column table. So we would have words&i. instead of (,words)&i.
Correct? Doesn't the raveling prevent sharing of the contents of words in
the new verb?

And perhaps get should be get=:13 : 'words&i.boxopen y' instead of
get=:13 : 'words i.boxopen y'.

Does the building of the hash table require that i. be bound to the left
argument with &, or will it still build the hash table only once in the
tacit definition where i. is in the dyadic form and the & is not there?
It would probably be safer to put the & in.

On Wed, Feb 21, 2018 at 11:00 AM, Henry Rich <henryhr...@gmail.com> wrote:

> I don't think this prescription is accurate. When m&i. is executed to
> create a fast search verb, the value of m is put into the new verb. If m
> is a name, the value of the name is NOT copied, but instead referred to.
> If the name m is subsequently reassigned, the old value is retained,
> referred to by the m&i. verb, and the new value is assigned to the name m.
>
> So, deleting words will not actually free any memory. On the other hand,
> executing words&i. didn't consume any memory either.
>
> (this is all from memory & I haven't checked it with 7!:2)
>
> Henry Rich
>
> On 2/21/2018 12:08 PM, Don Guinn wrote:
>
>> Defining a verb get to retrieve the index of the desired word as tacit
>> does make get pretty much unreadable; however, there is a possible
>> performance gain as the hash table for i. gets built only once when get
>> is defined. If you will be running get many times this could result in
>> a significant performance gain.
>>
>> Of course, once read in words must not be modified without rebuilding
>> get. But if it turns out that you don't need words for anything else
>> than in get then you could erase words after get is defined so storage
>> used by a big verb is offset by not having words around any more.
>>
>> On Wed, Feb 21, 2018 at 9:31 AM, R.E. Boss <r.e.b...@outlook.com> wrote:
>>
>>> vec {~ (<'adults') i.~ words
>>> is perhaps what you are looking for
>>>
>>> R.E. Boss
>>>
>>>> -----Original Message-----
>>>> From: Programming [mailto:programming-boun...@forums.jsoftware.com]
>>>> On Behalf Of Skip Cave
>>>> Sent: Wednesday, 21 February 2018 17:09
>>>> To: programm...@jsoftware.com
>>>> Subject: Re: [Jprogramming] File Cleanup
>>>>
>>>> Thanks to Raul and Mike for the suggestions.
>>>>
>>>> I read in the data:
>>>>
>>>> nb =: <'C:\numberbatch-en.txt'
>>>> nbs =. fread nb
>>>>
>>>> Then I tried to clean it up:
>>>>
>>>> Mike's method ran out of memory:
>>>>
>>>> nbs4 =. ( i.&' ' ({.;0 ". }.)] ) every nbs
>>>> |out of memory
>>>>
>>>> When I tried to run it on a smaller set:
>>>>
>>>> nbs4=: (i.&' '({.;0".}.)])every 100000{. nbs
>>>> nbs4
>>>> ...
>>>> │0││
>>>> ├─┼┤
>>>> │0││
>>>> ├─┼┤
>>>> │3││
>>>> ├─┼┤
>>>> │5││
>>>> ├─┼┤
>>>> │ ││
>>>> ├─┼┤
>>>> │0││
>>>> ├─┼┤
>>>> │.││
>>>> ├─┼┤
>>>> │0││
>>>> ├─┼┤
>>>> │7││
>>>> ├─┼┤
>>>> │8││
>>>> ├─┼┤
>>>> │2││
>>>> ├─┼┤
>>>>
>>>> So that wasn't working for me.
>>>>
>>>> I tried Raul's suggestion:
>>>>
>>>> words=. <@({.~ i.&' ');._2 nbs
>>>> vec =. 0 1 }. _&".;._2 nbs
>>>>
>>>> $words
>>>> 417195
>>>>
>>>> Looking good....
>>>>
>>>> ,.20{. 6000}. words
>>>> ┌────────────┐
>>>> │adultly     │
>>>> ├────────────┤
>>>> │adultness   │
>>>> ├────────────┤
>>>> │adultoid    │
>>>> ├────────────┤
>>>> │adultress   │
>>>> ├────────────┤
>>>> │adults      │
>>>> ├────────────┤
>>>> │adultship   │
>>>> ├────────────┤
>>>> │adulty      │
>>>> ├────────────┤
>>>> │adumbral    │
>>>> ├────────────┤
>>>> │adumbrant   │
>>>> ├────────────┤
>>>> │adumbrate   │
>>>> ├────────────┤
>>>> │adumbrated  │
>>>> ├────────────┤
>>>> │adumbrates  │
>>>> ├────────────┤
>>>> │adumbrating │
>>>> ├────────────┤
>>>> │adumbration │
>>>> ├────────────┤
>>>> │adumbrations│
>>>> ├────────────┤
>>>> │adumbrative │
>>>> ├────────────┤
>>>> │adunation   │
>>>> ├────────────┤
>>>> │adunc       │
>>>> ├────────────┤
>>>> │aduncate    │
>>>> ├────────────┤
>>>> │aduncity    │
>>>> └────────────┘
>>>>
>>>> $vec
>>>> 417195 300
>>>>
>>>> 3 {. }.vec
>>>> _0.0264 0.0468 _0.0099 _0.0242 _0.0762 0.0562 0.0863 0.0115 _0.0471 0.0442 _0.0875 0.0376 _0.0404 _0.0086 0.0161 _0.1689 0.1485 _0.0201 0.1021 _0.0635 _0.0317 0.0142 0.0588 _0.1299 _0.0905 0.0389 _0.0452 0.1352 0.0731 0.0648 0.1309 0.0493 0.0785 0.015...
>>>> _0.0096 0.0318 _0.0095 _0.042 _0.0831 0.1103 0.075 0.024 _0.0237 0.0398 _0.1274 _0.0299 _0.0209 _0.0195 _0.0043 _0.1033 0.1378 _0.0499 0.0517 _0.0958 _0.0651 0.0214 0.0096 _0.0855 _0.1049 0.036 _0.0562 0.043 0.0616 0.1124 0.152 0.0418 0.0628 _0.018...
>>>> _0.0364 0.0254 _0.0448 _0.0327 _0.0712 0.1548 0.1004 0.0033 _0.039 0.0635 _0.1179 _0.0703 _0.0359 0.0296 _0.0594 _0.0954 0.1904 _0.0301 0.0078 _0.0607 _0.0344 0.034 _0.0059 _0.1453 _0.0429 _0.0061 _0.05 0.0377 0.0959 0.1313 0.1238 0.0302 0.0043 _0.038...
>>>>
>>>> So this looks good!
>>>>
>>>> Now I need a verb that will let me specify a word, and it will return
>>>> the associated vector.
>>>>
>>>> Here's how it should work:
>>>>
>>>> tst =. get 'adults'
>>>> tst
>>>> 0.1144 0.0444 0.0574 0.0387 0.082 _0.0271 0.209 _0.006 _0.1896 0.1038 _0.0257 0.0646 0.0488 _0.0065 0.0486 0.0422 0.0239 _0.1006 _0.0541 0.0511 _0.0254 _0.0121 0.0216 0.0324 _0.1349 0.0237 0.0049 0.0061 0.0349 _0.0264 0.0086 0.0919 _0.0174 0.0645 ...
>>>>
>>>> To build the 'get' verb we need to try to find the location of the word
>>>> 'adults' in the boxed words array:
>>>>
>>>> 'adults' = each words
>>>> |length error
>>>> |   'adults' =each words
>>>>
>>>> Nope, that didn't work... Do I need to box the word?
>>>>
>>>> (<'adults')=each words
>>>> |length error
>>>> |   (<'adults') =each words
>>>>
>>>> Nope! How do I find a specific word in the boxed word array?
>>>>
>>>> Skip Cave
>>>> Cave Consulting LLC
>>>>
>>>> On Wed, Feb 21, 2018 at 2:36 AM, Skip Cave <s...@caveconsulting.com>
>>>> wrote:
>>>>
>>>>> I read in a text file of word vectors using fread. The format looks
>>>>> like this:
>>>>>
>>>>> bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...
>>>>>
>>>>> bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...
>>>>>
>>>>> belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...
>>>>>
>>>>> belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...
>>>>>
>>>>> Everything is literal text.
>>>>>
>>>>> The basic layout for each line is:
>>>>>
>>>>> word(s) (could contain multiple words separated by underscores) space
>>>>> number (positive or negative) in text format space
>>>>> number (positive or negative) in text format space
>>>>> ...... repeat for 300 numbers (in text)
>>>>>
>>>>> the last number is followed by a line feed for the next line
>>>>>
>>>>> I need to:
>>>>> 1. Convert all the high minuses (-) to J's low minus (_)
>>>>> 2. Extract the word(s) up to the first space into a separate array
>>>>>    (words)
>>>>> 3. Convert the text numbers into a 2D array of ? x 300 floating point
>>>>>    numbers
>>>>>
>>>>> I know how to do #1 (string replace), and #3 (".) once I get rid of
>>>>> the words, but I don't know how to strip out the initial word on each
>>>>> line and put them in a separate array. Any help is appreciated.
>>>>>
>>>>> Skip
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
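
A minimal sketch of the two forms of get discussed at the top of this message. The three-word words list and the small integer vec table are hypothetical stand-ins, not the real numberbatch data, and the names getx and gett are made up for illustration. It assumes the standard library's boxopen is loaded. It relies on the behaviour described in this thread (words&i. forming its search verb once, when the tacit verb is defined), which has not been measured here.

```j
NB. Hypothetical stand-in data, not the real numberbatch file.
words =: ;: 'adultly adults adulty'   NB. list (rank 1) of 3 boxed words
vec   =: i. 3 4                       NB. 3 by 4 table, one row per word

NB. Explicit form: words i. y is evaluated again on every call.
getx =: 3 : 'vec {~ words i. boxopen y'

NB. Tacit form: the values of vec and words are fixed into the verb when
NB. it is defined, and words&i. is formed once at that point.
gett =: (vec {~ words&i.) @ boxopen

getx 'adults'    NB. 4 5 6 7
gett 'adults'    NB. 4 5 6 7
```

A word that is not in words makes i. return #words, so { then signals an index error. Because gett holds the values it was defined with, it would have to be redefined after words or vec change, as noted earlier in the thread.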