So words should be a list rather than a one-column table, and we would write
   words&i.
instead of
   (,words)&i.

Is that correct? Doesn't the ravel (,) prevent the new verb from sharing the
contents of words?

And perhaps get should be
get=:13 : 'words&i.boxopen y'
instead of
get=:13 : 'words i.boxopen y'
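
A minimal sketch of the two variants on stand-in data, just to show that both
forms return the same index. The four-word list and the names getA and getB
are hypothetical, not the real 417195-word list:

```j
NB. Hypothetical stand-in for the real words list.
words =: ;: 'adultly adultness adults adultship'

getA =: 13 : 'words&i. boxopen y'   NB. i. bound to words with &
getB =: 13 : 'words i. boxopen y'   NB. i. used as a plain dyad

getA 'adults'      NB. 2
getB 'adults'      NB. 2
getA <'adults'     NB. 2 (boxopen leaves a boxed argument alone)
getA 'nosuchword'  NB. 4, i.e. #words, meaning not found
```

This only shows that the results agree; it says nothing about which form
reuses the hash table.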

Does building the hash table require that i. be bound to its left argument
with &? Or is the hash table still built only once in the tacit definition
that uses i. as a plain dyad, without the &?

It would probably be safer to put the & in.
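
One way to check rather than guess would be to time both forms with 6!:2 and
measure space with 7!:2. This is only a sketch: the synthetic 100000-word list
and the names f and g are assumptions for illustration, and the numbers will
vary by machine and J version.

```j
NB. Hypothetical stand-in: 100000 boxed numerals instead of the real words.
words =: <@":"0 i. 100000

f =: words&i.            NB. bound form
g =: 3 : 'words i. y'    NB. plain dyad, words looked up on every call

10 (6!:2) 'f <''99999'''    NB. average seconds per call over 10 runs
10 (6!:2) 'g <''99999'''    NB. same measurement for the plain dyad
7!:2 'f =: words&i.'        NB. bytes needed to execute the definition
```

If the bound form is clearly faster per call, that would suggest the hash
table is being reused.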

On Wed, Feb 21, 2018 at 11:00 AM, Henry Rich <henryhr...@gmail.com> wrote:

> I don't think this prescription is accurate.  When m&i. is executed to
> create a fast search verb, the value of m is put into the new verb.  If m
> is a name, the value of the name is NOT copied, but instead referred to.
> If the name m is subsequently reassigned, the old value is retained,
> referred to by the m&i. verb, and the new value is assigned to the name m.
>
> So, deleting words will not actually free any memory.  On the other hand,
> executing words&i. didn't consume any memory either.
>
> (this is all from memory & I haven't checked it with 7!:2)
>
> Henry Rich
>
>
> On 2/21/2018 12:08 PM, Don Guinn wrote:
>
>> Defining a verb get to retrieve the index of the desired word as tacit
>> does make get pretty much unreadable; however, there is a possible
>> performance gain as the hash table for i. gets built only once when get is
>> defined. If you will be running get many times this could result in a
>> significant performance gain.
>>
>> Of course, once read in, words must not be modified without rebuilding
>> get. But if it turns out that you don't need words for anything other than
>> get, then you could erase words after get is defined, so the storage used
>> by a big verb is offset by not having words around any more.
>>
>> On Wed, Feb 21, 2018 at 9:31 AM, R.E. Boss <r.e.b...@outlook.com> wrote:
>>
>>> vec {~ (<'adults') i.~ words
>>> is perhaps what you are looking for
>>>
>>>
>>> R.E. Boss
>>>
>>>
>>> -----Original Message-----
>>>> From: Programming [mailto:programming-boun...@forums.jsoftware.com]
>>>> On Behalf Of Skip Cave
>>>> Sent: woensdag 21 februari 2018 17:09
>>>> To: programm...@jsoftware.com
>>>> Subject: Re: [Jprogramming] File Cleanup
>>>>
>>>> Thanks to Raul and Mike for the suggestions.
>>>>
>>>> I read in the data:
>>>>
>>>>
>>>> nb =: <'C:\numberbatch-en.txt'
>>>>
>>>> nbs =. fread nb
>>>>
>>>>
>>>> Then I tried to clean it up:
>>>>
>>>>
>>>> Mike's method ran out of memory:
>>>>
>>>> nbs4 =. ( i.&' ' ({.;0 ". }.)] ) every nbs
>>>>
>>>> |out of memory
>>>>
>>>> When I tried to run it on a smaller set:
>>>>
>>>> nbs4=: (i.&' '({.;0".}.)])every 100000{. nbs
>>>>
>>>> nbs4
>>>>
>>>> ...
>>>>
>>>> │0││
>>>>
>>>> ├─┼┤
>>>>
>>>> │0││
>>>>
>>>> ├─┼┤
>>>>
>>>> │3││
>>>>
>>>> ├─┼┤
>>>>
>>>> │5││
>>>>
>>>> ├─┼┤
>>>>
>>>> │ ││
>>>>
>>>> ├─┼┤
>>>>
>>>> │0││
>>>>
>>>> ├─┼┤
>>>>
>>>> │.││
>>>>
>>>> ├─┼┤
>>>>
>>>> │0││
>>>>
>>>> ├─┼┤
>>>>
>>>> │7││
>>>>
>>>> ├─┼┤
>>>>
>>>> │8││
>>>>
>>>> ├─┼┤
>>>>
>>>> │2││
>>>>
>>>> ├─┼┤
>>>>
>>>> So that wasn't working for me.
>>>>
>>>> I tried Raul's suggestion:
>>>>
>>>> words=. <@({.~ i.&' ');._2 nbs
>>>>
>>>> vec =. 0 1 }. _&".;._2 nbs
>>>>
>>>>
>>>> $words
>>>>
>>>> 417195
>>>>
>>>>
>>>> Looking good....
>>>>
>>>>
>>>> ,.20{. 6000}. words
>>>>
>>>> ┌────────────┐
>>>> │adultly     │
>>>> ├────────────┤
>>>> │adultness   │
>>>> ├────────────┤
>>>> │adultoid    │
>>>> ├────────────┤
>>>> │adultress   │
>>>> ├────────────┤
>>>> │adults      │
>>>> ├────────────┤
>>>> │adultship   │
>>>> ├────────────┤
>>>> │adulty      │
>>>> ├────────────┤
>>>> │adumbral    │
>>>> ├────────────┤
>>>> │adumbrant   │
>>>> ├────────────┤
>>>> │adumbrate   │
>>>> ├────────────┤
>>>> │adumbrated  │
>>>> ├────────────┤
>>>> │adumbrates  │
>>>> ├────────────┤
>>>> │adumbrating │
>>>> ├────────────┤
>>>> │adumbration │
>>>> ├────────────┤
>>>> │adumbrations│
>>>> ├────────────┤
>>>> │adumbrative │
>>>> ├────────────┤
>>>> │adunation   │
>>>> ├────────────┤
>>>> │adunc       │
>>>> ├────────────┤
>>>> │aduncate    │
>>>> ├────────────┤
>>>> │aduncity    │
>>>> └────────────┘
>>>>
>>>> $vec
>>>>
>>>> 417195 300
>>>>
>>>> 3 {. }.vec
>>>>
>>>> _0.0264 0.0468 _0.0099 _0.0242 _0.0762 0.0562 0.0863 0.0115 _0.0471 0.0442
>>>> _0.0875 0.0376 _0.0404 _0.0086 0.0161 _0.1689 0.1485 _0.0201 0.1021 _0.0635
>>>> _0.0317 0.0142 0.0588 _0.1299 _0.0905 0.0389 _0.0452 0.1352 0.0731 0.0648
>>>> 0.1309 0.0493 0.0785 0.015...
>>>>
>>>> _0.0096 0.0318 _0.0095 _0.042 _0.0831 0.1103 0.075 0.024 _0.0237 0.0398
>>>> _0.1274 _0.0299 _0.0209 _0.0195 _0.0043 _0.1033 0.1378 _0.0499 0.0517
>>>> _0.0958 _0.0651 0.0214 0.0096 _0.0855 _0.1049 0.036 _0.0562 0.043 0.0616
>>>> 0.1124 0.152 0.0418 0.0628 _0.018...
>>>>
>>>> _0.0364 0.0254 _0.0448 _0.0327 _0.0712 0.1548 0.1004 0.0033 _0.039
>>>> 0.0635
>>>> _0.1179 _0.0703 _0.0359 0.0296 _0.0594 _0.0954 0.1904 _0.0301 0.0078
>>>> _0.0607 _0.0344 0.034 _0.0059 _0.1453 _0.0429 _0.0061 _0.05 0.0377
>>>> 0.0959
>>>> 0.1313 0.1238 0.0302 0.0043 _0.038...
>>>>
>>>>
>>>> So this looks good!
>>>>
>>>>
>>>> Now I need a verb that will let me specify a word, and it will return
>>>> the associated vector.
>>>>
>>>> Here's how it should work:
>>>>
>>>>
>>>> tst =. get 'adults'
>>>>
>>>>
>>>> tst
>>>>
>>>> 0.1144 0.0444 0.0574 0.0387 0.082 _0.0271 0.209 _0.006 _0.1896 0.1038
>>>> _0.0257 0.0646 0.0488 _0.0065 0.0486 0.0422 0.0239 _0.1006 _0.0541
>>>> 0.0511
>>>> _0.0254 _0.0121 0.0216 0.0324 _0.1349 0.0237 0.0049 0.0061 0.0349
>>>> _0.0264
>>>> 0.0086 0.0919 _0.0174 0.0645 ...
>>>>
>>>>
>>>> To build the 'get' verb we need to try to find the location of the word
>>>> 'adults' in the boxed words array:
>>>>
>>>> 'adults' = each words
>>>>
>>>> |length error
>>>>
>>>> | 'adults' =each words
>>>>
>>>>
>>>> Nope, that didn't work... Do I need to box the word?
>>>>
>>>>
>>>> (<'adults')=each words
>>>>
>>>> |length error
>>>>
>>>> | (<'adults') =each words
>>>>
>>>>
>>>> Nope! How do I find a specific word in the boxed word array?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Skip Cave
>>>> Cave Consulting LLC
>>>>
>>>> On Wed, Feb 21, 2018 at 2:36 AM, Skip Cave <s...@caveconsulting.com>
>>>> wrote:
>>>>
>>>>> I read in a text file of word vectors using fread. The format looks
>>>>> like this:
>>>>>
>>>>> bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...
>>>>>
>>>>> bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...
>>>>>
>>>>> belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...
>>>>>
>>>>> belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...
>>>>>
>>>>> Everything is literal text.
>>>>>
>>>>> The basic layout for each line is:
>>>>>
>>>>> word(s) (could contain multiple words separated by underscores) space
>>>>> number (positive or negative) in text format space number (positive or
>>>>> negative) in text format space
>>>>> ......   repeat for 300 numbers (in text)
>>>>>
>>>>> the last number is followed by a line feed for the next line
>>>>>
>>>>> I need to:
>>>>> 1. Convert all the high minuses (-) to J's low minus (_)
>>>>> 2. Extract the word(s) up to the first space into a separate array
>>>>>    (words)
>>>>> 3. Convert the text numbers into a 2D array of ? x 300 floating point
>>>>>    numbers
>>>>>
>>>>> I know how to do #1 (string replace), and #3 (".) once I get rid of
>>>>> the words, but I don't know how to strip out the initial word on each
>>>>> line and put them in a separate array. Any help is appreciated.
>>>>>
>>>>> Skip
>>>>>
>>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm