Re: [Jprogramming] File Cleanup

Henry Rich Wed, 21 Feb 2018 10:01:05 -0800

I don't think this prescription is accurate. When m&i. is executed to create a fast search verb, the value of m is put into the new verb. If m is a name, the value of the name is NOT copied, but instead referred to. If the name m is subsequently reassigned, the old value is retained, referred to by the m&i. verb, and the new value is assigned to the name m.

So, deleting words will not actually free any memory. On the other hand, executing words&i. didn't consume any memory either.


(this is all from memory & I haven't checked it with 7!:2)
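
A small session along these lines could be used to check the claim. It is only a sketch on toy data: the three-word list stands in for the real word list, find is a hypothetical name, and no memory figures are claimed here since they depend on the J version and platform.

```j
NB. Toy stand-in for the 417195-word list in the thread.
words =: 'adultly';'adults';'adulty'
find  =: words&i.        NB. the value of words goes into the new verb
find <'adults'           NB. 1

before =: 7!:0 ''        NB. bytes currently in use
4!:55 <'words'           NB. erase the name words
after  =: 7!:0 ''

find <'adults'           NB. still 1: find refers to the old value
before - after           NB. inspect on your own system
```

If the description above is right, before - after should be small compared with the size of the word list, because find still holds the value.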

Henry Rich

On 2/21/2018 12:08 PM, Don Guinn wrote:

Defining get, the verb that retrieves the index of the desired word, as a
tacit verb does make it pretty much unreadable; however, there is a possible
performance gain, as the hash table for i. gets built only once, when get is
defined. If you will be running get many times, this could result in a
significant performance gain.

Of course, once read in, words must not be modified without rebuilding get.
But if it turns out that you don't need words for anything other than get,
then you could erase words after get is defined, so that the storage used by
a big verb is offset by not having words around any more.
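
One way such a tacit get might look, as a sketch on toy data (the three words and the 3 by 2 table are made up, and whether the prebuilt-hash form of m&i. is kept inside this composition is an assumption worth timing on the real data):

```j
words =: 'bell';'belt';'belfast'
vec   =: 3 2 $ 0.1 0.2 0.3 0.4 0.5 0.6

NB. Fork with a noun on the left: box the argument, look it up in
NB. words, then select that row of vec. The values of words and vec
NB. current at definition time go into get.
get =: vec {~ words&i.@<

get 'belt'               NB. 0.3 0.4
```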

On Wed, Feb 21, 2018 at 9:31 AM, R.E. Boss <[email protected]> wrote:

  vec {~ (<'adults') i.~ words
is perhaps what you are looking for
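
On toy data (made-up words and a 3 by 2 table standing in for the real arrays) the expression behaves as sketched below. Note that i. returns the length of the list for a word that is not present, so selecting from vec with that index would fail.

```j
words =: 'bell';'belt';'belfast'
vec   =: 3 2 $ 0.1 0.2 0.3 0.4 0.5 0.6

vec {~ (<'belt') i.~ words      NB. 0.3 0.4
(<'nope') i.~ words             NB. 3, which is # words: not found
```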


R.E. Boss

-----Original Message-----
From: Programming [mailto:[email protected]]
On Behalf Of Skip Cave
Sent: Wednesday, 21 February 2018 17:09
To: [email protected]
Subject: Re: [Jprogramming] File Cleanup

Thanks to Raul and Mike for the suggestions.

I read in the data:


nb =: <'C:\numberbatch-en.txt'

nbs =. fread nb


Then I tried to clean it up:


Mike's method ran out of memory:

nbs4 =. ( i.&' ' ({.;0 ". }.)] ) every nbs

|out of memory

When I tried to run it on a smaller set:

nbs4=: (i.&' '({.;0".}.)])every 100000{. nbs

nbs4

...
│0││
├─┼┤
│0││
├─┼┤
│3││
├─┼┤
│5││
├─┼┤
│ ││
├─┼┤
│0││
├─┼┤
│.││
├─┼┤
│0││
├─┼┤
│7││
├─┼┤
│8││
├─┼┤
│2││
├─┼┤

So that wasn't working for me.

I tried Raul's suggestion:

words=. <@({.~ i.&' ');._2 nbs

vec =. 0 1 }. _&".;._2 nbs


$words

417195


Looking good....


,.20{. 6000}. words

┌────────────┐
│adultly     │
├────────────┤
│adultness   │
├────────────┤
│adultoid    │
├────────────┤
│adultress   │
├────────────┤
│adults      │
├────────────┤
│adultship   │
├────────────┤
│adulty      │
├────────────┤
│adumbral    │
├────────────┤
│adumbrant   │
├────────────┤
│adumbrate   │
├────────────┤
│adumbrated  │
├────────────┤
│adumbrates  │
├────────────┤
│adumbrating │
├────────────┤
│adumbration │
├────────────┤
│adumbrations│
├────────────┤
│adumbrative │
├────────────┤
│adunation   │
├────────────┤
│adunc       │
├────────────┤
│aduncate    │
├────────────┤
│aduncity    │
└────────────┘

$vec

417195 300

3 {. }.vec

_0.0264 0.0468 _0.0099 _0.0242 _0.0762 0.0562 0.0863 0.0115 _0.0471

0.0442

_0.0875 0.0376 _0.0404 _0.0086 0.0161 _0.1689 0.1485 _0.0201 0.1021

_0.0635

_0.0317 0.0142 0.0588 _0.1299 _0.0905 0.0389 _0.0452 0.1352 0.0731 0.0648
0.1309 0.0493 0.0785 0.015...

_0.0096 0.0318 _0.0095 _0.042 _0.0831 0.1103 0.075 0.024 _0.0237 0.0398
_0.1274 _0.0299 _0.0209 _0.0195 _0.0043 _0.1033 0.1378 _0.0499 0.0517
_0.0958 _0.0651 0.0214 0.0096 _0.0855 _0.1049 0.036 _0.0562 0.043 0.0616
0.1124 0.152 0.0418 0.0628 _0.018...

_0.0364 0.0254 _0.0448 _0.0327 _0.0712 0.1548 0.1004 0.0033 _0.039 0.0635
_0.1179 _0.0703 _0.0359 0.0296 _0.0594 _0.0954 0.1904 _0.0301 0.0078
_0.0607 _0.0344 0.034 _0.0059 _0.1453 _0.0429 _0.0061 _0.05 0.0377 0.0959
0.1313 0.1238 0.0302 0.0043 _0.038...


So this looks good!


Now I need a verb that will let me specify a word, and it will return the
associated vector.

Here's how it should work:


tst =. get 'adults'


tst

0.1144 0.0444 0.0574 0.0387 0.082 _0.0271 0.209 _0.006 _0.1896 0.1038
_0.0257 0.0646 0.0488 _0.0065 0.0486 0.0422 0.0239 _0.1006 _0.0541 0.0511
_0.0254 _0.0121 0.0216 0.0324 _0.1349 0.0237 0.0049 0.0061 0.0349 _0.0264
0.0086 0.0919 _0.0174 0.0645 ...


To build the 'get' verb we need to try to find the location of the word
'adults' in the boxed words array:

'adults' = each words

|length error

| 'adults' =each words


Nope, that didn't work... Do I need to box the word?


(<'adults')=each words

|length error

| (<'adults') =each words


Nope! How do I find a specific word in the boxed word array?
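
A note on why both attempts fail, with a sketch on a three-word toy list: each is &.> and opens both arguments before applying =, so the six characters of 'adults' are compared with words of other lengths, and = requires matching lengths. Comparing the boxes as wholes, or using match (-:), avoids this.

```j
words =: 'adultly';'adults';'adulty'

(<'adults') = words        NB. 0 1 0 : boxes compared as wholes
words i. <'adults'         NB. 1     : index of the first match
(<'adults') -:&> words     NB. 0 1 0 : -: tolerates unequal lengths
```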








Skip Cave
Cave Consulting LLC

On Wed, Feb 21, 2018 at 2:36 AM, Skip Cave <[email protected]>
wrote:

I read in a text file of word vectors using fread. The format looks like
this:

bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...

bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...

belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...

belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...

Everything is literal text.

The basic layout for each line is:

word(s) (could contain multiple words separated by underscores) space
number (positive or negative) in text format space number (positive or
negative) in text format space
......   repeat for 300 numbers (in text)

the last number is followed by a line feed for the next line

I need to:
1. Convert all the high minuses (-) to J's low minus (_)
2. Extract the word(s) up to the first space into a separate array (words)
3. Convert the text numbers into a 2D array of ? x 300 floating point numbers

I know how to do #1 (string replace), and #3 (".) once I get rid of
the words, but I don't know how to strip out the initial word on each
line and put them in a separate array. Any help is appreciated.
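
For reference, the expressions tried later in the thread can be exercised on a tiny stand-in string. This is a sketch under assumptions: txt is made-up data with only two numbers per line, and every line is assumed to end in a single LF (a file with CRLF line ends would need the CR removed first).

```j
txt =: 'bell 0.0264 -0.2927', LF, 'belt 0.1332 0.0142', LF

NB. Cut on the final character (LF); keep each line up to its first space.
words =: <@({.~ i.&' ');._2 txt

NB. Dyadic ". accepts - as a negative sign and yields _ for the
NB. non-numeric word; 0 1 }. then drops that first column.
vec =: 0 1 }. _&".;._2 txt

$ words                  NB. 2
$ vec                    NB. 2 2
```

Because dyadic ". already accepts the high minus, step 1 may not be needed as a separate pass.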

Skip

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
