Here txt is a set of lines from your example, with the trailing ... removed;
here are the first two:
    ,.2{.txt
+----------------------------------------------------------------------+
|bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210     |
+----------------------------------------------------------------------+
|bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 |
+----------------------------------------------------------------------+

This separates words from numeric vectors of arbitrary length:
   ( i.&' ' ({.;0 ". }.)] ) every txt
+----------+-----------------------------------------------------------+
|bell      |0.0264 _0.2927 _0.0254 _0.1034 0.1672 _0.044 _0.0019 0.121 |
+----------+-----------------------------------------------------------+
|bell_tower|_0.1252 _0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 _0.0237 |
+----------+-----------------------------------------------------------+
|belt      |0.1332 0.0142 _0.1208 _0.0574 0.1451 _0.0731 _0.1293 0.0855|
+----------+-----------------------------------------------------------+
|belfast   |0.119 _0.044 _0.0254 _0.209 0.2144 0.0348 _0.1467 0.1256   |
+----------+-----------------------------------------------------------+
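In case the fork is hard to read, here is a rough step-by-step sketch of what it does to a single line. The names line and n are only for illustration, and the sample line is shortened:

```j
   line =. 'bell 0.0264 -0.2927 -0.0254'
   n =. line i. ' '     NB. index of the first space (4 here)
   n {. line            NB. take the first n characters: the word
   0 ". n }. line       NB. drop the word, then convert the rest to numbers
```

Dyadic ". accepts the ordinary minus sign in the text, so no separate replacement of - with _ should be needed; its left argument (0) is the value substituted for anything that does not parse as a number.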

It should be easy enough to split off the first column as a word-list,
and the second as a vector of vectors.
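A minimal sketch of that split, assuming the boxed table is saved under a name such as parsed (the names parsed, words and vecs are mine, and the sample lines are shortened):

```j
   txt    =. 'bell 0.0264 -0.2927' ; 'bell_tower -0.1252 -0.1233'
   parsed =. ( i.&' ' ({.;0 ". }.)] ) every txt   NB. N-by-2 table of boxes
   words  =. {."1 parsed      NB. first box of each row: boxed list of words
   vecs   =. > {:"1 parsed    NB. last box of each row, opened into a numeric table
   $ vecs                     NB. 2 2 for this sample
```

Opening with > pads shorter rows with zeros, so if every line really has 300 numbers, vecs should come out as an N by 300 table.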

OK?

Mike

On 21/02/2018 08:36, Skip Cave wrote:
I read in a text file of word vectors using fread. The format looks like
this:

bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...

bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...

belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...

belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...

Everything is literal text.

The basic layout for each line is:

word(s) (could contain multiple words separated by underscores)
space
number (positive or negative) in text format
space
number (positive or negative) in text format
space
......   repeat for 300 numbers (in text)

the last number is followed by a line feed for the next line

I need to:
1. Convert all the high minuses (-) to J's low minus (_)
2. Extract the word(s) up to the first space into a separate array (words)
3. Convert the text numbers into a 2D array of ? x 300 floating point
numbers

I know how to do #1 (string replace), and #3 (".) once I get rid of the
words,
but I don't know how to strip out the initial word on each line and put
them in a separate array. Any help is appreciated.

Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

