Another suggestion, using some of J's built-in utilities:

dat=: freads 'yourfile.txt'

labels=: <@(' '&taketo);._2 dat
numbers=: _ ". (' '&takeafter);._2 dat
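
To see what the two lines do, here is a minimal sketch on inline data rather than a file. The two sample rows are shortened from Skip's example, and building dat by hand with LF is my assumption about what freads returns (the file text with every line ending in LF). taketo and takeafter come from the standard library loaded by the default profile.

```j
NB. Hypothetical stand-in for the file contents: two LF-terminated lines
dat=: 'bell 0.0264 -0.2927 -0.0254', LF, 'bell_tower -0.1252 -0.1233 0.1351', LF

NB. ;._2 cuts dat at each LF (the last character) and applies the verb to every line
labels=: <@(' '&taketo);._2 dat         NB. boxed text before the first space
numbers=: _ ". (' '&takeafter);._2 dat  NB. text after the first space, as numbers

$ labels     NB. 2    (one box per line)
$ numbers    NB. 2 3  (lines by numbers per line)
```

Dyadic ". accepts - as a negative sign, so the string replace for step 1 should not be needed. The left argument _ is the value substituted for anything that fails to convert.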

HTH
Ric

On Wed, Feb 21, 2018 at 9:57 PM, 'Mike Day' via Programming wrote:

> txt here is a set of lines from your example with trailing ... removed;
> here are the first two:
>     ,.2{.txt
> +----------------------------------------------------------------------+
> |bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210     |
> +----------------------------------------------------------------------+
> |bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 |
> +----------------------------------------------------------------------+
>
> This separates words from numeric vectors of arbitrary length:
>    ( i.&' ' ({.;0 ". }.)] ) every txt
> +----------+-----------------------------------------------------------+
> |bell      |0.0264 _0.2927 _0.0254 _0.1034 0.1672 _0.044 _0.0019 0.121 |
> +----------+-----------------------------------------------------------+
> |bell_tower|_0.1252 _0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 _0.0237 |
> +----------+-----------------------------------------------------------+
> |belt      |0.1332 0.0142 _0.1208 _0.0574 0.1451 _0.0731 _0.1293 0.0855|
> +----------+-----------------------------------------------------------+
> |belfast   |0.119 _0.044 _0.0254 _0.209 0.2144 0.0348 _0.1467 0.1256   |
> +----------+-----------------------------------------------------------+
>
> It should be easy enough to split off the first column as a word-list,
> and the second as a vector of vectors.
>
> OK?
>
> Mike
>
> On 21/02/2018 08:36, Skip Cave wrote:
>
>> I read in a text file of word vectors using fread. The format looks like
>> this:
>>
>> bell 0.0264 -0.2927 -0.0254 -0.1034 0.1672 -0.0440 -0.0019 0.1210 ...
>>
>> bell_tower -0.1252 -0.1233 0.1351 0.1897 0.0242 0.0014 0.1942 -0.0237 ...
>>
>> belt 0.1332 0.0142 -0.1208 -0.0574 0.1451 -0.0731 -0.1293 0.0855 ...
>>
>> belfast 0.1190 -0.0440 -0.0254 -0.2090 0.2144 0.0348 -0.1467 0.1256 ...
>>
>> Everything is literal text.
>>
>> The basic layout for each line is:
>>
>> word(s) (could contain multiple words separated by underscores)
>> space
>> number (positive or negative) in text format
>> space
>> number (positive or negative) in text format
>> space
>> ......   repeat for 300 numbers (in text)
>>
>> the last number is followed by a line feed for the next line
>>
>> I need to:
>> 1. Convert all the high minuses (-) to J's low minus (_)
>> 2. Extract the word(s) up to the first space into a separate array (words)
>> 3. Convert the text numbers into a 2D array of ? x 300 floating point numbers
>>
>> I know how to do #1 (string replace), and #3 (".) once I get rid of the
>> words, but I don't know how to strip out the initial word on each line
>> and put them in a separate array. Any help is appreciated.
>>
>> Skip
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
