This was my original code, which ran 'forever'. Were the amendments
truly done in place? One array is sparse, the other preallocated. The
comments record what I observed in Task Manager. I don't have much
experience with Windows beyond the usual Office programs. (PS. I now
realize I could have discarded the "T" and stored the remaining number in the sparse array.)
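The idea in the PS can be sketched in J as follows. The sketch makes three assumptions. First, every non-empty categorical field is a "T" followed by an integer. Second, _1 (the sparse element, as in the code) never occurs as a value. Third, the names `fields`, `vals` and `sp` are hypothetical and for illustration only. Each "T" is dropped with `}.` and the rest converted with `".`, so the sparse array would hold the numbers themselves instead of positions into a boxed list.

```j
NB. hypothetical non-empty fields (ID already dropped) of one row
fields=: 'T1';'T16777232';'T5'
NB. drop the leading T from each field and convert; , gives a flat list
vals=: , 0 ".&> }.&.> fields
NB. 1 by 4 sparse array with sparse element _1, as in the original code
sp=: 1 $. 1 4 ; 0 1 ; _1
NB. store the numbers directly at row 0, columns 1 2 3
sp=: vals (< 0 ; 1 2 3)} sp
$.^:_1 sp   NB. dense view for checking: _1 1 16777232 5
```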
NB. According to Task Manager, 7 GB was free (of 16 GB)
NB. and the J process steadily fluctuated by 10 MB.
NB. Is this a mapped-file issue on Windows 10,
NB. or were the amendments not done in place?
NB. File details:
NB. non-empty fields relative to rows * columns
NB. 67653078 *inv 1183748 * 2141
NB. 37.4618 cells per non-empty field, i.e. about 2.67 percent filled
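NB. Arithmetic check, a sketch. It assumes 67653078 counts the non-empty
NB. fields and 1183748 * 2141 is lines * columns; fill is a hypothetical
NB. name. J evaluates right to left, so this divides fields by cells:
fill=: 67653078 % 1183748 * 2141   NB. 0.0266939, about 2.67 percent
NB. % fill is 37.4618, the value reported by the *inv expression above.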
NB. c:/Users/user/Downloads/j904_win64/j904/bin/jconsole.exe
NB. JVERSION
NB. Engine: j904/j64avx/windows
NB. Beta-e: commercial/2022-07-16T19:25:02
NB. Library: 9.04.03
NB. Platform: Win 64
NB. Installer: J904 install
NB. InstallPath: c:/users/user/downloads/j904_win64/j904
NB. Contact: www.jsoftware.com
require 'jmf'
testfile=:'c:/Users/user/temp/tc.csv'
datafile=:'c:/Users/user/ZW/kaggle.com/bosch-production-line-performance/train_categorical.csv'
NB. INF {~ 0 indexes rows
NB. gets data of first row
indexes=: (>:@{. + [: i.@<: -~/)@({~ 0 1&+)~ NB. x indexes y: character positions of line x, given LF positions y
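NB. Sketch: an explicit restatement of indexes, for reading only.
NB. indexes_ex, demo and demorows are hypothetical names, not part of
NB. the original code. y holds the LF positions prefixed by _1, and x
NB. is a line number. The result is the character positions strictly
NB. between the two LFs that bracket line x, so it excludes the LF.
indexes_ex=: 4 : 0
b=. (x + 0 1) { y        NB. LF positions bracketing line x
(>: {. b) + i. <: -~/ b  NB. positions strictly between them
)
demo=: 'ab,c',LF,'d,,e',LF
demorows=: _1 , I. LF = demo   NB. _1 4 9
0 indexes_ex demorows          NB. 0 1 2 3
demo {~ 1 indexes_ex demorows  NB. d,,e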
tokenize=: 3 :0 NB. y is the file contents as a literal (character) string
rows=. _1 , I. LF = y
row_tally=. <: # rows
row=. col=. 0
k=. _1 NB. current data index
col_tally=. >: +/ ',' = y {~ 0 indexes rows NB. tally of columns
data=: a: $~ col_tally + +/ 'T' = y NB. columns + fields with data, skipping ID
NB. coor shall be sparse
NB. coor=. ((<: # rows) , col_tally) $ _1
coor=: 1 $. ((<: # rows) , col_tally) ; 0 1 ; _1 NB. coordinates of data
while. row < 9 >. row_tally do.
fields=. ([: <;._2 ,&',') y {~ row indexes rows
cols=. }. I. a: ~: fields NB. indexes of data in row excluding ID
po=. (>: k) + i. # cols NB. positions of these items in data
co=. < row ; cols NB. location in sparse array to store po
da=. cols { fields
NB. coor is sparse
coor=: po co} coor NB.NB.NB. assignments in place?
NB. data is preallocated
data=: da po} data NB.NB.NB. assignments in place?
k=. k + # cols
row=. >: row
end.
'data and coor are global'
)
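NB. Sketch of one pass of the loop body above on a hypothetical line.
NB. demoline, demofields, democols, demosp, demodp and the 2 5 shape are
NB. illustrative assumptions. It shows where the positions land in the
NB. sparse array and where the field text lands in the preallocated list.
demoline=: '7,,T5,,T9'
demofields=: ([: <;._2 ,&',') demoline  NB. 7 | empty | T5 | empty | T9
democols=: }. I. a: ~: demofields       NB. 2 4: non-empty fields, ID dropped
demosp=: 1 $. 2 5 ; 0 1 ; _1            NB. sparse 2 by 5, sparse element _1
demosp=: 0 1 (< 0 ; democols)} demosp   NB. store positions 0 1 in row 0
demodp=: a: $~ 4                        NB. preallocated boxed list
demodp=: (democols { demofields) 0 1} demodp  NB. store the field text
$.^:_1 demosp                           NB. dense view: row 0 is _1 _1 0 _1 1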
JCHAR map_jmf_'INF';testfile ] datafile NB. testfile ] datafile selects datafile
tokenize INF
unmap_jmf_'INF'