Hi Samuel,

Please check the test code below, which compares csvRead with mfscanf and
fscanfMat for the ASCII format used by Philipp, on a file with 50,000 lines of data.
On my laptop it takes about 35 s to run, mainly because of the evstr function,
which is avoided in the mfscanf and fscanfMat methods as shown.

// Simple test of mfscanf, fscanfMat and csvRead methods
txt = [ "HEADER-Line",
"01.12.2015, 01:15:00.12, 1.1, -2.2"];

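u = mopen("myfile.txt","w");
mfprintf(u,"%s\n",txt(1));                  // header line, skipped by all three readers
mfprintf(u,"%s\n",repmat(txt(2),50000,1));  // output file with 50,000 lines of data
mclose(u);                                  // flush and close before reading the file back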


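// SOLUTION#1: mfscanf
timer();  // reset the CPU timer so that t1 covers this solution only
u = mopen("myfile.txt","r");
h = mfscanf(1,u,"%s\n");  // read and discard the header line
r = mfscanf(-1,u,"%d.%d.%d, %d:%d:%d.%d, %f, %f\n");
mclose(u);
r = r(:,:);  //to convert from mlist of ctype to matrix of constant type
t1 = timer();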

// SOLUTION#2: fscanfMat
u = mopen("myfile.txt","r");
tx = mgetl(u,-1);          // read all lines (semicolon avoids echoing 50,000 lines)
mclose(u);
tx = tx(2:$);              // get rid of header line
tx1 = part(tx,1:24);  // get date and time
tx2 = part(tx,25:$);  // get numeric data values
// Now get rid of separators:
tx1 = strsubst(tx1,'.',' ');
tx1 = strsubst(tx1,':',' ');
tx1 = strsubst(tx1,',',' ');
tx2 = strsubst(tx2,',',' ');
tx = tx1 + tx2; // regroups all data but now with numeric values only
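fd = mopen("temp.bak","w");
mputl(tx,fd);   // write the cleaned-up, purely numeric lines to the temporary file
mclose(fd);     // close it so that fscanfMat can read the complete file
m = fscanfMat('temp.bak');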
t2 = timer();

// SOLUTION#3: csvRead
q = csvRead("myfile.txt",",",[],"string",[":", ","],[],[],1);
tx1 = q(:,1);
tx2 = q(:,2:$);
q2 = evstr(tx2);  // Most time consuming step
// (plus, work will still be required to handle dates in tx1)
t3 = timer();

disp( [tx1(1,:) string(q2(1,:))], r(1,:), m(1,:) )
printf("\ntime1= %g\ntime2= %g\ntime3= %g",t1,t2,t3)

The results (CPU time in seconds) for a 50,000-line input ASCII file are:
   time1= 0.686404
   time2= 0.499203
   time3= 35.3966
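
A possible variation of SOLUTION#3, not timed here and offered only as a sketch:
since evstr evaluates each string as a Scilab expression, it may be worth trying
strtod instead, which just converts strings to doubles. This assumes that every
field in tx2 is a plain decimal number; stripblanks is used in case csvRead keeps
the blanks that follow each separator.

// SOLUTION#3b (untested variation): csvRead + strtod instead of evstr
q = csvRead("myfile.txt",",",[],"string",[":", ","],[],[],1);
tx2 = q(:,2:$);
q2 = strtod(stripblanks(tx2));  // assumption: all entries are plain numbers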


From: users [mailto:users-boun...@lists.scilab.org] On Behalf Of Samuel Gougeon
Sent: Saturday, October 15, 2016 7:36 PM
To: Users mailing list for Scilab <users@lists.scilab.org>
Subject: Re: [Scilab-users] using csvRead

On 15/10/2016 19:16, Rafael Guerra wrote:
Hi Samuel,
As the data is loaded by csvRead as strings in the example below (if loading as 
doubles then we get NaN's), it will require further processing to convert it to 
numeric (using evstr, tokens or other).
For very large data files, this seems to be rather slow compared to the mfscanf 
or fscanfMat solutions.
What do you think?

AFAIK, fscanfMat() is very rigid: it can only parse files that contain numbers,
with no interstitial content.
I know of no benchmark comparing csvRead() + evstr() with mfscanf(). Although
evstr() is vectorized, you may be right. Explicit results would be interesting.
mfscanf() requires the structure of a row to be explicitly known, but then it
certainly looks like the most versatile and adaptable solution for reading and
splitting rows.
csvRead() only requires knowing the separator.
