Re: [Scilab-users] [EXT] parsing TSV (or CSV) file with scilab is a nightmare

2020-04-28 Thread Rafael Guerra
Antoine,

One workflow that works fast for me on large data files: first load the whole 
file with mgetl, remove all empty lines using isempty in a loop (as shown 
below), process the header block, isolate the data block and save it to a 
temporary file on disk with mputl, then load that temporary file very 
efficiently with fscanfMat.

// Helper, to be defined before it is called in a script
function out_text=cellfun(fun, in_text)
// Applies function to input text (column strings vector), line by line
  n=size(in_text,1);
  for i=1:n
    out_text(i)=fun(in_text(i));
  end
endfunction

tlines= mgetl(fid,-1);  // reads lines until end of file into a 1-column text vector
bool= ~cellfun(isempty,tlines);
tlines= tlines(bool);   // removes empty lines
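
The remaining steps of the workflow (isolating the data block, mputl, then 
fscanfMat) could look roughly like the sketch below. Everything specific in it 
is an assumption made for illustration: the sample file written to TMPDIR, the 
file names, and the header/footer sizes (2 non-empty header lines, 1 footer 
line). The empty lines are dropped here with a vectorized test on length() 
rather than a loop.

```scilab
// Build a small, made-up TSV file so the sketch is self-contained
tab = ascii(9);
l1 = "1" + tab + "10";
l2 = "2" + tab + "20";
l3 = "3" + tab + "30";
src = fullfile(TMPDIR, "sample.tsv");            // hypothetical input file
mputl(["Instrument: demo"; ""; "x y"; ""; l1; l2; l3; ""; "End of file"], src);

tlines = mgetl(src);                     // whole file as a column of strings
tlines = tlines(length(tlines) > 0);     // keep only the non-empty lines

// Assumption: 2 non-empty header lines and 1 footer line
nhead = 2;
nfoot = 1;
datalines = tlines(nhead+1 : $-nfoot);   // isolate the numeric data block

tmp = fullfile(TMPDIR, "datablock.txt"); // hypothetical temporary file
mputl(datalines, tmp);                   // save the data block to disk
M = fscanfMat(tmp);                      // load it back as a numeric matrix
mdelete(tmp);                            // clean up the temporary file
```

With real files, nhead and nfoot would have to be found from the header 
content rather than hard-coded.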


Regards,
Rafael
___
users mailing list
users@lists.scilab.org
http://lists.scilab.org/mailman/listinfo/users


Re: [Scilab-users] [EXT] parsing TSV (or CSV) file with scilab is a nightmare

2020-04-27 Thread Antoine Monmayrant

Hello Adrian,


In essence, your extremely useful solution is similar to what Samuel and 
Jan proposed: grab the whole file once.
I must admit I did not even consider it, given the length of the files 
involved and how easily I managed to crash Scilab on small files.



Thanks,


Antoine

On 27/04/2020 18:58, Adrian Weeks wrote:


Hi Antoine,

I often have to read csv files with odd lines that trip functions like 
csvRead so I often use the method below.  It may solve your problem.


dataread = mgetl(readfile);    // Read everything
a = [];
b = [];
…
for i = 1: size(dataread, 'r') do
    line = dataread(i);
    if length(line) ~= 0 then    // Ignore blank lines
        line = tokens(line, [' ', ',', ascii(9)]);    // Accept spaces, commas or tabs
        if and(isnum(line)) then    // If the line is all-numeric
            line = strtod(line);
            a = [a; line(1)];
            b = [b; line(2)];
            …
        end
    end
end

Adrian Weeks
Development Engineer, Hardware Engineering EMEA
Office: +44 (0)2920 528500 | Desk: +44 (0)2920 528523 | Fax: +44 
(0)2920 520178

awe...@hidglobal.com 

Unit 3, Cae Gwyrdd,
Green meadow Springs,
Cardiff, UK,
CF15 7AB.
www.hidglobal.com 

From: users On Behalf Of Antoine Monmayrant
Sent: 27 April 2020 16:41
To: Users mailing list for Scilab
Subject: [EXT] [Scilab-users] parsing TSV (or CSV) file with scilab is a nightmare



Hi all,

This is both a rant and a desperate cry for help.
I'm trying to parse some TSV data (tab-separated data files) with 
Scilab and I cannot find a way to navigate around the minefield of 
bugs present in meof/mgetl/mgetstr/csvRead.


A bit of context: I need to load into Scilab data generated by 
closed-source software.
The data is in the form of many TSV files (that I cannot share in 
full, just some redacted bits) with a header and a footer.
I don't want to hand-modify these files or edit them in any way (I 
need to keep this as portable as possible, so no sed/awk/grep...)



OPTION 1: csvRead

That's the most intuitive solution. However, because of 
http://bugzilla.scilab.org/show_bug.cgi?id=16391 
and the presence of more than one empty line in my header/footer, this 
crashes Scilab.



OPTION 2: hand parsing line by line using mgetl/meof

I tried:

filename="tsv.txt";
[fd, err] = mopen(filename, 'rt');
while ~meof(fd) do
    txtline=mgetl(fd,1);
end
mclose(fd)

Sadly, and contrary to what's written in "help mgetl", meof keeps on 
returning 0 well past the end of the file, and the while loop never ends!



OPTION 3: hand parsing chunk by chunk using mgetstr/meof

"help meof" does not confirm that meof should work with mgetl, but 
mgetstr is specifically listed.

I thus tried:

filename="tsv.txt";
[fd, err] = mopen(filename, 'rt');
while ~meof(fd) do
    txtchunk=mgetstr(80,fd);
end
mclose(fd)

But thanks to http://bugzilla.scilab.org/show_bug.cgi?id=16419 
this also crashes Scilab.



OPTION 4: Can anyone here help me with this?

I am really running out of ideas.
Did I miss some -hmm- obvious combination of the available Scilab 
file-parsing functions to achieve my goal?
I have the feeling that it would have been faster for me to just learn 
a totally new language that does not suck at parsing files than to try 
to get it to work with Scilab.


Antoine

(depressed)



Re: [Scilab-users] [EXT] parsing TSV (or CSV) file with scilab is a nightmare

2020-04-27 Thread Adrian Weeks
Hi Antoine,

I often have to read csv files with odd lines that trip functions like csvRead 
so I often use the method below.  It may solve your problem.

dataread = mgetl(readfile);    // Read everything
a = [];
b = [];
…
for i = 1: size(dataread, 'r') do
    line = dataread(i);
    if length(line) ~= 0 then    // Ignore blank lines
        line = tokens(line, [' ', ',', ascii(9)]);    // Accept spaces, commas or tabs
        if and(isnum(line)) then    // If the line is all-numeric
            line = strtod(line);
            a = [a; line(1)];
            b = [b; line(2)];
            …
        end
    end
end
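
For anyone who wants to try the idea on a throwaway file, here is a minimal 
sketch of the same method wrapped in a function. The function name 
read_numeric_rows and the sample file are made up for illustration, and the 
sketch assumes that every all-numeric line has the same number of columns 
(otherwise the row concatenation fails). It also skips lines that contain only 
separators, for which tokens() returns an empty matrix.

```scilab
function M = read_numeric_rows(fname)
    // Hypothetical helper: one matrix row per all-numeric line of fname
    txt = mgetl(fname);                              // read everything
    M = [];
    for i = 1:size(txt, "r")
        if length(txt(i)) == 0 then
            continue;                                // ignore blank lines
        end
        tok = tokens(txt(i), [" ", ",", ascii(9)]);  // spaces, commas or tabs
        if size(tok, "*") == 0 then
            continue;                                // line held only separators
        end
        if and(isnum(tok)) then                      // all-numeric line
            M = [M; strtod(tok)'];                   // tokens() gives a column, store as a row
        end
    end
endfunction

// Made-up sample file mixing separators, blank lines and text lines
tab = ascii(9);
tabbed = "3" + tab + "4";
fname = fullfile(TMPDIR, "odd.csv");
mputl(["# header"; ""; "1,2.5"; tabbed; "5 6"; "footer text"], fname);

M = read_numeric_rows(fname);
disp(M);
```

Growing M row by row is simple but can get slow on very large files; in that 
case preallocating, or writing the numeric lines back to disk and using 
fscanfMat, may be worth trying.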


Adrian Weeks
Development Engineer, Hardware Engineering EMEA
Office: +44 (0)2920 528500 | Desk: +44 (0)2920 528523 | Fax: +44 (0)2920 520178
awe...@hidglobal.com
Unit 3, Cae Gwyrdd,
Green meadow Springs,
Cardiff, UK,
CF15 7AB.
www.hidglobal.com

