Re: [Scilab-users] Why so slow?
Hi,

The code you provide does not run on my Scilab 5.5.0. Anyway, I would read the matrix only once, as a string, and then use csvTextScan on mydat(:,2:6) to convert that part to double. That already saves time. In your loop I believe strtod is extremely slow; again, use csvTextScan instead.

Hope this helps,
Adrien

On 20/05/2014 14:44, Richard Llom wrote:
> [...]

--
Adrien Vogt-Schilb
Consultant (World Bank) and PhD Candidate (Cired)
1 202 473 7980
_______________________________________________
users mailing list
users@lists.scilab.org
http://lists.scilab.org/mailman/listinfo/users
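Adrien's suggestion might be sketched like this (untested against the original file; it assumes the same file name and the 6 header lines from Richard's post, and that csvTextScan accepts the string sub-matrix — if it insists on a vector of lines, a vectorized strtod(strsubst(..., ',', '.')) performs the same conversion):

```
// Read the file once, as strings (one cell per field).
tic();
s = csvRead('dat04-2011.csv', ';', ',', 'string', [], [], [], 6);
// Convert the numeric columns in one vectorized call
// instead of calling strtod inside a loop.
mydat = csvTextScan(s(:, 2:6), ';', ',', 'double');
mystring = s(:, 1);  // keep the date column as text for later parsing
toc()
```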
Re: [Scilab-users] Why so slow?
I had a whole slew of CSV problems too - over a year ago. Guess they haven't been fixed yet.

Michael Dunn | Editor, EDN Design Ideas

-----Original Message-----
From: Richard Llom
Reply-To: "International users mailing list for Scilab."
Date: Tuesday, May 20, 2014 2:44 PM
To: "users@lists.scilab.org"
Subject: [Scilab-users] Why so slow?

> [...]
[Scilab-users] Why so slow?
Hello,

I need to read in a csv of about 360.000 lines with date and numerical values. Attached is a sample excerpt of that file.

So far I did:

CODE
// read in
tic
mydat = csvRead('dat04-2011.csv', ';', ',', 'double', [], [], [], 6);
toc (= 5,213 secs)
mydat = mydat(:,2:6);

tic
mystring = csvRead('dat04-2011.csv', ';', ',', 'string', [], [], [], 6);
toc (= 3,077 secs)
mystring = mystring(:,1);

tic
for i=1:size(mydat,1)
    mydate(i,:) = strtod(strsplit(mystring(i,1),['.';' ';':']))';
end
toc (= 186,473 secs)
CODE

(I filled in the toc values.)

As you can see, this is unfortunately very slow: the read-in of the csv, but especially the for loop.

So I have several questions:

1) Is there a faster way to read in the csv? Note that I need the 'header' option.

2) Instead of the loop I would like to use
mydate = strtod(strsplit(mystring(:,1),['.';' ';':']))';
but this doesn't work. Is there another way to avoid the loop?

3) The raw csv file is around 15MB, but when I want to read it in a second time, Scilab says this will exceed the stacksize, which defaults to 76MB. I don't quite understand how two reads of the 15MB file take so much memory. I raised the stacksize for now, but I would rather not.

Any help is appreciated.
Thanks!
Richard

# S
# Parame
# Unit:
# Titles: Timity
# Data:
03.01.2004 14:20;9,33;6,96;11,1;0,75;2
05.01.2004 13:40;8,58;7,34;9,56;0,38;2
10.01.2004 13:10;7,33;6,19;8,79;0,58;2
13.01.2004 06:10;16,07;12,92;20,62;1,27;2
25.01.2004 18:20;4,15;3,88;4,46;0,15;2
15.02.2004 00:30;3,49;3,11;3,78;0,17;2
27.02.2004 03:10;8,33;7,34;9,46;0,36;2
15.03.2004 08:50;15,04;13,31;17,16;0,49;2
19.03.2004 06:00;14,4;13,02;15,62;0,38;2
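Regarding question 2, one way to avoid the loop might be this (a sketch only, not tested against the full file): strsplit accepts only a single string, which is why the vectorized call fails, but msscanf can apply one format to every row of a string vector in a single call, so the date column can be parsed without looping:

```
// mystring holds rows like "03.01.2004 14:20"
// niter = -1 applies the format to every row of the vector at once.
mydate = msscanf(-1, mystring, '%d.%d.%d %d:%d');
// mydate should then be an n-by-5 matrix: day, month, year, hour, minute.
```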