I cannot send the CSV files because they contain private data currently being used for research. To give an idea of the scale, I'm parsing files ranging from 800,000 to 19 million lines.
I'm using http://www.squeaksource.com/SimpleTextParser.html, which is based on http://www.squeaksource.com/CSV.html plus some additions that are useful for me.

2010/12/7 Benoit St-Jean <[email protected]>:
> What are you using to read those CSV files? Do you have a file so we can
> have a look at it and possibly speed up the reading of the CSV file?
>
> -----------------
> Benoit St-Jean
> A standpoint is an intellectual horizon of radius zero.
> (Albert Einstein)
>
>> Date: Mon, 6 Dec 2010 16:54:37 -0300
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [Pharo-users] Fastest matrix implementation?
>>
>> Hi Benoit,
>>
>> I've loaded the package but it seems the port is not complete, i.e. if
>> you evaluate:
>>
>> DhbMatrix new: 10
>>
>> you will get a MessageNotUnderstood: Interval>>asVector because
>> extension methods were not ported. I uploaded to the SqueakSource a
>> new version including extension methods and now most tests pass.
>>
>> Concerning the performance issues, I've narrowed my code to only
>> measure the writing and reading of a matrix of 710500 elements,
>> resulting in 58239 milliseconds for the native Matrix implementation
>> and 56920 for DhbMatrix.
>> It seems my performance problem involves reading and parsing a "CSV" file
>>
>> Elements   Matrix   DhbMatrix
>> 53400      18274    17329
>> 175960     61043    60722
>> 710500     379276   385278
>>
>> I will check if it's worth to implement a primitive for very fast
>> parsing of CSV files.
>> Cheers,
>>
>> 2010/12/5 Benoit St-Jean <[email protected]>:
>> > Have you tried the matrix implementation in the numerical package from
>> > Didier H. Besset?
>> >
>> > http://squeaksource.com/@Q45T_l348Ag07gGT/VMsGzidC
>> >
>> > -----------------
>> > Benoit St-Jean
>> > A standpoint is an intellectual horizon of radius zero.
>> > (Albert Einstein)
>> >
>> >> Date: Sun, 5 Dec 2010 17:33:17 -0300
>> >> From: [email protected]
>> >> To: [email protected]
>> >> Subject: [Pharo-users] Fastest matrix implementation?
>> >>
>> >> Hi list
>> >>
>> >> In the context of a scientific project here we are building big
>> >> matrices for later processing, mostly exporting to custom file formats
>> >> for PLINK, HaploView, etc (bioinformatics tools). I've tested one of
>> >> our scripts in both Pharo 1.1 (not CogVM) with the corresponding
>> >> Python 2.6 implementation (without PyPy), and the performance in
>> >> Python was superior, about 8x faster than ST.
>> >> So I wonder if anyone knows the fastest (or a faster) implementation
>> >> of Matrix than the included by default in Collections?
>> >>
>> >> Cheers,
>> >>
>>
>> --
>> Hernán Morales
>> Information Technology Manager,
>> Institute of Veterinary Genetics.
>> National Scientific and Technical Research Council (CONICET).
>> La Plata (1900), Buenos Aires, Argentina.
>> Telephone: +54 (0221) 421-1799.
>> Internal: 422
>> Fax: 425-7980 or 421-1799.
>>
>

--
Hernán Morales
Information Technology Manager,
Institute of Veterinary Genetics.
National Scientific and Technical Research Council (CONICET).
La Plata (1900), Buenos Aires, Argentina.
Telephone: +54 (0221) 421-1799.
Internal: 422
Fax: 425-7980 or 421-1799.
