I cannot send the CSV files because they are private data currently
being used for research. To give an idea of the scale, I'm parsing
files of 800,000 to 19 million lines.

I'm using http://www.squeaksource.com/SimpleTextParser.html, which is
based on http://www.squeaksource.com/CSV.html plus some additions that
are useful for me.

2010/12/7 Benoit St-Jean <[email protected]>:
> What are you using to read those CSV files?  Do you have a file so we can
> have a look at it and possibly speed up the reading of the CSV file?
>
> -----------------
> Benoit St-Jean
> A standpoint is an intellectual horizon of radius zero.
> (Albert Einstein)
>
>
>
>
>> Date: Mon, 6 Dec 2010 16:54:37 -0300
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: [Pharo-users] Fastest matrix implementation?
>>
>> Hi Benoit,
>>
>> I've loaded the package but it seems the port is not complete. For
>> example, if you evaluate:
>>
>> DhbMatrix new: 10
>>
>> you will get a MessageNotUnderstood: Interval>>asVector because the
>> extension methods were not ported. I uploaded a new version to
>> SqueakSource that includes the extension methods, and now most tests
>> pass.
>>
>> Concerning the performance issues, I've narrowed my code to measure
>> only the writing and reading of a matrix of 710500 elements, which
>> took 58239 milliseconds with the native Matrix implementation and
>> 56920 with DhbMatrix.
>> It seems my performance problem involves reading and parsing a "CSV" file.
>>
>> Elements   Matrix   DhbMatrix
>>    53400    18274       17329
>>   175960    61043       60722
>>   710500   379276      385278
>>
>> I will check whether it's worth implementing a primitive for very
>> fast parsing of CSV files.
>> Cheers,
>>
>> 2010/12/5 Benoit St-Jean <[email protected]>:
>> > Have you tried the matrix implementation in the numerical package from
>> > Didier H. Besset?
>> >
>> > http://squeaksource.com/@Q45T_l348Ag07gGT/VMsGzidC
>> >
>> >
>> >
>> >
>> > -----------------
>> > Benoit St-Jean
>> > A standpoint is an intellectual horizon of radius zero.
>> > (Albert Einstein)
>> >
>> >
>> >
>> >
>> >> Date: Sun, 5 Dec 2010 17:33:17 -0300
>> >> From: [email protected]
>> >> To: [email protected]
>> >> Subject: [Pharo-users] Fastest matrix implementation?
>> >>
>> >> Hi list
>> >>
>> >> In the context of a scientific project here, we are building big
>> >> matrices for later processing, mostly exporting to custom file formats
>> >> for PLINK, HaploView, etc. (bioinformatics tools). I've compared one of
>> >> our scripts in Pharo 1.1 (not CogVM) with the corresponding
>> >> Python 2.6 implementation (without PyPy), and Python performed
>> >> better, about 8x faster than Smalltalk.
>> >> So I wonder if anyone knows of the fastest (or at least a faster)
>> >> implementation of Matrix than the one included by default in Collections?
>> >>
>> >> Cheers,
>> >>
>>
>> --
>> Hernán Morales
>> Information Technology Manager,
>> Institute of Veterinary Genetics.
>> National Scientific and Technical Research Council (CONICET).
>> La Plata (1900), Buenos Aires, Argentina.
>> Telephone: +54 (0221) 421-1799.
>> Internal: 422
>> Fax: 425-7980 or 421-1799.
>>
>



-- 
Hernán Morales
Information Technology Manager,
Institute of Veterinary Genetics.
National Scientific and Technical Research Council (CONICET).
La Plata (1900), Buenos Aires, Argentina.
Telephone: +54 (0221) 421-1799.
Internal: 422
Fax: 425-7980 or 421-1799.
