Hi Hernán,

On 06 Dec 2010, at 20:54, Hernán Morales Durand wrote:
> It seems my performance problem involves reading and parsing a "CSV" file
>
>	Elements	Matrix	DhbMatrix
>	 53400		 18274	 17329
>	175960		 61043	 60722
>	710500		379276	385278
>
> I will check if it's worth to implement a primitive for very fast
> parsing of CSV files.

I think that instead of going native, it would be worthwhile to try to optimize in Smalltalk first (it certainly is more fun).

I thought this was an interesting problem, so I tried writing some code myself, assuming the main problem is getting a CSV matrix into and out of Smalltalk as fast as possible. I simplified further by making the matrix square and containing only Numbers. I also preallocate the matrix and use the fact that I know the row/column sizes.

These are my results:

	Size	Elements	Read	Write
	250	 62500		1013	 7858
	500	250000		4185	31007
	750	562500		9858	71434

I think this is faster, but it is hard to compare. I am still a bit puzzled as to why the writing is slower than the reading, though.

The code is available at http://www.squeaksource.com/ADayAtTheBeach.html in the package 'Smalltalk-Hacking-Sven', class NumberMatrix.

This is the write loop:

writeCsvTo: stream
	1 to: size do: [ :row |
		1 to: size do: [ :column |
			column ~= 1 ifTrue: [ stream nextPut: $, ].
			stream print: (self at: row at: column) ].
		stream nextPut: Character lf ]

And this is the read loop:

readCsvFrom: stream
	| numberParser |
	numberParser := SqNumberParser on: stream.
	1 to: size do: [ :row |
		1 to: size do: [ :column |
			self at: row at: column put: numberParser nextNumber.
			column ~= size ifTrue: [ stream peekFor: $, ] ].
		row ~= size ifTrue: [ stream peekFor: Character lf ] ]

I am of course cheating a little bit, but should your CSV file be different, I am sure you can adapt the code (for example, to deal with quoting). I am also advancing the stream under the SqNumberParser to avoid allocating a new one for every number. I think this code generates little garbage.

What do you think ?

Sven
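[Editorial note: the technique in Sven's loops (preallocate a square numeric matrix, write commas between columns and an LF after each row, and fill the matrix in place on the way back in) can be sketched in Python for readers who don't run Smalltalk. The names write_csv/read_csv are illustrative, not from the ADayAtTheBeach package, and plain float parsing stands in for SqNumberParser.]

```python
import io

def write_csv(matrix, stream):
    # Analogue of writeCsvTo:: comma before every column but the first,
    # then a line feed after each row.
    n = len(matrix)
    for row in range(n):
        for column in range(n):
            if column != 0:
                stream.write(',')
            stream.write(repr(matrix[row][column]))
        stream.write('\n')

def read_csv(stream, n):
    # Analogue of readCsvFrom:: the n-by-n matrix is preallocated and
    # filled in place; one split per line stands in for advancing a
    # single number parser over the stream.
    matrix = [[0.0] * n for _ in range(n)]
    for row in range(n):
        fields = stream.readline().split(',')
        for column in range(n):
            matrix[row][column] = float(fields[column])
    return matrix
```

A round trip through an in-memory stream reproduces the original matrix, which is the property the benchmark above measures at larger sizes.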
