Ah, you beat me to it :-) Still, your implementation isn't loading the whole contents the way the Java version does.
The key issue is the conversion indeed.

Phil

On Tue, Mar 17, 2015 at 4:17 PM, Sven Van Caekenberghe <[email protected]> wrote:
>
>> On 17 Mar 2015, at 15:45, Stephan Eggermont <[email protected]> wrote:
>>
>> I tried it myself, Java seems to be 7 times faster on a 35 MB jfreechart.mse
>> file I found on GitHub. Moose 5.1 managed about
>> 30 MB/s.
>>
>> UTF8 is rather suboptimal for source code. Nearly all of it is
>> ASCII, which can be processed a machine word at a time instead of a byte at a time.
>> There were earlier discussions about that:
>> http://forum.world.st/Fastest-utf-8-encoder-contest-td4634566.html
>>
>> Stephan
>
> Thanks for the pointer to the file (finally!).
>
> Using this file:
> https://raw.githubusercontent.com/mircealungu/experiments-polymorphism/master/fileouts/jfreechart.mse
> which is indeed 35 MB, we can do better.
>
> Since
>
> (FileLocator desktop / 'jfreechart.mse') binaryReadStreamDo: [ :in |
>   in contents allSatisfy: [ :each | each < 127 ] ].
>
> is true, we can skip decoding.
>
> For me, it is pretty fast now
>
> [
>   | count |
>   count := 0.
>   (FileLocator desktop / 'jfreechart.mse') binaryReadStreamDo: [ :in |
>     in contents do: [ :each | count := count + 1 ] ].
>   count
> ] timeToRun.
>
> "0:00:00:00.637"
>
> Adding UTF8 decoding (implemented in Pharo) makes it 10x slower
>
> [
>   | count |
>   count := 0.
>   (FileLocator desktop / 'jfreechart.mse') binaryReadStreamDo: [ :in |
>     in contents utf8Decoded do: [ :each | count := count + 1 ] ].
>   count
> ] timeToRun. "0:00:00:07.45"
>
> HTH,
>
> Sven
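
For what it's worth, the "skip decoding when everything is ASCII" idea from Sven's message could be sketched in Pharo roughly as below. This is only a minimal sketch under assumptions: `decodeFast` is a name made up for illustration, and it relies on `ByteArray>>asString` mapping each byte to the Character with the same code, which is only safe when every byte is below 128 (Sven's `< 127` check is slightly stricter, which is harmless). I have not timed this on jfreechart.mse, so whether the extra scan pays off is untested.

```smalltalk
| decodeFast |

"Hypothetical helper: take the cheap byte-to-character path for pure ASCII,
 and fall back to the full UTF-8 decoder as soon as a high byte is present."
decodeFast := [ :bytes |
    (bytes allSatisfy: [ :each | each < 128 ])
        ifTrue: [ bytes asString ]
        ifFalse: [ bytes utf8Decoded ] ].

"Pure ASCII input takes the fast path."
decodeFast value: #[72 101 108 108 111].   "'Hello'"

"A two-byte UTF-8 sequence (e-acute) takes the slow path and decodes to one character."
decodeFast value: #[195 169].
```

Applied to the file from the thread, it would be used as `decodeFast value: ((FileLocator desktop / 'jfreechart.mse') binaryReadStreamDo: [ :in | in contents ])`. Note that the ASCII check is itself a full pass over the bytes, so for files that are mostly non-ASCII this would add work rather than save it.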
