Ok, I've tried this out.

First version
----------------

[
|length a|

length := 0.
1 to: 10 do: [ :index |
    ('Loop {1}' format: { index }) logCr.
    a := (FileLocator imageDirectory / 'javacomp' / 'jfreechart-0_9_0.mse')
        readStream contents.

    (ReadStream on: a) do: [ :c |
            length := length + 1.
         ].
    length asString logCr.
]] timeToRun
 0:00:00:22.33

That is over 22 seconds for 10 passes over the file, which is a lot of time.

Second version (streaming, less memory intensive)
---------------------------------------------------------------------

[
|length c|

length := 0.
1 to: 10 do: [ :index |
    ('Loop {1}' format: { index }) logCr.
    (FileLocator imageDirectory / 'javacomp' / 'jfreechart-0_9_0.mse')
        readStreamDo: [ :s |
        [ s atEnd ] whileFalse: [
            c := s next.
            length := length + 1.
            ]
         ].
    length asString logCr.
]] timeToRun
 0:00:00:03.683

About 3.7 seconds: already much better.


But profiling version 1 showed the issue: we are dealing with a multibyte
stream there.
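
To illustrate why a multibyte stream costs more per character, here is a
minimal Java sketch. It is not the actual Pharo implementation; the class and
method names are made up for illustration, and it assumes well-formed UTF-8.
With a variable-width encoding, the reader has to inspect each lead byte to
know how many bytes make up the next character, whereas a plain byte stream
just hands out bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of any library.
final class MultiByteSketch {

    // Number of bytes in the UTF-8 sequence that starts with this lead byte.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;               // 0xxxxxxx: plain ASCII
        if ((b & 0xE0) == 0xC0) return 2;     // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;     // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;     // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + b);
    }

    // Counts characters (code points) by walking the bytes one sequence at a
    // time. Assumes the input is well-formed UTF-8; continuation bytes are
    // skipped without being validated.
    static int countCodePoints(byte[] utf8) {
        int count = 0;
        int i = 0;
        while (i < utf8.length) {
            i += sequenceLength(utf8[i]);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "h", e-acute, "llo", space, euro sign
        byte[] bytes = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length + " bytes, "
                + countCodePoints(bytes) + " characters");
    }
}
```

For the sample string in main, 10 bytes turn into 7 characters. Doing that
kind of branching for every single character of a 10M file adds up.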


So, switching to a StandardFileStream gives

Third version (StandardFileStream)
----------------------------------

[
|length a|

length := 0.
1 to: 10 do: [ :index |
    ('Loop {1}' format: { index }) logCr.
    a := (StandardFileStream fileNamed: (FileLocator imageDirectory /
        'javacomp' / 'jfreechart-0_9_0.mse') pathString) readStream contents.

    a do: [ :c |
            length := length + 1.
         ].
    length asString logCr.
]] timeToRun
 0:00:00:03.18


I see that Java does new String(Files.readAllBytes(Paths.get(filename)), "UTF8").

readAllBytes seems suspect to me, even with UTF8. It looks like a standard
file stream with no conversion.
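
For reference, a small Java sketch of what that line does, as far as the JDK
documentation goes: Files.readAllBytes returns the raw bytes without any
conversion, and the String constructor then decodes the whole array in one
bulk pass. So the conversion does happen, but once per file rather than once
per character read. The class name, the roundTrip helper and the temp file are
made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class, not part of any library.
final class ReadAllBytesSketch {

    // Writes the sample to a temp file as UTF-8, reads it back the way the
    // Java benchmark does, and returns { byte count, char count }.
    static int[] roundTrip(String sample) {
        try {
            Path tmp = Files.createTempFile("readallbytes-sketch", ".txt");
            try {
                Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
                byte[] raw = Files.readAllBytes(tmp);                  // raw bytes, no decoding
                String text = new String(raw, StandardCharsets.UTF_8); // one bulk decoding pass
                return new int[] { raw.length, text.length() };
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = roundTrip("h\u00e9llo \u20ac");
        System.out.println(counts[0] + " bytes read, " + counts[1] + " chars after decoding");
    }
}
```

With the sample in main, 10 bytes are read and the decoded String has 7 chars,
which shows that the byte array and the String are not the same thing.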

Pharo isn't so slow after all.
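
One caveat when comparing with the Java number: the Java snippet quoted below
reads the file once, with no loop doing 10 passes (Sven pointed this out). A
like-for-like harness might look roughly like the sketch below. The class
name, the countChars helper and taking the file path from the command line are
choices made for illustration only, and no timing result is claimed for it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration class, not part of any library.
final class TenPassSketch {

    // Reads and decodes the file `passes` times, touching every char each
    // time, and returns the total number of chars seen over all passes.
    static long countChars(Path file, int passes) {
        long length = 0;
        try {
            for (int pass = 0; pass < passes; pass++) {
                String content =
                        new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                for (int i = 0; i < content.length(); i++) {
                    content.charAt(i);
                    length++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return length;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java TenPassSketch <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        long length = countChars(file, 10);   // 10 passes, like the Pharo loops
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(length + " chars in " + elapsedMs + " ms");
    }
}
```

Note that the JIT may optimise away the unused charAt call, so even this is
only a rough comparison.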

HTH
Phil


On Tue, Mar 17, 2015 at 1:21 PM, Nicolas Anquetil
<[email protected]> wrote:
>
> the file is 10M.
>
> it seems to me the content does not change anything since we are just
> reading it character by character without doing anything else.
>
> anyway, you can find it at:
> https://dl.dropboxusercontent.com/u/12861461/jfreechart-0_9_0.mse
>
> nicolas
>
> On 17/03/2015 11:04, [email protected] wrote:
>>
>> Yeah, put the file on a dropbox somewhere and share the link.
>>
>> I'd like to see why this is "slow". I am reading tons of data from a
>> MongoDb and it is superfast.
>>
>> Phil
>>
>> On Tue, Mar 17, 2015 at 10:24 AM, Sven Van Caekenberghe <[email protected]>
>> wrote:
>>>
>>> Can you post/share your file (jfreechart-0_9_0.mse) somewhere so we can
>>> run the same test ?
>>>
>>> Also, in your Java code I do not see a loop doing the benchmark 10 times
>>> ...
>>>
>>>> On 17 Mar 2015, at 10:19, Nicolas Anquetil <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>> Eliot, Sven, Stephan,
>>>>
>>>> thank you for your answers.
>>>>
>>>> As you noticed I am not an expert in profiling :-)
>>>>
>>>> it seems now I might have goofed up and the time taken by pharo in my
>>>> initial program (compared to java) is due to some other extra compilation I
>>>> was doing.
>>>>
>>>> So the "macro benchmark" might be wrong
>>>>
>>>> Still, the "micro benchmark" holds.
>>>> I tested the code proposed by Eliot and the result is ....
>>>>
>>>> ---
>>>> [1 to: 10 do: [:j || a length |
>>>>   length:=0.
>>>>   a :=
>>>> '/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse'
>>>> asFileReference readStream contents.
>>>>   1 to: a size do: [ :i| | c | c:= a at: i. length:= length+1]]]
>>>> timeToRunWithoutGC
>>>> ---
>>>>
>>>> 12.723 sec.
>>>>
>>>> [reminder] For java it is: 1.482 sec.
>>>>
>>>> so it is still a factor 8 or 9
>>>> it seems a lot for such a simple thing, no?
>>>> (or maybe not, I don't know)
>>>>
>>>> nicolas
>>>>
>>>> On 16/03/2015 09:49, Nicolas Anquetil wrote:
>>>>>
>>>>> I have been doing some file intensive activities and found my program
>>>>> to be VERY slow (see at the end).
>>>>> Just to be sure I ran them in Java and found it was much faster
>>>>>
>>>>> So I did a small test:
>>>>> ---
>>>>> [10 timesRepeat: [i := 0.
>>>>>
>>>>> '/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse'
>>>>> asFileReference readStream contents do: [ :c | i:= i+1].
>>>>> ] ] timeToRunWithoutGC.
>>>>> ---
>>>>>
>>>>> result = 12.932 sec
>>>>>
>>>>> similar thing (as far as I can tell) 10 times in java: 1.482 sec.
>>>>> ---
>>>>>     public static void main(String[] args) {
>>>>>         int length =0;
>>>>>         try {
>>>>>             String filename =
>>>>> "/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse";
>>>>>             String content = new
>>>>> String(Files.readAllBytes(Paths.get(filename)), "UTF8");
>>>>>             for (int i=0; i < content.length(); i++) {
>>>>>                 content.charAt(i);
>>>>>                 length = length+1;
>>>>>             }
>>>>>         } catch (IOException e) {
>>>>>             e.printStackTrace();
>>>>>         }
>>>>>         System.out.println(length);
>>>>>     }
>>>>> ---
>>>>>
>>>>> Because my program is MUCH slower (see at the end) in Smalltalk than in
>>>>> Java, I did another experiment:
>>>>>
>>>>> ---
>>>>> [1 to: 10 do: [:i| 1 to: 100000000 do: [:j | String new] ] ]
>>>>> timeToRunWithoutGC.
>>>>> ---
>>>>>
>>>>> result = 33.063 sec
>>>>>
>>>>> and in java: 4.382 sec.
>>>>> ---[10 runs of]
>>>>>     public static void main(String[] args) {
>>>>>         for (int i=0; i < 100000000; i++) {
>>>>>             new String();
>>>>>         }
>>>>>     }
>>>>> ---
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Concretely, my need was:
>>>>> Take 2600 methods in a Moose model, take their source code (therefore
>>>>> reading files), for methods longer than 100 lines (there are 29 of them),
>>>>> go through their code to find the blocks (matching {}).
>>>>> In smalltalk it ran > 12 hours and I had processed 5 methods of the 29
>>>>> long ones.
>>>>> I reimplemented in Java (basically, just changing from pharo to java
>>>>> syntax) and it took 1 minute to compute everything ...
>>>>>
>>>>> :-(
>>>>>
>>>>> On the good side, it was much easier to program it in smalltalk (about
>>>>> half a day to think about the algorithm, experiment, implement, test)
>>>>> than in Java (another 1/2 day, just to recode the algorithm that already
>>>>> worked).
>>>>>
>>>>> nicolas
>>>>>
>>>>
>>>
>
>
