Ok, I've tried this out.
First version
----------------
[
    | length a |
    length := 0.
    1 to: 10 do: [ :index |
        ('Loop {1}' format: { index }) logCr.
        a := (FileLocator imageDirectory / 'javacomp' /
            'jfreechart-0_9_0.mse') readStream contents.
        (ReadStream on: a) do: [ :c |
            length := length + 1.
        ].
        length asString logCr.
]] timeToRun
0:00:00:22.33
Takes a lot of time.
Second version (streaming, less memory intensive)
---------------------------------------------------------------------
[
    | length c |
    length := 0.
    1 to: 10 do: [ :index |
        ('Loop {1}' format: { index }) logCr.
        (FileLocator imageDirectory / 'javacomp' / 'jfreechart-0_9_0.mse')
            readStreamDo: [ :s |
                [ s atEnd ] whileFalse: [
                    c := s next.
                    length := length + 1.
                ]
            ].
        length asString logCr.
]] timeToRun
0:00:00:03.683
Already better.
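For comparison, the same streaming idea on the Java side could look roughly like the sketch below. This is not Nicolas's code: the class name, the method name and the sample text are made up for illustration, and I have not timed it against the .mse file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingCharCount {
    /** Counts decoded characters one at a time, without loading the whole file. */
    static long countChars(Path file) throws IOException {
        long length = 0;
        // The reader decodes UTF-8 as it goes and is closed automatically.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                length++;
            }
        }
        return length;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, only there so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".txt");
        try {
            Files.write(tmp, "hello\nworld".getBytes(StandardCharsets.UTF_8));
            System.out.println(countChars(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Like version 2, it never holds the full content in memory; only the reader's buffer is alive at any time.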
But profiling version 1 showed the issue. We are dealing with a multibyte
stream there.
So, switching to a StandardFileStream gives
Third version (StandardFileStream)
---------------------------------------------------------------------
[
    | length a |
    length := 0.
    1 to: 10 do: [ :index |
        ('Loop {1}' format: { index }) logCr.
        a := (StandardFileStream fileNamed: (FileLocator imageDirectory /
            'javacomp' / 'jfreechart-0_9_0.mse') pathString) readStream contents.
        a do: [ :c |
            length := length + 1.
        ].
        length asString logCr.
]] timeToRun
0:00:00:03.18
I see that the Java code does
new String(Files.readAllBytes(Paths.get(filename)), "UTF8").
readAllBytes seems suspect to me, even with UTF8. It looks like a standard
file stream with no conversion while reading.
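To make the readAllBytes point concrete, here is a minimal Java sketch. The class name, the method name and the sample text are mine, not from Nicolas's program. As far as I understand it, readAllBytes only pulls in raw bytes, and the UTF-8 decoding is done afterwards by the String constructor, in one pass over the whole array rather than character by character.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAllBytesDemo {
    /** Returns {raw byte count, decoded char count} for the given file. */
    static int[] byteAndCharCounts(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);                     // raw bytes, no decoding
        String decoded = new String(raw, StandardCharsets.UTF_8);  // decoding happens here, in bulk
        return new int[] { raw.length, decoded.length() };
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: "héllo" has one 2-byte character in UTF-8.
        Path tmp = Files.createTempFile("sample", ".txt");
        try {
            Files.write(tmp, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
            int[] counts = byteAndCharCounts(tmp);
            System.out.println("bytes=" + counts[0] + " chars=" + counts[1]);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The byte count and the char count differ as soon as the file has a non-ASCII character, so the two steps really are separate. That would make the Java run closer to version 3 followed by one bulk conversion than to version 1.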
Pharo isn't so slow after all.
HTH
Phil
On Tue, Mar 17, 2015 at 1:21 PM, Nicolas Anquetil
<[email protected]> wrote:
>
> the file is 10M.
>
> it seems to me the content does not change anything since we are just
> reading it character by character without doing anything else.
>
> anyway, you can find it at:
> https://dl.dropboxusercontent.com/u/12861461/jfreechart-0_9_0.mse
>
> nicolas
>
> On 17/03/2015 11:04, [email protected] wrote:
>>
>> Yeah, put the file on a dropbox somewhere and share the link.
>>
>> I'd like to see why this is "slow". I am reading tons of data from a
>> MongoDb and it is superfast.
>>
>> Phil
>>
>> On Tue, Mar 17, 2015 at 10:24 AM, Sven Van Caekenberghe <[email protected]>
>> wrote:
>>>
>>> Can you post/share your file (jfreechart-0_9_0.mse) somewhere so we can
>>> run the same test ?
>>>
>>> Also, in your Java code I do not see a loop doing the benchmark 10 times
>>> ...
>>>
>>>> On 17 Mar 2015, at 10:19, Nicolas Anquetil <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>> Eliot, Sven, Stephan,
>>>>
>>>> thank you for your answers.
>>>>
>>>> As you noticed I am not an expert in profiling :-)
>>>>
>>>> it seems now I might have goofed up and the time taken by pharo in my
>>>> initial program (compared to java) is due to some other extra compilation I
>>>> was doing.
>>>>
>>>> So the "macro benchmark" might be wrong
>>>>
>>>> Still, the "micro benchmark" holds.
>>>> I tested the code proposed by Eliot and the result is ....
>>>>
>>>> ---
>>>> [1 to: 10 do: [:j || a length |
>>>> length:=0.
>>>> a :=
>>>> '/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse'
>>>> asFileReference readStream contents.
>>>> 1 to: a size do: [ :i| | c | c:= a at: i. length:= length+1]]]
>>>> timeToRunWithoutGC
>>>> ---
>>>>
>>>> 12.723 sec.
>>>>
>>>> [reminder] For java it is: 1.482 sec.
>>>>
>>>> so it is still a factor 8 or 9
>>>> it seems a lot for such a simple thing, no?
>>>> (or maybe not, I don't know)
>>>>
>>>> nicolas
>>>>
>>>> On 16/03/2015 09:49, Nicolas Anquetil wrote:
>>>>>
>>>>> I have been doing some file intensive activities and found my program
>>>>> to be VERY slow (see at the end).
>>>>> Just to be sure I ran them in Java and found it was much faster
>>>>>
>>>>> So I did a small test:
>>>>> ---
>>>>> [10 timesRepeat: [i := 0.
>>>>>
>>>>> '/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse'
>>>>> asFileReference readStream contents do: [ :c | i:= i+1].
>>>>> ] ] timeToRunWithoutGC.
>>>>> ---
>>>>>
>>>>> result = 12.932 sec
>>>>>
>>>>> similar thing (as far as I can tell) 10 times in java: 1.482 sec.
>>>>> ---
>>>>> public static void main(String[] args) {
>>>>> int length =0;
>>>>> try {
>>>>> String filename =
>>>>> "/home/anquetil/Documents/RMod/Tools/workspace/Blocks/jfreechart-0_9_0.mse";
>>>>> String content = new
>>>>> String(Files.readAllBytes(Paths.get(filename)), "UTF8");
>>>>> for (int i=0; i < content.length(); i++) {
>>>>> content.charAt(i);
>>>>> length = length+1;
>>>>> }
>>>>> } catch (IOException e) {
>>>>> e.printStackTrace();
>>>>> }
>>>>> System.out.println(length);
>>>>> }
>>>>> ---
>>>>>
>>>>> Because my program is MUCH slower (see at the end) in Smalltalk than in
>>>>> Java, I did another experiment:
>>>>>
>>>>> ---
>>>>> [1 to: 10 do: [:i| 1 to: 100000000 do: [:j | String new] ] ]
>>>>> timeToRunWithoutGC.
>>>>> ---
>>>>>
>>>>> result = 33.063 sec
>>>>>
>>>>> and in java: 4.382 sec.
>>>>> ---[10 runs of]
>>>>> public static void main(String[] args) {
>>>>> for (int i=0; i < 100000000; i++) {
>>>>> new String();
>>>>> }
>>>>> }
>>>>> ---
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Concretely, my need was:
>>>>> Take 2600 methods in a Moose model, take their source code (therefore
>>>>> reading files), for methods longer than 100 lines (there are 29 of them),
>>>>> go through their code to find the blocks (matching {}).
>>>>> In smalltalk it ran > 12 hours and I had processed 5 methods of the 29
>>>>> long ones
>>>>> I reimplemented in Java (basically, just changing from pharo to java
>>>>> syntax) and it took 1 minute to compute everything ...
>>>>>
>>>>> :-(
>>>>>
>>>>> On the good side, it was much easier to program it in smalltalk (about
>>>>> half a day to think about the algorithm, experiment, implement, test)
>>>>> than in Java (another 1/2 day, just to recode the algorithm that already
>>>>> worked).
>>>>>
>>>>> nicolas
>>>>>
>>>>
>>>
>
>