On Friday, March 16, 2012 6:34:05 PM UTC-7, Peter van Hardenberg wrote:
>
> String concatenation is generally expensive. Collecting all the tokens
> in an array and .joining them is a common idiom you might consider
> employing if you're beginning to think about performance issues.
>
This is not necessarily true with ruby (though it is generally true in
python since python has immutable strings):
Code:

require 'benchmark'

N = 10000000
I = 10

def s
  x = ''
  d = '.' * I
  N.times{x << d}
  x
end

def a
  x = []
  d = '.' * I
  N.times{x << d}
  x.join
end

Benchmark.bm(10) do |x|
  x.report('string'){s}
  x.report('array'){a}
end

p(s == a)
Output:
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-openbsd]
user system total real
string 1.710000 0.070000 1.780000 ( 1.785498)
array 1.930000 0.140000 2.070000 ( 2.070328)
true
ruby 1.8.7 (2012-02-08 patchlevel 358) [x86_64-openbsd]
user system total real
string 2.440000 0.040000 2.480000 ( 2.483004)
array 2.580000 0.150000 2.730000 ( 2.715857)
true
jruby 1.6.7 (ruby-1.8.7-p357) (2012-02-22 fffffff) (OpenJDK 64-Bit Server VM 1.7.0) [OpenBSD-amd64-java]
user system total real
string 1.741000 0.000000 1.741000 ( 1.589000)
array 1.510000 0.000000 1.510000 ( 1.510000)
true
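For anyone wondering why << doesn't carry the quadratic cost usually associated with string concatenation: in ruby, << mutates the receiver in place (amortized O(1) per append), while += allocates a brand-new string on every append. A quick demonstration (not part of the benchmark above):

```ruby
# << mutates the receiver in place; += builds a new string each time.
s = "ab".dup              # .dup guards against frozen string literals
before = s.object_id
s << "cd"
p s                       # => "abcd"
p s.object_id == before   # => true: same object, mutated in place

t = "ab".dup
before = t.object_id
t += "cd"                 # t now points at a freshly allocated string
p t                       # => "abcd"
p t.object_id == before   # => false: += allocated a new object
```

This is why the repeated-<< idiom stays fast in ruby, whereas in python every concatenation of immutable strings has to copy.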
You can try playing around with the code, using different sizes of appended
strings (I) and different numbers of iterations (N), to see how different
interpreters handle things. It would be trivial to have Sequel use the
array approach: since << is the only method used, you just need to start
with an empty array and add an extra #join at the end. During the dataset
literalization refactoring, I tested strings, arrays, and StringIO, and
strings were the fastest in the use cases I tested, so that's what I went
with. What I found was that arrays were actually faster than strings until
the final #join, but adding that final #join to turn the array into a
string made the array approach slower overall than appending everything to
the string.
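As a sketch of the trade-off described above (the method names here are illustrative, not Sequel's actual internals), the two approaches differ only in the starting value and the final #join:

```ruby
# Hypothetical sketch of the two SQL-building strategies discussed
# above; build_sql_string / build_sql_array are not Sequel's API.

# String approach: append every fragment to one mutable string.
def build_sql_string(parts)
  sql = ''
  parts.each { |p| sql << p }
  sql
end

# Array approach: collect fragments, join once at the end.
# Identical builder code, since << is the only method used, but the
# final #join has to allocate and copy the entire result at once.
def build_sql_array(parts)
  sql = []
  parts.each { |p| sql << p }
  sql.join
end

parts = ['SELECT ', '*', ' FROM ', 'items', ' WHERE ', 'id = 1']
p build_sql_string(parts)   # => "SELECT * FROM items WHERE id = 1"
p build_sql_array(parts)    # => "SELECT * FROM items WHERE id = 1"
```

The final #join is exactly the extra cost the benchmarks above measured: the array accumulates fragments quickly, but turning it into a string at the end erases that advantage in these cases.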
I'll try not to take offense at the "if you're beginning to think about
performance issues". :) I actually do spend quite a bit of time thinking
about performance issues, and have for some time. The reason Sequel uses a
pure string concatenation approach for building SQL is that it is the
fastest way I know of. I'm pretty sure Sequel's SQL literalizer is orders
of magnitude faster than ARel's in the worst case (see
https://github.com/jeremyevans/sequel/commit/092905dea17e1c800e5c6af6c38ff4997d0bdf8f).
There are large parts of Sequel that are unoptimized, but most of the inner
loops are heavily optimized. If I've missed some low-hanging fruit, please
do send in patches. :)
> Memoizing as much of the generated SQL as possible could be an
> advantage for the kind of use cases I have. On the other hand: it's
> Ruby. There's only so much you can do without going down to C...
>
>
Memoizing makes sense if the costs of memoizing and retrieval are low, the
hit ratio is high, and the cost of rebuilding is high. I'm not sure that
is true in Sequel's case, but I suppose it is possible. I'll accept patches
in this area, but they'll have to come with benchmarks that show obvious
advantages in certain use cases and no significant disadvantages.
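A minimal sketch of what memoizing generated SQL could look like (this Dataset class is purely illustrative, not Sequel's implementation; the hit-ratio caveat above is the whole question):

```ruby
# Hypothetical sketch: cache the generated SQL the first time it is
# built and reuse it thereafter. Only a win if the dataset is reused
# unchanged often enough that cache hits outnumber rebuilds.
class Dataset
  def initialize(table)
    @table = table
  end

  def sql
    @sql ||= build_sql   # memoize: build once, return cached string after
  end

  private

  # Stand-in for the real literalization work being memoized.
  def build_sql
    "SELECT * FROM #{@table}"
  end
end

ds = Dataset.new(:items)
p ds.sql                      # => "SELECT * FROM items"
p ds.sql.equal?(ds.sql)       # => true: same cached object, not rebuilt
```

Any real version would also need to invalidate @sql whenever the dataset changes, which is part of the memoization cost being weighed above.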
It is true that doing it in C would be faster, but the literalization code
is not something I would want to code in C (and I'm fairly comfortable with
the ruby C API). Even in C I wouldn't expect it to be more than 2-3x
faster.
Jeremy
--
You received this message because you are subscribed to the Google Groups
"sequel-talk" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/sequel-talk/-/CqDsNZML8U0J.