On Friday, March 16, 2012 6:34:05 PM UTC-7, Peter van Hardenberg wrote:
>
> String concatenation is generally expensive. Collecting all the tokens
> in an array and .joining them is a common idiom you might consider
> employing if you're beginning to think about performance issues.
>

This is not necessarily true in Ruby (though it is generally true in 
Python, since Python strings are immutable):

Code:

require 'benchmark'

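N = 10000000  # number of appends per run
I = 10        # length of each appended string

# Append every piece directly to a single mutable string.
def s
  x = ''
  d = '.' * I
  N.times{x << d}
  x
end

# Collect every piece in an array, then join once at the end.
def a
  x = []
  d = '.' * I
  N.times{x << d}
  x.join
end

Benchmark.bm(10) do |x|
  x.report('string'){s}
  x.report('array'){a}
end
# Sanity check: both approaches must build the same string.
p(s == a)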

Output:

ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-openbsd]
                 user     system      total        real
string       1.710000   0.070000   1.780000 (  1.785498)
array        1.930000   0.140000   2.070000 (  2.070328)
true

ruby 1.8.7 (2012-02-08 patchlevel 358) [x86_64-openbsd]
                user     system      total        real
string      2.440000   0.040000   2.480000 (  2.483004)
array       2.580000   0.150000   2.730000 (  2.715857)
true

jruby 1.6.7 (ruby-1.8.7-p357) (2012-02-22 fffffff) (OpenJDK 64-Bit Server 
VM 1.7.0) [OpenBSD-amd64-java]
                user     system      total        real
string      1.741000   0.000000   1.741000 (  1.589000)
array       1.510000   0.000000   1.510000 (  1.510000)
true

You can try playing around with the code, varying the size of the appended 
strings (I) and the number of iterations (N), to see how different 
interpreters handle things.  It's trivial to have Sequel use the array 
approach: since << is the only method used, you just need to start with an 
empty array and add an extra #join at the end.  During the dataset 
literalization refactoring, I tested strings, arrays, and StringIO, 
and strings were the fastest in the use cases I tested, so that's what I 
went with.  What I found was that arrays were actually faster than strings 
until the final #join, but adding the final #join to turn the array into a 
string made it slower than appending everything to the string.
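
For anyone who wants to repeat the three-way comparison, here is a minimal 
sketch.  It is not the code used during the refactoring; the method names and 
the SKETCH_N/SKETCH_I constants are made up for illustration, and the 
iteration count is kept small so it finishes quickly.  Results will vary by 
interpreter and by the sizes chosen.

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical sizes, much smaller than the benchmark above.
SKETCH_N = 100_000
SKETCH_I = 10

# Append each chunk directly to one mutable string.
def build_with_string(n, chunk)
  out = String.new
  n.times { out << chunk }
  out
end

# Collect chunks in an array and pay for a single #join at the end.
def build_with_array(n, chunk)
  parts = []
  n.times { parts << chunk }
  parts.join
end

# Append through StringIO, which also responds to <<.
def build_with_stringio(n, chunk)
  io = StringIO.new
  n.times { io << chunk }
  io.string
end

chunk = '.' * SKETCH_I
Benchmark.bm(10) do |x|
  x.report('string')   { build_with_string(SKETCH_N, chunk) }
  x.report('array')    { build_with_array(SKETCH_N, chunk) }
  x.report('stringio') { build_with_stringio(SKETCH_N, chunk) }
end
```

All three builders only rely on <<, which is why swapping the accumulator is 
a small change; the difference is whether a final conversion (#join or 
#string) is needed.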

I'll try not to take offense at the "if you're beginning to think about 
performance issues". :)  I actually do spend quite a bit of time thinking 
about performance issues, and have for some time.  The reason Sequel uses a 
pure string concatenation approach for building SQL is that it is the 
fastest way I know of.  I'm pretty sure Sequel's SQL literalizer is orders 
of magnitude faster than ARel's in the worst case (see 
https://github.com/jeremyevans/sequel/commit/092905dea17e1c800e5c6af6c38ff4997d0bdf8f).
  
There are large parts of Sequel that are unoptimized, but most of the inner 
loops are heavily optimized.  If I've missed some low hanging fruit, please 
do send in patches. :)  

 

> Memoizing as much of the generated SQL as possible could be an
> advantage for the kind of use cases I have. On the other hand: it's
> Ruby. There's only so much you can do without going down to C...
>
>
Memoizing makes sense if the cost of memoizing and retrieval are low, the 
hit ratio is high, and the cost of rebuilding is high.   I'm not sure this 
is true in Sequel's case, but I suppose it is possible. I'll accept patches 
in this area, but they'll have to come with benchmarks that show obvious 
advantages in certain use cases with no significant disadvantages.
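
To make the trade-off concrete, here is a rough sketch of a memoizing wrapper 
that tracks its own hit ratio.  Everything in it is hypothetical 
(MemoizedBuilder and the toy SELECT builder are not part of Sequel); it only 
illustrates what a benchmark for such a patch would need to measure: the 
lookup cost, the hit ratio, and the cost of a rebuild.

```ruby
# Hypothetical memoizing wrapper, for illustration only.
class MemoizedBuilder
  attr_reader :hits, :misses

  def initialize(&builder)
    @builder = builder
    @cache = {}
    @hits = 0
    @misses = 0
  end

  # Return the cached value for key, building and storing it on a miss.
  def fetch(key)
    if @cache.key?(key)
      @hits += 1
      @cache[key]
    else
      @misses += 1
      @cache[key] = @builder.call(key).freeze
    end
  end

  # Fraction of fetches served from the cache.
  def hit_ratio
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.to_f / total
  end
end

# Toy stand-in for an expensive SQL-building step.
builder = MemoizedBuilder.new do |key|
  table, column = key
  sql = String.new
  sql << 'SELECT ' << column.to_s << ' FROM ' << table.to_s
  sql
end

3.times { builder.fetch([:items, :id]) }  # one miss, then two hits
builder.fetch([:users, :name])            # one more miss
puts builder.hit_ratio                    # 0.5 for this access pattern
```

With a hit ratio this low and a rebuild this cheap, the hash lookup and key 
construction could easily cost more than they save, which is why benchmarks 
are needed.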

It is true that doing it in C would be faster, but the literalization code 
is not something I would want to write in C (and I'm fairly comfortable with 
the Ruby C API).  Even in C, I wouldn't expect it to be more than 2-3x 
faster.

Jeremy
