On Sun, Oct 11, 2009 at 8:13 PM, Nate Wiger <[email protected]> wrote:
> Sounds good Jeremy, actually it sounds like we're on the same page.
>
> At this point, all the changes I've made have resulted in all specs
> passing. The points you mentioned are all valid, but all behaviors
> remain intact.
>
> The mods I've checked in only extend existing behavior. The concerns
> you raised:
>
> Dataset.filter{id > 1}.all
> Dataset.filter("id > ?", 1).all
> Dataset.filter("id > :id", :id=>1).all
>
> will remain unchanged, just leveraging bind vars if possible. The
> "implicit bind vars" issue is a bit of a false concern.
Your commit special-cases placeholder literal strings, so I don't see
how this would use bound variables:
Dataset.filter{id > 1}.all
That's not to say that your implementation is bad, but I think
currently it's at the wrong level. You do not want to override
Dataset#placeholder_literal_string_sql. You probably want to override
Dataset#literal instead and deal with things there, so that all
literalization is affected.
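To make the distinction concrete, here is a toy sketch (not Sequel's real internals; class and method names are invented for illustration) of what hooking at the literalization level looks like: every value that would have been inlined into the SQL string instead becomes a numbered placeholder plus a collected bind value, so all filter forms pass through the same path.

```ruby
# Toy illustration of hooking literal() rather than one special case.
class ToyDataset
  attr_reader :bind_values

  def initialize
    @bind_values = []
  end

  # Instead of inlining the value into the SQL, record it and emit a
  # numbered placeholder. Because this sits at the literal() level,
  # filter{id > 1}, filter("id > ?", 1), etc. would all go through it.
  def literal(value)
    @bind_values << value
    "$#{@bind_values.length}"
  end

  def where_sql(column, value)
    "WHERE #{column} = #{literal(value)}"
  end
end

ds = ToyDataset.new
sql = ds.where_sql('id', 1)
# sql is "WHERE id = $1" and ds.bind_values is [1]
```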
> Bind vars aren't that complicated; there is NO REASON in Postgres,
> Oracle, etc. where you would EVER prefer forcing the parsing engine to
> hard-parse 'val' vs providing a :val with bind (param). So to answer
> this:
You still haven't given hard figures as to how much faster bind
variables without prepared statements are on Oracle or PostgreSQL.
Can you offer a benchmark showing how much faster things are? I
understand that you think it should be faster. However, my initial
benchmarks actually show bind variables being slower on PostgreSQL
using the pg driver:
irb(main):016:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT 1')}}}
0.120000 0.120000 0.240000 ( 0.576387)
=> nil
irb(main):017:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT 1')}}}
0.130000 0.070000 0.200000 ( 0.579006)
=> nil
irb(main):018:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT 1')}}}
0.130000 0.160000 0.290000 ( 0.701892)
=> nil
irb(main):019:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT $1::int', [1])}}}
0.150000 0.080000 0.230000 ( 0.812850)
=> nil
irb(main):020:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT $1::int', [1])}}}
0.230000 0.170000 0.400000 ( 0.900962)
=> nil
irb(main):021:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT $1::int', [1])}}}
0.160000 0.190000 0.350000 ( 0.889527)
irb(main):025:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = $1::int', [1])}}}
0.200000 0.130000 0.330000 ( 1.308425)
=> nil
irb(main):026:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = $1::int', [1])}}}
0.250000 0.150000 0.400000 ( 1.312458)
=> nil
irb(main):027:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = $1::int', [1])}}}
0.180000 0.070000 0.250000 ( 1.253889)
=> nil
irb(main):028:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = 1')}}}
0.180000 0.190000 0.370000 ( 1.040620)
=> nil
irb(main):029:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = 1')}}}
0.090000 0.020000 0.110000 ( 0.972778)
=> nil
irb(main):030:0> DB.synchronize{|c| puts Benchmark.measure{10000.times{c.exec('SELECT * FROM artist WHERE id = 1')}}}
0.120000 0.090000 0.210000 ( 0.966392)
=> nil
Now, it could be that the database engine is doing less work. However,
you can't ignore Ruby's overhead, even when we are dropping right down
to the database connection level. Let's try a benchmark using some
higher-level Sequel code:
irb(main):037:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', 1).all}}
2.770000 0.380000 3.150000 ( 3.910163)
=> nil
irb(main):038:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', 1).all}}
2.530000 0.360000 2.890000 ( 3.955839)
=> nil
irb(main):039:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', 1).all}}
2.760000 0.310000 3.070000 ( 3.952374)
=> nil
irb(main):040:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', $i).call(:select, :i=>1)}}
5.490000 0.530000 6.020000 ( 7.123458)
=> nil
irb(main):041:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', $i).call(:select, :i=>1)}}
5.300000 0.650000 5.950000 ( 7.131980)
=> nil
irb(main):042:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', $i).call(:select, :i=>1)}}
5.490000 0.470000 5.960000 ( 7.127245)
=> nil
Here, the results are even more dramatic. It turns out that using bound
variables instead of just literalizing the integer into the SQL is
almost twice as slow.
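Part of that gap is pure Ruby-side bookkeeping. As a rough, database-free illustration (engine-side parse/plan costs are not modeled here, and the placeholder-scanning code is a stand-in, not Sequel's actual implementation): literalizing a simple value is a single string interpolation, while a bound-variable path also has to scan the SQL for placeholders and build a parameter array on every call.

```ruby
require 'benchmark'

N = 50_000

# Literalized path: one string interpolation per query.
lit = Benchmark.measure do
  N.times { |i| "SELECT * FROM artist WHERE id = #{i}" }
end

# Toy bound-variable path: scan for :name placeholders, collect the
# values into a params array, and emit $N markers each call.
bound = Benchmark.measure do
  N.times do |i|
    sql = 'SELECT * FROM artist WHERE id = :id'
    params = []
    sql.gsub(/:(\w+)/) { params << i; "$#{params.length}" }
  end
end

puts format('literalized: %.4fs', lit.real)
puts format('bound vars:  %.4fs', bound.real)
```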
Now, you can prepare the statement and have it be faster than the
literalized case:
irb(main):053:0> ps = DB[:artist].filter('id = ?', :$id__int).prepare(:select, :artist_by_id)
=> <Sequel::Postgres::Dataset/PreparedStatement "SELECT * FROM \"artist\" WHERE (id = $1::int)">
irb(main):054:0> DB.loggers.clear
=> []
irb(main):055:0> puts Benchmark.measure{10000.times{ps.call(:id=>1)}}
2.010000 0.300000 2.310000 ( 2.765197)
=> nil
irb(main):056:0> puts Benchmark.measure{10000.times{ps.call(:id=>1)}}
1.880000 0.440000 2.320000 ( 2.773472)
=> nil
irb(main):057:0> puts Benchmark.measure{10000.times{ps.call(:id=>1)}}
1.850000 0.390000 2.240000 ( 2.769613)
=> nil
However, you can only do that if you know the query will be used a lot
(e.g., from profiling). If you have the brilliant idea to just prepare
all queries, performance is much worse:
irb(main):058:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', :$id__int).prepare(:select, :artist_by_id).call(:id=>1)}}
8.490000 1.060000 9.550000 ( 10.429433)
=> nil
irb(main):059:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', :$id__int).prepare(:select, :artist_by_id).call(:id=>1)}}
8.580000 1.100000 9.680000 ( 10.489959)
=> nil
irb(main):060:0> puts Benchmark.measure{10000.times{DB[:artist].filter('id = ?', :$id__int).prepare(:select, :artist_by_id).call(:id=>1)}}
8.640000 1.100000 9.740000 ( 10.437481)
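The difference between the two prepared cases above is just whether the prepare cost is paid once or on every call. A hypothetical caller-side cache (not something from this thread's commits; names are invented) keyed on the statement name shows the amortization:

```ruby
# Pay the expensive prepare once per statement name, then reuse it.
class StatementCache
  def initialize
    @statements = {}
  end

  # Yields (to do the expensive prepare) only on the first lookup
  # for a given name; later lookups return the cached statement.
  def fetch(name)
    @statements[name] ||= yield
  end
end

cache = StatementCache.new
prepares = 0

3.times do
  cache.fetch(:artist_by_id) do
    prepares += 1
    'PREPARED: SELECT * FROM "artist" WHERE (id = $1::int)'
  end
end
# prepares is 1: preparation happened once, not three times
```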
So the advice I've been giving people all along is sound, at least for
cases like these on PostgreSQL (Oracle may be a completely different
story). Only optimize by using bound variables and/or prepared
statements if you know a particular query is a problem, and even then,
the performance benefits are significant but hardly amazing. And
always, always benchmark both cases, because your assumptions may not
be accurate.
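That "benchmark both cases" discipline is just the pattern used throughout this thread, which Ruby's stdlib Benchmark makes easy to set up side by side. A skeleton (the empty loop bodies are stand-ins for the literalized and bound-variable versions of the same query):

```ruby
require 'benchmark'

# Compare both strategies under identical iteration counts instead of
# assuming which one wins.
reports = Benchmark.bm(14) do |b|
  b.report('literalized:') { 10_000.times { } }  # literalized query here
  b.report('bound vars:')  { 10_000.times { } }  # bound-variable query here
end
```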
>> I suppose I should refine my original statements to say that I'm
>> against implicit bound variable/prepared statement support not for
>> philosophical reasons, but for pragmatic ones that could potentially
>> change.
>
> I'm honestly not trying to be a dick, but even pragmatically, you're
> off. Bind vars ALWAYS have an advantage over forcing a hard parse.
> There is no reason to profile these. It would be like profiling a
> Rails/Merb/Sinatra app in dev vs prod mode and expecting dev to maybe
> be faster. It won't be. If it was, the whole stack is totally
> jacked.
In light of my benchmarking, do you care to retract your statements?
Apparently, bind vars don't ALWAYS have an advantage. There's always
the possibility that Sequel is "totally jacked", but considering that
performance is worse when literalizing even when using pg directly, I
doubt that is the reason.
Jeremy