On Monday, April 10, 2017 at 2:55:09 PM UTC-7, Andriy Tyurnikov wrote:
>
> Hello everyone, and thank you so much for your effort with sequel.
>
> While benchmarking in-memory processing of big datasets (100K - 200K rows)
> in active-record and sequel
> we've noticed surprisingly good results of sql_query gem, which does
> nothing but invoking ActiveRecord::Base.connection.execute(SQL).entries,
> which suggests that the deserialization process is suboptimal (for big
> datasets at least) in both sequel and activerecord, which is fairly
> surprising, since creating 1_000_000 ruby objects doesn't seem that
> expensive (even with the exception of Date.new);
>
> With increase of resulting dataset
> "ActiveRecord::Base.connection.execute(SQL).entries" demonstrates fairly
> small cost of results processing, while both ORMs degrade, when used for
> resulting object instantiation:
>
> "
>
> Benchmark.measure {DB[:orders].limit(100000).map{|i| i[:id]}}
>
> D, [2017-04-11T00:07:11.743980 #38269] DEBUG -- : (1.468975s) SELECT *
> FROM "orders" LIMIT 100000
>
> => #<Benchmark::Tms:0x007fc061756768 @label="", @real=13.32571900000039,
> @cstime=0.0, @cutime=0.0, @stime=0.47000000000000597,
> @utime=11.819999999999993, @total=12.29>
> "
>
> 1) With that in mind - could someone please express an opinion on reasons
> of potential performance loss in such case?
>
Creating objects is one of the more expensive things you can do in Ruby,
and creating either Sequel::Model or ActiveRecord instances gets expensive
when you do it for hundreds of thousands of rows.
In Sequel, the most similar code to the
ActiveRecord::Base.connection.execute call would be:
DB.synchronize{|conn| conn.execute(query)} # assuming the underlying
connection supports an #execute method
However, it isn't exactly the same, as ActiveRecord generally abstracts
the connection object, whereas Sequel uses the raw connection object
provided by the driver (in most adapters).
One of the reasons that Sequel tends to be faster than ActiveRecord when
retrieving objects is that it does less work when creating instances.
However, it's still going to be slower than working with the driver
directly, as it has to:
1) build symbol keyed hashes for each row (1 hash per row)
2) do typecasting of values (if the driver doesn't do that) (potentially
1 or more objects per row per column)
3) wrap each hash in a Sequel::Model instance (if using Sequel::Model) (1
object per row)
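The relative cost of those three steps can be sketched in plain Ruby
with no database involved; the string-valued ROWS array and FakeModel
below are stand-ins for what a driver might hand back and what a model
class does, not Sequel internals:

```ruby
require 'benchmark'
require 'date'

# Stand-in for raw driver output: 100k rows of string values
ROWS = Array.new(100_000) { |i| [i.to_s, '2017-04-10'] }
COLUMNS = [:id, :created_at]

# Step 1: build a symbol-keyed hash for each row (1 hash per row)
hashes = nil
t1 = Benchmark.realtime do
  hashes = ROWS.map { |r| COLUMNS.zip(r).to_h }
end

# Step 2: typecast values (1 or more objects per row per column)
t2 = Benchmark.realtime do
  hashes.each do |h|
    h[:id] = Integer(h[:id])
    h[:created_at] = Date.parse(h[:created_at])
  end
end

# Step 3: wrap each hash in a model-like instance (1 object per row)
FakeModel = Struct.new(:values)
models = nil
t3 = Benchmark.realtime do
  models = hashes.map { |h| FakeModel.new(h) }
end

puts format('hashes: %.3fs  typecast: %.3fs  wrap: %.3fs', t1, t2, t3)
```

Typecasting tends to dominate here because Date.parse allocates several
intermediate objects per value, which matches the Date.new observation
in the original message.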
For the fastest possible code, use DB.synchronize to get access to the
connection object directly, and/or drop down to using C.
If you are using Sequel with the pg driver, you probably also want to load
sequel_pg (a C extension that significantly speeds things up).
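Loading sequel_pg is typically just a Gemfile change (assuming you are
on the postgres adapter backed by the pg driver):

```ruby
# Gemfile
gem 'pg'
gem 'sequel_pg', require: 'sequel' # loads sequel and swaps in C row fetching
```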
> 2) While jeremyevans/simple_orm_benchmark is a fairly good illustration
> of 'sequel' superiority, I wonder if anyone
> explored ORM performance in terms of detailed cost of
> networking/parsing/result object allocation?
>
I certainly would be interested in such an analysis.
Thanks,
Jeremy
--
You received this message because you are subscribed to the Google Groups
"sequel-talk" group.