-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Attached is a patch with the following API change for ActiveRecord::Base.create.
Currently ActiveRecord::Base.create takes either a hash of attributes or an array of hashes of attributes. If you pass in an array it will treat each hash individually and create/save each object separately. This is really slow because it invokes a single INSERT statement for each hash of attributes. The current class method 'create' also returns a model object or an array of model objects for the hash(es) that were passed in. This patch updates ActiveRecord::Base.create to take a third argument which is an options hash. Have this take an :optimize key which can point to one of the values: 'fast' or 'fastest' 'fast' would use the minimum amount of INSERT statements to create the values and would still use the current validations/callback methods 'fastest' would use the minimum amount of INSERT statements and would return the number of INSERT statements issued. It would not create any model objects, and it would skip all validations/callbacks. Currently 'fastest' is implemented in the patch. I expect to have 'fast' done by midweek but I'd like some peer review and core acceptance before I focus on doing to many more patches for this feature. This also involves a patch to MysqlAdapter (and it will require a patch for any other adapter to support this optimization) to support intelligent communication with the database server to determine server values like the maximum allowed packet size (which is used to create the minimum number of INSERT statements). Since dev.rubyonrails.com is down I figured I post this here. Below are some benchmarking statistic. Creation of 100 MyISAM records using AR now took 0.92 seconds Creation of 100 MyISAM records using :optimize=>'fastest' took 0.10 seconds Creation of 1000 MyISAM records using AR now took 5.41 seconds Creation of 1000 MyISAM records using :optimize=>'fastest' took 0.17 seconds Creation of 10000 MyISAM records using AR now took 57.46 seconds Creation of 10000 MyISAM records using :optimize=>'fastest' took 2.45 seconds The benchmarks for Memory and InnoDb are just as good (although InnoDb does get a better speedup). The change in code for the user looks like: MyModel.create array_of_hashes, :optimize=>'fastest' This is a patch off from trunk tonight, and it includes unit tests for everything. Could someone check this out and give some feedback? I have started to develop this as a separate plugin for ActiveRecord, but the more I think about it the more it seems that this sort of thing should be in core. Thoughts? Zach -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFEuuhyMyx0fW1d8G0RAiDTAJ9GceqhS2lyR4OL+FrajQjenLmPOACfVpia Z4tSCY8kl/kncnbRe6CIuHE= =4hcE -----END PGP SIGNATURE-----
Index: /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb =================================================================== --- /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb (revision 4610) +++ /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb (working copy) @@ -431,9 +431,23 @@ # Creates an object, instantly saves it as a record (if the validation permits it), and returns it. If the save # fails under validations, the unsaved object is still returned. - def create(attributes = nil) - if attributes.is_a?(Array) + # Creates and returns an object, given a hash of attributes and instantly saves it as + # a record (if the validation permits it). If the save fails under validations, the unsaved object is still around. + # + # Creates and returns an array of object, given an array of hash attributes. By default + # this will validate each object created. If an object's save fails under validations, + # the unsaved object is still around. (This is the default behavior) + # + # Creates and returns an array of objects given an array of hash attributes, and passing in a + # hash of options. The options hash can include: + # * :optimize=>'best' - this will skip all validations and callbacks and perform an optimized \ + # insert operation given that the database adapter supports this optimization. (Currently \ + # only the Mysql adapter supports this feature). + def create(attributes = nil, *args ) + if attributes.is_a?(Array) and args.empty? attributes.collect { |attr| create(attr) } + elsif attributes.is_a?(Array) and args.first.is_a?( Hash ) + create_many_with_options( attributes, args.first ) else object = new(attributes) scope(:create).each { |att,value| object.send("#{att}=", value) } if scoped?(:create) @@ -440,6 +454,40 @@ object.save object end + end + + def create_many_with_options( objects, options ) # :nodoc: + if options[:optimize].to_s =~ /^fast$/ + create_many( objects, options ) + # elsif options[:optimize].to_s =~ /fastest$/ + else + create_many_without_validations_or_callbacks( objects, options ) + end + end + + # TODO currently not supported, defaults to create_many_without_validations_or_callbacks + def create_many( objects, options={} ) # :nodoc + create_many_without_validations_or_callbacks( objects, options ) + end + + # Returns the number of insert operations it took to create these records without + # validations or callbacks. The underlying database adapter has to support + # the method insert_many. + def create_many_without_validations_or_callbacks( objects, options ) # :nodoc: + column_names = objects.first.keys + column_names_sql = column_names.map{ |c| connection.quote_column_name( c ) } + + values = objects.map{ |hsh| + column_names.map{ |colname| + connection.quote( hsh[colname], columns_hash[colname] ) } } + values_sql = values.map{ |arr| '(' + arr.join( ',' ) + ')' } + + base_sql = "INSERT INTO #{table_name} (#{column_names_sql.join(', ')}) VALUES" + number_of_inserts = connection.insert_many( + base_sql, + values_sql, + "#{self.class.name} Create Without Validation" ) + number_of_inserts end # Finds the record from the passed +id+, instantly saves it with the passed +attributes+ (if the validation permits it), Index: /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb =================================================================== --- /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb (revision 4610) +++ /home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb (working copy) @@ -197,6 +197,35 @@ execute(sql, name = nil) id_value || @connection.insert_id end + + def insert_many( base_sql, values, *args ) + # the number of inserts default + number_of_inserts = 0 + + # the number of bytes the requested insert statement values will take up + values_in_bytes = self.class.sum_sizes( *values ) + + # the number of bytes (commas) it will take to comma separate our values + comma_separated_bytes = values.size-1 + + # the total number of bytes required if this statement is one statement + total_bytes = base_sql.size + values_in_bytes + comma_separated_bytes + + max = max_allowed_packet + # if we can insert it all as one statement + if total_bytes < max + number_of_inserts += 1 + insert( base_sql + values.join( ',' ), *args ) + else + value_sets = self.class.get_insert_value_sets( values, base_sql.size, max ) + value_sets.each do |values| + number_of_inserts += 1 + insert( base_sql + values.join( ',' ), *args ) + end + end + + number_of_inserts + end def update(sql, name = nil) #:nodoc: execute(sql, name) @@ -324,7 +353,41 @@ current_type = select_one("SHOW COLUMNS FROM #{table_name} LIKE '#{column_name}'")["Type"] execute "ALTER TABLE #{table_name} CHANGE #{column_name} #{new_column_name} #{current_type}" end - + + # MISC ================================================================= + + # Returns the sum of the sizes of the passed in objects. This should + # probably be moved outside this class, but to where? + def self.sum_sizes( *objects ) # :nodoc: + objects.inject( 0 ){|sum,o| sum += o.size } + end + + # Returns the maximum number of bytes that the server will allow + # in a single packet + def max_allowed_packet # :nodoc: + result = execute( "SHOW VARIABLES like 'max_allowed_packet';" ) + result.fetch_row[1].to_i + end + + def self.get_insert_value_sets( values, sql_size, max_bytes ) + value_sets = [] + arr, current_arr_values_size, current_size = [], 0, 0 + values.each_with_index do |val,i| + comma_bytes = arr.size-1 + if sql_size + current_size + val.size + comma_bytes < max_bytes + current_size += val.size + arr << val + else + value_sets << arr + arr = [ val ] + current_size = val.size + end + + # if we're on the last iteration push whatever we have in arr to value_sets + value_sets << arr if i == (values.size-1) + end + [ *value_sets ] + end private def connect Index: /home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb =================================================================== --- /home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb (revision 0) +++ /home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb (revision 0) @@ -0,0 +1,45 @@ + +class MysqlAdapterTest < Test::Unit::TestCase + include ActiveRecord::ConnectionAdapters + + def setup + @connection = ActiveRecord::Base.connection + end + + def test_get_insert_value_sets + values = [ + "('1','2','3')", + "('4','5','6')", + "('7','8','9')" ] + + values_size_in_bytes = MysqlAdapter.sum_sizes( *values ) + base_sql_size_in_bytes = 15 + max_bytes = 30 + + value_sets = MysqlAdapter.get_insert_value_sets( values, base_sql_size_in_bytes, max_bytes ) + assert_equal 3, value_sets.size, 'Three value sets were expected!' + + # Each element in the value_sets array must be an array + value_sets.each_with_index { |e,i| + assert_kind_of Array, e, "Element #{i} was expected to be an Array!" } + + # Each element in the values array should have a 1:1 correlation to the elements + # in the returned value_sets arrays + assert_equal values[0], value_sets[0].first + assert_equal values[1], value_sets[1].first + assert_equal values[2], value_sets[2].first + end + + def test_insert_many + base_sql = "INSERT INTO #{Topic.table_name} (`title`,`author_name`) VALUES " + values = [ + "('Morgawr','Brooks, Terry')", + "('Antrax', 'Brooks, Terry')", + "('Jarka Ruus', 'Brooks, Terry')" ] + + current_number_of_topic_records = Topic.count + @connection.insert_many( base_sql, values ) + assert_equal current_number_of_topic_records + 3, Topic.count, "Incorrect number of records in the database!" + end + +end \ No newline at end of file Index: /home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb =================================================================== --- /home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb (revision 4610) +++ /home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb (working copy) @@ -138,7 +138,33 @@ topic_reloaded = Topic.find(topic.id) assert_equal("New Topic", topic_reloaded.title) end + + def test_create_many1 + topic_hash = { :title => 'New Topic' } + topics_array = [ topic_hash, topic_hash, topic_hash ] + topic_count = Topic.count + number_of_expected_topics = topics_array.size + topic_count + + Topic.create topics_array + assert_equal number_of_expected_topics, Topic.count + + number_of_expected_topics += topics_array.size + Topic.create_many topics_array, :optimize=>'fastest' + assert_equal number_of_expected_topics, Topic.count + end + + def test_create_many_with_options + topic_hash = { :title => 'New Topic' } + topics_array = [ topic_hash, topic_hash, topic_hash ] + + topic_count = Topic.count + number_of_expected_topics = topics_array.size + topic_count + + Topic.create_many_with_options topics_array, :optimize=>'fastest' + assert_equal number_of_expected_topics, Topic.count + end + def test_save! topic = Topic.new(:title => "New Topic") assert topic.save!
_______________________________________________ Rails-core mailing list Rails-core@lists.rubyonrails.org http://lists.rubyonrails.org/mailman/listinfo/rails-core