-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Attached is a patch with the following API change for
ActiveRecord::Base.create.

Currently ActiveRecord::Base.create takes either a hash of attributes or
an array of hashes of attributes. If you pass in an array it will treat
each hash individually and create/save each object separately. This is
really slow because it invokes a single INSERT statement for each hash
of attributes. The current class method 'create' also returns a model
object or an array of model objects for the hash(es) that were passed in.

This patch updates ActiveRecord::Base.create to take a third argument
which is an options hash. Have this take an :optimize key which can
point to one of the values: 'fast' or 'fastest'

'fast' would use the minimum amount of INSERT statements to create the
values and would still use the current validations/callback methods

'fastest' would use the minimum amount of INSERT statements and would
return the number of INSERT statements issued. It would not create any
model objects, and it would skip all validations/callbacks.

Currently 'fastest' is implemented in the patch. I expect to have 'fast'
done by midweek but I'd like some peer review and core acceptance before
I focus on doing to many more patches for this feature.

This also involves a patch to MysqlAdapter (and it will require a patch
for any other adapter to support this optimization) to support
intelligent communication with the database server to determine server
values like the maximum allowed packet size (which is used to create the
minimum number of INSERT statements).

Since dev.rubyonrails.com is down I figured I post this here. Below are
some benchmarking statistic.

Creation of 100 MyISAM records using AR now took 0.92 seconds
Creation of 100 MyISAM records using :optimize=>'fastest' took 0.10 seconds

Creation of 1000 MyISAM records using AR now took 5.41 seconds
Creation of 1000 MyISAM records using :optimize=>'fastest' took 0.17 seconds

Creation of 10000 MyISAM records using AR now took 57.46 seconds
Creation of 10000 MyISAM records using :optimize=>'fastest' took  2.45
seconds

The benchmarks for Memory and InnoDb are just as good (although InnoDb
does get a better speedup).

The change in code for the user looks like:
   MyModel.create array_of_hashes, :optimize=>'fastest'

This is a patch off from trunk tonight, and it includes unit tests for
everything.

Could someone check this out and give some feedback? I have started to
develop this as a separate plugin for ActiveRecord, but the more I think
about it the more it seems that this sort of thing should be in core.
Thoughts?

Zach



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEuuhyMyx0fW1d8G0RAiDTAJ9GceqhS2lyR4OL+FrajQjenLmPOACfVpia
Z4tSCY8kl/kncnbRe6CIuHE=
=4hcE
-----END PGP SIGNATURE-----
Index: 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
===================================================================
--- 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
      (revision 4610)
+++ 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
      (working copy)
@@ -431,9 +431,23 @@
 
       # Creates an object, instantly saves it as a record (if the validation 
permits it), and returns it. If the save
       # fails under validations, the unsaved object is still returned.
-      def create(attributes = nil)
-        if attributes.is_a?(Array)
+      # Creates and returns an object, given a hash of attributes and 
instantly saves it as 
+      # a record (if the validation permits it). If the save fails under 
validations, the unsaved object is still around. 
+      # 
+      # Creates and returns an array of object, given an array of hash 
attributes. By default
+      # this will validate each object created. If an object's save fails 
under validations,
+      # the unsaved object is still around. (This is the default behavior)
+      # 
+      # Creates and returns an array of objects given an array of hash 
attributes, and passing in a 
+      # hash of options. The options hash can include:
+      #  * :optimize=>'best' - this will skip all validations and callbacks 
and perform an optimized \
+      #     insert operation given that the database adapter supports this 
optimization. (Currently \
+      #     only the Mysql adapter supports this feature). 
+      def create(attributes = nil, *args )      
+        if attributes.is_a?(Array) and args.empty?
           attributes.collect { |attr| create(attr) }
+        elsif attributes.is_a?(Array) and args.first.is_a?( Hash )
+          create_many_with_options( attributes, args.first )
         else
           object = new(attributes)
           scope(:create).each { |att,value| object.send("#{att}=", value) } if 
scoped?(:create)
@@ -440,6 +454,40 @@
           object.save
           object
         end
+      end 
+      
+      def create_many_with_options( objects, options ) # :nodoc:
+        if options[:optimize].to_s =~ /^fast$/
+          create_many( objects, options )
+        # elsif options[:optimize].to_s =~ /fastest$/
+        else
+          create_many_without_validations_or_callbacks( objects, options )
+        end
+      end  
+            
+      # TODO currently not supported, defaults to 
create_many_without_validations_or_callbacks
+      def create_many( objects, options={} ) # :nodoc
+        create_many_without_validations_or_callbacks( objects, options )
+      end
+      
+      # Returns the number of insert operations it took to create these 
records without
+      # validations or callbacks. The underlying database adapter has to 
support
+      # the method insert_many.
+      def create_many_without_validations_or_callbacks( objects, options ) # 
:nodoc:
+        column_names = objects.first.keys
+        column_names_sql = column_names.map{ |c| connection.quote_column_name( 
c ) }
+        
+        values = objects.map{ |hsh| 
+          column_names.map{ |colname| 
+            connection.quote( hsh[colname], columns_hash[colname] ) } }
+        values_sql = values.map{ |arr| '(' + arr.join( ',' ) + ')' }
+        
+        base_sql = "INSERT INTO #{table_name} (#{column_names_sql.join(', ')}) 
VALUES"
+        number_of_inserts = connection.insert_many( 
+          base_sql, 
+          values_sql,
+          "#{self.class.name} Create Without Validation" )          
+        number_of_inserts
       end
 
       # Finds the record from the passed +id+, instantly saves it with the 
passed +attributes+ (if the validation permits it),
Index: 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
===================================================================
--- 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
 (revision 4610)
+++ 
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
 (working copy)
@@ -197,6 +197,35 @@
         execute(sql, name = nil)
         id_value || @connection.insert_id
       end
+      
+      def insert_many( base_sql, values, *args )
+        # the number of inserts default
+        number_of_inserts = 0
+
+        # the number of bytes the requested insert statement values will take 
up
+        values_in_bytes = self.class.sum_sizes( *values )
+        
+        # the number of bytes (commas) it will take to comma separate our 
values
+        comma_separated_bytes = values.size-1
+        
+        # the total number of bytes required if this statement is one statement
+        total_bytes = base_sql.size + values_in_bytes + comma_separated_bytes
+        
+        max = max_allowed_packet        
+        # if we can insert it all as one statement
+        if total_bytes < max
+          number_of_inserts += 1
+          insert( base_sql + values.join( ',' ), *args )
+        else
+          value_sets = self.class.get_insert_value_sets( values, 
base_sql.size, max )
+          value_sets.each do |values|
+            number_of_inserts += 1
+            insert( base_sql + values.join( ',' ), *args )
+          end
+        end        
+        
+        number_of_inserts
+      end
 
       def update(sql, name = nil) #:nodoc:
         execute(sql, name)
@@ -324,7 +353,41 @@
         current_type = select_one("SHOW COLUMNS FROM #{table_name} LIKE 
'#{column_name}'")["Type"]
         execute "ALTER TABLE #{table_name} CHANGE #{column_name} 
#{new_column_name} #{current_type}"
       end
-
+      
+      # MISC =================================================================
+      
+      # Returns the sum of the sizes of the passed in objects. This should
+      # probably be moved outside this class, but to where?
+      def self.sum_sizes( *objects ) # :nodoc:
+        objects.inject( 0 ){|sum,o| sum += o.size }
+      end
+            
+      # Returns the maximum number of bytes that the server will allow
+      # in a single packet
+      def max_allowed_packet # :nodoc:
+        result = execute( "SHOW VARIABLES like 'max_allowed_packet';" )
+        result.fetch_row[1].to_i
+      end
+      
+      def self.get_insert_value_sets( values, sql_size, max_bytes )
+        value_sets = []          
+        arr, current_arr_values_size, current_size = [], 0, 0
+        values.each_with_index do |val,i|
+          comma_bytes = arr.size-1
+          if sql_size + current_size + val.size + comma_bytes < max_bytes
+            current_size += val.size            
+            arr << val
+          else
+            value_sets << arr
+            arr = [ val ]
+            current_size = val.size
+          end
+          
+          # if we're on the last iteration push whatever we have in arr to 
value_sets
+          value_sets << arr if i == (values.size-1)
+        end
+        [ *value_sets ]
+      end
 
       private
         def connect
Index: 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
===================================================================
--- 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
     (revision 0)
+++ 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
     (revision 0)
@@ -0,0 +1,45 @@
+
+class MysqlAdapterTest < Test::Unit::TestCase
+  include ActiveRecord::ConnectionAdapters
+  
+  def setup
+    @connection = ActiveRecord::Base.connection
+  end
+
+  def test_get_insert_value_sets
+    values = [
+      "('1','2','3')",
+      "('4','5','6')",
+      "('7','8','9')" ]
+      
+    values_size_in_bytes = MysqlAdapter.sum_sizes( *values )
+    base_sql_size_in_bytes = 15
+    max_bytes = 30
+    
+    value_sets = MysqlAdapter.get_insert_value_sets( values, 
base_sql_size_in_bytes, max_bytes )
+    assert_equal 3, value_sets.size, 'Three value sets were expected!'
+     
+    # Each element in the value_sets array must be an array
+    value_sets.each_with_index { |e,i| 
+      assert_kind_of Array, e, "Element #{i} was expected to be an Array!" }
+
+    # Each element in the values array should have a 1:1 correlation to the 
elements
+    # in the returned value_sets arrays
+    assert_equal values[0], value_sets[0].first
+    assert_equal values[1], value_sets[1].first
+    assert_equal values[2], value_sets[2].first
+  end
+  
+  def test_insert_many
+    base_sql = "INSERT INTO #{Topic.table_name} (`title`,`author_name`) VALUES 
"
+    values = [ 
+      "('Morgawr','Brooks, Terry')",
+      "('Antrax', 'Brooks, Terry')",
+      "('Jarka Ruus', 'Brooks, Terry')" ]
+    
+    current_number_of_topic_records = Topic.count
+    @connection.insert_many( base_sql, values )
+    assert_equal current_number_of_topic_records + 3, Topic.count, "Incorrect 
number of records in the database!"    
+  end
+
+end
\ No newline at end of file
Index: 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb
===================================================================
--- 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb  
    (revision 4610)
+++ 
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb  
    (working copy)
@@ -138,7 +138,33 @@
     topic_reloaded = Topic.find(topic.id)
     assert_equal("New Topic", topic_reloaded.title)
   end
+    
+  def test_create_many1
+    topic_hash = { :title => 'New Topic' }
+    topics_array = [ topic_hash, topic_hash, topic_hash ]
   
+    topic_count = Topic.count
+    number_of_expected_topics = topics_array.size + topic_count
+    
+    Topic.create topics_array
+    assert_equal number_of_expected_topics, Topic.count
+    
+    number_of_expected_topics += topics_array.size
+    Topic.create_many topics_array, :optimize=>'fastest'
+    assert_equal number_of_expected_topics, Topic.count       
+  end
+  
+  def test_create_many_with_options
+    topic_hash = { :title => 'New Topic' }
+    topics_array = [ topic_hash, topic_hash, topic_hash ]
+  
+    topic_count = Topic.count
+    number_of_expected_topics = topics_array.size + topic_count
+
+    Topic.create_many_with_options topics_array, :optimize=>'fastest'
+    assert_equal number_of_expected_topics, Topic.count    
+  end
+    
   def test_save!
     topic = Topic.new(:title => "New Topic")
     assert topic.save!
_______________________________________________
Rails-core mailing list
Rails-core@lists.rubyonrails.org
http://lists.rubyonrails.org/mailman/listinfo/rails-core

Reply via email to