On Saturday 09 May 2009, Jian Lin wrote:
> i am writing a test program for ActiveRecord, and it reads a document
> which is like 6000 words long.  And then i just tally up the words by
>
>     recordWord = Word.find_by_s(word);
>     if (recordWord.nil?)
>       recordWord = Word.new
>       recordWord.s = word
>     end
>     if recordWord.count.nil?
>       recordWord.count = 1
>     else
>       recordWord.count += 1
>     end
>     recordWord.save
>
> and so this part loops for 6000 times...

Simply put, this is an inappropriate use of a relational database. As 
the other respondents said, you're executing 12000 transactions each 
with a single database access. If you wrap the code above into a 
transaction block

Word.transaction do
  ...
end

you're down to one transaction, but still have 12000 accesses. IIUC, you 
have a word list of 6000 words (or something) in memory already, why not 
do the frequency counting where it's easy?

frequencies = Hash.new(0)
wordlist.each { |word| frequencies[word] += 1 }

Word.transaction do
  wordlist.each do |word, freq|
    unless w = Word.find_by_s(word)
      w = Word.new(:s => word, :count => 0)
    end
    w.freq += 1
    w.save!
  end
end

This way still incurs two database accesses per unique word in your 
list. To cut this down without custom SQL ([1], [2]) you have to 
know/decide whether there are mostly known or unknown words in every new 
document.

Assuming most words are already in the database, you could do the update 
like this

Word.transaction do
  wordlist.each do |word, freq|
    update_count = Word.update_all(
      ["freq = freq + ?", freq],
      ["s = ?", word])
    if update_count == 0
      Word.create!(:s => word, :count => freq)
    end
  end
end

Michael

[1] http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
[2] http://www.postgresql.org/docs/current/static/plpgsql-control-
structures.html#PLPGSQL-ERROR-TRAPPING

-- 
Michael Schuerig
mailto:[email protected]
http://www.schuerig.de/michael/


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to