On Saturday 09 May 2009, Jian Lin wrote:
> i am writing a test program for ActiveRecord, and it reads a document
> which is like 6000 words long. And then i just tally up the words by
>
> recordWord = Word.find_by_s(word);
> if (recordWord.nil?)
> recordWord = Word.new
> recordWord.s = word
> end
> if recordWord.count.nil?
> recordWord.count = 1
> else
> recordWord.count += 1
> end
> recordWord.save
>
> and so this part loops for 6000 times...
Simply put, this is an inappropriate use of a relational database. As
the other respondents said, you're executing 12000 transactions each
with a single database access. If you wrap the code above into a
transaction block
Word.transaction do
...
end
you're down to one transaction, but still have 12000 accesses. IIUC, you
have a word list of 6000 words (or something) in memory already, why not
do the frequency counting where it's easy?
frequencies = Hash.new(0)
wordlist.each { |word| frequencies[word] += 1 }
Word.transaction do
wordlist.each do |word, freq|
unless w = Word.find_by_s(word)
w = Word.new(:s => word, :count => 0)
end
w.freq += 1
w.save!
end
end
This way still incurs two database accesses per unique word in your
list. To cut this down without custom SQL ([1], [2]) you have to
know/decide whether there are mostly known or unknown words in every new
document.
Assuming most words are already in the database, you could do the update
like this
Word.transaction do
wordlist.each do |word, freq|
update_count = Word.update_all(
["freq = freq + ?", freq],
["s = ?", word])
if update_count == 0
Word.create!(:s => word, :count => freq)
end
end
end
Michael
[1] http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
[2] http://www.postgresql.org/docs/current/static/plpgsql-control-
structures.html#PLPGSQL-ERROR-TRAPPING
--
Michael Schuerig
mailto:[email protected]
http://www.schuerig.de/michael/
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby
on Rails: Talk" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---