Excerpts from Edward Z. Yang's message of Fri Jun 05 17:47:00 -0400 2009:
> Now that you mention it, the messages that tickle this bug on my side also
> have one extremely long line.  That's very interesting.

Here is the culprit, laid out to bear its full shame:

    /\w.*:$/

I thought this was a suspicious-looking regex; a simple test confirmed my
belief:

    line = ":a" * 10000
    line =~ /\w.*:$/

Ba boom ba boom ba boom.  This is a textbook case of catastrophic backtracking.
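
To see why, note that every "a" is a candidate start for \w; from each one,
.* swallows the rest of the line and then gives it back one character at a
time looking for a ":" at the end of the line, which never comes because the
line ends in "a". On a backtracking engine that is roughly quadratic work in
the line length. Here is a minimal sketch for measuring it. PATTERN and
time_match are names I made up for illustration, and the timings depend
entirely on your interpreter (newer Ruby releases may optimize this case
away), so treat the numbers as indicative only:

```ruby
require 'benchmark'

# The pattern from the message and the pathological input from the test above.
PATTERN = /\w.*:$/

# Time a single match attempt against ":a" repeated n times.
# The line never ends in ":", so every attempt fails and the result is nil.
def time_match(n)
  line = ":a" * n
  result = nil
  seconds = Benchmark.realtime { result = (line =~ PATTERN) }
  [result, seconds]
end

# Doubling n should roughly quadruple the time on a plain backtracking engine.
[500, 1_000, 2_000, 4_000].each do |n|
  result, seconds = time_match(n)
  puts format("n=%5d  result=%-5s  %.4fs", n, result.inspect, seconds)
end
```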

I have two possible fixes. They take about the same time in regular
cases, but the second one is faster on really long strings:

First, the simple one:

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..0ddd3af 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,7 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
+  QUOTE_START_PATTERN = /\w\W*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|(^\s*--\+\+\*\*==)/
 
   MAX_SIG_DISTANCE = 15 # lines from the end
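
The reason this should accept the same lines: the last word character before
the final colon is, by definition, followed only by non-word characters, so
\W* can stand in for .* there, and it stops at the next word character
instead of running to the end of the line. Below is a quick sanity check of
that claim. The sample lines and the names OLD_PATTERN, NEW_PATTERN, and
disagreements are mine, not from sup, and I am assuming each input is a
single line with no embedded newline (\W matches a newline, . does not):

```ruby
OLD_PATTERN = /\w.*:$/    # original QUOTE_START_PATTERN
NEW_PATTERN = /\w\W*:$/   # replacement from the patch above

# Hypothetical sample lines, each a single line with no embedded newline.
SAMPLES = [
  "On Friday, Edward wrote:",   # typical quote attribution
  "a:",
  "smiley at the end :-):",
  "key: value",                 # colon, but not at the end
  ":::",                        # no word character at all
  "",
  ":a" * 20,                    # short version of the pathological line
]

# Return the lines on which the two patterns disagree about matching.
def disagreements(lines)
  lines.reject { |line| OLD_PATTERN.match?(line) == NEW_PATTERN.match?(line) }
end

SAMPLES.each do |line|
  puts format("%-28s old=%-5s new=%-5s",
              line.inspect[0, 28],
              OLD_PATTERN.match?(line),
              NEW_PATTERN.match?(line))
end
puts "disagreements: #{disagreements(SAMPLES).inspect}"
```

The two patterns can report different match offsets (the new one starts at
the last word character before the colon), which does not matter as long as
the result is only used as a true/false condition.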

And the slightly more complicated one (but optimal for large n):

diff --git a/lib/sup/message.rb b/lib/sup/message.rb
index 5993729..c5481a6 100644
--- a/lib/sup/message.rb
+++ b/lib/sup/message.rb
@@ -26,7 +26,6 @@ class Message
 
   QUOTE_PATTERN = /^\s{0,4}[>|\}]/
   BLOCK_QUOTE_PATTERN = /^-----\s*Original Message\s*----+$/
-  QUOTE_START_PATTERN = /\w.*:$/
   SIG_PATTERN = /(^-- ?$)|(^\s*----------+\s*$)|(^\s*_________+\s*$)|(^\s*--~--~-)|
 
   MAX_SIG_DISTANCE = 15 # lines from the end
@@ -449,7 +448,7 @@ private
       when :text
         newstate = nil
 
-        if line =~ QUOTE_PATTERN || (line =~ QUOTE_START_PATTERN && nextline =~ QUO
+        if line =~ QUOTE_PATTERN || (line =~ /:$/ && line =~ /\w/ && nextline =~ QU
           newstate = :quote
         elsif line =~ SIG_PATTERN && (lines.length - i) < MAX_SIG_DISTANCE
           newstate = :sig
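
For anyone who wants to poke at the second variant in isolation, here is a
rough sketch of the two checks pulled out into a helper. The name
quote_start? is hypothetical (the patch inlines the condition), and as
before I am assuming the input is a single line:

```ruby
# Hypothetical helper (not in the patch) wrapping the two checks that
# replace QUOTE_START_PATTERN in the second fix.
def quote_start?(line)
  # Each pattern is a single fixed-length piece with no repetition, so a
  # failed attempt at one position never rescans the rest of the line.
  !!(line =~ /:$/ && line =~ /\w/)
end

long_line = ":a" * 10_000
puts quote_start?("On Friday, Edward wrote:")  # true
puts quote_start?(long_line)                   # false: does not end in ":"
puts quote_start?(long_line + ":")             # true
puts quote_start?(":::")                       # false: no word character
```

Splitting the test in two drops the ordering constraint between the word
character and the colon, which is harmless for a single line: if the line
ends in ":", any word character in it necessarily comes before that colon.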

There are a number of micro-optimizations that could be made to message
parsing, but this will basically fix the egregious problem.

Cheers,
Edward