Here's what I'm using to sanitize HTML in user-submitted text.

It removes every tag and attribute that isn't in the allowed_markup list (by default, only <a> with href), replacing each disallowed tag with a space. It also adds rel="nofollow" to links to help discourage link spam.

(derived from http://redhanded.hobix.com/bits/htmlFilteringForRedCloth.html)

Best,
Patrick

...

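# Replaces every tag not listed in allowed_markup (tag name => list of
# permitted attribute names) with a space, and drops every attribute not
# listed for its tag. Opening <a> tags also get rel="nofollow".
# Returns a new string; the argument is left unmodified.
def clean_html(text, allowed_markup = nil)
  allowed_markup ||= { 'a' => ['href'] }

  # Remove CDATA openers, which the tag pattern below would leave in place.
  text = text.gsub(/<!\[CDATA\[/, '')

  text.gsub(/<(\/*)(\w+)([^>]*)>/) do
    # Copy the captures now; the attribute matches below overwrite $~.
    slash = $1
    tag   = $2.downcase
    attrs = $3

    # Disallowed tags become a single space.
    next ' ' unless allowed_markup.has_key?(tag)

    # Closing tags never carry attributes, so rel="nofollow" isn't added.
    next "</#{tag}>" unless slash.empty?

    pcs = [tag]
    pcs << 'rel="nofollow"' if tag == 'a'

    (allowed_markup[tag] || []).each do |prop|
      # Try a double-quoted, then single-quoted, then unquoted value.
      ['"', "'", ''].each do |q|
        q2 = (q != '' ? q : '\s')
        next unless attrs =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
        attrv = $1
        # Outside <img>, a src value must start with "http".
        next if tag != 'img' && prop == 'src' && attrv !~ /\Ahttp/
        # HTML-encode embedded double quotes so the value can't break out
        # of the re-quoted attribute (a backslash doesn't escape in HTML).
        pcs << "#{prop}=\"#{attrv.gsub('"', '&quot;')}\""
        break
      end
    end

    "<#{pcs.join(' ')}>"
  end
end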
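One gap worth knowing about: the src check is the only check on attribute values, so an href like javascript:alert(1) is copied through unchanged. Here's a minimal sketch of a scheme check, assuming you only want absolute http, https, and mailto links. The helper name safe_href? and that scheme list are my own choices, not part of the RedHanded code:

```ruby
require 'uri'

# Hypothetical allowlist of URL schemes; adjust to taste.
ALLOWED_SCHEMES = %w[http https mailto].freeze

# Hypothetical helper: true only for URLs that parse and carry an allowed
# scheme. Relative links have no scheme, so they are rejected as well.
def safe_href?(value)
  uri = URI.parse(value.strip)
  !uri.scheme.nil? && ALLOWED_SCHEMES.include?(uri.scheme.downcase)
rescue URI::InvalidURIError
  # Unparseable values (e.g. ones containing tabs or spaces) are rejected.
  false
end

puts safe_href?('http://sdruby.com/')   # prints true
puts safe_href?('javascript:alert(1)')  # prints false
```

To use it, you could add a line like `next if prop == 'href' && !safe_href?(attrv)` next to the src check inside the attribute loop. Rejecting relative links is deliberately strict; loosen it if your site needs them.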
_______________________________________________
Sdruby mailing list
[email protected]
http://lists.sdruby.com/mailman/listinfo/sdruby
