Here's what I'm using to clean text.
This removes all tags and attributes except those in the
allowed_markup hash (by default, only <a> with its href), and it adds
rel="nofollow" to links to help minimize potential link spam.
(derived from http://redhanded.hobix.com/bits/htmlFilteringForRedCloth.html)
Best,
Patrick
...
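def clean_html(text, allowed_markup = nil)
  # Default whitelist: only <a> tags, keeping only their href attribute.
  allowed_markup ||= { 'a' => ['href'] }

  # Work on a copy so the caller's string is not modified in place.
  text = text.dup
  text.gsub!( /<!\[CDATA\[/, '' )
  text.gsub!( /<(\/*)(\w+)([^>]*)>/ ) do
    raw = $~
    tag = raw[2].downcase
    if allowed_markup.has_key? tag
      pcs = [tag]
      pcs << "rel=\"nofollow\"" if tag == 'a'
      allowed_markup[tag].each do |prop|
        # Try double-quoted, single-quoted, then unquoted attribute values.
        ['"', "'", ''].each do |q|
          q2 = ( q != '' ? q : '\s' )
          if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
            attrv = $1
            next if tag != 'img' and prop == 'src' and attrv !~ /^http/
            # Use attrv, not $1: the !~ test above can reset $1 to nil.
            # HTML has no backslash escapes, so encode embedded double
            # quotes as &quot; instead of \".
            pcs << "#{prop}=\"#{attrv.gsub('"', '&quot;')}\""
            break
          end
        end
      end if allowed_markup[tag]
      "<#{raw[1]}#{pcs.join " "}>"
    else
      # Disallowed tags are replaced with a single space.
      " "
    end
  end
  # Closing tags pick up the rel attribute too, so strip it from </a>.
  text.gsub('</a rel="nofollow">', "</a>")
end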
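
To make the attribute matching in the inner loop easier to follow, here is
a minimal sketch that isolates just that step. The extract_attr helper and
the example.com URLs are hypothetical and not part of the original code; the
sketch only mirrors the three quoting styles clean_html tries, in the same
order (double-quoted, single-quoted, unquoted).

```ruby
# Hypothetical helper (an assumption, not from the original post): pull one
# attribute's value out of the raw attribute text of a tag.
def extract_attr(attrs, prop)
  ['"', "'", ''].each do |q|
    # Quoted values end at the matching quote; unquoted values end at
    # the first whitespace character.
    stop = (q != '' ? q : '\s')
    if attrs =~ /#{prop}\s*=\s*#{q}([^#{stop}]+)#{q}/i
      return $1
    end
  end
  nil  # attribute not present
end

puts extract_attr(' href="http://example.com/a b"', 'href')
puts extract_attr(" href='http://example.com'", 'href')
puts extract_attr(' href=http://example.com class=x', 'href')
p    extract_attr(' class="x"', 'href')
```

Note that the unquoted pattern is tried last; it would otherwise also match
quoted values and keep the surrounding quote characters.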
_______________________________________________
Sdruby mailing list
[email protected]
http://lists.sdruby.com/mailman/listinfo/sdruby