[ https://issues.apache.org/jira/browse/SOLR-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468498 ]

Coda Hale commented on SOLR-122:
--------------------------------

Yonik -- The results switch back once the text gets more complicated than 
"woot." Your escape function is really fast as long as the block passed to 
String#gsub never gets called, i.e. when there's nothing in the text to 
escape. Blocks are pretty slow compared with other means of branching. Good 
catch on the regexp compiling -- I didn't realize that String#gsub compiles a 
String first parameter into a Regexp on every call.
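
For anyone reading this on a newer Ruby: one way to keep the single pass while avoiding a block call per match is to hand String#gsub a Hash as the replacement (supported since Ruby 1.9). The sketch below is not part of the benchmark in this comment, and the names XML_ESCAPES, XML_SPECIAL and escape_with_hash are made up for illustration. Whether it is actually faster is something to measure rather than assume.

```ruby
# Lookup table for the three characters the benchmark escapes.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Regexp literal, so no String-to-Regexp conversion happens per call.
XML_SPECIAL = /[&<>]/

# Single pass over the text. gsub looks each match up in the hash,
# so no Ruby block is invoked for the matches.
def escape_with_hash(text)
  text.gsub(XML_SPECIAL, XML_ESCAPES)
end

puts escape_with_hash("a < b && c > d")
```

Dropping escape_with_hash into the benchmark script as one more report would show how it compares on the same 1000-character input.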

Here's how it looks with 1000 random characters of [A-Za-z0-9<>&], 100,000 
times each:

                                          user     system      total        real
string concatenation:                 9.320000   0.070000   9.390000 (  9.921551)
string substitution:                  9.210000   0.050000   9.260000 (  9.660138)
string concatenation2:                7.610000   0.050000   7.660000 (  7.919240)
string substitution2:                 7.550000   0.040000   7.590000 (  7.817162)
catenation w/ single pass escape:    12.640000   0.070000  12.710000 ( 13.121503)
substitution w/ single pass escape:  12.420000   0.070000  12.490000 ( 12.845156)
libxml:                               2.050000   0.010000   2.060000 (  2.108470)

libxml back in the lead. ;-)

Also, if you're on Mac or Linux, you can install libxml-ruby as follows:

    sudo gem install libxml-ruby

Be sure you've installed libxml2 first (sudo port install libxml2, sudo apt-get 
install libxml2, sudo rpm something-or-other).

If you're on Windows, you'll just have to take my word for it.
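
Since the library is meant to be optional, a script can also be written so it still runs where libxml-ruby isn't installed. This is only a rough sketch: the require path and XML::Node usage are copied from the benchmark in this comment, while USE_LIBXML, FALLBACK_ESCAPES and blah_element are hypothetical names, and the pure-Ruby fallback is an assumption for illustration, not what the attached libxml.rb does.

```ruby
# Try to load the libxml binding; carry on without it if it's missing.
begin
  require "rubygems"
  require "xml/libxml"
rescue LoadError
  # libxml-ruby (or libxml2 itself) is not installed
end

# True only if the XML::Node class used in the benchmark is available.
USE_LIBXML = defined?(XML::Node) ? true : false

FALLBACK_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Build a <blah> element around the text, escaping &, < and >.
def blah_element(text)
  if USE_LIBXML
    e = XML::Node.new("blah")
    e << text
    e.to_s
  else
    # Pure-Ruby fallback: three chained gsubs, as in "concatenation2".
    escaped = text.gsub(/&/, FALLBACK_ESCAPES["&"]).
                   gsub(/</, FALLBACK_ESCAPES["<"]).
                   gsub(/>/, FALLBACK_ESCAPES[">"])
    "<blah>#{escaped}</blah>"
  end
end

puts blah_element("1 < 2")
```

The ampersand has to be replaced first in the fallback; otherwise the ampersands introduced by the other two replacements would be escaped a second time.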

====

require "benchmark"
require "rexml/document"
require "rubygems"
require "xml/libxml"

TESTS = 100_000

CHARS = ('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + ['<', '>', '&']
TEXT = ""
1000.times do
  TEXT << CHARS[rand(CHARS.size)]
end

def escape(text)
 text.gsub(/([&<>])/) { |ch|
   case ch
   when '&' then '&amp;'
   when '<' then '&lt;'
   when '>' then '&gt;'
   end
 }
end


Benchmark.bmbm do |results|
 results.report("string concatenation:") do
   TESTS.times do
     x = "<blah>"
     x << TEXT.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
     x << "</blah>"
   end
 end

 results.report("string substitution:") do
   TESTS.times do
     x = "<blah>#{TEXT.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")}</blah>"
   end
 end

 results.report("string concatenation2:") do
   TESTS.times do
     x = "<blah>"
     x << TEXT.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
     x << "</blah>"
   end
 end

 results.report("string substitution2:") do
   TESTS.times do
     x = "<blah>#{TEXT.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')}</blah>"
   end
 end

 results.report("catenation w/ single pass escape:") do
   TESTS.times do
     x = "<blah>"
     x << escape(TEXT)
     x << "</blah>"
   end
 end

 results.report("substitution w/ single pass escape:") do
   TESTS.times do
     x = "<blah>#{escape(TEXT)}</blah>"
   end
 end

 results.report("libxml:") do
   TESTS.times do
     e = XML::Node.new("blah")
     e << TEXT
     e.to_s
   end
 end
end

> Add optional support for Ruby-libxml2 (vs. REXML)
> -------------------------------------------------
>
>                 Key: SOLR-122
>                 URL: https://issues.apache.org/jira/browse/SOLR-122
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - ruby - flare
>            Reporter: Coda Hale
>         Attachments: libxml.rb, libxml.rb
>
>
> This file adds drop-in support for ruby-libxml2, a wrapper for the libxml2 
> library, which is an order of magnitude or so faster than REXML.
> This depends on my SOLR-121 patch for multi-document adds, since the behavior 
> of Solr::Request::AddDocument#to_s is different.
> Requiring this makes some tests fail, but for trivial reasons: some tests are 
> directly tied to REXML, others fail due to interelement whitespace added by 
> libxml2 (which you can't disable via the Ruby interface). Functionally, it's 
> identical, and passes all functional tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
