On Nov 26, 2008, at 9:54 AM, Matt Mitchell wrote:
> Yeah, I overlooked all of that. Thanks, Erik. So could a better query
> test be an incremental one based on id, like:
>
> 100.times do |id|
>   q = "id:#{id}"
>   # query request here...
> end
>
> ?

Testing is an art form, and it depends on what you are testing. Issuing
entirely unique queries is not very real-world either, but at least it
will bypass query and HTTP caching shortcuts.
Many organizations mine their query logs to get a set of
representative queries to test with, for example.
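
As a rough illustration of mining a query log, here is a minimal Ruby
sketch. The log format is an assumption (any line carrying a URL-encoded
`q=` parameter), and `extract_queries`, `top_queries`, and the sample
lines are hypothetical names and data, not anything from Solr itself.

```ruby
require "uri"

# Pull the decoded `q` parameter out of each log line that has one.
# Assumed log format: any line containing "?q=" or "&q=" followed by
# URL-encoded text. Lines without a q parameter are skipped.
def extract_queries(log_lines)
  log_lines.filter_map do |line|
    match = line.match(/[?&]q=([^&\s]+)/)
    URI.decode_www_form_component(match[1]) if match
  end
end

# Rank queries by how often they occur, most frequent first.
# Ties are broken alphabetically so the result is deterministic.
def top_queries(queries, limit)
  queries.tally.sort_by { |query, count| [-count, query] }.first(limit).map(&:first)
end

# Hypothetical sample lines, made up for this sketch.
sample_log = [
  "GET /solr/select?q=ipod&rows=10",
  "GET /solr/select?q=id%3A42&rows=1",
  "GET /solr/select?q=ipod&rows=10",
  "GET /solr/admin/ping"
]

queries = extract_queries(sample_log)
puts top_queries(queries, 2).inspect
```

Keeping the duplicates in `queries` preserves the real-world frequency
mix; `top_queries` is only useful if you want a short list of the most
common ones.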
I think your point is proven: EmbeddedSolrServer itself is faster
than CommonsHttpSolrServer. But would you deploy that way? Is your
front-end going to be merged with Solr itself? That may or may not be
viable, depending on the resources the front-end and Solr need
and how many system resources you have. What about load
balancing? You're then stuck with load balancing your front-end in
tandem with Solr itself.
Again, it all boils down to what you're after with the benchmarks.
I'm not a benchmarking or performance expert myself, so I'm not
sure where to take it from here. It's an interesting test, for sure,
and I'd like to have it reviewed by others who really know their
stuff in this realm and with Solr itself, and who can elaborate on why
there is such a huge difference in speed. Is it just HTTP and
serialization/deserialization overhead? (I tend to doubt that, but I
don't know.)
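
One way to keep such a comparison fair is to run both clients over the
same query set and report the mean time per query. The sketch below
assumes each client can be wrapped in a callable that takes a query
string; `embedded_client` and `http_client` are hypothetical stand-ins,
not real Solr calls, and their timings mean nothing.

```ruby
require "benchmark"

# Time one client over a list of queries and return mean seconds per query.
# `client` is any callable taking a query string; it stands in for a real
# Solr request and is an assumption of this sketch.
def mean_query_time(client, queries)
  return 0.0 if queries.empty?
  total = Benchmark.realtime { queries.each { |q| client.call(q) } }
  total / queries.size
end

# Unique id-based queries, as in the loop quoted earlier in this thread.
queries = Array.new(100) { |id| "id:#{id}" }

# Hypothetical stand-ins; replace with real embedded and HTTP request code.
embedded_client = ->(q) { q.length }
http_client     = ->(q) { sleep(0.001); q.length }

{ "embedded" => embedded_client, "http" => http_client }.each do |name, client|
  printf("%-8s %.6f s/query\n", name, mean_query_time(client, queries))
end
```

Swapping `queries` for a list drawn from real query logs would exercise
the caches the way production traffic does, which the unique id queries
deliberately avoid.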

> Would you happen to know why the solr home and data dir never really
> change? Anytime I use commons http or embedded, a "solr" directory is
> created in the same directory as my script, even though I'm setting
> the home and data dir in my code.

I don't know at the moment; I'd have to dig deeper.
Erik