On Nov 26, 2008, at 9:54 AM, Matt Mitchell wrote:
Yeah, I overlooked all of that. Thanks, Erik. So could a better query test be
an incremental one based on id, like:

100.times do |id|
 q = "id:#{id}"
 # query request here...
end

?

Testing is an art form, and the right approach depends on what you are testing. Issuing entirely unique queries is not very real-world either, but at least it bypasses the query and HTTP caching shortcuts.
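As a rough sketch of that idea, the loop above could be wrapped in a timer so every request carries a different id and nothing is answered from a cache. The `run_query` callable and the `fake_request` stand-in are hypothetical names used only for illustration; swap in whatever client call you are actually benchmarking.

```ruby
require "benchmark"

# Times `count` unique id queries and returns the elapsed seconds.
# `run_query` is any callable that takes a query string and performs
# the request (hypothetical; plug in your own client call here).
def time_unique_queries(count, run_query)
  Benchmark.realtime do
    count.times do |id|
      run_query.call("id:#{id}")
    end
  end
end

# Stand-in for a real request so the sketch runs without a Solr server.
seen = []
fake_request = ->(q) { seen << q }

elapsed = time_unique_queries(100, fake_request)
puts format("100 unique queries in %.4fs", elapsed)
```

Passing the request in as a callable means the same timing code can be pointed at the HTTP client or the embedded one without changing the loop.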

Many organizations mine their query logs to get a set of representative queries to test with, for example.
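A minimal sketch of what that mining might look like, assuming each request line in the log contains a `params={...}` section with a `q=` parameter. That layout, the sample lines, and the `representative_queries` helper are assumptions for illustration; adjust the pattern to whatever your logs actually look like.

```ruby
# Pulls the q parameter out of request log lines and returns the most
# frequent queries, most common first (ties broken alphabetically).
# Assumes each line contains "params={...q=VALUE...}".
def representative_queries(log_lines, limit)
  counts = Hash.new(0)
  log_lines.each do |line|
    match = line.match(/params=\{(?:[^}]*&)?q=([^&}]*)/)
    counts[match[1]] += 1 if match
  end
  counts.sort_by { |query, n| [-n, query] }.first(limit).map(&:first)
end

# Made-up sample lines in the assumed format.
sample_log = [
  "path=/select params={q=ipod&rows=10} hits=3 status=0 QTime=2",
  "path=/select params={q=solr&rows=10} hits=9 status=0 QTime=1",
  "path=/select params={rows=10&q=ipod} hits=3 status=0 QTime=1",
  "path=/update params={commit=true} status=0 QTime=40"
]

top_queries = representative_queries(sample_log, 2)
puts top_queries.inspect
```

Replaying the resulting list, possibly weighted by frequency, gives a mix of repeated and rare queries that is closer to real traffic than either identical or entirely unique queries.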

I think your point is proven: EmbeddedSolrServer itself is faster than CommonsHttpSolrServer. But would you deploy that way? Is your front-end going to be merged with Solr itself? That may or may not be viable, depending on the resources the front-end and Solr need and how many system resources you have. What about load balancing? You're then stuck with load balancing your front-end in tandem with Solr itself.

Again, it all boils down to what you're after with the benchmarks. And I'm not a benchmarking-savvy person myself, so I'm not sure where to take it from here. It's an interesting test, for sure, and I'd like to have it reviewed by others who really know their stuff in this realm and with Solr itself, and who can elaborate on why there is such a huge difference in speed. Is it just HTTP and serialize/unserialize overhead? (I tend to doubt that, but I don't know.)

Would you happen to know why the solr home and data dir never really change? Anytime I use commons http or embedded, a "solr" directory is created in the same directory as my script, even though I'm setting the home and data dir in my code.

I don't know at the moment, I'd have to dig deeper.

        Erik
