Re: Lucene/Solr Filesystem tunings
Just to add to the pile... use the Deadline or NOOP I/O scheduler.

-Z

On Sat, Jun 8, 2013 at 4:40 PM, Mark Miller markrmil...@gmail.com wrote:

> Turning swappiness down to 0 can have a decent performance impact.
> - http://en.wikipedia.org/wiki/Swappiness
>
> In the past, I've seen better performance with ext3 than ext4 around commits/fsync. Tests were actually enough slower (lots of these operations) that I made a special ext3 partition workspace for Lucene/Solr dev (I still use ext4 for root and home). I have not checked that recently, and it may not be a large concern for many use cases.
>
> - Mark
>
> On Jun 4, 2013, at 6:48 PM, Tim Vaillancourt t...@elementspace.com wrote:
>
>> Hey all,
>>
>> Does anyone have any advice or special filesystem tuning to share for Lucene/Solr, and which filesystems do you like more? Also, does Lucene/Solr care about access times if I turn them off (I think it doesn't care)?
>>
>> A bit unrelated: what are people's opinions on relaxing some consistency features like filesystem journaling (ext2?), given the additional HA that SolrCloud replicas provide? How about RAID 0 with 3 replicas or so?
>>
>> Thanks!
>>
>> Tim Vaillancourt
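Pulling the tunings from this thread together, a minimal sketch in shell. The device name `sda` and mount point `/srv/solr` are placeholders, not anything from the thread; adjust for your own system, and note that these commands need root and do not persist across reboots unless added to sysctl.conf/fstab:

```shell
# Reduce swappiness so the kernel favors the page cache over swapping
# out the JVM (persist via vm.swappiness=0 in /etc/sysctl.conf).
sysctl -w vm.swappiness=0

# Switch the I/O scheduler on the device holding the index to deadline
# (or noop); "sda" is a placeholder for your actual block device.
echo deadline > /sys/block/sda/queue/scheduler

# Disable access-time updates on the index filesystem; "/srv/solr" is a
# hypothetical mount point (add noatime to /etc/fstab to persist).
mount -o remount,noatime /srv/solr
```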
Re: Distributed Search and the Stale Check
On Mon, Feb 25, 2013 at 8:26 PM, Mark Miller markrmil...@gmail.com wrote:

> Please file a JIRA issue and attach your patch. Great write up! (Saw it pop up on twitter, so I read it a little earlier.)

Done: https://issues.apache.org/jira/browse/SOLR-4509
Distributed Search and the Stale Check
Hello Solr Users,

I just wrote up a piece about some work I did recently to improve the throughput of distributed search: http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html

The short of it is that the stale check in Apache's HTTP Client, as used by SolrJ, can add a lot of latency to a distributed search request, especially given that distributed search is actually made up of 2 stages, each of which must perform its own stale check. For my particular benchmark setup I saw a 2-4x increase in throughput and a 100ms+ drop in latency.

All my work has been done in the context of a larger project, Yokozuna [1], and thus the patch is currently local to that project. I would like to see a similar fix made upstream, which is why I am posting here; I was hoping the Solr sages could offer their input.

My fix is very basic: simply disabling the check and adding a sweeper thread to prevent socket reset errors [2]. But if I had more time, I think a rewrite using the latest Apache HttpComponents might be in order. I'm not sure.

I'm happy to answer any questions and give more details on my test setup.

-Z

[1] https://github.com/rzezeski/yokozuna
[2] https://github.com/rzezeski/yokozuna/blob/a731748f07ee2156b5b3eb558e6b8a3efda4bfe4/solr-patches/no-stale-check.patch
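The "sweeper thread" half of the fix can be sketched with plain JDK scheduling. This is a hypothetical illustration, not the actual patch: the class name and the `closeIdle` callback are made up, and in a real client the callback would delegate to the HTTP connection manager's idle-connection cleanup.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: once the per-request stale check is disabled, a
// background sweeper periodically evicts idle pooled connections so the
// client is unlikely to reuse a socket the server closed long ago.
public class IdleConnectionSweeper {

    // closeIdle stands in for the connection manager's cleanup, e.g.
    // closing any pooled connection idle longer than some threshold.
    public static ScheduledExecutorService start(Runnable closeIdle, long periodMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-sweeper");
            t.setDaemon(true); // don't keep the JVM alive just for the sweeper
            return t;
        });
        ses.scheduleAtFixedRate(closeIdle, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

The trade-off is that a sweep only shrinks the window in which a stale socket can be handed out; it does not eliminate it, which is why the patch still tolerates the occasional reset.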
Re: Distributed Search and the Stale Check
On Mon, Feb 25, 2013 at 8:42 PM, Yonik Seeley yo...@lucidworks.com wrote:

> That's insane!

It is insane. Keep in mind this was a 5-node cluster on the same physical machine sharing the same resources; it consists of 5 SmartOS zones on the same global zone. On my MacBook Pro I saw ~1.5ms per stale check, but that was not under load (I'm honestly not sure whether load makes a difference, as it didn't seem to on my SmartOS cluster). I could probably get to the root of this with DTrace/BTrace, but alas I haven't bothered.

> It's still not even clear to me how the stale check works (reliably). Couldn't the server still close the connection between the stale check and the send of data by the client?

The stale check isn't 100%, but it works most of the time. As you say, the server could close the socket between the stale check completing and the request data being sent. I'm pretty sure Oleg, one of the maintainers, has said as much, but I can't find the original context.

-Z
Re: SolrCloud - Query performance degrades with multiple servers
There are some gains to be made in Solr's distributed search code. A few weeks ago I spent time profiling distributed search using DTrace/BTrace and found some areas for improvement. I planned on writing up some blog posts and providing patches, but I'll list them off now in case others have input.

1) Disable the HTTP client stale check. It is known to cause latency issues. Doing this gave me a 4x increase in perf.

2) Disable Nagle's algorithm. To my knowledge we are not sending many tiny packets, so there is no reason to wait.

3) Use a single TermEnum for all external-id to Lucene-id lookups. This seemed to reduce total bytes read, according to DTrace.

4) Building off #3, cache a certain number of external-id to Lucene-id mappings, avoiding the TermEnum altogether.

5) If fl=id is present, don't run the 2nd phase of the distributed search.

I'm still very new to Solr, so there could be issues with any of the patches I propose above that I'm not aware of. Would love to hear input.

-Z

On Wed, Dec 5, 2012 at 8:35 PM, sausarkar sausar...@ebay.com wrote:

> We are using SolrCloud and trying to configure it for testing purposes, and we are seeing that the average query time increases if we have more than one node in the SolrCloud cluster. We have a single-shard, 12 GB index. Example:
>
> 1 node: average query time ~28 msec, load 140 queries/second
> 3 nodes: average query time ~110 msec, load 420 queries/second distributed equally on three servers, so essentially 140 qps on each node.
>
> Is there any inter-node communication going on for queries? Is there any setting on SolrCloud for query tuning for a cloud config with multiple nodes? Please help.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660.html
> Sent from the Solr - User mailing list archive at Nabble.com.
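For item 2, the flag itself is a one-liner in the JDK. A standalone sketch (in SolrJ this would be set through the HTTP client's connection parameters rather than on a raw socket; the class and method names here are illustrative only):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Standalone sketch: TCP_NODELAY disables Nagle's algorithm, so small
// writes (like the compact second phase of a distributed search request)
// go out immediately instead of being buffered for coalescing.
public class NoNagle {

    public static Socket connectNoDelay(String host, int port) throws IOException {
        Socket s = new Socket(host, port);
        s.setTcpNoDelay(true); // send small segments immediately; don't wait for ACKs
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Loop back to ourselves just to demonstrate the flag is set.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = connectNoDelay("localhost", server.getLocalPort())) {
            System.out.println("tcpNoDelay=" + client.getTcpNoDelay());
        }
    }
}
```

Nagle's algorithm trades latency for fewer packets by holding back small writes while earlier data is unacknowledged; for request/response traffic like inter-node search requests, that wait is pure added latency.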