Re: Lucene/Solr Filesystem tunings

2013-06-10 Thread Ryan Zezeski
Just to add to the pile...use the Deadline or NOOP I/O scheduler.

-Z
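A quick sketch of how the scheduler suggestion above can be inspected and applied on Linux (the device name `sda` is an example; how you persist it depends on your bootloader):

```shell
# Show the available schedulers for a device; the bracketed entry is active
cat /sys/block/sda/queue/scheduler

# Switch to deadline (or noop) at runtime; requires root
echo deadline > /sys/block/sda/queue/scheduler

# To persist across reboots, add a parameter to the kernel command line, e.g.:
#   elevator=deadline
```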


On Sat, Jun 8, 2013 at 4:40 PM, Mark Miller markrmil...@gmail.com wrote:

 Turning swappiness down to 0 can have some decent performance impact.

 - http://en.wikipedia.org/wiki/Swappiness

 In the past, I've seen better performance with ext3 over ext4 around
 commits/fsync. Tests were slower by enough (lots of these operations)
 that I made a special ext3 partition workspace for Lucene/Solr dev. (I still
 use ext4 for root and home.)

 Have not checked that recently, and it may not be a large concern for many
 use cases.

 - Mark
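Mark's swappiness tip can be applied like so (a sketch for Linux; changing the value requires root):

```shell
# Check the current value
sysctl vm.swappiness

# Set it to 0 for the running system
sysctl -w vm.swappiness=0

# Persist across reboots
echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```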

 On Jun 4, 2013, at 6:48 PM, Tim Vaillancourt t...@elementspace.com wrote:

  Hey all,
 
  Does anyone have any advice or special filesystem tuning to share for
 Lucene/Solr, and which file systems do you prefer?
 
  Also, does Lucene/Solr care about access times if I turn them off (I
 think it doesn't care)?
 
  A bit unrelated: what are people's opinions on relaxing some consistency
 features like filesystem journaling (ext2?), given SolrCloud's
 additional HA with replicas? How about RAID 0 with 3 replicas or so?
 
  Thanks!
 
  Tim Vaillancourt
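On the access-time question above: atime updates can be turned off at mount time. A sketch (the mount point and the fstab entry are placeholders, not from the original thread):

```shell
# Remount an existing filesystem without access-time updates
mount -o remount,noatime /srv/solr-data

# Persist in /etc/fstab (UUID and mount point are placeholders):
#   UUID=xxxx-xxxx  /srv/solr-data  ext4  defaults,noatime  0  2
```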




Re: Distributed Search and the Stale Check

2013-02-27 Thread Ryan Zezeski
On Mon, Feb 25, 2013 at 8:26 PM, Mark Miller markrmil...@gmail.com wrote:

 Please file a JIRA issue and attach your patch. Great write up! (Saw it
 pop up on twitter, so I read it a little earlier).


Done.

https://issues.apache.org/jira/browse/SOLR-4509


Distributed Search and the Stale Check

2013-02-25 Thread Ryan Zezeski
Hello Solr Users,

I just wrote up a piece about some work I did recently to improve the
throughput of distributed search.

http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html

The short of it is that the stale check in Apache's HTTP Client, which is used
by SolrJ, can add a lot of latency to a distributed search request, especially
given that distributed search is actually made up of two stages, each of
which must perform its own stale check.  For my particular benchmark setup
I saw a 2-4x increase in throughput and a 100ms+ drop in latency.  All my
work has been done in the context of a larger project, Yokozuna [1], and thus
the patch is currently local to that project.  I would like to see a
similar fix made upstream, and that is why I am posting here; I was hoping
the Solr sages could offer their input.  My fix is very basic: simply
disabling the check and adding a sweeper thread to prevent socket reset
errors [2].  But if I had more time I think a rewrite using the latest
Apache HttpComponents might be in order.  I'm not sure.  I'm happy to
answer any questions and give more details on my test setup.

-Z

[1] https://github.com/rzezeski/yokozuna

[2]
https://github.com/rzezeski/yokozuna/blob/a731748f07ee2156b5b3eb558e6b8a3efda4bfe4/solr-patches/no-stale-check.patch


Re: Distributed Search and the Stale Check

2013-02-25 Thread Ryan Zezeski
On Mon, Feb 25, 2013 at 8:42 PM, Yonik Seeley yo...@lucidworks.com wrote:


 That's insane!


It is insane.  Keep in mind this was a 5-node cluster on the
same physical machine sharing the same resources: it consisted of 5 SmartOS
zones on the same global zone.  On my MacBook Pro I saw ~1.5ms per stale
check, but that was not under load (I'm honestly not sure if being on/off load
makes a difference, as it didn't seem to on my SmartOS cluster).  I could
probably get to the root of this with DTrace/BTrace, but alas I haven't
bothered.



 It's still not even clear to me how the stale check works (reliably).
 Couldn't the server still close the connection between the stale check
 and the send of data by the client?


The stale check isn't 100% reliable, but it works most of the time.  As you
say, the server could still close the socket between the stale check
completing and the request data being sent.  I'm pretty sure Oleg, one of the
maintainers, has said as much, but I can't find the original context.

-Z


Re: SolrCloud - Query performance degrades with multiple servers

2012-12-06 Thread Ryan Zezeski
There are some gains to be made in Solr's distributed search code.  A few
weeks ago I spent time profiling dist search using DTrace/BTrace and
found some areas for improvement.  I planned on writing up some blog posts
and providing patches, but I'll list them off now in case others have input.

1) Disable the HTTP client stale check.  It is known to cause latency
issues.  Doing this gave me a 4x increase in perf.

2) Disable Nagle's algorithm; many tiny packets are not being sent (to my
knowledge), so there is no reason to wait.

3) Use a single TermEnum for all external-id to Lucene-id lookups.  This
seemed to reduce total bytes read, according to dtrace.

4) Building off #3, cache a certain number of external-id to Lucene-id
mappings, avoiding the TermEnum altogether.

5) If fl=id is present then don't run the 2nd phase of the dist search.
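To illustrate item 5: a request like the following asks only for the document ids, which the first phase of a distributed search already returns, so the second (stored-field fetch) phase could in principle be skipped (host, port, and collection name here are hypothetical):

```shell
curl 'http://localhost:8983/solr/collection1/select?q=*:*&fl=id&wt=json'
```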

I'm still very new to Solr, so there could be issues with any of the patches
I propose above that I'm not aware of.  Would love to hear input.

-Z

On Wed, Dec 5, 2012 at 8:35 PM, sausarkar sausar...@ebay.com wrote:

  We are using SolrCloud and trying to configure it for testing purposes. We
 are seeing that the average query time increases if we have more than one
 node in the SolrCloud cluster. We have a single-shard, 12 GB index.
 Example:

 1 node: average query time *~28 msec*, load 140 queries/second
 3 nodes: average query time *~110 msec*, load 420 queries/second,
 distributed equally on three servers, so essentially 140 qps on each node.

 Is there any inter-node communication going on for queries? Is there any
 setting on SolrCloud for query tuning for a cloud config with multiple
 nodes? Please help.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660.html
 Sent from the Solr - User mailing list archive at Nabble.com.