Re: replica recovery

2015-11-19 Thread Brian Scholl
Hey Erick, thanks for the reply. I plan on rebuilding my cluster soon with more nodes so that the index size (including tlogs) is under 50% of the available disk at a minimum; ideally we will shoot for under 33%, budget permitting. I think I now understand the problem that managing this
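The disk budget described above comes down to simple arithmetic. Here is a rough Python sketch, assuming data is spread evenly across nodes; the function name and the example sizes are hypothetical and not taken from the thread.

```python
import math


def nodes_needed(total_index_bytes: float,
                 disk_per_node_bytes: float,
                 max_disk_fraction: float) -> int:
    """Smallest node count that keeps each node's share of the index
    (all replicas, tlogs included) under max_disk_fraction of its disk.

    Assumes an even spread of data across nodes, which real clusters
    only approximate.
    """
    if disk_per_node_bytes <= 0:
        raise ValueError("disk_per_node_bytes must be positive")
    if not 0 < max_disk_fraction <= 1:
        raise ValueError("max_disk_fraction must be in (0, 1]")
    usable_per_node = disk_per_node_bytes * max_disk_fraction
    return max(1, math.ceil(total_index_bytes / usable_per_node))


# Hypothetical cluster: 6000 GB of index in total, 1000 GB of disk per node.
print(nodes_needed(6000, 1000, 0.50))  # node count for the 50% budget
print(nodes_needed(6000, 1000, 0.33))  # node count for the 33% budget
```

The units only need to be consistent; gigabytes work as well as bytes.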

Re: replica recovery

2015-11-19 Thread Erick Erickson
bq: I would still like to increase the number of transaction logs retained so that shard recovery (outside of long term failures) is faster than replicating the entire shard from the leader
That's legitimate, but (you knew that was coming!) nodes having to recover _should_ be a rare event. Is

Re: replica recovery

2015-11-19 Thread Jeff Wartes
I completely agree with the other comments on this thread with regard to needing more disk space asap, but I thought I’d add a few comments regarding the specific questions here. If your goal is to prevent full recovery requests, you only need to cover the duration you expect a replica to be
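One way to read the sizing advice above is that the retained transaction log has to span every update a replica could miss while it is away. Here is a rough Python sketch, assuming a steady update rate; the function name, the safety factor, and the example numbers are illustrative and not from the thread. The result is only a candidate value for the numRecordsToKeep param mentioned elsewhere in this thread.

```python
import math


def records_to_keep(updates_per_second: float,
                    expected_outage_seconds: float,
                    safety_factor: float = 2.0) -> int:
    """Estimate how many update records the tlog should retain so a
    replica that was away for expected_outage_seconds can catch up
    from the log instead of copying the whole index.

    safety_factor is an assumed margin for bursts; it is not a Solr setting.
    """
    if updates_per_second < 0 or expected_outage_seconds < 0:
        raise ValueError("rate and outage duration must be non-negative")
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1.0")
    return math.ceil(updates_per_second * expected_outage_seconds * safety_factor)


# Hypothetical workload: 50 updates per second, replica away for 10 minutes.
print(records_to_keep(50, 600))
```

More retained records mean larger tlogs on disk, so the estimate has to be weighed against the disk budget discussed in this thread.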

Re: replica recovery

2015-11-19 Thread Brian Scholl
Primarily our outages are caused by Java crashes or really long GC pauses; in short, not all of our developers have a good sense of what types of queries are unsafe if abused (for example, cursorMark or start=). Honestly, stability of the JVM is another task I have coming up. I agree that

Re: replica recovery

2015-11-19 Thread Erick Erickson
Right, I've managed to double the memory required by Solr by varying the _query_. Siiigh. There are some JIRAs out there (don't have them readily available, sorry) that short-circuit queries that take "too long", and there are some others to short-circuit "expensive" queries. I believe this

Re: replica recovery

2015-11-19 Thread Erick Erickson
First, every time you autocommit there _should_ be a new tlog created. A hard commit truncates the tlog by design. My guess (not based on knowing the code) is that Real Time Get needs file handles open to the tlog files and you'll have a bunch of them. Lots and lots and lots. Thus the too many

Re: replica recovery

2015-11-19 Thread Brian Scholl
I have opted to modify the number and size of transaction logs that I keep to resolve the original issue I described. In so doing I think I have created a new problem; feedback is appreciated. Here are the new updateLog settings: ${solr.ulog.dir:}

Re: replica recovery

2015-10-28 Thread Shawn Heisey
On 10/27/2015 6:16 PM, Brian Scholl wrote:
> - a shard replica is larger than 50% of the available disk
This detail indicates a potential problem even without any of the other details. The bottom line here is that if you don't have enough disk space to hold your index three times, you can have
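The three-times rule of thumb above is easy to check mechanically. Here is a minimal Python sketch, assuming the index size is already known and that the whole volume is available to the index; the function names and sample sizes are hypothetical.

```python
import shutil


def has_index_headroom(index_bytes: int,
                       disk_total_bytes: int,
                       copies: int = 3) -> bool:
    """True if the disk can hold `copies` full copies of the index,
    following the three-times rule of thumb quoted above."""
    return index_bytes * copies <= disk_total_bytes


def check_volume(index_bytes: int, path: str = ".") -> bool:
    """Apply the same check to the volume that contains `path`."""
    total = shutil.disk_usage(path).total
    return has_index_headroom(index_bytes, total)


# Hypothetical sizes in GB: a replica above 50% of the disk fails the check.
print(has_index_headroom(600, 1000))
print(has_index_headroom(300, 1000))
```

The `copies` default mirrors the figure in the message; a different margin can be passed in.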

Re: replica recovery

2015-10-27 Thread Brian Scholl
Both are excellent points and I will look to implement them. Particularly I wonder if a respectable increase to the numRecordsToKeep param could solve this problem entirely. Thanks! > On Oct 27, 2015, at 20:50, Jeff Wartes wrote: > > > On the face of it, your

replica recovery

2015-10-27 Thread Brian Scholl
Hello, I am experiencing a failure mode where a replica is unable to recover and it will try to do so forever. In writing this email I want to make sure that I haven't missed anything obvious or missed a configurable option that could help. If something about this looks funny, I would really

Re: replica recovery

2015-10-27 Thread Erick Erickson
Brian: Two things come to mind here: 1> Even a partial index is better than none. Let's say we have a leader and follower. Follower goes offline and thus out of date. Follower comes back up and sees it needs to replicate and deletes the index as the first step. At this very instant someone

Re: replica recovery

2015-10-27 Thread Brian Scholl
Whoops, in the description of my setup that should say 2 replicas per shard. Every server has a replica. > On Oct 27, 2015, at 20:16, Brian Scholl wrote: > > Hello, > > I am experiencing a failure mode where a replica is unable to recover and it > will try to do so

Re: replica recovery

2015-10-27 Thread Jeff Wartes
On the face of it, your scenario seems plausible. I can offer two pieces of info that may or may not help you: 1. A write request to Solr will not be acknowledged until an attempt has been made to write to all relevant replicas. So, B won’t ever be missing updates that were applied to A, unless