Hey Erick,
Thanks for the reply.
I plan on rebuilding my cluster soon with more nodes so that the index size
(including tlogs) is under 50% of the available disk at a minimum; ideally we
will shoot for under 33%, budget permitting. I think I now understand the
problem that managing this
bq: I would still like to increase the number of transaction logs
retained so that shard recovery (outside of long term failures) is
faster than replicating the entire shard from the leader
That's legitimate, but (you knew that was coming!) nodes having to
recover _should_ be a rare event. Is
I completely agree with the other comments on this thread with regard to
needing more disk space asap, but I thought I’d add a few comments
regarding the specific questions here.
If your goal is to prevent full recovery requests, you only need to cover
the duration you expect a replica to be
Primarily our outages are caused by Java crashes or really long GC pauses; in
short, not all of our developers have a good sense of which types of queries
are unsafe if abused (for example, cursorMark or deep paging with start=).
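The cost of the deep-paging style query mentioned above can be sketched with a toy model (this is an illustration of the general top-N paging mechanics, not Solr's actual code): with start=N the searcher must track the top N+rows documents just to throw the first N away, whereas a cursor only ever tracks `rows` candidates past the last cursor position.

```python
import heapq

def top_docs_deep_paging(scores, start, rows):
    """Toy model of offset paging: keep the top (start + rows)
    documents in a priority queue, then discard the first `start`.
    Memory grows with the offset, which is what makes large start=
    values dangerous."""
    queue_size = start + rows
    top = heapq.nlargest(queue_size, scores)
    return top[start:start + rows], queue_size

def top_docs_cursor(scores, last_mark, rows):
    """Toy model of cursor paging: only documents sorting after the
    previous cursor position are candidates, so the queue never holds
    more than `rows` entries regardless of how deep the page is."""
    candidates = (s for s in scores if s < last_mark)  # sort: score desc
    return heapq.nlargest(rows, candidates), rows

scores = list(range(1_000_000))
page, queue_a = top_docs_deep_paging(scores, start=500_000, rows=10)
page2, queue_b = top_docs_cursor(scores, last_mark=page[-1], rows=10)
print(queue_a, queue_b)  # prints "500010 10"
```

Same ten documents either way, but the offset version had to rank half a million of them to get there.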
Honestly, stability of the JVM is another task I have coming up. I agree that
Right, I've managed to double the memory required by Solr
by varying the _query_. Siiigh.
There are some JIRAs out there (don't have them readily available, sorry)
that short-circuit queries that take "too long", and there are some others
to short circuit "expensive" queries. I believe this
First, every time you autocommit there _should_ be a new
tlog created. A hard commit truncates the tlog by design.
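Since each hard commit closes the current tlog and opens a new one, the autoCommit settings effectively bound how large any single tlog can grow. A hedged sketch of a typical solrconfig.xml block (the values here are illustrative placeholders, not a recommendation for this cluster):

```xml
<!-- Illustrative values only: a hard commit every 60 seconds or
     25,000 docs closes the current tlog and starts a new one. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <maxDocs>25000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
```

openSearcher=false keeps these commits cheap, since they exist to flush segments and roll tlogs rather than to make documents visible.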
My guess (not based on knowing the code) is that
Real Time Get needs a file handle open to the tlog files,
and you'll have a bunch of them. Lots and lots and lots. Thus
the "too many open files" errors.
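Since every retained tlog can pin an open descriptor, it is worth comparing the number of logs you plan to keep against the process's file-descriptor limit. A minimal sketch using the stdlib resource module (Unix-only; `max_logs_to_keep` is a hypothetical value, not one from this thread):

```python
import resource

# The soft RLIMIT_NOFILE value is the same limit behind
# "Too many open files" errors; tlog handles count against it,
# alongside index segment files, sockets, etc.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
max_logs_to_keep = 100  # hypothetical maxNumLogsToKeep value
headroom = soft - max_logs_to_keep
print(soft, hard, headroom)
```

If the headroom is small, either raise the ulimit for the Solr process or keep fewer logs.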
I have opted to modify the number and size of transaction logs that I keep to
resolve the original issue I described. In so doing I think I have created a
new problem; feedback is appreciated.
Here are the new updateLog settings:
${solr.ulog.dir:}
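The settings snippet above appears to have lost its XML markup in the archive, leaving only the dir property. For reference, a hedged sketch of what an updateLog block with tlog-retention tuning typically looks like (the numbers are placeholders, not the values from this thread):

```xml
<!-- Placeholder values, not the originals from this message:
     numRecordsToKeep bounds how many documents the logs retain,
     maxNumLogsToKeep bounds how many log files are kept around. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">50000</int>
  <int name="maxNumLogsToKeep">20</int>
</updateLog>
```

Raising numRecordsToKeep widens the window in which a returning replica can catch up via peer sync instead of a full index replication, at the cost of larger logs and more open files.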
On 10/27/2015 6:16 PM, Brian Scholl wrote:
> - a shard replica is larger than 50% of the available disk
This detail indicates a potential problem even without any of the other
details. The bottom line here is that if you don't have enough disk
space to hold your index three times, you can have
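The sizing rule above (enough disk to hold the index roughly three times: the live copy, transient segment copies during merges or optimize, and a full-replication copy during recovery) reduces to simple arithmetic. A sketch with made-up sizes:

```python
def disk_ok(index_gb, disk_gb, copies=3):
    """Rule of thumb from the thread: budget for the live index plus
    the transient copies created by segment merges and by a replica
    doing a full recovery from its leader."""
    return disk_gb >= copies * index_gb

print(disk_ok(index_gb=120, disk_gb=500))  # True: 500 >= 360
print(disk_ok(index_gb=300, disk_gb=500))  # False: 900 > 500
```

The 50% and 33% targets mentioned earlier in the thread correspond to copies=2 and copies=3 respectively.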
Both are excellent points and I will look to implement them. In particular, I
wonder whether a respectable increase to the numRecordsToKeep param could solve
this problem entirely.
Thanks!
> On Oct 27, 2015, at 20:50, Jeff Wartes wrote:
>
>
> On the face of it, your
Hello,
I am experiencing a failure mode where a replica is unable to recover and it
will try to do so forever. In writing this email I want to make sure that I
haven't missed anything obvious or missed a configurable option that could
help. If something about this looks funny, I would really
Brian:
Two things come to mind here:
1> Even a partial index is better than none. Let's say we have a
leader and follower. Follower goes offline and thus out of date.
Follower comes back up and sees it needs to replicate and deletes the
index as the first step. At this very instant someone
Whoops, in the description of my setup that should say 2 replicas per shard.
Every server has a replica.
> On Oct 27, 2015, at 20:16, Brian Scholl wrote:
>
> Hello,
>
> I am experiencing a failure mode where a replica is unable to recover and it
> will try to do so
On the face of it, your scenario seems plausible. I can offer two pieces
of info that may or may not help you:
1. A write request to Solr will not be acknowledged until an attempt has
been made to write to all relevant replicas. So, B won’t ever be missing
updates that were applied to A, unless
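The acknowledgement behavior described in point 1 can be sketched as a toy fan-out. This is a model of the semantics as stated in the email (the leader attempts the write on every relevant replica before acknowledging), with hypothetical names, not Solr's actual distributed-update code:

```python
def forward_update(doc, replicas):
    """Toy model: attempt the write on every replica before returning,
    so a live replica cannot silently miss an update that the client
    saw acknowledged.  A down replica is recorded and must catch up
    later via peer sync or full recovery."""
    results = {}
    for replica in replicas:
        try:
            replica(doc)  # attempt the write
            results[replica.__name__] = "ok"
        except Exception as exc:
            results[replica.__name__] = f"failed: {exc}"
    return results  # the ack happens only after all attempts

def replica_a(doc):
    pass  # write succeeds

def replica_b(doc):
    raise ConnectionError("node down")

acks = forward_update({"id": 1}, [replica_a, replica_b])
print(acks)  # prints {'replica_a': 'ok', 'replica_b': 'failed: node down'}
```

The point being modeled: success on A with B unreachable is recorded, not hidden, so B knows it must recover rather than continue as if current.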