Re: [GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

2021-02-14 Thread Michael Sokolov
So sorry to hear that Uwe; take your time to grieve - that's a big one, I think

-Mike

On Sat, Feb 13, 2021 at 9:57 AM GitBox  wrote:
>
>
> uschindler commented on pull request #2306:
> URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778630343
>
>
>> @uschindler if you want to push up an example/make the change the way 
> you are thinking, I'm 100% happy to have that! Your Java skillz are way 
> beyond mine, and while I sort of understand what you say, it may take you 20 
> minutes to get the change you want, and then I'll learn from you ;-)
>
>Sorry, my father died on Thursday. So excuse my ignorance! To me @dsmiley 
> changes look fine. I will review later as some "long term specialist" on XML 
> processing.
>
>




Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Walter Underwood
Rate limiting is a good idea. It requires a lot of ongoing engineering to 
adjust the rates to the current cluster behavior. It doesn’t help with some 
kinds of overload. The ROI just doesn’t work out. It is too much work for not 
enough benefit.

Rate limiting works if the collection size doesn’t change and the queries don’t 
change.

At Netflix, we limited traffic based on number of connections to each server. 
This is basically the length of the queue of requests for that server. This is 
similar to limiting by load average, which is also the work waiting to be done. 
It has the same weaknesses as the load average circuit breaker, but it did not 
need to be changed when average CPU usage per query increased. It was “set and 
forget”. Rate limiters require constant adjustment.
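
A minimal Python sketch of limiting by the number of in-flight connections per server, as described above. This is not the Netflix implementation, which this message does not show; the class name `ConnectionLimiter` and the parameter `max_in_flight` are hypothetical and chosen only for illustration.

```python
import threading


class ConnectionLimiter:
    """Admit a request only while fewer than max_in_flight are being served.

    Hypothetical sketch: the limit is a count of concurrent requests,
    not a requests-per-second rate.
    """

    def __init__(self, max_in_flight: int):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return True and take a slot if one is free, else return False."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Give a slot back once the request has finished."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


limiter = ConnectionLimiter(max_in_flight=2)
print(limiter.try_acquire(), limiter.try_acquire(), limiter.try_acquire())
```

Because the limit counts requests in flight rather than requests per second, slower queries hold their slots longer and the admitted rate falls without anyone retuning the limit. That is one way to read the "set and forget" property mentioned above.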

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 14, 2021, at 11:44 AM, Atri Sharma  wrote:
> 
> This is a debate better suited for  a different forum  -- but I would 
> disagree with your assertion that rate limiting is a bad idea.
> 
> Solr allows you to specify node level request quotas which also follow the 
> principle of not limiting internal requests. I find that to be pretty useful 
> in two forms: 1. Use it in conjunction with a global request limit which is 
> typically 0.75 of my total load capacity given my average query resource 
> consumption. 2. Allow per node request limits to ensure fairness and 
> dedicated capacity for different types of requests. 3. Allow circuit breakers 
> to handle cases where a couple of rogue queries can take down nodes.
> 
> We digress -- as I said, it should be fairly simple to have a circuit breaker 
> which rejects only external requests,  but should be clearly documented with 
> its downsides.
> 
> On Mon, 15 Feb 2021, 00:33 Walter Underwood wrote:
> We’ve looked at and rejected rate limiters as high-maintenance and not 
> sufficient protection.
> 
> We would have run nginx on each node, sent external traffic to nginx on a 
> different port and let internal traffic stay on the default Solr port. This 
> has other advantages (monitoring), but the rate limiting part is way too 
> fiddly.
> 
> Rates depend on how much CPU is used per query and on the size of the cluster 
> (if they are not on each node). Some examples from our largest cluster which 
> would need a change in rate limits. Some of these could be set by doing 
> offline load benchmarks, some not.
> 
> * Experiment cell that uses 2.5X more CPU for each query (running now in prod)
> * Increasing traffic allocated to that cell (did this last week)
> * Increase in index size (number of docs and CPU requirements increase about 
> 5% every month)
> * Website slowdown that shifts most traffic to mobile, where queries use 2X 
> as much CPU
> * Horizontal scaling from 24 to 48 nodes
> * Vertical scaling from c5.8xlarge to c5.18xlarge
> 
> And so on. Rate limiting would require almost weekly load benchmarks and it 
> still wouldn’t catch the outage-causing problems.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)
> 
>> On Feb 14, 2021, at 10:25 AM, Atri Sharma wrote:
>> 
>> The way I look at it is that for cluster level stability, rate limiters 
>> should be used which allow rate limiting of only external requests. They are 
>> "circuit breakers" in the sense of defending against cluster level 
>> instability, which is what you describe.
>> 
>> Circuit breakers, in Solr world, are targeted to be the last resort defense 
>> of a node.
>> 
>> As I said earlier, it is possible to write a circuit breaker which rejects 
>> only external requests, but I personally do not see the benefit in presence 
>> of rate limiters.
>> 
>> On Sun, 14 Feb 2021, 23:50 Walter Underwood wrote:
>> Ideally, it would only affect a few queries. In reality, with a sharded 
>> system, the impact will be large.
>> 
>> I disagree that the goal is to protect a node. The goal is to make the 
>> entire cluster avoid congestion failure when overloaded, while providing 
>> good service for the load that it can handle.
>> 
>> I have had Solr clusters take down entire websites when overloaded, both at 
>> Netflix and Chegg, and I’ve built defenses for this at both places. I’m a 
>> huge fan of circuit breakers.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org 
>> http://observer.wunderwood.org/   (my blog)
>> 
>>> On Feb 14, 2021, at 9:50 AM, Atri Sharma wrote:
>>> 
>>> This has an issue of still leading to node outages if the fanout for a 
>>> query is high.
>>> 
>>> Circuit breakers follow a simple rule -- defend the node at the cost of 
>>> degraded responses.
>>> 
>>> Ideally, only few requests will be 

Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Atri Sharma
This is a debate better suited for a different forum -- but I would
disagree with your assertion that rate limiting is a bad idea.

Solr allows you to specify node level request quotas which also follow the
principle of not limiting internal requests. I find that to be pretty
useful in two forms: 1. Use it in conjunction with a global request limit
which is typically 0.75 of my total load capacity given my average query
resource consumption. 2. Allow per node request limits to ensure fairness
and dedicated capacity for different types of requests. 3. Allow circuit
breakers to handle cases where a couple of rogue queries can take down
nodes.

We digress -- as I said, it should be fairly simple to have a circuit
breaker which rejects only external requests, but it should be clearly
documented along with its downsides.

On Mon, 15 Feb 2021, 00:33 Walter Underwood,  wrote:

> We’ve looked at and rejected rate limiters as high-maintenance and not
> sufficient protection.
>
> We would have run nginx on each node, sent external traffic to nginx on a
> different port and let internal traffic stay on the default Solr port. This
> has other advantages (monitoring), but the rate limiting part is way too
> fiddly.
>
> Rates depend on how much CPU is used per query and on the size of the
> cluster (if they are not on each node). Some examples from our largest
> cluster which would need a change in rate limits. Some of these could be
> set by doing offline load benchmarks, some not.
>
> * Experiment cell that uses 2.5X more CPU for each query (running now in
> prod)
> * Increasing traffic allocated to that cell (did this last week)
> * Increase in index size (number of docs and CPU requirements increase
> about 5% every month)
> * Website slowdown that shifts most traffic to mobile, where queries use
> 2X as much CPU
> * Horizontal scaling from 24 to 48 nodes
> * Vertical scaling from c5.8xlarge to c5.18xlarge
>
> And so on. Rate limiting would require almost weekly load benchmarks and
> it still wouldn’t catch the outage-causing problems.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Feb 14, 2021, at 10:25 AM, Atri Sharma  wrote:
>
> The way I look at it is that for cluster level stability, rate limiters
> should be used which allow rate limiting of only external requests. They
> are "circuit breakers" in the sense of defending against cluster level
> instability, which is what you describe.
>
> Circuit breakers, in Solr world, are targeted to be the last resort
> defense of a node.
>
> As I said earlier, it is possible to write a circuit breaker which rejects
> only external requests, but I personally do not see the benefit in presence
> of rate limiters.
>
> On Sun, 14 Feb 2021, 23:50 Walter Underwood, 
> wrote:
>
>> Ideally, it would only affect a few queries. In reality, with a sharded
>> system, the impact will be large.
>>
>> I disagree that the goal is to protect a node. The goal is to make the
>> entire cluster avoid congestion failure when overloaded, while providing
>> good service for the load that it can handle.
>>
>> I have had Solr clusters take down entire websites when overloaded, both
>> at Netflix and Chegg, and I’ve built defenses for this at both places. I’m
>> a huge fan of circuit breakers.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> On Feb 14, 2021, at 9:50 AM, Atri Sharma  wrote:
>>
>> This has an issue of still leading to node outages if the fanout for a
>> query is high.
>>
>> Circuit breakers follow a simple rule -- defend the node at the cost of
>> degraded responses.
>>
>> Ideally, only few requests will be completely rejected -- some will see
>> partial results. Due to this non discriminating nature of circuit breakers,
>> the typical blip on service quality due to high resource usage is short
>> lived.
>>
>> However, it is possible to write a circuit breaker which rejects only
>> external requests in master branch (we have the ability to identify
>> requests as internal or external there).
>>
>> Regards,
>>
>> Atri
>>
>> On Sun, 14 Feb 2021, 23:07 Walter Underwood, 
>> wrote:
>>
>>> This got zero responses on the solr-user list, so I’ll raise the issue
>>> here.
>>>
>>> Should circuit breakers only kill external search requests and not
>>> cluster-internal requests to shards?
>>>
>>> Circuit breakers can kill any request, whether it is a client request
>>> from outside the cluster or an internal distributed request to a shard.
>>> Killing a portion of distributed request will affect the main request. Not
>>> sure whether a 503 from a shard will kill the whole request or cause
>>> partial results, but it isn’t good.
>>>
>>> We run with 8 shards. If a circuit breaker is killing 10% of requests on
>>> each host, that will hit 57% of all external requests (0.9^8 = 0.43). That
>>> seems like “overkill” to me. If it only kills external requests, then 10%
>>> means 10%.
>>>

Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Walter Underwood
We’ve looked at and rejected rate limiters as high-maintenance and not 
sufficient protection.

We would have run nginx on each node, sent external traffic to nginx on a 
different port and let internal traffic stay on the default Solr port. This has 
other advantages (monitoring), but the rate limiting part is way too fiddly.

Rates depend on how much CPU is used per query and on the size of the cluster 
(if they are not on each node). Some examples from our largest cluster which 
would need a change in rate limits. Some of these could be set by doing offline 
load benchmarks, some not.

* Experiment cell that uses 2.5X more CPU for each query (running now in prod)
* Increasing traffic allocated to that cell (did this last week)
* Increase in index size (number of docs and CPU requirements increase about 5% 
every month)
* Website slowdown that shifts most traffic to mobile, where queries use 2X as 
much CPU
* Horizontal scaling from 24 to 48 nodes
* Vertical scaling from c5.8xlarge to c5.18xlarge

And so on. Rate limiting would require almost weekly load benchmarks and it 
still wouldn’t catch the outage-causing problems.
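
To make the dependence on CPU per query and cluster size concrete, here is a rough Python sketch. The formula, the function name `sustainable_qps`, the `headroom` factor, and all input numbers (cores per node, CPU seconds per query) are assumptions for illustration, not measurements from the cluster described above.

```python
def sustainable_qps(nodes: int, cores_per_node: int,
                    cpu_seconds_per_query: float,
                    headroom: float = 0.75) -> float:
    """Rough upper bound on queries/second a cluster could sustain.

    Assumes CPU is the only bottleneck and that cpu_seconds_per_query
    is the total CPU time one query costs across the whole cluster.
    """
    return nodes * cores_per_node * headroom / cpu_seconds_per_query


# Hypothetical baseline: 24 nodes, 32 cores each, 0.5 CPU-seconds per query.
baseline = sustainable_qps(24, 32, 0.5)

# The same cluster if each query costs 2.5X more CPU.
experiment_cell = sustainable_qps(24, 32, 0.5 * 2.5)

# Horizontal scaling from 24 to 48 nodes.
scaled_out = sustainable_qps(48, 32, 0.5)

# CPU requirements growing about 5% per month, compounded over a year.
after_a_year = sustainable_qps(24, 32, 0.5 * 1.05 ** 12)

for name, qps in [("baseline", baseline),
                  ("2.5X CPU per query", experiment_cell),
                  ("48 nodes", scaled_out),
                  ("5%/month for 12 months", after_a_year)]:
    print(f"{name}: {qps:.0f} qps")
```

Under these assumptions, every item in the list above changes the right-hand side of the formula, so a fixed rate limit becomes wrong each time one of them changes.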

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 14, 2021, at 10:25 AM, Atri Sharma  wrote:
> 
> The way I look at it is that for cluster level stability, rate limiters 
> should be used which allow rate limiting of only external requests. They are 
> "circuit breakers" in the sense of defending against cluster level 
> instability, which is what you describe.
> 
> Circuit breakers, in Solr world, are targeted to be the last resort defense 
> of a node.
> 
> As I said earlier, it is possible to write a circuit breaker which rejects 
> only external requests, but I personally do not see the benefit in presence 
> of rate limiters.
> 
> On Sun, 14 Feb 2021, 23:50 Walter Underwood wrote:
> Ideally, it would only affect a few queries. In reality, with a sharded 
> system, the impact will be large.
> 
> I disagree that the goal is to protect a node. The goal is to make the entire 
> cluster avoid congestion failure when overloaded, while providing good 
> service for the load that it can handle.
> 
> I have had Solr clusters take down entire websites when overloaded, both at 
> Netflix and Chegg, and I’ve built defenses for this at both places. I’m a 
> huge fan of circuit breakers.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)
> 
>> On Feb 14, 2021, at 9:50 AM, Atri Sharma wrote:
>> 
>> This has an issue of still leading to node outages if the fanout for a query 
>> is high.
>> 
>> Circuit breakers follow a simple rule -- defend the node at the cost of 
>> degraded responses.
>> 
>> Ideally, only few requests will be completely rejected -- some will see 
>> partial results. Due to this non discriminating nature of circuit breakers, 
>> the typical blip on service quality due to high resource usage is short 
>> lived.
>> 
>> However, it is possible to write a circuit breaker which rejects only 
>> external requests in master branch (we have the ability to identify requests 
>> as internal or external there).
>> 
>> Regards,
>> 
>> Atri
>> 
>> On Sun, 14 Feb 2021, 23:07 Walter Underwood wrote:
>> This got zero responses on the solr-user list, so I’ll raise the issue here.
>> 
>> Should circuit breakers only kill external search requests and not 
>> cluster-internal requests to shards?
>> 
>> Circuit breakers can kill any request, whether it is a client request from 
>> outside the cluster or an internal distributed request to a shard. Killing a 
>> portion of distributed request will affect the main request. Not sure 
>> whether a 503 from a shard will kill the whole request or cause partial 
>> results, but it isn’t good.
>> 
>> We run with 8 shards. If a circuit breaker is killing 10% of requests on 
>> each host, that will hit 57% of all external requests (0.9^8 = 0.43). That 
>> seems like “overkill” to me. If it only kills external requests, then 10% 
>> means 10%.
>> 
>> Killing only external requests requires that external requests go roughly 
>> equally to all hosts in the cluster, or at least all NRT or PULL replicas.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org 
>> http://observer.wunderwood.org/   (my blog)
> 



Re: Circuit Breaker Clean-up/Extension Jira

2021-02-14 Thread Atri Sharma
If you have a GitHub account, you can fork the Lucene/Solr repository, create
a branch in your fork, push your changes there, and navigate to the GitHub
page of your fork, which will provide a button to create a PR.

On Sun, 14 Feb 2021, 23:46 Walter Underwood,  wrote:

> Sorry, couldn’t figure out how to do that for Solr. I do PRs all day on
> our company system, but that uses Bitbucket.
>
> The “how to contribute” docs just said to make a PR, which didn’t really
> help. I tried, but nothing worked.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Feb 14, 2021, at 9:56 AM, Atri Sharma  wrote:
>
> Also, if you could open a PR, it would be easier to review.
>
> On Sun, 14 Feb 2021, 23:22 Atri Sharma,  wrote:
>
>> Apologies for the delay. I will review this tomorrow
>>
>> On Sun, 14 Feb 2021, 23:06 Walter Underwood, 
>> wrote:
>>
>>> Please review for 8.9. We will use this feature after it is updated. The
>>> current circuit breakers won’t work for us.
>>>
>>> https://issues.apache.org/jira/browse/SOLR-15056
>>>
>>> This change:
>>>
>>> * Preserves existing functionality.
>>> * Renames the existing load average circuit breaker to a more accurate
>>> name.
>>> * Adds a circuit breaker for CPU usage that is available if the JVM
>>> supports it.
>>> * Adds detail to documentation, listing which JMX calls each circuit
>>> breaker is based on.
>>> * Copy-edits on docs for more detail, less complicated wording (good
>>> when English is not the reader’s primary language)
>>> * Includes unit tests.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>


Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Atri Sharma
The way I look at it is that for cluster level stability, rate limiters
should be used which allow rate limiting of only external requests. They
are "circuit breakers" in the sense of defending against cluster level
instability, which is what you describe.

Circuit breakers, in the Solr world, are targeted to be the last-resort
defense of a node.

As I said earlier, it is possible to write a circuit breaker which rejects
only external requests, but I personally do not see the benefit in the
presence of rate limiters.

On Sun, 14 Feb 2021, 23:50 Walter Underwood,  wrote:

> Ideally, it would only affect a few queries. In reality, with a sharded
> system, the impact will be large.
>
> I disagree that the goal is to protect a node. The goal is to make the
> entire cluster avoid congestion failure when overloaded, while providing
> good service for the load that it can handle.
>
> I have had Solr clusters take down entire websites when overloaded, both
> at Netflix and Chegg, and I’ve built defenses for this at both places. I’m
> a huge fan of circuit breakers.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> On Feb 14, 2021, at 9:50 AM, Atri Sharma  wrote:
>
> This has an issue of still leading to node outages if the fanout for a
> query is high.
>
> Circuit breakers follow a simple rule -- defend the node at the cost of
> degraded responses.
>
> Ideally, only few requests will be completely rejected -- some will see
> partial results. Due to this non discriminating nature of circuit breakers,
> the typical blip on service quality due to high resource usage is short
> lived.
>
> However, it is possible to write a circuit breaker which rejects only
> external requests in master branch (we have the ability to identify
> requests as internal or external there).
>
> Regards,
>
> Atri
>
> On Sun, 14 Feb 2021, 23:07 Walter Underwood, 
> wrote:
>
>> This got zero responses on the solr-user list, so I’ll raise the issue
>> here.
>>
>> Should circuit breakers only kill external search requests and not
>> cluster-internal requests to shards?
>>
>> Circuit breakers can kill any request, whether it is a client request
>> from outside the cluster or an internal distributed request to a shard.
>> Killing a portion of distributed request will affect the main request. Not
>> sure whether a 503 from a shard will kill the whole request or cause
>> partial results, but it isn’t good.
>>
>> We run with 8 shards. If a circuit breaker is killing 10% of requests on
>> each host, that will hit 57% of all external requests (0.9^8 = 0.43). That
>> seems like “overkill” to me. If it only kills external requests, then 10%
>> means 10%.
>>
>> Killing only external requests requires that external requests go roughly
>> equally to all hosts in the cluster, or at least all NRT or PULL replicas.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>
>


Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Walter Underwood
Ideally, it would only affect a few queries. In reality, with a sharded system, 
the impact will be large.

I disagree that the goal is to protect a node. The goal is to make the entire 
cluster avoid congestion failure when overloaded, while providing good service 
for the load that it can handle.

I have had Solr clusters take down entire websites when overloaded, both at 
Netflix and Chegg, and I’ve built defenses for this at both places. I’m a huge 
fan of circuit breakers.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 14, 2021, at 9:50 AM, Atri Sharma  wrote:
> 
> This has an issue of still leading to node outages if the fanout for a query 
> is high.
> 
> Circuit breakers follow a simple rule -- defend the node at the cost of 
> degraded responses.
> 
> Ideally, only few requests will be completely rejected -- some will see 
> partial results. Due to this non discriminating nature of circuit breakers, 
> the typical blip on service quality due to high resource usage is short lived.
> 
> However, it is possible to write a circuit breaker which rejects only 
> external requests in master branch (we have the ability to identify requests 
> as internal or external there).
> 
> Regards,
> 
> Atri
> 
> On Sun, 14 Feb 2021, 23:07 Walter Underwood wrote:
> This got zero responses on the solr-user list, so I’ll raise the issue here.
> 
> Should circuit breakers only kill external search requests and not 
> cluster-internal requests to shards?
> 
> Circuit breakers can kill any request, whether it is a client request from 
> outside the cluster or an internal distributed request to a shard. Killing a 
> portion of distributed request will affect the main request. Not sure whether 
> a 503 from a shard will kill the whole request or cause partial results, but 
> it isn’t good.
> 
> We run with 8 shards. If a circuit breaker is killing 10% of requests on each 
> host, that will hit 57% of all external requests (0.9^8 = 0.43). That seems 
> like “overkill” to me. If it only kills external requests, then 10% means 10%.
> 
> Killing only external requests requires that external requests go roughly 
> equally to all hosts in the cluster, or at least all NRT or PULL replicas.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)



Re: Circuit Breaker Clean-up/Extension Jira

2021-02-14 Thread Walter Underwood
Sorry, couldn’t figure out how to do that for Solr. I do PRs all day on our 
company system, but that uses Bitbucket.

The “how to contribute” docs just said to make a PR, which didn’t really help. 
I tried, but nothing worked.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 14, 2021, at 9:56 AM, Atri Sharma  wrote:
> 
> Also, if you could open a PR, it would be easier to review.
> 
> On Sun, 14 Feb 2021, 23:22 Atri Sharma wrote:
> Apologies for the delay. I will review this tomorrow
> 
> On Sun, 14 Feb 2021, 23:06 Walter Underwood wrote:
> Please review for 8.9. We will use this feature after it is updated. The 
> current circuit breakers won’t work for us.
> 
> https://issues.apache.org/jira/browse/SOLR-15056 
> 
> 
> This change:
> 
> * Preserves existing functionality.
> * Renames the existing load average circuit breaker to a more accurate name.
> * Adds a circuit breaker for CPU usage that is available if the JVM supports 
> it.
> * Adds detail to documentation, listing which JMX calls each circuit breaker 
> is based on.
> * Copy-edits on docs for more detail, less complicated wording (good when 
> English is not the reader’s primary language)
> * Includes unit tests.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)
> 



Re: Circuit Breaker Clean-up/Extension Jira

2021-02-14 Thread Atri Sharma
Also, if you could open a PR, it would be easier to review.

On Sun, 14 Feb 2021, 23:22 Atri Sharma,  wrote:

> Apologies for the delay. I will review this tomorrow
>
> On Sun, 14 Feb 2021, 23:06 Walter Underwood, 
> wrote:
>
>> Please review for 8.9. We will use this feature after it is updated. The
>> current circuit breakers won’t work for us.
>>
>> https://issues.apache.org/jira/browse/SOLR-15056
>>
>> This change:
>>
>> * Preserves existing functionality.
>> * Renames the existing load average circuit breaker to a more accurate
>> name.
>> * Adds a circuit breaker for CPU usage that is available if the JVM
>> supports it.
>> * Adds detail to documentation, listing which JMX calls each circuit
>> breaker is based on.
>> * Copy-edits on docs for more detail, less complicated wording (good when
>> English is not the reader’s primary language)
>> * Includes unit tests.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>


Re: Circuit Breaker Clean-up/Extension Jira

2021-02-14 Thread Atri Sharma
Apologies for the delay. I will review this tomorrow.

On Sun, 14 Feb 2021, 23:06 Walter Underwood,  wrote:

> Please review for 8.9. We will use this feature after it is updated. The
> current circuit breakers won’t work for us.
>
> https://issues.apache.org/jira/browse/SOLR-15056
>
> This change:
>
> * Preserves existing functionality.
> * Renames the existing load average circuit breaker to a more accurate
> name.
> * Adds a circuit breaker for CPU usage that is available if the JVM
> supports it.
> * Adds detail to documentation, listing which JMX calls each circuit
> breaker is based on.
> * Copy-edits on docs for more detail, less complicated wording (good when
> English is not the reader’s primary language)
> * Includes unit tests.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: Circuit Breakers interaction with Shards

2021-02-14 Thread Atri Sharma
This has an issue of still leading to node outages if the fanout for a
query is high.

Circuit breakers follow a simple rule -- defend the node at the cost of
degraded responses.

Ideally, only a few requests will be completely rejected -- some will see
partial results. Due to this non-discriminating nature of circuit breakers,
the typical blip in service quality due to high resource usage is
short-lived.

However, it is possible to write a circuit breaker which rejects only
external requests in master branch (we have the ability to identify
requests as internal or external there).
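
A minimal Python sketch of what such an external-only circuit breaker could look like. This is not Solr's API: the `Request` type, its `is_internal` flag, and the `ExternalOnlyCircuitBreaker` class are all hypothetical, and how master actually marks a request as internal is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    query: str
    # Hypothetical flag: True for a coordinator-to-shard sub-request,
    # False for a request arriving from a client outside the cluster.
    is_internal: bool


class ExternalOnlyCircuitBreaker:
    """Reject only external requests while a resource metric is over threshold."""

    def __init__(self, read_metric: Callable[[], float], threshold: float):
        self.read_metric = read_metric  # e.g. returns current CPU or heap usage
        self.threshold = threshold

    def is_tripped(self) -> bool:
        return self.read_metric() > self.threshold

    def should_reject(self, request: Request) -> bool:
        # Internal shard sub-requests always pass, so a distributed query
        # that was already admitted is not degraded part-way through.
        if request.is_internal:
            return False
        return self.is_tripped()


breaker = ExternalOnlyCircuitBreaker(read_metric=lambda: 0.95, threshold=0.75)
print(breaker.should_reject(Request("q=solr", is_internal=False)))  # True
print(breaker.should_reject(Request("q=solr", is_internal=True)))   # False
```

The downside noted at the top of this message still applies: because internal sub-requests are never rejected, a node can still be overloaded when query fanout is high.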

Regards,

Atri

On Sun, 14 Feb 2021, 23:07 Walter Underwood,  wrote:

> This got zero responses on the solr-user list, so I’ll raise the issue
> here.
>
> Should circuit breakers only kill external search requests and not
> cluster-internal requests to shards?
>
> Circuit breakers can kill any request, whether it is a client request from
> outside the cluster or an internal distributed request to a shard. Killing
> a portion of distributed request will affect the main request. Not sure
> whether a 503 from a shard will kill the whole request or cause partial
> results, but it isn’t good.
>
> We run with 8 shards. If a circuit breaker is killing 10% of requests on
> each host, that will hit 57% of all external requests (0.9^8 = 0.43). That
> seems like “overkill” to me. If it only kills external requests, then 10%
> means 10%.
>
> Killing only external requests requires that external requests go roughly
> equally to all hosts in the cluster, or at least all NRT or PULL replicas.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>


Circuit Breakers interaction with Shards

2021-02-14 Thread Walter Underwood
This got zero responses on the solr-user list, so I’ll raise the issue here.

Should circuit breakers only kill external search requests and not 
cluster-internal requests to shards?

Circuit breakers can kill any request, whether it is a client request from 
outside the cluster or an internal distributed request to a shard. Killing a 
portion of distributed request will affect the main request. Not sure whether a 
503 from a shard will kill the whole request or cause partial results, but it 
isn’t good.

We run with 8 shards. If a circuit breaker is killing 10% of requests on each 
host, that will hit 57% of all external requests (0.9^8 = 0.43). That seems 
like “overkill” to me. If it only kills external requests, then 10% means 10%.
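
The arithmetic above can be checked with a few lines of Python. The sketch assumes that every external request fans out to all shards and that rejections on different hosts are independent; the function name `fraction_of_requests_hit` is made up for this example.

```python
def fraction_of_requests_hit(per_host_kill_rate: float, shards: int) -> float:
    """Fraction of distributed requests that lose at least one shard sub-request."""
    # Probability that every one of the shard sub-requests survives.
    survive_all = (1.0 - per_host_kill_rate) ** shards
    return 1.0 - survive_all


for shards in (1, 2, 4, 8, 16):
    hit = fraction_of_requests_hit(0.10, shards)
    print(f"{shards:2d} shards: {hit:.2f} of external requests affected")
```

With 8 shards and a 10% kill rate per host this gives 1 - 0.9^8 = 1 - 0.43 = 0.57, matching the 57% figure above, and the affected fraction keeps growing as the shard count rises.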

Killing only external requests requires that external requests go roughly 
equally to all hosts in the cluster, or at least all NRT or PULL replicas.

wunder
Walter Underwood
wun...@wunderwood.org 
http://observer.wunderwood.org/   (my blog)

Circuit Breaker Clean-up/Extension Jira

2021-02-14 Thread Walter Underwood
Please review for 8.9. We will use this feature after it is updated. The 
current circuit breakers won’t work for us.

https://issues.apache.org/jira/browse/SOLR-15056

This change:

* Preserves existing functionality.
* Renames the existing load average circuit breaker to a more accurate name.
* Adds a circuit breaker for CPU usage that is available if the JVM supports it.
* Adds detail to documentation, listing which JMX calls each circuit breaker is 
based on.
* Copy-edits on docs for more detail, less complicated wording (good when 
English is not the reader’s primary language)
* Includes unit tests.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: [VOTE] Release Lucene/Solr 8.8.1 RC1

2021-02-14 Thread Timothy Potter
Looks like an extra space got added on the end of the python3 command, try
this one:

python3 -u dev-tools/scripts/smokeTestRelease.py
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928




On Sun, Feb 14, 2021 at 9:26 AM Timothy Potter 
wrote:

> Please vote for release candidate 1 for Lucene/Solr 8.8.1
>
>
> The artifacts can be downloaded from:
>
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>
>
> You can run the smoke tester directly with this command:
>
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
>
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928
>
>
> The vote will be open for at least 72 hours i.e. until 2021-02-17 17:00
> UTC.
>
>
> Here is my +1 ~ SUCCESS! [0:50:06.728441]
>
>
> In addition to the smoke test, I built a Docker image from solr-8.8.1.tgz
> locally and verified:
>
>
> a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC completes
> successfully w/o any NPEs or weirdness with leader election / recoveries.
>
>
> b. The base_url property is stored in replica state after the upgrade
>
>
> c. A basic client application built with SolrJ 8.7.0 can load cluster
> state info directly from ZK and query the 8.8.1 RC1 servers.
>
>
> d. Same client app built with SolrJ 8.8.0 works as well.
>
>
> As this bug-fix release is primarily needed to address a SolrJ back-compat
> break (SOLR-15145) and unfortunately our smoke tester framework does not
> test for backcompat of older SolrJ against the RC, I ask others to please
> test rolling upgrades of servers (ideally multi-node clusters) running 
> pre-8.8.0
> to this RC if possible. Also, please try client applications that are using
> an older SolrJ, esp. those that load cluster state directly from ZK.
>
>
> Best regards,
>
> Tim
>
>
>
>
>


[VOTE] Release Lucene/Solr 8.8.1 RC1

2021-02-14 Thread Timothy Potter
Please vote for release candidate 1 for Lucene/Solr 8.8.1


The artifacts can be downloaded from:

https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928


You can run the smoke tester directly with this command:


python3 -u dev-tools/scripts/smokeTestRelease.py \

https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.8.1-RC1-rev6a50a0315ac7e4979abb0b530857c7795bb3b928


The vote will be open for at least 72 hours i.e. until 2021-02-17 17:00 UTC.


Here is my +1 ~ SUCCESS! [0:50:06.728441]


In addition to the smoke test, I built a Docker image from solr-8.8.1.tgz
locally and verified:


a. A rolling upgrade of a 3-node 8.7.0 cluster to the 8.8.1 RC completes
successfully w/o any NPEs or weirdness with leader election / recoveries.


b. The base_url property is stored in replica state after the upgrade


c. A basic client application built with SolrJ 8.7.0 can load cluster state
info directly from ZK and query the 8.8.1 RC1 servers.


d. Same client app built with SolrJ 8.8.0 works as well.


As this bug-fix release is primarily needed to address a SolrJ back-compat
break (SOLR-15145) and unfortunately our smoke tester framework does not
test for backcompat of older SolrJ against the RC, I ask others to please
test rolling upgrades of servers (ideally multi-node clusters) running
pre-8.8.0 to this RC if possible. Also, please try client applications that
are using
an older SolrJ, esp. those that load cluster state directly from ZK.


Best regards,

Tim