Re: How to block expensive solr queries
On Wed, Oct 9, 2019 at 9:59 AM Wei wrote:

> Thanks all. I debugged a bit and see that timeAllowed does not limit the
> stats call. I also think it would be useful for Solr to support a whitelist
> or blacklist of operations, as Toke suggested. I will create a JIRA issue
> for it. Currently the only option to explore seems to be adding a filter to
> Solr's embedded Jetty. Does anyone have experience doing that? Do I also
> need to change SolrDispatchFilter?
>
> On Tue, Oct 8, 2019 at 3:50 AM Toke Eskildsen wrote:
>
>> On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
>> > /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>> ...
>> > Is there a way to block certain solr queries based on url pattern?
>> > i.e. ignore the stats.calcdistinct request in this case.
>>
>> It sounds like it is possible for users to issue arbitrary queries
>> against your Solr installation. As you have noticed, that makes it easy
>> to perform a Denial of Service (intentional or not). Filtering out
>> stats.calcdistinct won't help with the next request for
>> group.ngroups=true, facet.field=unique_id=1, rows=1 or something fifth.
>>
>> I recommend you flip your logic and only allow specific types of
>> requests and put limits on those. To my knowledge that is not a
>> built-in feature of Solr.
>>
>> - Toke Eskildsen, Royal Danish Library
Re: How to block expensive solr queries
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain solr queries based on url pattern?
> i.e. ignore the stats.calcdistinct request in this case.

It sounds like it is possible for users to issue arbitrary queries
against your Solr installation. As you have noticed, that makes it easy
to perform a Denial of Service (intentional or not). Filtering out
stats.calcdistinct won't help with the next request for
group.ngroups=true, facet.field=unique_id=1, rows=1 or something fifth.

I recommend you flip your logic and only allow specific types of
requests and put limits on those. To my knowledge that is not a
built-in feature of Solr.

- Toke Eskildsen, Royal Danish Library
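
The "flip your logic" advice can be illustrated with a minimal allow-list sketch. It is only the decision logic, written in Python for brevity, and would have to live in whatever sits in front of Solr (a proxy or a request filter). The set of permitted parameters, the `MAX_ROWS` cap, and the function name `sanitize_query` are all hypothetical choices for illustration, not anything Solr provides.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical policy: the only parameter names a client may send.
ALLOWED_PARAMS = {"q", "fq", "fl", "sort", "start", "rows", "wt"}
# Hypothetical cap on the number of rows a single request may ask for.
MAX_ROWS = 100


def sanitize_query(query_string):
    """Return a rewritten query string, or None if the request is rejected.

    Any parameter outside ALLOWED_PARAMS rejects the whole request, so
    stats.*, group.* and facet.* never reach Solr. The rows value is
    clamped to MAX_ROWS instead of being rejected.
    """
    kept = []
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name not in ALLOWED_PARAMS:
            return None
        if name == "rows":
            try:
                rows = int(value)
            except ValueError:
                return None
            if rows < 0:
                return None
            value = str(min(rows, MAX_ROWS))
        kept.append((name, value))
    return urlencode(kept)
```

With this policy a plain `q=foo&rows=10` passes through unchanged, an oversized `rows` is clamped, and anything carrying `stats.calcdistinct` or `group.ngroups` is rejected without a dedicated rule for each. Parameters sent in a POST body would need the same check, which this sketch does not cover.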
Re: How to block expensive solr queries
It's worth raising an issue for supporting timeAllowed for stats. Until that is done, something like a Jetty filter is the only option.

On Tue, Oct 8, 2019 at 12:34 AM Wei wrote:

> Hi Mikhail,
>
> Yes, I have the timeAllowed parameter configured, but in this case it
> doesn't seem to prevent the stats request from blocking other normal
> queries. Is it possible to drop the request before Solr executes it,
> maybe at the Jetty request filter?
>
> Thanks,
> Wei

--
Sincerely yours
Mikhail Khludnev
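
As a rough illustration of dropping a request before Solr executes it, the check such a filter would perform can be sketched as follows. This is Python and covers only the decision; an actual Jetty filter would be a Java servlet filter, which is not shown here. The rule list `DENIED_FLAGS` and the function name `should_block` are hypothetical, and treating every value other than "false" as enabling the flag is a deliberately conservative assumption rather than Solr's exact boolean parsing.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical deny rules: a request is rejected when any of these
# parameters is present with a value other than "false".
DENIED_FLAGS = ("stats.calcdistinct",)


def should_block(request_target):
    """Decide whether a request path plus query string should be rejected."""
    query = urlsplit(request_target).query
    params = parse_qs(query, keep_blank_values=True)
    for name in DENIED_FLAGS:
        for value in params.get(name, []):
            # Conservative assumption: anything but an explicit "false"
            # is treated as switching the expensive feature on.
            if value.strip().lower() != "false":
                return True
    return False
```

A check like this inspects only the URL. Parameters sent in a POST body, or defaults configured on the request handler, would pass unnoticed, so it is a stopgap rather than a complete defence.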
Re: How to block expensive solr queries
Hi Mikhail,

Yes, I have the timeAllowed parameter configured, but in this case it doesn't seem to prevent the stats request from blocking other normal queries. Is it possible to drop the request before Solr executes it, maybe at the Jetty request filter?

Thanks,
Wei

On Mon, Oct 7, 2019 at 1:39 PM Mikhail Khludnev wrote:

> Hello, Wei.
>
> Have you tried to abandon heavy queries with
> https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
> ?
> It may or may not be able to stop stats.
> https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
> can clarify it.
>
> --
> Sincerely yours
> Mikhail Khludnev
Re: How to block expensive solr queries
Hello, Wei.

Have you tried to abandon heavy queries with
https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
?
It may or may not be able to stop stats.
https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223
can clarify it.

On Mon, Oct 7, 2019 at 8:19 PM Wei wrote:

> Hi,
>
> Recently we encountered a problem where SolrCloud query latency suddenly
> increased and many simple queries that have small recall timed out. After
> digging a bit I found that the root cause is some stats queries happening
> at the same time, such as
>
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
>
> I see unique_ids is a high-cardinality field, so this query is quite
> expensive. But why does a small volume of such queries block other queries
> and make simple queries time out? I checked the Solr thread pool and see
> there are plenty of idle threads available. We are using Solr 7.6.2 with a
> 10-shard cloud setup.
>
> Is there a way to block certain Solr queries based on URL pattern? i.e.
> ignore the stats.calcdistinct request in this case.
>
> Thanks,
> Wei

--
Sincerely yours
Mikhail Khludnev
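
For readers unfamiliar with timeAllowed, here is a small sketch of how a client might attach it and detect a cut-off search. timeAllowed is given in milliseconds, and Solr marks a truncated response with `partialResults` in the response header. The base URL, the 500 ms budget in the usage note, and the helper names `build_select_url` and `is_partial` are assumptions for illustration. As reported elsewhere in this thread, the limit did not stop the stats call in this case.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, params, time_allowed_ms):
    """Build a /select URL with a timeAllowed budget (milliseconds) added."""
    query = dict(params)
    query["timeAllowed"] = str(time_allowed_ms)
    return f"{base_url}/solr/{collection}/select?{urlencode(query)}"


def is_partial(response):
    """Return True if a parsed JSON response was flagged as partial by Solr."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))
```

For example, `build_select_url("http://localhost:8983", "mycollection", {"q": "foo"}, 500)` adds `timeAllowed=500` to the query string, and `is_partial` can then be applied to the decoded JSON body to see whether the search was abandoned early.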
How to block expensive solr queries
Hi,

Recently we encountered a problem where SolrCloud query latency suddenly increased and many simple queries that have small recall timed out. After digging a bit I found that the root cause is some stats queries happening at the same time, such as

/solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true

I see unique_ids is a high-cardinality field, so this query is quite expensive. But why does a small volume of such queries block other queries and make simple queries time out? I checked the Solr thread pool and see there are plenty of idle threads available. We are using Solr 7.6.2 with a 10-shard cloud setup.

Is there a way to block certain Solr queries based on URL pattern? i.e. ignore the stats.calcdistinct request in this case.

Thanks,
Wei