Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-30 Thread raj.yadav
raj.yadav wrote
> In cases for which we are getting this warning, I'm not able to extract the
> exact Solr query; instead the logger is logging the `parsedquery` for such
> cases.
> Here is one example:
> 
> 
> 2020-09-29 13:09:41.279 WARN  (qtp926837661-82461) [c:mycollection
> s:shard1_0 r:core_node5 x:mycollection_shard1_0_replica_n3]
> o.a.s.s.SolrIndexSearcher Query: [+FunctionScoreQuery(+*:*, scored by
> boost(product(if(max(const(0),
> sub(float(my_doc_value_field1),const(500))),const(0.01),
>
> if(max(const(0),sub(float(my_doc_value_field2),const(290))),const(0.2),const(1))),
>
> sqrt(product(sum(const(1),float(my_doc_value_field3),float(my_doc_value_field4)),
> sqrt(sum(const(1),float(my_doc_value_field5
> #BitSetDocTopFilter]; The request took too long to iterate over doc
> values.
> Timeout: timeoutAt: 1635297585120522 (System.nanoTime():
> 1635297690311384),
> DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@7df12bf1
> 
> 


Hi Community members,

In my previous mail I mentioned that Solr is not logging the actual
`solr_query` and is only logging the parsed query. In fact, Solr does log the
`solr_query` just after the warning message above.

Coming back to the query for which we are getting the above warning:
QUERY => retrieve all docs (i.e. q=*:*) and order them using a
multiplicative boost function (i.e. a boost function query). So this clearly
rules out the possibility mentioned by Erick (i.e. that the query might be
searching against a field which has indexed=false and docValues=true).

Is this expected when using doc values in a function query? This is only
happening when the query retrieves a large number of documents (in the
millions). Has anyone else faced this issue before? We are experiencing
this issue even when there is no load on the system.
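
For reference, the request is roughly equivalent to the SolrJ sketch below
(simplified; the boost function, field and collection names are stand-ins for
our real ones, not the exact query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostFunctionQueryExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://localhost:8983/solr/mycollection").build()) {
      SolrQuery query = new SolrQuery("*:*");   // match everything
      query.set("defType", "edismax");
      // Multiplicative boost: the function only reads docValues fields,
      // nothing is searched against them.
      query.set("boost",
          "product(if(max(0,sub(my_doc_value_field1,500)),0.01,1),"
          + "sqrt(sum(1,my_doc_value_field3,my_doc_value_field4)))");
      query.setTimeAllowed(1000);               // the limit that triggers the warning
      QueryResponse rsp = client.query(query);
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
  }
}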

Regards,
Raj






--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr 7.7 Indexing issue

2020-09-30 Thread Manisha Rahatadkar
Hello all

We are using Apache Solr 7.7 on the Windows platform. The data is synced to Solr 
in batches using Solr.Net, with a commit per batch (a rough sketch of the 
batching is included after the questions below). The documents are very large 
(~0.5 GB on average) and Solr indexing is taking a long time; the total data 
size is ~200 GB. Because the commit is done as part of the API call, the API 
calls fail because document indexing has not completed.

  1.  What is your advice on syncing such a large volume of data to Solr KB?
  2.  Because of the search field requirements, almost 8 fields are defined as 
text fields.
  3.  Currently SOLR_JAVA_MEM is set to 2 GB. Is that enough for such a large 
volume of data? ( IF "%SOLR_JAVA_MEM%"=="" set SOLR_JAVA_MEM=-Xms2g -Xmx2g)
  4.  How should Solr be set up in production on Windows? Currently it is set up 
as a standalone engine and the client is asked to back up the drive. Is there a 
better way to do this? How should we set up disaster recovery?
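
For illustration only, here is a rough SolrJ sketch of batched indexing that
relies on commitWithin instead of a hard commit per API call (our real code
uses Solr.Net, and the field names below are made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://localhost:8983/solr/mycollection").build()) {
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        doc.addField("body_txt", "document body " + i);
        batch.add(doc);
        if (batch.size() == 100) {          // send small batches
          UpdateRequest req = new UpdateRequest();
          req.add(batch);
          req.setCommitWithin(60000);       // let Solr commit in the background
          req.process(client);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {               // flush the remainder
        UpdateRequest req = new UpdateRequest();
        req.add(batch);
        req.setCommitWithin(60000);
        req.process(client);
      }
    }
  }
}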

Thanks in advance.

Regards
Manisha Rahatadkar




Transaction not closed on ms sql

2020-09-30 Thread yaswanth kumar
Can someone help in troubleshooting some issues that are happening with DIH?

Solr version: 8.2; ZooKeeper 3.4
SolrCloud with 4 nodes and 3 ZooKeepers

1. Configured DIH for MS SQL with the mssql JDBC driver. When trying to pull 
the data from MS SQL it connects and fetches records, but the connection that 
was opened on the MS SQL end is not closed even though the full import 
completed. I need some help in troubleshooting why it leaves connections 
open.

2. The import API call is scheduled by a utility that hits the DIH API every 
minute through a pooled (load-balanced) Solr URL. With this it looks like 
multiple calls go out from different Solr nodes, which I don't want; I always 
need the call to be handled by only one node. Can we control this with any 
config, or is it happening because I have three ZooKeepers? Please suggest 
the best approach (a rough sketch of the call is included after point 3 below).

3. I do see some records shown as failed while doing the import. Is there a way 
to track these failures, i.e. why a small number of records are failing?
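
Regarding point 2, would pointing the utility at one specific core's
/dataimport handler, roughly like the sketch below, be the right way to keep
the import on a single node? (Illustrative SolrJ only; the host, core name and
DIH handler path are placeholders, not my exact setup.)

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class TriggerDih {
  public static void main(String[] args) throws Exception {
    // Address one concrete core instead of a load-balanced/pool URL.
    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://solr-node1:8983/solr/mycollection_shard1_replica_n1").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("command", "full-import");
      params.set("clean", "false");
      GenericSolrRequest req =
          new GenericSolrRequest(SolrRequest.METHOD.GET, "/dataimport", params);
      System.out.println(req.process(client).getResponse());
    }
  }
}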



Sent from my iPhone

Re: Master/Slave

2020-09-30 Thread Walter Underwood
We do this sort of thing outside of Solr. The indexing process includes creating
a feed file with one JSON object per line. The feed files are stored in S3 with
names that are ISO 8601 timestamps. Those files are picked up and loaded into
Solr. Because S3 is cross-region in AWS, those files are also our disaster
recovery method for indexing. And of course, two clusters could be loaded
from the same file.
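
A stripped-down sketch of the idea (file naming, fields and endpoint are
invented here for illustration, not our production code):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

public class FeedWriter {
  public static void main(String[] args) throws IOException {
    // Timestamp-named feed file: sortable, and safe to replay or load twice.
    String name = Instant.now().toString().replace(':', '-') + ".jsonl";
    Path feed = Paths.get(name);
    List<String> lines = Arrays.asList(        // one JSON object per line
        "{\"id\":\"1\",\"title\":\"first doc\"}",
        "{\"id\":\"2\",\"title\":\"second doc\"}");
    Files.write(feed, lines, StandardCharsets.UTF_8);
    // The file is then pushed to S3; any cluster can load it later, e.g.
    //   curl -H 'Content-Type: application/json' \
    //        'http://localhost:8983/solr/mycollection/update/json/docs?commit=true' \
    //        --data-binary @<feed-file>
  }
}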

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 30, 2020, at 12:09 PM, David Hastings  
> wrote:
> 
>> whether we should expect Master/Slave replication also to be deprecated
> 
> it better not ever be deprecated.  it has been the most reliable mechanism
> for its purpose, solr cloud isn't going to replace standalone, and if it does,
> that's when I guess I stop upgrading or move to elastic
> 
> On Wed, Sep 30, 2020 at 2:58 PM Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
> 
>> Based on the thread below (reading "legacy" as meaning "likely to be
>> deprecated in later versions"), we have been working to extract ourselves
>> from Master/Slave replication
>> 
>> Most of our collections need to be in two data centers (a read/write copy
>> in one local data center: the disaster-recovery-site SolrCloud could be
>> read-only). We also need redundancy within each data center for when one
>> host or another is unavailable. We implemented this by having different
>> SolrClouds in the different data centers; with Master/Slave replication
>> pulling data from one of the read/write replicas to each of the Slave
>> replicas in the disaster-recovery-site read-only SolrCloud. Additionally,
>> for some collections, there is a desire to have local read-only replicas
>> remain unchanged for querying during the loading process: for these
>> collections, there is a local read/write loading SolrCloud, a local
>> read-only querying SolrCloud (normally configured for Master/Slave
>> replication from one of the replicas of the loader SolrCloud to both
>> replicas of the query SolrCloud, but with Master/Slave disabled when the
>> load was in progress on the loader SolrCloud, and with Master/Slave resumed
>> after the loaded data passes QA checks).
>> 
>> Based on the thread below, we made an attempt to switch to CDCR. The main
>> reason for wanting to change was that CDCR was said to be the supported
>> mechanism, and the replacement for Master/Slave replication.
>> 
>> After multiple unsuccessful attempts to get CDCR to work, we ended up with
>> reproducible cases of CDCR losing data in transit. In June, I initiated a
>> thread in this group asking for clarification of how/whether CDCR could be
>> made reliable. This seemed to me to be met with deafening silence until the
>> announcement in July of the release of Solr 8.6 and the deprecation of CDCR.
>> 
>> So we are left with the question whether we should expect Master/Slave
>> replication also to be deprecated; and if so, with what is it expected to
>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume
>> that Master/Slave replication will continue to be supported after all
>> (since the assertion that it would be replaced by CDCR has been
>> discredited)? In either case, are there other suggested implementations of
>> having a read-only SolrCloud receive data from a read/write SolrCloud?
>> 
>> 
>> Thanks
>> 
>> -Original Message-
>> From: Shawn Heisey 
>> Sent: Tuesday, May 21, 2019 11:15 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud (7.3) and Legacy replication slaves
>> 
>> On 5/21/2019 8:48 AM, Michael Tracey wrote:
>>> Is it possible to set up an existing SolrCloud cluster as the master for
>>> legacy replication to a slave server or two? It looks like another option
>>> is to use Uni-directional CDCR, but not sure what is the best option in
>>> this case.
>> 
>> You're asking for problems if you try to combine legacy replication with
>> SolrCloud.  The two features are not guaranteed to work together.
>> 
>> CDCR is your best bet.  This replicates from one SolrCloud cluster to
>> another.
>> 
>> Thanks,
>> Shawn
>> 



Re: Master/Slave

2020-09-30 Thread David Hastings
>whether we should expect Master/Slave replication also to be deprecated

it better not ever be deprecated.  it has been the most reliable mechanism
for its purpose, solr cloud isn't going to replace standalone, and if it does,
that's when I guess I stop upgrading or move to elastic

On Wed, Sep 30, 2020 at 2:58 PM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:

> Based on the thread below (reading "legacy" as meaning "likely to be
> deprecated in later versions"), we have been working to extract ourselves
> from Master/Slave replication
>
> Most of our collections need to be in two data centers (a read/write copy
> in one local data center: the disaster-recovery-site SolrCloud could be
> read-only). We also need redundancy within each data center for when one
> host or another is unavailable. We implemented this by having different
> SolrClouds in the different data centers; with Master/Slave replication
> pulling data from one of the read/write replicas to each of the Slave
> replicas in the disaster-recovery-site read-only SolrCloud. Additionally,
> for some collections, there is a desire to have local read-only replicas
> remain unchanged for querying during the loading process: for these
> collections, there is a local read/write loading SolrCloud, a local
> read-only querying SolrCloud (normally configured for Master/Slave
> replication from one of the replicas of the loader SolrCloud to both
> replicas of the query SolrCloud, but with Master/Slave disabled when the
> load was in progress on the loader SolrCloud, and with Master/Slave resumed
> after the loaded data passes QA checks).
>
> Based on the thread below, we made an attempt to switch to CDCR. The main
> reason for wanting to change was that CDCR was said to be the supported
> mechanism, and the replacement for Master/Slave replication.
>
> After multiple unsuccessful attempts to get CDCR to work, we ended up with
> reproducible cases of CDCR losing data in transit. In June, I initiated a
> thread in this group asking for clarification of how/whether CDCR could be
> made reliable. This seemed to me to be met with deafening silence until the
> announcement in July of the release of Solr 8.6 and the deprecation of CDCR.
>
> So we are left with the question whether we should expect Master/Slave
> replication also to be deprecated; and if so, with what is it expected to
> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume
> that Master/Slave replication will continue to be supported after all
> (since the assertion that it would be replaced by CDCR has been
> discredited)? In either case, are there other suggested implementations of
> having a read-only SolrCloud receive data from a read/write SolrCloud?
>
>
> Thanks
>
> -Original Message-
> From: Shawn Heisey 
> Sent: Tuesday, May 21, 2019 11:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud (7.3) and Legacy replication slaves
>
> On 5/21/2019 8:48 AM, Michael Tracey wrote:
> > Is it possible to set up an existing SolrCloud cluster as the master for
> > legacy replication to a slave server or two? It looks like another option
> > is to use Uni-directional CDCR, but not sure what is the best option in
> > this case.
>
> You're asking for problems if you try to combine legacy replication with
> SolrCloud.  The two features are not guaranteed to work together.
>
> CDCR is your best bet.  This replicates from one SolrCloud cluster to
> another.
>
> Thanks,
> Shawn
>


Master/Slave

2020-09-30 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Based on the thread below (reading "legacy" as meaning "likely to be deprecated 
in later versions"), we have been working to extract ourselves from 
Master/Slave replication

Most of our collections need to be in two data centers (a read/write copy in 
one local data center: the disaster-recovery-site SolrCloud could be 
read-only). We also need redundancy within each data center for when one host 
or another is unavailable. We implemented this by having different SolrClouds 
in the different data centers; with Master/Slave replication pulling data from 
one of the read/write replicas to each of the Slave replicas in the 
disaster-recovery-site read-only SolrCloud. Additionally, for some collections, 
there is a desire to have local read-only replicas remain unchanged for 
querying during the loading process: for these collections, there is a local 
read/write loading SolrCloud, a local read-only querying SolrCloud (normally 
configured for Master/Slave replication from one of the replicas of the loader 
SolrCloud to both replicas of the query SolrCloud, but with Master/Slave 
disabled when the load was in progress on the loader SolrCloud, and with 
Master/Slave resumed after the loaded data passes QA checks).

Based on the thread below, we made an attempt to switch to CDCR. The main 
reason for wanting to change was that CDCR was said to be the supported 
mechanism, and the replacement for Master/Slave replication.

After multiple unsuccessful attempts to get CDCR to work, we ended up with 
reproducible cases of CDCR losing data in transit. In June, I initiated a 
thread in this group asking for clarification of how/whether CDCR could be made 
reliable. This seemed to me to be met with deafening silence until the 
announcement in July of the release of Solr 8.6 and the deprecation of CDCR.

So we are left with the question whether we should expect Master/Slave 
replication also to be deprecated; and if so, with what is it expected to be 
replaced (since not with CDCR)? Or is it now sufficiently safe to assume that 
Master/Slave replication will continue to be supported after all (since the 
assertion that it would be replaced by CDCR has been discredited)? In either 
case, are there other suggested implementations of having a read-only SolrCloud 
receive data from a read/write SolrCloud?
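
For concreteness, the "disable polling during a load, resume after QA" step is
roughly the following (an illustrative SolrJ sketch using the standard
/replication handler commands; the URLs and core names are placeholders, not
our real ones):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ReplicationControl {

  static void replicationCommand(String coreUrl, String command) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("command", command);
      new GenericSolrRequest(SolrRequest.METHOD.GET, "/replication", params)
          .process(client);
    }
  }

  public static void main(String[] args) throws Exception {
    String querySlave = "http://query-node:8983/solr/mycollection_shard1_replica_n1";
    replicationCommand(querySlave, "disablepoll"); // before the load starts
    // ... load into the read/write SolrCloud, run QA checks ...
    replicationCommand(querySlave, "enablepoll");  // resume pulling from the master
    replicationCommand(querySlave, "fetchindex");  // optionally pull right away
  }
}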


Thanks

-Original Message-
From: Shawn Heisey  
Sent: Tuesday, May 21, 2019 11:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud (7.3) and Legacy replication slaves

On 5/21/2019 8:48 AM, Michael Tracey wrote:
> Is it possible to set up an existing SolrCloud cluster as the master for
> legacy replication to a slave server or two? It looks like another option
> is to use Uni-directional CDCR, but not sure what is the best option in this
> case.

You're asking for problems if you try to combine legacy replication with 
SolrCloud.  The two features are not guaranteed to work together.

CDCR is your best bet.  This replicates from one SolrCloud cluster to 
another.

Thanks,
Shawn


Re: advice on whether to use stopwords for use case

2020-09-30 Thread Walter Underwood
I’m not clear on the requirements. It sounds like the query “cigar” or “cuban 
cigar”
should return zero results. Is that right?

If so, then check for those words in the query before sending it to Solr.
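
Something as small as this sketch would do for the pre-check (the phrase list
and matching rule here are just an assumption about the requirement):

import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BlockedQueryCheck {
  private static final List<String> BLOCKED =
      Arrays.asList("cigarette", "cigar", "e-cigarette", "electronic cigar");

  static boolean isBlocked(String userQuery) {
    String q = userQuery.toLowerCase(Locale.ROOT);
    // naive substring match; real matching may need the same analysis Solr applies
    return BLOCKED.stream().anyMatch(q::contains);
  }

  public static void main(String[] args) {
    System.out.println(isBlocked("cuban cigar"));  // true  -> return zero results
    System.out.println(isBlocked("coffee beans")); // false -> send to Solr
  }
}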

But the stopwords approach seems like the requirement is different. Could you 
give
some examples?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 30, 2020, at 11:53 AM, Alexandre Rafalovitch  
> wrote:
> 
> You may also want to look at something like: 
> https://docs.querqy.org/index.html
> 
> ApacheCon had (is having..) a presentation on it that seemed quite
> relevant to your needs. The videos should be live in a week or so.
> 
> Regards,
>   Alex.
> 
> On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch  
> wrote:
>> 
>> I am not sure why you think stop words are your first choice. Maybe I
>> misunderstand the question. I read it as needing to completely exclude
>> a set of documents that include specific keywords when the search is
>> called from a specific module.
>> 
>> If I wanted to differentiate the searches from a specific module, I
>> would give that module a different end-point (Request Query Handler),
>> instead of /select. So, /nocigs or whatever.
>> 
>> Then, in that end-point, you could do all sorts of extra things, such
>> as setting appends or even invariants parameters, which would include
>> filter query to exclude any documents matching specific keywords. I
>> assume it is ok to return documents that are matching for other
>> reasons.
>> 
>> Ideally, you would mark the cigs documents during indexing with a
>> binary or enumeration flag and then during search you just need to
>> check against that flag. In that case, you could copyField  your text
>> and run it against something like
>> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
>> combined with Shingles for multiwords. Or similar. And just transform
>> it as index-only so that the result is basically a yes/no flag.
>> A similar thing could be done with an UpdateRequestProcessor pipeline if
>> you want to end up with a true boolean flag. The idea is the same:
>> have an index-only flag that you lock in for any
>> request from the specific module.
>> 
>> Or even with something like ElevationSearchComponent. Same idea.
>> 
>> Hope this helps.
>> 
>> Regards,
>>   Alex.
>> 
>> On Tue, 29 Sep 2020 at 22:28, Derek Poh  wrote:
>>> 
>>> Hi
>>> 
>>> I have read in the mailing list that we should try to avoid using stop
>>> words.
>>> 
>>> I have a use case where I would like to know if there are alternative
>>> solutions besides using stop words.
>>> 
>>> There is a business requirement to return zero results when the search
>>> contains cigarette-related words and the search is coming from a particular
>>> module on our site. It does not apply to all searches from our site.
>>> There is a list of these cigarette-related words. This list contains
>>> single words, multiple words (electronic cigar), and multiple words with
>>> punctuation (e-cigarette case).
>>> I am planning to copy to a different set of search fields, which will
>>> include the stop word filter at the index and query stage, for this
>>> module to use.
>>> 
>>> For this use case, other than using stop words to handle it, is there
>>> any alternative solution?
>>> 
>>> Derek
>>> 



Re: advice on whether to use stopwords for use case

2020-09-30 Thread Alexandre Rafalovitch
You may also want to look at something like: https://docs.querqy.org/index.html

ApacheCon had (is having..) a presentation on it that seemed quite
relevant to your needs. The videos should be live in a week or so.

Regards,
   Alex.

On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch  wrote:
>
> I am not sure why you think stop words are your first choice. Maybe I
> misunderstand the question. I read it as needing to completely exclude
> a set of documents that include specific keywords when the search is
> called from a specific module.
>
> If I wanted to differentiate the searches from a specific module, I
> would give that module a different end-point (Request Query Handler),
> instead of /select. So, /nocigs or whatever.
>
> Then, in that end-point, you could do all sorts of extra things, such
> as setting appends or even invariants parameters, which would include
> filter query to exclude any documents matching specific keywords. I
> assume it is ok to return documents that are matching for other
> reasons.
>
> Ideally, you would mark the cigs documents during indexing with a
> binary or enumeration flag and then during search you just need to
> check against that flag. In that case, you could copyField  your text
> and run it against something like
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
> combined with Shingles for multiwords. Or similar. And just transform
> it as index-only so that the result is basically a yes/no flag.
> A similar thing could be done with an UpdateRequestProcessor pipeline if
> you want to end up with a true boolean flag. The idea is the same:
> have an index-only flag that you lock in for any
> request from the specific module.
>
> Or even with something like ElevationSearchComponent. Same idea.
>
> Hope this helps.
>
> Regards,
>Alex.
>
> On Tue, 29 Sep 2020 at 22:28, Derek Poh  wrote:
> >
> > Hi
> >
> > I have read in the mailing list that we should try to avoid using stop
> > words.
> >
> > I have a use case where I would like to know if there are alternative
> > solutions besides using stop words.
> >
> > There is a business requirement to return zero results when the search
> > contains cigarette-related words and the search is coming from a particular
> > module on our site. It does not apply to all searches from our site.
> > There is a list of these cigarette-related words. This list contains
> > single words, multiple words (electronic cigar), and multiple words with
> > punctuation (e-cigarette case).
> > I am planning to copy to a different set of search fields, which will
> > include the stop word filter at the index and query stage, for this
> > module to use.
> >
> > For this use case, other than using stop words to handle it, is there
> > any alternative solution?
> >
> > Derek
> >


Re: How to Resolve : "The request took too long to iterate over doc values"?

2020-09-30 Thread raj.yadav
I went through the other queries for which we are getting the `The request took too
long to iterate over doc values` warning. As Erick suggested, I have cross-checked
all the fields used in the queries, and there is no field we are searching
against that has indexed=false and docValues=true.

A few observations I would like to share here:

- We are performing a load test on our system and the above timeout warning
is occurring for only those queries which are fetching a large number of
documents. 

- I stopped all the load on the system and fired the same queries (for which
we were getting the timeout warning). Here is the Solr response:

Solr Response:
response: {
  numFound: 6082251,
  start: 0,
  maxScore: 4709.594,
  docs: [ ]
}

The response was quite weird (the header says `6082251` docs were found but the
`docs` array is empty), and there was no timeout warning in the logs.
Then I increased `timeAllowed` to 5000 ms (our default is 1000 ms). This time the
`docs` array was not empty, and in fact the numFound count increased. This
clearly indicates that the query was not able to complete within 1000 ms
(the default timeAllowed).
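
For illustration, this is roughly how I am checking the responses (a SolrJ
sketch; the collection name is a placeholder). When timeAllowed is exceeded,
Solr marks the response header with partialResults, which lines up with seeing
a numFound but an empty `docs` array:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PartialResultsCheck {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://localhost:8983/solr/mycollection").build()) {
      SolrQuery query = new SolrQuery("*:*");
      query.setTimeAllowed(1000);                  // same limit as in the warning
      QueryResponse rsp = client.query(query);
      Object partial = rsp.getResponseHeader().get("partialResults");
      System.out.println("numFound=" + rsp.getResults().getNumFound()
          + ", returned=" + rsp.getResults().size()
          + ", partialResults=" + partial);
    }
  }
}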

I have the following questions:
1. Are doc values as efficient as ExternalFileField for function queries?
2. Why did I get the warning message when the system was under load, but not
when there was no load?

When we performed the same load test (same load scale) with the
ExternalFileField type, we were not getting any warning messages in our logs.



raj.yadav wrote
> Hey Erick,
> 
> In cases for which we are getting this warning, I'm not able to extract the
> exact Solr query; instead the logger is logging the `parsedquery` for such
> cases.
> Here is one example:
> 
> 
> 2020-09-29 13:09:41.279 WARN  (qtp926837661-82461) [c:mycollection
> s:shard1_0 r:core_node5 x:mycollection_shard1_0_replica_n3]
> o.a.s.s.SolrIndexSearcher Query: [+FunctionScoreQuery(+*:*, scored by
> boost(product(if(max(const(0),
> sub(float(my_doc_value_field1),const(500))),const(0.01),
>
> if(max(const(0),sub(float(my_doc_value_field2),const(290))),const(0.2),const(1))),
>
> sqrt(product(sum(const(1),float(my_doc_value_field3),float(my_doc_value_field4)),
> sqrt(sum(const(1),float(my_doc_value_field5
> #BitSetDocTopFilter]; The request took too long to iterate over doc
> values.
> Timeout: timeoutAt: 1635297585120522 (System.nanoTime():
> 1635297690311384),
> DocValues=org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$8@7df12bf1
> 
> 
> 
> As per my understanding, the query in the above case is `q=*:*`, and then there
> is a boost function which uses a function query on my_doc_value_field*
> (field type doc_value_field, i.e. having indexed=false and docValues=true) to
> reorder the matched docs. If docValues work efficiently for _function queries_,
> then why are these warnings coming?
> 
> 
> Also, we do use frange queries on doc_value_field (having indexed=false and
> docValues=true).
> example:
> {!frange l=1.0}my_doc_value_field1
> 
> 
> Erick Erickson wrote
>> Let’s see the query. My bet is that you are _searching_ against the field
>> and have indexed=false.
>> 
>> Searching against a docValues=true indexed=false field results in the
>> equivalent of a “table scan” in the RDBMS world. You may use
>> the docValues efficiently for _function queries_ to mimic some
>> search behavior.
>> 
>> Best,
>> Erick
> 
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Slow Solr 8 response for long query

2020-09-30 Thread Erick Erickson
Increasing the number of rows should not have this kind of impact in either 
version of Solr, so I think there’s something fundamentally strange in your 
setup.

Whether returning 10 or 300 documents, every document has to be scored. There 
are two differences between 10 and 300 rows:

1> when returning 10 rows, Solr keeps a sorted list of 10 docs, just IDs and 
scores (assuming you’re sorting by relevance); when returning 300, the list is 
300 long. I find it hard to believe that keeping a list 300 items long is 
making that much of a difference.

2> Solr needs to fetch/decompress/assemble 300 documents vs. 10 documents for 
the response. Regardless of the fields returned, the entire document will be 
decompressed if you return any fields that are not docValues=true. So it’s 
possible that what you’re seeing is related.

Try adding debug=true to the query, as Alexandre suggests. Pay particular 
attention to the “timings” section too; that’ll show you the time each 
component took _exclusive_ of step <2> above and should give a clue.


All that said, fq clauses don’t score, so scoring is certainly involved in why 
the query takes so long to return even 10 rows but gets faster when you move 
the clause to a filter query. Still, my intuition is that there’s something else 
going on as well to account for the difference when you return 300 rows.
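
To make that comparison concrete, something like the following (an illustrative
SolrJ sketch; the field names are invented and the short clause stands in for
your 20KB one) issues the same request with the big boolean clause in q versus
moved to fq, with debug enabled so the timings section shows where the time
goes:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class QVersusFq {
  public static void main(String[] args) throws Exception {
    String userText = "title_txt:(some search text)";
    String bigClause = "(dept_s:legal OR dept_s:finance) AND status_s:active";

    try (HttpSolrClient client = new HttpSolrClient.Builder(
             "http://localhost:8983/solr/mycollection").build()) {
      // Everything in q: the whole clause participates in scoring.
      SolrQuery scored = new SolrQuery(userText + " AND " + bigClause);
      scored.setRows(10);
      scored.set("debug", "true");
      System.out.println(client.query(scored).getDebugMap().get("timing"));

      // Clause moved to fq: it filters (and is cached) but is not scored.
      SolrQuery filtered = new SolrQuery(userText);
      filtered.addFilterQuery(bigClause);
      filtered.setRows(10);
      filtered.set("debug", "true");
      System.out.println(client.query(filtered).getDebugMap().get("timing"));
    }
  }
}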

Best,
Erick

> On Sep 29, 2020, at 8:52 PM, Alexandre Rafalovitch  wrote:
> 
> What do the debug versions of the query show between the two versions?
> 
> One thing that changed is the sow (split-on-whitespace) parameter, among
> many. It is unlikely to be the cause, but I am mentioning it just in
> case.
> https://lucene.apache.org/solr/guide/8_6/the-standard-query-parser.html#standard-query-parser-parameters
> 
> Regards,
>   Alex
> 
> On Tue, 29 Sep 2020 at 20:47, Permakoff, Vadim
>  wrote:
>> 
>> Hi Solr Experts!
>> We are moving from Solr 6.5.1 to Solr 8.5.0 and having a problem with a long 
>> query, which has search text plus many OR and AND conditions (all in one 
>> place; the query is about 20KB long).
>> For the same set of data (about 500K docs) and the same schema, the query in 
>> Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10 sec to 
>> get 10 results. If I increase the number of rows to 300, Solr 6 takes 
>> about 10 sec and Solr 8 takes more than 1 min. The results are small, 
>> just IDs. It looks like relevancy scoring plays a role, because if I move 
>> this query to a filter query, both Solr versions work pretty fast.
>> The right way should be to change the query, but unfortunately it is 
>> difficult to modify the application which creates these queries, so I want 
>> to find some temporary workaround.
>> 
>> What changed from Solr 6 to Solr 8 in terms of scoring with many 
>> conditions that affects the search speed negatively?
>> Is there anything to configure in Solr 8 to get the same performance for 
>> such a query as in Solr 6?
>> 
>> Thank you,
>> Vadim
>> 
>> 
>> 



Re: ApacheCon at Home 2020 starts tomorrow!

2020-09-30 Thread Bram Van Dam
On 30/09/2020 05:14, Rahul Goswami wrote:
> Thanks for sharing this Anshum. Day 1 had some really interesting sessions.
> Missed out on a couple that I would have liked to listen to. Are the
> recordings of these sessions available anywhere?

The ASF will be uploading the recordings of all sessions "soon", which
probably means about a week or two.

 - Bram