[Test failure] TestFoldingMultitermExtrasQuery Looks to be a ThreadLeak Branch 8.4.1

2020-04-07 Thread sergio
So I already found the problem over here
https://lucene.472066.n3.nabble.com/mvn-test-failing-td4361042.html
  







Re: Spellcheck on specified fields?

2020-04-07 Thread TK Solr
Correction: the "mark seattle" query doesn't show suggestions, since "mark" alone
has some hits. It is when the same logic is used for a single-term query of
"seatle" that 3 suggestions of "seattle" are returned. Do I have to identify the
field by using the startOffset value?

On 4/7/20 3:46 PM, TK Solr wrote:

I query on multiple fields like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which
term is intended for which field.


If I misspell seattle, I get 3 suggestions:

"spellcheck":{
    "suggestions":[
  "seatle",{
    "numFound":1,
    "startOffset":29,
    "endOffset":35,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":50,
    "endOffset":56,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":73,
    "endOffset":79,
    "suggestion":["seattle"]}]}}

(Please disregard exact numbers. It's from more complicated query of the same 
nature.)


I think it's showing a correction suggestion for each query field.

Since the phone field holds a phone number, spelling corrections for it are not
very useful. I would like the spellchecker to skip this and similar fields, but
I don't see a relevant parameter in the spellchecker's documentation. Is there
any way to specify which fields I am or am not interested in?

TK
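
On the startOffset question above: a minimal Python sketch, assuming each suggestion's startOffset indexes into the q string exactly as it was sent, and that every clause has the field:(terms) shape shown in the example. The names field_at_offset and keep_suggestions are hypothetical helpers, not part of Solr, and the sketch filters the response on the client side rather than configuring the spellchecker. The sample offsets are computed from the sample string, not taken from a real response.

```python
import re

# Matches clauses of the form field:(term term ...), as in the example query.
CLAUSE = re.compile(r"(\w+):\(([^)]*)\)")


def field_at_offset(q, offset):
    """Return the field whose parenthesised terms contain the given offset."""
    for match in CLAUSE.finditer(q):
        if match.start(2) <= offset < match.end(2):
            return match.group(1)
    return None


def keep_suggestions(q, suggestions, skip_fields):
    """Drop suggestions whose startOffset falls inside a skipped field."""
    kept = []
    # The response lists suggestions as a flat sequence: term, info, term, info, ...
    for term, info in zip(suggestions[0::2], suggestions[1::2]):
        if field_at_offset(q, info["startOffset"]) not in skip_fields:
            kept.append((term, info))
    return kept


# Synthetic data: one misspelled term per field, offsets derived from q itself.
q = "city:(seatle) name:(seatle) phone:(seatle)"
suggestions = []
for match in CLAUSE.finditer(q):
    suggestions += ["seatle", {"numFound": 1,
                               "startOffset": match.start(2),
                               "endOffset": match.end(2),
                               "suggestion": ["seattle"]}]

kept = keep_suggestions(q, suggestions, skip_fields={"phone"})
print([field_at_offset(q, info["startOffset"]) for _, info in kept])
```

If the offsets turn out to be relative to something other than the raw q string, the mapping step would need to be adjusted accordingly.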





Spellcheck on specified fields?

2020-04-07 Thread TK Solr

I query on multiple fields like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which
term is intended for which field.


If I misspell seattle, I get 3 suggestions:

"spellcheck":{
    "suggestions":[
  "seatle",{
    "numFound":1,
    "startOffset":29,
    "endOffset":35,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":50,
    "endOffset":56,
    "suggestion":["seattle"]},
  "seatle",{
    "numFound":1,
    "startOffset":73,
    "endOffset":79,
    "suggestion":["seattle"]}]}}

(Please disregard exact numbers. It's from more complicated query of the same 
nature.)


I think it's showing a correction suggestion for each query field.

Since the phone field holds a phone number, spelling corrections for it are not
very useful. I would like the spellchecker to skip this and similar fields, but
I don't see a relevant parameter in the spellchecker's documentation. Is there
any way to specify which fields I am or am not interested in?

TK





Query confusion - solr cloud 8.2.0

2020-04-07 Thread Joe Obernberger

I'm running the following query:

id:COLLECT2601697594_T496 AND (person:[80 TO 100])
That returns 1 hit.

The following query also returns the same hit:

id:COLLECT2601697594_T496 AND ((POP16_Rez1:blue_Sky AND POP16_Sc1:[80 TO 
100]) OR (POP16_Rez2:blue_Sky AND POP16_Sc2:[80 TO 100]) OR 
(POP16_Rez3:blue_Sky AND POP16_Sc3:[80 TO 100]) OR (POP19_Rez1:blue_Sky 
AND POP19_Sc1:[80 TO 100]) OR (POP19_Rez2:blue_Sky AND POP19_Sc2:[80 TO 
100]) OR (POP19_Rez3:blue_Sky AND POP19_Sc3:[80 TO 100]) OR 
(ResN_Rez1:blue_Sky AND ResN_Sc1:[80 TO 100]) OR (ResN_Rez2:blue_Sky AND 
ResN_Sc2:[80 TO 100]) OR (ResN_Rez3:blue_Sky AND ResN_Sc3:[80 TO 100]))


but AND'ing the two together returns 0 hits.  What am I missing?

id:COLLECT2601697594_T496 AND ((POP16_Rez1:blue_Sky AND POP16_Sc1:[80 TO 
100]) OR (POP16_Rez2:blue_Sky AND POP16_Sc2:[80 TO 100]) OR 
(POP16_Rez3:blue_Sky AND POP16_Sc3:[80 TO 100]) OR (POP19_Rez1:blue_Sky 
AND POP19_Sc1:[80 TO 100]) OR (POP19_Rez2:blue_Sky AND POP19_Sc2:[80 TO 
100]) OR (POP19_Rez3:blue_Sky AND POP19_Sc3:[80 TO 100]) OR 
(ResN_Rez1:blue_Sky AND ResN_Sc1:[80 TO 100]) OR (ResN_Rez2:blue_Sky AND 
ResN_Sc2:[80 TO 100]) OR (ResN_Rez3:blue_Sky AND ResN_Sc3:[80 TO 100])) 
AND (person:[80 TO 100])


Thank you!

-Joe
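
The long OR clause above follows a regular pattern, so one way to narrow down the 0-hit result is to generate the query from its parts and then try each sub-clause together with the person range on its own. A minimal sketch, assuming the field naming pattern seen in the message; pair_clauses and the variable names are hypothetical, and only query strings are built here, nothing is sent to Solr.

```python
def pair_clauses(prefixes, value="blue_Sky", lo=80, hi=100, slots=(1, 2, 3)):
    """Build one (Rez AND Sc range) clause per prefix and slot number."""
    return [
        f"({p}_Rez{n}:{value} AND {p}_Sc{n}:[{lo} TO {hi}])"
        for p in prefixes
        for n in slots
    ]


doc_filter = "id:COLLECT2601697594_T496"
person_clause = "(person:[80 TO 100])"

parts = pair_clauses(["POP16", "POP19", "ResN"])
or_clause = "(" + " OR ".join(parts) + ")"

# The full query that returned 0 hits in the message above.
combined = f"{doc_filter} AND {or_clause} AND {person_clause}"

# One probe per sub-clause, to see which combination stops matching.
probes = [f"{doc_filter} AND {part} AND {person_clause}" for part in parts]

print(combined)
for probe in probes:
    print(probe)
```

Running the probes one at a time (with debug output enabled) should show how each one is parsed and which of them, if any, still matches the document.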



Re: If the leader dies, will the data be lost?

2020-04-07 Thread Erick Erickson
bq. Does the replication by tlog always work like this?

Yes. Otherwise there’s no way to even attempt to guarantee data integrity. The 
raw updates _must_ be received by all non-leaders that are eligible (i.e. TLOG 
or NRT but not PULL replicas).

BTW, anecdotally I’ve seen indexing throughput take quite a hit because of 
this, like 30% or more when comparing leader-only with leader + NRT or TLOG 
replicas. This was an outlier because the records were very small and the 
communications overhead to forward the raw docs and wait for an ack back was 
large in comparison to the time it actually took to index the docs. Most 
installations have much less of a difference.

And, BTW, the docs are forwarded to individual replicas in parallel.

Best,
Erick

> On Apr 6, 2020, at 10:08 PM, Taisuke Miyazaki wrote:
> 
> Hi Erick,
> 
>> Before the leader goes down, the sequence of an update is this.
>> - the doc comes in to the leader (TL)
>> - the doc is forwarded to all the other tlog replicas (TF) and written to
>> _their_ tlogs
>> - all the TF replicas ack back to TL
>> - TL acks back to the client
> 
> So the write request doesn't return a response until all the tlog replicas
> have been written!
> Thank you.
> 
> BTW, does replication by tlog always work like this?
> 
> 
> 
> 
> On Mon, Apr 6, 2020 at 20:42, Erick Erickson wrote:
> 
>> You’ve got the sequence, that’s it exactly.
>> 
>> I don’t quite understand the second part of the question, but let me
>> address data loss.
>> 
>> Before the leader goes down, the sequence of an update is this.
>> - the doc comes in to the leader (TL)
>> - the doc is forwarded to all the other tlog replicas (TF) and written to
>> _their_ tlogs
>> - all the TF replicas ack back to TL
>> - TL acks back to the client
>> 
>> So, upon getting success back from the update request, all TLOG replicas
>> have the docs in their local tlog files. So when the leadership changes,
>> the new leader has all the docs to replay, thus no data loss.
>> 
>> At that point, the old leader’s tlogs are irrelevant. When it comes back
>> online, the sequence is:
>> 
>> - synch from the new leader, including any tlogs. This effectively erases
>> the old tlogs
>> - start writing any new docs into the local tlog
>> 
>> The old leader then remains a follower until some event changes things again.
>> 
>> Best,
>> Erick
>> 
>>> On Apr 6, 2020, at 1:53 AM, Taisuke Miyazaki wrote:
>>> 
>>> Hi,
>>> Using solr 7.5.0 on solr cloud, and replica type is tlog.
>>> 
>>> If a leader dies, how is the re-election of the leader and the
>>> synchronization of the replicas done?
>>> 
>>> My understanding is:
>>> Leader dies → a tlog replica tries to become leader → it replays tlogs not
>>> yet reflected in the index → it becomes leader.
>>> Is this correct to begin with?
>>> 
>>> Also, when another leader is elected, does it create a tlog that is only
>>> available to the old leader? (I'm worried about data being lost if the
>>> tlogs aren't synchronized.)
>> 
>> 
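
The update sequence quoted above (the doc reaches the leader, is forwarded to the other TLOG replicas in parallel, and the client is acked only after every replica has acked) can be pictured with a toy model. This is a sketch under stated assumptions, not Solr's implementation: the Replica class and leader_update function are hypothetical, and a tlog is modelled as a plain in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor


class Replica:
    """Toy replica whose transaction log is just a list of docs."""

    def __init__(self, name):
        self.name = name
        self.tlog = []

    def append(self, doc):
        self.tlog.append(doc)
        return True  # stands in for the ack sent back to the leader


def leader_update(leader, followers, doc):
    """Write to the leader, forward to all followers in parallel, then ack."""
    leader.append(doc)
    acks = []
    if followers:
        with ThreadPoolExecutor(max_workers=len(followers)) as pool:
            # Each follower writes the raw doc to its own tlog concurrently.
            acks = list(pool.map(lambda replica: replica.append(doc), followers))
    # The client sees success only once every follower has acked.
    return all(acks)


leader = Replica("TL")
followers = [Replica("TF1"), Replica("TF2")]

for doc in ({"id": "1"}, {"id": "2"}):
    assert leader_update(leader, followers, doc)

# If TL dies now, any follower already holds every acknowledged doc to replay.
new_leader = followers[0]
print(new_leader.tlog == leader.tlog)
```

The model also hints at why forwarding costs throughput with very small docs: the leader waits for the slowest follower's ack before it can answer the client.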



different interpretation of the same query between solr 7.3.1 and solr 8.4.1

2020-04-07 Thread Danilo Tomasoni
Hello all,
I noticed that solr8 parses the edismax queries differently from solr7.

the querystring and parsedquery in solr 8.4.1 are

"querystring":"(_query_:\"{!edismax qf='titles subtitles study_brief_title 
abstracts abstract_background abstract_objective abstract_methods 
abstract_results abstract_conclusions abstract_other_nasa abstract_other_kie 
abstract_other_aids abstract_other_aamc abstract_other_publisher 
abstract_other_pip abstract_other_plain_language_summary keywords 
medline_chemical_terms medline_mesh_terms' q.op=OR mm=1 v=$subquery1}\" AND 
(f2:(\"pharmacokinetics\")))",


"parsedquery":"+(DisjunctionMaxQuery(((abstracts_chemical_pubtator_annotation_terms:_query_)^7.0
 | (body_nonstandard_species_gnorm_annotation_canonical_terms:_query_)^7.0 | 
(body_conclusions_species_gnorm_annotation_canonical_terms:_query_)^7.0 | 
(intervention_model_chemical_tmchemm2_annotation_terms:_query_)^7.0 | ...


that looks like a syntax error (notice the field:_query_ clauses)

while in solr 7.3.1 the querystring and parsedquery are

"querystring":"(_query_:\"{!edismax qf='titles subtitles study_brief_title 
abstracts abstract_background abstract_objective abstract_methods 
abstract_results abstract_conclusions abstract_other_nasa abstract_other_kie 
abstract_other_aids abstract_other_aamc abstract_other_publisher 
abstract_other_pip abstract_other_plain_language_summary keywords 
medline_chemical_terms medline_mesh_terms' q.op=OR mm=1 v=$subquery1}\" AND 
(f2:(\"pharmacokinetics\")))",


"parsedquery":"+(+(+(DisjunctionMaxQuery((subtitles:rifampicin | 
abstract_methods:rifampicin | abstract_other_plain_language_summary:rifampicin 
| abstract_other_pip:rifampicin | keywords:rifampicin | abstracts:rifampicin | 
abstract_background:rifampicin | medline_chemical_terms:rifampicin | 

that is correct.

please note also that subquery1 is

"subquery1":"(rifampicin rifampin isoniazid pyrazinamide ethambutol 
moxifloxacin pretomanid bedaquiline)"


and that df is a very big list of fields.



I attach an example POST request.

I executed the test with curl, this is the command line

(solr 7.3.1)
curl -X POST -H "Content-type: application/x-www-form-urlencoded" --data 
@request 
"http://solr2.cosbi.eu/solr/COSBIBioIndex/select?indent=off=json=1=true;

(solr 8.4.1)
curl -X POST -H "Content-type: application/x-www-form-urlencoded" --data 
@request 
"http://solr-test.cosbi.eu/solr/COSBIBioIndex/select?indent=off=json=1=true;


Any clue why this is happening? It seems to me that there should be an
obvious change in syntax, but I can't find it in the documentation or release
notes.
Thank you

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu
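
For reproducing the comparison against both servers, the request body can be assembled programmatically. A minimal Python sketch, assuming the form parameters are q and subquery1 as the debug output above suggests. debugQuery=true is an assumption, since the parameter names in the curl URLs above are not legible, and build_params is a hypothetical helper. The field list and the subquery are copied from the message; the sketch only builds the body and does not contact either server.

```python
from urllib.parse import urlencode

# qf fields as listed in the querystring above.
QF_FIELDS = [
    "titles", "subtitles", "study_brief_title", "abstracts",
    "abstract_background", "abstract_objective", "abstract_methods",
    "abstract_results", "abstract_conclusions", "abstract_other_nasa",
    "abstract_other_kie", "abstract_other_aids", "abstract_other_aamc",
    "abstract_other_publisher", "abstract_other_pip",
    "abstract_other_plain_language_summary", "keywords",
    "medline_chemical_terms", "medline_mesh_terms",
]

SUBQUERY1 = ("(rifampicin rifampin isoniazid pyrazinamide ethambutol "
             "moxifloxacin pretomanid bedaquiline)")


def build_params(qf_fields, subquery):
    """Assemble the nested edismax query and its dereferenced subquery."""
    nested = "{!edismax qf='%s' q.op=OR mm=1 v=$subquery1}" % " ".join(qf_fields)
    q = '(_query_:"%s" AND (f2:("pharmacokinetics")))' % nested
    # debugQuery is assumed; it is what makes parsedquery appear in the response.
    return {"q": q, "subquery1": subquery, "debugQuery": "true"}


params = build_params(QF_FIELDS, SUBQUERY1)
body = urlencode(params)  # application/x-www-form-urlencoded payload
print(params["q"])
```

Sending the identical body to the 7.3.1 and 8.4.1 instances and diffing the two parsedquery values keeps the comparison free of copy-and-paste differences.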
