RE: [SPAM] Re: query parsed in different ways in two identical solr instances

2019-06-10 Thread Danilo Tomasoni
Yes, I said identical because the configuration (solrconfig.xml etc.) is identical; 
only some fields changed.
Sorry I was not more precise in describing the environment.

Nice to know it's already fixed.

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu

As for the European General Data Protection Regulation 2016/679 on the 
protection of natural persons with regard to the processing of personal data, 
we inform you that all the data we possess are object of treatment in the 
respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how; you may 
ask for their correction, cancellation or you may oppose to their use by 
written request sent by recorded delivery to The Microsoft Research – 
University of Trento Centre for Computational and Systems Biology Scarl, Piazza 
Manifattura 1, 38068 Rovereto (TN), Italy.
Please don't print this e-mail unless you really need to




Re: [SPAM] Re: query parsed in different ways in two identical solr instances

2019-06-10 Thread Alexandre Rafalovitch
Ok, great.

We now moved from "identical setup breaks things in a bugfix version"
to "strange behavior when field does not exist". The "identical" part
was actually throwing us off the trail.

And all this leads us to
https://issues.apache.org/jira/browse/SOLR-5163 , fixed in 8.0.

Hope it helps,
Alex.


RE: [SPAM] Re: query parsed in different ways in two identical solr instances

2019-06-10 Thread Danilo Tomasoni
Hello, I was able to reproduce this behaviour in an isolated environment, 
and performed some differential analysis between the two versions (which have 
different schemas; diff of schemas attached).

With the schema of solr1, the query is parsed as +(+() +())
while with the schema of solr-test, the same query is parsed as +(() ())
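
For anyone comparing the two parsed forms: in Lucene's boolean query notation a leading + marks a clause as required (MUST), while a clause without a prefix is optional (SHOULD). The Python sketch below is only a toy model of that rule, not Solr code. The names `matches`, `id_clause` and `terms_clause` are made up for illustration, and it assumes the default behaviour where a boolean query with no required clauses needs at least one optional clause to match.

```python
from typing import List, Tuple

# Each clause is (occur, matched): occur is "MUST" or "SHOULD",
# matched says whether that clause matched a given document.
Clause = Tuple[str, bool]

def matches(clauses: List[Clause]) -> bool:
    """Toy model of boolean query matching (no MUST_NOT, no mm handling)."""
    must = [m for occur, m in clauses if occur == "MUST"]
    should = [m for occur, m in clauses if occur == "SHOULD"]
    if must:
        # Every required clause has to match; optional ones only affect score.
        return all(must)
    # With no required clauses, at least one optional clause has to match.
    return any(should)

# A document that matches the id clause but not the edismax subquery.
id_clause, terms_clause = True, False

# Shape reported for solr1: +(+() +())  -> both clauses required (AND-like).
print(matches([("MUST", id_clause), ("MUST", terms_clause)]))      # False
# Shape reported for solr-test: +(() ()) -> both clauses optional (OR-like).
print(matches([("SHOULD", id_clause), ("SHOULD", terms_clause)]))  # True
```

So the same document can be rejected by one instance and returned by the other, which would explain differing result sets for the same request.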

The query is

"q":"(f1:PUBMEDPMID12159614 AND (_query_:\"{!edismax qf='medline_chemical_terms medline_mesh_terms' q.op=OR mm=1 v=$subquery1}\"))"

In both solr1 and solr-test, the alias f1 is defined as

"f.f1.qf":"id pmid pmc source_id other_id doi manuscript_id publication_id secondary_ids"
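
To make the setup easier to reproduce, the request can be written out as a full parameter set. The sketch below only builds the query string with Python's standard library. The host, the collection name `mycollection` and the value of `subquery1` are placeholders (the message does not show them), and `defType=edismax` is an assumption, made because `f.<alias>.qf` field aliasing is an edismax feature. `debug=query` and `echoParams=all` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Placeholders: the thread does not give the host, collection, or $subquery1.
BASE_URL = "http://localhost:8983/solr/mycollection/select"

params = {
    "defType": "edismax",  # assumed; f.<alias>.qf aliasing is an edismax feature
    "q": "(f1:PUBMEDPMID12159614 AND (_query_:\"{!edismax "
         "qf='medline_chemical_terms medline_mesh_terms' "
         "q.op=OR mm=1 v=$subquery1}\"))",
    # Alias f1 expands to these fields; secondary_ids is the one that was
    # renamed to external_data on solr-test.
    "f.f1.qf": "id pmid pmc source_id other_id doi manuscript_id "
               "publication_id secondary_ids",
    "subquery1": "PLACEHOLDER_TERMS",  # hypothetical value
    "debug": "query",      # ask Solr to return the parsed query
    "echoParams": "all",   # ask Solr to echo every effective parameter
    "wt": "json",
}

url = BASE_URL + "?" + urlencode(params)
print(url)
```

Sending the same URL to both instances and comparing the parsed query in the debug section keeps the request constant, so any remaining difference has to come from the server side.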

And then I suddenly remembered that the field secondary_ids was renamed to 
external_data in solr-test (before the bulk import).

So I changed the f1 definition, removing secondary_ids and adding external_data, 
and now the behaviour is the same!

How is that possible? Why can the schema (and in this case a non-existent field) 
influence the behaviour of the query parser in such a profound way?

I think that this is a subtle bug, and an error should be raised instead of 
performing an unexpected query.
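
On versions where the parser does not reject such an alias, a client-side check could catch the mismatch before querying. This is a minimal sketch under assumptions: `missing_alias_fields` is a hypothetical helper, the set of schema field names is hard-coded here (in practice it could be read from Solr's Schema API, for example the `/schema/fields` endpoint), and dynamic fields are not considered.

```python
from typing import Iterable, List

def missing_alias_fields(alias_qf: str, schema_fields: Iterable[str]) -> List[str]:
    """Return fields named in an f.<alias>.qf value that the schema lacks."""
    known = set(schema_fields)
    missing = []
    for token in alias_qf.split():
        field = token.split("^", 1)[0]  # drop an optional boost such as ^2
        if field not in known:
            missing.append(field)
    return missing

# Field names as they would be on solr-test after the rename (illustrative).
solr_test_fields = {
    "id", "pmid", "pmc", "source_id", "other_id", "doi",
    "manuscript_id", "publication_id", "external_data",
}
f1_qf = ("id pmid pmc source_id other_id doi manuscript_id "
         "publication_id secondary_ids")

print(missing_alias_fields(f1_qf, solr_test_fields))  # ['secondary_ids']
```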

Danilo Tomasoni



From: Alexandre Rafalovitch [arafa...@gmail.com]
Sent: 10 June 2019 12:49
To: solr-user
Subject: [SPAM] Re: query parsed in different ways in two identical solr 
instances

Were you able to simplify it to the simplest use case showing the issue? Or
reproduce it on the stock Solr with stock example? Because otherwise, we
would be just as stuck in a Jira as now. It is the same people helping.

For example, is the _query_ part significant?

Also, did you try running both queries with echoParams=all just to
eliminate stray differences? I know you looked at the debug line, but
perhaps this is worth a check too.
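
One way to act on the echoParams=all suggestion is to diff the echoed parameters from the two instances. The sketch below assumes each JSON response was requested with echoParams=all, so that the effective parameters appear under responseHeader.params. `diff_params` is a hypothetical helper, and the two sample dicts are abbreviated stand-ins, not real responses.

```python
from typing import Any, Dict, Tuple

def diff_params(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each parameter whose value differs to its (a, b) pair."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Abbreviated stand-ins for response["responseHeader"]["params"].
solr1_params = {"q.op": "OR", "mm": "1",
                "f.f1.qf": "id pmid doi secondary_ids"}
solr_test_params = {"q.op": "OR", "mm": "1",
                    "f.f1.qf": "id pmid doi secondary_ids", "rows": "10"}

for name, (left, right) in diff_params(solr1_params, solr_test_params).items():
    print(f"{name}: solr1={left!r} solr-test={right!r}")
```

An empty result means the two requests were effectively identical, which points the investigation at the schema or the Solr version instead.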

Regards,
Alex



On Mon, Jun 10, 2019, 5:46 AM Danilo Tomasoni wrote:

> Hello all,
> maybe I should consider this as a bug and open an issue?
>
> Danilo Tomasoni
>
>
> 
> From: Danilo Tomasoni
> Sent: 07 June 2019 11:47
> To: solr-user@lucene.apache.org
> Subject: RE: query parsed in different ways in two identical solr instances
>
> Any thoughts on that difference in the Solr parsing? Is it correct that
> the first looks like an AND while the second looks like an OR?
> Thank you
>
> Danilo Tomasoni
>