[Apache Solr ReRanking] Sort Clauses Bug

2019-09-25 Thread Alessandro Benedetti
Hi all,
I was playing a bit with the reranking capability and I discovered that:

*Sort by score, then by secondary field -> OK*
http://localhost:8983/solr/books/select?q=vegeta ssj&*sort=score desc,downloads desc*&fl=id,title,score,downloads

*ReRank, Sort by score, then by secondary field -> KO*
http://localhost:8983/solr/books/select?q=*:*&rq={!rerank reRankQuery=$rqq reRankDocs=1200 reRankWeight=3}&rqq=(vegeta ssj)&*sort=score desc,downloads desc*&fl=id,title,score,downloads

Is this intended? It sounds counter-intuitive to me, and I wanted to check
before opening a Jira issue.
Tested on 8.1.1, but it should be in master as well.
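
For anyone who wants to reproduce the two requests, here is a minimal Python sketch that builds both URLs. The archived URLs appear to have lost a few parameter names, so the sketch assumes the standard Solr parameters: `rq` for the rerank query parser, `rqq` for the query it dereferences through `$rqq`, and `fl` for the field list. The helper names are made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Select endpoint taken from the message above.
SELECT_URL = "http://localhost:8983/solr/books/select"

# Shared by both requests: sort by score, then by the secondary field.
COMMON = {
    "sort": "score desc,downloads desc",
    "fl": "id,title,score,downloads",  # assumed parameter name
}


def plain_request() -> str:
    """Case 1: ordinary query, sorted by score and then by downloads."""
    params = {"q": "vegeta ssj", **COMMON}
    return SELECT_URL + "?" + urlencode(params)


def rerank_request() -> str:
    """Case 2: match everything, rerank the top 1200 docs, same sort."""
    params = {
        "q": "*:*",
        # assumed parameter names: rq carries the rerank parser, rqq the query
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=1200 reRankWeight=3}",
        "rqq": "(vegeta ssj)",
        **COMMON,
    }
    return SELECT_URL + "?" + urlencode(params)
```

Comparing the `downloads` order of documents with equal scores in the two responses should show whether the secondary sort clause is honoured after reranking.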

Regards
--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


ASCIIFoldingFilter question

2019-09-25 Thread Jarett Lear
Hope this is the right list to ask this; I'm not sure if this is a bug or if
I'm doing something wrong.

We're running some text with some emojis through this filter and, if I'm
reading the code right, when it finds a U+203C (:bangbang: | double
exclamation mark) it replaces that with the appropriate ASCII characters, !!.
But if it's a "fully qualified" emoji, the text also includes U+FE0F right
after it, which is the zero-width "VARIATION SELECTOR-16".

The issue we are running into is that the emoji is replaced with !! like it
should be, but directly after the ASCII !! this character is left hanging,
because it's not matched or changed into anything. This causes some weird
behavior down the line in other filters that try to strip off punctuation:
for some reason the result doesn't seem to be detected as punctuation
anymore. Ultimately we are trying to get down to an array of meaningful
tokens out of the content, but certain emojis make it all the way through
the filters, and we aren't sure why the ones that are ASCII folded make it
through while the ones that aren't are filtered out like normal.
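
To make the symptom concrete, here is a small Python sketch. It is not the Lucene code: `FOLD_TABLE` is a hypothetical one-entry stand-in for the filter's folding table, and both helper functions are invented for illustration. It shows the variation selector surviving the fold, that its Unicode category is a non-spacing mark rather than punctuation, and one possible workaround of stripping variation selectors first.

```python
import unicodedata

# Hypothetical one-entry stand-in for the folding table; the real filter
# covers far more characters.
FOLD_TABLE = {"\u203c": "!!"}


def fold_to_ascii(text: str) -> str:
    """Replace characters found in the table, pass everything else through."""
    return "".join(FOLD_TABLE.get(ch, ch) for ch in text)


def strip_variation_selectors(text: str) -> str:
    """Drop U+FE00..U+FE0F (VARIATION SELECTOR-1 through -16)."""
    return "".join(ch for ch in text if not 0xFE00 <= ord(ch) <= 0xFE0F)


# Fully qualified double exclamation mark emoji: U+203C followed by U+FE0F.
emoji = "\u203c\ufe0f"

# Folding alone leaves the selector attached to the ASCII replacement.
folded = fold_to_ascii(emoji)

# The leftover character is a non-spacing mark, not punctuation.
leftover_category = unicodedata.category(folded[-1])

# Removing the selector before folding yields plain ASCII.
cleaned = fold_to_ascii(strip_variation_selectors(emoji))
```

If the leftover selector is indeed what keeps the token from being recognised as punctuation, removing that range before the folding step may be enough; where in the analysis chain to do that is a separate design choice.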

Thanks,
Jarett


Re: Undefined field - solr 7.2.1 cloud

2019-09-25 Thread Antony A
Thanks Erick.

I have removed the managed-schema for now. This setup was running perfectly
for a couple of years. I implemented basic auth around the collection a year
back, but nothing really changed in my process to update the schema. Let me
see if removing managed-schema has any impact, and I will report back.
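
For reference, the update process discussed in this thread (upload the configset with "solr zk upconfig", then reload the collection through the Collections API instead of restarting cores) could be scripted roughly as below. This is only a sketch: the function names, the ZooKeeper address, the configset name and directory, and the collection name are placeholders, not values from this setup.

```python
from urllib.parse import urlencode


def upconfig_command(zk_host: str, config_name: str, config_dir: str) -> list:
    """Argument list for uploading a configset directory to ZooKeeper."""
    return ["bin/solr", "zk", "upconfig",
            "-z", zk_host, "-n", config_name, "-d", config_dir]


def reload_url(solr_base: str, collection: str) -> str:
    """Collections API URL that reloads every replica of a collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return solr_base.rstrip("/") + "/admin/collections?" + query


# Placeholder values for illustration only.
command = upconfig_command("zk1:2181", "my_config", "/path/to/conf")
url = reload_url("http://localhost:8983/solr", "my_collection")
```

The command list can be passed to `subprocess.run` and the URL requested with any HTTP client; with basic auth enabled, the request would also need credentials.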



On Wed, Sep 25, 2019 at 9:16 AM Erick Erickson wrote:

> Then something sounds wrong with your setup. The configs are stored in ZK,
> and read from ZooKeeper every time Solr starts. So how the replica “does
> not have the correct schema” is a complete mystery.
>
> You say you have ClassicIndexSchemaFactory set up. Take a look at your
> configs _through the Admin UI from the “collections” drop-down_ and verify.
> This reads the same thing in ZooKeeper. Sometimes I’ve thought I was set up
> one way and discovered later that I wasn’t.
>
> Next: Do you have “managed-schema” _and_ “schema.xml” in your configs? If
> you’re indeed using classic, you can remove managed-schema.
>
> All to make sure you’re operating as you think you are.
>
> Best,
> Erick
>
> > On Sep 24, 2019, at 3:58 PM, Antony A wrote:
> >
> > Hi,
> >
> > I also observed that whenever the JVM crashes, the replicas do not have
> > the correct schema. Has anyone seen similar behavior?
> >
> > Thanks,
> > AA
> >
> > On Wed, Sep 4, 2019 at 9:58 PM Antony A wrote:
> >
> >> Hi,
> >>
> >> I have confirmed that the ZK ensemble is external. Even though both
> >> managed-schema and schema.xml are on the admin UI, I see the below class
> >> defined in solrconfig.
> >>
> >>
> >> The workaround is still to run "solr zk upconfig" followed by restarting
> >> the cores of the collection. Anything else I should be looking into?
> >>
> >> Thanks
> >>
> >> On Wed, Sep 4, 2019 at 6:31 PM Erick Erickson wrote:
> >>
> >>> This almost always means that you really _didn’t_ update the schema and
> >>> reload the collection, you just thought you did ;).
> >>>
> >>> One common reason is to fire up Solr with an internal ZooKeeper but have
> >>> the rest of your collection be using an external ensemble.
> >>>
> >>> Another is to be modifying schema.xml when using managed-schema or
> >>> vice-versa.
> >>>
> >>> First thing I’d do is check the ZK ensemble: are any of the ports
> >>> referenced by the admin screen anywhere 9983? If so it’s internal.
> >>>
> >>> Second thing I’d do is, in the admin UI, select my collection from the
> >>> drop down list, then click files and open up the schema. Check that there
> >>> is only managed-schema or schema.xml. If both are present, check your
> >>> solrconfig to see which one you’re using. Then open the schema and check
> >>> that your field is there. BTW, the field will be explicitly stated in the
> >>> solr log.
> >>>
> >>> Third thing I’d do is open the admin
> >>> UI>>configsets>>the_configset_you’re_using and check which schema you’re
> >>> using and again if the field is in the schema.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>>> On Sep 4, 2019, at 3:27 PM, Antony A wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I ran the collection reload after a new "leader" core was selected for the
> >>>> collection due to heap failure on the previous core. But I still have stack
> >>>> trace with common.SolrException: undefined field.
> >>>>
> >>>> On Thu, Aug 29, 2019 at 1:36 PM Antony A wrote:
> >>>>
> >>>>> Yes. I do restart the cores on all the different servers. I will look at
> >>>>> implementing reloading the collection. Thank you for your suggestion.
> >>>>>
> >>>>> Cheers,
> >>>>> Antony
> >>>>>
> >>>>> On Thu, Aug 29, 2019 at 1:34 PM Shawn Heisey wrote:
> >>>>>
> >>>>>> On 8/29/2019 1:22 PM, Antony A wrote:
> >>>>>>> I do restart Solr after changing schema using "solr zk upconfig". I am yet
> >>>>>>> to confirm but I do have a daily cron that does "delta" import. Does that
> >>>>>>> process have any bearing on some cores losing the field?
> >>>>>>
> >>>>>> Did you restart all the Solr servers?  If the collection lives on
> >>>>>> multiple servers, restarting one of the servers is not going to affect
> >>>>>> replicas living on other servers.
> >>>>>>
> >>>>>> Reloading the collection with an HTTP request to the collections API is
> >>>>>> a better option than restarting Solr.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Shawn


Re: Trying to add model name to classify() output

2019-09-25 Thread Joel Bernstein
You can use the val function, which just returns the string.

val(CRIME) as expected
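
As a sketch of how the suggestion slots into the expression quoted below, the following Python snippet rebuilds the select() with val() in place of setValue() and forms a request URL for the /stream handler. The function names, the base URL, and the choice to target the news_categories collection are assumptions for illustration; the model id and label are the ones from the original question.

```python
from urllib.parse import urlencode


def classify_with_label(model_id: str, label: str) -> str:
    """Build the select() expression, adding the label as a constant field."""
    return (
        "select(\n"
        "  classify(\n"
        f'    model(models, id="{model_id}", cacheMillis=5000),\n'
        '    search(news_categories, sort="id asc", q="role:test",\n'
        '           qt="/export", fl="id,body", rows=5),\n'
        '    field="body"\n'
        "  ),\n"
        "  id,\n"
        "  score_d,\n"
        "  probability_d,\n"
        f"  val({label}) as expected\n"
        ")"
    )


def stream_url(solr_base: str, collection: str, expr: str) -> str:
    """URL that submits the expression to a collection's /stream handler."""
    return f"{solr_base.rstrip('/')}/{collection}/stream?" + urlencode({"expr": expr})


expr = classify_with_label("crime_model", "CRIME")
url = stream_url("http://localhost:8983/solr", "news_categories", expr)
```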

Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Sep 23, 2019 at 10:00 PM Peter Davie <
peter.da...@convergentsolutions.com.au> wrote:

> Hi,
>
> I have trained a number of logistic regression classification models
> (using train()) and I am now trying to evaluate these models.  I want to
> add the model name to the classify() (output) stream.  I am trying to
> use the following select() with setValue():
>
> select(
>   classify(
>     model(
>       models,
>       id="crime_model",
>       cacheMillis=5000
>     ),
>     search(
>       news_categories,
>       sort="id asc",
>       q="role:test",
>       qt="/export",
>       fl="id,body",
>       rows=5
>     ),
>     field="body"
>   ),
>   id,
>   score_d,
>   probability_d,
>   setValue("expected","CRIME") as expected
> )
>
> However, I am not seeing the "expected" field in the output stream:
>
> {
>   "result-set": {
>     "docs": [
>       {
>         "probability_d": 0.9807157418649378,
>         "score_d": 1.7570993028820825,
>         "id": "0001b92f-da6e-41a6-8518-a0d083c0f870"
>       },
>       {
>         "probability_d": 0.7310585786300049,
>         "score_d": 0.24253562092781067,
>         "id": "0003b45b-aab9-4635-8f93-903c6f492355"
>       },
>       {
>         "probability_d": 0.7310585786300049,
>         "score_d": 0.2773500978946686,
>         "id": "0008ecb1-3add-4ef5-85e1-736bf37a834b"
>       },
>       etc.
>     ]}
> }
>
> Can anyone point out what I am doing wrong?
>
> Peter
>
>
>


Re: Undefined field - solr 7.2.1 cloud

2019-09-25 Thread Erick Erickson
Then something sounds wrong with your setup. The configs are stored in ZK, and 
read from ZooKeeper every time Solr starts. So how the replica “does not have 
the correct schema” is a complete mystery.

You say you have ClassicIndexSchemaFactory set up. Take a look at your configs 
_through the Admin UI from the “collections” drop-down_ and verify. This reads 
the same thing in ZooKeeper. Sometimes I’ve thought I was set up one way and 
discovered later that I wasn’t.

Next: Do you have “managed-schema” _and_ “schema.xml” in your configs? If 
you’re indeed using classic, you can remove managed-schema.

All to make sure you’re operating as you think you are.
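
One way to double-check which schema file a configset will actually load is to look at the schemaFactory declaration in its solrconfig.xml. The Python sketch below does that on a config string. The function name is invented, and the fallback to "managed-schema" when no schemaFactory is declared reflects my understanding of the default (the managed factory with its default resource name), so treat it as an assumption to verify against your Solr version.

```python
import xml.etree.ElementTree as ET


def schema_source(solrconfig_xml: str) -> str:
    """Guess which schema file Solr reads, based on the schemaFactory class."""
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")  # direct child of <config>
    cls = factory.get("class", "") if factory is not None else ""
    if cls.endswith("ClassicIndexSchemaFactory"):
        return "schema.xml"
    # Assumption: anything else, including no declaration at all, means the
    # managed factory with its default resource name.
    return "managed-schema"


classic = '<config><schemaFactory class="ClassicIndexSchemaFactory"/></config>'
undeclared = "<config/>"
```

Running `schema_source` on the solrconfig.xml shown under the collection's files in the Admin UI gives a quick answer to "which one am I really using?" before removing either file.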

Best,
Erick

> On Sep 24, 2019, at 3:58 PM, Antony A wrote:
> 
> Hi,
> 
> I also observed that whenever the JVM crashes, the replicas do not have
> the correct schema. Has anyone seen similar behavior?
> 
> Thanks,
> AA
> 
> On Wed, Sep 4, 2019 at 9:58 PM Antony A wrote:
> 
>> Hi,
>> 
>> I have confirmed that the ZK ensemble is external. Even though both
>> managed-schema and schema.xml are on the admin UI, I see the below class
>> defined in solrconfig.
>> 
>> 
>> The workaround is still to run "solr zk upconfig" followed by restarting
>> the cores of the collection. Anything else I should be looking into?
>> 
>> Thanks
>> 
>> On Wed, Sep 4, 2019 at 6:31 PM Erick Erickson wrote:
>> 
>>> This almost always means that you really _didn’t_ update the schema and
>>> reload the collection, you just thought you did ;).
>>> 
>>> One common reason is to fire up Solr with an internal ZooKeeper but have
>>> the rest of your collection be using an external ensemble.
>>> 
>>> Another is to be modifying schema.xml when using managed-schema or
>>> vice-versa.
>>> 
>>> First thing I’d do is check the ZK ensemble: are any of the ports
>>> referenced by the admin screen anywhere 9983? If so it’s internal.
>>> 
>>> Second thing I’d do is, in the admin UI, select my collection from the
>>> drop down list, then click files and open up the schema. Check that there
>>> is only managed-schema or schema.xml. If both are present, check your
>>> solrconfig to see which one you’re using. Then open the schema and check
>>> that your field is there. BTW, the field will be explicitly stated in the
>>> solr log.
>>> 
>>> Third thing I’d do is open the admin
>>> UI>>configsets>>the_configset_you’re_using and check which schema you’re
>>> using and again if the field is in the schema.
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Sep 4, 2019, at 3:27 PM, Antony A wrote:
>>>>
>>>> Hi,
>>>>
>>>> I ran the collection reload after a new "leader" core was selected for the
>>>> collection due to heap failure on the previous core. But I still have stack
>>>> trace with common.SolrException: undefined field.
>>>>
>>>> On Thu, Aug 29, 2019 at 1:36 PM Antony A wrote:
>>>>
>>>>> Yes. I do restart the cores on all the different servers. I will look at
>>>>> implementing reloading the collection. Thank you for your suggestion.
>>>>>
>>>>> Cheers,
>>>>> Antony
>>>>>
>>>>> On Thu, Aug 29, 2019 at 1:34 PM Shawn Heisey wrote:
>>>>>
>>>>>> On 8/29/2019 1:22 PM, Antony A wrote:
>>>>>>> I do restart Solr after changing schema using "solr zk upconfig". I am yet
>>>>>>> to confirm but I do have a daily cron that does "delta" import. Does that
>>>>>>> process have any bearing on some cores losing the field?
>>>>>>
>>>>>> Did you restart all the Solr servers?  If the collection lives on
>>>>>> multiple servers, restarting one of the servers is not going to affect
>>>>>> replicas living on other servers.
>>>>>>
>>>>>> Reloading the collection with an HTTP request to the collections API is
>>>>>> a better option than restarting Solr.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn



Critical issue SOLR-13141

2019-09-25 Thread Arnold Bronley
Hi,
I am using Solr version 8.2.0 and I see that there is one critical JIRA
issue open (link below) for CDCR. The issue does not mention anything about
8.2.0, but it says that it is fixed in 8.3.0. Does this mean that CDCR is
not functional in Solr 8.2.0, and should I wait for 8.3.0 to be released?

https://issues.apache.org/jira/browse/SOLR-13141