Re:Possible bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-05 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Florin Babes,

Thanks for this detailed report! I agree that the 
ArrayIndexOutOfBoundsException you are experiencing during SolrFeature 
computation sounds like a bug; would you like to open a SOLR JIRA issue for it?

Here are some investigative ideas, in no particular order:

Reproducibility: if a failed query is run again, does it also fail the second 
time around (when some caches may be used)?

Data as a factor: is your setup single-sharded or multi-sharded? In a 
multi-sharded setup, if the same query fails on some shards but succeeds on 
others (and all shards have some documents that match the query), then this 
could support a theory that a certain combination of data and features leads to 
the exception.

Feature vs. Model: you mention use of a MultipleAdditiveTrees model; if the 
same features are used in a LinearModel instead, do the same errors happen? Or 
if no model is used but only feature extraction is done, does that give errors?

Identification of the troublesome feature(s): narrowing down to a single 
feature or a small combination of features could make it easier to figure out 
the problem. Assuming the existing logging doesn't identify the features, 
replacing the org.apache.solr.ltr.feature.SolrFeature with a 
com.mycompany.solr.ltr.feature.MySolrFeature containing instrumentation could 
provide insights, e.g. the existing code [2] logs feature names for 
UnsupportedOperationException, and if it also caught 
ArrayIndexOutOfBoundsException then it could log the feature name before 
rethrowing the exception.
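For illustration only (this is not the actual SolrFeature code, and the class and field names below are hypothetical), the instrumentation idea could be sketched as:

```java
// Hypothetical sketch: wrap the feature scoring step so that an
// ArrayIndexOutOfBoundsException is logged together with the feature
// name before being rethrown, mirroring how SolrFeature already logs
// feature names for UnsupportedOperationException [2].
public class FeatureInstrumentationSketch {

    static String lastLogged = null; // stand-in for a real logger call

    static float scoreFeature(String featureName, float[] partialScores, int idx) {
        try {
            return partialScores[idx];
        } catch (ArrayIndexOutOfBoundsException e) {
            lastLogged = "Exception computing feature " + featureName + ": " + e;
            throw e; // rethrow after logging, so behaviour is otherwise unchanged
        }
    }

    public static void main(String[] args) {
        try {
            // index 2 is out of bounds for length 2, as in the report below
            scoreFeature("similarity_query_field_2", new float[2], 2);
        } catch (ArrayIndexOutOfBoundsException expected) {
            // expected; the feature name has been recorded by now
        }
        System.out.println(lastLogged);
    }
}
```

With such a wrapper in place, the log would point directly at the troublesome feature instead of only surfacing the bare index error.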

Based on your details below and this [3] conditional in the code, probably at 
least two features will be necessary to hit the issue, but for investigative 
purposes two features could still potentially be simplified to effectively one 
feature, e.g. if one feature is a SolrFeature and the other is a ValueFeature, 
or if featureA and featureB are both SolrFeature features with _identical_ 
parameters but different names.
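To illustrate that last simplification, a minimal two-feature store might look like the following (a hedged sketch: the query syntax is reused from the report below, but the feature names are illustrative):

```json
[
  {
    "name": "featureA",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=query_field_1 mm=1}${term}" },
    "store": "feature_store"
  },
  {
    "name": "featureB",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!dismax qf=query_field_1 mm=1}${term}" },
    "store": "feature_store"
  }
]
```

If the exception still occurs with two identically-parameterised features, that would point away from the feature content and toward how multiple features are combined.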

Hope that helps.

Regards,

Christine

[1] 
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#extracting-features
[2] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L243
[3] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.3/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java#L520-L525

From: solr-user@lucene.apache.org At: 01/04/21 17:31:44To:  
solr-user@lucene.apache.org
Subject: Possible bug on LTR when using solr 8.6.3 - index out of bounds 
DisiPriorityQueue.add(DisiPriorityQueue.java:102)

Hello,
We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
we receive an error when we try to compute some SolrFeatures. We didn't
find any pattern in the queries that fail.
Example:
We have the following query raw parameters:
q=lg cx 4k oled 120 hz -> just one of many examples
term_dq=lg cx 4k oled 120 hz
rq={!ltr model=model reRankDocs=1000 store=feature_store
efi.term=${term_dq}}
defType=edismax,
mm=2<75%
The features are something like this:
{
  "name":"similarity_query_fileld_1",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
  "store":"feature_store"
},
{
  "name":"similarity_query_field_2",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
  "store":"feature_store"
}

We are testing ~6300 production queries and for about 1% of them we receive
the following error message:
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
"msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
for length 2",

The stacktrace is :
org.apache.solr.common.SolrException:
java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
at

Re: increasing number of threads for faceting in JSON format

2020-12-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello again Arturas.

I meant to reply before but somehow lost track of it ... The "Lifecycle of a 
Solr Search Request" slides [1] and/or talk [2] may be of interest to you.

Regards,
Christine

[1] https://home.apache.org/~hossman/rev2017/
[2] https://youtu.be/qItRilJLj5o

From: solr-user@lucene.apache.org At: 12/10/20 21:42:19To:  
solr-user@lucene.apache.org
Subject: Re: increasing number of threads for faceting in JSON format

Hi Christine, Munendra et al,

Wow, you dug into the code and checked whether threads are being spawned in
range and term queries! I wish one day to be able to do the same myself.

How does one get to the level where one can check the code oneself? Is there
a nice primer or crash course, a "Solr 101" so to say, a "things you did not
learn in school about Solr, but you wish you had learned" web page? Well,
I'll take this opportunity to scroll through the lines on GitHub. Your
answer is very helpful.

Cheers,
Arturas

On Thu, Dec 10, 2020 at 7:08 PM Munendra S N 
wrote:

> Thank you Christine.
> Yeah, JSON facet does not support specifying threads.
>
>
> On Thu, Dec 10, 2020, 11:15 PM Christine Poerschke (BLOOMBERG/ LONDON) <
> cpoersc...@bloomberg.net> wrote:
>
> > Hello Arturas and Munendra!
> >
> > In the "Currently, JSON facets have support for specifying the number of
> > threads." sentence, I wonder if perhaps a "does not" got inadvertently
> > omitted i.e. "Currently, JSON facets does not have support for specifying
> > the number of threads." was intended?
> >
> > Let me share what I learnt from digging into the code:
> >
> > * "facet.threads" is for field value faceting [1] [2] but you're
> > interested in (JSON) field range faceting as well as JSON field value
> > faceting.
> >
> > * The area of the code [3] that does the JSON field range faceting shows
> > no obvious threading or parallelisation.
> >
> > Hope that helps?
> >
> > Regards,
> >
> > Christine
> >
> > [1]
> >
> 
https://lucene.apache.org/solr/guide/8_7/faceting.html#field-value-faceting-para
meters
> > [2]
> >
> 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/
src/java/org/apache/solr/request/SimpleFacets.java
> > [3]
> >
> 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/
src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L112-L113
> >
> > From: solr-user@lucene.apache.org At: 12/03/20 22:47:35To:
> > solr-user@lucene.apache.org
> > Subject: Re: increasing number of threads for faceting in JSON format
> >
> > Hi Munendra,
> >
> > This is great that I can get things faster by reducing the gap and by
> > increasing the number of threads. How to reduce gaps I know: one can
> > replace   "gap":   "+1HOUR" with   "gap":   "+1MONTH" What should I
> change
> > in the text below to increase the number of threads from one to 20?
> >
> > Cheers,
> > Arturas
> >
> > On Thu, Dec 3, 2020 at 1:54 PM Munendra S N 
> > wrote:
> >
> > > Hi,
> > >
> > > Currently, JSON facets have support for specifying the number of
> threads.
> > > In the above request, the range facet is computed over 2 years with a
> gap
> > > of 1 hour. By reducing the number of buckets, computation should become
> > > much faster
> > >
> > > Regards,
> > > Munendra S N
> > >
> > >
> > >
> > > On Thu, Dec 3, 2020 at 1:52 PM Arturas Mazeika 
> > wrote:
> > >
> > > > Hi Solr-Users,
> > > >
> > > > I am trying to better understand the solr capabilities, how one can
> > > > formulate queries in JSON format as well as tweak parameters.
> > Currently I
> > > > have a logs collection (ca 6GB large) with a dozen of attributes
> > running
> > > in
> > > > single server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start
> -h
> > > > localhost -p  -m 4g)
> > > >
> > > > I am playing with faceting functionality in solr and query a couple
> of
> > > > attributes there. My typical query is:
> > > >
> > > > GET http://localhost:/solr/db/query HTTP/1.1
> > > > content-type: application/json
> > > >
> > > > {
> > > > "query"  : "*:*",
> > > > "limit"  : 0,
> > > > "facet": 

Re: increasing number of threads for faceting in JSON format

2020-12-10 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Arturas and Munendra!

In the "Currently, JSON facets have support for specifying the number of 
threads." sentence, I wonder if perhaps a "does not" got inadvertently omitted 
i.e. "Currently, JSON facets does not have support for specifying the number of 
threads." was intended?

Let me share what I learnt from digging into the code:

* "facet.threads" is for field value faceting [1] [2] but you're interested in 
(JSON) field range faceting as well as JSON field value faceting.

* The area of the code [3] that does the JSON field range faceting shows no 
obvious threading or parallelisation.
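For reference, "facet.threads" is a plain request parameter that accompanies the traditional field-value faceting parameters, e.g. (field name taken from the example below, values illustrative):

```
q=*:*&rows=0&facet=true&facet.field=fcomp&facet.threads=8
```

It has no effect on JSON range facets, for the reason above.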

Hope that helps?

Regards,

Christine

[1] 
https://lucene.apache.org/solr/guide/8_7/faceting.html#field-value-faceting-parameters
[2] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/request/SimpleFacets.java
[3] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L112-L113

From: solr-user@lucene.apache.org At: 12/03/20 22:47:35To:  
solr-user@lucene.apache.org
Subject: Re: increasing number of threads for faceting in JSON format

Hi Munendra,

This is great that I can get things faster by reducing the gap and by
increasing the number of threads. How to reduce gaps I know: one can
replace   "gap":   "+1HOUR" with   "gap":   "+1MONTH" What should I change
in the text below to increase the number of threads from one to 20?

Cheers,
Arturas

On Thu, Dec 3, 2020 at 1:54 PM Munendra S N  wrote:

> Hi,
>
> Currently, JSON facets have support for specifying the number of threads.
> In the above request, the range facet is computed over 2 years with a gap
> of 1 hour. By reducing the number of buckets, computation should become
> much faster
>
> Regards,
> Munendra S N
>
>
>
> On Thu, Dec 3, 2020 at 1:52 PM Arturas Mazeika  wrote:
>
> > Hi Solr-Users,
> >
> > I am trying to better understand the solr capabilities, how one can
> > formulate queries in JSON format as well as tweak parameters. Currently I
> > have a logs collection (ca 6GB large) with a dozen of attributes running
> in
> > single server mode (F:\solr_deployment\solr-8.7.0\bin\solr.cmd start -h
> > localhost -p  -m 4g)
> >
> > I am playing with faceting functionality in solr and query a couple of
> > attributes there. My typical query is:
> >
> > GET http://localhost:/solr/db/query HTTP/1.1
> > content-type: application/json
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "t" : {
> > "type":  "terms",
> > "field": "fcomp",
> > "sort":  "index",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > }
> > },
> > }
> > }
> >
> > not surprisingly, it takes a bit to compute the result, so I tried to
> > increase the number of threads. How do I do it in JSON format? I tried
> > adding
> >
> > {
> > "params": {
> > "facet.threads": 8
> > },
> > "query"  : "*:*",
> > ...
> > }
> >
> > and checked the jstack  of the solr java process, but I still see
> only
> > one thread working.  Can I configure params through the params section?
> >
> > I also tried
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "t" : {
> > "type":  "terms",
> > "field": "fcomp",
> > "sort":  "index",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > },
> > "threads":8
> > },
> > }
> > }
> >
> > but this ran in one thread as well. Can I influence the number of threads
> > in the "facet" section of JSON?
> >
> > Cheers,
> > Arturas
> >
>




Re:json.facet floods the filterCache

2020-10-22 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Damien,

You mention about JSON term facets, I haven't explored w.r.t. that but we have 
observed what you describe for JSON range facets and I've started 
https://issues.apache.org/jira/browse/SOLR-14939 about it.

Hope that helps.

Regards,
Christine

From: solr-user@lucene.apache.org At: 10/22/20 01:07:59To:  
solr-user@lucene.apache.org
Subject: json.facet floods the filterCache

Hi,

I'm using a json.facet query on nested facet terms and am seeing very high
filterCache usage. Is it possible to somehow control this? With an fq it's
possible to specify fq={!cache=false}... but I don't see a similar thing for
json.facet.

Kind regards,
Damien




Re: Replication of Solr Model and feature store

2020-07-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Krishan,

Could you share what version of Solr you are using?

And I wonder if the observed behaviour could be reproduced e.g. with the 
techproducts example, changes not applying after reload [1] sounds like a bug 
if so.

Hope that helps.

Regards,

Christine

[1] 
https://lucene.apache.org/solr/guide/8_6/learning-to-rank.html#applying-changes

From: solr-user@lucene.apache.org At: 07/22/20 14:00:59To:  
solr-user@lucene.apache.org
Subject: Re: Replication of Solr Model and feature store

Adding more details here

I need some help on how to enable the solr LTR model and features on all
nodes of a solr cluster.

I am unable to replicate the model and the feature store from any master to
its slaves with the replication API, and I am unable to find any
documentation for the same. Is replication possible?

Without replication, would I have to individually update all nodes of a
cluster? Or can the feature and model files be read as a resource (like
config or schema) so that I can replicate the file or add the file to my
deployments?


On Wed, Jul 22, 2020 at 5:53 PM krishan goyal  wrote:

> Bump. Any one has an idea how to proceed here ?
>
> On Wed, Jul 8, 2020 at 5:41 PM krishan goyal 
> wrote:
>
>> Hi,
>>
>> How do I enable replication of the model and feature store ?
>>
>> Thanks
>> Krishan
>>
>




Re:Unable to log into Jira

2019-10-15 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Richard,

Sorry to hear you're experiencing log-in difficulties. I've opened 
https://issues.apache.org/jira/browse/INFRA-19280 for this, hopefully it can be 
read without logging in.

Regards,

Christine

From: solr-user@lucene.apache.org At: 10/15/19 16:31:36To:  
solr-user@lucene.apache.org
Subject: Unable to log into Jira

Hey,

Sorry if this is the wrong group, I tried to email us...@infra.apache.org a
few weeks ago but haven't heard anything.

I am unable to log into my account, with it saying my password is
incorrect. But what is more odd is my name on the account has changed from
Richard Goodman to Alex Goodman.

I can send a forgot username which comes through to my registered email,
which is this one. However, if I do a forgot password, the email never
shows up 

Does anyone know which contact to use in order to help me sort this issue
out?

Thanks,

Richard Goodman




Re:Facing issue with MinMaxNormalizer

2019-06-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Kamal Kishore,

Thanks for including the Solr version alongside your question! What you 
describe sounds like the https://issues.apache.org/jira/browse/SOLR-11163 issue, 
which is fixed in the 7.0.0 release but not in 6.6.2. The fix is a simple 
two-line change to MinMaxNormalizer, and perhaps one workaround could be for you 
to build a custom MinMaxNormalizer locally for use with your 6.6.2 setup.
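As a hedged sketch of the idea (not the actual Solr patch; the class name and structure below are illustrative): a String-accepting setter lets reflectively-invoked setters parse the string values arriving from the model JSON, avoiding both failure modes quoted below.

```java
// Illustrative sketch of a locally-built normalizer with String setters,
// so that "min"/"max" values supplied as strings in the model JSON can
// be parsed instead of causing an argument type mismatch when invoked
// reflectively via SolrPluginUtils.invokeSetters.
public class MinMaxNormalizerSketch {
    private float min = Float.NEGATIVE_INFINITY;
    private float max = Float.POSITIVE_INFINITY;

    public void setMin(String min) { this.min = Float.parseFloat(min); }
    public void setMax(String max) { this.max = Float.parseFloat(max); }

    // standard min-max scaling of a feature value into [0, 1]
    public float normalize(float value) {
        return (value - min) / (max - min);
    }

    public static void main(String[] args) {
        MinMaxNormalizerSketch n = new MinMaxNormalizerSketch();
        n.setMin("1.0");
        n.setMax("5.0");
        System.out.println(n.normalize(3.0f)); // prints 0.5
    }
}
```

Note that the quoted values in the failing model below are double-quoted ("\"0.0\""), which Float.parseFloat rejects; plain string values such as "0.0" parse fine.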

Hope that helps.

Regards,

Christine

From: solr-user@lucene.apache.org At: 06/12/19 12:41:51To:  
solr-user@lucene.apache.org
Subject: Facing issue with MinMaxNormalizer

Hi All,

Appreciate if someone can help.
I am using LTR with MinMaxNormalizer in solr 6.6.2. 

Model.json

{
  "class": "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
  "name": "XGBOOST-BBB-LTR-Model",
  "store": "BBB-Feature-Model",
  "features": [
    {
      "name": "TFIDF",
      "norm": {
        "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params": { "min": "0.0", "max": "1.0" }
      }
    },
    {
      "name": "p_ratings_f",
      "norm": {
        "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params": { "min": "1.0", "max": "5.0" }
      }
    },
    {
      "name": "p_instore_trans_cnt_f",
      "norm": {
        "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params": { "min": "1.0", "max": "209561.0" }
      }
    },
    {
      "name": "p_reviews_f",
      "norm": {
        "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params": { "min": "0.0", "max": "58375.0" }
      }
    }
  ]
}

The model got uploaded successfully, but when I reloaded the collection, it 
failed and the below error was observed:

Caused by: java.lang.RuntimeException: Error invoking setter setMin on class : 
org.apache.solr.ltr.norm.MinMaxNormalizer
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1084)
at org.apache.solr.ltr.norm.Normalizer.getInstance(Normalizer.java:49)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.fromNormalizerMap(ManagedModelStore.java:293)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.createNormalizerFromFeatureMap(ManagedModelStore.java:276)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.fromLTRScoringModelMap(ManagedModelStore.java:230)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.addModelFromMap(ManagedModelStore.java:133)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.loadStoredModels(ManagedModelStore.java:126)
at 
org.apache.solr.ltr.search.LTRQParserPlugin.onManagedResourceInitialized(LTRQParserPlugin.java:133)
at 
org.apache.solr.rest.ManagedResource.notifyObserversDuringInit(ManagedResource.java:115)
at 
org.apache.solr.rest.ManagedResource.loadManagedDataAndNotify(ManagedResource.java:91)
at 
org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:694)
... 41 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1082)
... 51 more
Caused by: java.lang.NumberFormatException: For input string: ""0.0""
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
at java.lang.Float.parseFloat(Float.java:451)
at 
org.apache.solr.ltr.norm.MinMaxNormalizer.setMin(MinMaxNormalizer.java:58)

I tried uploading the model without double quotes in the param values for min 
and max; it also failed, with the below error.

"java.lang.IllegalArgumentException: argument type mismatch\n\tat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat
 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
 java.lang.reflect.Method.invoke(Method.java:498)\n\tat 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1082)\n\tat
 org.apache.solr.ltr.norm.Normalizer.getInstance(Normalizer.java:49)\n\tat 
org.apache.solr.ltr.store.rest.ManagedModelStore.fromNormalizerMap(ManagedModelStore.java:293)\n\tat
 
org.apache.solr.ltr.store.rest.ManagedModelStore.createNormalizerFromFeatureMap(ManagedModelStore.java:276)\n\tat
 
org.apache.solr.ltr.store.rest.ManagedModelStore.fromLTRScoringModelMap(ManagedModelStore.java:230)\n\tat
 

Re: NPE deleting expired docs (SOLR-13281)

2019-03-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Thank you for sharing that 7.6 has the same issue.

If anyone is interested in delving into the code to investigate further, I've 
added short steps on https://issues.apache.org/jira/browse/SOLR-13281 as to how 
one could potentially make a start on that.

From: solr-user@lucene.apache.org At: 03/13/19 08:45:12To:  
solr-user@lucene.apache.org
Subject: Re: NPE deleting expired docs (SOLR-13281)

We have the same issue on Solr 7.6.

On 12.03.2019 16:05, Gerald Bonfiglio wrote:
> Has anyone else observed NPEs attempting to have expired docs removed?  I'm 
seeing the following exceptions:
>
> 2019-02-28 04:06:34.849 ERROR (autoExpireDocs-30-thread-1) [ ] 
o.a.s.u.p.DocExpirationUpdateProcessorFactory Runtime error in periodic 
deletion of expired docs: null
> java.lang.NullPointerException: null
> at org.apache.solr.update.processor.DistributedUpdateProcessor.handleReplicationFactor(DistributedUpdateProcessor.java:992) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
> at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:960) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
>
> Seems all that's required to reproduce it is to include 
DocExpirationUpdateProcessorFactory in an updateRequestProcessorChain.
>
> More details can be found at: 
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-13281
>
>
>
>
>




Re: Navigating through Solr Source Code

2018-05-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello.

Emir mentioned about starting from the feature/concept. If you haven't come 
across it yet then the slides and/or recording of Hoss's "Lifecycle of a Solr 
Search Request" talk may be of interest - http://home.apache.org/~hossman/ has 
links.

Erick mentioned about getting a sense via unit tests. The 
HelloWorldSolrCloudTestCase could be a starting point for that - 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.1/solr/core/src/test/org/apache/solr/HelloWorldSolrCloudTestCase.java
 is a link to it.

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: apa...@elyograg.org, solr-user@lucene.apache.org
At: 05/21/18 17:08:10

Thanks for your responses.

Best Regards!


On 21 May 2018 at 16:40:10, Shawn Heisey (apa...@elyograg.org) wrote:

On 5/21/2018 4:35 AM, Greenhorn Techie wrote:
> As the documentation around Solr is limited, I am thinking to go through
> the source code and understand the various bits and pieces. However, I am a
> bit confused on where to start as my developing skills are a bit limited.
>
> Any thoughts on how best to start / where to start looking into Solr source
> code?

As Erick has said, the rabbit hole is very deep. I've been looking into
it for a few years now.  There are parts of it that are a complete mystery.

Depending on exactly what you're looking for, one approach is to examine
the SolrDispatchFilter class.  This is the entry point from the servlet
container for most HTTP requests, and a lot of Solr's startup
initialization is found there.

The solr/webapp/web/WEB-INF/web.xml file in the source code checkout is
what loads SolrDispatchFilter and a few other classes when Solr starts.

Thanks,
Shawn



Re: SOLR 7.2 and LTR

2017-12-28 Thread Christine Poerschke (BLOOMBERG/ LONDON)
From a (very) quick look it seems like the 
https://issues.apache.org/jira/browse/SOLR-11501 upgrade notes might be 
relevant, potentially.

From: solr-user@lucene.apache.org At: 12/28/17 15:18:22To:  
solr-user@lucene.apache.org
Subject: Re: SOLR 7.2 and LTR

Do you have the ltr qparser plugin registered into the solrconfig? 

Can you check what happens if, instead of ltr, you use the rerank query plugin? 
Does it work or do you get the same error?  
https://lucene.apache.org/solr/guide/6_6/query-re-ranking.html


From: solr-user@lucene.apache.org At: 12/28/17 13:58:26To:  
solr-user@lucene.apache.org
Subject: Re: SOLR 7.2 and LTR

Hello Diego,

solr.log always contains the same single stacktrace in SOLR 7.2.
I've been trying to pass rq via solrconfig.xml and via HTTP form.
The /searchIncidents handler contains edismax query.
Works if I completely disable rq. When I add the rq param, even something
like:
   {!ltr reRankDocs=25 model=incidentModel}
I get the exception.
The model is there, it's LinearModel model simplified to contain only
single feature 'originalScore', defined as in all available examples.
I just copy the same config directory under 'server\solr' to SOLR 7.0 and
it works.
I only skip the 'data' subfolder because of index differences, when copying.

2017-12-28 13:51:08.141 DEBUG (qtp205125520-18) [   x:entityindex]
o.a.s.c.S.Request [entityindex]  webapp=/solr path=/searchIncidents
params={personalId=1234567890=Test={!ltr+reRankDocs%3D25+model%3DincidentModel}}
2017-12-28 13:51:08.145 ERROR (qtp205125520-18) [   x:entityindex]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: rq
parameter must be a RankQuery
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:183)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:276)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Unknown Source)

Best regards,
Dariusz Wojtas


On Thu, Dec 28, 2017 at 1:03 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
dceccarel...@bloomberg.net> wrote:

> Hello Dariusz,
>
> Can you look into the solr logs for a stack trace or ERROR logs?
>
>
>
> From: solr-user@lucene.apache.org At: 12/27/17 19:01:29To:
> solr-user@lucene.apache.org
> Subject: SOLR 7.2 and LTR
>
> Hi,
>
> I am using SOLR 7.0 and use the ltr parser.
> The configuration I use works 

RE: LTR feature extraction performance issues

2017-10-31 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Brian,

I just tried to explore the scenario you describe with the techproducts example 
and am able to see what you see:

# step 1: start solr with techproducts example and ltr enabled
# step 2: upload one feature (originalScore) and one model using that feature
# step 3: examine cache stats via the Admin UI (all zero to start with)
# step 4: run a query which includes feature extraction e.g. [features] in fl
# step 5: examine cache stats to see lookups but no inserts
# step 6: run a query with feature extraction _and_ re-ranking using the model
# step 7: examine cache stats to see both lookups and inserts

Looking around the code the cache insert happens in FeatureLogger.java [1] 
which is called by the Rescorer [2] and this would allow the 'fl' feature 
logging to reuse the feature values calculated as part of the 'rq' re-ranking.

However, if there was no feature value in the cache (because no 'rq' re-ranking 
happened) then the feature value is calculated by 
LTRFeatureLoggerTransformerFactory.java [3] and based on code inspection the 
results of that calculation are not added to the cache.

It might be interesting to explore if/how that logic [3] could be changed.
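Conceptually, the change to explore would make the feature-logging path behave like a read-through cache: on a miss, compute the feature vector and insert it, instead of only inserting from the re-rank path. A generic sketch of that pattern (hypothetical names, not Solr's actual API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Generic read-through cache sketch: a lookup that misses triggers the
// computation, and the result is inserted so subsequent lookups hit.
public class ReadThroughCacheSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> compute;
    int inserts = 0; // exposed so the insert behaviour can be observed

    public ReadThroughCacheSketch(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // miss: compute and insert, which is the step the 'fl'-only
            // feature extraction path [3] currently skips
            value = compute.apply(key);
            cache.put(key, value);
            inserts++;
        }
        return value;
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String, Integer> c =
            new ReadThroughCacheSketch<>(String::length);
        c.get("featureVectorKey");
        c.get("featureVectorKey"); // second lookup hits the cache
        System.out.println(c.inserts); // prints 1
    }
}
```

With the existing code, the 'fl'-only path behaves like the lookup without the put, which matches the observed lookups-but-no-inserts cache stats.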

--Christine

[1] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/FeatureLogger.java#L51-L60
[2] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java#L185-L205
[3] 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java#L267-L280

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 10/30/17 16:55:14

I'm still having this issue. Does anyone have LTR feature extraction 
successfully running and have cache inserts/hits?

--Brian

-Original Message-
From: Brian Yee [mailto:b...@wayfair.com] 
Sent: Tuesday, October 24, 2017 12:14 PM
To: solr-user@lucene.apache.org
Subject: RE: LTR feature extraction performance issues

Hi Alessandro,

Unfortunately some of my most important features are query dependent. I think I 
found an issue though. I don't think my features are being inserted into the 
cache. Notice "cumulative_inserts:0". There are a lot of lookups, but since 
there appear to be no values in the cache, the hitratio is 0.

stats:
cumulative_evictions:0
cumulative_hitratio:0
cumulative_hits:0
cumulative_inserts:0
cumulative_lookups:215319
evictions:0
hitratio:0
hits:0
inserts:0
lookups:3303
size:0
warmupTime:0


My configs are as follows (the stock LTR example snippet):

  <cache name="QUERY_DOC_FV"
         class="solr.search.LRUCache"
         size="4096"
         initialSize="2048"
         autowarmCount="4096"
         regenerator="solr.search.NoOpRegenerator"/>

  <transformer name="features"
               class="org.apache.solr.ltr.response.transform.LTRFeatureLoggerTransformerFactory">
    <str name="fvCacheName">QUERY_DOC_FV</str>
    <str name="defaultFormat">sparse</str>
  </transformer>

Would anyone have any idea why my features are not being inserted into the 
cache? Is there an additional config setting I need?


--Brian

-Original Message-
From: alessandro.benedetti [mailto:a.benede...@sease.io] 
Sent: Monday, October 23, 2017 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: LTR feature extraction performance issues

It strictly depends on the kind of features you are using.
At the moment there is just one cache for all the features.
This means that even if you have 1 query-dependent feature and 100
document-dependent features, a different value for the query-dependent one
will invalidate the cache entry for the full vector [1].
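A simplified sketch of that invalidation effect (the real cache key
construction differs, but the principle is the same):

```python
# Simplified sketch: the feature-vector cache key includes the inputs of
# query-dependent features, so changing one efi value misses the whole entry.

cache = {}

def feature_vector_key(doc_id, efi):
    # all efi values participate in the key, even if only 1 of the
    # 101 features actually uses them
    return (doc_id, tuple(sorted(efi.items())))

def get_or_compute(doc_id, efi, compute):
    key = feature_vector_key(doc_id, efi)
    if key not in cache:
        # recomputes ALL features, including the 100 document-dependent ones
        cache[key] = compute()
    return cache[key]

get_or_compute("doc1", {"userQuery": "shoes"}, lambda: [0.1] * 101)
get_or_compute("doc1", {"userQuery": "socks"}, lambda: [0.2] * 101)
print(len(cache))  # 2 -- the second query could not reuse the first entry
```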

You may look to optimise your features ( where possible).

[1]  https://issues.apache.org/jira/browse/SOLR-10448



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re:Strange Behavior When Extracting Features

2017-09-21 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Michael,

Thanks for reporting this here and via SOLR-11386 ticket!

I've just added a note to the https://issues.apache.org/jira/browse/SOLR-11386 
ticket.

Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 09/20/17 20:07:24

Hi all,

I'm getting some extremely strange behavior when trying to extract features
for a learning to rank model. The following query incorrectly says all
features have zero values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added
couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=added couple of fiber channel
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

But this query, which simply moves the word "added" from the front of the
provided text to the back, properly fills in the feature values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple
of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1
efi.case_summary=the efi.case_description=couple of fiber channel added
efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

The explain output for the failing query can be found here:

https://gist.github.com/manisnesan/18a8f1804f29b1b62ebfae1211f38cc4

and the explain output for the properly functioning query can be found here:

https://gist.github.com/manisnesan/47685a561605e2229434b38aed11cc65

Have any of you run into this issue? Seems like it could be a bug.

Thanks,
Michael A. Alcorn



Learning-to-Rank with Bees: question answer follow-up

2017-09-18 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi everyone,

At my "Learning-to-Rank with Apache Solr and Bees" talk on Friday [1] there was 
one question that wasn't properly understood (by me) and so was not fully 
answered in the room, but in individual conversation afterwards the 
question/answer became clearer. So here I just wanted to follow up and share 
with everyone (using a fictional mini example).

Hope that helps.

Thanks,

Christine

---

Scenario:
* a schema with multiple text fields e.g. title, summary, details
* search queries consider the text fields

Intention:
* have features that capture how well the user query matches various text fields

Example queries and feature definitions:

* without LTR:
  select?q=developer
  select?q=chef

* with LTR:
  select?q=developer&rq={!ltr model=myDemoModel efi.userQuery=developer}
  select?q=chef&rq={!ltr model=myDemoModel efi.userQuery=chef}

Notice how in the above example the two users' queries pass different 
efi.userQuery values and how the feature definitions below include a 
${userQuery} placeholder.

myDemoFeatures.json

[
 {
  "name" : "userQueryTitle",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "title:${userQuery}" }
 },
 {
  "name" : "userQuerySummary",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "summary:${userQuery}" }
 },
 {
  "name" : "userQueryDetails",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "details:${userQuery}" }
 }
]
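For illustration, the ${userQuery} substitution that happens with the passed
efi.* values can be sketched as follows (a hypothetical helper, not Solr's
actual implementation):

```python
import re

def resolve_feature_query(template, efi):
    # replace each ${name} placeholder with the matching efi value
    return re.sub(r"\$\{(\w+)\}", lambda m: efi[m.group(1)], template)

features = {
    "userQueryTitle":   "title:${userQuery}",
    "userQuerySummary": "summary:${userQuery}",
    "userQueryDetails": "details:${userQuery}",
}

for name, q in features.items():
    # e.g. userQueryTitle -> title:developer
    print(name, "->", resolve_feature_query(q, {"userQuery": "developer"}))
```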

---

Links

[1] http://sched.co/BAwI
[2] http://lucene.apache.org/solr/guide/6_6/learning-to-rank.html
[3] https://github.com/cpoerschke/ltr-with-bees

Re:SOLR Learning to Rank Questions

2017-08-09 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Joao!

re: your first question, at present there is no direct way to use document hits 
in an external SQL database in a feature. Having said that, Solr has a 
so-called "ExternalFileField" type and using that in combination e.g. with the 
"FieldValueFeature" feature class should work, I think.

In essence you would periodically 'export' document hits from the database (and 
depending on your use case perhaps you might wish to consider only document 
hits that exceed a certain threshold or in other ways filter the raw hits data) 
into an external file for use by Solr.
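A minimal sketch of such a periodic export, assuming a hypothetical
"popularity" field and the key=value line format that ExternalFileField reads
from a file named external_<fieldname> in the index data directory:

```python
def export_hits(rows, threshold=0):
    # rows: iterable of (doc_id, hits) pairs, e.g. fetched from the SQL database.
    # Returns the lines for an ExternalFileField file, filtering low counts.
    return ["%s=%d" % (doc_id, hits) for doc_id, hits in rows if hits >= threshold]

rows = [("doc1", 42), ("doc2", 3), ("doc3", 0)]
lines = export_hits(rows, threshold=1)
# write '\n'.join(lines) to <dataDir>/external_popularity, then reload searchers
print(lines)  # ['doc1=42', 'doc2=3']
```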

The Apache Solr Reference Guide has more information at 
http://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
 for 6.6 version.

Hope that helps.

Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 08/03/17 11:15:10

Dear all,

First of all, I would like to thank you guys for the amazing job with SOLR.
In particular, I highly appreciate the learning-to-rank plugin. It is
fantastic work.

I have two questions for the LTR people and I hope this mailing list is the
right place for that.
place for that.

*1)​ ​This is a direct implementation doubt:*

Let's say that I have the popularity of my documents (document hits) in an
external SQL database instead of saving it in the index.

Can I use this information as a feature? How?


*2) This is slightly more philosophical than a practical question:*

Let's say I would like to normalize the score of my documents, for example,
with MinMaxNormalizer. If I correctly understood it, I would have to
calculate the min and the max values for the score seen in the training set
and upload these values in my model.
When using the model, MinMaxNormalizer will apply its normalization formula
for each value retrieved based on the max and the min set in the model.

Although this is a valid approach, I see it as a global approach, not a
local (per query) one.
Hope you understand what I am talking about here.

I was expecting to have a MinMaxNormalizer without a previously set min and max.
This would simply apply the min-max formula to all results of each query.
Thus, with this new approach, the first document would have score 1.0 and the
last document retrieved would have score 0.0.

Would it be better to normalize per query instead of a global normalization?
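For illustration, the per-query normalization being proposed could be sketched
like this (not an existing Solr normalizer):

```python
def minmax_per_query(scores):
    # normalize one query's result list so the top score maps to 1.0
    # and the bottom score maps to 0.0
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # degenerate case: all scores equal
    return [(s - lo) / (hi - lo) for s in scores]

print(minmax_per_query([49.6, 12.0, 3.5]))  # first -> 1.0, last -> 0.0
```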


Thanks a lot in advance.
Looking forward to hearing back from you soon.

Best,
--
João Palotti
Website: joaopalotti.com
Twitter: @joaopalotti 
Me at Google Scholar




Re:Opposite termfrequency / Solr LTR

2017-06-29 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Stefan,

Thanks for the question.

The existing FieldValueFeature class uses the field value and you would instead 
like to map the value to a number that will vary from query to query.

External feature information (efi) can help with the query-to-query variation 
but for the mapping we don't (as yet) have features to support that (I think).

Here's what the configuration and usage for a new FieldValueMapFeature class 
might look like:

# configuration
{
  "name":  "userCategoryHistory",
  "class": "org.apache.solr.ltr.feature.FieldValueMapFeature",
  "params": {
  "field" : "category",
  "defaultValue" : 0,
  "mapping" : "${userHistory}"
  }
}

# usage
...&rq={!ltr model="myModel" efi.userHistory="{ 'shoes' : 3, 'socks' : 1 }"} 
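For illustration only, the intended semantics of this hypothetical feature
class, sketched in Python (using valid JSON for the mapping):

```python
import json

def field_value_map_feature(doc, field, mapping, default_value=0):
    # look up the document's field value in the efi-supplied mapping;
    # fall back to the configured default when the value is absent
    return mapping.get(doc.get(field), default_value)

user_history = json.loads('{ "shoes": 3, "socks": 1 }')
print(field_value_map_feature({"category": "shoes"}, "category", user_history))  # 3
print(field_value_map_feature({"category": "hats"}, "category", user_history))   # 0
```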

The http://apache.markmail.org/thread/wegba65qxfkyhnge mailing thread is 
related, though slightly different, and may also be of interest.

Hope that helps.

Best wishes,

Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 06/29/17 11:13:59

Hello everybody,
I'm using Solr LTR and i want to calculte a Feature value using the
following way:

I have a String with all Categories that are in a users search-history:
e.g. "shoes,shoes,socks,shoes"
Now I'd like to count the occurrences of the value of the category field in
the String (for each document).

doc 1:{ category:shoes}
myString: "shoes,shoes,socks,shoes"

result = 3

I tried using termfrequency but it works only in the opposite direction.
Is there a way to achieve this though?
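For illustration, the per-document computation being asked for is:

```python
def history_count(doc_category, history):
    # count exact occurrences of the document's category value
    # among the comma-separated search-history entries
    return history.split(",").count(doc_category)

print(history_count("shoes", "shoes,shoes,socks,shoes"))  # 3
print(history_count("socks", "shoes,shoes,socks,shoes"))  # 1
```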

Best regards,
Stefan




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Opposite-termfrequency-Solr-LTR-tp4343394.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re:solr learning_to_rank (normalizer) unmatched argument type issue

2017-04-04 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Jianxiong,

Thanks for reporting this. I think this is a bug and have filed 
https://issues.apache.org/jira/browse/SOLR-10421 ticket for fixing it.

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/31/17 23:19:27

Hi,
I created a toy learning-to-rank model in solr in order to show the issues.

Feature.json
-
[
  {
"store" : "wikiFeatureStore",
"name" : "doc_len",
"class" : "org.apache.solr.ltr.feature.FieldLengthFeature",
"params" : {"field":"a_text"}
  },
  {
"store" : "wikiFeatureStore",
"name" : "rankScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
  }
]

model.json
---
{
  "store" : "wikiFeatureStore",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "wiki_qaModel",
  "features" : [
    { "name" : "doc_len",
      "norm" : {
        "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params" : {"min" : "1.0", "max" : "113.8"}
      }
    },
    { "name" : "rankScore",
      "norm" : {
        "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params" : {"min" : "0.0", "max" : "49.60385"}
      }
    }
  ],
  "params" : {
    "weights" : {
      "doc_len" : 0.322,
      "rankScore" : 0.98
    }
  }
}

I could upload both the features and the model and performed re-ranking based
on the above model. The issue was that when I stopped the solr server and
restarted it, I got the following error message when I ran the same query to
extract the features:
"Caused by: org.apache.solr.common.SolrException: Failed to create new
ManagedResource /schema/model-store of type
org.apache.solr.ltr.store.rest.ManagedModelStore due to:
java.lang.IllegalArgumentException: argument type mismatch
at 
org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:700)
at 
org.apache.solr.rest.RestManager.addRegisteredResource(RestManager.java:666)
at org.apache.solr.rest.RestManager.access$300(RestManager.java:59)
at 
org.apache.solr.rest.RestManager$Registry.registerManagedResource(RestManager.java:231)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.registerManagedModelStore(ManagedModelStore.java:51)
at 
org.apache.solr.ltr.search.LTRQParserPlugin.inform(LTRQParserPlugin.java:124)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:719)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:931)
... 9 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1077)
at org.apache.solr.ltr.norm.Normalizer.getInstance(Normalizer.java:49)
"

I found that the issue was related to
solr-6.4.2/server/solr/my_collection/conf/_schema_model-store.json
"
{
  "initArgs":{},
  "initializedOn":"2017-03-31T20:51:59.494Z",
  "updatedSinceInit":"2017-03-31T20:54:54.841Z",
  "managedList":[{
  "name":"wiki_qaModel",
  "class":"org.apache.solr.ltr.model.LinearModel",
  "store":"wikiFeatureStore",
  "features":[
{
  "name":"doc_len",
  "norm":{
"class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
"params":{
  "min":1.0,
  "max":113.7862548828}}},
...
"

Here the data types for "min" and "max" are double. When I manually
changed them to strings, everything worked as expected.

"
 "norm":{
"class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
"params":{
  "min": "1.0",
  "max": "113.7862548828"}}},
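Until a fix lands, that manual edit could be scripted; a sketch assuming the
managed-resource JSON layout shown above (hypothetical helper, use with care):

```python
import json

def stringify_norm_params(store):
    # walk the model store and coerce numeric normalizer params to strings,
    # which is the form the normalizer setters accept on core reload
    for model in store.get("managedList", []):
        for feature in model.get("features", []):
            params = feature.get("norm", {}).get("params", {})
            for k, v in params.items():
                if isinstance(v, (int, float)):
                    params[k] = repr(float(v))
    return store

store = {"managedList": [{"features": [
    {"norm": {"params": {"min": 1.0, "max": 113.7862548828}}}]}]}
print(json.dumps(stringify_norm_params(store)))
```

The output would be written back to _schema_model-store.json before restarting.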


Any insights into the above strange behavior?

Thanks

Jianxiong



Re: LTR on multiple shards

2017-03-13 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello Vincent and Michael,

Thank you for the question and answer here.

I have added an 'Applying changes' section to 
https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank and changed 
https://cwiki.apache.org/confluence/display/solr/Managed+Resources to 
cross-reference to the reload API pages.

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/08/17 15:23:20

Hey Vincent,

The feature store and model store are both Solr Managed Resources.  To
propagate managed resources in distributed mode, including managed
stopwords and synonyms, you have to issue a collection reload command.  The
Solr reference guide of Managed Resources has a bit more on it in the
Applying Changes section.

https://cwiki.apache.org/confluence/display/solr/Managed+Resources
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RELOAD:ReloadaCollection

The Managed Resource page and LTR page should be updated to be more
explicit about it.

Hope that helps,
Michael



On Wed, Mar 8, 2017 at 5:01 AM, Vincent  wrote:

> Hi all,
>
> It seems that the curl commands from the LTR wiki (
> https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) to
> post and/or delete features from and to the feature store only affect one
> shard instead of the entire collection. For example, when I run:
>
> curl -XDELETE 'http://localhost:8983/solr/[COLLECTION]/schema/feature-store/currentFeatureStore'
>
> the feature store still exists on one of my two shards. Same goes for the
> python HTTPConnection.request-function ("POST" and "DELETE").
>
> Is this a mistake on my end? I assume it's not supposed to work this way?
>
> Thanks a lot!
> Vincent
>



Re:Learning to rank - Bad Request

2017-03-06 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Vincent,

Would you be comfortable sharing (redacted) details of the exact upload command 
you used and (redacted) extracts of the features json file that gave the upload 
error?

Two things I have encountered commonly myself:
* uploading features to the model endpoint or model to the feature endpoint
* forgotten double-quotes around the numbers in MultipleAdditiveTreesModel json

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/06/17 13:22:40

Hi all,

I've been trying to get learning to rank working on our own search 
index. Following the LTR-readme 
(https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md)
 
I ran the example python script to train and upload the model, but I 
already get an error during the uploading of the features:

Bad Request (400) - Expected Map to create a new ManagedResource but 
received a java.util.ArrayList
 at 
org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523)
 at 
org.apache.solr.rest.ManagedResource.doPost(ManagedResource.java:355)
 at 
org.apache.solr.rest.RestManager$ManagedEndpoint.post(RestManager.java:351)
 at 
org.restlet.resource.ServerResource.doHandle(ServerResource.java:454)
 ...

This makes sense: the json feature file is an array, and the RestManager 
needs a Map in doPut.

Using the curl command from the cwiki 
(https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) 
yields the same error, but instead of it having "received a 
java.util.ArrayList" it "received a java.lang.String".

I wonder how this actually is supposed to work, and what's going wrong 
in this case. I have tried the LTR with the default techproducts 
example, and that worked just fine. Does anyone have an idea of what's 
going wrong here?

Thanks in advance!
Vincent



Re:NPE when executing clustering query search

2016-11-16 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Tim,

Thanks for reporting this. The (just created) 
https://issues.apache.org/jira/browse/SOLR-9775 issue and associated pull 
request sound related to this.

Regards,

Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/22/16 14:49:20

Hi everyone,

I am trying to execute a clustering query to my single-core master-slave
solr setup and it is returning a NullPointerException.  I checked the line
in the source code where it is being thrown, and it looks like the null
object is some sort of 'filt' object, which doesn't make sense.  Below is
the query, my schema, solrconfig, and the exception.  If anyone could
please help that would be great!

Thank you!

QUERY:

1510649 [qtp1855032000-20] INFO  org.apache.solr.core.SolrCore  -
[collection1] webapp=/solr
path=/clustering
params{
mlt.minwl=3&
mlt.boost=true&
mlt.fl=textpropertymlt&
sort=score+desc&
carrot.snippet=impnoteplain&
mlt.mintf=1&
qf=concept_name&
mlt.interestingTerms=details&
wt=javabin&
clustering.engine=lingo&
version=2&
rows=500&
mlt.mindf=2&
debugQuery=true&
fl=id,concept_name,impnoteplain&
start=0&
q=id:567065dc658089be9f5c2c0d5670653d658089be9f5c2ae2&
carrot.title=concept_name&
clustering.results=true&
qt=/clustering&
fq=storeid:5670653d658089be9f5c2ae2&
fq={!edismax+v%3D''+qf%3D'textpropertymlt'+mm%3D'2<40%25'}=id=true}
status=500 QTime=217

ERROR:

1510697 [qtp1855032000-20] ERROR org.apache.solr.servlet.SolrDispatchFilter
 - null:java.lang.NullPointerException
at
org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1416)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:586)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:511)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:235)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)


SCHEMA.XML:

[schema field and request handler definitions stripped by the mail archive]


Re:5.5.0 SOLR-8621 deprecation warnings without maxMergeDocs or mergeFactor

2016-02-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
https://issues.apache.org/jira/browse/SOLR-8734 created for follow-up.

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: Feb 24 2016 22:41:14

Hi Markus - thank you for the question.

Could you advise if/that the solrconfig.xml has a  element (for 
which deprecated warnings would appear separately) or that the solrconfig.xml 
has no  element?

If either is the case then yes based on the code (SolrIndexConfig.java#L153) 
the warnings would be expected-and-harmless though admittedly are confusing, 
and fixable.

Thanks,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: Feb 24 2016 17:24:45

Hi - i see lots of:

o.a.s.c.Config Beginning with Solr 5.5,  is deprecated, configure 
it on the relevant  instead.

On my development machine for all cores. None of the cores has either parameter 
configured. Is this expected?

Thanks,
Markus




Re:5.5.0 SOLR-8621 deprecation warnings without maxMergeDocs or mergeFactor

2016-02-24 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Markus - thank you for the question.

Could you advise if/that the solrconfig.xml has a  element (for 
which deprecated warnings would appear separately) or that the solrconfig.xml 
has no  element?

If either is the case then yes based on the code (SolrIndexConfig.java#L153) 
the warnings would be expected-and-harmless though admittedly are confusing, 
and fixable.

Thanks,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: Feb 24 2016 17:24:45

Hi - i see lots of:

o.a.s.c.Config Beginning with Solr 5.5,  is deprecated, configure 
it on the relevant  instead.

On my development machine for all cores. None of the cores has either parameter 
configured. Is this expected?

Thanks,
Markus



solr 4.8.1 to 4.10.4 upgrade / luceneMatchVersion

2015-06-25 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello.

We would like to upgrade from solr 4.8.1 to 4.10.4 and for existing collections 
(at least initially) continue to use the 4.8 lucene format rather than the 
latest 4.10 format.

Two main reasons for the preference:
(a) no need to worry about a not-yet-upgraded 4.8 replica recovering from an 
already-upgraded 4.10 replica.
(b) more options if it were to be necessary to downgrade from 4.10 back to 4.8 
in a hurry.

[As an aside, we have seen that 4.10.4 creates a 
/configs/configName/_rest_managed.json znode which 4.8 does not like, so any 
downgrading would need to consider that also, on dev just re-uploading the 
configName config to zookeeper appears to work.]

Questions:

Q1a: Is it possible (for a 4.8.1 to 4.10.4 upgrade) to have this stay with 
lucene-4.8 behaviour, out of the box or with a recommended patch?

Q1b: It appears making LiveIndexWriterConfig.matchVersion public and having 
DocumentsWriterPerThread use it instead of Version.LATEST could work. Has 
anyone used such a patch approach and/or can think of reasons why it would not 
work or not be a good idea?

Q2: Has anyone encountered something similar more generally i.e. with 
other/later versions of solr-lucene? (LUCENE-5871 removed the use of Version in 
IndexWriterConfig from version 5.0 onwards.)

Regards,

Christine

-

the potential patch referred to in Q1b above:

--- a/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
+++ b/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
@@ -178,7 +178,7 @@ class DocumentsWriterPerThread {
 pendingUpdates.clear();
 deleteSlice = deleteQueue.newSlice();

-segmentInfo = new SegmentInfo(directoryOrig, Version.LATEST, segmentName, 
-1, false, codec, null);
+segmentInfo = new SegmentInfo(directoryOrig, 
this.indexWriterConfig.matchVersion, segmentName, -1, false, codec, null);
 assert numDocsInRAM == 0;
 if (INFO_VERBOSE && infoStream.isEnabled("DWPT")) {
   infoStream.message("DWPT", Thread.currentThread().getName() + " init 
seg=" + segmentName + " delQueue=" + deleteQueue);  

--- a/lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java
+++ b/lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java
@@ -96,7 +96,7 @@ public class LiveIndexWriterConfig {
   protected volatile int perThreadHardLimitMB;
 
   /** {@link Version} that {@link IndexWriter} should emulate. */
-  protected final Version matchVersion;
+  public final Version matchVersion;
 
   /** True if segment flushes should use compound file format */
   protected volatile boolean useCompoundFile = 
IndexWriterConfig.DEFAULT_USE_COMPOUND_FILE_SYSTEM;

indexing delay due to zookeeper election

2013-12-23 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hello.

The behaviour we observed was that a zookeeper election took about 2s plus 1.5s 
for reading the zoo_data snapshot. During this time solr tried to establish 
connections to any zookeeper in the ensemble but only succeeded once a leader 
was elected *and* that leader was done reading the snapshot. Solr document 
updates were slowed down during this time window.

Is this expected to happen during and shortly after elections, that is 
zookeeper closing existing connections, rejecting new connections and thus 
stalling solr updates?

Other than avoiding zookeeper elections, are there ways of reducing their 
impact on solr?

Thanks,

Christine


zookeeper log extract

08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
08:18:56,918 [FileSnap.java:83] Reading snapshot ...
...
08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection from ...
08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session 0x0 
due to java.io.IOException: ZooKeeperServer not running
08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client ... 
(no session established for client)
...
08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...


solr log extract

08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from server 
sessionid ... likely server has closed socket, closing socket connection and 
attempting reconnect
08:18:55,068 [ConnectionManager.java:72] Watcher 
org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection 
Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
state:Disconnected type:None path:null path:null type:None
08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
...
08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ... 
08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from server 
sessionid ... likely server has closed socket, closing socket connection and 
attempting reconnect
...
08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:57,641 [ClientCnxn.java:849] Socket connection established to ... 
08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,353 [ClientCnxn.java:849] Socket connection established to ... 
08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
...
08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on server 
... sessionid = ..., negotiated timeout = ...
08:18:58,751 ... [ConnectionManager.java:72] Watcher
org.apache.solr.common.cloud.ConnectionManager@... name:ZooKeeperConnection
Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None



Re: indexing delay due to zookeeper election

2013-12-23 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Sure. https://issues.apache.org/jira/i#browse/SOLR-5577 filed. Thanks.

- Original Message -
From: solr-user@lucene.apache.org
To: Christine Poerschke (BLOOMBERG/ LONDON), solr-user@lucene.apache.org
At: Dec 23 2013 18:12:50

Interesting stuff! This is expected but not really something I have thought
about yet.

Can you file a JIRA issue? I think we want to try and tackle this with code.

We currently reject updates when we lose our connection to ZooKeeper. We
are pretty strict about this. I think you could reasonably be less strict
(eg not start rejecting updates for a few seconds).

- Mark


On Mon, Dec 23, 2013 at 12:49 PM, Christine Poerschke (BLOOMBERG/ LONDON) 
cpoersc...@bloomberg.net wrote:

 Hello.

 The behaviour we observed was that a zookeeper election took about 2s plus
 1.5s for reading the zoo_data snapshot. During this time solr tried to
 establish connections to any zookeeper in the ensemble but only succeeded
 once a leader was elected *and* that leader was done reading the snapshot.
 Solr document updates were slowed down during this time window.

 Is this expected to happen during and shortly after elections, that is
 zookeeper closing existing connections, rejecting new connections and thus
 stalling solr updates?

 Other than avoiding zookeeper elections, are there ways of reducing their
 impact on solr?

 Thanks,

 Christine


 zookeeper log extract

 08:18:54,968 [QuorumCnxManager.java:762] Connection broken for id ...
 08:18:56,916 [Leader.java:345] LEADING - LEADER ELECTION TOOK - 1941
 08:18:56,918 [FileSnap.java:83] Reading snapshot ...
 ...
 08:18:57,010 [NIOServerCnxnFactory.java:197] Accepted socket connection
 from ...
 08:18:57,010 [NIOServerCnxn.java:354] Exception causing close of session
 0x0 due to java.io.IOException: ZooKeeperServer not running
 08:18:57,010 [NIOServerCnxn.java:1001] Closed socket connection for client
 ... (no session established for client)
 ...
 08:18:58,496 [FileTxnSnapLog.java:240] Snapshotting: ... to ...


 solr log extract

 08:18:54,968 [ClientCnxn.java:1085] Unable to read additional data from
 server sessionid ... likely server has closed socket, closing socket
 connection and attempting reconnect
 08:18:55,068 [ConnectionManager.java:72] Watcher
 org.apache.solr.common.cloud.ConnectionManager@...
 name:ZooKeeperConnection Watcher:host1:port1,host2:port2,host3:port3,...
 got event WatchedEvent state:Disconnected type:None path:null path:null
 type:None
 08:18:55,068 [ConnectionManager.java:132] zkClient has disconnected
 ...
 08:18:55,961 [ClientCnxn.java:966] Opening socket connection to server ...
 08:18:55,961 [ClientCnxn.java:849] Socket connection established to ...
 08:18:55,962 [ClientCnxn.java:1085] Unable to read additional data from
 server sessionid ... likely server has closed socket, closing socket
 connection and attempting reconnect
 ...
 08:18:56,714 [ClientCnxn.java:966] Opening socket connection to server ...
 08:18:56,715 [ClientCnxn.java:849] Socket connection established to ...
 08:18:56,715 [ClientCnxn.java:1085] Unable to read additional data from ...
 ...
 08:18:57,640 [ClientCnxn.java:966] Opening socket connection to server ...
 08:18:57,641 [ClientCnxn.java:849] Socket connection established to ...
 08:18:57,641 [ClientCnxn.java:1085] Unable to read additional data from ...
 ...
 08:18:58,352 [ClientCnxn.java:966] Opening socket connection to server ...
 08:18:58,353 [ClientCnxn.java:849] Socket connection established to ...
 08:18:58,353 [ClientCnxn.java:1085] Unable to read additional data from ...
 ...
 08:18:58,749 [ClientCnxn.java:966] Opening socket connection to server ...
 08:18:58,749 [ClientCnxn.java:849] Socket connection established to ...
 08:18:58,751 [ClientCnxn.java:1207] Session establishment complete on
 server ... sessionid = ..., negotiated timeout = ...
 08:18:58,751 ... [ConnectionManager.java:72] Watcher
 org.apache.solr.common.cloud.ConnectionManager@...
 name:ZooKeeperConnection
 Watcher:host1:port1,host2:port2,host3:port3,... got event WatchedEvent
 state:SyncConnected type:None path:null path:null type:None
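
[Editor's note: the stall window can be read directly off the quoted
timestamps. The throwaway helper below (illustrative only, timestamps copied
from the log extracts above) diffs the "Connection broken" time against the
LEADING message and against the eventual "Session establishment complete" in
the solr log: roughly 1.9 s for the election (the log itself reports 1941 ms)
and roughly 3.8 s total before updates flowed normally again, consistent with
the "about 2s plus 1.5s" description.]

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Diff the log4j-style "HH:mm:ss,SSS" timestamps quoted above to
// measure the outage window. Comma is a literal in the pattern.
class ElectionWindow {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from, FMT),
                                LocalTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from the zookeeper and solr log extracts.
        String connectionBroken = "08:18:54,968"; // Connection broken for id ...
        String leaderElected    = "08:18:56,916"; // LEADING - LEADER ELECTION TOOK
        String sessionRestored  = "08:18:58,751"; // Session establishment complete

        System.out.println("election window: "
            + millisBetween(connectionBroken, leaderElected) + " ms");
        System.out.println("total stall: "
            + millisBetween(connectionBroken, sessionRestored) + " ms");
    }
}
```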




-- 
- Mark