Using Autoscaling Simulation Framework to simulate a lost node in a cluster

2020-09-21 Thread Howard Gonzalez
Hello folks, has anyone tried to use the autoscaling simulation framework to 
simulate a lost node in a solr cluster? I was trying to do the following:

1.- Take a snapshot of the current production cluster state using bin/solr 
autoscaling -save
2.- Modify the clusterstate and livenodes json files in the generated folder to 
delete one of the nodes and its related replicas.
3.- Modify the clusterstate and livenodes json files in the generated folder to 
create a new empty node in the cluster.
4.- Run simulations playing with different policies and trigger waitFor values 
when adding a new node in the cluster replacing the lost one to see if the 
rules and triggers behave as expected.

But I was wondering, is this something supported by the framework? Or is 
there a better approach to simulating this?

Thanks in advance for any guidance/tips.
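
A minimal sketch of steps 2 and 3 above, operating on already-loaded JSON data. It assumes the snapshot follows the classic clusterstate layout (collection -> shards -> replicas -> node_name) and that live nodes are a plain list of node names; the actual files written by bin/solr autoscaling -save may be organised differently, and drop_node is a hypothetical helper, not part of Solr.

```python
import copy

def drop_node(cluster_state, live_nodes, lost_node, new_node=None):
    """Return copies of the state with lost_node and its replicas removed.

    Assumed layout (may differ from the real snapshot files):
    {collection: {"shards": {shard: {"replicas": {name: {"node_name": ...}}}}}}
    """
    state = copy.deepcopy(cluster_state)
    for collection in state.values():
        for shard in collection.get("shards", {}).values():
            replicas = shard.get("replicas", {})
            lost = [name for name, r in replicas.items()
                    if r.get("node_name") == lost_node]
            for name in lost:
                del replicas[name]
    # Remove the lost node and optionally register a new, empty node.
    nodes = [n for n in live_nodes if n != lost_node]
    if new_node is not None and new_node not in nodes:
        nodes.append(new_node)
    return state, nodes

# Hypothetical example data, not taken from a real cluster.
cluster_state = {
    "collection1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"node_name": "host1:8983_solr"},
        "core_node2": {"node_name": "host2:8983_solr"},
    }}}}
}
live_nodes = ["host1:8983_solr", "host2:8983_solr"]

new_state, new_live = drop_node(
    cluster_state, live_nodes, "host2:8983_solr", new_node="host3:8983_solr")
print(new_live)
```

Working on copies keeps the original snapshot intact, so several variants (different lost nodes, with or without a replacement) can be generated from one saved state.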


How to use query function inside a function query in Solr LTR

2020-09-21 Thread krishan goyal
Hi,

I have use cases for features which require a query function and some more
math on top of the result of the query function.

Example of a feature: number of extra terms in the document from the input text

I am trying various ways of representing this feature but always get an
exception:
java.lang.RuntimeException: Exception from createWeight for SolrFeature
. Failed to parse feature query.

 Feature representations
"name" : "no_of_extra_terms",
"class" : "org.apache.solr.ltr.feature.SolrFeature",
"params": {
"q": "{!func}sub(num_tokens_int,query({!dismax qf=field_name}${text}))"
},

where num_tokens_int is a stored field which contains the number of tokens in
the document


Also, feature representation with just a query parser like

"q": "{!dismax df=field_name}${text}"

works, but I can't really get my desired feature representation without
using it in a function query, where I want to operate on the result of this
query to derive my actual feature.
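
One variant that may be worth trying, sketched below: Solr's function-query syntax documents passing the nested query text through the v local parameter, as in query({!dismax qf=... v='...'}). Whether this form parses inside SolrFeature, and whether ${text} is substituted inside the quoted v value, are assumptions that have not been tested here; extra_terms_feature is a hypothetical helper that only builds the JSON.

```python
import json

def extra_terms_feature(field_name="field_name", count_field="num_tokens_int"):
    """Build the feature definition with the sub-query text passed via v='...'."""
    # Assumption: ${text} is still expanded by LTR inside the quoted v value.
    sub_query = "{!dismax qf=%s v='${text}'}" % field_name
    return {
        "name": "no_of_extra_terms",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {"q": "{!func}sub(%s,query(%s))" % (count_field, sub_query)},
    }

feature = extra_terms_feature()
print(json.dumps([feature], indent=2))
```

Generating the JSON from code also avoids the line-wrapping problems that creep in when the q value is pasted by hand.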


Re: Autoscaling Rule for replica distribution across zones

2020-09-21 Thread Dominique Bejean
Hi,

I also tried these 2 rules and I still have all replicas of all shards of
the collection created in a single zone.

curl 'http://localhost:8983/api/cluster/autoscaling' \
  -H 'Content-type:application/json' \
  -d '{ "set-policy": { "policyzone": [
    {"replica": "#EQUAL", "shard": "#EACH", "nodeset": [{"sysprop.zone": "dc1"}, {"sysprop.zone": "dc2"}]}
  ] } }'

curl 'http://localhost:8983/api/cluster/autoscaling' \
  -H 'Content-type:application/json' \
  -d '{ "set-policy": { "policyzone": [
    {"replica": "50%", "shard": "#EACH", "nodeset": {"sysprop.zone": "dc1"}},
    {"replica": "50%", "shard": "#EACH", "nodeset": {"sysprop.zone": "dc2"}}
  ] } }'
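
To make the goal checkable (never 2 replicas of a shard in the same zone), here is a small sketch that reports violations for a given placement. The node-to-zone mapping is the one from the quoted message below; the exact shard-to-node assignment in observed is an assumption, since only the nodes holding the cores are known. zone_violations is a hypothetical helper and does not talk to Solr.

```python
from collections import Counter, defaultdict

def zone_violations(replicas, node_zone):
    """replicas: iterable of (shard, node) pairs.

    Returns {shard: {zone: count}} for every zone holding more than one
    replica of the same shard; an empty dict means the placement is fine.
    """
    per_shard = defaultdict(Counter)
    for shard, node in replicas:
        per_shard[shard][node_zone[node]] += 1
    return {
        shard: {zone: n for zone, n in zones.items() if n > 1}
        for shard, zones in per_shard.items()
        if any(n > 1 for n in zones.values())
    }

node_zone = {"solr1": "dc1", "solr2": "dc2", "solr3": "dc1", "solr4": "dc2"}

# Assumed layout matching "all 4 cores on solr2 and solr4".
observed = [("shard1", "solr2"), ("shard1", "solr4"),
            ("shard2", "solr2"), ("shard2", "solr4")]
# One layout that would satisfy the intended rule.
desired = [("shard1", "solr1"), ("shard1", "solr2"),
           ("shard2", "solr3"), ("shard2", "solr4")]

print(zone_violations(observed, node_zone))
print(zone_violations(desired, node_zone))
```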

Dominique


On Fri, Sep 18, 2020 at 12:13, Dominique Bejean wrote:

> Hi,
>
> I have a 4-node SolrCloud cluster. 2 nodes (solr1 and solr3) are started
> with the parameter -Dzone=dc1 and the 2 other nodes (solr2 and solr4)
> are started with the parameter -Dzone=dc2.
>
> I want to create an autoscaling placement rule in order to equally distribute
> replicas of a shard over zones (never 2 replicas of a shard in the same
> zone). According to the documentation, I created this rule:
>
> { "set-policy": { "policyzone": [ {"replica": "#EQUAL", "shard": "#EACH",
> "sysprop.zone": ["dc1", "dc2"]} ] } }
>
> I created a collection with 2 shards and 2 replicas, and the 4 cores were
> created on the solr2 and solr4 nodes, so only in zone=dc2.
>
> What is wrong with my rule?
>
> Regards.
>
> Dominique Béjean


Re: Many small instances, or few large instances?

2020-09-21 Thread Erick Erickson
In a word, yes. G1GC still has spikes, and the larger the heap the more likely 
you’ll be to encounter them. So having multiple JVMs rather than one large JVM 
with a ginormous heap is still recommended.

I’ve seen some cases that used the Zing zero-pause product with very large 
heaps, but they were forced into that by the project requirements.

That said, when Java has a ZGC option, I think we’re in uncharted territory. I 
frankly don’t know what using very large heaps without having to worry about GC 
pauses will mean for Solr. I suspect we’ll have to do something to take 
advantage of that. For instance, could we support a topology where all shards 
had at least one replica in the same JVM that didn’t make any HTTP requests? 
Would that topology be common enough to support? Maybe extend “rack aware” to 
be “JVM aware”? Etc.

One thing that does worry me is that it’ll be easier and easier to “just throw 
more memory at it” rather than examine whether you’re choosing options that 
minimize heap requirements. And Lucene has done a lot to move memory to the OS 
rather than heap (e.g. docValues, MMapDirectory etc.).

Anyway, carry on as before for the nonce.

Best,
Erick

> On Sep 21, 2020, at 6:06 AM, Bram Van Dam  wrote:
> 
> Hey folks,
> 
> I've always heard that it's preferred to have a SolrCloud setup with
> many smaller instances under the CompressedOops limit in terms of
> memory, instead of having larger instances with, say, 256GB worth of
> heap space.
> 
> Does this recommendation still hold true with newer garbage collectors?
> G1 is pretty fast on large heaps. ZGC and Shenandoah promise even more
> improvements.
> 
> Thx,
> 
> - Bram



Solr LTR Performance Issues

2020-09-21 Thread krishan goyal
I was observing a large degradation in performance when adding more features
to my Solr LTR model, even when the model complexity (number of trees, depth
of trees) remains the same. I am using the MultipleAdditiveTreesModel.

Moreover, if model complexity increases while keeping the number of features
constant, performance degrades only slightly.

This seemed odd, as model complexity should have been much more performance
heavy than just looking up features, so I looked at the LTR code to understand
the cause. These are my findings in Solr 7.7.

Use case:

   - The features to my model are very dynamic and request dependent.
   - The features are mainly scoring features rather than filter/boolean
   features


Findings

   - The assumption was that features are computed only for the top N docs
   which need to be reranked by LTR.
   - The problem starts in LTRRescorer.scoreFeatures().
      - This ends up calling SolrIndexSearcher.getProcessedFilter() for
      each top doc to be reranked and for each feature required.
      - Each feature is an individual query
      to SolrIndexSearcher.getProcessedFilter(), and each query is looked up /
      inserted into the filter cache in getPositiveDocSet().
      - The bulk of the cost (>90%) of LTRRescorer.scoreFeatures() is in the
      DefaultBulkScorer.scoreAll() method, which actually creates the doc set
      for these queries.
      - This ends up collecting all docs for a few features which are scoring
      features rather than filtering features.
      - Because features are dynamic, there is actually very little reuse
      of the filter cache except within the ongoing request, so the doc bit
      set collection happens on almost every request.
   - We probably need to change SolrFeature.scorer() to
      - only operate on the docs required to be scored
      - utilise a cache where applicable for features which can be reused
      across requests
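
To illustrate the cost difference described above, here is a toy model in plain Python. It is not Solr code: CountingFeature and the two scoring functions are hypothetical stand-ins that only count how often a feature is evaluated when a full doc set is built first versus when only the docs to be reranked are scored.

```python
class CountingFeature:
    """Stand-in for a scoring feature; counts how often it is evaluated."""

    def __init__(self, name):
        self.name = name
        self.evaluations = 0

    def value(self, doc_id):
        self.evaluations += 1
        return (doc_id * 31 + len(self.name)) % 7  # arbitrary fake score

def score_via_full_doc_set(feature, all_docs, top_docs):
    # Mirrors the reported behaviour: materialise values for every matching
    # doc, then read off the few docs that are actually reranked.
    doc_set = {d: feature.value(d) for d in all_docs}
    return [doc_set[d] for d in top_docs]

def score_top_docs_only(feature, top_docs):
    # Proposed behaviour: evaluate the feature only for the docs to rerank.
    return [feature.value(d) for d in top_docs]

all_docs = range(1000)
top_docs = [5, 42, 99]

eager = CountingFeature("f")
lazy = CountingFeature("f")
eager_scores = score_via_full_doc_set(eager, all_docs, top_docs)
lazy_scores = score_top_docs_only(lazy, top_docs)

print(eager.evaluations, lazy.evaluations)
```

Both paths produce the same scores for the top docs; only the amount of work differs, and it grows with index size in the first case but with N in the second.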

Please let me know if this seems appropriate and valid, and I will file a JIRA
request.


Re: Issues deploying LTR into SolrCloud

2020-09-21 Thread krishan goyal
Not sure how SolrCloud works, but if you're still facing issues, you can try this:

1. Deploy the features and models as _schema_feature-store.json
and _schema_model-store.json files in the right config set.
2. Either deploy to all nodes (works for me) or add these files
to confFiles in the /replication request handler.
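
For reference, the four-step process quoted below (feature store, reload, model, reload) maps onto the request sequence sketched here. The sketch only builds the list of requests and sends nothing; it uses the LTR schema/feature-store and schema/model-store endpoints and the Collections API RELOAD action. The base URL and collection names are placeholders, and deployment_plan is a hypothetical helper.

```python
from urllib.parse import urlencode

def deployment_plan(base_url, collections):
    """Return the ordered (method, url) pairs for deploying LTR resources."""
    def reload_all():
        return [("GET", f"{base_url}/admin/collections?"
                        + urlencode({"action": "RELOAD", "name": c}))
                for c in collections]

    plan = [("PUT", f"{base_url}/{c}/schema/feature-store") for c in collections]
    plan += reload_all()
    plan += [("PUT", f"{base_url}/{c}/schema/model-store") for c in collections]
    plan += reload_all()
    return plan

# Placeholder host and collection names.
plan = deployment_plan("http://localhost:8983/solr", ["c1", "c2"])
for method, url in plan:
    print(method, url)
```

Each PUT would carry the corresponding JSON file as its body; keeping the plan as data makes it easy to log which collection each step was applied to.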


On Wed, Aug 26, 2020 at 1:00 PM Dmitry Kan  wrote:

> Hello,
>
> Just noticed my numbering is off, should be:
>
> 1. Deploy a feature store from a JSON file to each collection.
> 2. Reload all collections as advised in the documentation:
>
> https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> 3. Deploy the related model from a JSON file.
> 4. Reload all collections again.
>
>
> An update: applying this process twice I was able to fix the issue.
> However, it required "patching" individual collections, while reloading was
> done for all collections at once. I'm not sure this is very transparent to
> the user: maybe show the model deployment status per collection in the
> admin UI?
>
> Thanks,
>
> Dmitry
>
> On Tue, Aug 25, 2020 at 6:20 PM Dmitry Kan  wrote:
>
> > Hi,
> >
> > There is a recent thread "Replication of Solr Model and feature store" on
> > deploying LTR feature store and model into a master/slave Solr topology.
> >
> > I'm facing an issue of deploying into SolrCloud (solr 7.5.0), where
> > collections have shards with replicas. This is the process I've been
> > following:
> >
> > 1. Deploy a feature store from a JSON file to each collection.
> > 2. Reload all collections as advised in the documentation:
> >
> > https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> > 3. Deploy the related model from a JSON file.
> > 3. Reload all collections again.
> >
> >
> > The problem is that even after reloading the collections, shard replicas
> > continue to not have the model:
> >
> > Error from server at
> > http://server1:8983/solr/collection1_shard1_replica_n1: cannot find model
> > 'model_name'
> >
> > What is the proper way to address this issue and can it be potentially a
> > bug in SolrCloud?
> >
> > Is there any workaround I can try, like saving the feature store and model
> > JSON files into the collection config path and creating the SolrCloud from
> > there?
> >
> > Thanks,
> >
> > Dmitry
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com and https://medium.com/@dmitry.kan
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: https://semanticanalyzer.info
> >
> >
>


Many small instances, or few large instances?

2020-09-21 Thread Bram Van Dam
Hey folks,

I've always heard that it's preferred to have a SolrCloud setup with
many smaller instances under the CompressedOops limit in terms of
memory, instead of having larger instances with, say, 256GB worth of
heap space.

Does this recommendation still hold true with newer garbage collectors?
G1 is pretty fast on large heaps. ZGC and Shenandoah promise even more
improvements.

Thx,

 - Bram


Re: Solr training

2020-09-21 Thread Charlie Hull

Hi Matthew & all,

Why not? Try the code 'evenearlier' for a further discount! (Oh and we 
extended the earlybird period for another week).


Cheers

Charlie

On 17/09/2020 21:00, matthew sporleder wrote:

Is there a friends-on-the-mailing list discount?  I had a bit of sticker shock!

On Wed, Sep 16, 2020 at 9:38 AM Charlie Hull  wrote:

I do of course mean 'Group Discounts': you don't get a discount for
being in a 'froup' sadly (I wasn't even aware that was a thing!)

Charlie





On 16/09/2020 13:26, Charlie Hull wrote:

Hi all,

We're running our Solr Think Like a Relevance Engineer training 6-9 Oct
- you can find out more & book tickets at
https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/

The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm
CET and is led by Eric Pugh who co-wrote the first book on Solr and is
a Solr Committer. It's suitable for all members of the search team -
search engineers, data scientists, even product owners who want to
know how Solr search can be measured & tuned. Delivered by working
relevance engineers the course features practical exercises and will
give you a great foundation in how to use Solr to build great search.

The early bird discount expires at the end of this week so do book soon if
you're interested! Froup discounts also available. We're also running
a more advanced course on Learning to Rank a couple of weeks later -
you can find all our training courses and dates at
https://opensourceconnections.com/training/

Cheers

Charlie

--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com

