Using Autoscaling Simulation Framework to simulate a lost node in a cluster
Hello folks, has anyone tried to use the autoscaling simulation framework to simulate a lost node in a Solr cluster? I was trying to do the following:

1. Take a snapshot of the current production cluster state using bin/solr autoscaling -save
2. Modify the clusterstate and livenodes JSON files in the generated folder to delete one of the nodes and its related replicas (see the sketch after this post).
3. Modify the clusterstate and livenodes JSON files in the generated folder to create a new empty node in the cluster.
4. Run simulations playing with different policies and trigger waitFor values when adding a new node to the cluster to replace the lost one, to see if the rules and triggers behave as expected.

But I was wondering: is this something supported by the framework? Or is there a better approach to simulating this? Thanks in advance for any guidance/tips.
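PS: for step 2, this is roughly how I'd prune the node from the live-nodes file with jq. Just a sketch: the file name livenodes.json and the node name solr3:8983_solr are placeholders for whatever bin/solr autoscaling -save actually produced, and I'm assuming the file holds a plain JSON array of node names. The same node's replicas would still need to be removed from the cluster state file by hand.

    # Drop the "lost" node from the live-nodes array (names are placeholders)
    jq 'map(select(. != "solr3:8983_solr"))' livenodes.json > livenodes.edited.json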
How to use query function inside a function query in Solr LTR
Hi, I have use cases for features which require a query function and some more math on top of the result of the query function. An example of such a feature: the number of extra terms in the document compared to the input text.

I am trying various ways of representing this feature but always get an exception: java.lang.RuntimeException: Exception from createWeight for SolrFeature ... Failed to parse feature query.

Feature representation:

{
  "name": "no_of_extra_terms",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!func}sub(num_tokens_int,query({!dismax qf=field_name}${text}))"
  }
}

where num_tokens_int is a stored field which contains the number of tokens in the document.

A feature representation with just a query parser, like "q": "{!dismax df=field_name}${text}", works, but I can't really get my desired feature representation without using it inside a function query, where I want to operate on the result of this query to derive my actual feature.
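PS: one representation I'm going to try next, since the query() function normally takes its nested query via parameter dereferencing rather than inline local-params syntax. This is only a sketch: I haven't verified that SolrFeature passes an extra parameter like qq through for dereferencing.

{
  "name": "no_of_extra_terms",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!func}sub(num_tokens_int,query($qq))",
    "qq": "{!dismax qf=field_name}${text}"
  }
}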
Re: Autoscaling Rule for replica distribution across zones
Hi,

I also tried these two rules, and I still get all replicas of all shards of the collection created in one single zone:

curl 'http://localhost:8983/api/cluster/autoscaling' -H 'Content-type:application/json' -d '{
  "set-policy": {
    "policyzone": [
      {"replica": "#EQUAL", "shard": "#EACH", "nodeset": [{"sysprop.zone": "dc1"}, {"sysprop.zone": "dc2"}]}
    ]
  }
}'

curl 'http://localhost:8983/api/cluster/autoscaling' -H 'Content-type:application/json' -d '{
  "set-policy": {
    "policyzone": [
      {"replica": "50%", "shard": "#EACH", "nodeset": {"sysprop.zone": "dc1"}},
      {"replica": "50%", "shard": "#EACH", "nodeset": {"sysprop.zone": "dc2"}}
    ]
  }
}'

Dominique

On Fri, Sep 18, 2020 at 12:13, Dominique Bejean wrote:

> Hi,
>
> I have a 4-node SolrCloud cluster. Two nodes (solr1 and solr3) are started
> with the parameter -Dzone=dc1 and the two other nodes (solr2 and solr4)
> are started with the parameter -Dzone=dc2.
>
> I want to create an autoscaling placement rule in order to equally distribute
> replicas of a shard over the zones (never 2 replicas of a shard in the same
> zone). According to the documentation, I created this rule:
>
> { "set-policy": { "policyzone": [ {"replica": "#EQUAL", "shard": "#EACH",
> "sysprop.zone": ["dc1", "dc2"]} ] } }
>
> I created a collection with 2 shards and 2 replicas, and the 4 cores were
> created on the solr2 and solr4 nodes, so only in zone=dc2.
>
> What is wrong with my rule?
>
> Regards.
>
> Dominique Béjean
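PS: one more pattern I plan to try, based on the comparison operators in the policy docs (a sketch, I haven't verified it fixes the placement): cap each zone at one replica per shard with a "<2" rule instead of using #EQUAL:

curl 'http://localhost:8983/api/cluster/autoscaling' -H 'Content-type:application/json' -d '{
  "set-policy": {
    "policyzone": [
      {"replica": "<2", "shard": "#EACH", "sysprop.zone": "dc1"},
      {"replica": "<2", "shard": "#EACH", "sysprop.zone": "dc2"}
    ]
  }
}'

Also, as far as I understand, a named policy like policyzone only applies to collections created with policy=policyzone; that's my assumption for why the rule might be getting ignored, not something I've verified.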
Re: Many small instances, or few large instances?
In a word, yes. G1GC still has spikes, and the larger the heap the more likely you'll be to encounter them. So having multiple JVMs rather than one large JVM with a ginormous heap is still recommended. I've seen some cases that used the Zing zero-pause product with very large heaps, but they were forced into that by the project requirements.

That said, when Java has a ZGC option, I think we're in uncharted territory. I frankly don't know what using very large heaps without having to worry about GC pauses will mean for Solr. I suspect we'll have to do something to take advantage of that. For instance, could we support a topology where all shards had at least one replica in the same JVM that didn't make any HTTP requests? Would that topology be common enough to support? Maybe extend "rack aware" to be "JVM aware"? Etc.

One thing that does worry me is that it'll be easier and easier to "just throw more memory at it" rather than examine whether you're choosing options that minimize heap requirements. And Lucene has done a lot to move memory to the OS rather than the heap (e.g. docValues, MMapDirectory, etc.).

Anyway, carry on as before for the nonce.

Best,
Erick

> On Sep 21, 2020, at 6:06 AM, Bram Van Dam wrote:
>
> Hey folks,
>
> I've always heard that it's preferred to have a SolrCloud setup with
> many smaller instances under the CompressedOops limit in terms of
> memory, instead of having larger instances with, say, 256GB worth of
> heap space.
>
> Does this recommendation still hold true with newer garbage collectors?
> G1 is pretty fast on large heaps. ZGC and Shenandoah promise even more
> improvements.
>
> Thx,
>
> - Bram
Solr LTR Performance Issues
I was observing a high degradation in performance when adding more features to my Solr LTR model, even if the model complexity (number of trees, depth of trees) remains the same. I am using the MultipleAdditiveTreesModel model. Moreover, if model complexity increases while keeping the number of features constant, performance degrades only slightly. This seemed odd, as model complexity should have been much more performance-heavy than just looking up features, so I looked at the LTR code to understand the cause. These are my findings in Solr 7.7.

Use case:
- The features to my model are very dynamic and request dependent.
- The features are mainly scoring features rather than filter/boolean features.

Findings:
- The assumption was that features are computed only for the top N docs which need to be reranked by LTR.
- The problem starts in LTRRescorer.scoreFeatures.
- This ends up calling SolrIndexSearcher.getProcessedFilter() for each top doc to be reranked and for each feature required.
- Each feature is an individual query to SolrIndexSearcher.getProcessedFilter(), and each query is looked up / inserted into the filter cache in getPositiveDocSet().
- The bulk of the cost (>90%) of LTRRescorer.scoreFeatures() is in the DefaultBulkScorer.scoreAll() method, which actually creates the doc set for these queries.
- This ends up collecting all docs for the few features which are scoring features rather than filtering features.
- Because the features are dynamic, there is actually very little reuse of the filter cache except within the ongoing request, so the doc bit set collection happens on almost every request.

We probably need to change SolrFeature.scorer() to:
- only operate on the docs required to be scored (see the sketch after this message), and
- utilise a cache where applicable for features which can be reused across requests.

Please let me know if this seems appropriate and valid and I will file a JIRA request.
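Here's the sketch of the first change, using plain Lucene APIs rather than the actual Solr internals (class and variable names like SparseFeatureScoring, docIds and defaultValue are placeholders, not a real patch): advance the feature scorer's iterator to each rerank candidate instead of building a full doc set for the feature query.

import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

class SparseFeatureScoring {
  // Score one feature only for the rerank candidates within one segment.
  // Assumes docIds are sorted ascending and are segment-local ids.
  static float[] featureValues(Weight featureWeight, LeafReaderContext context,
                               int[] docIds, float defaultValue) throws IOException {
    float[] values = new float[docIds.length];
    Scorer scorer = featureWeight.scorer(context);   // null if nothing matches in this segment
    DocIdSetIterator it = (scorer == null) ? null : scorer.iterator();
    for (int i = 0; i < docIds.length; i++) {
      int target = docIds[i];
      int current = (it == null) ? DocIdSetIterator.NO_MORE_DOCS : it.docID();
      if (current < target) {
        current = it.advance(target);                // skip straight to the candidate doc
      }
      values[i] = (current == target) ? scorer.score() : defaultValue;
    }
    return values;
  }
}

This only touches the docs being reranked, so the cost no longer depends on how many docs the feature query matches overall.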
Re: Issues deploying LTR into SolrCloud
Not sure how SolrCloud works, but if you're still facing issues, you can try this:

1. Deploy the features and models as a _schema_feature-store.json and a _schema_model-store.json file in the right config set.
2. Either deploy to all nodes (works for me) or add these files to confFiles in the /replication request handler (see the sketch after this message).

On Wed, Aug 26, 2020 at 1:00 PM Dmitry Kan wrote:

> Hello,
>
> Just noticed my numbering is off, should be:
>
> 1. Deploy a feature store from a JSON file to each collection.
> 2. Reload all collections as advised in the documentation:
> https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> 3. Deploy the related model from a JSON file.
> 4. Reload all collections again.
>
> An update: applying this process twice I was able to fix the issue.
> However, it required "patching" individual collections, while reloading was
> done for all collections at once. I'm not sure this is very transparent to
> the user: maybe show the model deployment status per collection in the
> admin UI?
>
> Thanks,
>
> Dmitry
>
> On Tue, Aug 25, 2020 at 6:20 PM Dmitry Kan wrote:
>
> > Hi,
> >
> > There is a recent thread "Replication of Solr Model and feature store" on
> > deploying LTR feature store and model into a master/slave Solr topology.
> >
> > I'm facing an issue of deploying into SolrCloud (solr 7.5.0), where
> > collections have shards with replicas. This is the process I've been
> > following:
> >
> > 1. Deploy a feature store from a JSON file to each collection.
> > 2. Reload all collections as advised in the documentation:
> > https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> > 3. Deploy the related model from a JSON file.
> > 3. Reload all collections again.
> >
> > The problem is that even after reloading the collections, shard replicas
> > continue to not have the model:
> >
> > Error from server at
> > http://server1:8983/solr/collection1_shard1_replica_n1: cannot find model
> > 'model_name'
> >
> > What is the proper way to address this issue and can it be potentially a
> > bug in SolrCloud?
> >
> > Is there any workaround I can try, like saving the feature store and model
> > JSON files into the collection config path and creating the SolrCloud from
> > there?
> >
> > Thanks,
> >
> > Dmitry
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com and https://medium.com/@dmitry.kan
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: https://semanticanalyzer.info
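For option 2, a sketch of the solrconfig.xml fragment (the file names follow option 1 and are assumptions; note this is classic master/slave replication config, so I'm not sure how cleanly it applies to SolrCloud collections):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">_schema_feature-store.json,_schema_model-store.json</str>
  </lst>
</requestHandler>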
Many small instances, or few large instances?
Hey folks, I've always heard that it's preferred to have a SolrCloud setup with many smaller instances under the CompressedOops limit in terms of memory, instead of having larger instances with, say, 256GB worth of heap space. Does this recommendation still hold true with newer garbage collectors? G1 is pretty fast on large heaps. ZGC and Shenandoah promise even more improvements. Thx, - Bram
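PS: a quick way to check whether a given heap size still gets compressed oops (just a sketch; the cutoff is a bit under 32GB and varies by JVM and platform):

java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops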
Re: Solr training
Hi Matthew & all,

Why not? Try the code 'evenearlier' for a further discount! (Oh and we extended the earlybird period for another week.)

Cheers

Charlie

On 17/09/2020 21:00, matthew sporleder wrote:
> Is there a friends-on-the-mailing-list discount? I had a bit of sticker shock!
>
> On Wed, Sep 16, 2020 at 9:38 AM Charlie Hull wrote:
>> I do of course mean 'Group Discounts': you don't get a discount for being in a 'froup' sadly (I wasn't even aware that was a thing!)
>>
>> Charlie
>>
>> On 16/09/2020 13:26, Charlie Hull wrote:
>>> Hi all,
>>>
>>> We're running our Solr 'Think Like a Relevance Engineer' training 6-9 Oct - you can find out more & book tickets at https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/
>>>
>>> The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm CET and is led by Eric Pugh, who co-wrote the first book on Solr and is a Solr Committer. It's suitable for all members of the search team - search engineers, data scientists, even product owners who want to know how Solr search can be measured & tuned. Delivered by working relevance engineers, the course features practical exercises and will give you a great foundation in how to use Solr to build great search.
>>>
>>> The early bird discount expires at the end of this week, so do book soon if you're interested! Froup discounts also available.
>>>
>>> We're also running a more advanced course on Learning to Rank a couple of weeks later - you can find all our training courses and dates at https://opensourceconnections.com/training/
>>>
>>> Cheers
>>>
>>> Charlie

--
Charlie Hull
OpenSource Connections, previously Flax
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.o19s.com