Environment:
  Solr: 4.9.x, simple SolrCloud on Jetty
  JDK: 1.7
  Number of replicas: 4 (one replica for each shard)
  Number of shards: 1

Hi All,

I have been facing the issues below with the Solr suggester introduced in 4.7.x. Does anyone have a good working solution?

The documentation (https://cwiki.apache.org/confluence/display/solr/Suggester) advises against using buildOnCommit=true on an index with frequent soft commits. So we disabled it (buildOnCommit=false) and started using buildOnOptimize=true. That did not give us suggestions for the latest documents (with frequent soft commits), because there was hardly one optimize per day (we have the default optimize setting in solrconfig). So we disabled buildOnOptimize as well (buildOnOptimize=false).

As suggested in the documentation, for now we have set up cron jobs that build the suggester every hour. These jobs do their job, i.e. the latest suggestions are available every hour, but the implementation has the issues below.

*Issue #1*: The suggester build URL, i.e.

    http://$solrnode:8983/solr/collection1/suggest?suggest.build=true

if issued to one replica of the SolrCloud, does not build the suggesters on all of the replicas.

Resolution: We run a separate cron job on each Solr instance that makes the build call for that instance. Below is a rough picture of this implementation (it is not the best implementation and has many flaws):

    http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
        |
        |-- suggestcron.job.sh (on solr1.aws.instance)

    http://$solrnode:8983/solr/collection1/suggest?suggest.build=true
        |
        |-- suggestcron.job.sh (on solr2.aws.instance)

    .......... similar for the other Solr nodes

We will come up with a single script that does this for all collections later.

We were a bit happy that we had an up-to-date suggester on all of the instances, *which we do not!*

*Issue #2*: The suggesters built on the Solr nodes are not consistent, because the Solr core in each replica differs in max-docs and num-docs (which is quite normal with frequent soft commits, when updates mostly rewrite the same documents with different data, I guess; correct me if I'm wrong).

When we query

    curl -i "http://$solrnode:8983/solr/liveaodfuture/suggest?q=Nirvana&wt=json&indent=true"

one of the Solr nodes returns

    {
      "responseHeader":{
        "status":0,
        "QTime":0},
      "suggest":{
        "AnalyzingSuggester":{
          "Nirvana":{
            "numFound":1,
            "suggestions":[{
                "term":"nirvana",
                "weight":6,
                "payload":""}]}},
        "DictionarySuggester":{
          "Nirvana":{
            "numFound":0,
            "suggestions":[]}}}}

The /admin/luke/collection/ call status on that node:

    "index":{
      "numDocs":90564,
      "maxDoc":94583,
      "deletedDocs":4019,
      .......}

while the other 3 Solr nodes return

    {
      "responseHeader":{
        "status":0,
        "QTime":1},
      "suggest":{
        "AnalyzingSuggester":{
          "Nirvana":{
            "numFound":2,
            "suggestions":[{
                "term":"nirvana",
                "weight":163,
                "payload":""},
              {
                "term":"nirvana cover",
                "weight":11,
                "payload":""}]}},
        "DictionarySuggester":{
          "Nirvana":{
            "numFound":0,
            "suggestions":[]}}}}

The /admin/luke/collection/ call status on the other 3 Solr nodes shows a different maxDoc than the node above:

    "index":{
      "numDocs":90564,
      "maxDoc":156760,
      ........}

When I check the build time of the suggest directory of the collection, it is the same on each Solr node:

    ls -lah /mnt/solrdrive/solr/cores/*/data/suggest_analyzing/*
    -rw-r--r-- 1 root root 3.0M May 20 16:00 /mnt/solrdrive/solr/cores/collection1_shard1_replica3/data/suggest_analyzing/wfsta.bin

Questions:

Does the suggester build URL, i.e. http://$solrnode:8983/solr/collection1/suggest?suggest.build=true, also consider max-docs or deleted docs?

Is the suggester built via solr/collection1/suggest?suggest.build=true different from the one built with the buildOnCommit=true property?

Does anyone have a better solution for keeping the suggester current with the contents of the index under frequent soft commits?

Does Solr have any scheduler component, like a cron scheduler, to schedule the suggester build and a daily optimize?

Thanks,
Rajesh.
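
P.S. For reference, here is a minimal sketch of the single script mentioned under Issue #1, which would replace the per-node cron jobs by calling the build URL on every node from one place. It is only an illustration under assumptions: the node list simply reuses the solr1/solr2 host names from the diagram, the collection list and the function names build_suggest_urls and trigger_builds are made up for this example, and the timeout value is arbitrary.

```python
"""Sketch: trigger suggest.build on every Solr node from one script."""
from urllib.request import urlopen

# Assumption: host names follow the solr1/solr2 pattern from the diagram;
# replace with the real node and collection names.
SOLR_NODES = ["solr1.aws.instance", "solr2.aws.instance"]
COLLECTIONS = ["collection1"]
PORT = 8983


def build_suggest_urls(nodes, collections, port=PORT):
    """Return one suggest.build URL per (node, collection) pair."""
    return [
        "http://%s:%d/solr/%s/suggest?suggest.build=true" % (node, port, collection)
        for node in nodes
        for collection in collections
    ]


def trigger_builds(urls, timeout=300):
    """Call each build URL in turn; return {url: HTTP status or error text}."""
    results = {}
    for url in urls:
        try:
            response = urlopen(url, timeout=timeout)
            try:
                results[url] = response.getcode()
            finally:
                response.close()
        except Exception as exc:
            # Record the failure and keep going so one bad node
            # does not block the builds on the others.
            results[url] = "failed: %s" % exc
    return results


if __name__ == "__main__":
    outcome = trigger_builds(build_suggest_urls(SOLR_NODES, COLLECTIONS))
    for url in sorted(outcome):
        print(outcome[url], url)
```

Run hourly from a single cron entry, this would issue the same build call that each suggestcron.job.sh issues today. It only consolidates the scheduling; it does not by itself explain or fix the differing suggestions described in Issue #2.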