Re: Replica is going into recovery in Solr 6.1.0

2020-02-13 Thread vishal patel
The server has 256 GB of total memory, and the following applications run on it: Application1 50 GB, Application2 30 GB, Application3 8 GB, Application4 2 GB, Solr shard1 64 GB, Solr shard2 replica 64 GB. Note: Solr shard2 and shard1
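A quick tally of the figures in that message makes the memory budget concrete. This is a sketch under the assumption that each number is the memory reserved by that process (for the two Solr instances, the JVM heap); it does not account for the OS page cache that MMapDirectory relies on, which has to come out of whatever is left.

```python
# Memory figures as listed in the message (GB). Assumption: each value is
# the memory reserved by that process; for Solr, the JVM heap size.
TOTAL_RAM_GB = 256
allocations_gb = {
    "Application1": 50,
    "Application2": 30,
    "Application3": 8,
    "Application4": 2,
    "Solr shard1": 64,
    "Solr shard2 replica": 64,
}


def remaining_for_os(total_gb: int, allocations: dict) -> int:
    """Return the RAM left for the OS (including its page cache) after
    every listed process has received its allocation."""
    return total_gb - sum(allocations.values())


if __name__ == "__main__":
    used = sum(allocations_gb.values())
    left = remaining_for_os(TOTAL_RAM_GB, allocations_gb)
    print(f"Allocated: {used} GB, left for OS: {left} GB")
```

Under that assumption 218 GB is spoken for, leaving about 38 GB for the OS and for caching the index files of both Solr instances.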

Re: offline Solr index creation

2020-02-13 Thread Erick Erickson
Indexing rates scale pretty linearly with the number of shards, so one way to increase throughput is to simply create a collection with more shards. For the initial bulk-indexing operations, you can go with a 1-replica-per-shard scenario then ADDREPLICA if you need to build things out. However…

offline Solr index creation

2020-02-13 Thread vivek chaurasiya
Hi there, We are using AWS EMR as our big data processing cluster. We have about 3 TB of text files where each line is a JSON record that I want to index into Solr. I have tried batching them and pushing them to the Solr index using the SolrJ client, but that feels really slow. My doubt is
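The batching approach described in the question can be sketched as follows. This is a minimal, hypothetical illustration: `batched_docs` and `to_update_body` are made-up helper names, the batch size is arbitrary, and the actual HTTP POST (and any parallelism across several indexing clients) is deliberately left out. It only shows turning newline-delimited JSON into batches shaped as the JSON array of documents that Solr's `/update` handler accepts.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batched_docs(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Parse newline-delimited JSON and yield lists of at most batch_size
    documents. Blank lines are skipped."""
    docs = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return
        yield batch


def to_update_body(batch: List[dict]) -> str:
    """Serialize one batch as a JSON array of documents, the body shape
    used when posting JSON documents to Solr's /update handler."""
    return json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical sample input standing in for one of the large files.
    sample = io.StringIO('{"id": "1"}\n{"id": "2"}\n\n{"id": "3"}\n')
    for batch in batched_docs(sample, batch_size=2):
        print(to_update_body(batch))
```

Because `batched_docs` is lazy, a multi-gigabyte file is never held in memory at once; each batch could be handed to a separate worker if indexing throughput, rather than parsing, turns out to be the bottleneck.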

Re: Replica is going into recovery in Solr 6.1.0

2020-02-13 Thread Erick Erickson
What Walter said. Also, you _must_ leave quite a bit of free RAM for the OS due to Lucene using MMapDirectory space, see: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Basically until you can get your GC pauses under control, you’ll have an unstable collection. How

Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-13 Thread Erick Erickson
Robert: My concern with fixing by adding memory is that it may just be kicking the can down the road. Assuming there really is some leak, the leaked objects will eventually accumulate and you’ll hit another OOM. If that were the case, I’d expect a cursory look at your memory usage to just keep increasing over

Re: Replica is going into recovery in Solr 6.1.0

2020-02-13 Thread Walter Underwood
You have a 64GB heap. That is extremely unusual. You can only do that if the instance has 80 GB or more of RAM. If you don’t have enough RAM, the JVM will start using swap space and cause extremely long GC pauses. How much RAM do you have? How did you choose these GC settings? We have been

Re: Adding replica to a shard with only down replicas

2020-02-13 Thread Erick Erickson
Adding a new replica won’t do you much good. Since there’s no leader, it won’t (well, shouldn’t) sync the index. Did you try the collections API FORCELEADER? It was put in as a last resort for this kind of situation. Best, Erick > On Feb 13, 2020, at 3:22 PM, tedsolr wrote: > > Solr 5.5.4. I
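For reference, a FORCELEADER call is an ordinary Collections API request that takes `collection` and `shard` parameters. The helper below is a hypothetical sketch that only builds the request URL; the host, collection, and shard names are placeholders, and actually sending the request is left to whatever HTTP client you use.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Collections API URL such as
    <base>/admin/collections?action=FORCELEADER&collection=...&shard=...
    (hypothetical helper; base_url is the Solr root, e.g. .../solr)."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    # Placeholder names; substitute your own collection and shard.
    print(collections_api_url(
        "http://localhost:8983/solr", "FORCELEADER",
        collection="mycollection", shard="shard1"))
```

The same helper can build an ADDREPLICA request (again with `collection` and `shard`), which is more likely to be useful once a leader exists again.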

Re: Zookeeper upgrade required with Solr upgrade?

2020-02-13 Thread Erick Erickson
Yeah, 3.4.x upgrades were pretty straightforward. The 3.5.5 upgrade was trickier, but the problems were in the admin UI. The admin UI uses several “4 letter words” to do its ZooKeeper reporting, and that required explicit permissions, but IIRC that all only affected the admin UI reporting about

Re: Zookeeper upgrade required with Solr upgrade?

2020-02-13 Thread Rahul Goswami
Thanks Erick. Also, thanks for that little pointer about the JIRA. Just to be sure, I also checked the upgrade JIRAs for the other two intermediate ZooKeeper versions, 3.4.11 and 3.4.13, between Solr 7.2.1 and Solr 7.7.2, and they didn't seem to contain any Solr code changes either. On Thu, Feb 13,

Adding replica to a shard with only down replicas

2020-02-13 Thread tedsolr
Solr 5.5.4. I have a collection with a single shard and two replicas. Both are reporting down. No shard leader exists. Each replica is on a different node. Should it be safe to attempt an ADDREPLICA command? Since there's no leader I don't know if that will work. This is the cluster state for the

Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-13 Thread Jörn Franke
I also had issues with this factory when creating atomic updates inside it. They worked, but searchers were never closed, and new ones were opened and stayed open, with all the issues that entails. Maybe someone needs to look into that in more detail. However, it is a script in the end, so

Re: StatelessScriptUpdateProcessorFactory causing OOM errors?

2020-02-13 Thread Haschart, Robert J (rh9ec)
Erick, Sorry I didn't see this response; for some reason solr-users has stopped being delivered to my mailbox. The script that adds a field based on the value(s) in some other field doesn't add a large number of different fields to the index. The pool_f field only has a total of 11 different

Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-13 Thread Chris Hostetter
: We think this is a bug (silently dropping commits even if the client : requested "waitForSearcher"), or at least a missing feature (commits being : the only UpdateRequests not reporting the achieved RF), which should be : worth a JIRA Ticket. Thanks for your analysis Michael -- I agree

Re: wildcards match end-of-word?

2020-02-13 Thread Walter Underwood
Remove the stopword and stemmer filters from your schema and reindex. Removing stopwords means you can never match “vitamin a”. Stemming interferes with wildcard matches. Either stem or do wildcards on a field, not both. Also, what do your users expect to get with wildcard matches? Those are a

Re: [External] wildcards match end-of-word?

2020-02-13 Thread Jan Høydahl
Be aware that if you search a field with stemming, then the index will only contain the stems, i.e. cars, caring may both be indexed as «car», and when you do a wildcard search, all analysis is skipped, so you are only targeting the exact tokens that happen to be in that field. Thus a search
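The effect described here can be reproduced with a toy model. This is a deliberately crude sketch, not Solr's analysis chain: `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer, and the wildcard pattern is matched with `fnmatch` against the indexed tokens exactly as typed, with no analysis applied, which is the behaviour the message describes.

```python
from fnmatch import fnmatchcase
from typing import Set


def toy_stem(token: str) -> str:
    """Hypothetical, crude stemmer: strips a trailing 'ing' or 's'.
    Real Snowball/Porter stemmers differ in detail."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def index_tokens(text: str) -> Set[str]:
    """Index-time analysis: lowercase, split on whitespace, stem."""
    return {toy_stem(t) for t in text.lower().split()}


def term_query(indexed: Set[str], word: str) -> bool:
    """Normal query: the query term goes through the same analysis."""
    return toy_stem(word.lower()) in indexed


def wildcard_query(indexed: Set[str], pattern: str) -> bool:
    """Wildcard query: no analysis at all; the pattern is compared,
    as typed, against the raw indexed tokens."""
    return any(fnmatchcase(tok, pattern) for tok in indexed)


if __name__ == "__main__":
    idx = index_tokens("Cars caring")    # both stem to "car"
    print(idx)                           # the only indexed token is "car"
    print(term_query(idx, "caring"))     # matches via the stem
    print(wildcard_query(idx, "caring*"))  # no token starts with "caring"
    print(wildcard_query(idx, "car*"))
```

A plain query for "caring" succeeds because it is stemmed to "car", while "caring*" finds nothing because the index no longer contains any token starting with "caring".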

Re: Zookeeper upgrade required with Solr upgrade?

2020-02-13 Thread Erick Erickson
That should be OK. There were no code changes necessary for that upgrade. see SOLR-13363 > On Feb 12, 2020, at 5:34 PM, Rahul Goswami wrote: > > Hello, > We are running a SolrCloud (7.2.1) cluster and upgrading to Solr 7.7.2. We > run a separate multi node zookeeper ensemble which currently

Re: Using MM efficiently to get right number of results

2020-02-13 Thread Erick Erickson
It can be basically anything you can do with a standard Solr query. > On Feb 13, 2020, at 9:09 AM, Nitin Arora wrote: > > Thanks Erick, a follow-up question for RerankQParser: > How complex can the rerank query itself be? Can we add multiple boost > factors based on different conditions - say,

Re: Using MM efficiently to get right number of results

2020-02-13 Thread Nitin Arora
Thanks Erick, a follow-up question for RerankQParser: How complex can the rerank query itself be? Can we add multiple boost factors based on different conditions - say, if category is X boost by 2, if brand is Y boost by 3, etc.? On Mon, 10 Feb 2020 at 18:12, Erick Erickson wrote: > There isn’t
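As a sketch of the kind of rerank request being asked about, the helper below assembles the request parameters for the rerank query parser (`rq` with `reRankQuery`, `reRankDocs`, `reRankWeight`), with the boost conditions expressed as ordinary boosted clauses in a separate `rqq` parameter. The function name, field names (`category`, `brand`), values, and numbers are all hypothetical; it assumes the rerank query is parsed with the default query parser, so the clauses are combined as optional (OR'd) clauses.

```python
from typing import Dict, List, Tuple


def rerank_params(main_query: str,
                  boosts: List[Tuple[str, float]],
                  rerank_docs: int = 200,
                  rerank_weight: float = 1.0) -> Dict[str, str]:
    """Build request params that rerank the top `rerank_docs` hits of
    `main_query` using boosted clauses (hypothetical helper)."""
    # Each condition becomes a boosted clause, e.g. category:X^2.
    rerank_query = " ".join(f"{clause}^{weight}" for clause, weight in boosts)
    return {
        "q": main_query,
        # Doubled braces emit literal { } around the local params.
        "rq": (f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
               f"reRankWeight={rerank_weight}}}"),
        "rqq": rerank_query,
    }


if __name__ == "__main__":
    params = rerank_params("laptop",
                           [("category:X", 2), ("brand:Y", 3)],
                           rerank_docs=200, rerank_weight=3)
    for name, value in params.items():
        print(f"{name}={value}")
```

Keeping the boosts in a separate `rqq` parameter avoids escaping the clauses inside the local-params string.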

RE: [External] Re: wildcards match end-of-word?

2020-02-13 Thread Fischer, Stephen
Also, if helpful, here is our solrconfig.xml: https://github.com/VEuPathDB/SolrDeployment/blob/master/configsets/site-search/conf/solrconfig.xml Thanks again, from a Solr Newbie, steve -----Original Message----- From: Fischer, Stephen Sent: Thursday, February 13, 2020 7:52 AM To:

Re: [External] Re: wildcards match end-of-word?

2020-02-13 Thread Sotiris Fragkiskos
Hi, I could be wrong, but I'm starting to think that it has to do with the fieldType. In our case, wildcards don't seem to work at all with text_en types, but they do work with string types. On Thu, Feb 13, 2020 at 1:52 PM Fischer, Stephen < sfisc...@pennmedicine.upenn.edu> wrote: > Folks, > > I

RE: [External] Re: wildcards match end-of-word?

2020-02-13 Thread Fischer, Stephen
Folks, I am seeing very strange (bad) wildcard behavior (solr 8). "kinase" finds hits as expected. "kin*ase" and "kin*se" find 0 results. "kinase*" matches only values like "kinase," and "kinase-" but not "kinase" I have done the analysis as Erick suggested (thanks!) but it is not

Re: Async RELOADCOLLECTION never completes

2020-02-13 Thread Karl Stoney
When performing a rolling restart we see: 09:43:31.890 [OverseerThreadFactory-42-thread-5-processing-n:solr-5.search-solr.prod.k8.atcloud.io:80_solr] ERROR org.apache.solr.cloud.OverseerTaskProcessor - :org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session

Async RELOADCOLLECTION never completes

2020-02-13 Thread Karl Stoney
Hi, We’re periodically seeing an ASYNC task to RELOADCOLLECTION never complete, it’s just permanently “running”: ❯ curl -s http://solr.search-solr.prod.k8.atcloud.io/solr/admin/collections\?action\=REQUESTSTATUS\\=1581585716 | jq . { "responseHeader": { "status": 0, "QTime": 2 },
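One client-side safeguard against an async task that reports "running" forever is to poll REQUESTSTATUS with an upper bound on waiting time. The sketch below is hypothetical: `wait_for_async` and `request_status_url` are made-up names, the HTTP fetch is injected as a callable so no particular client library is assumed, and the timeout and interval values are arbitrary. It reads the `status.state` field of the REQUESTSTATUS response and treats `completed`, `failed`, and `notfound` as final.

```python
import time
from typing import Any, Callable, Dict
from urllib.parse import urlencode

# States after which polling stops.
TERMINAL_STATES = {"completed", "failed", "notfound"}


def request_status_url(base_url: str, request_id: str) -> str:
    """Build the Collections API REQUESTSTATUS URL for an async request id."""
    query = urlencode({"action": "REQUESTSTATUS", "requestid": request_id})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def wait_for_async(fetch_json: Callable[[str], Dict[str, Any]],
                   base_url: str,
                   request_id: str,
                   timeout_s: float = 300.0,
                   interval_s: float = 5.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll REQUESTSTATUS until a terminal state is reported, raising
    TimeoutError if the task is still not finished after timeout_s."""
    url = request_status_url(base_url, request_id)
    deadline = clock() + timeout_s
    while True:
        state = fetch_json(url)["status"]["state"]
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"async request {request_id} still '{state}' after {timeout_s}s")
        sleep(interval_s)
```

With this in place, a RELOADCOLLECTION that never leaves "running" surfaces as a `TimeoutError` the caller can log and alert on, instead of a client that waits indefinitely.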

Re: wildcards match end-of-word?

2020-02-13 Thread Sotiris Fragkiskos
Hi Erick, thanks very much for this information, it was immensely useful, I always had the same question! I'm now seeing the Analysis page and finally I don't have to rely on an external online stemmer to see what solr *probably* stemmed the term to!! But I still can't make the asterisk and

Would changing the schema version from 1.5 to 1.6 require a reindex

2020-02-13 Thread Karl Stoney
Hey, I’m going to bump our schema version from 1.5 to 1.6 to get the implicit useDocValuesAsStored=true, would this require a reindex? Thanks Karl

Re: Mongolian language in Solr

2020-02-13 Thread Charlie Hull
Hi, There's no Mongolian stemmer in Snowball, the stemmer project Lucene uses. I found one paper discussing how one might lemmatize Mongolian: https://www.researchgate.net/publication/220229332_A_lemmatization_method_for_Mongolian_and_its_application_to_indexing_for_information_retrieval