Solr NRT Replicas Out of Sync

2021-03-03 Thread Anshuman Singh
Hi, In our Solr 7.4 cluster, we have noticed that some replicas of some of our Collections are out of sync: the slave replica has more records than the leader. This results in a different number of records on subsequent queries on the same Collection. Commit is also not helping in

Re: How to Prevent Recovery?

2020-09-08 Thread Anshuman Singh
ler-19-thread-1) [ x:###] > o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many > merges; stalling... > 2020-05-03 16:31:31.402 INFO (Lucene Merge Thread #55) [ x:###] > o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to > merge do

Re: How to Prevent Recovery?

2020-08-30 Thread Anshuman Singh
, you could reduce some of the disk pressure if you can > put your > tlogs on another drive, don’t know if that’s possible. Ditto the Solr logs. > > Beyond that, it may be a matter of increasing the hardware. You’re really > indexing > 120K records a second ((1 leader + 2 followers) * 40K)

How to Prevent Recovery?

2020-08-25 Thread Anshuman Singh
Hi, We have a 10-node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster with 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 ZK nodes, 2 of which run on the same nodes as Solr. Our use case requires continuous ingestion (mostly updates). If we ingest at 40k records per sec, after

Re: Replicas in Recovery During Atomic Updates

2020-08-19 Thread Anshuman Singh
n10/ => org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://x.x.x.24:8983/solr/collection_4_shard3_replica_n10/: null On Tue, Aug 11, 2020 at 2:08 AM Anshuman Singh wrote: > Just to give you an idea, this is how we are ingesting: > > {

Re: Replicas in Recovery During Atomic Updates

2020-08-10 Thread Anshuman Singh
What are your settings for hard/soft commit? > > For the shard going to recovery - do you have a log entry or something? > > What is the Solr version? > > How do you set up ZK? > > > On 10.08.2020 at 16:24, Anshuman Singh wrote: > > > > Hi, > > > >

Replicas in Recovery During Atomic Updates

2020-08-10 Thread Anshuman Singh
Hi, We have a SolrCloud cluster with 10 nodes. We have 6B records ingested in the Collection. Our use case requires atomic updates ("inc") on 5 fields. Almost 90% of the documents are atomic updates, and as soon as we start our ingestion pipelines, multiple shards start going into recovery, sometimes

Ext4 or XFS

2020-08-05 Thread Anshuman Singh
Hi, Which file system would be better for Solr, ext4 or XFS? Regards, Anshuman

Custom Snitch for Rack Awareness

2020-07-31 Thread Anshuman Singh
Hi, I'm using Solr-7.4 and I want to create collections in my cluster such that no two replicas are assigned to the same rack. I read about Rule-based Replica Placement https://lucene.apache.org/solr/guide/7_4/rule-based-replica-placement.html. What I got is I have to create a tag/snitch

Case insensitive search on String field

2020-07-25 Thread Anshuman Singh
Hi, We missed the fact that case-insensitive search doesn't work with the field type "string". We have 3B docs indexed and we cannot reindex the data. Since schema changes require reindexing, is there any other way to achieve case-insensitive search on string fields? Regards, Anshuman
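
One query-side approach that needs no schema change is a regular-expression query that spells out both cases of every letter, since Solr's standard query parser accepts Lucene regular expressions between slashes. The sketch below is only an illustration: the helper name `ci_regex_query` and the field name `name_s` are hypothetical, and regex queries can be expensive on an index of this size, so this would need testing before use.

```python
# Characters with special meaning in Lucene regular expressions,
# plus "/" which delimits a regex in the standard query parser.
_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~/')


def ci_regex_query(field, value):
    """Build a regex query matching `value` in any letter case.

    `field` is assumed to be an existing "string" field (hypothetical name).
    """
    parts = []
    for ch in value:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            # Letter with distinct cases: accept either one.
            parts.append(f"[{upper}{lower}]")
        elif ch in _REGEX_SPECIAL:
            # Escape regex metacharacters so they match literally.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return f"{field}:/{''.join(parts)}/"


print(ci_regex_query("name_s", "Ab1"))
```

The resulting string would be passed as the `q` (or `fq`) parameter; a Lucene regex has to match the whole indexed term, which suits an untokenized string field.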

Solr Backup/Restore

2020-07-21 Thread Anshuman Singh
Hi, I'm using Solr-7.4.0 and I want to export 4TB of data from our current Solr cluster to a different cluster. The new cluster has twice as many nodes as the current cluster and I want the data to be distributed among all the nodes. Is this possible with the Backup/Restore feature

Prevent Re-indexing if Doc Fields are Same

2020-06-26 Thread Anshuman Singh
I was reading about in-place updates https://lucene.apache.org/solr/guide/7_4/updating-parts-of-documents.html. In my use case I have to update the field "LASTUPDATETIME"; all other fields are the same. Updates are very frequent and I can't bear the cost of deleted docs. If I provide all the fields,
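
For reference, an update that touches only "LASTUPDATETIME" is sent as an atomic-update document using the "set" (or "inc") modifier. According to the linked guide, Solr applies it in place only when the field is a single-valued, non-indexed, non-stored numeric docValues field. A minimal sketch of building such a payload follows; the uniqueKey name `id`, the document id and the timestamp value are assumptions for illustration.

```python
import json


def build_set_update(doc_id, field, value, id_field="id"):
    """Return a JSON body for Solr's /update handler that sets one field.

    Only the uniqueKey and the modified field are sent, so the other
    fields do not have to be supplied again.
    """
    doc = {id_field: doc_id, field: {"set": value}}
    return json.dumps([doc])


# Hypothetical document id and epoch-millisecond timestamp.
body = build_set_update("doc-1", "LASTUPDATETIME", 1593168000000)
print(body)
```

The body would be posted to the collection's `/update` endpoint with `Content-Type: application/json`.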

Replicas going into recovery

2020-06-11 Thread Anshuman Singh
We are running a test case, ingesting 2B records in a collection in 24 hrs. This collection is spread across 10 solr nodes with a replication factor of 2. We are noticing many replicas going into recovery while indexing. And it is degrading indexing performance. We are observing errors like:

Re: Limit Solr Disk IO

2020-06-07 Thread Anshuman Singh
mportant portions of your index. If the OS isn’t large > enough, the additional I/O pressure from merging may be enough to start > your system swapping which is A Bad Thing. > > See: > https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > for how Lucene uses MMapDir

Re: Limit Solr Disk IO

2020-06-06 Thread Anshuman Singh
ch one becomes 10G. What happens when the 11th > segment is created and it’s 100M? Do you rewrite one of the 10G segments > just > to add 100M? Your problem gets worse, not better. > > > Best, > Erick > > > On Jun 5, 2020, at 1:41 AM, Anshuman Singh > wrote: > >

Re: Limit Solr Disk IO

2020-06-04 Thread Anshuman Singh
d the > updates. > And yes, that requires reading the old segment. > It is common to allow multiple segments when you update often, > so updating does not interfere with reading the index too often. > > > > On 4 Jun 2020, at 14:08, Anshuman Singh > wrote: > > >

Limit Solr Disk IO

2020-06-04 Thread Anshuman Singh
I noticed that while indexing, when a commit happens, Solr does a lot of disk reads. This hurts search performance when the parts of the index needed by a query have to be loaded from disk, as the disk read speed is not very good and the whole index is not cached in RAM. When no

Re: Solr multi core query too slow

2020-05-30 Thread Anshuman Singh
a bunch of them > in order to not get fooled by hitting, say, your queryResultCache. I > had one client who “stress tested” with the same query and was > getting 3ms response times because, after the first one, they never > needed to do any searching at all, everything was

Re: Solr multi core query too slow

2020-05-29 Thread Anshuman Singh
ur use-case requires 100K rows, you should be using streaming or > cursorMark. While that won’t make the end-to-end time shorter, it won’t > put such a strain on the system. > > Best, > Erick > > > On May 27, 2020, at 10:38 AM, Anshuman Singh > wrote: > > > >
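
The cursorMark approach mentioned in the reply can be sketched as follows. This is a minimal illustration, not the poster's code: the function names, the page size and the sort field `id` (assumed to be the collection's uniqueKey, which a cursor sort must include) are assumptions.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_docs(select_url, query, sort="id asc", page_size=1000, fetch=fetch_json):
    """Yield all matching documents page by page using cursorMark.

    `fetch` is injectable so the loop can be exercised without a server.
    """
    cursor = "*"  # the initial cursor value defined by Solr
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "sort": sort,          # must include the uniqueKey field
            "rows": page_size,
            "cursorMark": cursor,
            "wt": "json",
        })
        data = fetch(f"{select_url}?{params}")
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means the result set is exhausted
        cursor = next_cursor
```

A caller would iterate with something like `for doc in iter_docs("http://localhost:8983/solr/test/select", "*:*"): ...`, so that each request returns only one modest page instead of 100K rows at once.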

Solr multi core query too slow

2020-05-27 Thread Anshuman Singh
I have a Solr cloud setup (Solr 7.4) with a collection "test" having two shards on two different nodes. There are 4M records equally distributed across the shards. If I query the collection like below, it is slow. http://localhost:8983/solr/*test*/select?q=*:*=10 QTime: 6930 If I query a

Why Solr query time is more in case the searched value frequency is more even if no sorting is applied, for the same number of rows?

2020-05-11 Thread Anshuman Singh
Suppose I have two phone numbers P1 and P2, and the number of records with P1 is X and with P2 is 2X (2 times X). If I query for R rows for P1 and P2, the QTime for P2 is higher. I am not specifying any sort parameter and the number of rows I'm asking for is the same in both the