Solr and commits

2020-08-12 Thread Jayadevan Maymala
Hi all, A few doubts about commits. 1) If no commit parameters are passed from a client (Solarium) update, will the autoSoftCommit values automatically apply? 2) When we are not committing from the client, when will the data actually be flushed to disk? Regards, Jayadevan
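For context, the behaviour asked about here is governed by the `<updateHandler>` section of solrconfig.xml: if the client sends no commit or commitWithin parameters, the server-side autoCommit (hard commit, flush to disk) and autoSoftCommit (visibility) settings take over. A minimal sketch, with illustrative interval values:

```xml
<!-- solrconfig.xml sketch; the interval values are illustrative -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes index segments to disk and rolls over the
       transaction log. openSearcher=false means it does not make new
       documents visible by itself. -->
  <autoCommit>
    <maxTime>60000</maxTime>          <!-- flush at most 60s after an update -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes new documents searchable without an fsync. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>           <!-- searchable within ~5s -->
  </autoSoftCommit>
</updateHandler>
```

So, to the two questions: yes, autoSoftCommit applies regardless of which client sent the update, and durability (flush to disk) comes from the hard autoCommit plus the transaction log, not from the soft commit.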

SPLITSHARD failed after running for hours

2020-08-12 Thread sanjay dutt
Hello Solr community, We tried to split a shard of one collection which contains 80M documents. After running for a few hours it failed with the exception org.apache.solr.common.SolrException. Upon further investigation, I found the below exception Caused by:
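For long-running splits like this, the Collections API supports running SPLITSHARD asynchronously and polling for the result, which avoids HTTP timeouts masking the real failure. A hedged sketch (collection, shard, and request-id names are hypothetical):

```shell
# Start the split asynchronously instead of holding a connection open
# for hours:
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=myCollection&shard=shard1&async=split-001"

# Poll the status of the async request until it reports completed or
# failed; the failure details end up here and in the overseer logs:
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=split-001"
```

Note that splitting roughly doubles the disk needed for the shard while the sub-shards are built, which is a common cause of multi-hour splits failing late.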

Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
There may be other ways; the easiest way is to write a script that gets the cluster status, and for each collection per replica you will have these details: "collections":{ "collection1":{ "pullReplicas":"0", "replicationFactor":"1", "shards":{ "shard1":{
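The script idea above can be sketched with the CLUSTERSTATUS action of the Collections API, which returns exactly the per-replica structure quoted. An illustrative one-liner (requires a running SolrCloud node and `jq`; host and port are hypothetical):

```shell
# Fetch cluster status and list each replica's core, node, and state
# for every collection:
curl -s "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS" |
  jq '.cluster.collections | to_entries[] |
      {collection: .key,
       replicas: [.value.shards[].replicas[] | {core, node_name, state}]}'
```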

Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Glad u nailed the out of sync one :) > On Aug 12, 2020, at 4:38 PM, Jae Joo wrote: > > I found the root cause. I have 3 collections assigned to an alias and one > of them is NOT synched by the alias. > Collection 1

Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
I found the root cause. I have 3 collections assigned to an alias and one of them is NOT synched by the alias. Collection 1, Collection 2, Collection 3. On Wed, Aug 12, 2020 at 7:29 PM Jae Joo wrote: > Good question. How can I validate if the replicas

Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
Different absolute scores from different collections are OK, because the exact values depend on the number of deleted documents. For the set of documents that are in different orders from different collections, are the scores of that set identical? If they are, then it is normal to have a

Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
Good question. How can I validate if the replicas are all synched? On Wed, Aug 12, 2020 at 7:28 PM Jae Joo wrote: > numFound is the same but the scores differ. > On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly wrote: >> Try a simple test of querying each collection 5 times

Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
numFound is the same but the scores differ. On Wed, Aug 12, 2020 at 6:01 PM Aroop Ganguly wrote: > Try a simple test of querying each collection 5 times in a row; if the > numFound are different for a single collection within these 5 calls then u > have it. > Please try it, what you may think

Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Try a simple test of querying each collection 5 times in a row; if the numFound are different for a single collection within these 5 calls then u have it. Please try it, what you may think is sync’d may actually not be. How do you validate correct sync? > On Aug 12, 2020, at 10:55 AM, Jae Joo
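The test described above can be scripted directly against each collection behind the alias. A hedged sketch (collection names, host, and port are hypothetical; requires `jq`):

```shell
# Query each collection directly (bypassing the alias) 5 times in a
# row and compare numFound across runs:
for c in collection1 collection2 collection3; do
  for i in 1 2 3 4 5; do
    n=$(curl -s "http://localhost:8983/solr/$c/select?q=*:*&rows=0" |
        jq '.response.numFound')
    echo "$c run $i: numFound=$n"
  done
done
# If numFound varies across runs for the same collection, its replicas
# are out of sync; adding distrib=false and hitting each replica's core
# URL directly narrows it down to the offending replica.
```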

Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
Are the scores the same for the documents that are ordered differently? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 12, 2020, at 10:55 AM, Jae Joo wrote: > > The replications are all synched and there are no updates while I was > testing. >

Re: Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
The replications are all synched and there are no updates while I was testing. On Wed, Aug 12, 2020 at 1:49 PM Aroop Ganguly wrote: > Most likely you have 1 or more collections behind the alias that have > replicas out of sync :) > > Try querying each collection to find the one out of sync. >

Re: Multiple Collections in a Alias.

2020-08-12 Thread Aroop Ganguly
Most likely you have 1 or more collections behind the alias that have replicas out of sync :) Try querying each collection to find the one out of sync. > On Aug 12, 2020, at 10:47 AM, Jae Joo wrote: > > I have 10 collections in single alias and having different result sets for > every time

Multiple Collections in a Alias.

2020-08-12 Thread Jae Joo
I have 10 collections in a single alias and get different result sets every time with the same query. Is it as designed or am I missing something? The configuration and schema for all 10 collections are identical. Thanks, Jae

Managing leaders when recycling a cluster

2020-08-12 Thread Adam Woods
Hi, We've just recently gone through the process of upgrading Solr to 8.6 and have implemented an automated rolling-update mechanism to allow us to more easily make changes to our cluster in the future. Our process looks like this: 1. Cluster has 3 nodes. 2. Scale out to 6 nodes. 3.

Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
The version it's working on is 8.5! On Wed, 12 Aug 2020 at 17:16, Norbert Kutasi wrote: > I see what you mean, however the request results in cartesian products, > because of subordinate.q=*:* : > > http://localhost:8981/solr/Collection1/query?q=*=*,subordinate:[subquery]=*:*=*=Collection2 >

Number of times in document

2020-08-12 Thread David Hastings
Is there any way to do a query for the minimum number of times a phrase or string exists in a document? This has been a request from some users, as other search services (names not to be mentioned) have such functionality. I've been using Solr since 1.4 and I think I've tried finding this ability
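For single terms (not phrases), one approach is the `termfreq()` function combined with an `frange` filter, which restricts results to documents where the term occurs at least N times. A hedged sketch (collection and field names are hypothetical):

```shell
# Match only documents where the indexed term "solr" occurs at least
# 3 times in the "text" field. Note termfreq() operates on single
# indexed terms; there is no equivalent built-in for multi-word
# phrases, which is the harder part of the original request.
curl "http://localhost:8983/solr/myCollection/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fq={!frange l=3}termfreq(text,'solr')"
```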

Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
I see what you mean, however the request results in cartesian products, because of subordinate.q=*:* : http://localhost:8981/solr/Collection1/query?q=*=*,subordinate:[subquery]=*:*=*=Collection2 { "responseHeader":{ "zkConnected":true, "status":0, "QTime":0, "params":{

Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Erick Erickson
This works from a browser: http://localhost:8981/solr/Collection1/query?q=*=*,subordinate:[subquery]=*:*=*=Collection2 One problem you’re having is that “fromIndex” is a _core_ not a collection. See: https://lucene.apache.org/solr/guide/8_2/transforming-result-documents.html It’s vaguely
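Erick's point can be illustrated with a sketch (all field, core, and host names here are hypothetical, since the URLs quoted in this thread lost their parameter separators in archiving). The cartesian-product behaviour reported earlier follows from subordinate.q=*:* matching every child for every parent; referencing a parent field in the subquery restricts it, and fromIndex must name a core, not a collection:

```shell
# Illustrative [subquery] request: each parent document pulls in only
# the documents whose id matches its own reporting_to field.
curl "http://localhost:8981/solr/Collection1/query" \
  --data-urlencode "q=*:*" \
  --data-urlencode "fl=*,subordinate:[subquery]" \
  --data-urlencode 'subordinate.q={!terms f=id v=$row.reporting_to}' \
  --data-urlencode "subordinate.fromIndex=Collection2_shard1_replica_n1"
```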

Re: Managing leaders when recycling a cluster

2020-08-12 Thread Erick Erickson
There’s no particular need to do this unless you have a very large number of leaders on a single node. That functionality was added for a special case where there were 100s of leaders on the same node. The fact that a leader happens to be on a node that’s going away shouldn’t matter at all; as

Re: Multiple "df" fields

2020-08-12 Thread Erick Erickson
Probably a typo but I think you mean qf rather than pf? They’re both actually valid, but pf is “phrase field” which will give different results…. Best, Erick > On Aug 12, 2020, at 5:26 AM, Edward Turner wrote: > > Many thanks for your suggestions. > > We do use edismax and bq fields to

Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Erick Erickson
Yeah, unfortunately I don’t have much to offer when it comes to autoscaling…. > On Aug 12, 2020, at 8:09 AM, Anton Pfennig wrote: > > Hi Erick, > > thx! > the idea behind is to have a dedicated Kubernetes deployment for each > collection. So e.g. if I need more solr nodes for particular

Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Anton Pfennig
Hi Erick, thx! The idea behind it is to have a dedicated Kubernetes deployment for each collection. So e.g. if I need more Solr nodes for a particular collection I would just scale the Kubernetes deployment and Solr should automatically add new replicas to these new nodes. Does it make sense?

Re: Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Erick Erickson
Autoscaling may be overkill. Is this a one-time thing or something you need automated? Because for a one-time event, it’s simpler/faster just to specify createNodeSet with the CREATE command that lists the new machines you want the collection to be placed on. Note that there’s the special value
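The createNodeSet approach mentioned above looks roughly like this (collection name, shard/replica counts, and node names are hypothetical):

```shell
# Pin a new collection to a specific set of nodes at creation time;
# node names use the host:port_context form shown in live_nodes.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&createNodeSet=host6:8983_solr,host7:8983_solr,host8:8983_solr,host9:8983_solr"
```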

Solr 8.3.1 - NullPointer during Autoscaling

2020-08-12 Thread Anton Pfennig
Hi guys, in my Solr setup as SolrCloud (8.3.1) I’m using 5 nodes for one collection (say “collection1”), one on each node. Now I would like to add a new collection on the same Solr cluster, but additionally the new collection (say “collection2”) should be replicated only on nodes with

Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
Hi Dominique, Sorry, I was in a hurry to create a simple enough yet similar case to the one we face internally. reporting_to indeed is the right field, but the same error still persists; something is seemingly wrong when invoking the subquery with fromIndex { params: { q: "*", fq:

Re: [Subquery] Transform Documents across Collections

2020-08-12 Thread Dominique Bejean
Hi Norbert, The field name in collection2 is "reporting_to" not "reporting". Dominique Le mer. 12 août 2020 à 11:59, Norbert Kutasi a écrit : > Hello, > > We have been using [subquery] to come up with arbitrary complex hierarchies > in our document responses. > > It works well as long as

[Subquery] Transform Documents across Collections

2020-08-12 Thread Norbert Kutasi
Hello, We have been using [subquery] to come up with arbitrarily complex hierarchies in our document responses. It works well as long as the documents are in the same collection; however, based on the reference guide I infer it can bring in documents from different collections, except it throws an

Re: Multiple "df" fields

2020-08-12 Thread Edward Turner
Many thanks for your suggestions. We do use edismax and bq fields to help with our result ranking, but we'd never thought about using it for this purpose (we were stuck on the copyField + df pattern). This is a good suggestion though, thank you. We're now exploring the use of the pf field
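The alternative to the copyField + df pattern discussed in this thread is to let edismax search several fields directly via qf, with pf for phrase boosting and bq for ranking nudges. A hedged sketch (collection, field names, and boosts are hypothetical):

```shell
# edismax request searching multiple fields at once instead of a
# single copyField catch-all:
curl "http://localhost:8983/solr/myCollection/select" \
  --data-urlencode "q=protein binding" \
  --data-urlencode "defType=edismax" \
  --data-urlencode "qf=title^3 abstract^1.5 body" \
  --data-urlencode "pf=title^5" \
  --data-urlencode "bq=reviewed:true^2"
```

qf scores per-term matches across the listed fields; pf additionally boosts documents where the whole query appears as a phrase, which matches Erick's qf-vs-pf distinction above.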
