Re: Cores and and ranking (search quality)
SOLR-1632 will certainly help. But trying to predict whether your core A or core B will appear first doesn't really seem like a good use of time. If you actually have a setup like you describe, add debug=all to your query on both cores and you'll see all the gory detail of how the scores are calculated, providing a definitive answer in _your_ situation.

Best,
Erick

On Mon, Mar 9, 2015 at 5:44 AM, johnmu...@aol.com wrote:

[…]
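As a rough illustration of the debug=all suggestion, the sketch below only builds the two request URLs, one per core, so the score explanations can be compared side by side. The host, port, core names, and the `id` field are assumptions made for the example; only the `debug=all` parameter comes from the thread.

```python
from urllib.parse import urlencode


def debug_query_url(core_base_url: str, query: str) -> str:
    """Build a /select URL that asks Solr to include scoring explanations."""
    params = {"q": query, "fl": "id,score", "debug": "all", "wt": "json"}
    return f"{core_base_url}/select?{urlencode(params)}"


# Hypothetical core URLs; replace with your own hosts and core names.
CORES = [
    "http://localhost:8983/solr/core-A",
    "http://localhost:8983/solr/core-B",
]

for base in CORES:
    print(debug_query_url(base, "xyz"))
```

Fetching each URL and comparing the explain output for the two identical documents should show which statistics (document frequency, document count) differ between the cores.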
Re: Cores and and ranking (search quality)
Thanks Walter. This explains a lot.

- MJ

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, March 10, 2015 4:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

[…]
Re: Cores and and ranking (search quality)
On 3/10/2015 11:17 AM, johnmu...@aol.com wrote:
> If I have two cores, one core has 10 docs another has 100,000 docs. I then submit two docs that are 100% identical (with the exception of the unique-ID fields, which is stored but not indexed) one to each core. The question is, during search, will both of those docs rank near each other or not? If so, this is great because it will behave the same as if I had one core and index both docs to this single core. If not, which core's doc will rank higher and how far apart the two docs be from each other in the ranking?
>
> Put another way: are docs from the smaller core (the one has 10 docs only) rank higher or lower compared to docs from the larger core (the one with 100,000) docs?

Without specific knowledge about the document in question as well as all the other documents, this is impossible to answer, except to say that the relative ranking position is likely to be different.

Dropping back to general info: The overall term frequency and inverse document frequency (TF-IDF) in the 100,000 document index will very likely be quite a lot different than in the 10 document index. That will affect ranking order.

Sometimes users are surprised by the results they get, but it is very rare to find a bug in Lucene scoring. In addition to the debug parameter that Erick told you about, here are a couple of classes you could investigate at the source code level for more information about ranking:

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/Similarity.html
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/DefaultSimilarity.html

Here's info that is more general, and from a much earlier Lucene version:

https://lucene.apache.org/core/3_6_2/scoring.html

I have my Solr install configured to use the BM25 similarity.

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/BM25Similarity.html
http://en.wikipedia.org/wiki/Okapi_BM25

SOLR-1632 aims to make TF-IDF the same across multiple cores as you would get if you only had one core. I do not know enough about it to know whether it is EXACTLY the same, or only an approximation ... but in a search context, 100 percent precise calculation is rarely required. When you drop that as a requirement, search becomes easier and a LOT faster.

Thanks,
Shawn
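To make the point about differing statistics concrete, here is a small sketch of the idf formula used by Lucene 4.x DefaultSimilarity, `1 + ln(numDocs / (docFreq + 1))`, applied to two cores. The document frequencies are made-up numbers chosen for illustration, and the real score also involves tf, norms, and query normalization, so treat this as a view of one factor only.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf in the style of Lucene 4.x DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Made-up statistics: the term appears in 2 of 10 docs in the small core
# and in 50 of 100,000 docs in the large core.
idf_small = classic_idf(doc_freq=2, num_docs=10)
idf_large = classic_idf(doc_freq=50, num_docs=100_000)

print(f"small core idf: {idf_small:.3f}")
print(f"large core idf: {idf_large:.3f}")
```

With these particular numbers the large core yields the higher idf, but different document frequencies could reverse that, which is why the relative order of the two identical documents cannot be predicted without looking at the actual indexes.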
Re: Cores and and ranking (search quality)
Thanks Erick for trying to help, I really appreciate it. Unfortunately, I'm still stuck.

There are times when one must know the inner workings and behavior of the software to make a design decision, and this is one of them. If I knew the inner workings of Solr, I would not be asking. In addition, I'm in the design process, so I'm not able to fully test. Besides, my test could be invalid because I may not set it up right, given my limited understanding of the inner workings of Solr. Given this, I hope you don't mind me asking again.

Say I have two cores: one core has 10 docs and the other has 100,000 docs. I then submit two docs that are 100% identical (with the exception of the unique-ID field, which is stored but not indexed), one to each core. The question is, during search, will both of those docs rank near each other or not? If so, this is great because it will behave the same as if I had one core and indexed both docs into this single core. If not, which core's doc will rank higher, and how far apart will the two docs be from each other in the ranking?

Put another way: do docs from the smaller core (the one that has only 10 docs) rank higher or lower compared to docs from the larger core (the one with 100,000 docs)?

Thanks!

-- MJ

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, March 10, 2015 11:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

[…]
Re: Cores and and ranking (search quality)
Thanks Walter.

The design decision I'm trying to make is this: using multiple cores, will my ranking be impacted vs. using a single core?

I have records to index, and each record can be grouped into object-types, such as object-A, object-B, object-C, etc. I have a total of 30 (maybe more) object-types. There may be only 10 records of object-A, but 10 million records of object-B or 1 million of object-C, etc. I need to be able to search against a single object-type and / or across all object-types.

From my past experience, in a single-core setup, if I have two identical records and I search on the term XYZ that matches one of the records, the second record ranks right next to the other (because it too contains XYZ). This is good and is the expected behavior. If I want to limit my search to an object-type, I AND XYZ with that object-type. So all is well.

What I'm considering for my new design is to use multiple cores and distributed search, with a core for each object-type: core-A will hold records from object-A, core-B will hold records from object-B, etc.

Before I can make a decision on this design, I need to know how ranking will be impacted.

Going back to my earlier example: if I have 2 identical records, one in core-A, which has 10 records, and the other in core-B, which has 10 million records, and I use distributed search to search across all cores on the term XYZ (just like in the single-core case), it will match both of those records all right, but will those two records be ranked next to each other just like in the single-core case? If not, which will rank higher, the one from core-A or the one from core-B?

My concern is that using multiple cores and distributed search means I will give up rank quality when records are not distributed evenly across cores. If so, then maybe this is not a design I can use.

- MJ

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Tuesday, March 10, 2015 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

[…]
Re: Cores and and ranking (search quality)
If the documents are distributed randomly across shards/cores, then the statistics will be similar in each core and the results will be similar.

If the documents are distributed semantically (say, by topic or type), the statistics of each core will be skewed towards that set of documents and the results could be quite different.

Assume I have tech support documents and I put all the LaserJet docs in one core. That term is very common in that core (poor idf) and rare in other cores (strong idf). But for the query “laserjet”, all the good answers are in the LaserJet-specific core, where they will be scored low.

An identical document that mentions “LaserJet” once will score fairly low in the LaserJet-specific collection and fairly high in the other collection.

Global IDF fixes this, by using corpus-wide statistics. That’s how we ran Infoseek and Ultraseek in the late 1990s.

Random allocation to cores avoids it.

If you have significant traffic directed to one object type AND you need peak performance, you may want to segregate your cores by object type. Otherwise, I’d let SolrCloud spread them around randomly and filter based on an object type field. That should work well for most purposes.

Any core with fewer than 1000 records is likely to give somewhat mysterious results. A word that is common in English, like “next”, might be in only one document and will score too high. A less-common word, like “unreasonably”, might be in 20 and will score low. You need lots of docs for the language statistics to even out.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Mar 10, 2015, at 1:23 PM, johnmu...@aol.com wrote:

[…]
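The LaserJet example can be sketched numerically. The snippet below compares per-core idf with an idf computed from summed, corpus-wide statistics, which is the general idea behind global IDF. It is not the SOLR-1632 implementation; the `ShardStats` type, the core sizes, and the document frequencies are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ShardStats:
    num_docs: int  # total documents in this core
    doc_freq: int  # documents in this core that contain the term


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def global_idf(shards: list[ShardStats]) -> float:
    """idf from statistics summed over all cores, as if they were one index."""
    total_docs = sum(s.num_docs for s in shards)
    total_freq = sum(s.doc_freq for s in shards)
    return idf(total_freq, total_docs)


# Hypothetical numbers: "laserjet" is in most docs of the dedicated core
# and in very few docs of the general core.
laserjet_core = ShardStats(num_docs=5_000, doc_freq=4_500)
other_core = ShardStats(num_docs=95_000, doc_freq=100)

local_laserjet = idf(laserjet_core.doc_freq, laserjet_core.num_docs)
local_other = idf(other_core.doc_freq, other_core.num_docs)
shared = global_idf([laserjet_core, other_core])

print(f"local idf, LaserJet core: {local_laserjet:.3f}")
print(f"local idf, other core:    {local_other:.3f}")
print(f"global idf, both cores:   {shared:.3f}")
```

With local statistics the same term gets a very different weight depending on which core a document lives in; with the summed statistics both cores use one value, so identical documents are weighted identically.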
Re: Cores and and ranking (search quality)
On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:
> If I have two cores, one core has 10 docs another has 100,000 docs. I then submit two docs that are 100% identical (with the exception of the unique-ID fields, which is stored but not indexed) one to each core. The question is, during search, will both of those docs rank near each other or not? […]
>
> Put another way: are docs from the smaller core (the one has 10 docs only) rank higher or lower compared to docs from the larger core (the one with 100,000) docs?

These are not quite the same question.

tf.idf ranking depends on the other documents in the collection (the idf term). With 10 docs, the document frequency statistics are effectively random noise, so the ranking is unpredictable.

Identical documents should rank identically, but whether they are higher or lower in the two cores depends on the rest of the docs.

idf statistics don’t settle down until at least 10K docs. You still sometimes see anomalies under a million documents.

What design decision do you need to make? We can probably answer that for you.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Cores and and ranking (search quality)
(reposting this to see if anyone can help)

Help me understand this better (regarding ranking).

Say I have two docs that are 100% identical with the exception of uid (which is stored but not indexed). In a single-core setup, I search on xyz and those 2 docs end up ranking #1 and #2. When I switch over to a two-core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to core-B (which has 100,000 records).

Now, are you saying that in the 2-core setup, if I search on xyz (just like in the single-core setup), this time I will not see doc-A and doc-B as #1 and #2 in the ranking? That is, are you saying doc-A may now be somewhere at the top / bottom, far away from doc-B? If so, which will be #1: doc-A from core-A (which has 10 records) or doc-B from core-B (which has 100,000 records)?

If I got all this right, are you saying SOLR-1632 will fix this issue such that the end result will be as if I had 1 core?

- MJ

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Thursday, March 5, 2015 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

[…]
RE: Cores and and ranking (search quality)
Help me understand this better (regarding ranking).

Say I have two docs that are 100% identical with the exception of uid (which is stored but not indexed). In a single-core setup, I search on xyz and those 2 docs end up ranking #1 and #2. When I switch over to a two-core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to core-B (which has 100,000 records).

Now, are you saying that in the 2-core setup, if I search on xyz (just like in the single-core setup), this time I will not see doc-A and doc-B as #1 and #2 in the ranking? That is, are you saying doc-A may now be somewhere at the top / bottom, far away from doc-B? If so, which will be #1: doc-A from core-A (which has 10 records) or doc-B from core-B (which has 100,000 records)?

If I got all this right, are you saying SOLR-1632 will fix this issue such that the end result will be as if I had 1 core?

- MJ

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Thursday, March 5, 2015 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

[…]
RE: Cores and and ranking (search quality)
Hello - faceting will be the same, and distributed more-like-this is also possible since 5.0; there is a working patch for 4.10.3. Regular search will work as well since 5.0 because of distributed IDF, which you need to enable manually. Behaviour will not be the same if you rely on average document length statistics, which is the case when you use BM25 instead of the default TFIDF similarity.

Solr will do the result merging, so everything is transparent, awesome!

Markus

-----Original message-----
From: johnmu...@aol.com
Sent: Thursday 5th March 2015 14:38
To: solr-user@lucene.apache.org
Subject: Cores and and ranking (search quality)

[…]
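To illustrate why average document length matters for BM25, the sketch below evaluates only the term-frequency part of the standard Okapi BM25 formula for one document under two different per-shard averages. The function name, the term frequency, and the lengths are assumptions for the example; k1 = 1.2 and b = 0.75 are the commonly used defaults.

```python
def bm25_tf_component(tf: float, doc_len: float, avg_doc_len: float,
                      k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency part of Okapi BM25, with length normalization by avg_doc_len."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


# The same hypothetical document (term appears 3 times, 200 tokens long)
# scored against two shards whose average document lengths differ.
same_doc = {"tf": 3, "doc_len": 200}
in_short_doc_shard = bm25_tf_component(**same_doc, avg_doc_len=100)
in_long_doc_shard = bm25_tf_component(**same_doc, avg_doc_len=400)

print(f"shard with avg length 100: {in_short_doc_shard:.3f}")
print(f"shard with avg length 400: {in_long_doc_shard:.3f}")
```

The document looks long relative to the first shard and short relative to the second, so its contribution is lower in the first and higher in the second, even though the document itself is unchanged.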
Re: Cores and and ranking (search quality)
On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
> My question is this: if I put my data in multiple cores and use distributed search will the ranking be different if I had all my data in a single core?

Yes, it will be different. The practical impact depends on how homogeneous your data are across the shards and how large your shards are. If you have small and dissimilar shards, your ranking will suffer a lot.

Work is being done to remedy this: https://issues.apache.org/jira/browse/SOLR-1632

> Also, will facet and more-like-this quality / result be the same?

It is not formally guaranteed, but for most practical purposes, faceting on multi-shards will give you the same results as single-shards.

I don't know about more-like-this. My guess is that it will be affected in the same way that standard searches are.

> Also, reading the distributed search wiki (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the search and result merging (all I have to do is issue a search), is this correct?

Yes. From a user-perspective, searches are no different.

- Toke Eskildsen, State and University Library, Denmark
Cores and and ranking (search quality)
Hi,

I have data that I will index and search on. This data is well defined, such that I can index it into a single core or into multiple cores like so: core_1:Jan2015, core_2:Feb2015, core_3:Mar2015, etc.

My question is this: if I put my data in multiple cores and use distributed search, will the ranking be different than if I had all my data in a single core? If yes, how will it be different? Also, will facet and more-like-this quality / results be the same?

Also, reading the distributed search wiki (http://wiki.apache.org/solr/DistributedSearch), it looks like Solr does the search and result merging (all I have to do is issue a search). Is this correct?

Thanks!

- MJ
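For reference, a distributed search over per-month cores is a single request that lists the cores in the `shards` parameter, and Solr merges the results. The sketch below only builds such a URL; the host and port are assumptions, the core names follow the question above, and the `id` field is hypothetical.

```python
from urllib.parse import urlencode


def distributed_query_url(entry_core_url: str, shard_addresses: list[str], query: str) -> str:
    """Build a /select URL that fans the query out to every listed shard."""
    params = {
        "q": query,
        "shards": ",".join(shard_addresses),  # host:port/path entries, without a scheme
        "fl": "id,score",
        "wt": "json",
    }
    # Keep ',', '/' and ':' unescaped so the shards list stays readable.
    return f"{entry_core_url}/select?{urlencode(params, safe=',/:')}"


# Hypothetical host; core names follow the monthly layout from the question.
SHARDS = [
    "localhost:8983/solr/core_1",
    "localhost:8983/solr/core_2",
    "localhost:8983/solr/core_3",
]

print(distributed_query_url("http://localhost:8983/solr/core_1", SHARDS, "xyz"))
```

Any one of the cores can receive the request; it acts as the coordinator and returns one merged, ranked result list.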