Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Erick Erickson
bq: shouldn't the two replicas have the same number of deletions

Not necessarily. We're back to the fact that commits on the replicas in
a single shard fire at different wall clock times. Plus, when segments
are merged, the deleted docs are purged. So it's quite common that
two replicas in the same shard do _not_ have the same deleted doc
count and will also have different maxDoc counts.

The fact that they aren't showing the same numDocs is the only part
of this that "shouldn't be happening"...

Best,
Erick

On Thu, Dec 15, 2016 at 11:41 AM, Webster Homer  wrote:
> Something I hadn't known until now. The source cdcr collection has 2 shards
> with 1 replica; our target cloud has 2 shards with 2 replicas
> Both Source and Target have indexes that are not current
>
> Also we have set all of our collections to ignore external commits
>
> On Thu, Dec 15, 2016 at 1:31 PM, Webster Homer 
> wrote:
>
>> Looking through our replicas I noticed that in one of our shards (each
>> shard has 2 replicas)
>> 1 replica shows:
>> "replicas": [
>>
>> {
>> "name": "core_node1",
>> "core": "sial-catalog-material_shard2_replica2",
>> "baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
>> "nodeName": "ae1b-ecom-msc04:8983_solr",
>> "state": "active",
>> "leader": false,
>> "index":
>> {
>> "numDocs": 487123,
>> "maxDocs": 711973,
>> *"deletedDocs": 224850,*
>> "size": "331.96 MB",
>> "lastModified": "2016-12-08T11:10:05.969Z",
>> "current": false,
>> "version": 17933,
>> "segmentCount": 17
>> }
>> }
>> ,
>> while the second replica shows this:
>>
>> {
>> "name": "core_node3",
>> "core": "sial-catalog-material_shard2_replica1",
>> "baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
>> "nodeName": "ae1b-ecom-msc02:8983_solr",
>> "state": "active",
>> "leader": true,
>> "index":
>> {
>> "numDocs": 487063,
>> "maxDocs": 487064,
>> "deletedDocs": 1,
>> "size": "224.83 MB",
>> "lastModified": "2016-12-08T11:10:02.625Z",
>> "current": false,
>> "version": 8208,
>> "segmentCount": 19
>> }
>> }
>> ],
>> I wrote a routine that uses the Collections API Info call and then for
>> each replica calls the Core API to get the information on the index
>>
>> shouldn't the two replicas have the same number of deletions?
>>
>> On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
>> wrote:
>>
>>> I am trying to find the reported inconsistencies now.
>>>
>>> The timestamp I have was created by our ETL process, which may not be in
>>> exactly the same order as the indexing occurred
>>>
>>> When I tried to sort the results by _docid_ desc, Solr threw a 500
>>> error:
>>> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7, "params
>>> ":{ "q":"*:*", "indent":"on", "fl":"record_spec,s_id,pid,search_concat_pno,
>>> search_pno, search_user_term, search_lform, search_eform, search_acronym,
>>> search_synonyms, root_name, search_s_pri_name, search_p_pri_name,
>>> search_keywords, lookahead_search_terms, sortkey, search_rtecs,
>>> search_chem_comp, cas_number, search_component_cas, search_beilstein,
>>> search_color_idx, search_ecnumber, search_femanumber, search_isbn,
>>> search_mdl_number, search_descriptions, page_title,
>>> search_xref_comparable_pno, search_xref_comparable_sku,
>>> search_xref_equivalent_pno, search_xref_exact_pno create_date
>>> search_xref_exact_sku, score", "sort":"_docid_ desc", "rows":"20", "wt":
>>> "json", "_":"1481821047026"}}, "error":{ "msg":"Index: 1, Size: 0", "
>>> trace":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0\n\tat
>>> java.util.ArrayList.rangeCheck(ArrayList.java:653)\n\tat
>>> java.util.ArrayList.get(ArrayList.java:429)\n\tat
>>> org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$S
>>> hardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$
>>> 1.compare(ShardFieldSortedHitQueue.java:167)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$
>>> 1.compare(ShardFieldSortedHitQueue.java:159)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l
>>> essThan(ShardFieldSortedHitQueue.java:91)\n\tat
>>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l
>>> essThan(ShardFieldSortedHitQueue.java:33)\n\tat
>>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
>>> org.apache.solr.handler.component.QueryComponent.mergeIds(
>>> QueryComponent.java:1098)\n\tat org.apache.solr.handler.compon
>>> ent.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
>>> org.apache.solr.handler.component.QueryComponent.handleRespo
>>> nses(QueryComponent.java:737)\n\tat org.apache.solr.handler.compon
>>> ent.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(Req
>>> uestHandlerBase.java:154)\n\tat org.apache.solr.core.SolrCore.
>>> 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Something I hadn't known until now. The source cdcr collection has 2 shards
with 1 replica; our target cloud has 2 shards with 2 replicas
Both Source and Target have indexes that are not current

Also we have set all of our collections to ignore external commits
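
A sketch of how that is commonly configured, assuming the stock
IgnoreCommitOptimizeUpdateProcessorFactory is what's in use here (the chain
name is illustrative):

    <!-- solrconfig.xml: swallow commit/optimize requests sent by clients;
         commits still happen through the autoCommit/autoSoftCommit settings -->
    <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
      <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
        <int name="statusCode">200</int> <!-- answer 200 instead of failing the request -->
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.DistributedUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>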

On Thu, Dec 15, 2016 at 1:31 PM, Webster Homer 
wrote:

> Looking through our replicas I noticed that in one of our shards (each
> shard has 2 replicas)
> 1 replica shows:
> "replicas": [
>
> {
> "name": "core_node1",
> "core": "sial-catalog-material_shard2_replica2",
> "baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
> "nodeName": "ae1b-ecom-msc04:8983_solr",
> "state": "active",
> "leader": false,
> "index":
> {
> "numDocs": 487123,
> "maxDocs": 711973,
> *"deletedDocs": 224850,*
> "size": "331.96 MB",
> "lastModified": "2016-12-08T11:10:05.969Z",
> "current": false,
> "version": 17933,
> "segmentCount": 17
> }
> }
> ,
> while the second replica shows this:
>
> {
> "name": "core_node3",
> "core": "sial-catalog-material_shard2_replica1",
> "baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
> "nodeName": "ae1b-ecom-msc02:8983_solr",
> "state": "active",
> "leader": true,
> "index":
> {
> "numDocs": 487063,
> "maxDocs": 487064,
> "deletedDocs": 1,
> "size": "224.83 MB",
> "lastModified": "2016-12-08T11:10:02.625Z",
> "current": false,
> "version": 8208,
> "segmentCount": 19
> }
> }
> ],
> I wrote a routine that uses the Collections API Info call and then for
> each replica calls the Core API to get the information on the index
>
> shouldn't the two replicas have the same number of deletions?
>
> On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
> wrote:
>
>> I am trying to find the reported inconsistencies now.
>>
>> The timestamp I have was created by our ETL process, which may not be in
>> exactly the same order as the indexing occurred
>>
>> When I tried to sort the results by _docid_ desc, Solr threw a 500
>> error:
>> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7, "params
>> ":{ "q":"*:*", "indent":"on", "fl":"record_spec,s_id,pid,search_concat_pno,
>> search_pno, search_user_term, search_lform, search_eform, search_acronym,
>> search_synonyms, root_name, search_s_pri_name, search_p_pri_name,
>> search_keywords, lookahead_search_terms, sortkey, search_rtecs,
>> search_chem_comp, cas_number, search_component_cas, search_beilstein,
>> search_color_idx, search_ecnumber, search_femanumber, search_isbn,
>> search_mdl_number, search_descriptions, page_title,
>> search_xref_comparable_pno, search_xref_comparable_sku,
>> search_xref_equivalent_pno, search_xref_exact_pno create_date
>> search_xref_exact_sku, score", "sort":"_docid_ desc", "rows":"20", "wt":
>> "json", "_":"1481821047026"}}, "error":{ "msg":"Index: 1, Size: 0", "
>> trace":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0\n\tat
>> java.util.ArrayList.rangeCheck(ArrayList.java:653)\n\tat
>> java.util.ArrayList.get(ArrayList.java:429)\n\tat
>> org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)\n\tat
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$S
>> hardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$
>> 1.compare(ShardFieldSortedHitQueue.java:167)\n\tat
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$
>> 1.compare(ShardFieldSortedHitQueue.java:159)\n\tat
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l
>> essThan(ShardFieldSortedHitQueue.java:91)\n\tat
>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l
>> essThan(ShardFieldSortedHitQueue.java:33)\n\tat
>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
>> org.apache.solr.handler.component.QueryComponent.mergeIds(
>> QueryComponent.java:1098)\n\tat org.apache.solr.handler.compon
>> ent.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
>> org.apache.solr.handler.component.QueryComponent.handleRespo
>> nses(QueryComponent.java:737)\n\tat org.apache.solr.handler.compon
>> ent.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(Req
>> uestHandlerBase.java:154)\n\tat org.apache.solr.core.SolrCore.
>> execute(SolrCore.java:2089)\n\tat org.apache.solr.servlet.HttpSo
>> lrCall.execute(HttpSolrCall.java:652)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
>> r(ServletHandler.java:1668)\n\tat org.eclipse.jetty.servlet.Serv
>> letHandler.doHandle(ServletHandler.java:581)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>> 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
Looking through our replicas I noticed that in one of our shards (each
shard has 2 replicas)
1 replica shows:
"replicas": [

{
"name": "core_node1",
"core": "sial-catalog-material_shard2_replica2",
"baseUrl": "http://ae1b-ecom-msc04:8983/solr;,
"nodeName": "ae1b-ecom-msc04:8983_solr",
"state": "active",
"leader": false,
"index":
{
"numDocs": 487123,
"maxDocs": 711973,
*"deletedDocs": 224850,*
"size": "331.96 MB",
"lastModified": "2016-12-08T11:10:05.969Z",
"current": false,
"version": 17933,
"segmentCount": 17
}
}
,
while the second replica shows this:

{
"name": "core_node3",
"core": "sial-catalog-material_shard2_replica1",
"baseUrl": "http://ae1b-ecom-msc02:8983/solr;,
"nodeName": "ae1b-ecom-msc02:8983_solr",
"state": "active",
"leader": true,
"index":
{
"numDocs": 487063,
"maxDocs": 487064,
"deletedDocs": 1,
"size": "224.83 MB",
"lastModified": "2016-12-08T11:10:02.625Z",
"current": false,
"version": 8208,
"segmentCount": 19
}
}
],
I wrote a routine that uses the Collections API Info call and then for each
replica calls the Core API to get the information on the index
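
For reference, one way such a routine can fetch that data is with two stock
HTTP calls (the host, collection, and core names below are the ones from this
thread):

    Collections API, to list the shards and replicas of the collection:
    http://ae1b-ecom-msc04:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=sial-catalog-material&wt=json

    CoreAdmin API, run against each replica's own node, which returns the
    "index" section (numDocs, deletedDocs, current, segmentCount, size, ...):
    http://ae1b-ecom-msc04:8983/solr/admin/cores?action=STATUS&core=sial-catalog-material_shard2_replica2&wt=json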

shouldn't the two replicas have the same number of deletions?

On Thu, Dec 15, 2016 at 12:36 PM, Webster Homer 
wrote:

> I am trying to find the reported inconsistencies now.
>
> The timestamp I have was created by our ETL process, which may not be in
> exactly the same order as the indexing occurred
>
> When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
> { "responseHeader":{ "zkConnected":true, "status":500, "QTime":7, "params
> ":{ "q":"*:*", "indent":"on", "fl":"record_spec,s_id,pid,search_concat_pno,
> search_pno, search_user_term, search_lform, search_eform, search_acronym,
> search_synonyms, root_name, search_s_pri_name, search_p_pri_name,
> search_keywords, lookahead_search_terms, sortkey, search_rtecs,
> search_chem_comp, cas_number, search_component_cas, search_beilstein,
> search_color_idx, search_ecnumber, search_femanumber, search_isbn,
> search_mdl_number, search_descriptions, page_title,
> search_xref_comparable_pno, search_xref_comparable_sku,
> search_xref_equivalent_pno, search_xref_exact_pno create_date
> search_xref_exact_sku, score", "sort":"_docid_ desc", "rows":"20", "wt":
> "json", "_":"1481821047026"}}, "error":{ "msg":"Index: 1, Size: 0", "trace
> ":"java.lang.IndexOutOfBoundsException: Index: 1, Size: 0\n\tat
> java.util.ArrayList.rangeCheck(ArrayList.java:653)\n\tat
> java.util.ArrayList.get(ArrayList.java:429)\n\tat
> org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)\n\tat
> org.apache.solr.handler.component.ShardFieldSortedHitQueue$
> ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
> org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(
> ShardFieldSortedHitQueue.java:167)\n\tat org.apache.solr.handler.
> component.ShardFieldSortedHitQueue$1.compare(
> ShardFieldSortedHitQueue.java:159)\n\tat org.apache.solr.handler.
> component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)\n\tat
> org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(
> ShardFieldSortedHitQueue.java:33)\n\tat org.apache.lucene.util.
> PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
> org.apache.solr.handler.component.QueryComponent.
> mergeIds(QueryComponent.java:1098)\n\tat org.apache.solr.handler.
> component.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> RequestHandlerBase.java:154)\n\tat org.apache.solr.core.SolrCore.
> execute(SolrCore.java:2089)\n\tat org.apache.solr.servlet.
> HttpSolrCall.execute(HttpSolrCall.java:652)\n\tat org.apache.solr.servlet.
> HttpSolrCall.call(HttpSolrCall.java:459)\n\tat org.apache.solr.servlet.
> SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:208)\n\tat org.eclipse.jetty.servlet.
> ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.
> session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.
> ServletHandler.doScope(ServletHandler.java:511)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.
> handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer
I am trying to find the reported inconsistencies now.

The timestamp I have was created by our ETL process, which may not be in
exactly the same order as the indexing occurred

When I tried to sort the results by _docid_ desc, Solr threw a 500 error:
{ "responseHeader":{ "zkConnected":true, "status":500, "QTime":7, "params":{
"q":"*:*", "indent":"on", "fl":"record_spec,s_id,pid,search_concat_pno,
search_pno, search_user_term, search_lform, search_eform, search_acronym,
search_synonyms, root_name, search_s_pri_name, search_p_pri_name,
search_keywords, lookahead_search_terms, sortkey, search_rtecs,
search_chem_comp, cas_number, search_component_cas, search_beilstein,
search_color_idx, search_ecnumber, search_femanumber, search_isbn,
search_mdl_number, search_descriptions, page_title,
search_xref_comparable_pno, search_xref_comparable_sku,
search_xref_equivalent_pno, search_xref_exact_pno create_date
search_xref_exact_sku, score", "sort":"_docid_ desc", "rows":"20", "wt":
"json", "_":"1481821047026"}}, "error":{ "msg":"Index: 1, Size: 0",
"trace":"java.lang.IndexOutOfBoundsException:
Index: 1, Size: 0\n\tat
java.util.ArrayList.rangeCheck(ArrayList.java:653)\n\tat
java.util.ArrayList.get(ArrayList.java:429)\n\tat
org.apache.solr.common.util.NamedList.getVal(NamedList.java:174)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:167)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue$1.compare(ShardFieldSortedHitQueue.java:159)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:91)\n\tat
org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardFieldSortedHitQueue.java:33)\n\tat
org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:1098)\n\tat
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
java.lang.Thread.run(Thread.java:745)\n", "code":500}}

On Wed, Dec 14, 2016 at 7:41 PM, Erick Erickson 
wrote:

> Let's back up a bit. You say "This seems to cause 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
Let's back up a bit. You say "This seems to cause two replicas to
return different hits depending upon which one is queried."

OK, _how_ are they different? I've been assuming different numbers of
hits. If you're getting the same number of hits but different document
ordering, that's a completely different issue and may be easily
explainable. If this is true, skip the rest of this message. I only
realized we may be using a different definition of "different hits"
part way through writing this reply.



Having the timestamp as a string isn't a problem; you can do something
very similar with wildcards and the like if it's a string that sorts
the same way the timestamp would. And it's best if it's created
upstream anyway; that way it's guaranteed to be the same for the doc on
all replicas.

If the date is in canonical form (YYYY-MM-DDTHH:MM:SSZ) then a simple
copyField to a date field would do the trick.
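
A minimal sketch of that copyField, with illustrative field names and
assuming a Trie date field type named "date" is defined in the schema:

    <!-- schema: copy the canonical ETL timestamp string into a real date field -->
    <field name="etl_timestamp"    type="string" indexed="true" stored="true"/>
    <field name="etl_timestamp_dt" type="date"   indexed="true" stored="true"/>
    <copyField source="etl_timestamp" dest="etl_timestamp_dt"/>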

But there's no real reason to do any of that. Given that you see this
when there's no indexing going on, there's no point to those
tests; those were just a way to examine your nodes while there was
active indexing.

How do you fix this problem when you see it? If it goes away by itself
that would give at least a start on where to look. If you have to
manually intervene it would be good to know what you do.

The CDCR pattern is that docs go from the leader on the source cluster to
the leader on the target cluster. Once the target leader gets the
docs, it's supposed to send the doc to all the replicas.

To try to narrow down the issue, next time it occurs can you look at
_both_ the source and target clusters and see if they _both_ show the
same discrepancy? What I'm looking for is whether both are
self-consistent. That is, all the replicas for shardN on the source
cluster show the same documents (M). All the replicas for shardN on
the target cluster show the same number of docs (N). I'm not as
concerned if M != N at this point. Note I'm looking at the number of
hits here, not say the document ordering.

To do this you'll have to do the trick I mentioned where you query
each replica separately.
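
For what it's worth, a minimal SolrJ sketch of that per-replica check (the
core URLs are the ones quoted earlier in the thread; any reasonably recent
SolrJ with HttpSolrClient.Builder should work):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ReplicaCounts {
      public static void main(String[] args) throws Exception {
        // the two replicas of shard2 mentioned earlier in the thread
        String[] coreUrls = {
            "http://ae1b-ecom-msc04:8983/solr/sial-catalog-material_shard2_replica2",
            "http://ae1b-ecom-msc02:8983/solr/sial-catalog-material_shard2_replica1"
        };
        SolrQuery q = new SolrQuery("*:*");
        q.set("distrib", "false"); // ask only this core, no fan-out across the shard
        q.setRows(0);              // only numFound matters here
        for (String url : coreUrls) {
          try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
            long numFound = client.query(q).getResults().getNumFound();
            System.out.println(url + " numFound=" + numFound);
          }
        }
      }
    }

Run the same check against the corresponding replicas on the source cluster
and compare the counts per shard.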

And are you absolutely sure that your different results are coming
from the _same_ cluster? If you're comparing a query from the source
cluster with a query from the target cluster, that's different than if
the queries come from the same cluster.

Best,
Erick

On Wed, Dec 14, 2016 at 2:48 PM, Webster Homer  wrote:
> Thanks for the quick feedback.
>
> We are not doing continuous indexing, we do a complete load once a week and
> then have a daily partial load for any documents that have changed since
> the load. These partial loads take only a few minutes every morning.
>
> The problem is we see this discrepancy long after the data load completes.
>
> We have a source collection that uses cdcr to replicate to the target. I
> see the current=false setting in both the source and target collections.
> Only the target collection is being heavily searched so that is where my
> concern is. So what could cause this kind of issue?
> Do we have a configuration problem?
>
> It doesn't happen all the time, so I don't currently have a reproducible
> test case, yet.
>
> I will see about adding the timestamp, we have one, but it was created as a
> string, and was generated by our ETL job
>
> On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson 
> wrote:
>
>> The commit points on different replicas will trip at different wall
>> clock times so the leader and replica may return slightly different
>> results depending on whether doc X was included in the commit on one
>> replica but not on the second. After the _next_ commit interval (2
>> seconds in your case), doc X will be committed on the second replica:
>> that is it's not lost.
>>
>> Here's a couple of ways to verify:
>>
>> 1> turn off indexing and wait a few seconds. The replicas should have
>> the exact same documents. "A few seconds" is your autocommit (soft in
>> your case) interval + autowarm time. This last is unknown, but you can
>> check your admin/plugins-stats search handler times, it's reported
>> there. Now issue your queries. If the replicas don't report the same
>> docs, that's A Bad Thing that should be worrying. BTW, with a 2 second soft
>> commit interval, which is really aggressive, you _better not_ have
>> very large autowarm intervals!
>>
>> 2> Include a timestamp in your docs when they are indexed. There's an
>> automatic way to do that BTW. Now do your queries and append an FQ
>> clause like fq=timestamp:[* TO some_point_in_the_past]. The replicas
>> should have the same counts unless you are deleting documents. I
>> mention deletes on the off chance that you're deleting documents that
>> fall in the interval and then the same as above could theoretically
>> occur. Updates should be fine.
>>
>> BTW, I've seen continuous monitoring of this done by automated
>> scripts. The key is to get the shard 

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
Thanks for the quick feedback.

We are not doing continuous indexing, we do a complete load once a week and
then have a daily partial load for any documents that have changed since
the load. These partial loads take only a few minutes every morning.

The problem is we see this discrepancy long after the data load completes.

We have a source collection that uses cdcr to replicate to the target. I
see the current=false setting in both the source and target collections.
Only the target collection is being heavily searched so that is where my
concern is. So what could cause this kind of issue?
Do we have a configuration problem?

It doesn't happen all the time, so I don't currently have a reproducible
test case, yet.

I will see about adding the timestamp, we have one, but it was created as a
string, and was generated by our ETL job

On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson 
wrote:

> The commit points on different replicas will trip at different wall
> clock times so the leader and replica may return slightly different
> results depending on whether doc X was included in the commit on one
> replica but not on the second. After the _next_ commit interval (2
> seconds in your case), doc X will be committed on the second replica:
> that is it's not lost.
>
> Here's a couple of ways to verify:
>
> 1> turn off indexing and wait a few seconds. The replicas should have
> the exact same documents. "A few seconds" is your autocommit (soft in
> your case) interval + autowarm time. This last is unknown, but you can
> check your admin/plugins-stats search handler times, it's reported
> there. Now issue your queries. If the replicas don't report the same
> docs, that's A Bad Thing that should be worrying. BTW, with a 2 second soft
> commit interval, which is really aggressive, you _better not_ have
> very large autowarm intervals!
>
> 2> Include a timestamp in your docs when they are indexed. There's an
> automatic way to do that BTW. Now do your queries and append an FQ
> clause like fq=timestamp:[* TO some_point_in_the_past]. The replicas
> should have the same counts unless you are deleting documents. I
> mention deletes on the off chance that you're deleting documents that
> fall in the interval and then the same as above could theoretically
> occur. Updates should be fine.
>
> BTW, I've seen continuous monitoring of this done by automated
> scripts. The key is to get the shard URL and ping that with
> distrib=false. It'll look something like
> http://host:port/solr/collection_shard1_replica1. People usually
> just use *:* and compare numFound.
>
> Best,
> Erick
>
>
>
> On Wed, Dec 14, 2016 at 1:10 PM, Webster Homer 
> wrote:
> > We are using Solr Cloud 6.2
> >
> > We have been noticing an issue where the index in a core shows as
> > current = false
> >
> > We have autocommit set for 15 seconds, and soft commit at 2 seconds
> >
> > This seems to cause two replicas to return different hits depending upon
> > which one is queried.
> >
> > What would lead to the indexes not being "current"? The documentation on
> > the meaning of current is vague.
> >
> > The collections in our cloud have two shards each with two replicas. I see
> > this with several of the collections.
> >
> > We don't know how they get like this but it's troubling
> >
>


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
The commit points on different replicas will trip at different wall
clock times so the leader and replica may return slightly different
results depending on whether doc X was included in the commit on one
replica but not on the second. After the _next_ commit interval (2
seconds in your case), doc X will be committed on the second replica:
that is it's not lost.

Here's a couple of ways to verify:

1> turn off indexing and wait a few seconds. The replicas should have
the exact same documents. "A few seconds" is your autocommit (soft in
your case) interval + autowarm time. This last is unknown, but you can
check your admin/plugins-stats search handler times, it's reported
there. Now issue your queries. If the replicas don't report the same
docs, that's A Bad Thing that should be worrying. BTW, with a 2 second soft
commit interval, which is really aggressive, you _better not_ have
very large autowarm intervals!

2> Include a timestamp in your docs when they are indexed. There's an
automatic way to do that BTW. Now do your queries and append an FQ
clause like fq=timestamp:[* TO some_point_in_the_past]. The replicas
should have the same counts unless you are deleting documents. I
mention deletes on the off chance that you're deleting documents that
fall in the interval and then the same as above could theoretically
occur. Updates should be fine.
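
As an illustration (the field name and cutoff date are made up), such a
query might look like:

    q=*:*&rows=0&fq=timestamp:[* TO 2016-12-14T00:00:00Z]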

BTW, I've seen continuous monitoring of this done by automated
scripts. The key is to get the shard URL and ping that with
distrib=false. It'll look something like
http://host:port/solr/collection_shard1_replica1. People usually
just use *:* and compare numFound.
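
Concretely, using the replica cores quoted elsewhere in this thread, such a
ping might look like (distrib=false keeps the query on that one core):

    http://ae1b-ecom-msc04:8983/solr/sial-catalog-material_shard2_replica2/select?q=*:*&rows=0&distrib=false
    http://ae1b-ecom-msc02:8983/solr/sial-catalog-material_shard2_replica1/select?q=*:*&rows=0&distrib=false

and the script just compares numFound between the responses.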

Best,
Erick



On Wed, Dec 14, 2016 at 1:10 PM, Webster Homer  wrote:
> We are using Solr Cloud 6.2
>
> We have been noticing an issue where the index in a core shows as current =
> false
>
> We have autocommit set for 15 seconds, and soft commit at 2 seconds
>
> This seems to cause two replicas to return different hits depending upon
> which one is queried.
>
> What would lead to the indexes not being "current"? The documentation on
> the meaning of current is vague.
>
> The collections in our cloud have two shards each with two replicas. I see
> this with several of the collections.
>
> We don't know how they get like this but it's troubling
>


Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
We are using Solr Cloud 6.2

We have been noticing an issue where the index in a core shows as current =
false

We have autocommit set for 15 seconds, and soft commit at 2 seconds
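
For reference, a solrconfig.xml block matching those numbers would look
roughly like this (openSearcher=false is the usual pairing but is an
assumption here):

    <autoCommit>
      <maxTime>15000</maxTime>           <!-- hard commit every 15 seconds -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>2000</maxTime>            <!-- soft commit (new searcher) every 2 seconds -->
    </autoSoftCommit>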

This seems to cause two replicas to return different hits depending upon
which one is queried.

What would lead to the indexes not being "current"? The documentation on
the meaning of current is vague.

The collections in our cloud have two shards each with two replicas. I see
this with several of the collections.

We don't know how they get like this but it's troubling
