[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-08-12 Thread Yago Riveiro (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419184#comment-15419184
 ] 

Yago Riveiro commented on SOLR-8586:


Then I do not understand how this is possible:

https://www.dropbox.com/s/a6e2wrmedop7xjv/Screenshot%202016-08-12%2018.19.22.png?dl=0

Only with 5.5.x and 6.x does the heap grow without bound. After rolling back to 
5.4, the amount of memory needed to start up is constant ...

With only one node running 5.5.x I have no problems; when I start a second node 
with 5.5.x, they never get past the phase where they check replica 
synchronization.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch, 
> SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-08-12 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419145#comment-15419145
 ] 

Yonik Seeley commented on SOLR-8586:


You can set the system property solr.disableFingerprint to "true" to 
disable the fingerprint check.

If your indexes ever have updates to existing documents, then you're still 
risking OOMs anyway (the first time a replica detects that an update may be 
reordered will cause the FieldCache to be populated for _version_ for that 
segment).  The fingerprint makes that happen up-front (what I meant to say in 
my previous message was "the maximum required amount of memory shouldn't be 
changed").


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-08-12 Thread Yago Riveiro (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419078#comment-15419078
 ] 

Yago Riveiro commented on SOLR-8586:


My index has 12T of data indexed with 4.0, and the _version_ field has only 
supported docValues since 4.7.

To upgrade to 5.x I ran lucene-core-5.x over all my data, but with this new 
feature I need to re-index all my data, because I don't have docValues for the 
_version_ field and this feature instead uses the un-inverted method, which 
creates an in-memory structure that doesn't fit in the memory of my servers ...

To be honest, this should never have been done in a minor release ... this 
mandatory feature is based on an optional configuration :/

I will either die on 5.4 or spend several months re-indexing data and figuring 
out how to update production without downtime.  Not an easy task.




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-08-12 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418799#comment-15418799
 ] 

Yonik Seeley commented on SOLR-8586:


If the _version_ field doesn't have docValues, then it will be un-inverted 
(i.e. FieldCache entries will be built to support _version_ lookups, and that 
does require memory).
Since _version_ lookups are needed in the course of indexing anyway (to detect 
update reorders on replicas), this should really just change when these 
FieldCache entries are created... hence the maximum required amount of memory 
should be changed.
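
As a rough illustration of why un-inverting _version_ needs memory, here is a back-of-the-envelope sketch. It assumes about 8 bytes per document (one 64-bit version value each), which is an upper-bound assumption rather than a measured figure, since Lucene may pack values more tightly. The 5 billion document count is the one reported elsewhere in this thread and covers the whole deployment, not a single node. The function name is hypothetical.

```python
def uninverted_version_bytes(num_docs: int, bytes_per_value: int = 8) -> int:
    """Estimate heap needed to hold one version value per document.

    bytes_per_value=8 is an assumption (a plain 64-bit long per document).
    """
    return num_docs * bytes_per_value


# Around 5 billion documents, as reported in this thread.
estimate = uninverted_version_bytes(5_000_000_000)
print(f"{estimate / 10**9:.0f} GB")  # prints "40 GB"
```

Under these assumptions the total is on the order of tens of gigabytes across the cluster, which is consistent with a structure that does not fit in a modest heap.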


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-08-11 Thread Yago Riveiro (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417460#comment-15417460
 ] 

Yago Riveiro commented on SOLR-8586:


Is this operation memory bound?

I'm trying to update my SolrCloud from 5.4 to 5.5.2 and I can only update one 
node; if I start another node with 5.5.2, the first dies with an OOM.

The second node never gets past the phase where it checks whether the replicas 
are in sync.

The SolrCloud deployment (2 nodes) has no activity at all; it is a cold 
repository for archived data (around 5 billion documents).




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-10 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142215#comment-15142215
 ] 

Yonik Seeley commented on SOLR-8586:


bq. Yep, I've been looping a custom version of the HDFS-nothing-safe test that 
among other things, only does adds, no deletes.

Update: when I reverted my custom changes to the chaos test (so that it also 
did deletes), I got a high amount of shard-out-of-sync errors... seemingly even 
more than before, so I've been trying to track those down.  What I saw were 
issues that did not look related to PeerSync... I saw missing documents from a 
shard that replicated from the leader while buffering documents, and I saw the 
missing documents come in and get buffered, pointing to transaction log 
buffering or replay issues.

Then I realized that I had tested "adds only" before committing, and tested the 
normal test after committing and doing a "git pull".  In-between those times 
was SOLR-8575, which was a fix to the HDFS tlog!  I've been looping the test 
for a number of hours with those changes reverted, and I haven't seen a 
shards-out-of-sync fail so far.  I've also done a quick review of SOLR-8575, 
but didn't see anything obviously incorrect.

I've also been running the non-hdfs version of the test for over a day, and 
also had no inconsistent shard failures.


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-08 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137248#comment-15137248
 ] 

Yonik Seeley commented on SOLR-8586:


bq.  The first thing I see is to compare hashes between the shards and if there 
is a difference use the ComplementStream to determine which id's are missing. 

Implementing eventual consistency with this is problematic in a general sense:
If one shard has an ID and another doesn't, you don't know what the correct 
state is.
The other general issue is the inability to actually retrieve an arbitrary 
document from the index (i.e. all source fields must be stored).

It may still be useful for add-only systems that do store all source fields...  
but in that case, we could make things much more efficient by adding in the 
ability to use hash trees to drastically narrow the ids that need to be 
communicated.
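
The narrowing idea can be sketched as follows. This is a one-level simplification of a hash tree, not anything taken from the Solr code base: ids are assigned to buckets by hash, each bucket gets its own order-independent hash, and only ids in buckets whose hashes disagree need to be exchanged. All function names, the bucket count, and the use of SHA-256 as the id hash are assumptions made for illustration.

```python
import hashlib

MASK64 = (1 << 64) - 1


def id_hash(doc_id: str) -> int:
    """Stable 64-bit hash of a document id (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:8], "big")


def bucket_hashes(ids, num_buckets=16):
    """Order-independent hash per bucket: sum of id hashes modulo 2**64."""
    buckets = [0] * num_buckets
    for doc_id in ids:
        h = id_hash(doc_id)
        buckets[h % num_buckets] = (buckets[h % num_buckets] + h) & MASK64
    return buckets


def ids_to_exchange(local_ids, remote_buckets, num_buckets=16):
    """Return only the local ids that fall in buckets whose hashes disagree."""
    local_buckets = bucket_hashes(local_ids, num_buckets)
    differing = {i for i in range(num_buckets) if local_buckets[i] != remote_buckets[i]}
    return sorted(d for d in local_ids if id_hash(d) % num_buckets in differing)
```

A real hash tree would apply the same comparison recursively, splitting each differing bucket again until the candidate set is small enough to send.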


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-08 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137227#comment-15137227
 ] 

Yonik Seeley commented on SOLR-8586:


OK, I did some basic performance testing...
On an index w/ 5M docs, the first-time fingerprint took 1100ms (most of that 
time was un-inversion of the version field, which did not use docValues).
After the first time, subsequent fingerprints took ~55ms
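
For readers unfamiliar with what is being timed here, the order-independent hash from the issue description can be sketched as below. This is a minimal sketch, assuming each version is scrambled with the MurmurHash3 64-bit finalizer and the results are added with 64-bit wraparound; the function names are hypothetical and the committed implementation may differ in detail.

```python
MASK64 = (1 << 64) - 1


def fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalizer: scrambles the bits of a 64-bit value."""
    k &= MASK64  # also maps negative versions into the unsigned 64-bit range
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def versions_fingerprint(versions) -> int:
    """Sum the mixed versions modulo 2**64; addition commutes, so order is irrelevant."""
    total = 0
    for v in versions:
        total = (total + fmix64(v)) & MASK64
    return total
```

Because the combination step is a plain sum, two replicas that hold the same set of versions produce the same fingerprint regardless of segment layout or document order, while a missing or extra version changes the result.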


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-08 Thread Joel Bernstein (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136924#comment-15136924
 ] 

Joel Bernstein commented on SOLR-8586:
--

Now that this is in place it may make sense to combine this with Streaming. The 
first thing I see is to compare hashes between the shards and, if there is a 
difference, use the ComplementStream to determine which ids are missing. The 
missing ids could then be automatically fetched from the source and 
re-indexed. There could be a DaemonStream that lives inside the collection that 
performs this check periodically. This could also sort out a situation where 
none of the shards have the complete truth. 


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-08 Thread Joel Bernstein (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137272#comment-15137272
 ] 

Joel Bernstein commented on SOLR-8586:
--

I think there would need to be a system of truth involved, which there often 
is. The steps would be:

1) Check the hashes.
2) If the hashes differ, find the difference in ids. 
3) Refetch those ids from the system of truth. Streaming data from the system 
of truth is easily done with streams like the JdbcStream, which streams data 
from a relational database. 
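
The three steps above can be sketched with plain in-memory data. This is only an illustration under stated assumptions: replicas are modeled as dicts mapping id to version, the system of truth as a dict mapping id to document, and none of the function names correspond to the Streaming API.

```python
MASK64 = (1 << 64) - 1


def shard_hash(versions_by_id):
    """Step 1: order-independent hash over a replica's (id, version) pairs.

    Uses Python's built-in hash(), so values are only comparable within one process.
    """
    total = 0
    for doc_id, version in versions_by_id.items():
        total = (total + hash((doc_id, version))) & MASK64
    return total


def differing_ids(replica_a, replica_b):
    """Step 2: ids present on exactly one of the two replicas."""
    return set(replica_a) ^ set(replica_b)


def reconcile(replica_a, replica_b, system_of_truth):
    """Step 3: return the documents to re-index, fetched from the system of truth."""
    if shard_hash(replica_a) == shard_hash(replica_b):
        return {}
    return {
        doc_id: system_of_truth[doc_id]
        for doc_id in differing_ids(replica_a, replica_b)
        if doc_id in system_of_truth
    }
```

Ids that differ between replicas but are absent from the system of truth are skipped here; deciding what to do with them (for example, deleting them) is a policy choice this sketch leaves open.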










[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134387#comment-15134387
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit ff83a400156beb6a8dd2d0845c7f878c28431739 in lucene-solr's branch 
refs/heads/branch_5x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff83a40 ]

SOLR-8586: add index fingerprinting and use it in peersync
(cherry picked from commit 629767b)



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134476#comment-15134476
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch 
refs/heads/lucene-6997 from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

SOLR-8586: add index fingerprinting and use it in peersync



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134529#comment-15134529
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit d75abb2539fb62514c506776c1db6182803745bc in lucene-solr's branch 
refs/heads/branch_5x from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d75abb2 ]

SOLR-8586: Fix forbidden APIS; cleanup of imports



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134542#comment-15134542
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch 
refs/heads/lucene-6835 from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

SOLR-8586: add index fingerprinting and use it in peersync



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-05 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134545#comment-15134545
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit f6400e9cbb1158178af0b6cb7901a784368ab589 in lucene-solr's branch 
refs/heads/lucene-6835 from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6400e9 ]

SOLR-8586: Fix forbidden APIS; cleanup of imports



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133217#comment-15133217
 ] 

Erick Erickson commented on SOLR-8586:
--

OK, does this mean I can commit SOLR-8500 (after this is committed to 5x)?


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133278#comment-15133278
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit f6400e9cbb1158178af0b6cb7901a784368ab589 in lucene-solr's branch 
refs/heads/master from [~thetaphi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6400e9 ]

SOLR-8586: Fix forbidden APIS; cleanup of imports



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133286#comment-15133286
 ] 

Mark Miller commented on SOLR-8586:
---

I think that we should warn that it can result in more often needing to do full 
index replication for recovery, but I have nothing against it.


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread Erick Erickson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133458#comment-15133458
 ] 

Erick Erickson commented on SOLR-8586:
--

Yeah, this is kind of a "use at your own risk in very specialized situations" 
kind of thing so I'll be sure and include that warning.



[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132688#comment-15132688
 ] 

Mark Miller commented on SOLR-8586:
---

Any chaos monkey test results yet?


[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132725#comment-15132725
 ] 

Yonik Seeley commented on SOLR-8586:


Yep, I've been looping a custom version of the HDFS-nothing-safe test that, 
among other things, only does adds, no deletes.  It's the same test I've been 
using all along in SOLR-8129.  I've gotten 66 fails (most due to mismatch with 
control), but no fails due to shards being out of sync!

I plan on committing this soon.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch, 
> SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-04 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132892#comment-15132892
 ] 

ASF subversion and git services commented on SOLR-8586:
---

Commit 629767be0686d39995f2afc1f1f267f9d1a68cef in lucene-solr's branch 
refs/heads/master from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=629767b ]

SOLR-8586: add index fingerprinting and use it in peersync


> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch, 
> SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-01 Thread Stephan Lagraulet (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126086#comment-15126086
 ] 

Stephan Lagraulet commented on SOLR-8586:
-

I'm trying to gather all issues related to SolrCloud that affect Solr 5.4. Can 
you assign the SolrCloud component to this issue?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-02-01 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126352#comment-15126352
 ] 

Yonik Seeley commented on SOLR-8586:


{quote}
PeerSync always returned "true" if the core doing the sync was judged to be 
either equal to or ahead of the remote core.
So one outstanding question is: under what circumstances do we change this to 
only return true on an exact match?
{quote}

So I think the answer to this is that we're OK, as long as *both* peers don't 
end up returning true.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122039#comment-15122039
 ] 

Yonik Seeley commented on SOLR-8586:


So the basic idea is that when a replica coming back up syncs to a leader, it 
can request a fingerprint in addition to the last leader versions.  It can then 
grab and apply any missing versions, calculate its own fingerprint, and 
compare for equality.
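
The flow described above can be sketched as follows. This is only an illustration under stated assumptions: `mix64`, `fingerprint` and `sync_with_leader` are hypothetical names, versions are modelled as plain integers held in a set, and the hash function is an arbitrary 64-bit mixer rather than whatever the patch uses. A `False` result stands for the case where the replica has to fall back to a full sync.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """Arbitrary 64-bit mixer (splitmix64 finalizer); an assumption, not Solr's hash."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def fingerprint(versions):
    """Order-independent hash: sum of per-version hashes, modulo 2**64."""
    return sum(mix64(v) for v in versions) & MASK64


def sync_with_leader(replica_versions, leader_recent, leader_fingerprint):
    """Apply the leader's recent versions that are missing, then compare fingerprints.

    replica_versions is a set and is updated in place.
    Returns True when the replica matches the leader after catching up.
    """
    missing = [v for v in leader_recent if v not in replica_versions]
    replica_versions.update(missing)
    return fingerprint(replica_versions) == leader_fingerprint
```

A replica that only lacks recent versions ends up matching; one that also lacks an older version outside the leader's recent list does not, and would need full replication.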


> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122496#comment-15122496
 ] 

Yonik Seeley commented on SOLR-8586:


bq. I am thinking if per-segment caching would conflict with any potential for 
in-place docValues updates

Hmmm, excellent thought.
Previously, if caching by the "core" segment key,  one only needed to take into 
account deletions.  In this case we could have just subtracted the hash for 
each deletion to do per-segment caching.  But I don't know how this works with 
updateable doc values.  They may invalidate previous techniques for per-segment 
caching (for those fields only of course).
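
The "subtract the hash for each deletion" idea can be sketched like this. It assumes an additive fingerprint modulo 2**64 and a hypothetical `SegmentFingerprintCache` class; the mixer is an arbitrary choice, and the sketch deliberately ignores the in-place docValues update problem raised above.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """Arbitrary 64-bit mixer (splitmix64 finalizer); an assumption, not Solr's hash."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


class SegmentFingerprintCache:
    """Cached additive fingerprint for one immutable segment (hypothetical)."""

    def __init__(self, versions):
        # Computed once when the segment is first seen.
        self.value = sum(mix64(v) for v in versions) & MASK64

    def on_delete(self, version):
        # Because the fingerprint is a modular sum, a deletion can be
        # accounted for by subtracting that document's hash.
        self.value = (self.value - mix64(version)) & MASK64
```

After a deletion the cached value equals the fingerprint computed from scratch over the remaining versions, which is what makes the cache safe as long as deletions are the only change to a segment.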

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122410#comment-15122410
 ] 

Ishan Chattopadhyaya commented on SOLR-8586:


The approach looks good to me.

{quote}
{code}
+// TODO: this could be parallelized, or even cached per-segment if performance becomes an issue
{code}
{quote}

I am thinking if per-segment caching would conflict with any potential for 
in-place docValues updates support (SOLR-5944)? I'm saying this based on my 
assumption that a docValues update rewrites the docValues file for a previously 
written segment. Given that, in such a case, the version field would be a DV 
field, would per-segment caching of the fingerprint need to be aware of 
in-place updates within a segment (whenever that support is built)?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread David Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122311#comment-15122311
 ] 

David Smith commented on SOLR-8586:
---

Trying to understand this without knowing the internals of the sync process -- 
apologies in advance if these are dumb questions:

It isn't stated, but I assume the replica does a full sync if its fingerprint, 
after sync, does not match the leader's?

Are there any scale concerns around calculating the fingerprint?  Say, if there 
are 100,000,000 (non-deleted) docs in the index? 

In a high volume situation (1000's updates / sec), will the leader's 
fingerprint calculation be in perfect sync with the last versions it is 
communicating to the replica?  Thinking about a searcher being refreshed in the 
middle of this request, or something like that.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122263#comment-15122263
 ] 

Yonik Seeley commented on SOLR-8586:


PeerSync always returned "true" if the core doing the sync was judged to be 
either equal to or ahead of the remote core.
So one outstanding question is: under what circumstances do we change this to 
only return true on an exact match?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122527#comment-15122527
 ] 

Yonik Seeley commented on SOLR-8586:



bq. It isn't stated, but I assume the replica does a full sync if its 
fingerprint, after sync, does not match the leader's?

right.

bq. Are there any scale concerns around calculating the fingerprint? Say, if 
there are 100,000,000 (non-deleted) docs in the index?

Yes, this needs to be tested.  We can do some caching if it's an issue.

bq. In a high volume situation (1000's updates / sec), will the leader's 
fingerprint calculation be in perfect sync with the last versions it is 
communicating to the replica?

No, but in a high volume situation, we won't be able to sync up by requesting a 
few missed docs from the leader anyway, so it probably doesn't matter.  This is 
more for both low update scenarios, and for bringing the whole cluster back up.


> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-28 Thread Joel Bernstein (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122872#comment-15122872
 ] 

Joel Bernstein commented on SOLR-8586:
--

This is exactly what we need for implementing alerts (SOLR-8577).

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119093#comment-15119093
 ] 

Ishan Chattopadhyaya commented on SOLR-8586:


numRecordsToKeep can be configured.
https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-TransactionLog

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119099#comment-15119099
 ] 

Ishan Chattopadhyaya commented on SOLR-8586:


Just to make sure I understand: would updates that a replica rejected due to 
reordering, or older versions that have since been superseded, also be counted 
towards this hash?
Or, instead, would the fingerprint be the sum of hashes of only the latest 
versions of all docs?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-27 Thread Stephan Lagraulet (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119107#comment-15119107
 ] 

Stephan Lagraulet commented on SOLR-8586:
-

Thanks, I missed this Solr 5 enhancement...

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-27 Thread Stephan Lagraulet (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119062#comment-15119062
 ] 

Stephan Lagraulet commented on SOLR-8586:
-

Would it be possible to increase this "100 updates" window, as it seems quite 
low for heavy indexing use cases?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-27 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119180#comment-15119180
 ] 

Yonik Seeley commented on SOLR-8586:


The latter.  We're looking at what is in the index, and that will only have the 
last version of every non-deleted document.
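
To make "the last version of every non-deleted document" concrete, here is a small sketch of which versions would feed the fingerprint. The `(doc_id, version, is_delete)` tuple layout and the `live_versions` helper are assumptions made for illustration; they are not Solr's update log format.

```python
def live_versions(updates):
    """Return the set of versions that would be visible in the index.

    updates: iterable of (doc_id, version, is_delete) tuples in arrival
    order. For each document the highest version wins, so superseded or
    reordered older updates never contribute; documents whose latest
    update is a delete contribute nothing.
    """
    latest = {}
    for doc_id, version, is_delete in updates:
        seen = latest.get(doc_id)
        if seen is None or version > seen[0]:
            latest[doc_id] = (version, is_delete)
    return {version for version, deleted in latest.values() if not deleted}
```

Since only the surviving versions matter, two replicas that received the same updates in different orders end up with the same set, and therefore the same fingerprint.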

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-26 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118596#comment-15118596
 ] 

Yonik Seeley commented on SOLR-8586:


bq. I wonder if adding hashes of the version might be prone to problems if the 
version of any given document tends to be identical to many other documents

Nope, versions are unique to a shard (the leader assigns a unique version to 
every update).

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-26 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117982#comment-15117982
 ] 

David Smiley commented on SOLR-8586:


Could you please clarify what this issue is all about? I don't get it.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-26 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118332#comment-15118332
 ] 

Yonik Seeley commented on SOLR-8586:


bq. Could you please clarify what this issue is all about? I don't get it.

Are you familiar with PeerSync?  I just linked  SOLR-8129 as well.

PeerSync currently checks for replicas being in-sync by looking at the last 100 
updates, and if there are only a few updates missing (judged by a sufficient 
overlap of those updates) it will grab the missing updates from the peer and 
then assume that it is in sync.  For whatever reason, updates can sometimes get 
wildly reordered, and looking at the last N updates is not sufficient.  
Hopefully "Implement hash over all documents to check for shard 
synchronization" should now make sense?
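
A heavily simplified sketch of the window-based check described above shows why it can be fooled. The `window_sync` function and its "any overlap is sufficient" rule are assumptions for illustration; the real PeerSync logic is more involved.

```python
def window_sync(replica, leader, window=100):
    """Simplified window check: compare only the most recent versions.

    replica and leader are sets of integer versions; replica is updated
    in place. Returns True when the replica is *assumed* to be in sync.
    """
    leader_recent = sorted(leader)[-window:]
    replica_recent = set(sorted(replica)[-window:])
    overlap = [v for v in leader_recent if v in replica_recent]
    if not overlap:
        return False  # too far behind: a full replication would be needed
    # Fetch whatever is missing from the leader's recent window.
    replica.update(v for v in leader_recent if v not in replica)
    return True
```

If an old update never reached the replica (for example because of wild reordering), it lies outside the window: the check still reports success while the replica remains out of sync, which is the gap a hash over all versions is meant to close.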

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-26 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118579#comment-15118579
 ] 

David Smiley commented on SOLR-8586:


Ah, ok.  I wasn't familiar with PeerSync; thanks for educating me.

I wonder if adding hashes of the version might be prone to problems if the 
version of any given document tends to be identical to many other documents if 
they were added at once, and assuming a timestamp based version.  Just throwing 
that out there; maybe it wouldn't be a problem and/or too unlikely to worry 
about, all things considered.

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Attachments: SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-22 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112702#comment-15112702
 ] 

Ishan Chattopadhyaya commented on SOLR-8586:


A bloom filter with all versions, maybe?

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-22 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112725#comment-15112725
 ] 

Yonik Seeley commented on SOLR-8586:


A bloom filter would allow one to estimate (with a known error) if a specific 
version is contained within the index.  But it's not clear how we would use 
that info.  All we need here is to know if two indexes are in sync or not.

I was thinking of something as simple as
{code}
h = 0
for version in versions:
  h += hash(version)
{code}
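
A runnable version of the loop above, as a minimal sketch. The thread does not say which hash function is meant, so the `mix64` mixer (the splitmix64 finalizer) and the wraparound at 64 bits are assumptions chosen for illustration, not what the patch does.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """Arbitrary 64-bit mixer (splitmix64 finalizer); an assumed hash function."""
    x &= MASK64  # also maps negative versions into the 64-bit range
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def fingerprint(versions):
    """Sum of per-version hashes modulo 2**64.

    Addition is commutative, so the result does not depend on the order
    in which the versions are visited.
    """
    h = 0
    for version in versions:
        h = (h + mix64(version)) & MASK64
    return h
```

Hashing each version before adding matters: summing the raw versions would let different sets with the same total collide trivially.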


> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)




[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-22 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112734#comment-15112734
 ] 

Ishan Chattopadhyaya commented on SOLR-8586:


I see. My initial thought was that a bloom filter from one replica could be 
compared against a bloom filter from another replica (bitwise), to arrive at 
the same checking. And also, it could be re-used later for other purposes, 
if needed (maybe to find out a missing update, i.e. by running a loop over all 
updates one replica has and comparing against the bloom filter of the replica 
that has the missing update; but I haven't thought about this carefully 
enough). However, your logic seems to do the needful, and comparing two longs 
is surely faster than comparing two bit arrays (or two arrays of longs).

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8586) Implement hash over all documents to check for shard synchronization

2016-01-22 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112751#comment-15112751
 ] 

Yonik Seeley commented on SOLR-8586:


bq. My initial thought was that a bloom filter from one replica could be 
compared against a bloom filter from another replica (bitwise), to arrive at 
the same checking. 

We'd need to figure out how big of a bloom filter would be needed to avoid a 
false match (no idea, off the top of my head).

For adding up good hashes, 64 bits feels like it should be plenty.  We could 
always easily extend that by accumulating in multiple buckets (the bucket being 
chosen by either a few bits of the hash, or a completely different hash).
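
The multiple-bucket extension could look roughly like this. It is a sketch only: the bucket count, the use of the hash value modulo the bucket count to pick a bucket, and the `mix64` mixer are all assumptions.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """Arbitrary 64-bit mixer (splitmix64 finalizer); an assumed hash function."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def bucketed_fingerprint(versions, num_buckets=4):
    """Accumulate hashes into several 64-bit buckets instead of one.

    The bucket is picked from the hash itself, so the result is still
    independent of the order of the versions.
    """
    buckets = [0] * num_buckets
    for version in versions:
        h = mix64(version)
        i = h % num_buckets
        buckets[i] = (buckets[i] + h) & MASK64
    return tuple(buckets)
```

Two indexes are then considered in sync only if every bucket matches, which widens the fingerprint to `num_buckets * 64` bits at the cost of a slightly larger value to exchange.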

> Implement hash over all documents to check for shard synchronization
> 
>
> Key: SOLR-8586
> URL: https://issues.apache.org/jira/browse/SOLR-8586
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
>
> An order-independent hash across all of the versions in the index should 
> suffice.  The hash itself is pretty easy, but we need to figure out 
> when/where to do this check (for example, I think PeerSync is currently used 
> in multiple contexts and this check would perhaps not be appropriate for all 
> PeerSync calls?)



