[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667956#comment-15667956
 ] 

Ishan Chattopadhyaya commented on SOLR-9506:


Can we resolve this issue, since it seems it was released as part of 6.3.0? (I 
will open a separate issue for the problem I described two comments earlier.)

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint
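
As a rough illustration of the idea in the description: if a fingerprint is built from order-independent parts (a maximum version, a sum of per-version hashes, a count), per-segment results can be merged instead of rescanning the whole index. The class, field, and method names below are hypothetical and simplified; this is a sketch under those assumptions, not Solr's IndexFingerprint implementation.

```java
import java.util.List;

// Hypothetical, simplified model of a fingerprint; not Solr's actual class.
final class SegmentFingerprintSketch {
    final long maxVersion;   // highest version seen
    final long versionsHash; // order-independent sum of per-version hashes
    final long numVersions;  // how many versions were hashed

    SegmentFingerprintSketch(long maxVersion, long versionsHash, long numVersions) {
        this.maxVersion = maxVersion;
        this.versionsHash = versionsHash;
        this.numVersions = numVersions;
    }

    // Illustrative hash of a single version (an assumption, not Solr's hash).
    static long mix(long version) {
        return version * 0x9E3779B97F4A7C15L;
    }

    // Fingerprint of one segment, given the versions of its documents.
    static SegmentFingerprintSketch ofVersions(long[] versions) {
        long max = 0;
        long hash = 0;
        for (long v : versions) {
            max = Math.max(max, v);
            hash += mix(v);
        }
        return new SegmentFingerprintSketch(max, hash, versions.length);
    }

    // Merge two fingerprints; the result does not depend on argument order.
    static SegmentFingerprintSketch combine(SegmentFingerprintSketch a,
                                            SegmentFingerprintSketch b) {
        return new SegmentFingerprintSketch(
            Math.max(a.maxVersion, b.maxVersion),
            a.versionsHash + b.versionsHash,
            a.numVersions + b.numVersions);
    }

    // Whole-index fingerprint built by merging per-segment fingerprints.
    static SegmentFingerprintSketch ofSegments(List<long[]> segments) {
        SegmentFingerprintSketch total = new SegmentFingerprintSketch(0, 0, 0);
        for (long[] segment : segments) {
            total = combine(total, ofVersions(segment));
        }
        return total;
    }
}
```

With this shape, only segments that changed would need their part recomputed; the rest could come from a cache.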



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667801#comment-15667801
 ] 

Ishan Chattopadhyaya commented on SOLR-9506:


I see.. I saw it was unresolved, and I thought it didn't make it into 6.3 yet. 
I'll see if it made it into 6.3, and open a new ticket if that's the case.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667795#comment-15667795
 ] 

Noble Paul commented on SOLR-9506:
--

Ishan, I guess this is already fixed in 6.3, so we may need to open another 
ticket.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601820#comment-15601820
 ] 

ASF subversion and git services commented on SOLR-9506:
---

Commit 265d425b00181dd384fa963e46dc35b92b7e02c0 in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=265d425 ]

SOLR-9506: cache IndexFingerprint for each segment





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601675#comment-15601675
 ] 

ASF subversion and git services commented on SOLR-9506:
---

Commit 184b0f221559eaed5f273b1907e8af07bc95fec9 in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=184b0f2 ]

SOLR-9506: cache IndexFingerprint for each segment





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599826#comment-15599826
 ] 

Pushkar Raste commented on SOLR-9506:
-

Yeah, I looked into it. I will try that approach, if I can get to it before 
[~noble.paul] applies the patch. 





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596148#comment-15596148
 ] 

Pushkar Raste commented on SOLR-9506:
-

Don't use the patch for parallelized computation. Parallel streams in Java use a 
shared fork-join pool, so a single bad actor can create havoc.
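
To illustrate the concern: Java parallel streams run on the JVM-wide common ForkJoinPool by default, so one slow task can delay unrelated work that shares it. One alternative is to hand per-segment work to a dedicated, caller-owned executor. The sketch below assumes a hypothetical hashSegment helper and hashing scheme; it is not the code from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: per-segment hashing on a dedicated pool.
final class DedicatedPoolSketch {

    // Illustrative per-segment hash (an assumption, not Solr's hash).
    static long hashSegment(long[] versions) {
        long hash = 0;
        for (long v : versions) {
            hash += v * 0x9E3779B97F4A7C15L;
        }
        return hash;
    }

    // Submits one task per segment to the given pool and sums the results.
    static long hashAll(List<long[]> segments, ExecutorService pool) {
        List<Future<Long>> futures = new ArrayList<>();
        for (long[] segment : segments) {
            Callable<Long> task = () -> hashSegment(segment);
            futures.add(pool.submit(task));
        }
        long total = 0;
        try {
            for (Future<Long> future : futures) {
                total += future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while hashing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment hashing failed", e.getCause());
        }
        return total;
    }
}
```

Because the pool is passed in, its size and lifetime stay under the caller's control and other users of the common pool are unaffected.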




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-19 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589649#comment-15589649
 ] 

Pushkar Raste commented on SOLR-9506:
-

[~noble.paul] and [~yo...@apache.org] I was able to put together a test to show 
that the current implementation is broken. 
I will update the patch with the test and a fix by EOD today.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586530#comment-15586530
 ] 

Keith Laban commented on SOLR-9506:
---

How expensive would it be to check numDocs (#4 in Yonik's earlier comment)? I 
think this would be the most straightforward and understandable approach.
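
A minimal sketch of what such a check could look like: store the live document count next to the cached value and recompute when it no longer matches. All class and method names here are hypothetical, the segment model is a toy, and whether numDocs alone is a sufficient check is exactly the question being asked in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a per-segment cache validated by the live doc count.
final class NumDocsCheckedCacheSketch {

    // Toy segment: fixed versions plus a live-docs bitmap changed by deletes.
    static final class Segment {
        final Object coreKey = new Object(); // unchanged by deletes
        final long[] versions;
        final boolean[] live;

        Segment(long... versions) {
            this.versions = versions;
            this.live = new boolean[versions.length];
            Arrays.fill(this.live, true);
        }

        void delete(int doc) { live[doc] = false; }

        int numDocs() {
            int count = 0;
            for (boolean isLive : live) if (isLive) count++;
            return count;
        }
    }

    // Cached hash together with the numDocs it was computed for.
    static final class Entry {
        final int numDocs;
        final long hash;
        Entry(int numDocs, long hash) { this.numDocs = numDocs; this.hash = hash; }
    }

    private final Map<Object, Entry> cache = new HashMap<>();
    int recomputations = 0; // exposed only so the behavior can be observed

    // Illustrative hash over live documents (an assumption, not Solr's hash).
    static long computeHash(Segment s) {
        long hash = 0;
        for (int i = 0; i < s.versions.length; i++) {
            if (s.live[i]) hash += s.versions[i] * 0x9E3779B97F4A7C15L;
        }
        return hash;
    }

    // Returns the cached hash unless the live doc count has changed.
    long hash(Segment s) {
        Entry entry = cache.get(s.coreKey);
        int liveNow = s.numDocs();
        if (entry == null || entry.numDocs != liveNow) {
            entry = new Entry(liveNow, computeHash(s));
            cache.put(s.coreKey, entry);
            recomputations++;
        }
        return entry.hash;
    }
}
```

In this toy model, counting live documents is a linear scan; how cheap the equivalent lookup is in a real index reader is not something this sketch shows.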




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586419#comment-15586419
 ] 

ASF subversion and git services commented on SOLR-9506:
---

Commit ffa5c4ba2c2d6fa6bb943a70196aad0058333fa2 in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ffa5c4b ]

SOLR-9506: reverting the previous commit





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585918#comment-15585918
 ] 

Yonik Seeley commented on SOLR-9506:


Please do.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585801#comment-15585801
 ] 

Noble Paul commented on SOLR-9506:
--

If the above case fails, let's revert the commit and revisit the fingerprint 
computation




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585781#comment-15585781
 ] 

ASF subversion and git services commented on SOLR-9506:
---

Commit 9aa764a54f50eca5a8ef805bdb29e4ad90fcce5e in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9aa764a ]

* SOLR-9506: cache IndexFingerprint for each segment





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585759#comment-15585759
 ] 

Yonik Seeley commented on SOLR-9506:



The above manual test only exhibited this bad behavior after the commit today.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585756#comment-15585756
 ] 

Yonik Seeley commented on SOLR-9506:



Not sure I understand... are you suggesting a workaround in PeerSync 
(recoverWithReplicationOnly) to work around the correctness problem caused by 
this commit?





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585736#comment-15585736
 ] 

Pushkar Raste commented on SOLR-9506:
-

There is a lot of confusion going on here. Would the above test not fail if we 
did not cache the per-segment index fingerprint?
If yes, then we should revert the commit; if not, we should open a new issue to 
fix the index fingerprint computation altogether. 





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585709#comment-15585709
 ] 

Yonik Seeley commented on SOLR-9506:


Pretty simple to try out:
{code}
bin/solr start -e techproducts

http://localhost:8983/solr/techproducts/query?q=*:*
  "response":{"numFound":32,"start":0,"docs":[

http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":8803836617561505377,
    "numVersions":32,
    "numDocs":32,
    "maxDoc":32}}

curl "http://localhost:8983/solr/techproducts/update?commit=true" \
  -H "Content-Type: text/xml" -d '<delete><id>apple</id></delete>'

# this shows that the delete is visible
http://localhost:8983/solr/techproducts/query?q=*:*
  "response":{"numFound":31,"start":0,"docs":[

#fingerprint returns the same thing
http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":8803836617561505377,
    "numVersions":32,
    "numDocs":32,
    "maxDoc":32}}

bin/solr stop -all
bin/solr start -e techproducts

#after a restart, fingerprint returns something different
http://localhost:8983/solr/techproducts/get?getFingerprint=9223372036854775807
{
  "fingerprint":{
    "maxVersionSpecified":9223372036854775807,
    "maxVersionEncountered":1548538118066405376,
    "maxInHash":1548538118066405376,
    "versionsHash":-131508374066080,
    "numVersions":31,
    "numDocs":31,
    "maxDoc":32}}

{code}




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585707#comment-15585707
 ] 

Pushkar Raste commented on SOLR-9506:
-

I think what Yonik is implying is that if, for some reason, a replica does not 
apply a delete properly, the index fingerprint would still check out, and that 
would be a problem.

Considering the issues with {{PeerSync}}, should we add an option 
{{recoverWithReplicationOnly}}? For most setups, I doubt people would have 
hundreds of thousands of records in the updateLog, in which case almost no one 
is using {{PeerSync}} anyway.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585700#comment-15585700
 ] 

Yonik Seeley commented on SOLR-9506:



Yep.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585694#comment-15585694
 ] 

Keith Laban commented on SOLR-9506:
---

Are you implying that if you add a document, commit it, compute the index 
fingerprint and cache the segments, then delete that document, commit that 
change, and compute the fingerprint again with the cached segment fingerprint, 
you will end up with the same index fingerprint?




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585688#comment-15585688
 ] 

Pushkar Raste commented on SOLR-9506:
-

I.e., we really need to fix the IndexFingerprint computation, whether or not we 
cache. I will open a separate issue to fix it in that case.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585683#comment-15585683
 ] 

Yonik Seeley commented on SOLR-9506:


"Right... the core cache key does not change, even if there are deletes for the 
segment."

So the cache key ignores deleted documents, while the value being cached does 
not.  It's a fundamental mis-match.
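
The mismatch can be reproduced with a toy model: a cache keyed by an object that stays the same across deletes, holding a value that is computed over live documents only. Everything below (class names, the hash, the segment model) is hypothetical and only meant to show the shape of the problem, not Solr's or Lucene's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a delete-insensitive key caching a delete-sensitive value.
final class StaleFingerprintSketch {

    // Toy segment: the core key never changes, but deletes flip live bits.
    static final class Segment {
        final Object coreKey = new Object();
        final long[] versions;
        final boolean[] live;

        Segment(long... versions) {
            this.versions = versions;
            this.live = new boolean[versions.length];
            Arrays.fill(this.live, true);
        }

        void delete(int doc) { live[doc] = false; }
    }

    private final Map<Object, Long> cache = new HashMap<>();

    // Illustrative hash over live documents (an assumption, not Solr's hash).
    static long computeHash(Segment s) {
        long hash = 0;
        for (int i = 0; i < s.versions.length; i++) {
            if (s.live[i]) hash += s.versions[i] * 0x9E3779B97F4A7C15L;
        }
        return hash;
    }

    // Looks up by core key only, so a later delete is never noticed.
    long cachedHash(Segment s) {
        return cache.computeIfAbsent(s.coreKey, key -> computeHash(s));
    }
}
```

After a delete, cachedHash keeps returning the value computed before the delete, while computeHash on the same segment gives a different result. That matches the behavior reported above, where the fingerprint only changed after a restart.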




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585677#comment-15585677
 ] 

Pushkar Raste commented on SOLR-9506:
-

I don't see why caching the index fingerprint per segment and using that later 
would be different from computing the index fingerprint on the entire index by 
going through one segment at a time. 

I tried to come up with scenarios where caching solution would fail and 
original solution would not, but could not think of any. 





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585635#comment-15585635
 ] 

Yonik Seeley commented on SOLR-9506:


Hmmm, why was this committed?
See my comments regarding deleted documents that were never addressed.  What 
was committed will now result in incorrect fingerprints being returned.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585612#comment-15585612
 ] 

Pushkar Raste commented on SOLR-9506:
-

I did not upload the patch with {{parallelStream()}}. In {{SolrIndexSearcher}}, where we 
compute and cache the per-segment IndexFingerprint, try switching from {{stream()}} 
to {{parallelStream()}} and you will see that {{PeerSyncTest}} fails. 




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585570#comment-15585570
 ] 

Noble Paul commented on SOLR-9506:
--

Which test? I did not find it.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585553#comment-15585553
 ] 

ASF subversion and git services commented on SOLR-9506:
---

Commit bb907a2983b4a7eba8cb4d527a859f1b312bdc79 in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb907a2 ]

* SOLR-9506: cache IndexFingerprint for each segment





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-07 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556139#comment-15556139
 ] 

Pushkar Raste commented on SOLR-9506:
-

I computed the hash without regard to deleted docs and cached it. All the tests are 
passing even without doing steps #2 and #3. I also verified that the index 
fingerprint computed on the entire index matches the fingerprint computed 
from the individual segments (even after deletions).




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-07 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556132#comment-15556132
 ] 

Pushkar Raste commented on SOLR-9506:
-

I also found some weird behavior. If I use {{parallelStream}} to compute 
segment fingerprints in parallel and then reduce them to the index fingerprint on 
the index searcher, the test fails. Why should the order of computation and reduction 
matter in this case?
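
One possible explanation, offered as an assumption since the thread does not confirm the cause: {{Stream.reduce}} only gives the same answer sequentially and in parallel when the identity value is a true neutral element and the accumulator is associative and free of side effects. A reduction that mutates a shared identity object in place can pass with {{stream()}} and fail with {{parallelStream()}}. The sketch below uses a hypothetical, simplified {{Fp}} class (not Solr's {{IndexFingerprint}}) whose merge always returns a new object, so the order of reduction does not matter.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a fingerprint; not Solr's IndexFingerprint. */
final class Fp {
    final long versionsHash;
    final long numVersions;
    final long maxVersion;

    Fp(long versionsHash, long numVersions, long maxVersion) {
        this.versionsHash = versionsHash;
        this.numVersions = numVersions;
        this.maxVersion = maxVersion;
    }

    /** Neutral element: combining any Fp with EMPTY leaves its fields unchanged. */
    static final Fp EMPTY = new Fp(0L, 0L, Long.MIN_VALUE);

    /** Associative, side-effect-free merge that always returns a new object. */
    static Fp combine(Fp a, Fp b) {
        return new Fp(a.versionsHash + b.versionsHash,
                      a.numVersions + b.numVersions,
                      Math.max(a.maxVersion, b.maxVersion));
    }

    static Fp reduceSequential(List<Fp> segments) {
        return segments.stream().reduce(EMPTY, Fp::combine);
    }

    static Fp reduceParallel(List<Fp> segments) {
        return segments.parallelStream().reduce(EMPTY, Fp::combine);
    }

    public static void main(String[] args) {
        List<Fp> segs = Arrays.asList(
            new Fp(11, 3, 102), new Fp(29, 3, 105), new Fp(7, 1, 106));
        Fp s = reduceSequential(segs);
        Fp p = reduceParallel(segs);
        // Both reductions agree because combine is associative and EMPTY is neutral.
        System.out.println(s.versionsHash == p.versionsHash
            && s.numVersions == p.numVersions
            && s.maxVersion == p.maxVersion);
    }
}
```

If the real reduce step updates one of its arguments instead of allocating a result, switching it to this shape would be one thing to try before ruling out parallel computation.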




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552125#comment-15552125
 ] 

Pushkar Raste commented on SOLR-9506:
-

Updated the patch and added a scenario in {{PeerSyncTest}} about a replica missing an 
update.
Looks like we don't need to remove the live docs check 
{{if (liveDocs != null && !liveDocs.get(doc)) continue;}}




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548767#comment-15548767
 ] 

Yonik Seeley commented on SOLR-9506:


A few random points after browsing this issue...

bq. We can not use current versionsHash (unless we cache all the individual 
version numbers), as it is not additive.

The current versionsHash is additive (it must be, because as you say segments 
may not line up between leader and replica, and document order may differ).  
When caching per segment, keep this property by simply adding the segment 
fingerprints together.  Am I missing something here?
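
A minimal sketch of that additive property, using the leader/replica version numbers from the scenario raised earlier on this issue. The {{hash}} function here is a placeholder mixer chosen for illustration, not the hash Solr uses, and the class name is hypothetical. Because the index-level value is a plain sum, it does not depend on how versions are split across segments or on document order.

```java
import java.util.Arrays;
import java.util.List;

final class AdditiveHashDemo {
    /** Placeholder per-version hash (an assumption, not Solr's hash function). */
    static long hash(long v) {
        long h = v * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    /** Hash of one segment: the sum of the hashes of its versions. */
    static long segmentHash(List<Long> versions) {
        long sum = 0L;
        for (long v : versions) {
            sum += hash(v);
        }
        return sum;
    }

    /** Index-level hash: the sum of the per-segment hashes. */
    static long indexHash(List<List<Long>> segments) {
        long sum = 0L;
        for (List<Long> seg : segments) {
            sum += segmentHash(seg);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Leader: seg1 = {100, 101, 102}, seg2 = {103, 104, 105}
        List<List<Long>> leader = Arrays.asList(
            Arrays.asList(100L, 101L, 102L), Arrays.asList(103L, 104L, 105L));
        // Replica: same versions, different segment boundaries and document order
        List<List<Long>> replica = Arrays.asList(
            Arrays.asList(101L, 100L), Arrays.asList(105L, 102L, 104L, 103L));
        System.out.println(indexHash(leader) == indexHash(replica)); // prints true
    }
}
```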

bq. private final Map

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint






[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-05 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548510#comment-15548510
 ] 

Noble Paul commented on SOLR-9506:
--

https://github.com/apache/lucene-solr/pull/84

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint






[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-29 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532555#comment-15532555
 ] 

Pushkar Raste commented on SOLR-9506:
-

Discussed with [~noble.paul]. 
We should cache the fingerprint for a segment only if *maxVersion specified* > 
*max version in the segment*.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531954#comment-15531954
 ] 

Noble Paul commented on SOLR-9506:
--

A few quick points:

{code:java}
  // Map is not concurrent, but since computeIfAbsent is idempotent, it should
  // be alright for two threads to compute values for the same key.
  private final Map> 
perSegmentFingerprintCache = new WeakHashMap<>();
{code}

The map has to be thread-safe; the javadocs say that thread safety depends on the 
map implementation.
We really don't need to keep a cache per version. The reason is, we only give 
one version number, and only the latest segment will have to compute 
anything other than the full fingerprint. As soon as a new segment is added, 
everything other than the full fingerprint becomes useless. So, the 
solution is: if maxVersion is Long.MAX_VALUE, cache it; otherwise recompute it 
every time. So, the cache should be

{code:java}
  private final Map 
perSegmentFingerprintCache = Collections.synchronizedMap(new WeakHashMap<>());
{code}
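
A rough sketch of how such a cache could be used, under stated assumptions: the class {{SegmentFingerprintCache}}, its type parameters, and the {{compute}} callback are hypothetical names introduced for illustration and are not taken from the patch. It caches only the full fingerprint (maxVersion equal to {{Long.MAX_VALUE}}) and recomputes in every other case.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-segment cache. K stands for whatever identifies a segment,
 * F for the fingerprint type; neither is a real Solr class here.
 */
final class SegmentFingerprintCache<K, F> {
    // Weak keys let entries disappear once the segment key is no longer referenced.
    private final Map<K, F> cache = Collections.synchronizedMap(new WeakHashMap<>());

    F get(K segmentKey, long maxVersion, Function<K, F> compute) {
        if (maxVersion != Long.MAX_VALUE) {
            // A bounded fingerprint is not reusable later, so never cache it.
            return compute.apply(segmentKey);
        }
        F cached = cache.get(segmentKey);
        if (cached == null) {
            // Two threads may both compute here; that is harmless as long as
            // compute is idempotent, and the later put simply overwrites.
            cached = compute.apply(segmentKey);
            cache.put(segmentKey, cached);
        }
        return cached;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        SegmentFingerprintCache<Object, Long> c = new SegmentFingerprintCache<>();
        Object segment = new Object();
        System.out.println(c.get(segment, Long.MAX_VALUE, k -> 42L));
        System.out.println(c.size());
    }
}
```

The get-then-put sequence is deliberately not atomic. Each individual map call is synchronized, which is enough when a duplicate computation is acceptable.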
 




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523631#comment-15523631
 ] 

Noble Paul commented on SOLR-9506:
--

the cumulative numDocs will be same anyway

I guess it can be reproduced as follows

# take a 2 replica shard
# index and commit multiple times
# delete one doc and commit
# bring down replica
# optimize leader
# bring up replica

I guess this will lead to a full replication








[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523564#comment-15523564
 ] 

Pushkar Raste commented on SOLR-9506:
-

I think what [~ichattopadhyaya] is hinting at is that if {{numDocs}} accounts 
only for live (active) docs, then once documents are deleted in a segment, 
{{numDocs}} in the cached fingerprint might be wrong. 

Surprisingly, the following test cases passed with my POC:
1. {{PeerSyncTest}}
2. {{PeerSyncReplicationTest}}
3. {{SyncSliceTest}}

In the worst case, we can at least parallelize the fingerprint computation. 




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523566#comment-15523566
 ] 

Pushkar Raste commented on SOLR-9506:
-

Adding [~ysee...@gmail.com] in the loop




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523490#comment-15523490
 ] 

Noble Paul commented on SOLR-9506:
--

no. segments don't change





[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523485#comment-15523485
 ] 

Ishan Chattopadhyaya commented on SOLR-9506:


We should keep in mind that previously written segments can change if there are 
deletes. Maybe we should recompute the per-segment fingerprints upon deletion 
in that segment.
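
To illustrate the concern, here is a small sketch under stated assumptions. It models a segment as an array of versions plus a {{BitSet}} of live docs, and keys the cache by segment name and live-doc count so that a delete forces a recompute. The class, the key scheme, and the {{hash}} function are all hypothetical and are not what the patch does.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class LiveDocsFingerprintDemo {
    /** Placeholder per-version hash (an assumption, not Solr's hash function). */
    static long hash(long v) {
        long h = v * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    /** Sums the hashes of the versions whose document is still live. */
    static long fingerprint(long[] versions, BitSet liveDocs) {
        long sum = 0L;
        for (int doc = 0; doc < versions.length; doc++) {
            if (!liveDocs.get(doc)) continue; // skip deleted docs
            sum += hash(versions[doc]);
        }
        return sum;
    }

    // Not thread-safe; kept simple for illustration.
    private final Map<String, Long> cache = new HashMap<>();
    int computations = 0;

    /** Cache keyed by segment name plus live-doc count, so deletes miss the cache. */
    long cachedFingerprint(String segmentName, long[] versions, BitSet liveDocs) {
        String key = segmentName + "/" + liveDocs.cardinality();
        Long hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        computations++;
        long fp = fingerprint(versions, liveDocs);
        cache.put(key, fp);
        return fp;
    }

    public static void main(String[] args) {
        long[] versions = {100L, 101L, 102L};
        BitSet live = new BitSet(3);
        live.set(0, 3); // all three docs live
        LiveDocsFingerprintDemo demo = new LiveDocsFingerprintDemo();
        long before = demo.cachedFingerprint("seg1", versions, live);
        live.clear(1); // delete the doc with version 101
        long after = demo.cachedFingerprint("seg1", versions, live);
        System.out.println(before != after);
        System.out.println(demo.computations);
    }
}
```

Whether the cached value should include or ignore deleted docs is exactly the open question in this discussion; the sketch only shows that a fingerprint restricted to live docs changes when a doc in an existing segment is deleted.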




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523273#comment-15523273
 ] 

Pushkar Raste commented on SOLR-9506:
-

In short, you are suggesting that when we cache the fingerprint for individual 
segments, we keep a list of the version numbers in those segments around? That 
would be billions of {{Long}} values cached, which might be counter-productive.




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523257#comment-15523257
 ] 

Noble Paul commented on SOLR-9506:
--

[~praste] I've attached a sample program which computes versionsHash for leader 
and replica using the above example




[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste commented on SOLR-9506:
-

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute `versionsHash` from the `versionsHash` of individual segments. 
We can not use the current `versionsHash` (unless we cache all the individual 
version numbers), as it is not additive.  Consider the following scenario:

*Leader segments, versions and hash*
*seg1*: 
  versions: 100, 101, 102
  versionsHash: hash(100) + hash(101) + hash(102)
*seg2*: 
  versions: 103, 104, 105
  versionsHash: hash(103) + hash(104) + hash(105)

*Replica segments, versions and hash*
*seg1*: 
  versions: 100, 101
  versionsHash: hash(100) + hash(101)
*seg2*: 
  versions: 102, 103, 104, 105
  versionsHash: hash(102) + hash(103) + hash(104) + hash(105)

Leader and replica are essentially in sync; however, using the current method 
there is no way to compute and ensure that the cumulative `versionsHash` of the 
leader and replica would match.

* I still need to figure out how to keep the cache in `DefaultSolrCoreState`, so 
that we can reuse the `IndexFingerprint` of individual segments when a new Searcher 
is opened.  

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint


