[jira] [Updated] (SOLR-13511) For SearchHandler, expose "new ResponseBuilder()" to allow override

2019-06-03 Thread Ramsey Haddad (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-13511:
-
Attachment: SOLR-13511.patch

> For SearchHandler, expose "new ResponseBuilder()" to allow override
> ---
>
> Key: SOLR-13511
> URL: https://issues.apache.org/jira/browse/SOLR-13511
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: search
>Reporter: Ramsey Haddad
>Priority: Trivial
>  Labels: easyfix
> Attachments: SOLR-13511.patch
>
>
> This change is all we want upstream. To use this from our plugins, we intend to:
> * Extend ResponseBuilder to carry additional state (and we think others 
> might want to do the same).
> * Use an extended SearchHandler that simply creates our ResponseBuilder 
> instead of the standard one.
> * Extend QueryComponent to perform our extra behavior when it sees our 
> ResponseBuilder instead of the standard one.
> * Change config to use our SearchHandler for the requestHandler with 
> name="/select" and our QueryComponent for the searchComponent with 
> name="query".






[jira] [Created] (SOLR-13511) For SearchHandler, expose "new ResponseBuilder()" to allow override

2019-06-03 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-13511:


 Summary: For SearchHandler, expose "new ResponseBuilder()" to 
allow override
 Key: SOLR-13511
 URL: https://issues.apache.org/jira/browse/SOLR-13511
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Reporter: Ramsey Haddad


This change is all we want upstream. To use this from our plugins, we intend to:

* Extend ResponseBuilder to carry additional state (and we think others might 
want to do the same).
* Use an extended SearchHandler that simply creates our ResponseBuilder 
instead of the standard one.
* Extend QueryComponent to perform our extra behavior when it sees our 
ResponseBuilder instead of the standard one.
* Change config to use our SearchHandler for the requestHandler with 
name="/select" and our QueryComponent for the searchComponent with 
name="query".






Intervals vs Span guidance

2019-03-28 Thread Ramsey Haddad (BLOOMBERG/ LONDON)
We are building our needed customizations/extensions on Solr/Lucene 7.7 or 8.0 
or later. We are unclear on whether/when to use Intervals vs Span.

We know that Intervals is still maturing (new functionality in 8.0, and 
probably ongoing for a while?).

But what is the overall intention/guidance? "If you need X, then use Spans." 
"If you need Y, then use Intervals." "After the year 20xy, we expect everyone 
to be using Intervals." ??

Any opinions valued.

Thanks,
Ramsey.

[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-10-24 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216636#comment-16216636
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 10/24/17 10:10 AM:
-

OK. And here's a refined patch that also adds {{jstack}} to the Windows 
{{solr.cmd}} file.



was (Author: rwhaddad):
OK. And here's is refined patch that also adds {{jstack}} to the windows 
{{solr.cmd}} file.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-10-24 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216636#comment-16216636
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 10/24/17 10:09 AM:
-

OK. And here's a refined patch that also adds {{jstack}} to the Windows 
{{solr.cmd}} file.



was (Author: rwhaddad):
OK. And here's a refined patch that also adds "jstack" to the Windows solr.cmd 
file.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Updated] (SOLR-11179) Ability to dump jstack

2017-10-24 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-11179:
-
Attachment: SOLR-11179.patch

OK. And here's a refined patch that also adds "jstack" to the Windows solr.cmd 
file.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-08-17 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130055#comment-16130055
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 8/17/17 7:44 AM:
---

Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
{{solr.cmd}} from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.



was (Author: rwhaddad):
Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
{{solr.cmd}} from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-08-17 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130055#comment-16130055
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 8/17/17 7:43 AM:
---

Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
{{solr.cmd}} from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.



was (Author: rwhaddad):
Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
{{solr.cmd}} from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-08-17 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130055#comment-16130055
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 8/17/17 7:42 AM:
---

Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
solr.cmd from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.



was (Author: rwhaddad):
Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
solr.cmd from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to stdout.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Comment Edited] (SOLR-11179) Ability to dump jstack

2017-08-17 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130055#comment-16130055
 ] 

Ramsey Haddad edited comment on SOLR-11179 at 8/17/17 7:42 AM:
---

Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
{{solr.cmd}} from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.



was (Author: rwhaddad):
Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
solr.cmd from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to {{stdout}}.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Updated] (SOLR-11179) Ability to dump jstack

2017-08-17 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-11179:
-
Attachment: SOLR-11179.patch

Here is a patch that incorporates some of your suggestions.
* We don't build/run Solr on Windows. I'd be happy to include changes for 
solr.cmd from someone in a position to test them.
* Yes, I have added a {{-o}} flag as suggested.
* Yes, with the new design, if no output file is specified via {{-o}}, then the 
output will now go to stdout.
* Yes, {{jstack}} needs to be run on the same box, as do many of the other 
commands, including the stars of this {{bin/solr}} script: {{start}} and 
{{stop}}. I'm pretty sure that if you have a suitable {{$\{JAVA_HOME\}}} for 
Solr, then {{jstack}} will be there in {{$\{JAVA_HOME\}/bin}}.
* Yes, documentation added to Solr Ref Guide.


> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch, SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Updated] (SOLR-11179) Ability to dump jstack

2017-08-02 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-11179:
-
Attachment: SOLR-11179.patch

Here is the proposed addition.

> Ability to dump jstack
> --
>
> Key: SOLR-11179
> URL: https://issues.apache.org/jira/browse/SOLR-11179
> Project: Solr
>  Issue Type: New Feature
>  Components: scripts and tools
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-11179.patch
>
>
> Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.






[jira] [Created] (SOLR-11179) Ability to dump jstack

2017-08-02 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-11179:


 Summary: Ability to dump jstack
 Key: SOLR-11179
 URL: https://issues.apache.org/jira/browse/SOLR-11179
 Project: Solr
  Issue Type: New Feature
  Components: scripts and tools
Reporter: Ramsey Haddad
Priority: Minor


Add a "jstack" command to the "bin/solr" script to ease capture of jstacks.







[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-08-02 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: SOLR-10962.patch

Here is the Config API patch, updated because of the int=>Long change in 
SOLR-11052.


> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, 
> SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-11052) reserveCommitDuration from Integer to Long

2017-07-12 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-11052:
-
Attachment: SOLR-11052.patch

Small fix.

> reserveCommitDuration from Integer to Long
> --
>
> Key: SOLR-11052
> URL: https://issues.apache.org/jira/browse/SOLR-11052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Trivial
> Attachments: SOLR-11052.patch
>
>
> reserveCommitDuration gets created as a Long and then stored as an Integer.
> It is used as a Long and hence gets reconverted back from Integer to Long.
> Let's just leave it as a Long the whole time.






[jira] [Updated] (SOLR-11052) reserveCommitDuration from Integer to Long

2017-07-12 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-11052:
-
Security: (was: Public)

> reserveCommitDuration from Integer to Long
> --
>
> Key: SOLR-11052
> URL: https://issues.apache.org/jira/browse/SOLR-11052
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Trivial
> Attachments: SOLR-11052.patch
>
>
> reserveCommitDuration gets created as a Long and then stored as an Integer.
> It is used as a Long and hence gets reconverted back from Integer to Long.
> Let's just leave it as a Long the whole time.






[jira] [Created] (SOLR-11052) reserveCommitDuration from Integer to Long

2017-07-12 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-11052:


 Summary: reserveCommitDuration from Integer to Long
 Key: SOLR-11052
 URL: https://issues.apache.org/jira/browse/SOLR-11052
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: replication (java)
Reporter: Ramsey Haddad
Priority: Trivial


reserveCommitDuration gets created as a Long and then stored as an Integer.
It is used as a Long and hence gets reconverted back from Integer to Long.

Let's just leave it as a Long the whole time.
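
A toy illustration of the round trip (not the actual Solr code; the lookup key 
below is hypothetical):

    import org.apache.solr.common.util.NamedList;

    class ReserveDurationSketch {
      // Before: created as a Long, narrowed to an Integer for storage, then
      // widened back to a Long at the point of use.
      static long before(NamedList<Object> args) {
        Long created = (Long) args.get("reserveCommitDuration"); // hypothetical key
        Integer stored = created.intValue();  // Long -> Integer
        return stored.longValue();            // Integer -> Long again
      }

      // After: left as a Long the whole time.
      static long after(NamedList<Object> args) {
        return (Long) args.get("reserveCommitDuration");
      }
    }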







[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-07-12 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: SOLR-10962.patch

Here is what this looks like using the Config API.

We initially tried to convert all the controls for ReplicationHandler, but it 
became too much of a mess:
* partly messy because the args passed on to IndexFetcher can come from two 
places
* partly because of the work to put in backward-compatibility warnings

So we only changed what we need at the moment.

We ended up partitioning the Info structure work between SolrConfig and 
*Handler in a different way than UpdateHandler does, because:
* we wanted to still allow the legacy default "00:00:10" behavior
* we wanted to keep various ReplicationHandler details local to that class

Also, since the internals work in milliseconds, we thought it simpler to expose 
that to the user.


> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, 
> SOLR-10962.patch, SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-07-04 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: SOLR-10962.patch

This patch takes [~cpoerschke]'s patch and adds [~hossman]'s suggestion.
I will look into [~shalinmangar]'s suggestion within the next week.

> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch, SOLR-10962.patch, SOLR-10962.patch, 
> SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Comment Edited] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-28 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066745#comment-16066745
 ] 

Ramsey Haddad edited comment on SOLR-10962 at 6/28/17 3:48 PM:
---

While I was initially trying to mimic the old structure, I agree that it is 
better to move to what Christine suggests.

Here is the fixed patch.



was (Author: rwhaddad):
While I was initially trying to mimic the old structure, I agree that is better 
to move to what Christine suggests.

Here is the fixed patch.


> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch, SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-28 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: SOLR-10962.patch

While I was initially trying to mimic the old structure, I agree that it is 
better to move to what Christine suggests.

Here is the fixed patch.


> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch, SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-27 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: SOLR-10962.patch

> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: SOLR-10962.patch
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-27 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: (was: patch.SOLR-10962)

> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Updated] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-27 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10962:
-
Attachment: patch.SOLR-10962

> replicationHandler's reserveCommitDuration configurable in SolrCloud mode
> -
>
> Key: SOLR-10962
> URL: https://issues.apache.org/jira/browse/SOLR-10962
> Project: Solr
>  Issue Type: New Feature
>  Components: replication (java)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: patch.SOLR-10962
>
>
> With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
> see the fetch fail and then get restarted from scratch in cases where an 
> index file is deleted after the fetch manifest is computed and before the 
> fetch actually transfers the file. The risk of this happening can be reduced 
> with a higher value of reserveCommitDuration. However, the current 
> configuration only allows this value to be adjusted for "master" mode. This 
> change allows the value to also be changed when using "SolrCloud" mode.
> https://lucene.apache.org/solr/guide/6_6/index-replication.html






[jira] [Created] (SOLR-10962) replicationHandler's reserveCommitDuration configurable in SolrCloud mode

2017-06-27 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-10962:


 Summary: replicationHandler's reserveCommitDuration configurable 
in SolrCloud mode
 Key: SOLR-10962
 URL: https://issues.apache.org/jira/browse/SOLR-10962
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
Reporter: Ramsey Haddad
Priority: Minor


With SolrCloud mode, when doing replication via IndexFetcher, we occasionally 
see the fetch fail and then get restarted from scratch in cases where an index 
file is deleted after the fetch manifest is computed and before the fetch 
actually transfers the file. The risk of this happening can be reduced with a 
higher value of reserveCommitDuration. However, the current configuration only 
allows this value to be adjusted for "master" mode. This change allows the 
value to also be changed when using "SolrCloud" mode.

https://lucene.apache.org/solr/guide/6_6/index-replication.html
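
A toy model of what reserveCommitDuration controls (not Solr's actual classes): 
a commit generation is reserved for a window of time, files of a reserved 
commit are not deleted, and a longer window gives in-flight fetches time to 
finish transferring the files named in the manifest.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class CommitReservationsSketch {
      // commit generation -> wall-clock millis until which it stays reserved
      private final Map<Long, Long> reserves = new ConcurrentHashMap<>();

      // Called when a replica starts fetching the files of this commit.
      void reserve(long generation, long reserveCommitDurationMs) {
        long until = System.currentTimeMillis() + reserveCommitDurationMs;
        reserves.merge(generation, until, Math::max);
      }

      // The deletion policy checks this before removing a commit's files.
      boolean isDeletable(long generation) {
        Long until = reserves.get(generation);
        return until == null || until < System.currentTimeMillis();
      }
    }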






[jira] [Commented] (SOLR-5127) Allow multiple wildcards in hl.fl

2017-03-14 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924077#comment-15924077
 ] 

Ramsey Haddad commented on SOLR-5127:
-

This problem still exists in the code.
But, the patch is fairly old and might need a minor tweak?

Any reason to not have this fix?


> Allow multiple wildcards in hl.fl
> -
>
> Key: SOLR-5127
> URL: https://issues.apache.org/jira/browse/SOLR-5127
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Affects Versions: 3.6.1, 4.4
>Reporter: Sven-S. Porst
> Attachments: highlight-wildcards.patch
>
>
> When a wildcard is present in the hl.fl field, the field is not split up at 
> commas/spaces into components. As a consequence settings like 
> hl.fl=*_highlight,*_data do not work.
> Splitting the string first and evaluating wildcards on each component 
> afterwards would be more powerful and consistent with the documentation.
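
A sketch of that order of operations (the helper below is illustrative, not 
code from the patch):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.regex.Pattern;

    class HlFlExpansionSketch {
      // Split hl.fl on commas/spaces first, then expand wildcards per part,
      // so hl.fl=*_highlight,*_data matches both patterns.
      static List<String> expand(String hlFl, Collection<String> fieldNames) {
        List<String> result = new ArrayList<>();
        for (String part : hlFl.split("[,\\s]+")) {
          if (part.contains("*")) {
            // naive glob-to-regex conversion, good enough for field names
            Pattern glob = Pattern.compile(part.replace("*", ".*"));
            for (String name : fieldNames) {
              if (glob.matcher(name).matches()) result.add(name);
            }
          } else {
            result.add(part);
          }
        }
        return result;
      }
    }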






[jira] [Commented] (SOLR-10112) Prevent DBQs from getting reordered

2017-02-24 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882638#comment-15882638
 ] 

Ramsey Haddad commented on SOLR-10112:
--

We don't do many DBQs, and we only use them to garbage-collect stories that are 
older than 10 days -- so the types of race problems you are worried about are 
not relevant to our specific use of DBQs.

But, still, I'm curious: do you see "Reordered DBQs detected" messages during 
regular use?

We only see them as a side effect of replaying operations during a 
PeerSync. Do you see them outside of PeerSyncs?


> Prevent DBQs from getting reordered
> ---
>
> Key: SOLR-10112
> URL: https://issues.apache.org/jira/browse/SOLR-10112
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>Reporter: Ishan Chattopadhyaya
>
> Reordered DBQs are problematic for various reasons. We might be able to 
> prevent DBQs from getting re-ordered by making sure, at the leader, that all 
> updates before a DBQ have been written successfully on the replicas, and 
> block all updates after the DBQ until the DBQ is written successfully at the 
> replicas.






[jira] [Updated] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visibility

2017-02-20 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10173:
-
Summary: Enable extension/customization of HttpShardHandler by increasing 
visibility  (was: Enable extension/customization of HttpShardHandler by 
increasing visability)

> Enable extension/customization of HttpShardHandler by increasing visibility
> ---
>
> Key: SOLR-10173
> URL: https://issues.apache.org/jira/browse/SOLR-10173
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-10173.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Increase visibility of 2 elements of HttpShardHandlerFactory from "private" 
> to "protected" to facilitate extension of the class. Make 
> ReplicaListTransformer "public" to enable implementation of the interface in 
> custom classes.






[jira] [Updated] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visability

2017-02-20 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-10173:
-
Attachment: solr-10173.patch

> Enable extension/customization of HttpShardHandler by increasing visability
> ---
>
> Key: SOLR-10173
> URL: https://issues.apache.org/jira/browse/SOLR-10173
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public)
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-10173.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Increase visibility of 2 elements of HttpShardHandlerFactory from "private" 
> to "protected" to facilitate extension of the class. Make 
> ReplicaListTransformer "public" to enable implementation of the interface in 
> custom classes.






[jira] [Created] (SOLR-10173) Enable extension/customization of HttpShardHandler by increasing visability

2017-02-20 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-10173:


 Summary: Enable extension/customization of HttpShardHandler by 
increasing visability
 Key: SOLR-10173
 URL: https://issues.apache.org/jira/browse/SOLR-10173
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Ramsey Haddad
Priority: Minor


Increase visibility of 2 elements of HttpShardHandlerFactory from "private" to 
"protected" to facilitate extension of the class. Make ReplicaListTransformer 
"public" to enable implementation of the interface in custom classes.







[jira] [Updated] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership

2016-02-29 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-8760:

Description: 
When we are doing rolling restarts of our Solr servers, we are sometimes 
hitting painfully long times without a shard leader. What happens is that a new 
leader is elected, but first needs to fully sync old updates before it assumes 
the leadership role and accepts new updates. The syncing process is taking 
unusually long because of an interaction between having one of our hourly 
garbage collection DBQs in the update logs and the replaying of old ADDs. If 
there is a single DBQ, and 1000 older ADDs that are getting replayed, then the 
DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. 
But, the thing that is easier to fix is that most of the ADDs getting replayed 
shouldn't need to get replayed in the first place, since they are older than 
ourLowThreshold.

The problem can be fixed by eliminating or by modifying the way that the 
"completeList" term is used to affect the PeerSync lists.

We propose two alternatives to fix this:

FixA: Based on my possibly incomplete understanding of PeerSync, the 
completeList term should be eliminated. If updates older than ourLowThreshold 
need to be replayed, then aren't all the prerequisites for PeerSync violated and 
hence we should fall back to SnapPull? (My gut suspects that a later bug fix to 
PeerSync fixed whatever issue completeList was trying to deal with.)

FixB: The patch that added the completeList term mentions that it is needed for 
the replay of some DELETEs. Well, if that is true and we do need to replay some 
DELETEs older than ourLowThreshold, then there is still no need to replay any 
ADDs older than ourLowThreshold, right??


  was:
When we are doing rolling restarts of our Solr servers, we are sometimes 
hitting painfully long times without a shard leader. What happens is that a new 
leader is elected, but first needs to fully sync old updates before it assumes 
the leadership role and accepts new updates. The syncing process is taking 
unusually long because of an interaction between having one of our hourly 
garbage collection DBQs in the update logs and the replaying of old ADDs. If 
there is a single DBQ, and 1000 older ADDs that are getting replayed, then the 
DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. 
But, the thing that is easier to fix is that most of the ADDs getting replayed 
shouldn't need to get replayed in the first place, since they are older than 
ourLowThreshold.

The problem can be fixed by eliminating or by modifying the way that the 
"completeList" term is used to affect the PeerSync lists.

We propose two alternatives to fix this:

FixA: Based on my possibly incomplete understanding of PeerSync, the 
completeList term should be eliminated. If updates older than ourLowThreshold 
need to be replayed, then aren't all the prerequisites for PeerSync violated and 
hence we should fall back to SnapPull? (My gut suspects that a later bug fix to 
PeerSync fixed whatever issue completeList was trying to deal with.)

FixB: The patch that added the ourLowThreshold term mentions that it is needed 
for the replay of some DELETEs. Well, if that is true and we do need to replay 
some DELETEs older than ourLowThreshold, then there is still no need to replay 
any ADDs older than ourLowThreshold, right??



> PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to 
> stall new leadership
> 
>
> Key: SOLR-8760
> URL: https://issues.apache.org/jira/browse/SOLR-8760
> Project: Solr
>  Issue Type: Bug
>Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch
>
>
> When we are doing rolling restarts of our Solr servers, we are sometimes 
> hitting painfully long times without a shard leader. What happens is that a 
> new leader is elected, but first needs to fully sync old updates before it 
> assumes the leadership role and accepts new updates. The syncing process is 
> taking unusually long because of an interaction between having one of our 
> hourly garbage collection DBQs in the update logs and the replaying of old 
> ADDs. If there is a single DBQ, and 1000 older ADDs that are getting 
> replayed, then the DBQ is replayed 1000 times, instead of once. This itself 
> may be hard to fix. But, the thing that is easier to fix is that most of the 
> ADDs getting replayed shouldn't need to get replayed in the first place, 
> since they are older than ourLowThreshold.
> The problem can be fixed by eliminating or by modifying the way that the 
> "completeList" term is used to affect the PeerSync lists.

[jira] [Commented] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership

2016-02-29 Thread Ramsey Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171913#comment-15171913
 ] 

Ramsey Haddad commented on SOLR-8760:
-

More details about the conditions leading up to this problem are in: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201602.mbox/%3ccac2x+z3at7ileypotx3xzrp5qysklaatgm-xtjn1a8zpxus...@mail.gmail.com%3E


> PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to 
> stall new leadership
> 
>
> Key: SOLR-8760
> URL: https://issues.apache.org/jira/browse/SOLR-8760
> Project: Solr
>  Issue Type: Bug
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch
>
>
> When we are doing rolling restarts of our Solr servers, we are sometimes 
> hitting painfully long times without a shard leader. What happens is that a 
> new leader is elected, but first needs to fully sync old updates before it 
> assumes the leadership role and accepts new updates. The syncing process is 
> taking unusually long because of an interaction between having one of our 
> hourly garbage collection DBQs in the update logs and the replaying of old 
> ADDs. If there is a single DBQ, and 1000 older ADDs that are getting 
> replayed, then the DBQ is replayed 1000 times, instead of once. This itself 
> may be hard to fix. But, the thing that is easier to fix is that most of the 
> ADDs getting replayed shouldn't need to get replayed in the first place, 
> since they are older than ourLowThreshold.
> The problem can be fixed by eliminating or by modifying the way that the 
> "completeList" term is used to affect the PeerSync lists.
> We propose two alternatives to fix this:
> FixA: Based on my possibly incomplete understanding of PeerSync, the 
> completeList term should be eliminated. If updates older than ourLowThreshold 
> need to be replayed, then aren't all the prerequisites for PeerSync violated 
> and hence we should fall back to SnapPull? (My gut suspects that a later bug 
> fix to PeerSync fixed whatever issue completeList was trying to deal with.)
> FixB: The patch that added the ourLowThreshold term mentions that it is 
> needed for the replay of some DELETEs. Well, if that is true and we do need 
> to replay some DELETEs older than ourLowThreshold, then there is still no 
> need to replay any ADDs older than ourLowThreshold, right??






[jira] [Updated] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership

2016-02-29 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-8760:

Attachment: solr-8760-fixB.patch
solr-8760-fixA.patch

> PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to 
> stall new leadership
> 
>
> Key: SOLR-8760
> URL: https://issues.apache.org/jira/browse/SOLR-8760
> Project: Solr
>  Issue Type: Bug
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-8760-fixA.patch, solr-8760-fixB.patch
>
>
> When we are doing rolling restarts of our Solr servers, we are sometimes 
> hitting painfully long times without a shard leader. What happens is that a 
> new leader is elected, but first needs to fully sync old updates before it 
> assumes the leadership role and accepts new updates. The syncing process is 
> taking unusually long because of an interaction between having one of our 
> hourly garbage collection DBQs in the update logs and the replaying of old 
> ADDs. If there is a single DBQ, and 1000 older ADDs that are getting 
> replayed, then the DBQ is replayed 1000 times, instead of once. This itself 
> may be hard to fix. But, the thing that is easier to fix is that most of the 
> ADDs getting replayed shouldn't need to get replayed in the first place, 
> since they are older than ourLowThreshold.
> The problem can be fixed by eliminating or by modifying the way that the 
> "completeList" term is used to affect the PeerSync lists.
> We propose two alternatives to fix this:
> FixA: Based on my possibly incomplete understanding of PeerSync, the 
> completeList term should be eliminated. If updates older than ourLowThreshold 
> need to be replayed, then aren't all the prerequisites for PeerSync violated 
> and hence we should fall back to SnapPull? (My gut suspects that a later bug 
> fix to PeerSync fixed whatever issue completeList was trying to deal with.)
> FixB: The patch that added the ourLowThreshold term mentions that it is 
> needed for the replay of some DELETEs. Well, if that is true and we do need 
> to replay some DELETEs older than ourLowThreshold, then there is still no 
> need to replay any ADDs older than ourLowThreshold, right??






[jira] [Created] (SOLR-8760) PeerSync replay of ADDs older than ourLowThreshold interacting with DBQs to stall new leadership

2016-02-29 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-8760:
---

 Summary: PeerSync replay of ADDs older than ourLowThreshold 
interacting with DBQs to stall new leadership
 Key: SOLR-8760
 URL: https://issues.apache.org/jira/browse/SOLR-8760
 Project: Solr
  Issue Type: Bug
Reporter: Ramsey Haddad
Priority: Minor


When we are doing rolling restarts of our Solr servers, we are sometimes 
hitting painfully long times without a shard leader. What happens is that a new 
leader is elected, but first needs to fully sync old updates before it assumes 
the leadership role and accepts new updates. The syncing process is taking 
unusually long because of an interaction between having one of our hourly 
garbage collection DBQs in the update logs and the replaying of old ADDs. If 
there is a single DBQ, and 1000 older ADDs that are getting replayed, then the 
DBQ is replayed 1000 times, instead of once. This itself may be hard to fix. 
But, the thing that is easier to fix is that most of the ADDs getting replayed 
shouldn't need to get replayed in the first place, since they are older than 
ourLowThreshold.

The problem can be fixed by eliminating or by modifying the way that the 
"completeList" term is used to affect the PeerSync lists.

We propose two alternatives to fix this:

FixA: Based on my possibly incomplete understanding of PeerSync, the 
completeList term should be eliminated. If updates older than ourLowThreshold 
need to be replayed, then aren't all the prerequisites for PeerSync violated and 
hence we should fall back to SnapPull? (My gut suspects that a later bug fix to 
PeerSync fixed whatever issue completeList was trying to deal with.)

FixB: The patch that added the ourLowThreshold term mentions that it is needed 
for the replay of some DELETEs. Well, if that is true and we do need to replay 
some DELETEs older than ourLowThreshold, then there is still no need to replay 
any ADDs older than ourLowThreshold, right??







Re: PeerSync.java: why "completeList" in handleVersions()?

2016-02-26 Thread Ramsey Haddad
My co-worker, Christine Poerschke, pointed out that the "completeList"
term was added in a change described as "restore old deletes via tlog
so peersync won't reorder".

If the goal was only the replay of deletes older than ourLowThreshold,
then keeping that goal doesn't need to interfere with the performance
fix we want. The code could be changed to:

    if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
    if (completeList && 0 < otherVersion && otherVersion < ourLowThreshold) continue;
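
(In the update log a DELETE is recorded with a negated version, so the sign 
check is what lets old DELETEs through while skipping old ADDs. In context the 
handleVersions() loop body would read roughly like this -- a sketch, not the 
exact code:)

    for (Long otherVersion : otherVersions) {
      // original behavior: stop once we reach updates older than
      // ourLowThreshold, unless the caller asked for the complete list
      if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
      // proposed: even for the complete list, skip old ADDs (positive
      // version) but still replay old DELETEs (negative version)
      if (completeList && 0 < otherVersion && otherVersion < ourLowThreshold) continue;
      // ... otherwise queue otherVersion for replay ...
    }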

On Thu, Feb 25, 2016 at 3:24 PM, Ramsey Haddad <ramsey.had...@gmail.com> wrote:
> Does "!completeList" do anything necessary in the line:
>
> if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
>
> I think the line should simply be:
>
> if (Math.abs(otherVersion) < ourLowThreshold) break;
>
> -
> The inclusion of "!completeList" in this conditional would seem to
> only cause some minor performance penalty: replaying a bunch of ADDs
> that the syncing replica already has ADDed.
>
> BUT: in our set-up this is causing a noticeable problem. In
> particular, we use a large value of nUpdates and we have an hourly DBQ
> for garbage collection. If we do rolling restarts of our replicas,
> then the second restart can leave us leaderless for a long span of
> time.
>
> This happens as follows:
> * Replica1 is leader. Replica1 goes down.
> * Leadership goes to Replica2. It resyncs with all replicas except Replica1.
> * Replica1 returns and resyncs.
> * Replica2 is leader. Replica2 goes down.
> * Leadership goes to Replica3. It resyncs with all replicas except Replica2.
>
> At this point, Replica1 has a longer updatelog (less trimmed -- more
> old updates) than the other replicas. We will refer to these as the
> "ancient" updates.
> Replica3 does a getVersion from Replica1 and Replica4 and receives
> replies from them. The ancient updates will not be contained in
> ourUpdateSet. While the ancient updates are older than
> ourLowThreshold, the check is skipped because of the "completeList"
> term that makes no sense to me. So Replica3 replays the ancient ADDs.
> Say that 1000 of these ADDs are older than a DBQ in Replica3's update
> log? Then the DBQ gets replayed 1000 times ... once after each ADD is
> replayed. Fixing the replay mechanism to only replay the DBQ once
> looks hard because of the code structure. However, these ADDs (and
> hence the DBQ) shouldn't have even been replayed at all!
>
> After the leader Replica3 is synced, it asks Replica1 and Replica4 to
> sync to it. The ancient ADDs have now been merged back onto Replica3's
> update log and so when Replica4 is syncing with Replica3, then
> Replica4 also ends up replaying the ancient ADDs and replaying the DBQ
> 1000 times.
>
> Only when all of this finally completes can Replica3 finally perform
> its role as leader and accept new updates.




PeerSync.java: why "completeList" in handleVersions()?

2016-02-25 Thread Ramsey Haddad
Does "!completeList" do anything necessary in the line:

if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;

I think the line should simply be:

if (Math.abs(otherVersion) < ourLowThreshold) break;

-
The inclusion of "!completeList" in this conditional would seem to
only cause some minor performance penalty: replaying a bunch of ADDs
that the syncing replica already has ADDed.

BUT: in our set-up this is causing a noticeable problem. In
particular, we use a large value of nUpdates and we have an hourly DBQ
for garbage collection. If we do rolling restarts of our replicas,
then the second restart can leave us leaderless for a long span of
time.

This happens as follows:
* Replica1 is leader. Replica1 goes down.
* Leadership goes to Replica2. It resyncs with all replicas except Replica1.
* Replica1 returns and resyncs.
* Replica2 is leader. Replica2 goes down.
* Leadership goes to Replica3. It resyncs with all replicas except Replica2.

At this point, Replica1 has a longer updatelog (less trimmed -- more
old updates) than the other replicas. We will refer to these as the
"ancient" updates.
Replica3 does a getVersion from Replica1 and Replica4 and receives
replies from them. The ancient updates will not be contained in
ourUpdateSet. While the ancient updates are older than
ourLowThreshold, the check is skipped because of the "completeList"
term that makes no sense to me. So Replica3 replays the ancient ADDs.
Say that 1000 of these ADDs are older than a DBQ in Replica3's update
log? Then the DBQ gets replayed 1000 times ... once after each ADD is
replayed. Fixing the replay mechanism to only replay the DBQ once
looks hard because of the code structure. However, these ADDs (and
hence the DBQ) shouldn't have even been replayed at all!

After the leader Replica3 is synced, it asks Replica1 and Replica4 to
sync to it. The ancient ADDs have now been merged back onto Replica3's
update log and so when Replica4 is syncing with Replica3, then
Replica4 also ends up replaying the ancient ADDs and replaying the DBQ
1000 times.

Only when all of this finally completes can Replica3 finally perform
its role as leader and accept new updates.




[jira] [Updated] (SOLR-8656) PeerSync should use same nUpdates everywhere

2016-02-08 Thread Ramsey Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramsey Haddad updated SOLR-8656:

Attachment: solr-8656.patch

> PeerSync should use same nUpdates everywhere
> 
>
> Key: SOLR-8656
> URL: https://issues.apache.org/jira/browse/SOLR-8656
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: Trunk, 5.4.1
>    Reporter: Ramsey Haddad
>Priority: Minor
> Attachments: solr-8656.patch
>
>
> PeerSync requests information on the most recent nUpdates updates from 
> another instance to determine whether PeerSync can succeed. The value of 
> nUpdates can be customized in solrconfig.xml: 
> UpdateHandler.UpdateLog.NumRecordsToKeep.
> PeerSync can be initiated in a number of different paths. One path to start 
> PeerSync (leader-initiated sync) is incorrectly still using a hard-coded 
> value of nUpdates=100.
> This change fixes the leader-initiated-sync code path to also pick up the 
> customized/configured value of nUpdates.






[jira] [Created] (SOLR-8656) PeerSync should use same nUpdates everywhere

2016-02-08 Thread Ramsey Haddad (JIRA)
Ramsey Haddad created SOLR-8656:
---

 Summary: PeerSync should use same nUpdates everywhere
 Key: SOLR-8656
 URL: https://issues.apache.org/jira/browse/SOLR-8656
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.4.1, Trunk
Reporter: Ramsey Haddad
Priority: Minor


PeerSync requests information on the most recent nUpdates updates from another 
instance to determine whether PeerSync can succeed. The value of nUpdates can 
be customized in solrconfig.xml: UpdateHandler.UpdateLog.NumRecordsToKeep.

PeerSync can be initiated in a number of different paths. One path to start 
PeerSync (leader-initiated sync) is incorrectly still using a hard-coded value 
of nUpdates=100.

This change fixes the leader-initiated-sync code path to also pick up the 
customized/configured value of nUpdates.
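
The shape of the fix, roughly (a sketch: the PeerSync constructor and sync() 
return type are as in the 5.x code, and the getNumRecordsToKeep() accessor is 
an assumption):

    import java.util.List;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.update.PeerSync;

    class LeaderInitiatedSyncSketch {
      // before: new PeerSync(core, replicas, 100) -- hard-coded window
      static boolean sync(SolrCore core, List<String> replicas) {
        // after: pick up the configured UpdateLog window instead
        int nUpdates =
            core.getUpdateHandler().getUpdateLog().getNumRecordsToKeep();
        return new PeerSync(core, replicas, nUpdates).sync();
      }
    }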



