[jira] [Resolved] (LUCENE-8588) Replace usage of deprecated RAMOutputStream

2018-12-04 Thread Shai Erera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-8588.

   Resolution: Fixed
Fix Version/s: 7.7, master (8.0)

> Replace usage of deprecated RAMOutputStream
> ---
>
> Key: LUCENE-8588
> URL: https://issues.apache.org/jira/browse/LUCENE-8588
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Fix For: master (8.0), 7.7
>
> Attachments: LUCENE-8588.patch
>
>
> While reviewing code in {{FrozenBufferedUpdates}} I noticed that it uses the 
> deprecated {{RAMOutputStream}}. This issue fixes it. Separately we should 
> reduce the usage of that class, so that we can really remove it.
>  
> Besides that, while running tests I hit a test failure which at first I 
> thought was related to this change, but then noticed that the test doesn't 
> close the DirectoryReader (I run tests on Windows), so that fix is included 
> in this patch too.






[jira] [Commented] (LUCENE-8588) Replace usage of deprecated RAMOutputStream

2018-12-04 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708780#comment-16708780
 ] 

Shai Erera commented on LUCENE-8588:


[~dweiss] thanks for pointing that out. I will not commit that change then. I 
pushed a commit that closes the DirReader in the test and one that fixes a 
typo. Thanks!

> Replace usage of deprecated RAMOutputStream
> ---
>
> Key: LUCENE-8588
> URL: https://issues.apache.org/jira/browse/LUCENE-8588
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Trivial
> Attachments: LUCENE-8588.patch
>
>
> While reviewing code in {{FrozenBufferedUpdates}} I noticed that it uses the 
> deprecated {{RAMOutputStream}}. This issue fixes it. Separately we should 
> reduce the usage of that class, so that we can really remove it.
>  
> Besides that, while running tests I hit a test failure which at first I 
> thought was related to this change, but then noticed that the test doesn't 
> close the DirectoryReader (I run tests on Windows), so that fix is included 
> in this patch too.






[jira] [Created] (LUCENE-8588) Replace usage of deprecated RAMOutputStream

2018-12-04 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-8588:
--

 Summary: Replace usage of deprecated RAMOutputStream
 Key: LUCENE-8588
 URL: https://issues.apache.org/jira/browse/LUCENE-8588
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera


While reviewing code in {{FrozenBufferedUpdates}} I noticed that it uses the 
deprecated {{RAMOutputStream}}. This issue fixes it. Separately we should 
reduce the usage of that class, so that we can really remove it.

 

Besides that, while running tests I hit a test failure which at first I thought 
was related to this change, but then noticed that the test doesn't close the 
DirectoryReader (I run tests on Windows), so that fix is included in this patch 
too.
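
For context, {{RAMOutputStream}} is deprecated in favor of {{ByteBuffersDataOutput}}. A minimal sketch of that kind of substitution (illustrative only, not the actual {{FrozenBufferedUpdates}} change):

{code}
import org.apache.lucene.store.ByteBuffersDataOutput;

// Buffer some data in memory without the deprecated RAMOutputStream.
ByteBuffersDataOutput out = new ByteBuffersDataOutput();
out.writeVInt(42);                // hypothetical payload
out.writeString("some term");
byte[] bytes = out.toArrayCopy(); // materialize the buffered bytes
{code}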






[jira] [Commented] (LUCENE-8397) Add DirectoryTaxonomyWriter.getCache

2018-07-13 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542887#comment-16542887
 ] 

Shai Erera commented on LUCENE-8397:


+1

> Add DirectoryTaxonomyWriter.getCache
> 
>
> Key: LUCENE-8397
> URL: https://issues.apache.org/jira/browse/LUCENE-8397
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Major
> Attachments: LUCENE-8397.patch
>
>
> {{DirectoryTaxonomyWriter}} uses a cache to hold recently mapped labels / 
> ordinals.  You can provide an impl when you create the class, or it will use 
> a default impl.
>  
> I'd like to add a getter, {{DirectoryTaxonomyWriter.getCache}} to retrieve 
> the cache it's using; this is helpful for getting diagnostics (how many 
> cached labels, how much RAM used, etc.).
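
A hedged usage sketch of the proposed getter (the getter name comes from the issue; the diagnostics shown are illustrative):

{code}
DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
// ... index documents and their facet categories ...
TaxonomyWriterCache cache = taxoWriter.getCache(); // the getter this issue adds
// Diagnostics (label count, RAM usage) depend on the concrete cache implementation.
System.out.println("taxonomy writer cache: " + cache.getClass().getSimpleName());
{code}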






[jira] [Commented] (LUCENE-8272) Share internal DV update code between binary and numeric

2018-04-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449891#comment-16449891
 ] 

Shai Erera commented on LUCENE-8272:


I put some comments on the PR, but I don't see them mentioned here, so FYI.

> Share internal DV update code between binary and numeric
> 
>
> Key: LUCENE-8272
> URL: https://issues.apache.org/jira/browse/LUCENE-8272
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8272.patch
>
>
> Today we duplicate a fair portion of the internal logic to
> apply updates of binary and numeric doc values. This change refactors
> this non-trivial code to share the same code path and only differ in
> whether we provide a binary or numeric instance. This also allows us to
> iterate over the updates only once rather than twice, once for numeric
> and once for binary fields.
> 
> This change also subclasses DocValuesIterator from DocValuesFieldUpdates.Iterator,
> which allows easier consumption down the road since it now shares most of its
> interface with DocIdSetIterator, which is the main interface for this in Lucene.






[jira] [Commented] (LUCENE-8060) Require users to tell us whether they need total hit counts

2017-11-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263175#comment-16263175
 ] 

Shai Erera commented on LUCENE-8060:


What if we conceptually remove {{TopDocs.totalHits}}, and users who require it chain their Collector with {{TotalHitCountCollector}}? We can also add that boolean as sugar to the {{IndexSearcher.search()}} API.

If we're OK w/ removing {{TopDocs.totalHits}} and users getting a compilation error (that's easy to fix), then that's an easy option/change. Or... we deprecate it, but keep the simple IndexSearcher.search() APIs computing it (by chaining this collector), and let users who'd like to optimize use the search() API which takes a Collector.

Just a thought...
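
A rough sketch of the chaining idea, using the collector APIs as they existed at the time (illustrative, not a committed design):

{code}
TopScoreDocCollector topCollector = TopScoreDocCollector.create(10);
TotalHitCountCollector hitCountCollector = new TotalHitCountCollector();
// One search pass computes both; callers that don't need the count simply
// skip the TotalHitCountCollector.
searcher.search(query, MultiCollector.wrap(topCollector, hitCountCollector));
TopDocs topDocs = topCollector.topDocs();
int exactTotalHits = hitCountCollector.getTotalHits();
{code}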

> Require users to tell us whether they need total hit counts
> ---
>
> Key: LUCENE-8060
> URL: https://issues.apache.org/jira/browse/LUCENE-8060
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (8.0)
>
>
> We are getting optimizations when hit counts are not required (sorted 
> indexes, MAXSCORE, short-circuiting of phrase queries) but our users won't 
> benefit from them unless we disable exact hit counts by default or we require 
> them to tell us whether hit counts are required.
> I think making hit counts approximate by default is going to be a bit trappy, 
> so I'm rather leaning towards requiring users to tell us explicitly whether 
> they need total hit counts. I can think of two ways to do that: either by 
> passing a boolean to the IndexSearcher constructor or by adding a boolean to 
> all methods that produce TopDocs instances. I like the latter better but I'm 
> open to discussion or other ideas?






[jira] [Resolved] (SOLR-10505) Support terms' statistics for multiple fields in TermsComponent

2017-04-20 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved SOLR-10505.
---
   Resolution: Fixed
Fix Version/s: master (7.0), 6.6

Pushed to master and branch_6x.

> Support terms' statistics for multiple fields in TermsComponent
> ---
>
> Key: SOLR-10505
> URL: https://issues.apache.org/jira/browse/SOLR-10505
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 6.6, master (7.0)
>
> Attachments: SOLR-10505.patch
>
>
> Currently if you specify multiple {{terms.fl}} parameters on the request, 
> while requesting terms' statistics, you get them for the first requested 
> field (because the code only uses {{fields[0]}}). There's no reason why not 
> to return the stats for the terms in all specified fields. It's a rather 
> simple change, and I will post a patch shortly.






[jira] [Commented] (SOLR-10505) Support terms' statistics for multiple fields in TermsComponent

2017-04-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973296#comment-15973296
 ] 

Shai Erera commented on SOLR-10505:
---

All tests pass. If there are no objections, I'd like to commit this.

> Support terms' statistics for multiple fields in TermsComponent
> ---
>
> Key: SOLR-10505
> URL: https://issues.apache.org/jira/browse/SOLR-10505
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: SOLR-10505.patch
>
>
> Currently if you specify multiple {{terms.fl}} parameters on the request, 
> while requesting terms' statistics, you get them for the first requested 
> field (because the code only uses {{fields[0]}}). There's no reason why not 
> to return the stats for the terms in all specified fields. It's a rather 
> simple change, and I will post a patch shortly.






[jira] [Updated] (SOLR-10505) Support terms' statistics for multiple fields in TermsComponent

2017-04-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-10505:
--
Attachment: SOLR-10505.patch

Patch with tests.

> Support terms' statistics for multiple fields in TermsComponent
> ---
>
> Key: SOLR-10505
> URL: https://issues.apache.org/jira/browse/SOLR-10505
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: SOLR-10505.patch
>
>
> Currently if you specify multiple {{terms.fl}} parameters on the request, 
> while requesting terms' statistics, you get them for the first requested 
> field (because the code only uses {{fields[0]}}). There's no reason why not 
> to return the stats for the terms in all specified fields. It's a rather 
> simple change, and I will post a patch shortly.






[jira] [Created] (SOLR-10505) Support terms' statistics for multiple fields in TermsComponent

2017-04-17 Thread Shai Erera (JIRA)
Shai Erera created SOLR-10505:
-

 Summary: Support terms' statistics for multiple fields in 
TermsComponent
 Key: SOLR-10505
 URL: https://issues.apache.org/jira/browse/SOLR-10505
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Shai Erera
Assignee: Shai Erera


Currently, if you specify multiple {{terms.fl}} parameters on the request while requesting terms' statistics, you get them only for the first requested field (because the code only uses {{fields[0]}}). There's no reason not to return the stats for the terms in all specified fields. It's a rather simple change, and I will post a patch shortly.
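
For illustration, a request of the kind this change addresses (core name, fields and term are hypothetical; {{terms.ttf}} comes from SOLR-10349):

{noformat}
http://localhost:8983/solr/collection1/terms?terms=true
    &terms.fl=title&terms.fl=body
    &terms.list=lucene&terms.ttf=true
{noformat}

With the patch, the terms' statistics are returned for both {{title}} and {{body}}, not just the first {{terms.fl}} field.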






[jira] [Resolved] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved SOLR-10349.
---
Resolution: Fixed

Pushed to master and branch_6x.

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: master (7.0), 6.6
>
> Attachments: SOLR-10349.patch, SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Updated] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-10349:
--
Fix Version/s: 6.6, master (7.0)

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: master (7.0), 6.6
>
> Attachments: SOLR-10349.patch, SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Commented] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941838#comment-15941838
 ] 

Shai Erera commented on SOLR-10349:
---

If there are no objections, I'd like to commit that tomorrow.

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: SOLR-10349.patch, SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Updated] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-23 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-10349:
--
Attachment: SOLR-10349.patch

That was a good comment [~joel.bernstein]!! I changed more code to adapt to the new format where necessary. Running tests now, but if you think/know of other places which might be affected by this change, please let me know.

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: SOLR-10349.patch, SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Commented] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938277#comment-15938277
 ] 

Shai Erera commented on SOLR-10349:
---

Thanks [~joel.bernstein], the distributed test suggestion helped me find {{DistributedTermsComponentTest}}, and of course as soon as I added a test to it, the client failed, since it expects a number but got a map. I will see how to fix it.

This also answers your second question: this commit changes the response structure if you ask for {{terms.ttf}}. I put an example output in the description above.

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Updated] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-23 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-10349:
--
Attachment: SOLR-10349.patch

Added CHANGES entry.

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: SOLR-10349.patch, SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Updated] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-23 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-10349:
--
Attachment: SOLR-10349.patch

Patch implements the proposed addition. [~joel.bernstein], not sure if you're still interested in reviewing this, but if you are, your comments are appreciated!

> Add totalTermFreq support to TermsComponent
> ---
>
> Key: SOLR-10349
> URL: https://issues.apache.org/jira/browse/SOLR-10349
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Attachments: SOLR-10349.patch
>
>
> See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
> {{docFreq}} and {{totalTermFreq}} are already available to the 
> TermsComponent, it's just that doesn't add the ttf measure to the response.
> This issue adds a new {{terms.ttf}} parameter which if set to true results in 
> the following output:
> {noformat}
> 
>   
> 
>   2
>   2
> 
> ...
> {noformat}
> The reason for the new parameter is to not break backward-compatibility, 
> though I wish we could always return those two measures (it doesn't cost us 
> anything, the two are already available to the code). Maybe we can break the 
> response in {{master}} and add this parameter only to {{6x}} as deprecated? I 
> am also fine if we leave it and handle it in a separate issue.






[jira] [Created] (SOLR-10349) Add totalTermFreq support to TermsComponent

2017-03-23 Thread Shai Erera (JIRA)
Shai Erera created SOLR-10349:
-

 Summary: Add totalTermFreq support to TermsComponent
 Key: SOLR-10349
 URL: https://issues.apache.org/jira/browse/SOLR-10349
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor


See discussion here: http://markmail.org/message/gmpmege2jpfrsp75. Both 
{{docFreq}} and {{totalTermFreq}} are already available to the TermsComponent; 
it just doesn't add the ttf measure to the response.

This issue adds a new {{terms.ttf}} parameter which if set to true results in 
the following output:

{noformat}

  

  2
  2

...
{noformat}

The reason for the new parameter is to not break backward-compatibility, though 
I wish we could always return those two measures (it doesn't cost us anything, 
the two are already available to the code). Maybe we can break the response in 
{{master}} and add this parameter only to {{6x}} as deprecated? I am also fine 
if we leave it and handle it in a separate issue.
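
For illustration, a hedged example of a request that triggers the new output (core, field and term names are hypothetical):

{noformat}
http://localhost:8983/solr/collection1/terms?terms=true
    &terms.fl=title&terms.list=lucene&terms.ttf=true
{noformat}

With {{terms.ttf=true}}, each listed term is returned with both its {{df}} (document frequency) and {{ttf}} (total term frequency), instead of the single count returned by default.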






[jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers

2016-12-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766325#comment-15766325
 ] 

Shai Erera commented on LUCENE-7590:


[~shia] where do you see that? I checked master and there's no {{description}} 
in the file at all. Here's the code:

{code}
public LongDocValuesStats(String field) {
  super(field, Long.MAX_VALUE, Long.MIN_VALUE);
}
{code}

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7590-2.patch, LUCENE-7590-sorted-numeric.patch, 
> LUCENE-7590-sorted-set.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Resolved] (LUCENE-7590) Add DocValues statistics helpers

2016-12-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-7590.

   Resolution: Fixed
Fix Version/s: 6.4, master (7.0)

Committed to master and 6x. This is now complete.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7590-2.patch, LUCENE-7590-sorted-numeric.patch, 
> LUCENE-7590-sorted-set.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590-sorted-set.patch

Patch adds {{SortedDocValuesStats}} and {{SortedSetDocValuesStats}} for sorted 
and sorted-set DV fields. With this patch, I think the issue is ready to be 
closed. I am not sure that we need a DVStats for a BinaryDVField at this point, 
but if demand arises, it should be easy to add.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590-2.patch, LUCENE-7590-sorted-numeric.patch, 
> LUCENE-7590-sorted-set.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-17 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590-sorted-numeric.patch

Patch adds DVStats for {{SortedNumericDocValuesField}}.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590-2.patch, LUCENE-7590-sorted-numeric.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-15 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590-2.patch

Patch adds {{sum}}, {{stdev}} and {{variance}} stats to {{NumericDocValuesStats}}. I also added a CHANGES entry which I forgot to add in the previous commit.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590-2.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers

2016-12-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748192#comment-15748192
 ] 

Shai Erera commented on LUCENE-7590:


There are now a few tasks left:

* Add more statistics, such as {{sum}} and {{stdev}} (for numeric fields). 
Should we care about overflow, or only document it?

* We can also compute more stats like what Solr gives in [Stats 
Component|https://cwiki.apache.org/confluence/display/solr/The+Stats+Component#TheStatsComponent-StatisticsSupported].
 What do you think?

* Add stats for {{SortedDocValues}}. This should be fairly straightforward by 
comparing the {{BytesRef}} of all matching documents. But I don't think we 
should have a {{mean}} stat for it? Likewise for {{SortedSetDocValues}}.

* What should we do with {{SortedNumericDocValues}}? {{min}} and {{max}} are 
well defined, but what about {{mean}}? Should it be across all values?

I intend to close this issue and handle the rest in follow-on issues, unless 
you think otherwise. Also, would appreciate your feedback on the above points.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

Patch changes {{DocValuesIterator}} back to package-private and adds an API to {{DocValuesStats}} to help determine whether a document has a value for the field.

The Collector needs to be public because you're supposed to initialize it and 
run a search with it.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

[~jpountz] I accept your proposal about missing: when a reader does not have the requested DV field, the collector returns a {{LeafCollector}} which updates {{missing}} for every hit document.

I also renamed the classes as proposed earlier, as well as extracted {{DocValuesStats}} and friends into its own class.

I still haven't addressed changing {{DocValuesIterator}} to public. BTW, I noticed that {{SimpleTextDocValuesReader}} defines a private class named {{DocValuesIterator}} with exactly the same signature, I assume because the other one is package-private. So I feel that changing {{DVI}} to public is beneficial beyond the scope of this issue alone. What do you think?

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers

2016-12-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745855#comment-15745855
 ] 

Shai Erera commented on LUCENE-7590:


bq. Instead of using a NOOP_COLLECTOR, you could throw a 
CollectionTerminatedException

OK, good idea.

bq. By the way, in such cases I think we should still increase the missing 
count?

I am not sure? I mean, {{missing}} represents all the documents that matched 
the query and did not have a value for that DV field. But when 
{{getLeafCollector}} is called, we don't know yet that any documents will be 
matched by the query at all (I think?) and therefore updating missing might be 
confusing? I.e., I'd expect that if anyone chained {{TotalHitsCollector}} with 
{{DocValuesStatsCollector}}, then {{totalHits = stats.count() + 
stats.missing()}}? I am open to discuss it, just not sure I always want to 
update missing with {{context.reader().numDocs()}} ...

bq. Can we avoid making DocValuesIterator public?

I did not find a way, since it's part of {{DocValuesStats.init()}} API and I 
think users should be able to provide their own {{Stats}} impl, e.g. if they 
want to compute something on a {{BinaryDocValues}} field?

Here too, I'd love to get more ideas though. I tried to avoid implementing N 
collectors, one for each DV type, where they share a large portion of the code. 
But if you have strong opinions about making {{DVI}} public, maybe that's what 
we should do ...
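
A hedged sketch of the CollectionTerminatedException idea for a numeric-DV stats collector (simplified; names and structure are illustrative, not the committed code):

{code}
@Override
public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
  final NumericDocValues dv = context.reader().getNumericDocValues(field);
  if (dv == null) {
    // Segment has no values for this field: IndexSearcher catches this
    // exception and simply moves on to the next leaf.
    throw new CollectionTerminatedException();
  }
  return new LeafCollector() {
    @Override
    public void setScorer(Scorer scorer) {}

    @Override
    public void collect(int doc) throws IOException {
      if (dv.advanceExact(doc)) {
        // accumulate dv.longValue() into the stats
      } else {
        // document matched the query but has no value: count it as missing
      }
    }
  };
}
{code}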

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

Added tests for {{DoubleNumericDocValuesStats}}.

Now that I review the class names, how do you feel about removing {{Numeric}} 
from the concrete classes, so they're called {{Long/DoubleDocValuesStats}}?

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

Patch implements a {{DocValuesStatsCollector}}. Note some key design decisions:

A {{DocValuesStats}} is responsible for providing the specific {{DocValuesIterator}} for a {{LeafReaderContext}}, and for accumulating the values and statistics. The base class computes {{missing}} and {{count}}, leaving {{min}} and {{max}} to the actual implementation. Also, this base stats class does not define a {{mean}}, as at least for now I'm not sure how the mean value of a {{SortedSetDocValues}} is defined.

An abstract {{NumericDocValuesStats}} implementation for single-numeric DV 
fields, which also adds a {{mean}} statistic, with two concrete 
implementations: {{LongNumericDocValuesStats}} and 
{{DoubleNumericDocValuesStats}}.

This hierarchy should allow us to add further statistics for {{SortedSet}} and 
{{SortedNumeric}} DV fields. I did not implement them yet, as I'm not sure 
about some of the statistics (e.g. should the {{mean}} stat of a 
{{SortedNumeric}} be the mean across all values, or the minimum per document or 
...). Let's discuss that separately.

Also, note that I had to make {{DocValuesIterator}} public in order to declare 
it in this collector.

If you're OK with the design and implementation, I want to separate {{DocValuesStats}} into its own file, for clarity. I did not do it yet though.
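
A quick usage sketch of the design described above (class names as proposed in this patch; query and field are placeholders):

{code}
LongNumericDocValuesStats stats = new LongNumericDocValuesStats("price");
searcher.search(new MatchAllDocsQuery(), new DocValuesStatsCollector(stats));
long min = stats.min();
long max = stats.max();
double mean = stats.mean();
int counted = stats.count();   // matching docs that have a value
int missing = stats.missing(); // matching docs without a value
{code}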

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, 
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers

2016-12-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743089#comment-15743089
 ] 

Shai Erera commented on LUCENE-7590:


bq. Let's implement the computation of these stats by writing a Collector and 
use a MatchAllDocsQuery?

At first I thought this was overkill, but a {{Collector}} will allow computing them for documents that match another query. I will explore that option.

bq. Why is missing undefined when count is zero?

I thought that if you have no documents in the index at all, then {{missing}} 
is undefined, but now that you ask the question, I guess in that case it's fine 
if it's {{0}}, like {{count}}. I'll change the docs.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

Thanks [~mikemccand] and [~thetaphi], I changed it to a static class and removed {{DocsAndContexts}} in favor of a new {{Function}}.

Maybe {{BitsDocIdSetIterator}} can go in separately (i.e. as a separate issue), as I think it's a useful utility to have anyway.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Updated] (LUCENE-7590) Add DocValues statistics helpers

2016-12-11 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7590:
---
Attachment: LUCENE-7590.patch

First patch adds numeric statistics. I'd appreciate comments about it before I add support for sorted-numeric (including whether we should!).

Note that I chose to take either a field or {{ValueSource}}. The latter gives 
some flexibility by allowing users to pass an arbitrary VS over e.g. an 
{{Expression}} over a numeric DV field.

This, as far as I could tell, does not apply to {{SortedNumericDV}}, or at 
least I couldn't find an existing {{ValueSource}} implementation (like 
{{LongFieldSource}}) for {{SortedNumericDV}}.

If this approach looks good, I'd like to refactor the class so that it's easy 
to share/reuse code between Long and Double NDV fields.

> Add DocValues statistics helpers
> 
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/misc
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow 
> users to query for the min/max/avg etc. stats of a DV field. In this issue 
> I'd like to cover numeric DV, but there's no reason not to add it to other DV 
> types too.






[jira] [Created] (LUCENE-7590) Add DocValues statistics helpers

2016-12-11 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-7590:
--

 Summary: Add DocValues statistics helpers
 Key: LUCENE-7590
 URL: https://issues.apache.org/jira/browse/LUCENE-7590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/misc
Reporter: Shai Erera
Assignee: Shai Erera


I think it can be useful to have DocValues statistics helpers that allow users to query for the min/max/avg etc. stats of a DV field. In this issue I'd like to cover numeric DV, but there's no reason not to add it to other DV types too.






[jira] [Commented] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates

2016-08-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414871#comment-15414871
 ] 

Shai Erera commented on LUCENE-7344:


bq. I don't understand most of what you're saying

To clarify the problem, both for you but also in the interest of writing a 
detailed plan of the proposed solution: currently when a DBQ is processed, it 
uses the LeafReader *without* the NDV updates, and therefore has no knowledge 
of the updated values. This is relatively easily solved in the patch I 
uploaded, by applying the DV updates before the DBQ is processed. That way, the 
DBQ uses a LeafReader which is already aware of the updates and all works well.

However, there is an order to the update operations that occur in IndexWriter; in our case it could be a mix of DBQs and NDV updates. So if we apply *all* the DV updates before any of the DBQs, we'll get incorrect results where the DBQ either deletes a document it shouldn't (see the code example above, and also what your {{testDeleteFollowedByUpdateOfDeletedValue}} shows), or doesn't delete a document that it should.

To properly solve this problem, we need to apply the DV updates and DBQs in the 
order they were received (as opposed to applying them in bulk in current code). 
Meaning if the order of operations is NDVU1, NDVU2, DBQ1, NDVU3, DBQ2, DBQ3, 
NDVU4, then we need to:
# Apply NDVU1 + NDVU2; this will cause a new LeafReader to be created
# Apply DBQ1; using the already updated LeafReader
# Apply NDVU3; another LeafReader will be created, now reflecting all 3 NDV 
updates
# Apply DBQ2 and DBQ3; using the updated LeafReader from above
# Apply NDVU4; this will cause another LeafReader to be created

The adversarial effect in this case is that we cause 3 LeafReader reopens, each 
time (due to how NDV updates are currently implemented) writing the full DV 
field to a new stack. If you have many documents, it's going to be very 
expensive. Also, if you have a bigger sequence of interleaving updates and 
deletes, this gets worse and worse.

And so here comes the optimization that Mike and I discussed above. Since the NDV updates are held in memory until they're applied, we can avoid flushing them to disk and instead create a LeafReader which reads the original DV field + the in-memory DV updates. Note though: not *all* DV updates, but only the ones that 
are relevant up until this point. So in the case above, that LeafReader will 
view only NDVU1 and NDVU2, and later it will be updated to view NDVU3 as well.

This is purely an optimization step and has nothing to do with correctness (of 
course, that optimization is tricky and needs to be implemented correctly!). 
Therefore my plan of attack in this case is:

# Have enough tests that try different cases before any of this is implemented. 
For example, Mike proposed above to have the LeafReader + DV field "view" use 
docIdUpto. I need to check the code again, but I want to make sure that if 
NDVU2, NDVU3 and NDVU4 (with the interleaving DBQs) all affect the *same* 
document, everything still works.
# Implement the less-efficient approach, i.e. flush the DV updates to disk 
before each DBQ is processed. This ensures that we have a proper solution 
implemented, and we leave the optimization to a later step (either literally a 
later commit, or just a different patch or whatever). I think this is 
complicated enough to start with.
# Improve the solution to avoid flushing DV updates between the DBQs, as 
proposed above.

bq. testBiasedMixOfRandomUpdates

I briefly reviewed the test, but not thoroughly (I intend to). However, notice 
that committing (hard/soft ; commit/NRT) completely avoids the problem because 
a commit/NRT already means flushing DV updates. So if that's what this test 
does, I don't think it's going to expose the problem. Perhaps with the 
explanation I wrote above, you can revisit the test and make it fail though.

> Deletion by query of uncommitted docs not working with DV updates
> -
>
> Key: LUCENE-7344
> URL: https://issues.apache.org/jira/browse/LUCENE-7344
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
> Attachments: LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch
>
>
> When DVs are updated, delete by query doesn't work with the updated DV value.






[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues

2016-08-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413974#comment-15413974
 ] 

Shai Erera commented on SOLR-5944:
--

[~ichattopadhyaya], I've continued the discussion in LUCENE-7344. I am looking 
into fixing the bug, though it's a hairy one...

> Support updates of numeric DocValues
> 
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates

2016-08-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413364#comment-15413364
 ] 

Shai Erera commented on LUCENE-7344:


After chatting with Mike about this, here's an example for an "interleaving" 
case that Mike mentioned, where this patch does not work:

{code}
writer.updateNumericDocValue(new Term("id", "doc-1"), "val", 17L);
writer.deleteDocuments(DocValuesRangeQuery.newLongRange("val", 5L, 10L, 
true, true));
writer.updateNumericDocValue(new Term("id", "doc-1"), "val", 7L);
{code}

Here, "doc-1" should not be deleted, because the DBQ is submitted before the DV 
update, but because we resolve all DV updates before DBQ (in this patch), it 
ends up deleted. This is wrong of course. I'm looking into Mike's other idea of 
having a LeafReader view with the DV updates up until that document, and then 
ensuring DV updates / DBQs are applied in the order they were submitted. This 
starts to get very complicated.

> Deletion by query of uncommitted docs not working with DV updates
> -
>
> Key: LUCENE-7344
> URL: https://issues.apache.org/jira/browse/LUCENE-7344
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
> Attachments: LUCENE-7344.patch, LUCENE-7344.patch
>
>
> When DVs are updated, delete by query doesn't work with the updated DV value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates

2016-08-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-7344:
---
Attachment: LUCENE-7344.patch

Patch applies the DBQ after resolving the DV updates. With this patch, if there 
were DV updates, then the SegmentState is updated with the up-to-date reader.

Note that this does not mean more work compared to what was done before -- if 
there are no DV updates, writeFieldUpdates isn't called, and no reader is 
updated. If there were field updates, then writeFieldUpdates was called anyway, 
refreshing the internal reader.

This patch does not change the behavior, except it also updates the 
SegmentState.reader if there were DV updates.

[~mikemccand] what do you think? Our SegmentReader already refreshes only the 
DV updates; that is, it already maintains a view of the bare segment with the 
modified DV fields. Also, given what I wrote above, I don't believe we're 
making more SR reopens, right?

> Deletion by query of uncommitted docs not working with DV updates
> -
>
> Key: LUCENE-7344
> URL: https://issues.apache.org/jira/browse/LUCENE-7344
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
> Attachments: LUCENE-7344.patch, LUCENE-7344.patch
>
>
> When DVs are updated, delete by query doesn't work with the updated DV value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates

2016-08-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413216#comment-15413216
 ] 

Shai Erera commented on LUCENE-7344:


Hmm ... I had to refresh my memory of the DV updates code and I agree with 
[~mikemccand] that the fix is hairy (which goes hand-in-hand with the hairy 
{{BufferedUpdatesStream}}). The problem is that deleteByQuery uses the existing 
LeafReader, but the DV updates themselves were not yet applied so the reader is 
unaware of the change.
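For completeness, here's roughly what a repro looks like (a sketch only, 
assuming the Lucene test framework helpers; the field names are just for 
illustration):

{code}
Directory dir = newDirectory();
IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new MockAnalyzer(random())));

Document doc = new Document();
doc.add(new StringField("id", "doc-1", Field.Store.NO));
doc.add(new NumericDocValuesField("val", 5L));
writer.addDocument(doc); // not committed

// In-place update, then a DBQ that matches only the *updated* value:
writer.updateNumericDocValue(new Term("id", "doc-1"), "val", 17L);
writer.deleteDocuments(DocValuesRangeQuery.newLongRange("val", 15L, 20L, true, true));

DirectoryReader reader = DirectoryReader.open(writer);
// Expected: 0 live docs, since "val" was already 17 when the DBQ was submitted.
// Actual (the bug): the DBQ is resolved against a reader that still sees val=5,
// so doc-1 is not deleted.
assertEquals(0, reader.numDocs());
IOUtils.close(reader, writer, dir);
{code}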

I changed the test to call {{updateDocument}} instead of updating the NDV and 
the test passes. This is expected, because updating a document deletes the old 
one and adds a new document. So when the DBQ is processed, a LeafReader is 
opened on the new segment (with the new document; it has to work that way 
because the new document isn't yet flushed) and the new segment thus has the 
new document with the updated NDV.

I agree this is a bug *only* because updating a document followed by DBQ works 
as expected. The internals of how in-place updates are applied should not 
concern the user.

I wonder if we need to implement a complex merge-sorting approach as 
[~mikemccand] proposes, or whether applying the DV updates before processing 
the DBQ would be enough (ignoring the adversarial effects that Mike describes; 
they're real, but I ignore them for the moment). I want to try that.

If that works, then perhaps we can detect whether a DBQ involves an NDV field 
(or a BDV field, for that matter) and refresh the reader only then, or refresh 
the reader whenever there are DBQs and any DV updates, even if they are unrelated. 
But first I want to try and make the test pass, before we decide on how to 
properly fix it.

> Deletion by query of uncommitted docs not working with DV updates
> -
>
> Key: LUCENE-7344
> URL: https://issues.apache.org/jira/browse/LUCENE-7344
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
> Attachments: LUCENE-7344.patch
>
>
> When DVs are updated, delete by query doesn't work with the updated DV value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9319) DELETEREPLICA should accept just count and remove replicas intelligenty

2016-07-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384351#comment-15384351
 ] 

Shai Erera commented on SOLR-9319:
--

Thanks [~noble.paul]. The issue description is a bit misleading (_should accept 
*just* count_) but thanks for clarifying.

> DELETEREPLICA should accept  just count and remove replicas intelligenty
> 
>
> Key: SOLR-9319
> URL: https://issues.apache.org/jira/browse/SOLR-9319
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
> Fix For: 6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9319) DELETEREPLICA should accept just count and remove replicas intelligenty

2016-07-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384301#comment-15384301
 ] 

Shai Erera commented on SOLR-9319:
--

What does "just count" mean? Will I not be able to delete a specific replica, 
or is this in addition to being able to delete a selected replica? I think that 
having an API like "delete replicas such that only X remain" is fine, but I 
would like to also be able to specify which replica I want to delete (since in 
my case I need to control that).

> DELETEREPLICA should accept  just count and remove replicas intelligenty
> 
>
> Key: SOLR-9319
> URL: https://issues.apache.org/jira/browse/SOLR-9319
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
> Fix For: 6.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376474#comment-15376474
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Which begs the question: why are there 15 CLOSE_WAIT connections that last 
forever on branch_6x even with this patch?

I think Shalin's patch only adds this monitor thread to {{UpdateShardHandler}}, 
but not to {{HttpShardHandlerFactory}}, so these 15 could be coming from it?

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, index.sh, 
> setup-solr.sh, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375601#comment-15375601
 ] 

Shai Erera commented on SOLR-9290:
--

Oh I see. So we didn't experience the problem because we run w/ 2 replicas (and 
one shard currently) and with 5.4.1's settings the math for us results in a low 
number of connections. But someone running a larger Solr deployment could 
already hit that problem prior to 5.5. Thanks for the clarification!

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375582#comment-15375582
 ] 

Shai Erera commented on SOLR-9290:
--

Regarding the patch, the monitor looks good. A few comments:

* I prefer that we name it {{IdleConnectionsMonitor}} (w/ 's', plural 
connections). That goes for the class, field and thread name.
* Do you intend to keep all the log statements around?
* Do you think we should make the polling interval (10s) and 
idle-connections-time (50s) configurable? Perhaps through system properties? (A 
rough sketch of what I have in mind follows below.)
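Something along these lines (a rough sketch only; the system property names are 
made up):

{code}
// Rough sketch, not the patch: periodically close expired/idle pooled connections.
public class IdleConnectionsMonitor extends Thread {
  private final PoolingHttpClientConnectionManager connMgr;
  private final long pollIntervalMs =
      Long.getLong("solr.idleConnectionsMonitor.pollIntervalMs", 10000L);  // assumed property
  private final long idleTimeoutMs =
      Long.getLong("solr.idleConnectionsMonitor.idleTimeoutMs", 50000L);   // assumed property

  public IdleConnectionsMonitor(PoolingHttpClientConnectionManager connMgr) {
    super("IdleConnectionsMonitor");
    this.connMgr = connMgr;
    setDaemon(true);
  }

  @Override
  public void run() {
    try {
      while (!isInterrupted()) {
        Thread.sleep(pollIntervalMs);
        connMgr.closeExpiredConnections();
        connMgr.closeIdleConnections(idleTimeoutMs, TimeUnit.MILLISECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}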

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375571#comment-15375571
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Do you have only two replicas? Perhaps the maxConnectionsPerHost limit of 
100 is kicking in?

Yes, we do have only 2 replicas and I get why the CLOSE_WAITs stop at 100. I 
was asking about 5.3.2 -- how could CLOSE_WAITs get high in 5.3.2 when 
maxConnectionsPerHost was the same as in 5.4.1?

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375552#comment-15375552
 ] 

Shai Erera commented on SOLR-9290:
--

bq. I thought that hypothesis holds only after SOLR-8533. Are you saying you 
also saw it on 5.3.2? If so, what are the values that are set for these 
properties there? We definitely do not see the problem with 5.4.1, but we 
didn't test prior versions.

We posted at the same time; I read your answer above. I wonder why we don't see 
the problem with 5.4.1. I mean, we do see CLOSE_WAITs piling up, but they stop 
at ~100 (200 for the leader).

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375548#comment-15375548
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~shalinmangar]. A few questions:

bq. Also, I think the reason this wasn't reproducible on master is because 
SOLR-4509 enabled eviction of idle threads by calling 
HttpClientBuilder#evictIdleConnections with a 50 second limit.

Is this something we can apply to 5x/6x too?
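For reference, on master that roughly amounts to the following when building 
the client (a sketch using HttpClient 4.5, not the actual Solr code):

{code}
CloseableHttpClient httpClient = HttpClientBuilder.create()
    .evictExpiredConnections()
    .evictIdleConnections(50, TimeUnit.SECONDS)
    .build();
{code}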

bq. This patch adds a monitor thread for the pool created in UpdateShardHandler 
and with this applied

I didn't see the monitor in the latest patch, only the log printouts. Did you 
forget to add it?

bq. There are still a few connections in CLOSE_WAIT at steady state but I 
verified that they belong to a different HttpClient instance in 
HttpShardHandlerFactory and other places.

(1) Can/Should we have a similar monitor for HttpShardHandlerFactory?
(2) Any reason why the two don't share the same HttpClient instance?

bq. This patch applies on 5.3.2
bq. We have a large limit for maxConnections and maxConnectionsPerHost

I thought that hypothesis holds only after SOLR-8533. Are you saying you also 
saw it on 5.3.2? If so, what are the values that are set for these properties 
there? We definitely *do not* see the problem with 5.4.1, but we didn't test 
prior versions.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375500#comment-15375500
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~yo...@apache.org], I'll read the issue.

I agree with what you write in general, but we do hit an issue with these 
settings. The fact that it reproduces easily with SSL enabled suggests that the 
issue may not be in Solr code at all, but I wonder if we shouldn't perhaps pick 
smaller default values when SSL is enabled? (Our guess at the moment is that HC 
keeps more connections in the pool when SSL is enabled because they are more 
expensive to initiate, but it's just a guess.)

And maybe the proper solution would be what [~shalinmangar] wrote above -- have 
a bg monitor which closes idle/expired connections. I actually wonder why it 
can't be a property of {{ClientConnectionManager}} that you can set to auto 
close idle/expired connections after a period of time. We can potentially have 
that monitor act only if SSL is enabled (or at least until non-SSL exhibits the 
same problems too).

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375184#comment-15375184
 ] 

Shai Erera commented on SOLR-9290:
--

Also [~markrmil...@gmail.com], for education purposes, if you have a link to a 
discussion about why it may lead to a distributed deadlock, I'd be happy to 
read it.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375151#comment-15375151
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~markrmil...@gmail.com]. In that case, what's your take on the issue at 
hand?

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375082#comment-15375082
 ] 

Shai Erera commented on SOLR-9290:
--

An update -- I've modified our solr.xml (which is basically the vanilla 
solr.xml) with these added props (under the {{solrcloud}} element) and I do not 
see the connections spike anymore:

{noformat}
<int name="maxUpdateConnections">10000</int>
<int name="maxUpdateConnectionsPerHost">100</int>
{noformat}

Those changes were part of SOLR-8533. [~markrmil...@gmail.com] on that issue 
you didn't explain why the defaults need to be set that high. Was there perhaps 
an email thread you can link to which includes more details? I ask because one 
thing I've noticed is that if I query {{solr/admin/info/system}}, the 
{{system.openFileDescriptorCount}} is very high when there are many 
CLOSE_WAITs. Such a change in Solr's defaults probably needs to be accompanied 
by an OS-level setting too, no?

I am still running tests with those props set in solr.xml, on top of 5.5.1. 
[~mbjorgan] would you mind testing in your environment too?

[~hoss...@fucit.org], sorry I completely missed your questions. Our solr.xml is 
the vanilla one, we didn't modify anything in it. We did uncomment the SSL 
props in solr.in.sh as the ref guide says, but aside from the key name and 
password, we didn't change any settings.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374779#comment-15374779
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Interestingly, the number of connections stuck in CLOSE_WAIT decrease 
during indexing and increase again about 10 or so seconds after the indexing is 
stopped.

I've observed that too and it's not that they decrease, but rather that the 
connections change their state from CLOSE_WAIT to ESTABLISHED, then when 
indexing is done to TIME_WAIT and then finally to CLOSE_WAIT again. I believe 
this aligns with what the HC documentation says -- the connections are not 
necessarily released, but kept in the pool. When you re-index again, they are 
reused and go back to the pool.

bq. However, this commit only increases the limits on how many update 
connections that can be open

That's interesting and might be a temporary workaround for the problem, which I 
intend to test shortly. After 5.4.1 they were both modified to 100,000:

{noformat}
-  public static final int DEFAULT_MAXUPDATECONNECTIONS = 10000;
-  public static final int DEFAULT_MAXUPDATECONNECTIONSPERHOST = 100;
+  public static final int DEFAULT_MAXUPDATECONNECTIONS = 100000;
+  public static final int DEFAULT_MAXUPDATECONNECTIONSPERHOST = 100000;
{noformat}

This can explain why we run into trouble with 5.5.1 but not with 5.4.1. Even in 
5.4.1 there are a few hundred CLOSE_WAIT connections, but with 5.5.1 they reach 
(in our case) on the order of 35-40K, at which point Solr became useless, not 
being able to talk to the replica or pretty much anything else.

I see these can be defined in solr.xml, though it's not documented how, so I'm 
going to give it a try and will report back here.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list  (about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7253) Sparse data in doc values and segments merging

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268847#comment-15268847
 ] 

Shai Erera commented on LUCENE-7253:


I thought so, but that still needs to be benchmarked. Maybe [~prog] has an idea 
of an implementation that will keep both efficient? Maybe read time isn't 
affected, but merge is? Maybe if we choose to sparsely encode fields that are 
very sparse, the read time isn't affected? As a first step that can work. Point 
is we shouldn't shoot down an idea before we have code/results to back the 
shooting.

And I agree that if we had an iterator-like API it would make a stronger case 
for sparse DVs. Maybe, though, the two need not be coupled and one can be done 
before the other.
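Just to illustrate what I mean by an iterator-like API (a sketch only; today's 
DV API is random access):

{code}
// Consumers would visit only the documents that actually have a value,
// instead of iterating over every doc and filling holes.
interface SparseNumericValues {
  /** Advances to the next document that has a value, or returns DocIdSetIterator.NO_MORE_DOCS. */
  int nextDoc() throws IOException;
  /** The value for the document the iterator is currently positioned on. */
  long longValue();
}
{code}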

> Sparse data in doc values and segments merging 
> ---
>
> Key: LUCENE-7253
> URL: https://issues.apache.org/jira/browse/LUCENE-7253
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.5, 6.0
>Reporter: Pawel Rog
>  Labels: performance
>
> Doc Values were optimized recently to efficiently store sparse data. 
> Unfortunately there is still big problem with Doc Values merges for sparse 
> fields. When we imagine 1 billion documents index it seems it doesn't matter 
> if all documents have value for this field or there is only 1 document with 
> value. Segment merge time is the same for both cases. In most cases this is 
> not a problem but there are several cases in which one can expect having many 
> fields with sparse doc values.
> I can describe an example. During performance tests of a system with large 
> number of sparse fields I realized that Doc Values merges are a bottleneck. I 
> had hundreds of different numeric fields. Each document contained only small 
> subset of all fields. Average document contains 5-7 different numeric values. 
> As you can see data was very sparse in these fields. It turned out that 
> ingestion process was CPU-bound. Most of CPU time was spent in DocValues 
> related methods (SingletonSortedNumericDocValues#setDocument, 
> DocValuesConsumer$10$1#next, DocValuesConsumer#isSingleValued, 
> DocValuesConsumer$4$1#setNext, ...) - mostly during merging segments.
> Adrien Grand suggested to reduce the number of sparse fields and replace them 
> with smaller number of denser fields. This helped a lot but complicated 
> fields naming. 
> I am not much familiar with Doc Values source code but I have small 
> suggestion how to improve Doc Values merges for sparse fields. I realized 
> that Doc Values producers and consumers use Iterators. Let's take an example 
> of numeric Doc Values. Would it be possible to replace Iterator which 
> "travels" through all documents with Iterator over collection of non empty 
> values? Of course this would require storing object (instead of numeric) 
> which contains value and document ID. Such an iterator could significantly 
> improve merge time of sparse Doc Values fields. IMHO this won't cause big 
> overhead for dense structures but it can be game changer for sparse 
> structures.
> This is what happens in NumericDocValuesWriter on flush
> {code}
> dvConsumer.addNumericField(fieldInfo,
>new Iterable() {
>  @Override
>  public Iterator iterator() {
>return new NumericIterator(maxDoc, values, 
> docsWithField);
>  }
>});
> {code}
> Before this happens during addValue, this loop is executed to fill holes.
> {code}
> // Fill in any holes:
> for (int i = (int)pending.size(); i < docID; ++i) {
>   pending.add(MISSING);
> }
> {code}
> It turns out that variable called pending is used only internally in 
> NumericDocValuesWriter. I know pending is PackedLongValues and it wouldn't be 
> good to change it with different class (some kind of list) because this may 
> break DV performance for dense fields. I hope someone can suggest interesting 
> solutions for this problem :).
> It would be great if discussion about sparse Doc Values merge performance can 
> start here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7253) Sparse data in doc values and segments merging

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268770#comment-15268770
 ] 

Shai Erera commented on LUCENE-7253:


bq. Read some actual literature on column store databases, see how these 
situations are handled.

Would be great if you can recommend some particular references.

bq. I'm not going to argue with you guys here, because your argument is 
pathetic ... After you have educated yourselves, you will look less silly.

I don't get the patronizing tone, really.

--

What if the numeric DV consumer encoded the data differently based on the 
cardinality of the field? Dense fields would be encoded as they are today, and 
low-cardinality ones would encode two parallel arrays of docs and values 
(oversimplifying, I know). We could then determine what 'dense' means (50%? 
10%? 100 docs?) based on benchmark results.
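Roughly something like this (a sketch only; the 10% cutoff is a placeholder 
that benchmarking would have to validate):

{code}
// Decide the encoding per field based on its density.
static boolean useSparseEncoding(int maxDoc, int docsWithValueCount) {
  return docsWithValueCount < 0.10 * maxDoc;
}

// Sparse form: two parallel arrays, one entry per document that actually has a value.
static void writeSparse(int[] docs, long[] values, DataOutput out) throws IOException {
  out.writeVInt(docs.length);
  for (int i = 0; i < docs.length; i++) {
    out.writeVInt(docs[i]);    // doc ids, in increasing order
    out.writeLong(values[i]);  // the value for that doc
  }
}
{code}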

It's hard to overrule an idea without (a) an implementation that we can refer 
to and (b) proof that it helps in some cases without making other cases worse.

[~prog]: maybe you should start playing with the idea, upload some patches, 
perform some benchmarks etc. Then we'll have more data to discuss and decide if 
this is worth pursuing or not. What do you think?

> Sparse data in doc values and segments merging 
> ---
>
> Key: LUCENE-7253
> URL: https://issues.apache.org/jira/browse/LUCENE-7253
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.5, 6.0
>Reporter: Pawel Rog
>  Labels: performance
>
> Doc Values were optimized recently to efficiently store sparse data. 
> Unfortunately there is still big problem with Doc Values merges for sparse 
> fields. When we imagine 1 billion documents index it seems it doesn't matter 
> if all documents have value for this field or there is only 1 document with 
> value. Segment merge time is the same for both cases. In most cases this is 
> not a problem but there are several cases in which one can expect having many 
> fields with sparse doc values.
> I can describe an example. During performance tests of a system with large 
> number of sparse fields I realized that Doc Values merges are a bottleneck. I 
> had hundreds of different numeric fields. Each document contained only small 
> subset of all fields. Average document contains 5-7 different numeric values. 
> As you can see data was very sparse in these fields. It turned out that 
> ingestion process was CPU-bound. Most of CPU time was spent in DocValues 
> related methods (SingletonSortedNumericDocValues#setDocument, 
> DocValuesConsumer$10$1#next, DocValuesConsumer#isSingleValued, 
> DocValuesConsumer$4$1#setNext, ...) - mostly during merging segments.
> Adrien Grand suggested to reduce the number of sparse fields and replace them 
> with smaller number of denser fields. This helped a lot but complicated 
> fields naming. 
> I am not much familiar with Doc Values source code but I have small 
> suggestion how to improve Doc Values merges for sparse fields. I realized 
> that Doc Values producers and consumers use Iterators. Let's take an example 
> of numeric Doc Values. Would it be possible to replace Iterator which 
> "travels" through all documents with Iterator over collection of non empty 
> values? Of course this would require storing object (instead of numeric) 
> which contains value and document ID. Such an iterator could significantly 
> improve merge time of sparse Doc Values fields. IMHO this won't cause big 
> overhead for dense structures but it can be game changer for sparse 
> structures.
> This is what happens in NumericDocValuesWriter on flush
> {code}
> dvConsumer.addNumericField(fieldInfo,
>new Iterable() {
>  @Override
>  public Iterator iterator() {
>return new NumericIterator(maxDoc, values, 
> docsWithField);
>  }
>});
> {code}
> Before this happens during addValue, this loop is executed to fill holes.
> {code}
> // Fill in any holes:
> for (int i = (int)pending.size(); i < docID; ++i) {
>   pending.add(MISSING);
> }
> {code}
> It turns out that variable called pending is used only internally in 
> NumericDocValuesWriter. I know pending is PackedLongValues and it wouldn't be 
> good to change it with different class (some kind of list) because this may 
> break DV performance for dense fields. I hope someone can suggest interesting 
> solutions for this problem :).
> It would be great if discussion about sparse Doc Values merge performance can 
> start here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-9057) CloudSolrClient should be able to work w/o ZK url

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268367#comment-15268367
 ] 

Shai Erera commented on SOLR-9057:
--

In CSC code I see that {{connect()}} is called from several places, one of 
which is {{sendRequest}} and another is {{requestWithRetryOnStaleState}}. And 
in {{connect()}} I see that a watcher is created both by calling {{new 
ZkStateReader(zkHost, zkClientTimeout, zkConnectTimeout)}} and immediately 
after {{zk.createClusterStateWatchersAndUpdate()}}.

I don't reject your statement about my understanding of how CSC works, but 
could you please explain how it does not create a watcher today? Or, if that is 
the case today and this issue is about changing it, what are you proposing to 
change?

If you prefer to wait with answering these questions until you have a patch, 
I'm OK with that too.

> CloudSolrClient should be able to work w/o ZK url
> -
>
> Key: SOLR-9057
> URL: https://issues.apache.org/jira/browse/SOLR-9057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Noble Paul
>
> It should be possible to pass one or more Solr urls to Solrj and it should be 
> able to get started from there. Exposing ZK to users should not be required. 
> it is a security vulnerability 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9057) CloudSolrClient should be able to work w/o ZK url

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268329#comment-15268329
 ] 

Shai Erera commented on SOLR-9057:
--

I thought that the whole idea of CSC is to use ZkStateReader so that it can 
react to state changes quickly, because ZkStateReader does create a watch on 
the cluster state. If it doesn't use ZkStateReader anymore, will it 
periodically poll CLUSTERSTATUS? Isn't that less efficient and maybe even doing 
a lot of redundant CLUSTERSTATUS checks when the cluster state doesn't change?

I always viewed CSC and its use of ZkStateReader as an advantage. I do 
understand though that it currently plays two roles, which I believe you 
propose to separate: (1) understanding the distributed topology of the Solr 
nodes, so that it forwards requests to leaders etc. and (2) getting notified on 
cluster state changes rather than querying for it repeatedly.

I personally think that CSC should continue to use ZkStateReader and be tied to 
it. Users who don't want to expose/get-exposed to ZK can use a regular 
HttpSolrClient. True, their requests may need to be forwarded to the right node 
(which adds an extra hop), but perhaps that's not so bad?

Alternatively, you could have CSC take a ClusterStateProvider with two impls: 
one that uses HTTP CLUSTERSTATUS and another that uses ZkStateReader. Then 
users can enjoy the best of both worlds: CSC does the "right" thing and the 
user can choose whether to work w/ the HTTP end-point or the ZK one.
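Roughly (a sketch only; the interface name, methods and constructor below are 
hypothetical, not an existing SolrJ API):

{code}
public interface ClusterStateProvider extends Closeable {
  /** Returns the current view of collections/shards/replicas. */
  ClusterState getClusterState() throws IOException;
}

// Impl 1: ZkClusterStateProvider -- wraps ZkStateReader and reacts to ZK watches.
// Impl 2: HttpClusterStateProvider -- polls a Solr node's CLUSTERSTATUS API over HTTP.
// CSC would then be constructed with either one, e.g.:
//   new CloudSolrClient(new HttpClusterStateProvider(solrUrls)); // hypothetical constructor
{code}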

> CloudSolrClient should be able to work w/o ZK url
> -
>
> Key: SOLR-9057
> URL: https://issues.apache.org/jira/browse/SOLR-9057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Noble Paul
>
> It should be possible to pass one or more Solr urls to Solrj and it should be 
> able to get started from there. Exposing ZK to users should not be required. 
> it is a security vulnerability 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9057) CloudSolrClient should be able to work w/o ZK url

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268315#comment-15268315
 ] 

Shai Erera edited comment on SOLR-9057 at 5/3/16 7:49 AM:
--

How will it initiate {{ZkStateReader}} without getting the ZK host? Or do you 
mean it will extract the ZK info from one of the Solr URLs, by submitting a 
call like {{/admin/info/system}}?


was (Author: shaie):
How will it initiate {{ZkStateReader}} without getting the ZK host? Or do you 
mean it will extract the ZK info from one of the Solr URLs, but submitting a 
call like {{/admin/info/system}}?

> CloudSolrClient should be able to work w/o ZK url
> -
>
> Key: SOLR-9057
> URL: https://issues.apache.org/jira/browse/SOLR-9057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Noble Paul
>
> It should be possible to pass one or more Solr urls to Solrj and it should be 
> able to get started from there. Exposing ZK to users should not be required. 
> it is a security vulnerability 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9057) CloudSolrClient should be able to work w/o ZK url

2016-05-03 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268315#comment-15268315
 ] 

Shai Erera commented on SOLR-9057:
--

How will it initiate {{ZkStateReader}} without getting the ZK host? Or do you 
mean it will extract the ZK info from one of the Solr URLs, but submitting a 
call like {{/admin/info/system}}?

> CloudSolrClient should be able to work w/o ZK url
> -
>
> Key: SOLR-9057
> URL: https://issues.apache.org/jira/browse/SOLR-9057
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Noble Paul
>
> It should be possible to pass one or more Solr urls to Solrj and it should be 
> able to get started from there. Exposing ZK to users should not be required. 
> it is a security vulnerability 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7253) Sparse data in doc values and segments merging

2016-05-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266764#comment-15266764
 ] 

Shai Erera commented on LUCENE-7253:


To add to the sparsity discussion, when I did the numeric DV updates I already 
wrote (somewhere) that I think if we could cater for sparse DV fields better, 
it might also improve the numeric DV updates case. Today when you update a 
numeric DV field, we rewrite the entire DV for that field in the "stacked" DV. 
This works well if you perform many updates before you flush/commit, but if you 
only update the value of one document, that's costly. If we could write just 
that one update to a stack, we could _collapse_ the stacks at read time.

Of course, that _collapsing_ might slow searches down, so the whole idea of 
writing just the updated values needs to be benchmarked before we actually do 
it, so I'm not proposing that here. Just wanted to give another (potential) use 
case for sparse DV fields.

And FWIW, I do agree with [~yo...@apache.org] and [~dsmiley] about sparse DV 
not being an abuse case, as I'm seeing them very often too. That's of course 
unless you mean something else by abuse case...

> Sparse data in doc values and segments merging 
> ---
>
> Key: LUCENE-7253
> URL: https://issues.apache.org/jira/browse/LUCENE-7253
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.5, 6.0
>Reporter: Pawel Rog
>  Labels: performance
>
> Doc Values were optimized recently to efficiently store sparse data. 
> Unfortunately there is still big problem with Doc Values merges for sparse 
> fields. When we imagine 1 billion documents index it seems it doesn't matter 
> if all documents have value for this field or there is only 1 document with 
> value. Segment merge time is the same for both cases. In most cases this is 
> not a problem but there are several cases in which one can expect having many 
> fields with sparse doc values.
> I can describe an example. During performance tests of a system with large 
> number of sparse fields I realized that Doc Values merges are a bottleneck. I 
> had hundreds of different numeric fields. Each document contained only small 
> subset of all fields. Average document contains 5-7 different numeric values. 
> As you can see data was very sparse in these fields. It turned out that 
> ingestion process was CPU-bound. Most of CPU time was spent in DocValues 
> related methods (SingletonSortedNumericDocValues#setDocument, 
> DocValuesConsumer$10$1#next, DocValuesConsumer#isSingleValued, 
> DocValuesConsumer$4$1#setNext, ...) - mostly during merging segments.
> Adrien Grand suggested reducing the number of sparse fields and replacing them 
> with a smaller number of denser fields. This helped a lot but complicated 
> field naming. 
> I am not very familiar with the Doc Values source code but I have a small 
> suggestion for how to improve Doc Values merges for sparse fields. I realized 
> that Doc Values producers and consumers use Iterators. Let's take the example 
> of numeric Doc Values. Would it be possible to replace the Iterator which 
> "travels" through all documents with an Iterator over the collection of 
> non-empty values? Of course this would require storing an object (instead of a 
> numeric) which contains the value and document ID. Such an iterator could 
> significantly improve the merge time of sparse Doc Values fields. IMHO this 
> won't cause big overhead for dense structures but it can be a game changer for 
> sparse structures.
> This is what happens in NumericDocValuesWriter on flush
> {code}
> dvConsumer.addNumericField(fieldInfo,
>     new Iterable<Number>() {
>       @Override
>       public Iterator<Number> iterator() {
>         return new NumericIterator(maxDoc, values, docsWithField);
>       }
>     });
> {code}
> Before this happens during addValue, this loop is executed to fill holes.
> {code}
> // Fill in any holes:
> for (int i = (int)pending.size(); i < docID; ++i) {
>   pending.add(MISSING);
> }
> {code}
> It turns out that the variable called pending is used only internally in 
> NumericDocValuesWriter. I know pending is a PackedLongValues and it wouldn't be 
> good to replace it with a different class (some kind of list) because this may 
> break DV performance for dense fields. I hope someone can suggest interesting 
> solutions for this problem :).
> It would be great if a discussion about sparse Doc Values merge performance can 
> start here.
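
(A minimal, purely illustrative sketch of the iteration style proposed in the quoted 
description above; the class names here are made up and are not Lucene APIs:)

{code}
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed iteration style: walk only the documents that actually
// have a value, instead of every docID up to maxDoc. All names are illustrative.
class DocValueEntry {
  final int docID;
  final long value;
  DocValueEntry(int docID, long value) { this.docID = docID; this.value = value; }
}

class SparseNumericValues implements Iterable<DocValueEntry> {
  private final TreeMap<Integer, Long> values = new TreeMap<>(); // non-empty docs only

  void add(int docID, long value) { values.put(docID, value); }

  @Override
  public Iterator<DocValueEntry> iterator() {
    final Iterator<Map.Entry<Integer, Long>> it = values.entrySet().iterator();
    return new Iterator<DocValueEntry>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public DocValueEntry next() {
        Map.Entry<Integer, Long> e = it.next();
        return new DocValueEntry(e.getKey(), e.getValue());
      }
      @Override public void remove() { throw new UnsupportedOperationException(); }
    };
  }
}
{code}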



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-9016) SolrIdentifierValidator accepts empty names

2016-04-27 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259689#comment-15259689
 ] 

Shai Erera commented on SOLR-9016:
--

Thanks [~anshumg] for doing all the backports!

> SolrIdentifierValidator accepts empty names
> ---
>
> Key: SOLR-9016
> URL: https://issues.apache.org/jira/browse/SOLR-9016
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Shai Erera
> Fix For: 5.5.1, 6.1, 6.0.1
>
> Attachments: SOLR-9016.patch
>
>
> SolrIdentifierValidator accepts shard, collection, cores and alias names 
> following this pattern:
> {code}
> ^(?!\\-)[\\._A-Za-z0-9\\-]*$
> {code}
> This accepts an "empty" name. This is easily fixable by changing the {{\*}} 
> to {{+}}. However, it also accepts names such as {{..}}, {{,__---}} etc. Do 
> we not want to require collection names to have a letter/digit identifier in 
> them? Something like the following pattern:
> {code}
> ^(\\.)?[a-zA-Z0-9]+[\\._\\-a-zA-Z0-9]*$
> {code}
> That pattern requires the name to start with an optional {{.}} followed by a 
> series of letters/digits followed by the rest of the allowed characters.
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9016) SolrIdentifierValidator accepts empty names

2016-04-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258170#comment-15258170
 ] 

Shai Erera commented on SOLR-9016:
--

All tests pass, so if there are no objections, I'd like to push this change so 
that it even makes it into 5.5.1.

> SolrIdentifierValidator accepts empty names
> ---
>
> Key: SOLR-9016
> URL: https://issues.apache.org/jira/browse/SOLR-9016
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Shai Erera
> Attachments: SOLR-9016.patch
>
>
> SolrIdentifierValidator accepts shard, collection, cores and alias names 
> following this pattern:
> {code}
> ^(?!\\-)[\\._A-Za-z0-9\\-]*$
> {code}
> This accepts an "empty" name. This is easily fixable by changing the {{\*}} 
> to {{+}}. However, it also accepts names such as {{..}}, {{,__---}} etc. Do 
> we not want to require collection names to have a letter/digit identifier in 
> them? Something like the following pattern:
> {code}
> ^(\\.)?[a-zA-Z0-9]+[\\._\\-a-zA-Z0-9]*$
> {code}
> That pattern requires the name to start with an optional {{.}} followed by a 
> series of letters/digits followed by the rest of the allowed characters.
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9016) SolrIdentifierValidator accepts empty names

2016-04-26 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-9016:
-
Attachment: SOLR-9016.patch

The patch fixes the regex to not accept empty identifiers; however, it does not 
modify the rule, i.e. someone could still use an identifier like {{\_\_.--}} if 
they want to. I'll be happy to change that, but since I didn't receive any 
feedback I think this fix is the least we can do (and also push it into 5.5.1).

The patch also modifies the exception message slightly.

> SolrIdentifierValidator accepts empty names
> ---
>
> Key: SOLR-9016
> URL: https://issues.apache.org/jira/browse/SOLR-9016
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Reporter: Shai Erera
> Attachments: SOLR-9016.patch
>
>
> SolrIdentifierValidator accepts shard, collection, cores and alias names 
> following this pattern:
> {code}
> ^(?!\\-)[\\._A-Za-z0-9\\-]*$
> {code}
> This accepts an "empty" name. This is easily fixable by changing the {{\*}} 
> to {{+}}. However, it also accepts names such as {{..}}, {{,__---}} etc. Do 
> we not want to require collection names to have a letter/digit identifier in 
> them? Something like the following pattern:
> {code}
> ^(\\.)?[a-zA-Z0-9]+[\\._\\-a-zA-Z0-9]*$
> {code}
> That pattern requires the name to start with an optional {{.}} followed by a 
> series of letters/digits followed by the rest of the allowed characters.
> What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9016) SolrIdentifierValidator accepts empty names

2016-04-20 Thread Shai Erera (JIRA)
Shai Erera created SOLR-9016:


 Summary: SolrIdentifierValidator accepts empty names
 Key: SOLR-9016
 URL: https://issues.apache.org/jira/browse/SOLR-9016
 Project: Solr
  Issue Type: Bug
  Components: Server
Reporter: Shai Erera


SolrIdentifierValidator accepts shard, collection, cores and alias names 
following this pattern:

{code}
^(?!\\-)[\\._A-Za-z0-9\\-]*$
{code}

This accepts an "empty" name. This is easily fixable by changing the {{\*}} to 
{{+}}. However, it also accepts names such as {{..}}, {{,__---}} etc. Do we not 
want to require collection names to have a letter/digit identifier in them? 
Something like the following pattern:

{code}
^(\\.)?[a-zA-Z0-9]+[\\._\\-a-zA-Z0-9]*$
{code}

That pattern requires the name to start with an optional {{.}} followed by a 
series of letters/digits followed by the rest of the allowed characters.

What do you think?
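
A quick, self-contained check of the patterns above (illustration only, not Solr 
code); it shows that the current pattern accepts the empty string, the minimal 
{{+}} fix rejects it but still allows names without any letter/digit, and the 
stricter proposal rejects those too:

{code}
import java.util.regex.Pattern;

public class IdentifierPatternCheck {
  // Current pattern: accepts the empty string because of the trailing '*'.
  static final Pattern CURRENT = Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]*$");
  // Minimal fix: '+' requires at least one character.
  static final Pattern MINIMAL_FIX = Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]+$");
  // Stricter proposal: require letters/digits after an optional leading '.'.
  static final Pattern PROPOSED = Pattern.compile("^(\\.)?[a-zA-Z0-9]+[\\._\\-a-zA-Z0-9]*$");

  public static void main(String[] args) {
    for (String name : new String[] {"", "..", "__---", ".system", "my_collection"}) {
      System.out.printf("%-15s current=%b minimalFix=%b proposed=%b%n",
          "'" + name + "'",
          CURRENT.matcher(name).matches(),
          MINIMAL_FIX.matcher(name).matches(),
          PROPOSED.matcher(name).matches());
    }
  }
}
{code}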



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-8793) Fix stale commit files' size computation in LukeRequestHandler

2016-03-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved SOLR-8793.
--
   Resolution: Fixed
 Assignee: Shai Erera
Fix Version/s: 5.5.1
   master

Pushed the fix to master, branch_6x, branch_6_0, branch_5x and branch_5_5. I 
think it would be good if it's released in a 5.5.1.

> Fix stale commit files' size computation in LukeRequestHandler
> --
>
> Key: SOLR-8793
> URL: https://issues.apache.org/jira/browse/SOLR-8793
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.5
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: master, 5.5.1
>
> Attachments: SOLR-8793.patch
>
>
> SOLR-8587 added segments file information and its size to core admin status 
> API. However, in case of stale commits, calling that API may result in 
> {{FileNotFoundException}} or {{NoSuchFileException}}, if the segments file no 
> longer exists due to a new commit. We should fix that by returning a proper 
> value for the file's length in this case, maybe -1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8728) Splitting a shard of a collection created with a rule fails with NPE

2016-03-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185223#comment-15185223
 ] 

Shai Erera commented on SOLR-8728:
--

We usually set the fix version to be e.g. "5.5" and "trunk/master".

Cause there are issues that are fixed only in a specific version, e.g. if they 
only affect that version.

> Splitting a shard of a collection created with a rule fails with NPE
> 
>
> Key: SOLR-8728
> URL: https://issues.apache.org/jira/browse/SOLR-8728
> Project: Solr
>  Issue Type: Bug
>Reporter: Shai Erera
>Assignee: Noble Paul
> Fix For: 6.0
>
> Attachments: SOLR-8728.patch, SOLR-8728.patch
>
>
> Spinoff from this discussion: http://markmail.org/message/f7liw4hqaagxo7y2
> I wrote a short test which reproduces, will upload shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8793) Fix stale commit files' size computation in LukeRequestHandler

2016-03-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-8793:
-
Attachment: SOLR-8793.patch

The patch fixes the bug by catching the {{IOException}} and returning -1. In that 
case, the index info will show a file size of -1, until the reader is 
refreshed.

I chose to return a -1 over setting an empty string, or not returning the value 
at all since I feel it's better, but if others think otherwise, please comment.
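
For illustration, the approach looks roughly like this (a sketch, not the exact 
patch code):

{code}
import java.io.IOException;

import org.apache.lucene.store.Directory;

// Sketch only: if the segments_N file referenced by the commit point was already
// deleted by a newer commit, report -1 for its length instead of letting the
// exception escape.
public class StaleCommitFileLength {
  static long safeFileLength(Directory dir, String segmentsFileName) {
    try {
      return dir.fileLength(segmentsFileName);
    } catch (IOException e) {
      // Covers FileNotFoundException / NoSuchFileException for stale commits.
      return -1;
    }
  }
}
{code}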

> Fix stale commit files' size computation in LukeRequestHandler
> --
>
> Key: SOLR-8793
> URL: https://issues.apache.org/jira/browse/SOLR-8793
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.5
>Reporter: Shai Erera
>Priority: Minor
> Attachments: SOLR-8793.patch
>
>
> SOLR-8587 added segments file information and its size to core admin status 
> API. However, in case of stale commits, calling that API may result in 
> {{FileNotFoundException}} or {{NoSuchFileException}}, if the segments file no 
> longer exists due to a new commit. We should fix that by returning a proper 
> value for the file's length in this case, maybe -1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8728) Splitting a shard of a collection created with a rule fails with NPE

2016-03-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185200#comment-15185200
 ] 

Shai Erera commented on SOLR-8728:
--

This is marked as fixed in 6.0, but should it also be marked for 6.1 (since it's 
also committed to 6x)?

What about master -- was it not committed to master too? Does it not affect 
master?

And lastly, in case we will have a 5.5.1, is this considered a bugfix that 
we'll want to backport?

> Splitting a shard of a collection created with a rule fails with NPE
> 
>
> Key: SOLR-8728
> URL: https://issues.apache.org/jira/browse/SOLR-8728
> Project: Solr
>  Issue Type: Bug
>Reporter: Shai Erera
>Assignee: Noble Paul
> Fix For: 6.0
>
> Attachments: SOLR-8728.patch, SOLR-8728.patch
>
>
> Spinoff from this discussion: http://markmail.org/message/f7liw4hqaagxo7y2
> I wrote a short test which reproduces, will upload shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8587) Add segments file information to core admin status

2016-03-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182640#comment-15182640
 ] 

Shai Erera commented on SOLR-8587:
--

OK yeah, you're right, I was confused. The file can be read by the open IR, but 
won't appear in the directory listing. I opened SOLR-8793 to fix this, sorry for 
that!

Is there a workaround until the fix is released? Refresh the searcher maybe?

> Add segments file information to core admin status
> --
>
> Key: SOLR-8587
> URL: https://issues.apache.org/jira/browse/SOLR-8587
> Project: Solr
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.5, master
>
> Attachments: SOLR-8587.patch, SOLR-8587.patch
>
>
> Having the index's segments file name returned by CoreAdminHandler STATUS can 
> be useful. The info I'm thinking about is the segments file name and its 
> size. If you record that from time to time, then in a case of crisis, when you 
> need to restore the index and may not be sure which copy you need to restore, 
> this tiny piece of info can be very useful, as the segments_N file records the 
> commit point, and therefore what your core reported and what you see at hand 
> can help you make a safer decision.
> I also think it's useful info in general, e.g. probably even more than 
> 'version', and it doesn't add much complexity to the handler or the response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8793) Fix stale commit files' size computation in LukeRequestHandler

2016-03-06 Thread Shai Erera (JIRA)
Shai Erera created SOLR-8793:


 Summary: Fix stale commit files' size computation in 
LukeRequestHandler
 Key: SOLR-8793
 URL: https://issues.apache.org/jira/browse/SOLR-8793
 Project: Solr
  Issue Type: Bug
  Components: Server
Affects Versions: 5.5
Reporter: Shai Erera
Priority: Minor


SOLR-8587 added segments file information and its size to core admin status 
API. However, in case of stale commits, calling that API may result in 
{{FileNotFoundException}} or {{NoSuchFileException}}, if the segments file no 
longer exists due to a new commit. We should fix that by returning a proper 
value for the file's length in this case, maybe -1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8587) Add segments file information to core admin status

2016-03-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182197#comment-15182197
 ] 

Shai Erera commented on SOLR-8587:
--

How can a changing index affect the code here? The code uses an IndexReader 
instance, and Lucene doesn't delete index files until the last reader is closed 
(within the same process). Also, if that IR instance says its commit point 
references a certain segments_N file, and some other process then goes and 
deletes index files, how can that IR instance work afterwards? I guess I'm 
missing something here.

Also, {{CoreAdminHandler.getCoreStatus()}} computes the size of the directory 
immediately after, although it doesn't use an IR instance to do that. But 
still, can't it fail if it lists the directory and then attempts to compute the 
file lengths? How is that different than computing the file length of the 
segments_N?

The length is important so that you can make some comparisons to e.g. one 
{{segments_37}} vs another (in a backup maybe), to know if one may be corrupt. 
Maybe not the best use case, but I think that it's an important piece of 
information. Before we remove it I'd like to get to the bottom of the 
exception, to ensure we don't miss something here, or there's a potential bug 
lurking in the code, and removing that particular code will just hide it again.

If my assumptions are incorrect, and the way {{LukeRequestHandler}} works 
is not the usual workflow with Lucene's IndexWriter/Reader, I don't mind 
catching the exception and reporting size 0 for that file. But I think it's 
unhealthy if the system can have an IR which references files that may be 
deleted ...
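
For reference, this is roughly how the segments_N name and its length can be 
obtained from the commit point pinned by an open reader (a standalone sketch; the 
index path is a placeholder, and the exception can only occur if the file was 
removed from under the reader, e.g. by another process):

{code}
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentsFileInfo {
  public static void main(String[] args) throws IOException {
    try (Directory dir = FSDirectory.open(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexCommit commit = reader.getIndexCommit();
      String segmentsFile = commit.getSegmentsFileName();
      // Within one process the deletion policy keeps this file alive while the
      // reader is open; it can only go missing if something else deletes it.
      long length = dir.fileLength(segmentsFile);
      System.out.println(segmentsFile + " (" + length + " bytes)");
    }
  }
}
{code}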

> Add segments file information to core admin status
> --
>
> Key: SOLR-8587
> URL: https://issues.apache.org/jira/browse/SOLR-8587
> Project: Solr
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.5, master
>
> Attachments: SOLR-8587.patch, SOLR-8587.patch
>
>
> Having the index's segments file name returned by CoreAdminHandler STATUS can 
> be useful. The info I'm thinking about is the segments file name and its 
> size. If you record that from time to time, then in a case of crisis, when you 
> need to restore the index and may not be sure which copy you need to restore, 
> this tiny piece of info can be very useful, as the segments_N file records the 
> commit point, and therefore what your core reported and what you see at hand 
> can help you make a safer decision.
> I also think it's useful info in general, e.g. probably even more than 
> 'version', and it doesn't add much complexity to the handler or the response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8587) Add segments file information to core admin status

2016-03-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182185#comment-15182185
 ] 

Shai Erera commented on SOLR-8587:
--

I've just tried that on 5.5.0 and it seems to work without any problems. I did 
use {{bin/solr start -e cloud}} and not on a "real" Solr box though (haven't 
upgraded them to 5.5 yet), but I don't see how this exception is possible. The 
code uses an instance of {{IndexReader}} to get all that information, and that 
instance must have read the {{segments_N}} to read the commit point. Could 
there be some other process that may delete this file?

> Add segments file information to core admin status
> --
>
> Key: SOLR-8587
> URL: https://issues.apache.org/jira/browse/SOLR-8587
> Project: Solr
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 5.5, master
>
> Attachments: SOLR-8587.patch, SOLR-8587.patch
>
>
> Having the index's segments file name returned by CoreAdminHandler STATUS can 
> be useful. The info I'm thinking about is the segments file name and its 
> size. If you record that from time to time, then in a case of crisis, when you 
> need to restore the index and may not be sure which copy you need to restore, 
> this tiny piece of info can be very useful, as the segments_N file records the 
> commit point, and therefore what your core reported and what you see at hand 
> can help you make a safer decision.
> I also think it's useful info in general, e.g. probably even more than 
> 'version', and it doesn't add much complexity to the handler or the response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8734) fix deprecation warnings for absent (maxMergeDocs|mergeFactor)

2016-02-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167049#comment-15167049
 ] 

Shai Erera commented on SOLR-8734:
--

Patch looks good. Even while working on the previous issue I thought that we 
should be able to detect the existence of {{}} element in the 
.xml, and not compare the value to the default. What happens if someone 
includes both {{}} and {{}} set to the default 
value? Would we also fail?
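
As a generic illustration of "warn only when the element is actually present" 
(plain JDK XPath, not Solr's Config API; the element path below is an assumption 
for the example):

{code}
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

public class DeprecatedElementCheck {
  public static void main(String[] args) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new File("solrconfig.xml"));
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Warn only when the deprecated element really exists in the config,
    // instead of comparing the effective value against its default.
    // The path is a placeholder for whichever deprecated element is checked.
    Object node = xpath.evaluate("/config/indexConfig/mergeFactor", doc, XPathConstants.NODE);
    if (node != null) {
      System.out.println("Deprecated element found; configure the merge policy factory instead.");
    }
  }
}
{code}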

> fix deprecation warnings for absent (maxMergeDocs|mergeFactor)
> --
>
> Key: SOLR-8734
> URL: https://issues.apache.org/jira/browse/SOLR-8734
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5, master
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Attachments: SOLR-8734.patch, SOLR-8734.patch
>
>
> [~markus17] wrote on the solr-user mailing list:
> bq. 5.5.0 SOLR-8621 deprecation warnings without maxMergeDocs or mergeFactor
> ...
> bq. o.a.s.c.Config Beginning with Solr 5.5,  is deprecated, 
> configure it on the relevant  instead.
> ...
> bq. On my development machine for all cores. None of the cores has either 
> parameter configured. Is this expected?
> ...
> [~cpoerschke] replied:
> ...
> bq. Could you advise if/that the solrconfig.xml has a  element 
> (for which deprecated warnings would appear separately) or that the 
> solrconfig.xml has no  element?
> ...
> bq. If either is the case then yes based on the code 
> (SolrIndexConfig.java#L153) the warnings would be expected-and-harmless 
> though admittedly are confusing, and fixable.
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8728) Splitting a shard of a collection created with a rule fails with NPE

2016-02-24 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-8728:
-
Attachment: SOLR-8728.patch

If you run this test it fails and you can see this exception in the console:

{noformat}
42071 ERROR 
(OverseerThreadFactory-6-thread-2-processing-n:127.0.0.1:49954_yd_dma%2Fkz) 
[n:127.0.0.1:49954_yd_dma%2Fkz] o.a.s.c.OverseerCollectionMessageHandler 
Error executing split operation for collection: shardSplitWithRule parent 
shard: shard1
java.lang.NullPointerException
at 
org.apache.solr.cloud.rule.Rule.getNumberOfNodesWithSameTagVal(Rule.java:166)
at org.apache.solr.cloud.rule.Rule.tryAssignNodeToShard(Rule.java:128)
at 
org.apache.solr.cloud.rule.ReplicaAssigner.tryAPermutationOfRules(ReplicaAssigner.java:249)
at 
org.apache.solr.cloud.rule.ReplicaAssigner.tryAllPermutations(ReplicaAssigner.java:201)
at 
org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings0(ReplicaAssigner.java:173)
at 
org.apache.solr.cloud.rule.ReplicaAssigner.getNodeMappings(ReplicaAssigner.java:134)
at org.apache.solr.cloud.Assign.getNodesViaRules(Assign.java:215)
at org.apache.solr.cloud.Assign.getNodesForNewReplicas(Assign.java:178)
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:2164)
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.splitShard(OverseerCollectionMessageHandler.java:1388)
at 
org.apache.solr.cloud.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:236)
at 
org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:433)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

> Splitting a shard of a collection created with a rule fails with NPE
> 
>
> Key: SOLR-8728
> URL: https://issues.apache.org/jira/browse/SOLR-8728
> Project: Solr
>  Issue Type: Bug
>Reporter: Shai Erera
> Attachments: SOLR-8728.patch
>
>
> Spinoff from this discussion: http://markmail.org/message/f7liw4hqaagxo7y2
> I wrote a short test which reproduces, will upload shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8728) Splitting a shard of a collection created with a rule fails with NPE

2016-02-24 Thread Shai Erera (JIRA)
Shai Erera created SOLR-8728:


 Summary: Splitting a shard of a collection created with a rule 
fails with NPE
 Key: SOLR-8728
 URL: https://issues.apache.org/jira/browse/SOLR-8728
 Project: Solr
  Issue Type: Bug
Reporter: Shai Erera


Spinoff from this discussion: http://markmail.org/message/f7liw4hqaagxo7y2

I wrote a short test which reproduces, will upload shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7668) Port-base rule for shard placement causes NPE

2016-02-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162918#comment-15162918
 ] 

Shai Erera commented on SOLR-7668:
--

Is this not resolved already?

> Port-base rule for shard placement causes NPE
> -
>
> Key: SOLR-7668
> URL: https://issues.apache.org/jira/browse/SOLR-7668
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.2
>Reporter: Adam McElwee
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-7668.patch, SOLR-7668.patch
>
>
> I was setting up some rule-based collections, and I hit an NPE whenever I try 
> to include a port-based rule. It looks like the implementation was started, 
> but not completed for ports. Patch coming in just a moment.
> I included a test, and I have no problems getting the test to pass when run 
> by itself. However, when I run it w/ the other tests in RulesTest, it fails 
> because of some ZK errors, but I often have those issues when running any of 
> the distrib tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-20 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155909#comment-15155909
 ] 

Shai Erera commented on SOLR-8621:
--

Looks good [~cpoerschke]! And yes, let's be consistent and keep the full 
package name, as the other examples.

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730)
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5 deprecates (but maintains)  support
> * SOLR-8668 in solr 6.0(\?) will remove  support 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8674) transition from solr.tests.mergePolicy to solr.tests.mergePolicyFactory

2016-02-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154396#comment-15154396
 ] 

Shai Erera commented on SOLR-8674:
--

Patch looks good!

> transition from solr.tests.mergePolicy to solr.tests.mergePolicyFactory
> ---
>
> Key: SOLR-8674
> URL: https://issues.apache.org/jira/browse/SOLR-8674
> Project: Solr
>  Issue Type: Test
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Attachments: SOLR-8674.patch
>
>
> Following on from SOLR-8621 which deprecated/replaced <mergePolicy> with 
> <mergePolicyFactory> there are some test solrconfig.xml files and associated 
> test code that need to be transitioned from
> {code}
> <mergePolicy class="${solr.tests.mergePolicy:org.apache.solr.util.RandomMergePolicy}"/>
> {code}
> to
> {code}
> <mergePolicyFactory class="${solr.tests.mergePolicyFactory:org.apache.solr.util.RandomMergePolicyFactory}"/>
> {code}
> or something similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154388#comment-15154388
 ] 

Shai Erera commented on SOLR-8621:
--

I am not sure how to comment on those pages, so I'll comment here:

bq. {{}}

Shouldn't the class be {{TieredMergePolicyFactory}}?

bq. This process can continue indefinitely

Well, not indefinitely :). More like "The process can continue until there are 
no more {{mergeFactor}} segments to merge of same size".

bq. and the maxMergeAtOnce setting determines how many segments should be 
included in the merge

Perhaps "... should be included in each merge"? Cause if segmentsPerTier is 30 
and maxMergeAtOnce is 10, there will be 3 merges.

bq. It also can also result

One extra 'also' here.

bq. {{class="MyCustomMergePolicyFactory"}}

Should we write {{class="full.package.MyCustomMergePolicyFactory"}}? It's not 
critical but I want to emphasize that one cannot just give the class name here, 
but needs the FQCN (fully qualified class name).

bq. {{org.apache.solr.index.TieredMergePolicyFactory}}

If {{solr.TieredMergePolicyFactory}} works too, let's write that? That way, if 
the factory changes packages, we won't need to update the guide.
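
To make the segmentsPerTier / maxMergeAtOnce point above concrete, here is the plain 
Lucene API those two settings map to (illustration only; the numbers are just the 
example from the comment above):

{code}
import org.apache.lucene.index.TieredMergePolicy;

public class TieredMergePolicySettings {
  public static void main(String[] args) {
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setSegmentsPerTier(30);   // allow up to ~30 same-sized segments per tier
    mp.setMaxMergeAtOnce(10);    // each merge combines at most 10 of them,
                                 // so clearing a full tier takes about 3 merges
    System.out.println(mp);
  }
}
{code}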

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730)
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5 deprecates (but maintains)  support
> * SOLR-8668 in solr 6.0(\?) will remove  support 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8677) SOLR allows creation of shards with invalid names.

2016-02-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146678#comment-15146678
 ] 

Shai Erera commented on SOLR-8677:
--

I don't mind either way, Jason. Either you add it to this patch, and with that 
finish the collection/alias/shard name restrictions handling, or open a new 
issue, whatever works for you. I assume both will be released only post 5.5 
anyway, and thus together. If you prefer to handle that separately, let me know 
and I'll add a CHANGES entry before committing this patch.

> SOLR allows creation of shards with invalid names.
> --
>
> Key: SOLR-8677
> URL: https://issues.apache.org/jira/browse/SOLR-8677
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Jason Gerlowski
>Priority: Minor
> Fix For: master
>
> Attachments: SOLR-8677.patch
>
>
> Solr currently has "recommendations" about what constitutes a valid 
> identifier, but doesn't enforce these "recommendations" uniformly.  Core 
> (SOLR-8308) and collection (SOLR-8642) names are currently checked, but 
> shards aren't.
> {code}
> $ bin/solr -e cloud -noprompt
> 
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CREATE=coll1=implicit=1=bad+shard+name;
> HTTP/1.1 200 OK
> Content-Type: application/xml; charset=UTF-8
> Transfer-Encoding: chunked
> 
> 
> 0 name="QTime">204 name="failure">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> 'coll1_bad shard name_replica1': Unable to create core [coll1_bad shard 
> name_replica1] Caused by: Invalid name: 'coll1_bad shard name_replica1' 
> Identifiers must consist entirely of periods, underscores and 
> alphanumerics
> 
> {code}
> (Note that the CREATE command above returned 200-OK, and the failure was only 
> apparent when viewing the message.)
> A CLUSTERSTATUS shows that the shard was actually created, but has no 
> underlying cores.
> {code}
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS=json=true;
> ...
> "collections":{
>   "coll1":{
> "replicationFactor":"1",
> "shards":{"bad shard name":{
> "range":null,
> "state":"active",
> "replicas":{}}},
> "router":{"name":"implicit"},
> "maxShardsPerNode":"1",
> "autoAddReplicas":"false",
> "znodeVersion":1,
> "configName":"gettingstarted"},
> ...
> {code}
> This JIRA proposes adding a check to ensure that shard names meet SOLR's 
> identifier "recommendations".  This should prevent users from accidentally 
> putting themselves in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8677) SOLR allows creation of shards with invalid names.

2016-02-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146458#comment-15146458
 ] 

Shai Erera commented on SOLR-8677:
--

Looks good to me. [~gerlowskija] I know it's not strictly related to this 
issue, but perhaps we can use the validator in SolrJ too, short-circuiting 
alias/collection/shard requests before they reach the server?
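
Something along these lines, purely as a sketch (the pattern and helper below are 
illustrative only, not SolrJ's actual validator API):

{code}
import java.util.regex.Pattern;

// Hedged sketch of the idea above: reject obviously invalid names on the client
// side before any collection/shard request is sent to the server.
public class ClientSideNameCheck {
  private static final Pattern IDENTIFIER =
      Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]+$");

  static void validateName(String kind, String name) {
    if (name == null || !IDENTIFIER.matcher(name).matches()) {
      throw new IllegalArgumentException(
          "Invalid " + kind + " name: '" + name + "'. Identifiers must consist "
              + "entirely of periods, underscores, hyphens and alphanumerics.");
    }
  }

  public static void main(String[] args) {
    validateName("shard", "shard1");          // ok
    validateName("shard", "bad shard name");  // throws before any HTTP request
  }
}
{code}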

> SOLR allows creation of shards with invalid names.
> --
>
> Key: SOLR-8677
> URL: https://issues.apache.org/jira/browse/SOLR-8677
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Jason Gerlowski
>Priority: Minor
> Fix For: master
>
> Attachments: SOLR-8677.patch
>
>
> Solr currently has "recommendations" about what constitutes a valid 
> identifier, but doesn't enforce these "recommendations" uniformly.  Core 
> (SOLR-8308) and collection (SOLR-8642) names are currently checked, but 
> shards aren't.
> {code}
> $ bin/solr -e cloud -noprompt
> 
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CREATE=coll1=implicit=1=bad+shard+name;
> HTTP/1.1 200 OK
> Content-Type: application/xml; charset=UTF-8
> Transfer-Encoding: chunked
> 
> 
> 0 name="QTime">204 name="failure">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> 'coll1_bad shard name_replica1': Unable to create core [coll1_bad shard 
> name_replica1] Caused by: Invalid name: 'coll1_bad shard name_replica1' 
> Identifiers must consist entirely of periods, underscores and 
> alphanumerics
> 
> {code}
> (Note that the CREATE command above returned 200-OK, and the failure was only 
> apparent when viewing the message.)
> A CLUSTERSTATUS shows that the shard was actually created, but has no 
> underlying cores.
> {code}
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS=json=true;
> ...
> "collections":{
>   "coll1":{
> "replicationFactor":"1",
> "shards":{"bad shard name":{
> "range":null,
> "state":"active",
> "replicas":{}}},
> "router":{"name":"implicit"},
> "maxShardsPerNode":"1",
> "autoAddReplicas":"false",
> "znodeVersion":1,
> "configName":"gettingstarted"},
> ...
> {code}
> This JIRA proposes adding a check to ensure that shard names meet SOLR's 
> identifier "recommendations".  This should prevent users from accidentally 
> putting themselves in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8677) SOLR allows creation of shards with invalid names.

2016-02-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146459#comment-15146459
 ] 

Shai Erera commented on SOLR-8677:
--

Also, would you mind adding a CHANGES.txt entry too?

> SOLR allows creation of shards with invalid names.
> --
>
> Key: SOLR-8677
> URL: https://issues.apache.org/jira/browse/SOLR-8677
> Project: Solr
>  Issue Type: Bug
>Affects Versions: master
>Reporter: Jason Gerlowski
>Priority: Minor
> Fix For: master
>
> Attachments: SOLR-8677.patch
>
>
> Solr currently has "recommendations" about what constitutes a valid 
> identifier, but doesn't enforce these "recommendations" uniformly.  Core 
> (SOLR-8308) and collection (SOLR-8642) names are currently checked, but 
> shards aren't.
> {code}
> $ bin/solr -e cloud -noprompt
> 
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CREATE=coll1=implicit=1=bad+shard+name;
> HTTP/1.1 200 OK
> Content-Type: application/xml; charset=UTF-8
> Transfer-Encoding: chunked
> 
> 
> 0 name="QTime">204 name="failure">org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
>  from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 
> 'coll1_bad shard name_replica1': Unable to create core [coll1_bad shard 
> name_replica1] Caused by: Invalid name: 'coll1_bad shard name_replica1' 
> Identifiers must consist entirely of periods, underscores and 
> alphanumerics
> 
> {code}
> (Note that the CREATE command above returned 200-OK, and the failure was only 
> apparent when viewing the message.)
> A CLUSTERSTATUS shows that the shard was actually created, but has no 
> underlying cores.
> {code}
> $ curl -i -l -k -X GET 
> "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS=json=true;
> ...
> "collections":{
>   "coll1":{
> "replicationFactor":"1",
> "shards":{"bad shard name":{
> "range":null,
> "state":"active",
> "replicas":{}}},
> "router":{"name":"implicit"},
> "maxShardsPerNode":"1",
> "autoAddReplicas":"false",
> "znodeVersion":1,
> "configName":"gettingstarted"},
> ...
> {code}
> This JIRA proposes adding a check to ensure that shard names meet SOLR's 
> identifier "recommendations".  This should prevent users from accidentally 
> putting themselves in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142568#comment-15142568
 ] 

Shai Erera commented on SOLR-8621:
--

Can you point me to a test failure?

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5(\?) deprecates (but maintains)  support
> * solr 6.0(\?) removes  support 
> +work left-to-do summary:+
>  * {color:red}WrapperMergePolicyFactory setter logic tweak/mini-bug (and test 
> case){color} - Christine
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/server/solr/configsets
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142613#comment-15142613
 ] 

Shai Erera commented on SOLR-8621:
--

Thanks, I'll take a look.

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5(\?) deprecates (but maintains)  support
> * solr 6.0(\?) removes  support 
> +work left-to-do summary:+
>  * {color:red}WrapperMergePolicyFactory setter logic tweak/mini-bug (and test 
> case){color} - Christine
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/server/solr/configsets
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143618#comment-15143618
 ] 

Shai Erera commented on SOLR-8621:
--

[~cpoerschke] these look good, so +1 to merge to 5x. I agree that we're done w/ 
this issue here. We can separately take care of SOLR-8674 and SOLR-8668. I 
enjoyed this collaboration, thank you very much for such a fun and positive 
experience!

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5(\?) deprecates (but maintains)  support
> * solr 6.0(\?) removes  support 
> +work left-to-do summary:+
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - in SOLR-8674
>  ** solr/server/solr/configsets - master committed, branch_5x/branch_5_5 
> cherry-pick to follow
>  * WrapperMergePolicyFactory.getMergePolicyInstance method  - master 
> committed, branch_5x/branch_5_5 cherry-pick to follow
>  * RandomForceMergePolicyFactory - master committed, branch_5x/branch_5_5 
> cherry-pick to follow
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5730) make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr

2016-02-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143646#comment-15143646
 ] 

Shai Erera commented on SOLR-5730:
--

Patch looks good! One minor comment -- in 
{{SortingMergePolicyFactory.getMergePolicyInstance()}} I'd inline to {{return 
new Sorting ...}}.


> make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector 
> configurable in Solr
> --
>
> Key: SOLR-5730
> URL: https://issues.apache.org/jira/browse/SOLR-5730
> Project: Solr
>  Issue Type: New Feature
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-5730-part1and2.patch, SOLR-5730-part1and2.patch, 
> SOLR-5730-part1of2.patch, SOLR-5730-part1of2.patch, SOLR-5730-part2of2.patch, 
> SOLR-5730-part2of2.patch
>
>
> *Example configuration (solrconfig.xml) :*
> {noformat}
> -
> +
> +  in
> +  org.apache.solr.index.TieredMergePolicyFactory
> +  timestamp desc
> +
> {noformat}
> *Example use (EarlyTerminatingSortingCollector):*
> {noformat}
> =timestamp+desc=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7020) TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140762#comment-15140762
 ] 

Shai Erera commented on LUCENE-7020:


I don't see how {{maxMergeAtOnce}} and {{maxMergeAtOnceExplicit}} are related, 
and why there should be any ratio defined between the two. The former specifies 
how many segments to merge at once during regular merges, and the latter 
specifies the same for {{forceMerge}}. Why should they be dependent on each 
other?

I agree with [~mikemccand] that setting any of them too high is risky. If they 
user explicitly sets that, that's fine. But if the user sets one of them, 
blindly changing the other seems like a surprising effect to me, which could 
have negative impact on the performance of the system.

The reason why the 'explicit' setting is higher by default than the 
non-explicit one is, I believe, because in regular merges you don't want to 
consume too many resources, since there are other operations (indexing, search) 
that happen in parallel. But when you explicitly call {{forceMerge}}, and 
assuming you know what you're doing and its impact on the server, you do that 
during _quiet_ hours. I wouldn't use the defaults to come up w/ any globally 
recommended settings for the ratio between these two.

> TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit
> 
>
> Key: LUCENE-7020
> URL: https://issues.apache.org/jira/browse/LUCENE-7020
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.4.1
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
> Attachments: LUCENE-7020.patch
>
>
> SOLR-8621 covers improvements in configuring a merge policy in Solr.
> Discussions on that issue brought up the fact that if large values are 
> configured for maxMergeAtOnce and segmentsPerTier, but maxMergeAtOnceExplicit 
> is not changed, then doing a forceMerge is likely to not work as expected.
> When I first configured maxMergeAtOnce and segmentsPerTier to 35 in Solr, I 
> saw an optimize (forceMerge) fully rewrite most of the index *twice* in order 
> to achieve a single segment, because there were approximately 80 segments in 
> the index before the optimize, and maxMergeAtOnceExplicit defaults to 30.  On 
> advice given via the solr-user mailing list, I configured 
> maxMergeAtOnceExplicit to 105 and have not had that problem since.
> I propose that setting maxMergeAtOnce should also set maxMergeAtOnceExplicit 
> to three times the new value -- unless the setMaxMergeAtOnceExplicit method 
> has been invoked, indicating that the user wishes to set that value 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7020) TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140902#comment-15140902
 ] 

Shai Erera commented on LUCENE-7020:


bq.  I think users who have very large indexes are more likely to choose larger 
values for TMP

But that's wrong. The number that you set this to is not linearly related to 
the number of segments in the index. If for instance you have an index w/ 1000 
segments, you wouldn't want to run a merge with {{maxMergeAtOnce=500}} .. not 
sure you'll even have that many resources.

IMO, the only relation between the two settings is that *neither* of them 
should be set too high. I'd even say that if you consider setting 
{{maxMergeAtOnce}} to 30, do the same for {{*Explicit}}. That is, when you set 
a high value for regular merges, set the same value for explicit merges. 
Unless you benchmarked your system and found out that merging 100 segments 
together is (a) possible and (b) really improves the speed of the merge.

> TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit
> 
>
> Key: LUCENE-7020
> URL: https://issues.apache.org/jira/browse/LUCENE-7020
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.4.1
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
> Attachments: LUCENE-7020.patch
>
>
> SOLR-8621 covers improvements in configuring a merge policy in Solr.
> Discussions on that issue brought up the fact that if large values are 
> configured for maxMergeAtOnce and segmentsPerTier, but maxMergeAtOnceExplicit 
> is not changed, then doing a forceMerge is likely to not work as expected.
> When I first configured maxMergeAtOnce and segmentsPerTier to 35 in Solr, I 
> saw an optimize (forceMerge) fully rewrite most of the index *twice* in order 
> to achieve a single segment, because there were approximately 80 segments in 
> the index before the optimize, and maxMergeAtOnceExplicit defaults to 30.  On 
> advice given via the solr-user mailing list, I configured 
> maxMergeAtOnceExplicit to 105 and have not had that problem since.
> I propose that setting maxMergeAtOnce should also set maxMergeAtOnceExplicit 
> to three times the new value -- unless the setMaxMergeAtOnceExplicit method 
> has been invoked, indicating that the user wishes to set that value 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-8621:
-
Attachment: SOLR-8621-example_contrib_configs.patch

[~cpoerschke] if you're OK with this patch, I'll commit it.

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces  support
> * solr 5.5(\?) deprecates (but maintains)  support
> * solr 6.0(\?) removes  support 
> +work-in-progress summary:+
>  * main code changes have been committed to master and branch_5x
>  * {color:red}further small code change required:{color} MergePolicyFactory 
> constructor or MergePolicyFactory.getMergePolicy method to take IndexSchema 
> argument (e.g. for use by SortingMergePolicyFactory being added under related 
> SOLR-5730)
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/contrib
>  ** solr/example
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace with

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140888#comment-15140888
 ] 

Shai Erera commented on SOLR-8621:
--

You're right Christine, I didn't notice the {{IndexSchema}} only variant. Let's 
continue on SOLR-5730 then by adding the schema to the factory's ctor.

So besides this API change, what's left to do in the context of this issue? 
Since it's marked blocker for 5.5, I want to make sure that we don't hold up 
the release too long:

* So the API change has to go in.
* Adding tests can be done separately? (unless you already have some work done 
there).
* We finished (as far as I could tell), updating all existing solrconfig.xmls.
* We have a separate issue to remove support in 6.0.
* We should update the ref guide, but the ref guide is usually released after 
the binaries anyway.

Am I missing something?

> solrconfig.xml: deprecate/replace  with 
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress summary:+
>  * main code changes have been committed to master and branch_5x
>  * {color:red}further small code change required:{color} MergePolicyFactory 
> constructor or MergePolicyFactory.getMergePolicy method to take IndexSchema 
> argument (e.g. for use by SortingMergePolicyFactory being added under related 
> SOLR-5730)
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/contrib
>  ** solr/example
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7020) TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141065#comment-15141065
 ] 

Shai Erera commented on LUCENE-7020:


Sure, that's expected behavior, and is what's referred to as cascaded merges. The 
reason for having two separate settings is to better control system resources. And 
again, if you had a 150-segment index, would you change the setting to 150? I think 
that if you run forceMerge(1), you should expect a few rounds of merges, unless you 
feel comfortable merging 150 segments at once.

But I don't think there is a global relation that we should impose between these 
two settings.
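
For reference, a minimal sketch of the three knobs being discussed, using the values 
mentioned in this issue (illustrative only; these are the standard 
{{TieredMergePolicy}} setters, not a recommendation):

{code}
TieredMergePolicy tmp = new TieredMergePolicy();
tmp.setSegmentsPerTier(35.0);        // segments tolerated per tier before "natural" merging kicks in
tmp.setMaxMergeAtOnce(35);           // max segments merged in one go during natural merging
tmp.setMaxMergeAtOnceExplicit(105);  // max segments merged in one go during forceMerge/expungeDeletes
{code}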

> TieredMergePolicy - cascade maxMergeAtOnce setting to maxMergeAtOnceExplicit
> 
>
> Key: LUCENE-7020
> URL: https://issues.apache.org/jira/browse/LUCENE-7020
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 5.4.1
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
> Attachments: LUCENE-7020.patch
>
>
> SOLR-8621 covers improvements in configuring a merge policy in Solr.
> Discussions on that issue brought up the fact that if large values are 
> configured for maxMergeAtOnce and segmentsPerTier, but maxMergeAtOnceExplicit 
> is not changed, then doing a forceMerge is likely to not work as expected.
> When I first configured maxMergeAtOnce and segmentsPerTier to 35 in Solr, I 
> saw an optimize (forceMerge) fully rewrite most of the index *twice* in order 
> to achieve a single segment, because there were approximately 80 segments in 
> the index before the optimize, and maxMergeAtOnceExplicit defaults to 30.  On 
> advice given via the solr-user mailing list, I configured 
> maxMergeAtOnceExplicit to 105 and have not had that problem since.
> I propose that setting maxMergeAtOnce should also set maxMergeAtOnceExplicit 
> to three times the new value -- unless the setMaxMergeAtOnceExplicit method 
> has been invoked, indicating that the user wishes to set that value 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140753#comment-15140753
 ] 

Shai Erera commented on SOLR-8621:
--

Good catch, I just blindly moved that 'mergeFactor' in :). I'll remove it. I'd 
like to keep the example though, since it's consistent with the rest of the 
commented-out examples.

About {{SortingMergePolicyFactory}} and {{IndexSchema}}, I see that 
{{SortSpecParsing}} uses the provided schema only to validate that the field 
exists in the schema. If we want to keep that validity check, then let's 
continue with your proposal of passing {{IndexSchema}} to the factory's ctor.

About this parsing logic: it also relies on request params, so perhaps factor 
it out into a utility that you can reuse? That utility can then also 
validate that the sort field exists in the schema. I didn't review that method 
fully, but I hope it's doable.
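
To make the ctor proposal concrete, the factory base class would end up shaped 
roughly like this (a sketch only; the exact arguments and names are whatever we 
end up committing under SOLR-5730):

{code}
public abstract class MergePolicyFactory {
  protected final SolrResourceLoader resourceLoader;
  protected final MergePolicyFactoryArgs args;
  protected final IndexSchema schema; // lets e.g. SortingMergePolicyFactory resolve/validate sort fields

  protected MergePolicyFactory(SolrResourceLoader resourceLoader,
                               MergePolicyFactoryArgs args,
                               IndexSchema schema) {
    this.resourceLoader = resourceLoader;
    this.args = args;
    this.schema = schema;
  }

  public abstract MergePolicy getMergePolicy();
}
{code}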

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621.patch, explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress summary:+
>  * main code changes have been committed to master and branch_5x
>  * {color:red}further small code change required:{color} MergePolicyFactory 
> constructor or MergePolicyFactory.getMergePolicy method to take IndexSchema 
> argument (e.g. for use by SortingMergePolicyFactory being added under related 
> SOLR-5730)
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/contrib
>  ** solr/example
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142095#comment-15142095
 ] 

Shai Erera commented on SOLR-8621:
--

[~cpoerschke] thx for fixing that typo! And your latest commit looks fine to 
me. +1 to get it in.

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work left-to-do summary:+
>  * {color:red}WrapperMergePolicyFactory setter logic tweak/mini-bug (and test 
> case){color} - Christine
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/server/solr/configsets
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5730) make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr

2016-02-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142109#comment-15142109
 ] 

Shai Erera commented on SOLR-5730:
--

A few comments about the patch:

* In QueryComponent: {{if\(existingSegmentTerminatedEarly == null\)}} -- can 
you add a space after the 'if'?

* {{SortingMergePolicyFactory.getMergePolicy()}} calls 
{{args.invokeSetters(mp);}}, like {{UpgradeIndexMergePolicyFactory}} does. I wonder 
if we can have a protected abstract {{getMergePolicyInstance(wrappedMP)}}, so 
that {{WrapperMergePolicyFactory.getMergePolicy()}} calls that method followed by 
{{args.invokeSetters(mp);}} (see the sketch after this list). What do you think?

* {{SolrIndexSearcher}}:  
{{qr.setSegmentTerminatedEarly\(earlyTerminatingSortingCollector.terminatedEarly\(\)\);}}
 -- should we also set {{qr.partialResults}}?

* {{DefaultSolrCoreState}}: you can change the method to:

{code}
public Sort getMergePolicySort() throws IOException {
  lock(iwLock.readLock());
  try {
    if (indexWriter != null) {
      final MergePolicy mergePolicy = indexWriter.getConfig().getMergePolicy();
      if (mergePolicy instanceof SortingMergePolicy) {
        return ((SortingMergePolicy) mergePolicy).getSort();
      }
    }
    return null; // no sorting merge policy configured
  } finally {
    iwLock.readLock().unlock();
  }
}
{code}

* What's the purpose of 
{{enable="$\{solr.sortingMergePolicyFactory.enable:true\}"}}?

* I kind of feel like the test you added to {{TestMiniSolrCloudCluster}} 
doesn't belong in that class. Perhaps it should be in its own test class, 
inheriting from this class, or just using {{MiniSolrCloudCluster}}?

* {{RandomForceMergePolicyFactory}} is not really related to this issue. 
Perhaps you should commit it separately?
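
Regarding the {{getMergePolicyInstance}} suggestion above, roughly the shape I have 
in mind (a sketch only; {{getWrappedMergePolicy()}} stands in for however the 
wrapped policy is currently built from the wrapped.* args):

{code}
public abstract class WrapperMergePolicyFactory extends MergePolicyFactory {

  /** Subclasses only wrap the already-built inner merge policy. */
  protected abstract MergePolicy getMergePolicyInstance(MergePolicy wrappedMP);

  @Override
  public final MergePolicy getMergePolicy() {
    final MergePolicy wrappedMP = getWrappedMergePolicy(); // build the inner policy
    final MergePolicy mp = getMergePolicyInstance(wrappedMP);
    args.invokeSetters(mp); // setters applied uniformly here, not in each subclass
    return mp;
  }
}
{code}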

> make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector 
> configurable in Solr
> --
>
> Key: SOLR-5730
> URL: https://issues.apache.org/jira/browse/SOLR-5730
> Project: Solr
>  Issue Type: New Feature
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-5730-part1and2.patch, SOLR-5730-part1of2.patch, 
> SOLR-5730-part1of2.patch, SOLR-5730-part2of2.patch, SOLR-5730-part2of2.patch
>
>
> *Example configuration (solrconfig.xml) :*
> {noformat}
> -
> +
> +  in
> +  org.apache.solr.index.TieredMergePolicyFactory
> +  timestamp desc
> +
> {noformat}
> *Example use (EarlyTerminatingSortingCollector):*
> {noformat}
> =timestamp+desc=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138848#comment-15138848
 ] 

Shai Erera commented on SOLR-8621:
--

[~cpoerschke]: OK so let's do the following:

* Move all solrconfig.xmls to use the new factory, except the legacy ones (used 
for tests). Also, let's somehow mark all the tests/code that we want to remove 
in master, so it will be easier to track.

* Open an issue to remove support for {{<mergePolicy>}} in 6.0. We can make it 
a blocker for 6.0.

* About the ref guide, I don't think that we work on it via JIRA. Do we modify 
it in Confluence directly?

About splitting the work, is there something you prefer to do? If not, I'd 
rather handle the existing solrconfigs than modify the ref guide. You'll 
probably do a better job at it than me ;). I will gladly help with reviews!

Also, if there's anything else you think should be done in the context of this 
issue, or need help with, please let me know!

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Attachments: SOLR-8621.patch, explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress git branch:+ 
> [master-solr-8621|https://github.com/apache/lucene-solr/tree/master-solr-8621]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138897#comment-15138897
 ] 

Shai Erera commented on SOLR-8621:
--

bq. Could I take on the

Sure. You lead this, I'm only here to help :). So tell me how I can help more.

bq. Not sure if we need or want a new one to replace it?

For fixing the config xmls, I don't think that we need a branch. If we're not 
collaborating on the same task / code path, no need for a remote branch. If 
that's OK with you, I'll go ahead and delete it.

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Attachments: SOLR-8621.patch, explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress git branch:+ 
> [master-solr-8621|https://github.com/apache/lucene-solr/tree/master-solr-8621]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140147#comment-15140147
 ] 

Shai Erera commented on SOLR-8621:
--

bq. Could you take care of the solr/contrib and solr/example solrconfig.xml 
changes?

Sure, I'll take a look at them.

bq. And opening the follow-on issue to remove support for <mergePolicy> in 6.0?

Opened SOLR-8668.

About {{IndexSchema}}, I was going to propose that you add it as a ctor 
argument, but I see you've already done that. Just wondering though, what does 
{{SortingMergePolicyFactory}} need from IndexSchema that it cannot create on 
its own? It already receives the sort-by fields and order in the config; all it 
needs to do is create a {{Sort}} object. What does it get from 
{{IndexSchema}} then?
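
For the sake of illustration, a rough sketch of the trade-off (not taken from the 
patch; the {{getSortField}} usage in the comment is an assumption on my part):

{code}
// without the schema: the sort field's type must be assumed/hard-coded by the factory
Sort sort = new Sort(new SortField("timestamp", SortField.Type.LONG, /* reverse */ true));

// with the schema: something like schema.getField("timestamp").getSortField(true)
// resolves the proper SortField from the field definition, and can reject unknown fields
{code}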

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-8621.patch, explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress summary:+
>  * main code changes have been committed to master and branch_5x
>  * {color:red}further small code change required:{color} MergePolicyFactory 
> constructor or MergePolicyFactory.getMergePolicy method to take IndexSchema 
> argument (e.g. for use by SortingMergePolicyFactory being added under related 
> SOLR-5730)
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/contrib
>  ** solr/example
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8668) Remove support for <mergePolicy>

2016-02-09 Thread Shai Erera (JIRA)
Shai Erera created SOLR-8668:


 Summary: Remove support for <mergePolicy>
 Key: SOLR-8668
 URL: https://issues.apache.org/jira/browse/SOLR-8668
 Project: Solr
  Issue Type: Improvement
Reporter: Shai Erera
Priority: Blocker
 Fix For: 6.0


Following SOLR-8621, we should remove support for {{<mergePolicy>}} in trunk/6x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-8621:
-
Attachment: SOLR-8621-example_contrib_configs.patch

Patch changes solrconfig.xmls under solr/contrib and solr/example. 
[~cpoerschke], none of these files actually configured an MP, so I only changed 
the commented-out sections. Let me know if you have concerns about this change.
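
Roughly, the change turns a commented-out section like the first block below into 
the second (a sketch only; the exact element and parameter names are whatever the 
committed patch and example configs use):

{noformat}
<!--
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
-->

<!--
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
-->
{noformat}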

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Blocker
> Fix For: 5.5, master
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621.patch, explicit-merge-auto-set.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress summary:+
>  * main code changes have been committed to master and branch_5x
>  * {color:red}further small code change required:{color} MergePolicyFactory 
> constructor or MergePolicyFactory.getMergePolicy method to take IndexSchema 
> argument (e.g. for use by SortingMergePolicyFactory being added under related 
> SOLR-5730)
>  * Solr Reference Guide changes (directly in Confluence?)
>  * changes to remaining solrconfig.xml
>  ** solr/core/src/test-files/solr/collection1/conf - Christine
>  ** solr/contrib
>  ** solr/example
> +open question:+
>  * Do we want to error if luceneMatchVersion >= 5.5 and deprecated 
> mergePolicy/mergeFactor/maxMergeDocs are used? See [~hossman]'s comment on 
> Feb 1st. The code as-is permits mergePolicy irrespective of 
> luceneMatchVersion, I think.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated SOLR-8621:
-
Attachment: SOLR-8621.patch

[~cpoerschke], patch includes the changes from *all* commits that we've pushed 
to master-solr-8621, minus the commit we wanted to revert, and also after 
rebasing all our changes on current 'master'. The reason I didn't push it to 
our branch is so that we don't need to revert that commit, and also because at 
some point we merged with master rather than rebasing.

Note that I had to remove {{SolrIndexConfig.buildMergePolicy()}} because of a 
test that failed. We should just call 
{{SolrIndexConfig.buildMergePolicyFromInfo()}} to unify how we configure the MP.

If you prefer that I push it to the branch, then I can try to squash all the 
commits etc., or push it to a separate branch. But I think it's ready.

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Attachments: SOLR-8621.patch
>
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress git branch:+ 
> [master-solr-8621|https://github.com/apache/lucene-solr/tree/master-solr-8621]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-02-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136886#comment-15136886
 ] 

Shai Erera commented on SOLR-8621:
--

{quote}
Unrelated to the above, just pushed changes to MergePolicyFactoryArgs which 
will do away with the need to change SolrPluginUtils but still keep 
MergePolicyFactoryArgs as an abstraction which just happens to be implemented 
via a NamedList at the moment.
{quote}

I actually prefer that we not extend the usage of {{NamedList}} :). Really what 
we need is a {{Map}} (which {{NamedList}} essentially is too), so why 
introduce it in this class? Would you mind if we switch back to a Map? I also 
think that some of the changes to {{SolrPluginUtils}}, e.g. the refactoring of 
the {{findMethod}} method, contributed to its code readability, so I'd like to 
restore them too. And finally, if we move back to a Map, then {{keys()}} doesn't 
need to create a new {{HashSet}}.
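
Roughly what I have in mind (just a sketch, not the actual class):

{code}
public class MergePolicyFactoryArgs {
  private final Map<String,Object> args = new LinkedHashMap<>();

  public void add(String key, Object val) { args.put(key, val); }
  public Object remove(String key) { return args.remove(key); }
  public Object get(String key) { return args.get(key); }
  public Set<String> keys() { return args.keySet(); } // backed by the map, no extra HashSet
}
{code}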

bq. BTW, I thought that we can have wrapped.key optional and default to 
'delegate' if one isn't specified.

I take it back for now. Currently, if {{wrapper.key}} is not specified, we 
assume a default wrapped MP should be used. We can still make {{wrapper.key}} 
optional, but I don't mind if we defer that change for now. It can always be 
made optional later.

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>
> * end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * (with SOLR-5730) Lucene's SortingMergePolicy can be configured in Solr
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *(proposed) roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5(\?) deprecates (but maintains) <mergePolicy> support
> * solr 6.0(\?) removes <mergePolicy> support
> +work-in-progress git branch:+ 
> [master-solr-8621|https://github.com/apache/lucene-solr/tree/master-solr-8621]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


