[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084450#comment-17084450
 ] 

Erick Erickson commented on LUCENE-7788:


[~dsmiley] Well, it _is_ silly to test error and fatal level messages so I 
won't flag those.

Oh, and as I find egregious patterns that don't really count, I'm adding them 
to the list of things NOT to report. For instance, lots of the test log 
messages have timeunit conversions, which don't matter.

> fail precommit on unparameterised log messages and examine for wasted 
> work/objects
> --
>
> Key: LUCENE-7788
> URL: https://issues.apache.org/jira/browse/LUCENE-7788
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-7788.patch, LUCENE-7788.patch, gradle_only.patch, 
> gradle_only.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SOLR-10415 would be removing existing unparameterised log.trace messages use 
> and once that is in place then this ticket's one-line change would be for 
> 'ant precommit' to reject any future unparameterised log.trace message use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084447#comment-17084447
 ] 

Erick Erickson commented on LUCENE-7788:


Fixed, gradle_only.patch




[jira] [Updated] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated LUCENE-7788:
---
Attachment: gradle_only.patch




[jira] [Comment Edited] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084442#comment-17084442
 ] 

Erick Erickson edited comment on LUCENE-7788 at 4/16/20, 12:09 AM:
---

I'm in a bit of an awkward spot: my fork has a ton of changes unrelated to just 
incorporating Gradle. So I'm attaching a separate patch that only contains the 
Gradle bits in the hope that [~dweiss] (or anyone more Gradle-knowledgeable 
than me) will take a peek at it. If it's OK (or at least a good place to 
start), I'll fold it into my fork for the next commit, which will be Real Soon 
Now, like Friday.

Please don't bother with how the check actually works; all I'm really asking 
is whether this looks like something that doesn't violate Gradle norms too 
violently.

NOTE: Checking file paths rather than projects for inclusion/exclusion is 
temporary; going by project at this point is too big a chunk. I'll change it to 
be project-based before I'm done.

Oh, I just noticed that the error message gets printed and the build stops even 
when executing other targets. Fixing.





[jira] [Updated] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated LUCENE-7788:
---
Attachment: gradle_only.patch
Status: Open  (was: Open)

I'm in a bit of an awkward spot: my fork has a ton of changes unrelated to just 
incorporating Gradle. So I'm attaching a separate patch that only contains the 
Gradle bits in the hope that [~dweiss] (or anyone more Gradle-knowledgeable 
than me) will take a peek at it. If it's OK (or at least a good place to 
start), I'll fold it into my fork for the next commit, which will be Real Soon 
Now, like Friday.

Please don't bother with how the check actually works; all I'm really asking 
is whether this looks like something that doesn't violate Gradle norms too 
violently.

NOTE: Checking file paths rather than projects for inclusion/exclusion is 
temporary; going by project at this point is too big a chunk. I'll change it to 
be project-based before I'm done.




[jira] [Comment Edited] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084423#comment-17084423
 ] 

Erick Erickson edited comment on LUCENE-7788 at 4/15/20, 11:02 PM:
---

My current thinking:

1> I think I've got the gradle task in place, sometime in the next couple of 
days I'll put up a preliminary version and ask for review

2> Given that the Lucene code only takes a couple of seconds to run, I'll 
leave it in

3> My current Gradle integration _requires_ a relative path, i.e. "gradlew 
validateLoggingCalls -Psolr/core/src/java/org/apache/solr/response". Before 
it's done, I'll change it to "the gradle way" of specifying a project rather 
than a directory, defaulting to all. But right now projects are too big.

3a> really, if the path to any java file anywhere contains whatever srcDir is, 
it'll check the file. This is temporary so there's no need to refine it IMO.

4> this is not part of the standard check/precommit yet. It will be before I'm 
done.

5> I'm not happy at all with the //verify tag. First of all, exactly when it's 
OK to use it isn't clear at all. So I've changed my mind (again) and I'll 
change the check to be that the call must not have "+" signs or method calls 
_unless_ it's surrounded by "if (log.is*Enabled)". I think that's a much easier 
rule to understand. I'll also add a check that the log level corresponds to the 
if clause when used.
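
To make the rule in 5> concrete, here is a rough sketch of what such a check could look like. It is not the code in the patch: the class name LoggingCallCheck, the regular expressions, the line-at-a-time scan, and the choice of trace/debug/info/warn as the checked levels are all assumptions made for illustration. A guard is only recognised on the same line as the logging call or on the line directly before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the rule, not the actual validateLoggingCalls task.
class LoggingCallCheck {
  // Single-line call such as log.debug(...); group 1 = level, group 2 = arguments.
  private static final Pattern LOG_CALL =
      Pattern.compile("\\blog\\.(trace|debug|info|warn)\\((.*)\\);");
  // Guard such as if (log.isDebugEnabled()); group 1 = capitalised level name.
  private static final Pattern GUARD =
      Pattern.compile("\\bif\\s*\\(\\s*log\\.is(Trace|Debug|Info|Warn)Enabled\\(\\)");
  // An identifier followed by "(" is treated as a method call.
  private static final Pattern METHOD_CALL = Pattern.compile("[A-Za-z_]\\w*\\s*\\(");
  // String literals are blanked out so a "+" or "(" inside a message is ignored.
  private static final Pattern STRING_LITERAL = Pattern.compile("\"(\\\\.|[^\"\\\\])*\"");

  static List<String> check(List<String> lines) {
    List<String> violations = new ArrayList<>();
    String guardLevel = null; // level of a guard seen on this or the previous line
    for (String line : lines) {
      Matcher guard = GUARD.matcher(line);
      boolean guardHere = guard.find();
      if (guardHere) {
        guardLevel = guard.group(1).toLowerCase(Locale.ROOT);
      }
      Matcher call = LOG_CALL.matcher(line);
      if (call.find()) {
        String level = call.group(1);
        String args = STRING_LITERAL.matcher(call.group(2)).replaceAll("\"\"");
        boolean risky = args.contains("+") || METHOD_CALL.matcher(args).find();
        if (guardLevel != null && !guardLevel.equals(level)) {
          violations.add("guard/level mismatch: " + line.trim());
        } else if (risky && guardLevel == null) {
          violations.add("unguarded work: " + line.trim());
        }
        guardLevel = null;
      } else if (!guardHere) {
        guardLevel = null; // the guard only covers the very next line
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    List<String> source = Arrays.asList(
        "log.debug(\"value {}\", value);",
        "log.debug(\"value \" + value);",
        "if (log.isInfoEnabled()) {",
        "  log.debug(\"size {}\", list.size());",
        "}");
    check(source).forEach(System.out::println);
  }
}
```

A real check would need to handle calls that span several lines and guards with larger bodies; the sketch only shows the shape of the rule.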


was (Author: erickerickson):
My current thinking:

1> I think I've got the gradle target in place, sometime in the next couple of 
days I'll put up a preliminary version and ask for review

2> Given that the Lucene code only takes a couple of seconds to run, I'll 
leave it in

3> My current Gradle integration _requires_ a relative path, i.e. "gradlew 
validateLoggingCalls -Psolr/core/src/java/org/apache/solr/response". Before 
it's done, I'll change it to "the gradle way" of specifying a project rather 
than a directory, defaulting to all. But right now projects are too big.

3a> really, if the path to any java file anywhere contains whatever targetDir 
is, it'll check the file. This is temporary so there's no need to refine it IMO.

4> this is not part of the standard check/precommit yet. It will be before I'm 
done.

5> I'm not happy at all with the //verify tag. First of all, exactly when it's 
OK to use it isn't clear at all. So I've changed my mind (again) and I'll 
change the check to be that the call must not have "+" signs or method calls 
_unless_ it's surrounded by "if (log.is*Enabled)". I think that's a much easier 
rule to understand. I'll also add a check that the log level corresponds to the 
if clause when used.




[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084428#comment-17084428
 ] 

Erick Erickson commented on LUCENE-7788:


Why? What's the advantage of having yet another idiom that is sometimes one way 
and sometimes another? We have far too many nooks and crannies in Solr that are 
opaque, and I'm reluctant to add yet another one.

And note that I'm proposing adding the "if" clause _only_ if the 
logging message contains a method call, or can't be rewritten to avoid the 
string concatenation. I'm not proposing wrapping every logging call in an if 
clause.
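
As an illustration of why the method-call case is the one that needs the "if" clause, here is a small self-contained sketch. TinyLog is a hypothetical stand-in for an SLF4J-style logger, written only so the example runs without dependencies; describeState() stands in for any expensive call. The point is that arguments are evaluated before the logger decides whether to log, so only the guard avoids the work.

```java
import java.util.concurrent.atomic.AtomicInteger;

class GuardedLoggingDemo {
  /** Hypothetical stand-in for an SLF4J-style logger; only what the demo needs. */
  static class TinyLog {
    private final boolean debugEnabled;

    TinyLog(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    void debug(String format, Object... args) {
      if (!debugEnabled) {
        return; // too late to save work: the caller already computed args
      }
      StringBuilder out = new StringBuilder();
      int argIndex = 0, from = 0, at;
      while ((at = format.indexOf("{}", from)) >= 0 && argIndex < args.length) {
        out.append(format, from, at).append(args[argIndex++]);
        from = at + 2;
      }
      out.append(format.substring(from));
      System.out.println("DEBUG " + out);
    }
  }

  static final AtomicInteger expensiveCalls = new AtomicInteger();

  /** Stands in for any method call that does real work. */
  static String describeState() {
    expensiveCalls.incrementAndGet();
    return "state";
  }

  /** Method call as an argument, no guard: the work happens at every level. */
  static int unguarded(TinyLog log) {
    int before = expensiveCalls.get();
    log.debug("current state: {}", describeState());
    return expensiveCalls.get() - before;
  }

  /** Same call wrapped in the guard: the work happens only when debug is on. */
  static int guarded(TinyLog log) {
    int before = expensiveCalls.get();
    if (log.isDebugEnabled()) {
      log.debug("current state: {}", describeState());
    }
    return expensiveCalls.get() - before;
  }

  /** Plain parameterised call: nothing to guard, so no "if" clause is needed. */
  static int parameterised(TinyLog log) {
    int before = expensiveCalls.get();
    String core = "core1";
    log.debug("opening {}", core);
    return expensiveCalls.get() - before;
  }

  public static void main(String[] args) {
    TinyLog off = new TinyLog(false);
    System.out.println("unguarded: " + unguarded(off));
    System.out.println("guarded: " + guarded(off));
    System.out.println("parameterised: " + parameterised(off));
  }
}
```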

Plus, the number of INFO level logging calls completely dominates finer-grained 
calls. It's perfectly reasonable to run at WARN level and expect that the WARN 
messages are something that you should pay attention to and can turn on INFO 
when needed.

The entire discussion about whether we should look at the thousands of logging 
calls and figure out which ones should be at a different level is another 
topic, maybe SOLR-11934.




[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084424#comment-17084424
 ] 

David Smiley commented on LUCENE-7788:
--

Can we skip this for info, warn & error please?  These are generally logged by 
default anyway, so this new check will offer even less value for the hassle it 
brings.




[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084423#comment-17084423
 ] 

Erick Erickson commented on LUCENE-7788:


My current thinking:

1> I think I've got the gradle target in place, sometime in the next couple of 
days I'll put up a preliminary version and ask for review

2> Given that the Lucene code only takes a couple of seconds to run, I'll 
leave it in

3> My current Gradle integration _requires_ a relative path, i.e. "gradlew 
validateLoggingCalls -Psolr/core/src/java/org/apache/solr/response". Before 
it's done, I'll change it to "the gradle way" of specifying a project rather 
than a directory, defaulting to all. But right now projects are too big.

3a> really, if the path to any java file anywhere contains whatever targetDir 
is, it'll check the file. This is temporary so there's no need to refine it IMO.

4> this is not part of the standard check/precommit yet. It will be before I'm 
done.

5> I'm not happy at all with the //verify tag. First of all, exactly when it's 
OK to use it isn't clear at all. So I've changed my mind (again) and I'll 
change the check to be that the call must not have "+" signs or method calls 
_unless_ it's surrounded by "if (log.is*Enabled)". I think that's a much easier 
rule to understand. I'll also add a check that the log level corresponds to the 
if clause when used.




[jira] [Commented] (LUCENE-9316) Incorporate all :precommit tasks into :check

2020-04-15 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084410#comment-17084410
 ] 

David Smiley commented on LUCENE-9316:
--

Yay!  So should I basically assume that {{gradlew check -x test}} will be 
checking everything {{ant precommit}} does?  Sorry if I didn't read everything 
here.  What remains for our CI master branch builds to use gradle?

> Incorporate all :precommit tasks into :check
> 
>
> Key: LUCENE-9316
> URL: https://issues.apache.org/jira/browse/LUCENE-9316
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: master (9.0)
>
>







[jira] [Commented] (LUCENE-7701) Refactor grouping collectors

2020-04-15 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084365#comment-17084365
 ] 

Mikhail Khludnev commented on LUCENE-7701:
--

Hi, [~romseygeek], would you mind if I follow up here? It turns out that if: 
 # {{group.truncate=true}} 
 # {{group.sort=docvalues_enabled_field asc}}
the following hotspot pops up:
{code}

"stackTrace":["org.apache.lucene.store.ByteBufferIndexInput.slice(ByteBufferIndexInput.java:268)",
  "org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.slice(ByteBufferIndexInput.java:347)",
  "org.apache.lucene.store.IndexInput.randomAccessSlice(IndexInput.java:122)",
  "org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.<init>(Lucene80DocValuesProducer.java:943)",
  "org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer.getSorted(Lucene80DocValuesProducer.java:750)",
  "org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.getSorted(PerFieldDocValuesFormat.java:329)",
  "org.apache.lucene.index.DocValues.getSorted(DocValues.java:367)",
  "org.apache.lucene.search.FieldComparator$TermOrdValComparator.getSortedDocValues(FieldComparator.java:709)",
  "org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:714)",
  "org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:266)",
  "org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHeadsCollector.newGroupHead(AllGroupHeadsCollector.java:250)",
  "org.apache.lucene.search.grouping.AllGroupHeadsCollector.collect(AllGroupHeadsCollector.java:133)",
  "org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:440)",
  "org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:158)",
  "org.apache.solr.handler.component.QueryComponent.doProcessGroupedDistributedSearchSecondPhase(QueryComponent.java:1399)",
  "org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:386)",
{code}
I read it as follows: when the group collector encounters a new group, it 
creates a field comparator for this value, which opens DocValues, and that 
turns out to be way more expensive to open than the old -good- {{FieldCache}}. 
I think DocValues should somehow be reused between groups. WDYT?   
 

> Refactor grouping collectors
> 
>
> Key: LUCENE-7701
> URL: https://issues.apache.org/jira/browse/LUCENE-7701
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Priority: Major
> Fix For: 7.0
>
> Attachments: LUCENE-7701.patch, LUCENE-7701.patch
>
>
> Grouping currently works via abstract collectors, which need to be overridden 
> for each way of defining a group - currently we have two, 'term' (based on 
> SortedDocValues) and 'function' (based on ValueSources).  These collectors 
> all have a lot of repeated code, and means that if you want to implement your 
> own group definitions, you need to override four or five different classes.
> This would be easier to deal with if instead the 'group selection' code was 
> abstracted out into a single interface, and the various collectors were 
> changed to concrete implementations.






[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
mayya-sharipova commented on a change in pull request #1351: LUCENE-9280: 
Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r409079589
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java
 ##
 @@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
 
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
 
 Review comment:
   Thanks, makes sense. Addressed in d7ef9b6


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2020-04-15 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084204#comment-17084204
 ] 

Tomoko Uchida commented on LUCENE-9322:
---

Hi [~jtibshirani],

thank you for working hard on this!

 
{code:java}
TopDocs findNearestVectors(float[] queryVector, int k, int recallFactor) throws IOException;
{code}
I like this interface, {{recallFactor}} might be an interface for further 
flexibility, but it's just an idea. 

 
{quote}Why do we have different implementations of `VectorsFormat`, couldn’t we 
just add an enum to the field info like `Strategy.HNSW` and 
`Strategy.COARSE_QUANTIZATION`?
{quote}
Personally I would prefer a unified file format for vectors, since it is 
(theoretically) independent of the higher-level ANN algorithms. Could we expose 
just one "Lucene90VectorsFormat" and low-level I/O, and make only the 
higher-level logic (o.a.l.index/document/search) customizable? Forward 
iteration is encouraged anyway...
  
{quote}What about different distance metrics like angular and L1 distance?
{quote}
JFYI, in case you have not noticed it, I previously implemented a switchable 
distance function on the HNSW branch: 
[https://github.com/apache/lucene-solr/blob/jira/lucene-9004-aknn-2/lucene/core/src/java/org/apache/lucene/index/VectorValues.java].
 It is implemented as an enum with a {{distance()}} function. Also, I think it 
would be good to persist (in the codec) which distance metric we use for the field.
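
For readers who have not looked at the branch, the general shape of an enum with a distance() function could look like the sketch below. This is not the code in VectorValues.java; the class name DistanceSketch, the constant names, and the formulas chosen for each constant are assumptions for illustration. Persisting the metric per field could then be as simple as storing the constant's name and reading it back with valueOf().

```java
// Hypothetical sketch only; not the VectorValues implementation on the HNSW branch.
class DistanceSketch {
  enum DistanceFunction {
    /** L2 distance: square root of the summed squared differences. */
    EUCLIDEAN {
      @Override
      float distance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
          double diff = a[i] - b[i];
          sum += diff * diff;
        }
        return (float) Math.sqrt(sum);
      }
    },
    /** L1 distance: summed absolute differences. */
    MANHATTAN {
      @Override
      float distance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
          sum += Math.abs(a[i] - b[i]);
        }
        return (float) sum;
      }
    },
    /** One minus cosine similarity; undefined (NaN) if either vector is all zeros. */
    ANGULAR {
      @Override
      float distance(float[] a, float[] b) {
        checkSameLength(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          normA += a[i] * a[i];
          normB += b[i] * b[i];
        }
        return (float) (1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB)));
      }
    };

    abstract float distance(float[] a, float[] b);

    private static void checkSameLength(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException("vector dimensions differ");
      }
    }
  }

  public static void main(String[] args) {
    float[] origin = {0f, 0f};
    float[] point = {3f, 4f};
    // The stored name is what a codec could persist for the field.
    String stored = DistanceFunction.EUCLIDEAN.name();
    DistanceFunction restored = DistanceFunction.valueOf(stored);
    System.out.println(restored + " = " + restored.distance(origin, point));
  }
}
```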
{quote}How exactly is this used in a search? Where are the `Query` classes? 
This would be the next part of the API to design/ discuss.
{quote}
We could follow o.a.l.index.PointValues's approach, in other words, 
concrete field classes with newXXXQuery() methods: 
[https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/PointValues.java]
 The query part would also need some abstraction and there are many things to 
think through, so could we discuss it in another dedicated issue, to 
keep the scope here small?

> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.






[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors 
to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r40656
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java
 ##
 @@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
 
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
 
 Review comment:
   Oh that's a good point. +1 to just return an iterator based on the 
comparator, and do the conjunction/combination in `BulkScorer`





[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
romseygeek commented on a change in pull request #1351: LUCENE-9280: Collectors 
to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408887800
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/ScoreMode.java
 ##
 @@ -24,37 +24,53 @@
   /**
    * Produced scorers will allow visiting all matches and get their score.
    */
-  COMPLETE {
-    @Override
-    public boolean needsScores() {
-      return true;
-    }
-  },
+  COMPLETE(true, true),
 
   /**
    * Produced scorers will allow visiting all matches but scores won't be
    * available.
    */
-  COMPLETE_NO_SCORES {
-    @Override
-    public boolean needsScores() {
-      return false;
-    }
-  },
+  COMPLETE_NO_SCORES(true, false),
 
   /**
    * Produced scorers will optionally allow skipping over non-competitive
    * hits using the {@link Scorer#setMinCompetitiveScore(float)} API.
    */
-  TOP_SCORES {
-    @Override
-    public boolean needsScores() {
-      return true;
-    }
-  };
+  TOP_SCORES(false, true),
+
+  /**
+   * ScoreMode for top field collectors that can provide their own iterators,
+   * to optionally allow to skip for non-competitive docs
+   */
+  TOP_DOCS(false, false),
+
+  /**
+   * ScoreMode for top field collectors that can provide their own iterators,
+   * to optionally allow to skip for non-competitive docs.
+   * This mode is used when there is a secondary sort by _score.
+   */
+  TOP_DOCS_WITH_SCORES(false, true);
 
 Review comment:
   But `TOP_SCORES` and `TOP_DOCS_WITH_SCORES` have identical `needsScores()` 
and `isExhaustive()` values, so I'm not sure why we need both?
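
The overlap is easy to see if the constants from the diff are written out as a standalone enum. The sketch below is a reconstruction for illustration only: the name ScoreModeSketch, the field names, and the assumption that the constructor arguments are (isExhaustive, needsScores) in that order are inferred from the diff and this comment, not copied from the pull request.

```java
// Hypothetical reconstruction of the enum in the diff; argument order is assumed.
enum ScoreModeSketch {
  COMPLETE(true, true),
  COMPLETE_NO_SCORES(true, false),
  TOP_SCORES(false, true),
  TOP_DOCS(false, false),
  TOP_DOCS_WITH_SCORES(false, true);

  private final boolean isExhaustive;
  private final boolean needsScores;

  ScoreModeSketch(boolean isExhaustive, boolean needsScores) {
    this.isExhaustive = isExhaustive;
    this.needsScores = needsScores;
  }

  boolean isExhaustive() {
    return isExhaustive;
  }

  boolean needsScores() {
    return needsScores;
  }

  /** True when two modes cannot be told apart through the two flags alone. */
  boolean sameFlagsAs(ScoreModeSketch other) {
    return isExhaustive == other.isExhaustive && needsScores == other.needsScores;
  }

  public static void main(String[] args) {
    // Callers that only look at the flags see these two modes as equivalent.
    System.out.println(TOP_SCORES.sameFlagsAs(TOP_DOCS_WITH_SCORES));
  }
}
```

If the two modes are meant to behave differently, callers would have to compare the enum constants themselves rather than the two flags.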





[jira] [Commented] (LUCENE-9317) Resolve package name conflicts for StandardAnalyzer to allow Java module system support

2020-04-15 Thread David Ryan (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084123#comment-17084123
 ] 

David Ryan commented on LUCENE-9317:


 

I did a new experiment. This is quite severe looking, however, it could make 
sense.

[https://github.com/oobles/lucene-solr/commit/5e25a9a9f4af9641b2ca01565060d4cb244b9266]

The main changes are to move the following packages from common analysis to 
core:
 * org.apache.lucene.analysis.core
 * org.apache.lucene.analysis.custom
 * org.apache.lucene.analysis.standard (Just StandardTokenizerFactory)
 * org.apache.lucene.analysis.util

This is based on the following comment from Uwe:
{quote}One reason why the factories should move to core is that once we did 
this, one no longer needs to depend on analyzers-common anymore. If he has a set 
of factories and tokenizers/filters and otherwise only requires the default 
ones, he can completely remove the huge common.jar file! Also public and 
commonly used abstract base classes should not be part of an optional module!
{quote}
Potentially not all of the classes in those packages need to be moved; someone 
with more knowledge than me would need to decide. Moving these to core has the 
benefit of not needing to change any of their names, and it leaves 
StandardAnalyzer where it is. No constructor changes are needed either. The 
factory test cases will need to be updated so they don't rely on all the 
classes in common analysis. As the Tokenizers and TokenFilters are split over 
both jars, I tested that both were loaded in the common analysis test cases, 
which of course they were.

The next changes look severe, but I think they have fewer flow-on effects. I've 
renamed org.apache.lucene.analysis to org.apache.lucene.common.analysis. This 
has the benefit of now matching the jar name and future module name. It removes 
any conflicts for packages. I moved classic from the standard package to a 
classic package. I moved the UAX29* classes from standard to email. Both of 
these could have been left in oal.common.analysis.standard too.

There are a few test cases I would need to fix if you think this approach is 
worth continuing, but I think it generally makes a lot of sense. Most test 
cases are passing and I added Ignore to a few that need updating.

Apologies that the commit is difficult to review. I staged the changes and 
moves in one commit when I should have done it as moves then changes. Let me 
know if you'd rather I redo it.
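
The conflict this issue targets is a package that lives in more than one jar. 
Below is a minimal sketch of how such split packages could be detected from a 
jar-to-packages listing. `SplitPackageCheck` and `findSplitPackages` are 
hypothetical names, and the package lists in `main` are illustrative rather 
than a full inventory of either jar; only the shared 
org.apache.lucene.analysis.standard package is taken from the issue 
description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SplitPackageCheck {
  // Returns every package that appears in more than one jar,
  // mapped to the sorted set of jars that contain it.
  static Map<String, Set<String>> findSplitPackages(Map<String, List<String>> packagesByJar) {
    Map<String, Set<String>> jarsByPackage = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : packagesByJar.entrySet()) {
      for (String pkg : entry.getValue()) {
        jarsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
      }
    }
    // Keep only the packages that are shared between at least two jars.
    jarsByPackage.values().removeIf(jars -> jars.size() < 2);
    return jarsByPackage;
  }

  public static void main(String[] args) {
    // Illustrative layout only; not a complete listing of either jar.
    Map<String, List<String>> layout = new LinkedHashMap<>();
    layout.put("lucene-core",
        List.of("org.apache.lucene.analysis", "org.apache.lucene.analysis.standard"));
    layout.put("lucene-analyzers-common",
        List.of("org.apache.lucene.analysis.core", "org.apache.lucene.analysis.standard"));
    System.out.println(findSplitPackages(layout));
  }
}
```

Any package reported by a check like this would have to be moved or renamed 
before both jars could become named modules.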

> Resolve package name conflicts for StandardAnalyzer to allow Java module 
> system support
> ---
>
> Key: LUCENE-9317
> URL: https://issues.apache.org/jira/browse/LUCENE-9317
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Affects Versions: master (9.0)
>Reporter: David Ryan
>Priority: Major
>  Labels: build, features
>
>  
> To allow Lucene to be modularised there are a few preparatory tasks to be 
> completed prior to this being possible.  The Java module system requires that 
> jars do not use the same package name in different jars.  The lucene-core and 
> lucene-analyzers-common both share the package 
> org.apache.lucene.analysis.standard.
> Possible resolutions to this issue are discussed by Uwe on the mailing list 
> here:
>  
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202004.mbox/%3CCAM21Rt8FHOq_JeUSELhsQJH0uN0eKBgduBQX4fQKxbs49TLqzA%40mail.gmail.com%3E]
> {quote}About StandardAnalyzer: Unfortunately I aggressively complained a 
> while back when Mike McCandless wanted to move standard analyzer out of the 
> analysis package into core (“for convenience”). This was a bad step, and IMHO 
> we should revert that or completely rename the packages and everything. The 
> problem here is: As the analysis services are only part of lucene-analyzers, 
> we had to leave the factory classes there, but move the implementation 
> classes in core. The package has to be the same. The only way around that is 
> to move the analysis factory framework also to core (I would not be against 
> that). This would include all factory base classes and the service loading 
> stuff. Then we can move standard analyzer and some of the filters/tokenizers 
> including their factories to core and that problem would be solved.
> {quote}
> There are two options here: either move the factory framework into core or 
> move StandardAnalyzer back to lucene-analyzers.  In the email, the solution 
> lands on moving it back, as per the task list:
> {quote}Add some preparatory issues to cleanup class hierarchy: Move Analysis 
> SPI to core / remove StandardAnalyzer and related classes out of core back to 
> analysis
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (LUCENE-9271) Make BufferedIndexInput work on a ByteBuffer

2020-04-15 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9271.
--
Fix Version/s: 8.6
   master (9.0)
   Resolution: Fixed

> Make BufferedIndexInput work on a ByteBuffer
> 
>
> Key: LUCENE-9271
> URL: https://issues.apache.org/jira/browse/LUCENE-9271
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (9.0), 8.6
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently {{BufferedIndexInput}} works on a {{byte[]}} but its main 
> implementation, in NIOFSDirectory, has to implement a hack to maintain a 
> ByteBuffer view of it that it can use in calls to the FileChannel API. Maybe 
> we should instead make {{BufferedIndexInput}} work directly on a 
> {{ByteBuffer}}? This would also help reuse the existing 
> {{ByteBuffer#get(|Short|Int|Long)}} methods instead of duplicating them from 
> {{DataInput}}.
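
The duplication mentioned in the description can be illustrated with a small 
sketch. It compares hand-rolled decoding over a byte[] with the equivalent 
absolute getters on a ByteBuffer view of the same array. `ByteBufferReads` and 
its methods are hypothetical names, and big-endian byte order is an assumption 
of this sketch (it is the ByteBuffer default), not a statement about the 
Lucene file format.

```java
import java.nio.ByteBuffer;

class ByteBufferReads {
  // Hand-rolled big-endian int decoding over a byte[].
  static int readIntManually(byte[] buf, int pos) {
    return ((buf[pos] & 0xFF) << 24)
        | ((buf[pos + 1] & 0xFF) << 16)
        | ((buf[pos + 2] & 0xFF) << 8)
        | (buf[pos + 3] & 0xFF);
  }

  // Hand-rolled big-endian long decoding built from two ints.
  static long readLongManually(byte[] buf, int pos) {
    return ((long) readIntManually(buf, pos) << 32)
        | (readIntManually(buf, pos + 4) & 0xFFFFFFFFL);
  }

  public static void main(String[] args) {
    byte[] bytes = {0x12, 0x34, 0x56, 0x78, (byte) 0x9A, (byte) 0xBC, (byte) 0xDE, (byte) 0xF0};
    // wrap() shares the array, so the buffer is a view rather than a copy.
    ByteBuffer view = ByteBuffer.wrap(bytes);

    // The absolute getters decode the same bytes without any custom code.
    System.out.println(readIntManually(bytes, 0) == view.getInt(0));
    System.out.println(readLongManually(bytes, 0) == view.getLong(0));
  }
}
```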



[jira] [Commented] (LUCENE-9260) Verify checksums of CFS files?

2020-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084115#comment-17084115
 ] 

ASF subversion and git services commented on LUCENE-9260:
-

Commit 4a559ac0c43ae40b9a70db679cc84d05a2e5f440 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4a559ac ]

LUCENE-9260: Verify checksums of CFS files. (#1311)


> Verify checksums of CFS files?
> --
>
> Key: LUCENE-9260
> URL: https://issues.apache.org/jira/browse/LUCENE-9260
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While CFS files write checksums in their footer, we never validate these 
> checksums. Can we verify them in LeafReader#checkIntegrity?
> This checksum is a bit redundant with the checksums of the files that are 
> stored in the CFS file, but I'd rather verify some bytes multiple times than 
> have checksums that never get verified?
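
The idea of validating a footer checksum can be sketched as follows. This is a 
simplified illustration, not Lucene's actual footer layout: it assumes a file 
that ends with an 8-byte CRC32 of everything before it. `FooterChecksum`, 
`withFooter` and `verify` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class FooterChecksum {
  // Appends an 8-byte CRC32 of the payload as a footer (assumed layout).
  static byte[] withFooter(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return ByteBuffer.allocate(payload.length + Long.BYTES)
        .put(payload)
        .putLong(crc.getValue())
        .array();
  }

  // Recomputes the CRC32 over the payload and compares it with the stored footer.
  static boolean verify(byte[] file) {
    if (file.length < Long.BYTES) {
      return false; // too short to contain a footer
    }
    int payloadLength = file.length - Long.BYTES;
    CRC32 crc = new CRC32();
    crc.update(file, 0, payloadLength);
    long stored = ByteBuffer.wrap(file).getLong(payloadLength);
    return stored == crc.getValue();
  }

  public static void main(String[] args) {
    byte[] file = withFooter("compound file contents".getBytes());
    System.out.println(verify(file)); // intact file
    file[0] ^= 0x01;                  // flip one bit in the payload
    System.out.println(verify(file)); // corrupted file
  }
}
```

A footer that is written but never passed through a check like `verify` gives 
no protection, which is the gap the description points out.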



[jira] [Resolved] (LUCENE-9260) Verify checksums of CFS files?

2020-04-15 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9260.
--
Fix Version/s: 8.6
   master (9.0)
   Resolution: Fixed

> Verify checksums of CFS files?
> --
>
> Key: LUCENE-9260
> URL: https://issues.apache.org/jira/browse/LUCENE-9260
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (9.0), 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While CFS files write checksums in their footer, we never validate these 
> checksums. Can we verify them in LeafReader#checkIntegrity?
> This checksum is a bit redundant with the checksums of the files that are 
> stored in the CFS file, but I'd rather verify some bytes multiple times than 
> have checksums that never get verified?



[jira] [Resolved] (LUCENE-9307) Remove the ability to set the buffer size on an existing BufferedIndexInput

2020-04-15 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9307.
--
Fix Version/s: master (9.0)
   Resolution: Fixed

> Remove the ability to set the buffer size on an existing BufferedIndexInput
> ---
>
> Key: LUCENE-9307
> URL: https://issues.apache.org/jira/browse/LUCENE-9307
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This feature is only used as an optimization when reading skip lists. Since 
> our default directory doesn't use buffering, I'd suggest removing it.



[jira] [Commented] (LUCENE-9260) Verify checksums of CFS files?

2020-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084070#comment-17084070
 ] 

ASF subversion and git services commented on LUCENE-9260:
-

Commit 0aa4ba7ccb2f2c12e213ea34d76af378e55e3bf9 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0aa4ba7 ]

LUCENE-9260: Verify checksums of CFS files. (#1311)



> Verify checksums of CFS files?
> --
>
> Key: LUCENE-9260
> URL: https://issues.apache.org/jira/browse/LUCENE-9260
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While CFS files write checksums in their footer, we never validate these 
> checksums. Can we verify them in LeafReader#checkIntegrity?
> This checksum is a bit redundant with the checksums of the files that are 
> stored in the CFS file, but I'd rather verify some bytes multiple times than 
> have checksums that never get verified?



[jira] [Commented] (LUCENE-9307) Remove the ability to set the buffer size on an existing BufferedIndexInput

2020-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084068#comment-17084068
 ] 

ASF subversion and git services commented on LUCENE-9307:
-

Commit aa605b3c70fa5a4fa51761f318d134e387059e28 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=aa605b3 ]

LUCENE-9307: Remove the ability to set the buffer size dynamically on 
BufferedIndexInput (#1415)



> Remove the ability to set the buffer size on an existing BufferedIndexInput
> ---
>
> Key: LUCENE-9307
> URL: https://issues.apache.org/jira/browse/LUCENE-9307
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This feature is only used as an optimization when reading skip lists. Since 
> our default directory doesn't use buffering, I'd suggest removing it.



[GitHub] [lucene-solr] jpountz merged pull request #1311: LUCENE-9260: Verify checksums of CFS files.

2020-04-15 Thread GitBox
jpountz merged pull request #1311: LUCENE-9260: Verify checksums of CFS files.
URL: https://github.com/apache/lucene-solr/pull/1311
 
 
   


[GitHub] [lucene-solr] jpountz merged pull request #1415: LUCENE-9307: Remove the ability to set the buffer size dynamically on BufferedIndexInput

2020-04-15 Thread GitBox
jpountz merged pull request #1415: LUCENE-9307: Remove the ability to set the 
buffer size dynamically on BufferedIndexInput
URL: https://github.com/apache/lucene-solr/pull/1415
 
 
   


[jira] [Commented] (SOLR-14408) Refactor MoreLikeThisHandler Implementation

2020-04-15 Thread Nazerke Seidan (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083995#comment-17083995
 ] 

Nazerke Seidan commented on SOLR-14408:
---

Just linked the PR.

> Refactor MoreLikeThisHandler Implementation
> ---
>
> Key: SOLR-14408
> URL: https://issues.apache.org/jira/browse/SOLR-14408
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Reporter: Nazerke Seidan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The main goal of this refactoring is to improve the readability and 
> accessibility of the MoreLikeThisHandler class. The current 
> MoreLikeThisHandler class contains two static nested classes that are later 
> accessed from MoreLikeThisComponent.  I propose to make them separate public 
> classes. 
> cc: [~abenedetti], as you made the recent commit for MLT, what do you 
> think about this?  Anyway, the code is ready for review. 



[GitHub] [lucene-solr] NazerkeBS opened a new pull request #1433: SOLR-14408 Refactor MoreLikeThisHandler implementation

2020-04-15 Thread GitBox
NazerkeBS opened a new pull request #1433: SOLR-14408 Refactor 
MoreLikeThisHandler implementation
URL: https://github.com/apache/lucene-solr/pull/1433
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


[jira] [Commented] (SOLR-14408) Refactor MoreLikeThisHandler Implementation

2020-04-15 Thread Alessandro Benedetti (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083990#comment-17083990
 ] 

Alessandro Benedetti commented on SOLR-14408:
-

Can you attach a pull request to review? Happy to take a look.
I will be actively working on the More Like This refactor to make it more usable.

> Refactor MoreLikeThisHandler Implementation
> ---
>
> Key: SOLR-14408
> URL: https://issues.apache.org/jira/browse/SOLR-14408
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: MoreLikeThis
>Reporter: Nazerke Seidan
>Priority: Minor
>
> The main goal of this refactoring is to improve the readability and 
> accessibility of the MoreLikeThisHandler class. The current 
> MoreLikeThisHandler class contains two static nested classes that are later 
> accessed from MoreLikeThisComponent.  I propose to make them separate public 
> classes. 
> cc: [~abenedetti], as you made the recent commit for MLT, what do you 
> think about this?  Anyway, the code is ready for review. 



[jira] [Created] (SOLR-14408) Refactor MoreLikeThisHandler Implementation

2020-04-15 Thread Nazerke Seidan (Jira)
Nazerke Seidan created SOLR-14408:
-

 Summary: Refactor MoreLikeThisHandler Implementation
 Key: SOLR-14408
 URL: https://issues.apache.org/jira/browse/SOLR-14408
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: MoreLikeThis
Reporter: Nazerke Seidan


The main goal of this refactoring is to improve the readability and 
accessibility of the MoreLikeThisHandler class. The current MoreLikeThisHandler 
class contains two static nested classes that are later accessed from 
MoreLikeThisComponent.  I propose to make them separate public classes. 

cc: [~abenedetti], as you made the recent commit for MLT, what do you think 
about this?  Anyway, the code is ready for review. 



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408712107
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java
 ##
 @@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
 
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
 
 Review comment:
   This allows for some hacks like returning an iterator that matches more docs 
than the scorer. I liked the previous approach that returned an iterator better.
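
   The concern can be illustrated with a minimal sketch. It uses 
`Iterator<Integer>` in place of `DocIdSetIterator` so that it has no Lucene 
dependency. `SketchCollector`, `passThrough`, `misbehaving` and 
`staysWithinScorer` are hypothetical names, not part of the pull request.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FilterIteratorContract {
  // Hypothetical stand-in for the hook in the diff; Iterator<Integer> replaces
  // DocIdSetIterator so the sketch stays self-contained.
  interface SketchCollector {
    default Iterator<Integer> filterIterator(Iterator<Integer> scorerIterator) {
      return scorerIterator; // default: no filtering
    }
  }

  static SketchCollector passThrough() {
    return new SketchCollector() {};
  }

  // Ignores its input and also returns docs 2 and 3, which the scorer never matched.
  static SketchCollector misbehaving() {
    return new SketchCollector() {
      @Override
      public Iterator<Integer> filterIterator(Iterator<Integer> scorerIterator) {
        return List.of(1, 2, 3, 4, 7).iterator();
      }
    };
  }

  // True if every doc the collector's iterator yields was also matched by the scorer.
  static boolean staysWithinScorer(SketchCollector collector, List<Integer> scorerDocs) {
    List<Integer> seen = new ArrayList<>();
    collector.filterIterator(scorerDocs.iterator()).forEachRemaining(seen::add);
    return scorerDocs.containsAll(seen);
  }

  public static void main(String[] args) {
    List<Integer> scorerDocs = List.of(1, 4, 7);
    System.out.println(staysWithinScorer(passThrough(), scorerDocs));
    System.out.println(staysWithinScorer(misbehaving(), scorerDocs));
  }
}
```

   Nothing in the method signature prevents the second implementation, so the 
"subset of the scorer" rule would live only in the Javadoc.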


[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r403471265
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/SortField.java
 ##
 @@ -91,6 +91,7 @@
   private String field;
   private Type type;  // defaults to determining type dynamically
   boolean reverse = false;  // defaults to natural order
+  private boolean skipNonCompetitiveDocs = false; // if true, sortField will use a comparator that can skip non-competitive docs
 
 Review comment:
   I'd rather not have this on SortField for now. This is an old API that never 
required fields to be indexed. I'd rather have new SortField implementations 
for now, and later look at how we can enable this in SortField.


[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-15 Thread GitBox
jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r408712107
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/search/LeafCollector.java
 ##
 @@ -93,4 +93,16 @@
    */
   void collect(int doc) throws IOException;
 
+  /**
+   * Optionally creates a view of the scorerIterator where only competitive documents
+   * in the scorerIterator are kept and non-competitive are skipped.
+   *
+   * Collectors should delegate this method to their comparators if
+   * their comparators provide the skipping functionality over non-competitive docs.
+   * The default is to return the same iterator which is interpreted as the collector doesn't filter any documents.
+   */
+  default DocIdSetIterator filterIterator(DocIdSetIterator scorerIterator) {
+    return scorerIterator;
+  }
 
 Review comment:
   This allows for some hacks like returning an iterator that matches more 
docs than the scorer. I liked the previous approach that returned an iterator 
better.


[jira] [Commented] (SOLR-14013) javabin performance regressions

2020-04-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083861#comment-17083861
 ] 

ASF subversion and git services commented on SOLR-14013:


Commit 5d3dfbd0ce8a2ad990635e71144615f1c4815d22 in lucene-solr's branch 
refs/heads/branch_7_7 from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5d3dfbd ]

SOLR-14013: trying to port to SOlr 7.7 (#1254)



> javabin performance regressions
> ---
>
> Key: SOLR-14013
> URL: https://issues.apache.org/jira/browse/SOLR-14013
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.7
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Blocker
> Fix For: 8.4
>
> Attachments: SOLR-14013.patch, SOLR-14013.patch, TestQuerySpeed.java, 
> test.json
>
>
> As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became 
> orders of magnitude slower in certain cases since v7.7.  The cases identified 
> so far include large numbers of values in a field.



[GitHub] [lucene-solr] noblepaul merged pull request #1254: SOLR-14259: Back porting SOLR-14013 to Solr 7.7

2020-04-15 Thread GitBox
noblepaul merged pull request #1254: SOLR-14259: Back porting SOLR-14013 to 
Solr 7.7
URL: https://github.com/apache/lucene-solr/pull/1254
 
 
   

