[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303909793
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -701,28 +703,20 @@ protected ElevationProvider 
createElevationProvider(Map queryTerms, 
Appendable concatenatedTerms) {
+  protected void analyzeQuery(String query, Consumer termsConsumer) {
 try (TokenStream tokens = queryAnalyzer.tokenStream("", query)) {
   tokens.reset();
   CharTermAttribute termAtt = tokens.addAttribute(CharTermAttribute.class);
   while (tokens.incrementToken()) {
-if (queryTerms != null) {
-  queryTerms.add(termAtt.toString());
-}
-if (concatenatedTerms != null) {
-  concatenatedTerms.append(termAtt);
-}
+termsConsumer.accept(termAtt.toString());
 
 Review comment:
   I'll change to a Consumer of CharSequence.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303910866
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -857,31 +851,33 @@ public int size() {
* 
* The terms are tokenized with the query analyzer.
*/
-  protected class SubsetMatchElevationProvider implements ElevationProvider {
+  protected class DefaultElevationProvider implements ElevationProvider {
 
-private final SubsetMatcher subsetMatcher;
+private final TrieSubsetMatcher subsetMatcher;
 private final Map exactMatchElevationMap;
 
 /**
- * @param subsetMatcherBuilder The {@link SubsetMatcher.Builder} to build 
the {@link SubsetMatcher}.
+ * @param subsetMatcherBuilder The {@link TrieSubsetMatcher.Builder} to 
build the {@link TrieSubsetMatcher}.
  * @param elevationBuilderMap The map of elevation rules.
  */
-protected SubsetMatchElevationProvider(SubsetMatcher.Builder subsetMatcherBuilder,
-   Map elevationBuilderMap) {
+protected DefaultElevationProvider(TrieSubsetMatcher.Builder subsetMatcherBuilder,
+   Map 
elevationBuilderMap) {
   exactMatchElevationMap = new LinkedHashMap<>();
   Collection queryTerms = new ArrayList<>();
-  StringBuilder concatenatedTerms = new StringBuilder();
+  Consumer termsConsumer = queryTerms::add;
 
 Review comment:
   Yes it makes a difference. The lambda instance is created once, compared to 
inside the loop where a lambda instance is created for each step of the loop 
because the collection is not static.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303909793
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -701,28 +703,20 @@ protected ElevationProvider 
createElevationProvider(Map queryTerms, 
Appendable concatenatedTerms) {
+  protected void analyzeQuery(String query, Consumer termsConsumer) {
 try (TokenStream tokens = queryAnalyzer.tokenStream("", query)) {
   tokens.reset();
   CharTermAttribute termAtt = tokens.addAttribute(CharTermAttribute.class);
   while (tokens.incrementToken()) {
-if (queryTerms != null) {
-  queryTerms.add(termAtt.toString());
-}
-if (concatenatedTerms != null) {
-  concatenatedTerms.append(termAtt);
-}
+termsConsumer.accept(termAtt.toString());
 
 Review comment:
   I'll change to a Consumer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303824411
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -682,35 +686,48 @@ public String getDescription() {
* Creates the {@link ElevationProvider} to set during configuration 
loading. The same instance will be used later
* when elevating results for queries.
*
-   * @param queryAnalyzer to analyze and tokenize the query.
* @param elevationBuilderMap map of all {@link ElevatingQuery} and their 
corresponding {@link ElevationBuilder}.
* @return The created {@link ElevationProvider}.
*/
-  protected ElevationProvider createElevationProvider(Analyzer queryAnalyzer, 
Map elevationBuilderMap) {
-return new MapElevationProvider(elevationBuilderMap);
+  protected ElevationProvider createElevationProvider(Map elevationBuilderMap) {
+return new SubsetMatchElevationProvider(new TrieSubsetMatcher.Builder<>(), 
elevationBuilderMap);
   }
 
   
//-
   // Query analysis and tokenization
   
//-
 
   /**
-   * Analyzes the provided query string and returns a space concatenation of 
the analyzed tokens.
+   * Analyzes the provided query string and returns a concatenation of the 
analyzed tokens.
*/
   public String analyzeQuery(String query) {
-//split query terms with analyzer then join
-StringBuilder norm = new StringBuilder();
+StringBuilder concatenatedTerms = new StringBuilder();
+analyzeQuery(query, null, concatenatedTerms);
+return concatenatedTerms.toString();
+  }
+
+  /**
+   * Analyzes the provided query string, tokenizes the terms and add them to 
either the provided {@link Collection} or {@link Appendable}.
+   *
+   * @param queryTerms The {@link Collection} that receives the terms; or null 
if none.
+   * @param concatenatedTerms The {@link Appendable} that receives the terms; 
or null if none.
+   */
+  protected void analyzeQuery(String query, Collection queryTerms, 
Appendable concatenatedTerms) {
 
 Review comment:
   I pondered for a while on this signature, because I wanted it to be clear 
but also performant where it is used, to avoid the creation of lots of lambdas 
in loops.
   I'll change back to the Consumer. It will create just an additional lambda 
instance per query to process, given that I'll declare the lambdas out of the 
loops.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303794163
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -425,10 +429,10 @@ protected ElevationProvider 
loadElevationProvider(XmlConfigFile config) throws I
 previousElevationBuilder.merge(elevationBuilder);
   }
 }
-return createElevationProvider(queryAnalyzer, elevationBuilderMap);
+return createElevationProvider(elevationBuilderMap);
 
 Review comment:
   Yes


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #780: SOLR-11866: Support efficient subset matching in query elevation rules

2019-07-16 Thread GitBox
bruno-roustant commented on a change in pull request #780: SOLR-11866: Support 
efficient subset matching in query elevation rules
URL: https://github.com/apache/lucene-solr/pull/780#discussion_r303786550
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryElevationComponent.java
 ##
 @@ -1020,8 +1066,58 @@ public Elevation(Set elevatedIds, 
Set excludedIds,
 for (BytesRef excludedId : excludedIds) {
   excludeQueriesBuilder.add(new TermQuery(new Term(queryFieldName, 
excludedId)));
 }
-excludeQueries = excludeQueriesBuilder.toArray(new 
TermQuery[excludeQueriesBuilder.size()]);
+excludeQueries = excludeQueriesBuilder.toArray(new TermQuery[0]);
+  }
+}
+
+protected Elevation(Set elevatedIds, BooleanQuery includeQuery, 
Set excludedIds, TermQuery[] excludeQueries) {
+  this.elevatedIds = elevatedIds;
+  this.includeQuery = includeQuery;
+  this.excludedIds = excludedIds;
+  this.excludeQueries = excludeQueries;
+}
+
+/**
+ * Merges this {@link Elevation} with another and creates a new {@link 
Elevation}.
+
+ * @return A new instance containing the merging of the two elevations; or 
directly this elevation if the other
+ * is null.
+ */
+protected Elevation mergeWith(Elevation elevation) {
+  if (elevation == null) {
+return this;
+  }
+  Set elevatedIds = 
ImmutableSet.builder().addAll(this.elevatedIds).addAll(elevation.elevatedIds).build();
+  boolean overlappingElevatedIds = elevatedIds.size() != 
(this.elevatedIds.size() + elevation.elevatedIds.size());
+  BooleanQuery.Builder includeQueryBuilder = new BooleanQuery.Builder();
+  Set clauseSet = (overlappingElevatedIds ? new HashSet<>() 
: null);
 
 Review comment:
   I'll initialize with Sets.newHashSetWithExpectedSize(elevatedIds.size()).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org