[GitHub] [lucene] mikemccand merged pull request #116: Explicit flush

2021-04-29 Thread GitBox


mikemccand merged pull request #116:
URL: https://github.com/apache/lucene/pull/116


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #116: Explicit flush

2021-04-29 Thread GitBox


mikemccand commented on pull request #116:
URL: https://github.com/apache/lucene/pull/116#issuecomment-829720860


   Thank you @balmukundblr!





[GitHub] [lucene] mocobeta commented on pull request #115: LUCENE-4198: add format description for term impacts to javadocs

2021-04-29 Thread GitBox


mocobeta commented on pull request #115:
URL: https://github.com/apache/lucene/pull/115#issuecomment-829706157


   @jpountz Could you take a look at this?





[jira] [Updated] (LUCENE-9936) update gradle build to support gpg signing of tgz/zip distributions

2021-04-29 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-9936:
---
Attachment: LUCENE-9936.patch
Status: Open  (was: Open)

Patch updated with additional lessons learned in SOLR-15361

> update gradle build to support gpg signing of tgz/zip distributions
> ---
>
> Key: LUCENE-9936
> URL: https://issues.apache.org/jira/browse/LUCENE-9936
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: LUCENE-9936.patch, LUCENE-9936.patch
>
>
> the gradle build does not currently have any support for gpg signing the 
> distributions we produce.
> this is necessary for releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] janhoy merged pull request #2487: SOLR-15383 Solr Zookeeper status page shows green even when a ZK is down

2021-04-29 Thread GitBox


janhoy merged pull request #2487:
URL: https://github.com/apache/lucene-solr/pull/2487


   





[GitHub] [lucene-solr] janhoy opened a new pull request #2487: SOLR-15383 Solr Zookeeper status page shows green even when a ZK is down

2021-04-29 Thread GitBox


janhoy opened a new pull request #2487:
URL: https://github.com/apache/lucene-solr/pull/2487


   This is a backport to 8.9 of https://github.com/apache/solr/pull/103
   
   See https://issues.apache.org/jira/browse/SOLR-15383





[GitHub] [lucene] mikemccand commented on pull request #116: Explicit flush

2021-04-29 Thread GitBox


mikemccand commented on pull request #116:
URL: https://github.com/apache/lucene/pull/116#issuecomment-829452800


   Thanks @balmukundblr -- looks great -- I'll try to push today.





[GitHub] [lucene-solr] balmukundblr commented on pull request #2349: Added FlushIndexTask to flush documents at index thread level.

2021-04-29 Thread GitBox


balmukundblr commented on pull request #2349:
URL: https://github.com/apache/lucene-solr/pull/2349#issuecomment-829439874


   > Thanks @balmukundblr this looks great! Could you please open a new PR on 
the new Lucene GitHub repo? https://github.com/apache/lucene
   > 
   > Thanks!
   
   As you suggested, I've raised a PR (https://github.com/apache/lucene/pull/116).





[GitHub] [lucene] balmukundblr opened a new pull request #116: Explicit flush

2021-04-29 Thread GitBox


balmukundblr opened a new pull request #116:
URL: https://github.com/apache/lucene/pull/116


   
   
   
   # Description
   
   The CloseIndex call takes a long time to complete.
   
   Once the AddDoc task completes, the benchmark algo calls the 
ForceMerge/CloseIndex task, which eventually allows all pending flushes to 
complete. Because flushes during the CloseIndex call are sequential, the call 
takes longer and delays overall index completion. While indexing 1 million 
documents with reuters21578 (plain-text documents derived from the reuters21578 
corpus), we observed that the CloseIndex call takes around 35% of the total time.
   
   # Solution
   
   Developed a new FlushIndexTask, which uses the flushNextBuffer() Lucene API to 
flush documents at the index-thread level without impacting any other index 
threads. Adding this task to the algo file, immediately after the AddDoc task, 
ensures that all docs are flushed before the ForceMerge/CloseIndex task is called.
   With this solution in place, CloseIndex task time was reduced significantly 
and it also improved total time for Indexing.
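
   The effect described above can be illustrated with a toy model. This is a 
sketch under stated assumptions, not the FlushIndexTask from this PR: 
`ToyWriter`, `addBuffer` and `run` are hypothetical names, and the toy 
`flushNextBuffer()` only mimics the idea of Lucene's method of the same name 
(flush one pending buffer in the calling thread). When each indexing task 
flushes after adding its documents, nothing is left for `close()` to flush 
sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadFlushSketch {

  /** Toy stand-in for an index writer; hypothetical, not Lucene's IndexWriter. */
  static final class ToyWriter {
    private final Queue<List<String>> pending = new ConcurrentLinkedQueue<>();
    final AtomicInteger flushedDocs = new AtomicInteger();

    /** Hands a filled per-thread buffer to the writer. */
    void addBuffer(List<String> docs) {
      pending.add(docs);
    }

    /** Flushes one pending buffer, if any, in the calling thread. */
    boolean flushNextBuffer() {
      List<String> docs = pending.poll();
      if (docs == null) {
        return false;
      }
      flushedDocs.addAndGet(docs.size());
      return true;
    }

    /** Flushes the remaining buffers one at a time; returns how many there were. */
    int close() {
      int flushedAtClose = 0;
      while (flushNextBuffer()) {
        flushedAtClose++;
      }
      return flushedAtClose;
    }
  }

  /** Runs the indexing tasks and returns the number of buffers left for close(). */
  static int run(int threads, int docsPerThread, boolean flushInIndexThread) {
    ToyWriter writer = new ToyWriter();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        tasks.add(() -> {
          List<String> buffer = new ArrayList<>();
          for (int d = 0; d < docsPerThread; d++) {
            buffer.add("doc-" + id + "-" + d);
          }
          writer.addBuffer(buffer);
          if (flushInIndexThread) {
            writer.flushNextBuffer(); // the explicit flush step after adding docs
          }
          return null;
        });
      }
      pool.invokeAll(tasks); // blocks until every task has completed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return writer.close();
  }

  public static void main(String[] args) {
    System.out.println("buffers flushed at close, with explicit flush: " + run(4, 10, true));
    System.out.println("buffers flushed at close, without: " + run(4, 10, false));
  }
}
```

   In the toy model the sequential work at close grows with the number of 
indexing threads unless each thread flushes first. The real cost in Lucene 
depends on buffer sizes and I/O, which this sketch does not model.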
   
   # Tests
   
   This change uses an existing Lucene API, flushNextBuffer(), which already has 
test cases.
   - Passed existing tests
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code 
conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager

2021-04-29 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335643#comment-17335643
 ] 

Greg Miller commented on LUCENE-9944:
-

Here is a draft version of this change. It's rough (mostly a copy/paste job) 
but illustrates the desired functionality. If folks think this is a good idea, 
I'll clean up the change, add some testing and put together a proper PR.

 

https://github.com/apache/lucene/compare/main...gsmiller:LUCENE-9944/draft

> Implement alternative drill sideways faceting with provided CollectorManager
> 
>
> Key: LUCENE-9944
> URL: https://issues.apache.org/jira/browse/LUCENE-9944
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Priority: Minor
>
> Today, if a user of {{DrillSideways}} wants to provide their own 
> {{CollectorManager}} when invoking {{search}}, they get this alternate, 
> "concurrent" implementation that creates N copies of the provided 
> {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs 
> them all concurrently. This is a very different implementation than the one a 
> user would get if providing a {{Collector}} instead. Additionally, an 
> {{ExecutorService}} must be provided when constructing a {{DrillSideways}} 
> instance if the user wants to bring their own {{CollectorManager}} 
> (otherwise, they'll get an unfriendly NPE when calling {{search}}).
> I propose adding an implementation to {{DrillSideways}} that will run the 
> "non-concurrent" algorithm in the case that a user wants to provide their own 
> {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and 
> doesn't want the concurrent algorithm).






[jira] [Created] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager

2021-04-29 Thread Greg Miller (Jira)
Greg Miller created LUCENE-9944:
---

 Summary: Implement alternative drill sideways faceting with 
provided CollectorManager
 Key: LUCENE-9944
 URL: https://issues.apache.org/jira/browse/LUCENE-9944
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Affects Versions: main (9.0)
Reporter: Greg Miller


Today, if a user of {{DrillSideways}} wants to provide their own 
{{CollectorManager}} when invoking {{search}}, they get this alternate, 
"concurrent" implementation that creates N copies of the provided 
{{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs 
them all concurrently. This is a very different implementation than the one a 
user would get if providing a {{Collector}} instead. Additionally, an 
{{ExecutorService}} must be provided when constructing a {{DrillSideways}} 
instance if the user wants to bring their own {{CollectorManager}} (otherwise, 
they'll get an unfriendly NPE when calling {{search}}).

I propose adding an implementation to {{DrillSideways}} that will run the 
"non-concurrent" algorithm in the case that a user wants to provide their own 
{{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and 
doesn't want the concurrent algorithm).






[GitHub] [lucene-solr] janhoy opened a new pull request #2486: SOLR-15384 Zookeeper Status handler /admin/zookeeper/status not queryable from SolrJ

2021-04-29 Thread GitBox


janhoy opened a new pull request #2486:
URL: https://github.com/apache/lucene-solr/pull/2486


   Backport of https://github.com/apache/solr/pull/105 targeting 8x (8.9)





[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623150413



##
File path: 
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointsReaderUtils.java
##
@@ -35,63 +37,60 @@
 
   MutablePointsReaderUtils() {}
 
-  /** Sort the given {@link MutablePointValues} based on its packed value then 
doc ID. */
+  /**
+   * Sort the given {@link MutablePointValues} based on its packed value, note 
that doc ID is not
+   * taken into sorting algorithm, since if they are already in ascending 
order, stable sort is able
+   * to maintain the ordering of doc ID.
+   */

Review comment:
   What if the doc IDs that come in are not in ascending order? Should we use 
MSBRadixSorter with a stable reorder to sort? If so, an if-else clause is 
needed to provide two different sorters.
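
   A minimal sketch of the concern, assuming a hypothetical `Point` pair of 
(value, docId) rather than Lucene's `MutablePointValues`: a stable sort on the 
value alone produces (value, docId) order only when the doc IDs arrive in 
ascending order. `Arrays.sort` on an object array is documented to be stable, so 
it stands in here for the stable radix sort.

```java
import java.util.Arrays;
import java.util.Comparator;

final class StableSortDocIdSketch {

  /** Illustrative (value, docId) pair; hypothetical, not a Lucene type. */
  static final class Point {
    final int value;
    final int docId;

    Point(int value, int docId) {
      this.value = value;
      this.docId = docId;
    }
  }

  /** Builds points from parallel arrays of values and doc IDs. */
  static Point[] points(int[] values, int[] docIds) {
    Point[] result = new Point[values.length];
    for (int i = 0; i < values.length; i++) {
      result[i] = new Point(values[i], docIds[i]);
    }
    return result;
  }

  /** Sorts by value only; ties keep their input order because the sort is stable. */
  static void sortByValueOnly(Point[] points) {
    Arrays.sort(points, Comparator.comparingInt((Point p) -> p.value));
  }

  /** Returns true if the array is ordered by value, then by doc ID. */
  static boolean sortedByValueThenDocId(Point[] points) {
    for (int i = 1; i < points.length; i++) {
      Point a = points[i - 1];
      Point b = points[i];
      if (a.value > b.value || (a.value == b.value && a.docId > b.docId)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Point[] ascending = points(new int[] {5, 3, 5, 3}, new int[] {0, 1, 2, 3});
    sortByValueOnly(ascending);
    System.out.println("ascending doc IDs: " + sortedByValueThenDocId(ascending));

    Point[] unordered = points(new int[] {5, 3, 5, 3}, new int[] {2, 3, 0, 1});
    sortByValueOnly(unordered);
    System.out.println("unordered doc IDs: " + sortedByValueThenDocId(unordered));
  }
}
```

   With unordered doc IDs the value-only sort leaves ties in their input order, 
which is why a second sorter or an input precondition would be needed.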







[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623147194



##
File path: 
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointsReaderUtils.java
##
@@ -35,63 +37,60 @@
 
   MutablePointsReaderUtils() {}
 
-  /** Sort the given {@link MutablePointValues} based on its packed value then 
doc ID. */
+  /**
+   * Sort the given {@link MutablePointValues} based on its packed value, note 
that doc ID is not
+   * taken into sorting algorithm, since if they are already in ascending 
order, stable sort is able
+   * to maintain the ordering of doc ID.
+   */

Review comment:
   Sure, I can add a check here. I searched the codebase: the only user is 
`BKDWriter`; the others are test cases (I updated 
`TestMutablePointsReaderUtils` to ignore doc ID when comparing). Could we 
require the input to have doc IDs in ascending order?







[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623143156



##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * Stable radix sorter for variable-length strings.
+ *
+ * @lucene.internal
+ */
+public abstract class StableMSBRadixSorter extends MSBRadixSorter {

Review comment:
   The `InPlaceMergeSorter` works like a closure: it needs runtime variables 
such as `reader` and `config` to be constructed, so this class cannot override 
`getFallbackSorter`. How about making `getFallbackSorter` throw an 
UnsupportedOperationException here explicitly, to give the caller a hint to 
override it?
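
   The proposal could look roughly like the sketch below. All names are 
hypothetical and `Runnable` stands in for Lucene's `Sorter`; this is not the 
code in the PR. The base class throws from the hook, and a subclass that has 
the runtime state at hand overrides it with a fallback that closes over that 
state.

```java
final class FallbackHookSketch {

  /** Hypothetical base sorter whose fallback hook has no usable default. */
  abstract static class BaseSorter {
    /** Throws by default, so a subclass that relies on the fallback must override it. */
    protected Runnable getFallbackSorter(int k) {
      throw new UnsupportedOperationException("override getFallbackSorter(int)");
    }

    /** Delegates a small range to the fallback sorter. */
    final void sortSmallRange(int k) {
      getFallbackSorter(k).run();
    }
  }

  /** Builds a sorter whose fallback closes over runtime state, here a call counter. */
  static BaseSorter withFallback(int[] calls) {
    return new BaseSorter() {
      @Override
      protected Runnable getFallbackSorter(int k) {
        // Stands in for a fallback sorter bound to runtime variables.
        return () -> calls[0]++;
      }
    };
  }

  /** Returns true if the hook throws when it has not been overridden. */
  static boolean defaultHookThrows() {
    try {
      new BaseSorter() {}.sortSmallRange(0);
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    int[] calls = new int[1];
    withFallback(calls).sortSmallRange(3);
    System.out.println("fallback calls: " + calls[0]);
    System.out.println("default hook throws: " + defaultHookThrows());
  }
}
```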







[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623139405



##
File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
##
@@ -1698,6 +1698,16 @@ public void getValue(int i, BytesRef packedValue) {
   public byte getByteAt(int i, int k) {
 throw new UnsupportedOperationException();
   }
+
+  @Override
+  public void assign(int from, int to) {
+// do nothing
+  }
+
+  @Override
+  public void finalizeAssign(int from, int to) {
+// do nothing
+  }

Review comment:
   comment addressed.







[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623139120



##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * Stable radix sorter for variable-length strings.
+ *
+ * @lucene.internal
+ */
+public abstract class StableMSBRadixSorter extends MSBRadixSorter {
+
+  protected boolean useStableSort;
+
+  public StableMSBRadixSorter(int maxLength) {
+super(maxLength);
+  }
+
+  /** Assign the from-th value to to-th position in another array which used 
temporarily. */
+  protected void assign(int from, int to) {
+throw new UnsupportedOperationException();
+  }
+
+  /** Finalize assign operation, to switch array. */
+  protected void finalizeAssign(int from, int to) {
+throw new UnsupportedOperationException();
+  }

Review comment:
   comment addressed.







[GitHub] [lucene] neoremind commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


neoremind commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r623138722



##
File path: lucene/core/src/java/org/apache/lucene/codecs/MutablePointValues.java
##
@@ -41,4 +41,10 @@ protected MutablePointValues() {}
 
   /** Swap the i-th and j-th values. */
   public abstract void swap(int i, int j);
+
+  /** Assign the from-th value to to-th position in another array which used 
temporarily. */
+  public abstract void assign(int from, int to);
+
+  /** Finalize assign operation, to switch array. */
+  public abstract void finalizeAssign(int from, int to);

Review comment:
   I have updated the two method names. In terms of performance, `memcpy` 
should be faster than many `memset` operations, so I thought maybe the current 
implementation makes sense, what do you think?
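
   The assign-then-copy idea can be sketched as follows. This is an 
illustration only, not the code in this PR: `stableReorder` and `bucket` are 
hypothetical names, and the scratch array stands in for the temporary array 
mentioned in the javadoc. Each element is assigned to its slot in bucket order, 
then one bulk `System.arraycopy` moves the result back into place.

```java
import java.util.Arrays;

final class StableBucketPassSketch {

  /** Bucket 0 is for keys with no byte at position k; otherwise the unsigned byte plus one. */
  static int bucket(byte[] key, int k) {
    return key.length <= k ? 0 : (key[k] & 0xff) + 1;
  }

  /** Stably reorders values[from, to) by the byte at position k of each key. */
  static void stableReorder(byte[][] values, int from, int to, int k) {
    // offsets[b + 1] first counts the elements in bucket b (b ranges over 0..256).
    int[] offsets = new int[258];
    for (int i = from; i < to; i++) {
      offsets[bucket(values[i], k) + 1]++;
    }
    // Prefix sum: offsets[b] becomes the first scratch slot of bucket b.
    for (int b = 1; b < offsets.length; b++) {
      offsets[b] += offsets[b - 1];
    }
    // "Assign": walk the input in order, so equal keys keep their relative order.
    byte[][] scratch = new byte[to - from][];
    for (int i = from; i < to; i++) {
      scratch[offsets[bucket(values[i], k)]++] = values[i];
    }
    // "Finalize": a single bulk copy back into the source range.
    System.arraycopy(scratch, 0, values, from, scratch.length);
  }

  public static void main(String[] args) {
    byte[][] keys = {{2, 1}, {1, 1}, {2, 2}, {1, 2}};
    stableReorder(keys, 0, keys.length, 0);
    System.out.println(Arrays.deepToString(keys));
  }
}
```

   The bulk copy back replaces a per-element write into the source array, which 
is the trade-off under discussion here.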

##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * Stable radix sorter for variable-length strings.
+ *
+ * @lucene.internal
+ */
+public abstract class StableMSBRadixSorter extends MSBRadixSorter {
+
+  protected boolean useStableSort;

Review comment:
   removed







[jira] [Resolved] (LUCENE-9572) Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types

2021-04-29 Thread Gus Heck (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gus Heck resolved LUCENE-9572.
--
Fix Version/s: 8.9
   Resolution: Implemented

> Allow TypeAsSynonymFilter to propagate selected flags and Ignore some types
> ---
>
> Key: LUCENE-9572
> URL: https://issues.apache.org/jira/browse/LUCENE-9572
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis, modules/test-framework
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Fix For: 8.9
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> (Breaking this off of SOLR-14597 for independent review)
> TypeAsSynonymFilter converts type attributes to a synonym. In some cases the 
> original token may have already had flags set on it and it may be useful to 
> propagate some or all of those flags to the synonym we are generating. This 
> ticket provides that ability and allows the user to specify a bitmask to 
> specify which flags are retained.
> Additionally there may be some set of types that should not be converted to 
> synonyms, and this change allows the user to specify a comma separated list 
> of types to ignore (most common case will be to ignore a common default type 
> of 'word' I suspect)
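
The two options described above can be sketched without Lucene. This is an 
illustration under assumptions, not the filter's actual code: the class and 
method names are hypothetical. A bitmask selects which of the original token's 
flags are copied to the synonym, and a comma-separated list names the types 
that should not produce a synonym.

```java
import java.util.HashSet;
import java.util.Set;

final class TypeAsSynonymSketch {

  /** Flags kept on the synonym: only the bits selected by the mask survive. */
  static int synonymFlags(int originalFlags, int mask) {
    return originalFlags & mask;
  }

  /** Parses a comma-separated list of types to ignore, trimming whitespace. */
  static Set<String> parseIgnoredTypes(String commaSeparated) {
    Set<String> ignored = new HashSet<>();
    for (String type : commaSeparated.split(",")) {
      String trimmed = type.trim();
      if (!trimmed.isEmpty()) {
        ignored.add(trimmed);
      }
    }
    return ignored;
  }

  /** A synonym is emitted only for types that are not in the ignore set. */
  static boolean shouldEmitSynonym(String type, Set<String> ignored) {
    return !ignored.contains(type);
  }

  public static void main(String[] args) {
    Set<String> ignored = parseIgnoredTypes("word, num");
    System.out.println("emit for 'word': " + shouldEmitSynonym("word", ignored));
    System.out.println("emit for 'url': " + shouldEmitSynonym("url", ignored));
    System.out.println("kept flags: " + Integer.toBinaryString(synonymFlags(0b1011, 0b0110)));
  }
}
```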






[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs

2021-04-29 Thread GitBox


mocobeta commented on a change in pull request #115:
URL: https://github.com/apache/lucene/pull/115#discussion_r623054572



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java
##
@@ -299,7 +306,7 @@
  *   positions. Some payloads and offsets will be separated out into .pos 
file, for performance
  *   reasons.
  *   
- * PayFile(.pay): --> Header, 
+ * PayFile(.pay): --> Header, 

Review comment:
   It is not related to LUCENE-4198 at all... but TermPayloads seems to be 
optional as well as TermOffsets.
   
https://github.com/apache/lucene/blob/a9a3f6529dac48b9e83a03343b5dda3dc492d955/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsWriter.java#L328-L333







[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs

2021-04-29 Thread GitBox


mocobeta commented on a change in pull request #115:
URL: https://github.com/apache/lucene/pull/115#discussion_r623054572



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java
##
@@ -299,7 +306,7 @@
  *   positions. Some payloads and offsets will be separated out into .pos 
file, for performance
  *   reasons.
  *   
- * PayFile(.pay): --> Header, 
+ * PayFile(.pay): --> Header, 

Review comment:
   It is not related to LUCENE-4198 at all... but TermPayloads seems to be 
optional as well as TermOffsets.
   
https://github.com/apache/lucene/blob/a9a3f6529dac48b9e83a03343b5dda3dc492d955/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsWriter.java#L300-L313







[GitHub] [lucene] mocobeta commented on a change in pull request #115: LUCENE-4198: add format description for term impacts to javadocs

2021-04-29 Thread GitBox


mocobeta commented on a change in pull request #115:
URL: https://github.com/apache/lucene/pull/115#discussion_r623050451



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsFormat.java
##
@@ -235,6 +238,10 @@
  * and stored as a difference sequence.
  * PayByteUpto indicates the start offset of the current payload. 
It is equivalent to
  * the sum of the payload lengths in the current block up to 
PosBlockOffset
+ * ImpactLength is the total length of CompetitiveFreqDelta and 
CompetitiveNormDelta
+ * pairs. CompetitiveFreqDelta and CompetitiveNormDelta are used 
to safely skip score
+ * calculation for uncompetitive documents; See {@link
+ * org.apache.lucene.codecs.CompetitiveImpactAccumulator} for more 
details.

Review comment:
   Also added brief description about the impacts.







[GitHub] [lucene] mocobeta opened a new pull request #115: LUCENE-4198: add format description for term impacts to javadocs

2021-04-29 Thread GitBox


mocobeta opened a new pull request #115:
URL: https://github.com/apache/lucene/pull/115


   https://issues.apache.org/jira/browse/LUCENE-4198 introduced term impacts to 
PostingsFormat; this PR fixes the format documentation (the "SkipDatum" part) to 
reflect the changes.
   
   **latest javadocs (for .doc file)**
   
![postingsformat_latest_javadoc](https://user-images.githubusercontent.com/1825333/116557310-49200f80-a939-11eb-88a3-470c9b7c3162.png)
   
   **modified javadocs with this patch (for .doc file)** 
   
![postingsformat_updated_javadoc](https://user-images.githubusercontent.com/1825333/116557363-55a46800-a939-11eb-91bc-cd02ec86aa44.png)
   





[GitHub] [lucene] msokolov opened a new pull request #114: LUCENE-9905: PerFieldVectorFormat

2021-04-29 Thread GitBox


msokolov opened a new pull request #114:
URL: https://github.com/apache/lucene/pull/114


   This emulates the approach taken for per-field customization of postings and 
doc-values formats and applies that to numeric vectors, i.e. VectorFormat. It 
registers discoverable services for Lucene90HnswVectorFormat and a new 
AssertingVectorFormat, to enable testing akin to what we have for the other 
formats. The asserting format doesn't assert very much, but it's a start.





[GitHub] [lucene] mayya-sharipova merged pull request #103: Fix regression to account payloads while merging

2021-04-29 Thread GitBox


mayya-sharipova merged pull request #103:
URL: https://github.com/apache/lucene/pull/103


   





[GitHub] [lucene] mayya-sharipova commented on a change in pull request #103: Fix regression to account payloads while merging

2021-04-29 Thread GitBox


mayya-sharipova commented on a change in pull request #103:
URL: https://github.com/apache/lucene/pull/103#discussion_r623002647



##
File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java
##
@@ -16,70 +16,22 @@
  */
 package org.apache.lucene.index;
 
+import static com.carrotsearch.randomizedtesting.RandomizedTest.randomIntBetween;
+
 import java.io.IOException;
 import org.apache.lucene.analysis.MockAnalyzer;
-import org.apache.lucene.analysis.MockTokenizer;
+import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
 import org.apache.lucene.document.FieldType;
 import org.apache.lucene.document.TextField;
 import org.apache.lucene.store.Directory;
-import org.apache.lucene.util.English;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.LuceneTestCase;
 import org.apache.lucene.util.TestUtil;
-import org.junit.AfterClass;
-import org.junit.BeforeClass;
 
 public class TestTermVectors extends LuceneTestCase {
-  private static IndexReader reader;
-  private static Directory directory;
-
-  @BeforeClass
-  public static void beforeClass() throws Exception {
-    directory = newDirectory();
-    RandomIndexWriter writer =
-        new RandomIndexWriter(
-            random(),
-            directory,
-            newIndexWriterConfig(new MockAnalyzer(random(), MockTokenizer.SIMPLE, true))
-                .setMergePolicy(newLogMergePolicy()));
-    // writer.setNoCFSRatio(1.0);
-    // writer.infoStream = System.out;
-    for (int i = 0; i < 1000; i++) {
-      Document doc = new Document();
-      FieldType ft = new FieldType(TextField.TYPE_STORED);
-      int mod3 = i % 3;
-      int mod2 = i % 2;
-      if (mod2 == 0 && mod3 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorOffsets(true);
-        ft.setStoreTermVectorPositions(true);
-      } else if (mod2 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorPositions(true);
-      } else if (mod3 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorOffsets(true);
-      } else {
-        ft.setStoreTermVectors(true);
-      }
-      doc.add(new Field("field", English.intToEnglish(i), ft));
-      // test no term vectors too
-      doc.add(new TextField("noTV", English.intToEnglish(i), Field.Store.YES));
-      writer.addDocument(doc);
-    }
-    reader = writer.getReader();
-    writer.close();
-  }
-
-  @AfterClass
-  public static void afterClass() throws Exception {
-    reader.close();
-    directory.close();
-    reader = null;
-    directory = null;
-  }

Review comment:
   Indeed, it was unused. Each test was using different variables for 
the directory and different field names. 

##
File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java
##
@@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception {
     verifyIndex(target);
     IOUtils.close(target, input[0], input[1]);
   }
+
+  /**
+   * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has
+   * payloads for this field.
+   */
+  public void testMergeWithPayloads() throws Exception {
+
+    final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED);
+    ft1.setStoreTermVectors(true);
+    ft1.setStoreTermVectorOffsets(true);
+    ft1.setStoreTermVectorPositions(true);
+    ft1.setStoreTermVectorPayloads(true);
+    ft1.freeze();
+
+    Directory dir = newDirectory();
+    final int numDocsInSegment = 10;
+    IndexWriterConfig indexWriterConfig =
+        new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment);
+    IndexWriter writer = new IndexWriter(dir, indexWriterConfig);
+
+    boolean hasPayloads1 = random().nextBoolean();
+    boolean hasPayloads2 = !hasPayloads1;

Review comment:
   Addressed in 70bdf18a7ba81f4c80b2663d72793c2c38707da8







[jira] [Commented] (LUCENE-9204) Move span queries to the queries module

2021-04-29 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335319#comment-17335319
 ] 

Adrien Grand commented on LUCENE-9204:
--

+1 I think this code would be best duplicated anyway, so that boolean queries 
can evolve the way they work without risk of breaking spans, or vice versa.

> Move span queries to the queries module
> ---
>
> Key: LUCENE-9204
> URL: https://issues.apache.org/jira/browse/LUCENE-9204
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
>
> We have a slightly odd situation currently, with two parallel query 
> structures for building complex positional queries: the long-standing span 
> queries, in core; and interval queries, in the queries module.  Given that 
> interval queries solve at least some of the problems we've had with Spans, I 
> think we should be pushing users more towards these implementations.  It's 
> counter-intuitive to do that when Spans are in core though.  I've opened this 
> issue to discuss moving the spans package as a whole to the queries module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene] jpountz commented on a change in pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-04-29 Thread GitBox


jpountz commented on a change in pull request #91:
URL: https://github.com/apache/lucene/pull/91#discussion_r622900018



##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * Stable radix sorter for variable-length strings.
+ *
+ * @lucene.internal
+ */
+public abstract class StableMSBRadixSorter extends MSBRadixSorter {
+
+  protected boolean useStableSort;

Review comment:
   unused?

##
File path: lucene/core/src/java/org/apache/lucene/codecs/MutablePointValues.java
##
@@ -41,4 +41,10 @@ protected MutablePointValues() {}
 
   /** Swap the i-th and j-th values. */
   public abstract void swap(int i, int j);
+
+  /** Assign the from-th value to to-th position in another array which used temporarily. */
+  public abstract void assign(int from, int to);
+
+  /** Finalize assign operation, to switch array. */
+  public abstract void finalizeAssign(int from, int to);

Review comment:
   `TimSorter` uses `save` and `restore` as names, maybe we should try to 
reuse this terminology for consistency?
   
   By the way, maybe we could even reuse the semantics of these methods where 
`save` copies a whole range while `restore` restores a single element at a 
time. This would only require slight modifications to 
`StableMSBRadixSorter#reorder` to copy the whole range first and then move 
items from the `temp` array to their expected index in the original array?
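
A minimal sketch of the save-then-restore idea described in the comment above, assuming plain `int` arrays rather than Lucene's sorter classes. `StableReorderSketch`, `reorder`, and the explicit `buckets` array are hypothetical names for illustration only; this is not the code from the pull request. The whole range is copied once ("save"), then each item is moved back one at a time ("restore") to the next free slot of its bucket, which keeps equal-bucket items in their original order.

```java
import java.util.Arrays;

/** Hypothetical illustration; not Lucene's StableMSBRadixSorter. */
public class StableReorderSketch {

  /** Stably reorders values[from, to) by bucket id, where ids lie in [0, numBuckets). */
  public static void reorder(int[] values, int[] buckets, int from, int to, int numBuckets) {
    int len = to - from;

    // "save": copy the whole range (values and their bucket ids) once
    int[] tempValues = Arrays.copyOfRange(values, from, to);
    int[] tempBuckets = Arrays.copyOfRange(buckets, from, to);

    // histogram + prefix sum: next[b] becomes the start offset of bucket b
    int[] next = new int[numBuckets + 1];
    for (int b : tempBuckets) {
      next[b + 1]++;
    }
    for (int b = 0; b < numBuckets; b++) {
      next[b + 1] += next[b];
    }

    // "restore": move items back one at a time, in their original order,
    // so items that share a bucket keep their relative order
    for (int i = 0; i < len; i++) {
      int dest = from + next[tempBuckets[i]]++;
      values[dest] = tempValues[i];
      buckets[dest] = tempBuckets[i];
    }
  }

  public static void main(String[] args) {
    int[] values = {10, 11, 12, 13, 14};
    int[] buckets = {1, 0, 1, 0, 1};
    reorder(values, buckets, 0, values.length, 2);
    System.out.println(Arrays.toString(values)); // [11, 13, 10, 12, 14]
  }
}
```

The cost of this approach is one temporary copy of the range per reorder step, which is the trade-off a stable radix pass usually accepts.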

##
File path: lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java
##
@@ -1698,6 +1698,16 @@ public void getValue(int i, BytesRef packedValue) {
   public byte getByteAt(int i, int k) {
 throw new UnsupportedOperationException();
   }
+
+  @Override
+  public void assign(int from, int to) {
+// do nothing
+  }
+
+  @Override
+  public void finalizeAssign(int from, int to) {
+// do nothing
+  }

Review comment:
   Can you throw an UnsupportedOperationException in both these methods 
instead, like we do for other methods?

##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+/**
+ * Stable radix sorter for variable-length strings.
+ *
+ * @lucene.internal
+ */
+public abstract class StableMSBRadixSorter extends MSBRadixSorter {

Review comment:
   Since the name has "stable", can you override `getFallbackSorter` to 
return an `InPlaceMergeSorter` so that the sort is guaranteed to be stable in 
all cases?

##
File path: lucene/core/src/java/org/apache/lucene/util/StableMSBRadixSorter.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed unde

[jira] [Commented] (LUCENE-9204) Move span queries to the queries module

2021-04-29 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335265#comment-17335265
 ] 

Alan Woodward commented on LUCENE-9204:
---

So there are a few things that need to be done here, because Spans reuse some 
of the two-phase conjunction and disjunction logic.  It can all be done as a 
single PR but it might make sense to split things out into smaller tickets? Or 
multiple PRs attached to this ticket maybe.
* Duplicate ConjunctionDISI and add a package-private version in the spans 
package.  We should be able to make the standard impl package-private at this 
point as well
* Duplicate DisiWrapper/DisiPriorityQueue/DisjunctionDISI and add a 
package-private version in the spans package.  This will make three copies of 
this code (there's one for intervals as well) which suggests that we may be 
able to generify it a bit, but it's also on the hot path for queries so maybe 
we just live with the duplication.
* Actually move all the spans code and move those tests that rely on spans into 
queries/tests.  Some of these can be turned into base test classes - for 
example the Matches test class has a lot of methods which could be moved to 
test-framework and then the span test cases can be added to a new suite in 
queries.







[jira] [Commented] (LUCENE-9940) The order of disjuncts in DisjunctionMaxQuery affects equals() impl

2021-04-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335255#comment-17335255
 ] 

ASF subversion and git services commented on LUCENE-9940:
-

Commit f7a3587091f8ce05ef08a56523571239b383b217 in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f7a3587 ]

LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals 
checks (#110)

DisjunctionMaxQuery stores its disjuncts in a Query[], and uses
Arrays.equals() for comparisons in its equals() implementation.
This means that the order in which disjuncts are added to the query
matters for equality checks.

This commit changes DMQ to instead store its disjuncts in a Multiset,
meaning that ordering no longer matters. The getDisjuncts()
method now returns a Collection rather than a List, and
some tests are changed to use query equality checks rather than
iterating over disjuncts and expecting a particular order.
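
To illustrate the difference described in the commit message, here is a small sketch, assuming plain strings in place of `Query` objects and a `HashMap` of counts as a stand-in for a multiset. `DisjunctEqualitySketch` and its methods are hypothetical names; this is not the implementation from the commit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; strings stand in for Query objects. */
public class DisjunctEqualitySketch {

  /** Order-sensitive comparison, as with an array compared via Arrays.equals(). */
  public static boolean arrayEquals(String[] a, String[] b) {
    return Arrays.equals(a, b);
  }

  /** Counts occurrences of each element: a minimal stand-in for a multiset. */
  public static Map<String, Integer> toCounts(List<String> items) {
    Map<String, Integer> counts = new HashMap<>();
    for (String item : items) {
      counts.merge(item, 1, Integer::sum);
    }
    return counts;
  }

  /** Order-insensitive comparison that still respects duplicates. */
  public static boolean multisetEquals(List<String> a, List<String> b) {
    return toCounts(a).equals(toCounts(b));
  }

  public static void main(String[] args) {
    String[] first = {"title:foo", "body:foo"};
    String[] second = {"body:foo", "title:foo"};
    System.out.println(arrayEquals(first, second)); // false
    System.out.println(multisetEquals(Arrays.asList(first), Arrays.asList(second))); // true
  }
}
```

A multiset rather than a plain set matters here because the same disjunct may be added more than once, and two queries with different duplicate counts should not compare equal.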

> The order of disjuncts in DisjunctionMaxQuery affects equals() impl
> ---
>
> Key: LUCENE-9940
> URL: https://issues.apache.org/jira/browse/LUCENE-9940
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> DisjunctionMaxQuery stores its disjuncts in a Java array, and its equals() 
> implementation uses Arrays.equals() when checking equality.  This means that 
> two queries with the same disjuncts but added in a different order will 
> compare as different, even though their results will be identical.  We should 
> replace the array with a Set.






[jira] [Resolved] (LUCENE-9940) The order of disjuncts in DisjunctionMaxQuery affects equals() impl

2021-04-29 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9940.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> The order of disjuncts in DisjunctionMaxQuery affects equals() impl
> ---
>
> Key: LUCENE-9940
> URL: https://issues.apache.org/jira/browse/LUCENE-9940
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> DisjunctionMaxQuery stores its disjuncts in a Java array, and its equals() 
> implementation uses Arrays.equals() when checking equality.  This means that 
> two queries with the same disjuncts but added in a different order will 
> compare as different, even though their results will be identical.  We should 
> replace the array with a Set.






[GitHub] [lucene] romseygeek merged pull request #110: LUCENE-9940: DisjunctionMaxQuery shouldn't depend on disjunct order for equals checks

2021-04-29 Thread GitBox


romseygeek merged pull request #110:
URL: https://github.com/apache/lucene/pull/110


   





[GitHub] [lucene] jpountz commented on a change in pull request #103: Fix regression to account payloads while merging

2021-04-29 Thread GitBox


jpountz commented on a change in pull request #103:
URL: https://github.com/apache/lucene/pull/103#discussion_r622844633



##
File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java
##
@@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception {
     verifyIndex(target);
     IOUtils.close(target, input[0], input[1]);
   }
+
+  /**
+   * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has
+   * payloads for this field.
+   */
+  public void testMergeWithPayloads() throws Exception {
+
+    final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED);
+    ft1.setStoreTermVectors(true);
+    ft1.setStoreTermVectorOffsets(true);
+    ft1.setStoreTermVectorPositions(true);
+    ft1.setStoreTermVectorPayloads(true);
+    ft1.freeze();
+
+    Directory dir = newDirectory();
+    final int numDocsInSegment = 10;
+    IndexWriterConfig indexWriterConfig =
+        new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment);
+    IndexWriter writer = new IndexWriter(dir, indexWriterConfig);
+
+    boolean hasPayloads1 = random().nextBoolean();
+    boolean hasPayloads2 = !hasPayloads1;

Review comment:
   Can you test both cases every time instead of relying on randomization?
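
One possible shape for this, sketched without any Lucene classes: loop over both boolean values so that each run covers both orderings. `BothCasesSketch` and `mergedHasPayloads` are hypothetical stand-ins for the real test body and the merge rule it checks (a merged field has payloads if at least one segment has them); the actual test would index and merge two segments inside the loop.

```java
/** Hypothetical illustration of covering both cases deterministically. */
public class BothCasesSketch {

  /** Stand-in for the rule under test: merged field has payloads if either segment does. */
  public static boolean mergedHasPayloads(boolean segment1, boolean segment2) {
    return segment1 || segment2;
  }

  /** Runs the check once per case and returns how many cases were executed. */
  public static int runBothCases() {
    int executed = 0;
    for (boolean hasPayloads1 : new boolean[] {false, true}) {
      boolean hasPayloads2 = !hasPayloads1;
      if (!mergedHasPayloads(hasPayloads1, hasPayloads2)) {
        throw new AssertionError("expected payloads after merge, hasPayloads1=" + hasPayloads1);
      }
      executed++;
    }
    return executed;
  }

  public static void main(String[] args) {
    System.out.println(runBothCases()); // 2
  }
}
```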

##
File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java
##
@@ -16,70 +16,22 @@
  */
 package org.apache.lucene.index;
 
+import static com.carrotsearch.randomizedtesting.RandomizedTest.randomIntBetween;
+
 import java.io.IOException;
 import org.apache.lucene.analysis.MockAnalyzer;
-import org.apache.lucene.analysis.MockTokenizer;
+import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
 import org.apache.lucene.document.FieldType;
 import org.apache.lucene.document.TextField;
 import org.apache.lucene.store.Directory;
-import org.apache.lucene.util.English;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.LuceneTestCase;
 import org.apache.lucene.util.TestUtil;
-import org.junit.AfterClass;
-import org.junit.BeforeClass;
 
 public class TestTermVectors extends LuceneTestCase {
-  private static IndexReader reader;
-  private static Directory directory;
-
-  @BeforeClass
-  public static void beforeClass() throws Exception {
-    directory = newDirectory();
-    RandomIndexWriter writer =
-        new RandomIndexWriter(
-            random(),
-            directory,
-            newIndexWriterConfig(new MockAnalyzer(random(), MockTokenizer.SIMPLE, true))
-                .setMergePolicy(newLogMergePolicy()));
-    // writer.setNoCFSRatio(1.0);
-    // writer.infoStream = System.out;
-    for (int i = 0; i < 1000; i++) {
-      Document doc = new Document();
-      FieldType ft = new FieldType(TextField.TYPE_STORED);
-      int mod3 = i % 3;
-      int mod2 = i % 2;
-      if (mod2 == 0 && mod3 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorOffsets(true);
-        ft.setStoreTermVectorPositions(true);
-      } else if (mod2 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorPositions(true);
-      } else if (mod3 == 0) {
-        ft.setStoreTermVectors(true);
-        ft.setStoreTermVectorOffsets(true);
-      } else {
-        ft.setStoreTermVectors(true);
-      }
-      doc.add(new Field("field", English.intToEnglish(i), ft));
-      // test no term vectors too
-      doc.add(new TextField("noTV", English.intToEnglish(i), Field.Store.YES));
-      writer.addDocument(doc);
-    }
-    reader = writer.getReader();
-    writer.close();
-  }
-
-  @AfterClass
-  public static void afterClass() throws Exception {
-    reader.close();
-    directory.close();
-    reader = null;
-    directory = null;
-  }

Review comment:
   oh, this was unused?

##
File path: lucene/core/src/test/org/apache/lucene/index/TestTermVectors.java
##
@@ -166,4 +118,97 @@ public void testFullMergeAddIndexesReader() throws Exception {
     verifyIndex(target);
     IOUtils.close(target, input[0], input[1]);
   }
+
+  /**
+   * Assert that a merged segment has payloads set up in fieldInfo, if at least 1 segment has
+   * payloads for this field.
+   */
+  public void testMergeWithPayloads() throws Exception {
+
+    final FieldType ft1 = new FieldType(TextField.TYPE_NOT_STORED);
+    ft1.setStoreTermVectors(true);
+    ft1.setStoreTermVectorOffsets(true);
+    ft1.setStoreTermVectorPositions(true);
+    ft1.setStoreTermVectorPayloads(true);
+    ft1.freeze();
+
+    Directory dir = newDirectory();
+    final int numDocsInSegment = 10;
+    IndexWriterConfig indexWriterConfig =
+        new IndexWriterConfig(new MockAnalyzer(random())).setMaxBufferedDocs(numDocsInSegment);
+    IndexWriter writer = new IndexWriter(dir, indexWriterConfig);
+
+boolean hasPayloads1 = random().nextBo

[jira] [Resolved] (LUCENE-9930) UkrainianMorfologikAnalyzer reloads its Dictionary for every new TokenStreamComponents instance

2021-04-29 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9930.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> UkrainianMorfologikAnalyzer reloads its Dictionary for every new 
> TokenStreamComponents instance
> ---
>
> Key: LUCENE-9930
> URL: https://issues.apache.org/jira/browse/LUCENE-9930
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Large static data structures should be loaded in Analyzer constructors and 
> shared between threads, but the UkrainianMorfologikAnalyzer is loading its 
> dictionary in `createComponents`, which means it is reloaded and stored on 
> every new analysis thread.  If you have a large dictionary and highly 
> concurrent indexing then this can lead to you running out of memory as 
> multiple copies of the dictionary are held in thread locals.
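
The pattern described in the issue can be sketched as follows, assuming simplified stand-in classes rather than Lucene's `Analyzer` API. `SharedDictionarySketch`, `Dictionary`, `ReloadingAnalyzer`, and `SharingAnalyzer` are hypothetical names; the counter only exists to make the number of loads visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: these classes are hypothetical stand-ins, not Lucene's Analyzer API. */
public class SharedDictionarySketch {

  /** Counts how many times the (pretend) expensive dictionary load runs. */
  public static final AtomicInteger LOADS = new AtomicInteger();

  /** Stand-in for a large, immutable dictionary. */
  public static final class Dictionary {
    public Dictionary() {
      LOADS.incrementAndGet(); // imagine reading a large resource here
    }
  }

  /** Problematic pattern: every createComponents() call loads a fresh copy. */
  public static final class ReloadingAnalyzer {
    public Dictionary createComponents() {
      return new Dictionary();
    }
  }

  /** Preferred pattern: load once in the constructor and share the instance. */
  public static final class SharingAnalyzer {
    private final Dictionary dictionary = new Dictionary();

    public Dictionary createComponents() {
      return dictionary;
    }
  }

  /** Returns the number of loads triggered by n createComponents() calls. */
  public static int loadsWhenReloading(int n) {
    LOADS.set(0);
    ReloadingAnalyzer analyzer = new ReloadingAnalyzer();
    for (int i = 0; i < n; i++) {
      analyzer.createComponents();
    }
    return LOADS.get();
  }

  /** Same measurement for the analyzer that loads in its constructor. */
  public static int loadsWhenSharing(int n) {
    LOADS.set(0);
    SharingAnalyzer analyzer = new SharingAnalyzer();
    for (int i = 0; i < n; i++) {
      analyzer.createComponents();
    }
    return LOADS.get();
  }

  public static void main(String[] args) {
    System.out.println("reloading: " + loadsWhenReloading(3)); // reloading: 3
    System.out.println("sharing: " + loadsWhenSharing(3)); // sharing: 1
  }
}
```

Sharing is only safe when the dictionary is immutable or otherwise thread-safe, which is the assumption made in this sketch.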






[GitHub] [lucene] dweiss commented on a change in pull request #108: LUCENE-9897 Change dependency checking mechanism to use gradle checksum verification

2021-04-29 Thread GitBox


dweiss commented on a change in pull request #108:
URL: https://github.com/apache/lucene/pull/108#discussion_r622807200



##
File path: gradle/verification-metadata.xml
##
@@ -0,0 +1,2198 @@
+
+<verification-metadata xmlns="https://schema.gradle.org/dependency-verification" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://schema.gradle.org/dependency-verification https://schema.gradle.org/dependency-verification/dependency-verification-1.0.xsd">
+   <configuration>
+      <verify-metadata>true</verify-metadata>
+      <verify-signatures>false</verify-signatures>
+   </configuration>
+   <components>
+      [component, artifact, and checksum entries were stripped by the mail archive]

Review comment:
   This link points at disabling transitive dependencies for a particular 
dependency - it's not related to checksumming. We do want those transitive 
dependencies and palantir's plugin makes sure they're consistent.
   
   > Gradle provides an API which allows disabling dependency verification on 
some specific configurations.
   
   Right. I see how it's done in the documentation and indeed it may be 
difficult to hook into all the configurations to just pick the subset that we 
want checksums for. 
   
   We could leave all the checksums but things would have to work. If they 
don't then I'd stick to the custom solution we rolled out until switching 
without so much pain is possible?







[GitHub] [lucene] iverase commented on pull request #107: LUCENE-9047: Move the Directory APIs to be little endian (take 2)

2021-04-29 Thread GitBox


iverase commented on pull request #107:
URL: https://github.com/apache/lucene/pull/107#issuecomment-828998218


   Thanks @rmuir! I will wait until Monday; if there is no more feedback, I will 
proceed with merging the current patch. 
   

