[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

2022-08-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582097#comment-17582097
 ] 

ASF subversion and git services commented on LUCENE-9583:
-

Commit 8308688d786cd6c55fcbe4e59f67966f385989a2 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8308688d786 ]

LUCENE-9583: Remove RandomAccessVectorValuesProducer (#1071)

This change folds the `RandomAccessVectorValuesProducer` interface into
`RandomAccessVectorValues`. This reduces the number of interfaces and clarifies
the cloning/copying behavior.

This is a small simplification related to LUCENE-9583, but does not address the
main issue.
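As a rough illustration of what folding the producer interface into the values interface might look like, here is a minimal sketch. The interface and class shapes below are hypothetical, simplified for illustration, and not the actual Lucene API: the point is that one interface both exposes ordinal-based vector access and can produce an independent copy, which is the cloning/copying behavior the commit clarifies.

```java
import java.util.List;

// Hypothetical simplified sketch, not the actual Lucene API.
interface RandomAccessVectorValues {
  int size();                    // number of vectors
  int dimension();               // dimensionality of each vector
  float[] vectorValue(int ord);  // vector for the given ordinal
  // Previously a separate RandomAccessVectorValuesProducer responsibility;
  // folding it in here is what clarifies the cloning/copying behavior.
  RandomAccessVectorValues copy();
}

final class ListVectorValues implements RandomAccessVectorValues {
  private final List<float[]> vectors;
  ListVectorValues(List<float[]> vectors) { this.vectors = vectors; }
  public int size() { return vectors.size(); }
  public int dimension() { return vectors.get(0).length; }
  public float[] vectorValue(int ord) { return vectors.get(ord); }
  public RandomAccessVectorValues copy() { return new ListVectorValues(vectors); }
}
```

A consumer that needs its own iteration state (e.g. a second thread) calls copy() rather than depending on a second producer type.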

> How should we expose VectorValues.RandomAccess?
> ---
>
> Key: LUCENE-9583
> URL: https://issues.apache.org/jira/browse/LUCENE-9583
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Michael Sokolov
>Assignee: Julie Tibshirani
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered
> that we also currently make use of the random-access API when merging vector
> fields over sorted indexes. Without it, we would need to load all vectors into
> RAM while flushing/merging, as we currently do in
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}}
> to {{VectorValues.RandomAccess}}. I think we could move this back and handle
> the HNSW requirements for search elsewhere. I wonder if that would alleviate
> the major concern here?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10644) Facets#getAllChildren testing should ignore child order

2022-08-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581475#comment-17581475
 ] 

ASF subversion and git services commented on LUCENE-10644:
--

Commit 51d756b7801da5bb3e49b9f887cbf5ec4c05b0c5 in lucene's branch 
refs/heads/branch_9x from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=51d756b7801 ]

LUCENE-10644: Facets#getAllChildren testing should ignore child order (#1013)



> Facets#getAllChildren testing should ignore child order
> ---
>
> Key: LUCENE-10644
> URL: https://issues.apache.org/jira/browse/LUCENE-10644
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Minor
> Attachments: failing tests.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our javadoc for {{Facets#getAllChildren}} explicitly calls out that callers
> should make no assumptions about child ordering, but a number of our own unit
> tests turn around and make that very assumption. I ran into this recently when
> trying an optimization that would result in a different child ordering for
> {{getAllChildren}}, and found a number of unit tests that started failing.
> I'll upload a list of what I found failing.
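An order-insensitive assertion along the lines the issue calls for can be sketched as follows. This is a hypothetical helper, not Lucene's actual test utility: instead of comparing the children returned by {{getAllChildren}} against an expected list positionally, both sides are sorted first, so the test only checks which children are present, as the javadoc contract implies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical test helper, not Lucene's actual test code.
final class ChildOrder {
  static boolean sameChildrenIgnoringOrder(List<String> expected, List<String> actual) {
    List<String> e = new ArrayList<>(expected);
    List<String> a = new ArrayList<>(actual);
    Collections.sort(e);
    Collections.sort(a);
    // equal as multisets of labels, regardless of the order returned
    return e.equals(a);
  }
}
```

A test written this way keeps passing when an optimization changes the child ordering, which is exactly the failure mode described above.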






[jira] [Commented] (LUCENE-10644) Facets#getAllChildren testing should ignore child order

2022-08-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581463#comment-17581463
 ] 

ASF subversion and git services commented on LUCENE-10644:
--

Commit 0914b537dbfb1ecd49bfb90c27df69a67e50c327 in lucene's branch 
refs/heads/main from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0914b537dbf ]

LUCENE-10644: Facets#getAllChildren testing should ignore child order (#1013)



> Facets#getAllChildren testing should ignore child order
> ---
>
> Key: LUCENE-10644
> URL: https://issues.apache.org/jira/browse/LUCENE-10644
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Greg Miller
>Priority: Minor
> Attachments: failing tests.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our javadoc for {{Facets#getAllChildren}} explicitly calls out that callers
> should make no assumptions about child ordering, but a number of our own unit
> tests turn around and make that very assumption. I ran into this recently when
> trying an optimization that would result in a different child ordering for
> {{getAllChildren}}, and found a number of unit tests that started failing.
> I'll upload a list of what I found failing.






[jira] [Commented] (LUCENE-10654) New companion doc value format for LatLonShape and XYShape field types

2022-08-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579057#comment-17579057
 ] 

ASF subversion and git services commented on LUCENE-10654:
--

Commit 923a9f800aef4f376eb1978c02e94ca6bacc5a5a in lucene's branch 
refs/heads/branch_9x from Nick Knize
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=923a9f800ae ]

LUCENE-10654: Fix ShapeDocValue Bounding Box failure (#1066) (#1067)

The base spatial test case may create invalid self-crossing polygons. These
polygons are cleaned by the tessellator, which may result in an inconsistent
bounding box between the tessellated shape and the original, invalid geometry.
This commit fixes the shape doc value test case to compute the bounding box from
the cleaned geometry instead of relying on the potentially invalid original
geometry.

Signed-off-by: Nicholas Walter Knize 

> New companion doc value format for LatLonShape and XYShape field types
> --
>
> Key: LUCENE-10654
> URL: https://issues.apache.org/jira/browse/LUCENE-10654
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Nick Knize
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}, and
> {{LatLonDocValuesField}} provides doc value support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently has a doc value
> format.
> This lack of doc value support for shapes means facets, aggregations, and
> IndexOrDocValues queries are currently not possible for Shape field types.
> This gap needs to be closed in Lucene.
> To support IndexOrDocValues queries along with various geometry aggregations
> and facets, we need the ability to compute spatial relations with the doc
> value. This is straightforward with {{XYPoint}} and {{LatLonPoint}}, since
> the doc value encoding is nothing more than a simple 2D integer encoding of
> the x,y and lat,lon dimensional components. Accomplishing the same with a
> naive integer-encoded binary representation for N-vertex shapes would be
> costly.
> {{ComponentTree}} already provides an efficient in-memory structure for
> quickly computing spatial relations over Shape types, based on a binary tree
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this
> tessellation is already computed at index time. If we create an on-disk
> representation of {{ComponentTree}}'s binary tree of tessellated triangles
> and use it as the doc value {{binaryValue}} format, we will be able to
> efficiently compute spatial relations with this binary representation and
> achieve the same facet/aggregation results over shapes as we can with points
> today (e.g., grid facets, centroid, area, etc.).
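To make the "simple 2D integer encoding" concrete, here is a simplified illustration of the idea for one dimensional component. This is not Lucene's exact GeoEncodingUtils scheme, just the general technique: a coordinate range is scaled onto the full signed 32-bit integer range, so a point becomes two ints and relating a doc value to a query box is plain integer math.

```java
// Simplified illustration of 2D integer encoding for a latitude component.
// Not Lucene's exact encoding; shown only to make the description concrete.
final class LatLonEncoding {
  static int encodeLatitude(double lat) {
    // map [-90, 90] onto the signed 32-bit integer range
    return (int) (lat / 90.0 * Integer.MAX_VALUE);
  }

  static double decodeLatitude(int encoded) {
    // inverse mapping; precision loss is bounded by the quantization step
    return (double) encoded / Integer.MAX_VALUE * 90.0;
  }
}
```

An N-vertex shape would need N such pairs plus structure, which is why a naive integer-encoded representation is costly compared to the serialized tree the description proposes.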






[jira] [Commented] (LUCENE-10654) New companion doc value format for LatLonShape and XYShape field types

2022-08-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579030#comment-17579030
 ] 

ASF subversion and git services commented on LUCENE-10654:
--

Commit 543910d9008e714016db8a799058860b8ece5565 in lucene's branch 
refs/heads/main from Nick Knize
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=543910d9008 ]

LUCENE-10654: Fix ShapeDocValue Bounding Box failure (#1066)

The base spatial test case may create invalid self-crossing polygons. These
polygons are cleaned by the tessellator, which may result in an inconsistent
bounding box between the tessellated shape and the original, invalid geometry.
This commit fixes the shape doc value test case to compute the bounding box from
the cleaned geometry instead of relying on the potentially invalid original
geometry.

Signed-off-by: Nicholas Walter Knize 

> New companion doc value format for LatLonShape and XYShape field types
> --
>
> Key: LUCENE-10654
> URL: https://issues.apache.org/jira/browse/LUCENE-10654
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Nick Knize
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}, and
> {{LatLonDocValuesField}} provides doc value support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently has a doc value
> format.
> This lack of doc value support for shapes means facets, aggregations, and
> IndexOrDocValues queries are currently not possible for Shape field types.
> This gap needs to be closed in Lucene.
> To support IndexOrDocValues queries along with various geometry aggregations
> and facets, we need the ability to compute spatial relations with the doc
> value. This is straightforward with {{XYPoint}} and {{LatLonPoint}}, since
> the doc value encoding is nothing more than a simple 2D integer encoding of
> the x,y and lat,lon dimensional components. Accomplishing the same with a
> naive integer-encoded binary representation for N-vertex shapes would be
> costly.
> {{ComponentTree}} already provides an efficient in-memory structure for
> quickly computing spatial relations over Shape types, based on a binary tree
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this
> tessellation is already computed at index time. If we create an on-disk
> representation of {{ComponentTree}}'s binary tree of tessellated triangles
> and use it as the doc value {{binaryValue}} format, we will be able to
> efficiently compute spatial relations with this binary representation and
> achieve the same facet/aggregation results over shapes as we can with points
> today (e.g., grid facets, centroid, area, etc.).






[jira] [Commented] (LUCENE-10678) computing the partition point on a BKD tree merge can overflow

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578489#comment-17578489
 ] 

ASF subversion and git services commented on LUCENE-10678:
--

Commit d426ff43c719acda20f5fc97a26f9f0774a36284 in lucene-solr's branch 
refs/heads/branch_8_11 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d426ff43c71 ]

LUCENE-10678: Fix potential overflow when computing the partition point on the 
BKD tree (#1065) (#2668)

We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in integer space, so if the
partition point is bigger than Integer.MAX_VALUE it will overflow.
This commit moves the multiplication to long space so it doesn't overflow.
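The overflow the commit fixes can be sketched as follows. Class and method names here are hypothetical (the actual code lives in BKDWriter); the point is that multiplying two ints wraps before the result is widened to long, so the cast must come first.

```java
// Sketch of the int-multiplication overflow and its fix (names hypothetical).
final class BkdPartition {
  // Broken: int * int wraps around, and only then is widened to long.
  static long partitionPointBroken(int leftNodes, int maxPointsInLeafNode) {
    return leftNodes * maxPointsInLeafNode;
  }

  // Fixed: casting one operand first forces the multiplication into long space.
  static long partitionPointFixed(int leftNodes, int maxPointsInLeafNode) {
    return (long) leftNodes * maxPointsInLeafNode;
  }
}
```

With 200,000 left nodes and 16,384 points per leaf, the true partition point is 3,276,800,000, which exceeds Integer.MAX_VALUE (2,147,483,647), so the int version silently produces a negative value.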

> computing the partition point on a BKD tree merge can overflow
> --
>
> Key: LUCENE-10678
> URL: https://issues.apache.org/jira/browse/LUCENE-10678
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 8.11.3, 9.4, 9.3.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I just discovered a bad bug in the BKD tree when doing merges. Before calling
> the BKD radix selector we need to compute the partition point, which is done
> by multiplying two integers. If the partition point is > Integer.MAX_VALUE
> then it will overflow.
> https://github.com/apache/lucene/blob/35ca2d79f73c6dfaf5e648fe241f7e0b37084a90/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L2021
>  






[jira] [Commented] (LUCENE-10678) computing the partition point on a BKD tree merge can overflow

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578481#comment-17578481
 ] 

ASF subversion and git services commented on LUCENE-10678:
--

Commit b19ba1098cb557fec168c569f8b4bdff9d56260c in lucene-solr's branch 
refs/heads/LUCENE-10678 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b19ba1098cb ]

LUCENE-10678: Fix potential overflow when computing the partition point on the 
BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in integer space, so if the
partition point is bigger than Integer.MAX_VALUE it will overflow.
This commit moves the multiplication to long space so it doesn't overflow.


> computing the partition point on a BKD tree merge can overflow
> --
>
> Key: LUCENE-10678
> URL: https://issues.apache.org/jira/browse/LUCENE-10678
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I just discovered a bad bug in the BKD tree when doing merges. Before calling
> the BKD radix selector we need to compute the partition point, which is done
> by multiplying two integers. If the partition point is > Integer.MAX_VALUE
> then it will overflow.
> https://github.com/apache/lucene/blob/35ca2d79f73c6dfaf5e648fe241f7e0b37084a90/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L2021
>  






[jira] [Commented] (LUCENE-10678) computing the partition point on a BKD tree merge can overflow

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578474#comment-17578474
 ] 

ASF subversion and git services commented on LUCENE-10678:
--

Commit 21f892d09698208ce146775e5b7641c554410002 in lucene's branch 
refs/heads/branch_9_3 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=21f892d0969 ]

LUCENE-10678: Fix potential overflow when computing the partition point on the 
BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in integer space, so if the
partition point is bigger than Integer.MAX_VALUE it will overflow.
This commit moves the multiplication to long space so it doesn't overflow.
# Conflicts:
#   lucene/CHANGES.txt


> computing the partition point on a BKD tree merge can overflow
> --
>
> Key: LUCENE-10678
> URL: https://issues.apache.org/jira/browse/LUCENE-10678
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I just discovered a bad bug in the BKD tree when doing merges. Before calling
> the BKD radix selector we need to compute the partition point, which is done
> by multiplying two integers. If the partition point is > Integer.MAX_VALUE
> then it will overflow.
> https://github.com/apache/lucene/blob/35ca2d79f73c6dfaf5e648fe241f7e0b37084a90/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L2021
>  






[jira] [Commented] (LUCENE-10678) computing the partition point on a BKD tree merge can overflow

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578473#comment-17578473
 ] 

ASF subversion and git services commented on LUCENE-10678:
--

Commit 0b9850448560aae4715719823af9922de2e2dfe2 in lucene's branch 
refs/heads/branch_9x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0b985044856 ]

LUCENE-10678: Fix potential overflow when computing the partition point on the 
BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in integer space, so if the
partition point is bigger than Integer.MAX_VALUE it will overflow.
This commit moves the multiplication to long space so it doesn't overflow.

> computing the partition point on a BKD tree merge can overflow
> --
>
> Key: LUCENE-10678
> URL: https://issues.apache.org/jira/browse/LUCENE-10678
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I just discovered a bad bug in the BKD tree when doing merges. Before calling
> the BKD radix selector we need to compute the partition point, which is done
> by multiplying two integers. If the partition point is > Integer.MAX_VALUE
> then it will overflow.
> https://github.com/apache/lucene/blob/35ca2d79f73c6dfaf5e648fe241f7e0b37084a90/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L2021
>  






[jira] [Commented] (LUCENE-10678) computing the partition point on a BKD tree merge can overflow

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578472#comment-17578472
 ] 

ASF subversion and git services commented on LUCENE-10678:
--

Commit fe8d11254a8a768608d7bb5e2bf8dcfd2c2c9310 in lucene's branch 
refs/heads/main from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fe8d11254a8 ]

LUCENE-10678: Fix potential overflow when computing the partition point on the 
BKD tree (#1065)

We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in integer space, so if the
partition point is bigger than Integer.MAX_VALUE it will overflow.
This commit moves the multiplication to long space so it doesn't overflow.

> computing the partition point on a BKD tree merge can overflow
> --
>
> Key: LUCENE-10678
> URL: https://issues.apache.org/jira/browse/LUCENE-10678
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I just discovered a bad bug in the BKD tree when doing merges. Before calling
> the BKD radix selector we need to compute the partition point, which is done
> by multiplying two integers. If the partition point is > Integer.MAX_VALUE
> then it will overflow.
> https://github.com/apache/lucene/blob/35ca2d79f73c6dfaf5e648fe241f7e0b37084a90/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L2021
>  






[jira] [Commented] (LUCENE-10577) Quantize vector values

2022-08-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578150#comment-17578150
 ] 

ASF subversion and git services commented on LUCENE-10577:
--

Commit a693fe819b04f07942bb1bcbc28169838f1becfc in lucene's branch 
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a693fe819b0 ]

LUCENE-10577: enable quantization of HNSW vectors to 8 bits (#1054)

* LUCENE-10577: enable supplying, storing, and comparing HNSW vectors with
8-bit precision

> Quantize vector values
> --
>
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The {{KnnVectorField}} API handles vectors with 4-byte floating-point values.
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest-neighbor search
> The main point of this addition was to provide the search capability, and to 
> support that it is not really necessary to store vectors in full precision. 
> Perhaps users may also be willing to retrieve values in lower precision for 
> whatever purpose those serve, if they are able to store more samples. We know 
> that 8 bits is enough to provide a very near approximation to the same 
> recall/performance tradeoff that is achieved with the full-precision vectors. 
> I'd like to explore how we could enable 4:1 compression of these fields by 
> reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide 
> their data in reduced-precision format and give control over the quantization 
> to them. It would have a major impact on the Lucene API surface though, 
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would 
> require no or perhaps very limited change to the existing API to enable the 
> feature.
> I've been exploring (2), and what I find is that we can achieve very good 
> recall results using dot-product similarity scoring by simple linear scaling 
> + quantization of the vector values, so long as we choose the scale that 
> minimizes the quantization error. Dot-product is amenable to this treatment 
> since vectors are required to be unit-length when used with that similarity 
> function. 
> Even so, there is variability in the ideal scale across different data sets. 
> A good choice seems to be max(abs(min-value), abs(max-value)), but of course 
> this assumes that the data set doesn't have a few outlier data points. A 
> theoretical range can be obtained by 1/sqrt(dimension), but this is only 
> useful when the samples are normally distributed. We could in theory 
> determine the ideal scale when flushing a segment and manage this 
> quantization per-segment, but then numerical error could creep in when 
> merging.
> I'll post a patch/PR with an experimental setup I've been using for 
> evaluation purposes. It is pretty self-contained and simple, but has some 
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant 
> that I have been playing with)
> 2. Converts between byte and float when computing dot-product instead of
> computing directly on byte values
> I'd like to get people's feedback on the approach and whether in general we 
> should think about doing this compression under the hood, or expose a 
> byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty 
> compelling and we should pursue something.
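The linear scale-and-quantize scheme described above can be sketched as follows. The helper names are hypothetical, not Lucene's API; the scale is chosen as max(abs(min-value), abs(max-value)), which for a single vector is its largest absolute component, so the data fills the signed 8-bit range and quantization error is minimized.

```java
// Sketch of linear scaling + 8-bit quantization (hypothetical helper, not
// Lucene's API). Each float is mapped to the nearest value in [-127, 127],
// giving the 4:1 compression discussed, at the cost of quantization error.
final class VectorQuantizer {
  static byte[] quantize(float[] v) {
    // scale = max(abs(min-value), abs(max-value)) = largest absolute component
    float scale = 0f;
    for (float x : v) scale = Math.max(scale, Math.abs(x));
    byte[] q = new byte[v.length];
    if (scale == 0f) return q;  // all-zero vector quantizes to all zeros
    for (int i = 0; i < v.length; i++) {
      q[i] = (byte) Math.round(v[i] / scale * 127f);
    }
    return q;
  }
}
```

As the description notes, a per-vector or per-segment scale like this is where the subtlety lies: segments quantized with different scales can accumulate numerical error when merged.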






[jira] [Commented] (LUCENE-10654) New companion doc value format for LatLonShape and XYShape field types

2022-08-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577620#comment-17577620
 ] 

ASF subversion and git services commented on LUCENE-10654:
--

Commit ddf0d0acf4e4443ddea37bb855dead7bed5cc1a2 in lucene's branch 
refs/heads/branch_9x from Nick Knize
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ddf0d0acf4e ]

LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape (#1017) 
(#1064)

Adds new doc value field to support LatLonShape and XYShape doc values. The
implementation is inspired by ComponentTree. A binary tree of tessellated
components (point, line, or triangle) is created. This tree is then DFS
serialized to a variable compressed DataOutput buffer to keep the doc value
format as compact as possible.

DocValue queries are performed on the serialized tree using component relation
logic similar to that found in SpatialQuery for BKD-indexed shapes. To make this
possible, some of the relation logic is refactored to make it accessible to the
doc value query counterpart.

Note this does not support the following:

* Multi Geometries or Collections - This will be investigated by exploring
  the addition of multi binary doc values.
* General Geometry Queries - This will be added in a follow-on improvement.

Signed-off-by: Nicholas Walter Knize 
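The DFS serialization idea in the commit message can be sketched in miniature. This is not the actual doc value format: the real tree holds tessellated components and writes to a compressed DataOutput, while the stand-in below holds a single int per node and writes to a plain list, but the shape-encoding technique (pre-order traversal with absent-child markers, flattened into one buffer storable as a binaryValue) is the same.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for DFS-serializing a binary tree into one flat buffer.
// Not the actual ShapeDocValues format; payloads and encoding are simplified.
final class TreeCodec {
  record Node(int value, Node left, Node right) {}

  // Pre-order (DFS) encode: 0 marks an absent child, 1 means a node follows.
  static void serialize(Node n, List<Integer> out) {
    if (n == null) { out.add(0); return; }
    out.add(1);
    out.add(n.value);
    serialize(n.left, out);
    serialize(n.right, out);
  }

  // Decode consumes the buffer left to right; pos[0] is the read cursor.
  static Node deserialize(List<Integer> in, int[] pos) {
    if (in.get(pos[0]++) == 0) return null;
    int value = in.get(pos[0]++);
    Node left = deserialize(in, pos);
    Node right = deserialize(in, pos);
    return new Node(value, left, right);
  }
}
```

Because the encoding is a pre-order walk, a query can relate against the tree while streaming the buffer, without materializing the whole structure first.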

> New companion doc value format for LatLonShape and XYShape field types
> --
>
> Key: LUCENE-10654
> URL: https://issues.apache.org/jira/browse/LUCENE-10654
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Nick Knize
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}, and
> {{LatLonDocValuesField}} provides doc value support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently has a doc value
> format.
> This lack of doc value support for shapes means facets, aggregations, and
> IndexOrDocValues queries are currently not possible for Shape field types.
> This gap needs to be closed in Lucene.
> To support IndexOrDocValues queries along with various geometry aggregations
> and facets, we need the ability to compute spatial relations with the doc
> value. This is straightforward with {{XYPoint}} and {{LatLonPoint}}, since
> the doc value encoding is nothing more than a simple 2D integer encoding of
> the x,y and lat,lon dimensional components. Accomplishing the same with a
> naive integer-encoded binary representation for N-vertex shapes would be
> costly.
> {{ComponentTree}} already provides an efficient in-memory structure for
> quickly computing spatial relations over Shape types, based on a binary tree
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this
> tessellation is already computed at index time. If we create an on-disk
> representation of {{ComponentTree}}'s binary tree of tessellated triangles
> and use it as the doc value {{binaryValue}} format, we will be able to
> efficiently compute spatial relations with this binary representation and
> achieve the same facet/aggregation results over shapes as we can with points
> today (e.g., grid facets, centroid, area, etc.).






[jira] [Commented] (LUCENE-10654) New companion doc value format for LatLonShape and XYShape field types

2022-08-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577561#comment-17577561
 ] 

ASF subversion and git services commented on LUCENE-10654:
--

Commit d7fd48c9502c567e4760a011fa99b1a491fea2cb in lucene's branch 
refs/heads/main from Nick Knize
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d7fd48c9502 ]

LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape (#1017)

Adds new doc value field to support LatLonShape and XYShape doc values. The
implementation is inspired by ComponentTree. A binary tree of tessellated
components (point, line, or triangle) is created. This tree is then DFS
serialized to a variable compressed DataOutput buffer to keep the doc value
format as compact as possible.

DocValue queries are performed on the serialized tree using component relation
logic similar to that found in SpatialQuery for BKD-indexed shapes. To make this
possible, some of the relation logic is refactored to make it accessible to the
doc value query counterpart.

Note this does not support the following:

* Multi Geometries or Collections - This will be investigated by exploring
  the addition of multi binary doc values.
* General Geometry Queries - This will be added in a follow-on improvement.

Signed-off-by: Nicholas Walter Knize 

> New companion doc value format for LatLonShape and XYShape field types
> --
>
> Key: LUCENE-10654
> URL: https://issues.apache.org/jira/browse/LUCENE-10654
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Nick Knize
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}. 
> {{LatLonDocValuesField}} provides docvalue support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently have a docvalue 
> format. 
> This lack of doc value support for shapes means facets, aggregations, and 
> IndexOrDocValues queries are currently not possible for Shape field types. 
> This gap needs be closed in lucene.
> To support IndexOrDocValues queries along with various geometry aggregations 
> and facets, the ability to compute the spatial relation with the doc value is 
> needed. This is straightforward with {{XYPoint}} and {{LatLonPoint}} since 
> the doc value encoding is nothing more than a simple 2D integer encoding of 
> the x,y and lat,lon dimensional components. Accomplishing the same with a 
> naive integer encoded binary representation for N-vertex shapes would be 
> costly. 
> {{ComponentTree}} already provides an efficient in memory structure for 
> quickly computing spatial relations over Shape types based on a binary tree 
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this 
> tessellation is already computed at index time. If we create an on-disk 
> representation of {{ComponentTree}}'s binary tree of tessellated triangles 
> and use this as the doc value {{binaryValue}} format, we will be able to 
> efficiently compute spatial relations with this binary representation and 
> achieve the same facet/aggregation results over shapes as we can with points 
> today (e.g., grid facets, centroid, area, etc.).
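As a hedged sketch of what a "simple 2D integer encoding" means here (illustrative only; Lucene's actual point encoding lives in GeoEncodingUtils and differs in detail), each dimension can be quantized to a 32-bit integer and the pair packed into one long, which is why point doc values are cheap to relate spatially:

```java
// Illustrative 2D integer encoding: map lat [-90,90] and lon [-180,180]
// linearly onto the signed 32-bit range, pack both into one long.
// Not Lucene's exact scheme -- a sketch of the idea only.
public class LatLonEncoding {
  static int encodeLat(double lat) { return (int) Math.floor(lat / 90.0 * 0x7FFFFFFF); }
  static int encodeLon(double lon) { return (int) Math.floor(lon / 180.0 * 0x7FFFFFFF); }
  static double decodeLat(int v) { return v * 90.0 / 0x7FFFFFFF; }
  static double decodeLon(int v) { return v * 180.0 / 0x7FFFFFFF; }

  public static long pack(double lat, double lon) {
    return (((long) encodeLat(lat)) << 32) | (encodeLon(lon) & 0xFFFFFFFFL);
  }

  public static double[] unpack(long packed) {
    return new double[] { decodeLat((int) (packed >> 32)), decodeLon((int) packed) };
  }

  public static void main(String[] args) {
    double[] round = unpack(pack(48.8566, 2.3522));
    // Quantization error is bounded by one step of the 32-bit grid (~4e-8 degrees).
    System.out.println(Math.abs(round[0] - 48.8566) < 1e-6);
  }
}
```

An N-vertex shape has no such fixed-width encoding, which is the motivation for serializing the tessellation tree instead.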



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10646) Add some comment on LevenshteinAutomata

2022-08-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576392#comment-17576392
 ] 

ASF subversion and git services commented on LUCENE-10646:
--

Commit 68c4cd8f68a574d1f89ed3e25bdbfe330a9508d6 in lucene's branch 
refs/heads/branch_9x from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=68c4cd8f68a ]

LUCENE-10646: Add some comment on LevenshteinAutomata (#1016)

* add Comment on Lev & pretty the toDot

* use auto generate scripts to add comment

* update checksum

* update checksum

* restore toDot

* add removeDeadStates in levAutomata

Co-authored-by: tangdonghai 

> Add some comment on LevenshteinAutomata
> ---
>
> Key: LUCENE-10646
> URL: https://issues.apache.org/jira/browse/LUCENE-10646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 9.2
>Reporter: tang donghai
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> After having a hard time reading the code, I think I have understood the relevant 
> code of LevenshteinAutomata, except for the minErrors part.
> I find this part of the code too difficult to understand and full of magic 
> numbers. I will sort it out and then raise a PR to add the necessary 
> comments to this part of the code, so that others can better understand it.
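For context on what the automaton guarantees: a Levenshtein automaton built for a term with maximum edit distance n accepts exactly the strings within distance n of that term. The distance itself is the classic Wagner-Fischer dynamic program; the sketch below is a reference check of that property, not Lucene's automaton construction:

```java
// Classic Wagner-Fischer edit distance with two rolling rows. A Levenshtein
// automaton for "term" with max distance n accepts exactly the strings s
// such that distance(term, s) <= n.
public class EditDistance {
  public static int distance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from empty prefix
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
      }
      int[] tmp = prev; prev = curr; curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    System.out.println(distance("kitten", "sitting")); // prints 3
  }
}
```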






[jira] [Commented] (LUCENE-10646) Add some comment on LevenshteinAutomata

2022-08-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576391#comment-17576391
 ] 

ASF subversion and git services commented on LUCENE-10646:
--

Commit b08e34722df87e86611ba1afe42cbe7dc052f6e4 in lucene's branch 
refs/heads/main from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b08e34722df ]

LUCENE-10646: Add some comment on LevenshteinAutomata (#1016)

* add Comment on Lev & pretty the toDot

* use auto generate scripts to add comment

* update checksum

* update checksum

* restore toDot

* add removeDeadStates in levAutomata

Co-authored-by: tangdonghai 

> Add some comment on LevenshteinAutomata
> ---
>
> Key: LUCENE-10646
> URL: https://issues.apache.org/jira/browse/LUCENE-10646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 9.2
>Reporter: tang donghai
>Priority: Minor
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> After having a hard time reading the code, I think I have understood the relevant 
> code of LevenshteinAutomata, except for the minErrors part.
> I find this part of the code too difficult to understand and full of magic 
> numbers. I will sort it out and then raise a PR to add the necessary 
> comments to this part of the code, so that others can better understand it.






[jira] [Commented] (LUCENE-10673) Spatial3d fails constructing a legal bounding box

2022-08-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575016#comment-17575016
 ] 

ASF subversion and git services commented on LUCENE-10673:
--

Commit 17e8a42e1aa5c5c8328b2e675605a731adeea201 in lucene's branch 
refs/heads/branch_9x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=17e8a42e1aa ]

LUCENE-10673: Improve check of equality for latitudes for spatial3d 
GeoBoundingBox (#1056)



> Spatial3d fails constructing a legal bounding box
> -
>
> Key: LUCENE-10673
> URL: https://issues.apache.org/jira/browse/LUCENE-10673
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial3d
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The issue can be reproduced with the following test:
> {code}
>   @Test
>   public void testBBoxLatDegenerate() {
>   double minX = Geo3DUtil.fromDegrees(-180.0);
>   double maxX = Geo3DUtil.fromDegrees(-174.3758381903);
>   double minY = Geo3DUtil.fromDegrees(89.9765306711);
>   double maxY = Geo3DUtil.fromDegrees(89.9794643372);
>   assertNotNull(GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, maxY, 
> minY, minX, maxX));
>   }
> {code}
> This currently fails with the following error:
> {code}
> Cannot determine sidedness because check point is on plane.
> java.lang.IllegalArgumentException: Cannot determine sidedness because check 
> point is on plane.
>   at 
> __randomizedtesting.SeedInfo.seed([F8095E23398C1BA6:396A213B6362092D]:0)
>   at 
> org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:137)
>   at 
> org.apache.lucene.spatial3d.geom.GeoRectangle.<init>(GeoRectangle.java:149)
>   at 
> org.apache.lucene.spatial3d.geom.GeoBBoxFactory.makeGeoBBox(GeoBBoxFactory.java:134)
>   at 
> org.apache.lucene.spatial3d.geom.GeoAreaFactory.makeGeoArea(GeoAreaFactory.java:43)
>   at 
> org.apache.lucene.spatial3d.geom.TestGeoBBox.testBBoxLonDegenerate(TestGeoBBox.java:538)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> {code}
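The committed fix improves how latitude equality is checked when building the bounding box. As a hedged illustration only (the epsilon value and names below are hypothetical, not spatial3d's actual constants), comparing latitudes with a tolerance rather than exact `==` lets nearly coincident bounding planes be routed to a degenerate-box code path instead of constructing an ill-conditioned SidedPlane:

```java
// Hypothetical sketch: treat two latitudes as equal when within an epsilon,
// so a box whose top and bottom planes nearly coincide can take a degenerate
// ("line" box) path rather than failing a sidedness check on a plane the
// check point effectively lies on.
public class LatitudeCheck {
  // Assumed tolerance for illustration; not spatial3d's actual constant.
  static final double EPSILON = 1e-12;

  public static boolean nearlyEqual(double latA, double latB) {
    return Math.abs(latA - latB) < EPSILON;
  }

  public static void main(String[] args) {
    double minLat = Math.toRadians(89.9765306711);
    double maxLat = Math.toRadians(89.9794643372);
    // These two are genuinely distinct, so the non-degenerate path applies.
    System.out.println(nearlyEqual(minLat, maxLat)); // prints false
  }
}
```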






[jira] [Commented] (LUCENE-10673) Spatial3d fails constructing a legal bounding box

2022-08-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575015#comment-17575015
 ] 

ASF subversion and git services commented on LUCENE-10673:
--

Commit bd0718f0716ad3f30e0d79b352cd678df249f550 in lucene's branch 
refs/heads/main from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=bd0718f0716 ]

LUCENE-10673: Improve check of equality for latitudes for spatial3d 
GeoBoundingBox (#1056)



> Spatial3d fails constructing a legal bounding box
> -
>
> Key: LUCENE-10673
> URL: https://issues.apache.org/jira/browse/LUCENE-10673
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/spatial3d
>Reporter: Ignacio Vera
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The issue can be reproduced with the following test:
> {code}
>   @Test
>   public void testBBoxLatDegenerate() {
>   double minX = Geo3DUtil.fromDegrees(-180.0);
>   double maxX = Geo3DUtil.fromDegrees(-174.3758381903);
>   double minY = Geo3DUtil.fromDegrees(89.9765306711);
>   double maxY = Geo3DUtil.fromDegrees(89.9794643372);
>   assertNotNull(GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, maxY, 
> minY, minX, maxX));
>   }
> {code}
> This currently fails with the following error:
> {code}
> Cannot determine sidedness because check point is on plane.
> java.lang.IllegalArgumentException: Cannot determine sidedness because check 
> point is on plane.
>   at 
> __randomizedtesting.SeedInfo.seed([F8095E23398C1BA6:396A213B6362092D]:0)
>   at 
> org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:137)
>   at 
> org.apache.lucene.spatial3d.geom.GeoRectangle.<init>(GeoRectangle.java:149)
>   at 
> org.apache.lucene.spatial3d.geom.GeoBBoxFactory.makeGeoBBox(GeoBBoxFactory.java:134)
>   at 
> org.apache.lucene.spatial3d.geom.GeoAreaFactory.makeGeoArea(GeoAreaFactory.java:43)
>   at 
> org.apache.lucene.spatial3d.geom.TestGeoBBox.testBBoxLonDegenerate(TestGeoBBox.java:538)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> {code}






[jira] [Commented] (LUCENE-10627) Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573859#comment-17573859
 ] 

ASF subversion and git services commented on LUCENE-10627:
--

Commit 2b75fe6d2005785e5214364a0563fdcba5d66c50 in lucene's branch 
refs/heads/branch_9x from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2b75fe6d200 ]

LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data 
(#987)


> Using ByteBuffersDataInput reduce memory copy on compressing data
> -
>
> Key: LUCENE-10627
> URL: https://issues.apache.org/jira/browse/LUCENE-10627
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/store
>Reporter: LuYunCheng
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Code: [https://github.com/apache/lucene/pull/987]
> I see that when Lucene flushes and merges stored fields, it needs many memory copies:
> {code:java}
> Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms 
> elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable  
> [0x7f17718db000]
>    java.lang.Thread.State: RUNNABLE
>     at 
> org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
>     at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
>     at 
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
>     at 
> org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
>  {code}
> When Lucene's *CompressingStoredFieldsWriter* flushes documents, it needs many 
> memory copies:
> With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies blocks into one contiguous buffer for chunk 
> compression
>  # the compressor copies the dict and data into one block buffer
>  # do the compression
>  # copy the compressed data out
> With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies blocks into one contiguous buffer for chunk 
> compression
>  # do the compression
>  # copy the compressed data out
>  
> I think we can use -CompositeByteBuf- to reduce temp memory copies:
>  # we do not have to *bufferedDocs.toArrayCopy* when we just need contiguous 
> content for chunk compression
>  
> I wrote a simple mini benchmark in test code ([link 
> |https://github.com/apache/lucene/blob/5a406a5c483c7fadaf0e8a5f06732c79ad174d11/lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestCompressingStoredFieldsFormat.java#L353]):
> *LZ4WithPresetDict run* Capacity: 41943040 (bytes), 10 iterations: Origin 
> elapse: 5391ms, New elapse: 5297ms
> *DeflateWithPresetDict run* Capacity: 41943040 (bytes), 10 iterations: Origin 
> elapse: {*}115ms{*}, New elapse: {*}12ms{*}
>  
> And I ran runStoredFieldsBenchmark with doc_limit=-1, which 
> shows:
> ||Msec to index||BEST_SPEED ||BEST_COMPRESSION||
> |Baseline|318877.00|606288.00|
> |Candidate|314442.00|604719.00|
>  
> --- UPDATE ---
>  
>  I tried to *reuse ByteBuffersDataInput* to reduce memory copies, because it can 
> be obtained from ByteBuffersDataOutput.toDataInput, and it could reduce this 
> complexity ([PR|https://github.com/apache/lucene/pull/987]).
> But I am not sure whether we can change the Compressor interface's compress input 
> param from byte[] to ByteBuffersDataInput. If we change this interface 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/Compressor.java#L35],
>  it increases the backport code 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L274];
>  however, if we change the interface to ByteBuffersDataInput, we can 
> optimize memory copies in the different compression algorithm implementations.
> Also, I found we can do 
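The copy-elimination idea above can be sketched independently of Lucene's classes: read a sequence of buffers as one logical input instead of flattening them into a single array first. All names below are illustrative and do not reflect the actual ByteBuffersDataInput API:

```java
import java.nio.ByteBuffer;
import java.util.List;

// Illustrative sketch: a reader that walks a list of ByteBuffers in order, so
// a consumer (e.g. a compressor) can stream the bytes without an up-front
// toArrayCopy-style flattening into one contiguous array.
public class ChainedBuffersInput {
  private final List<ByteBuffer> buffers;
  private int index;

  public ChainedBuffersInput(List<ByteBuffer> buffers) {
    this.buffers = buffers;
  }

  /** Returns the next unsigned byte, or -1 when all buffers are exhausted. */
  public int read() {
    while (index < buffers.size()) {
      ByteBuffer current = buffers.get(index);
      if (current.hasRemaining()) {
        return current.get() & 0xFF;
      }
      index++; // advance to the next buffer without copying the previous one
    }
    return -1;
  }

  public static void main(String[] args) {
    ChainedBuffersInput in = new ChainedBuffersInput(List.of(
        ByteBuffer.wrap(new byte[] {1, 2}), ByteBuffer.wrap(new byte[] {3})));
    int b;
    StringBuilder sb = new StringBuilder();
    while ((b = in.read()) != -1) sb.append(b).append(' ');
    System.out.println(sb.toString().trim()); // prints 1 2 3
  }
}
```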

[jira] [Commented] (LUCENE-10627) Using ByteBuffersDataInput reduce memory copy on compressing data

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573851#comment-17573851
 ] 

ASF subversion and git services commented on LUCENE-10627:
--

Commit 34154736c6ed241d7d9d0c6f4a0e6419936490b7 in lucene's branch 
refs/heads/main from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=34154736c6e ]

LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data 
(#987)



> Using ByteBuffersDataInput reduce memory copy on compressing data
> -
>
> Key: LUCENE-10627
> URL: https://issues.apache.org/jira/browse/LUCENE-10627
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/store
>Reporter: LuYunCheng
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Code: [https://github.com/apache/lucene/pull/987]
> I see that when Lucene flushes and merges stored fields, it needs many memory copies:
> {code:java}
> Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms 
> elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable  
> [0x7f17718db000]
>    java.lang.Thread.State: RUNNABLE
>     at 
> org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
>     at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
>     at 
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
>     at 
> org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
>  {code}
> When Lucene's *CompressingStoredFieldsWriter* flushes documents, it needs many 
> memory copies:
> With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies blocks into one contiguous buffer for chunk 
> compression
>  # the compressor copies the dict and data into one block buffer
>  # do the compression
>  # copy the compressed data out
> With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
>  # bufferedDocs.toArrayCopy copies blocks into one contiguous buffer for chunk 
> compression
>  # do the compression
>  # copy the compressed data out
>  
> I think we can use -CompositeByteBuf- to reduce temp memory copies:
>  # we do not have to *bufferedDocs.toArrayCopy* when we just need contiguous 
> content for chunk compression
>  
> I wrote a simple mini benchmark in test code ([link 
> |https://github.com/apache/lucene/blob/5a406a5c483c7fadaf0e8a5f06732c79ad174d11/lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestCompressingStoredFieldsFormat.java#L353]):
> *LZ4WithPresetDict run* Capacity: 41943040 (bytes), 10 iterations: Origin 
> elapse: 5391ms, New elapse: 5297ms
> *DeflateWithPresetDict run* Capacity: 41943040 (bytes), 10 iterations: Origin 
> elapse: {*}115ms{*}, New elapse: {*}12ms{*}
>  
> And I ran runStoredFieldsBenchmark with doc_limit=-1, which 
> shows:
> ||Msec to index||BEST_SPEED ||BEST_COMPRESSION||
> |Baseline|318877.00|606288.00|
> |Candidate|314442.00|604719.00|
>  
> --- UPDATE ---
>  
>  I tried to *reuse ByteBuffersDataInput* to reduce memory copies, because it can 
> be obtained from ByteBuffersDataOutput.toDataInput, and it could reduce this 
> complexity ([PR|https://github.com/apache/lucene/pull/987]).
> But I am not sure whether we can change the Compressor interface's compress input 
> param from byte[] to ByteBuffersDataInput. If we change this interface 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/Compressor.java#L35],
>  it increases the backport code 
> [like|https://github.com/apache/lucene/blob/382962f22df3ee3af3fb538b877c98d61a622ddb/lucene/core/src/java/org/apache/lucene/codecs/compressing/CompressionMode.java#L274];
>  however, if we change the interface to ByteBuffersDataInput, we can 
> optimize memory copies in the different compression algorithm implementations.
> Also, I found we can do more 

[jira] [Commented] (LUCENE-10648) Fix TestAssertingPointsFormat.testWithExceptions failure

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573741#comment-17573741
 ] 

ASF subversion and git services commented on LUCENE-10648:
--

Commit 5dd8e9bdc5ae72fc726a98a64bfce5119c77b558 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5dd8e9bdc5a ]

LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#1051)

Use merge policy and merge scheduler to run addIndexes merges.

This is a back port of the following commits from main:
 * LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#633)
 * LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions 
(#1012)


> Fix TestAssertingPointsFormat.testWithExceptions failure
> 
>
> Key: LUCENE-10648
> URL: https://issues.apache.org/jira/browse/LUCENE-10648
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: 10.0 (main)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are seeing build failures due to 
> TestAssertingPointsFormat.testWithExceptions. I am able to repro this on my 
> box with the random seed. Tracking the issue here.
> Sample Failing Build: 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/6057/






[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573740#comment-17573740
 ] 

ASF subversion and git services commented on LUCENE-10216:
--

Commit 5dd8e9bdc5ae72fc726a98a64bfce5119c77b558 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5dd8e9bdc5a ]

LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#1051)

Use merge policy and merge scheduler to run addIndexes merges.

This is a back port of the following commits from main:
 * LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#633)
 * LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions 
(#1012)


> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
> Fix For: main
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x improvement in addIndexes() time. 
> Each call was given about 5 readers to add on average.
> This might be a useful addition to Lucene. We could create another {{addIndexes()}} 
> API with a {{boolean}} flag for concurrency, that internally submits multiple 
> merge jobs (each with a single reader) to the {{ConcurrentMergeScheduler}}, 
> and waits for them to complete before returning.
> While this is doable from outside Lucene by using your own thread pool, starting 
> multiple addIndexes() calls and waiting for them to complete, I felt it requires 
> some understanding of what addIndexes does, why you need to wait on the merge, 
> and why it makes sense to pass a single reader in the addIndexes API.
> Out-of-the-box support in Lucene could simplify this for folks with a similar use case.
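The submit-one-job-per-reader-and-wait pattern described above can be sketched without Lucene's classes (the "readers" are modeled as plain strings here, and all names are illustrative, not the proposed API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch of the proposal: submit one independent job per input "reader" to a
// pool, then block until every job completes before returning -- analogous to
// addIndexes(CodecReader...) running one single-reader merge per incoming index.
public class ConcurrentAdd {
  public static <R, S> List<S> runAll(List<R> readers, Function<R, S> job) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, readers.size()));
    try {
      List<Future<S>> futures = new ArrayList<>();
      for (R reader : readers) {
        futures.add(pool.submit(() -> job.apply(reader)));
      }
      List<S> results = new ArrayList<>();
      for (Future<S> f : futures) {
        try {
          results.add(f.get()); // wait for each per-reader job, preserving order
        } catch (InterruptedException | ExecutionException e) {
          throw new RuntimeException(e);
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> merged = runAll(List.of("seg1", "seg2", "seg3"), r -> r + "-merged");
    System.out.println(merged); // prints [seg1-merged, seg2-merged, seg3-merged]
  }
}
```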






[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573728#comment-17573728
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit 18f839bbf408abe8816e0647a06a062f9086fdce in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=18f839bbf40 ]

LUCENE-10629: Fix NullPointerException.

I hit an NPE while running tests. `Weight#scorer` may return `null`, but not
`Scorer#iterator`.


> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed-in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.
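Conceptually, a {{fastMatchQuery}} prunes the match set by intersecting it with a cheap pre-filter before any per-document facet work happens. A minimal sketch of that intersection over sorted doc-id arrays (illustrative code, not Lucene's actual DocIdSetIterator conjunction):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what a fastMatchQuery buys: before doing per-document facet work,
// intersect the candidate docs with a cheap pre-filter. Both inputs are
// sorted doc-id arrays; a classic two-pointer merge finds the survivors.
public class FastMatchIntersect {
  public static List<Integer> intersect(int[] matches, int[] fastMatch) {
    List<Integer> out = new ArrayList<>();
    int i = 0, j = 0;
    while (i < matches.length && j < fastMatch.length) {
      if (matches[i] == fastMatch[j]) {
        out.add(matches[i]);
        i++;
        j++;
      } else if (matches[i] < fastMatch[j]) {
        i++;
      } else {
        j++;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(intersect(new int[] {1, 4, 7, 9}, new int[] {4, 8, 9}));
    // prints [4, 9]
  }
}
```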






[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-08-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573729#comment-17573729
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit 04e4f317cb210158dd10c68ac2b970a688c9ae2c in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=04e4f317cb2 ]

LUCENE-10629: Fix NullPointerException.

I hit an NPE while running tests. `Weight#scorer` may return `null`, but not
`Scorer#iterator`.


> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed-in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-07-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573341#comment-17573341
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit d192cc0de245cff1cdc0e9f9b52c146da7815241 in lucene's branch 
refs/heads/branch_9x from Shai Erera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d192cc0de24 ]

[LUCENE-10629]: Add fast match query support to FacetSets (#1015) (#1055)



> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed-in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[jira] [Commented] (LUCENE-10629) Add fastMatchQuery param to MatchingFacetSetCounts

2022-07-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573338#comment-17573338
 ] 

ASF subversion and git services commented on LUCENE-10629:
--

Commit 7ac75135b9f9d17a8d68af9d7c05544e766e7cf7 in lucene's branch 
refs/heads/main from Shai Erera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7ac75135b9f ]

[LUCENE-10629]: Add fast match query support to FacetSets (#1015)



> Add fastMatchQuery param to MatchingFacetSetCounts
> --
>
> Key: LUCENE-10629
> URL: https://issues.apache.org/jira/browse/LUCENE-10629
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Some facet counters, like {{RangeFacetCounts}}, allow the user to pass in a 
> {{fastMatchQuery}} parameter in order to quickly and efficiently filter out 
> documents in the passed-in match set. We should create this same parameter in 
> {{MatchingFacetSetCounts}} as well.






[jira] [Commented] (LUCENE-10669) The build should be more helpful when generated resources are touched

2022-07-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573290#comment-17573290
 ] 

ASF subversion and git services commented on LUCENE-10669:
--

Commit 3e207d362296a170fedebc88a061a4665d3e9b92 in lucene's branch 
refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3e207d36229 ]

LUCENE-10669: The build should be more helpful when generated resources are 
touched (#1053)



> The build should be more helpful when generated resources are touched
> -
>
> Key: LUCENE-10669
> URL: https://issues.apache.org/jira/browse/LUCENE-10669
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 10.0 (main)
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per discussion at [https://github.com/apache/lucene/pull/1016,] it'd be 
> good if a build failure could point at the sources and generated files of the 
> task for which checksums are mismatched (signaling either modified templates 
> or accidentally modified generated files).






[jira] [Commented] (LUCENE-10669) The build should be more helpful when generated resources are touched

2022-07-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573289#comment-17573289
 ] 

ASF subversion and git services commented on LUCENE-10669:
--

Commit f93e52e5bb1503bc7ca175e157f0f6c96dafd383 in lucene's branch 
refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f93e52e5bb1 ]

LUCENE-10669: The build should be more helpful when generated resources are 
touched (#1053)



> The build should be more helpful when generated resources are touched
> -
>
> Key: LUCENE-10669
> URL: https://issues.apache.org/jira/browse/LUCENE-10669
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 10.0 (main)
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per discussion at [https://github.com/apache/lucene/pull/1016], it'd be 
> good if a build failure could point at the sources and generated files of the 
> task for which checksums are mismatched (signaling either modified templates 
> or accidentally modified generated files).






[jira] [Commented] (LUCENE-10504) KnnGraphTester should use KnnVectorQuery

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573155#comment-17573155
 ] 

ASF subversion and git services commented on LUCENE-10504:
--

Commit 2cb0e26075559e4ce38d2fa9765bcccaa187ce0d in lucene's branch 
refs/heads/branch_9x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2cb0e260755 ]

LUCENE-10504: KnnGraphTester to use KnnVectorQuery (#796)

* LUCENE-10504: KnnGraphTester to use KnnVectorQuery

> KnnGraphTester should use KnnVectorQuery
> ----------------------------------------
>
> Key: LUCENE-10504
> URL: https://issues.apache.org/jira/browse/LUCENE-10504
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Assignee: Michael Sokolov
>Priority: Major
> Fix For: 10.0 (main)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> To get a more realistic picture, and to track developments in the query 
> implementation, the tester should use that rather than implementing its own 
> per-segment search and merging logic.






[jira] [Commented] (LUCENE-10663) KnnVectorQuery explain incorrect when multiple segments

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573154#comment-17573154
 ] 

ASF subversion and git services commented on LUCENE-10663:
--

Commit 1559de836ce347a8fd8e5ffbbb51fda14f8c16cf in lucene's branch 
refs/heads/branch_9x from Shiming Li
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1559de836ce ]

LUCENE-10663: Fix KnnVectorQuery explain with multiple segments (#1050)

If the index has multiple segments, KnnVectorQuery explain had a bug in
locating the doc ID: the doc ID passed to explain is segment-relative and
does not include the segment's docBase, while the doc IDs stored in
KnnVectorQuery.DocAndScoreQuery are global, offset by each segment's docBase.
'DocAndScoreQuery.explain' therefore needs to add the segment's docBase.

Co-authored-by: Julie Tibshirani 

> KnnVectorQuery explain incorrect when multiple segments
> ---
>
> Key: LUCENE-10663
> URL: https://issues.apache.org/jira/browse/LUCENE-10663
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 9.0, 9.1, 9.2
>Reporter: Shiming Li
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If the index has multiple segments, KnnVectorQuery explain has a bug in 
> locating the doc ID: the doc ID passed to explain is segment-relative and 
> does not include the segment's docBase, while the doc IDs stored in 
> KnnVectorQuery.DocAndScoreQuery are global, offset by each segment's 
> docBase. The two doc IDs are not in the same coordinate space, so 
> 'DocAndScoreQuery.explain' needs to add the segment's docBase before looking 
> up the document.
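The doc-ID arithmetic above can be sketched as follows (a minimal illustration with hypothetical helper names, not Lucene's actual code):

```java
// Minimal sketch of the two doc-ID spaces described above (hypothetical
// helpers, not Lucene's code): per-segment doc IDs start at 0 in every
// segment, and the segment's docBase offsets them into the index-wide
// (global) doc-ID space.
class DocIdSpaces {
    // Global doc ID = segment docBase + segment-relative doc ID.
    static int toGlobal(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    // Segment-relative doc ID = global doc ID - segment docBase.
    static int toSegmentRelative(int docBase, int globalDocId) {
        return globalDocId - docBase;
    }
}
```

The bug amounted to comparing a segment-relative doc ID against stored global doc IDs without first applying the docBase offset.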






[jira] [Commented] (LUCENE-10559) Add preFilter/postFilter options to KnnGraphTester

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573156#comment-17573156
 ] 

ASF subversion and git services commented on LUCENE-10559:
--

Commit 33d5ab96f266447b228833baee3489b12bdc3a68 in lucene's branch 
refs/heads/branch_9x from Kaival Parikh
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=33d5ab96f26 ]

LUCENE-10559: Add Prefilter Option to KnnGraphTester (#932)

Added a `prefilter` and `filterSelectivity` argument to KnnGraphTester to be
able to compare pre and post-filtering benchmarks.

`filterSelectivity` expresses the selectivity of a filter as the proportion of
docs that pass; the passing docs are randomly selected. We store them in a
FixedBitSet and use it to calculate true KNN as well as in HNSW search.

In the post-filtering case, we over-select results as `topK / filterSelectivity`
to get final hits close to the actually requested `topK`. In the pre-filtering
case, we wrap the FixedBitSet in a query and pass it as the prefilter argument
to KnnVectorQuery.
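The over-selection arithmetic can be sketched like this (an illustrative helper, not KnnGraphTester's actual code): if a filter passes a fraction `filterSelectivity` of documents, requesting `topK / filterSelectivity` hits means roughly `topK` of them survive the filter.

```java
// Sketch of post-filtering over-selection: scale the requested hit count up
// by the inverse of the filter's selectivity, rounding up so we never
// under-request.
class OverSelection {
    static int effectiveTopK(int topK, double filterSelectivity) {
        return (int) Math.ceil(topK / filterSelectivity);
    }
}
```

For example, with `topK = 100` and a filter that passes 25% of documents, the tester would ask for 400 hits.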

> Add preFilter/postFilter options to KnnGraphTester
> --
>
> Key: LUCENE-10559
> URL: https://issues.apache.org/jira/browse/LUCENE-10559
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery: 
> if you (say) want the top K nearest neighbors subject to a constraint Q, are 
> you better off over-selecting (say 2K) top hits and *then* filtering 
> (post-filtering), or incorporating the filtering into the query 
> (pre-filtering)? How does it depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter 
> with some selectivity (that is consistent and repeatable). Possibly we'd also 
> want to try filters that are correlated with index order, but it seems they'd 
> be unlikely to be correlated with vector values in a way that the graph 
> structure would notice, so random is a pretty good starting point for this.






[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573147#comment-17573147
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 7c9d3cd6ff6c6af153ee756a983dc323133f33c7 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7c9d3cd6ff6 ]

LUCENE-10633: Fix handling of missing values in reverse sorts.


> Dynamic pruning for queries sorted by SORTED(_SET) field
> ----------------------------------------
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?
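The pruning idea can be illustrated with a plain top-K priority queue (a conceptual sketch, not Lucene's comparator machinery): once the queue is full, any candidate that cannot beat its weakest entry is non-competitive, and a per-field index (points or terms) lets the collector skip such documents wholesale.

```java
import java.util.PriorityQueue;

// Conceptual sketch of dynamic pruning: keep the current top-K in a min-heap
// and, once it is full, skip every candidate that cannot beat the weakest
// entry (the "non-competitive" hits).
class TopKPruning {
    static int[] topK(int[] values, int k) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            if (queue.size() < k) {
                queue.add(v);
            } else if (v > queue.peek()) { // competitive: beats current worst
                queue.poll();
                queue.add(v);
            } // else: non-competitive, skip (this is the pruning)
        }
        int[] result = new int[queue.size()];
        // Min-heap polls ascending; fill from the back for descending order.
        for (int i = result.length - 1; i >= 0; i--) result[i] = queue.poll();
        return result;
    }
}
```

The point of the issue is that with a points or terms index the collector can skip non-competitive documents without even scoring them, rather than filtering them one by one as this sketch does.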






[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573148#comment-17573148
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 6366cf2e7ad37dd4f14bb5b7facd3477124073fc in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6366cf2e7ad ]

LUCENE-10633: Fix handling of missing values in reverse sorts.


> Dynamic pruning for queries sorted by SORTED(_SET) field
> ----------------------------------------
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?






[jira] [Commented] (LUCENE-10559) Add preFilter/postFilter options to KnnGraphTester

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573103#comment-17573103
 ] 

ASF subversion and git services commented on LUCENE-10559:
--

Commit 1ad28a3136fceb248ef55f2a09e77e7797bef51e in lucene's branch 
refs/heads/main from Kaival Parikh
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1ad28a3136f ]

LUCENE-10559: Add Prefilter Option to KnnGraphTester (#932)

Added a `prefilter` and `filterSelectivity` argument to KnnGraphTester to be
able to compare pre and post-filtering benchmarks.

`filterSelectivity` expresses the selectivity of a filter as the proportion of
docs that pass; the passing docs are randomly selected. We store them in a
FixedBitSet and use it to calculate true KNN as well as in HNSW search.

In the post-filtering case, we over-select results as `topK / filterSelectivity`
to get final hits close to the actually requested `topK`. In the pre-filtering
case, we wrap the FixedBitSet in a query and pass it as the prefilter argument
to KnnVectorQuery.

> Add preFilter/postFilter options to KnnGraphTester
> --
>
> Key: LUCENE-10559
> URL: https://issues.apache.org/jira/browse/LUCENE-10559
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery: 
> if you (say) want the top K nearest neighbors subject to a constraint Q, are 
> you better off over-selecting (say 2K) top hits and *then* filtering 
> (post-filtering), or incorporating the filtering into the query 
> (pre-filtering)? How does it depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter 
> with some selectivity (that is consistent and repeatable). Possibly we'd also 
> want to try filters that are correlated with index order, but it seems they'd 
> be unlikely to be correlated with vector values in a way that the graph 
> structure would notice, so random is a pretty good starting point for this.






[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572880#comment-17572880
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit 261db55806cd352520e406d5e5a684ce45afa9f4 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=261db55806c ]

LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023)

This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields
by using postings to filter competitive documents.

> Dynamic pruning for queries sorted by SORTED(_SET) field
> ----------------------------------------
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?






[jira] [Commented] (LUCENE-10633) Dynamic pruning for queries sorted by SORTED(_SET) field

2022-07-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572873#comment-17572873
 ] 

ASF subversion and git services commented on LUCENE-10633:
--

Commit eb7b7791ba615dfb52d25fb7e542aab539be293e in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eb7b7791ba6 ]

LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023)

This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields
by using postings to filter competitive documents.

> Dynamic pruning for queries sorted by SORTED(_SET) field
> ----------------------------------------
>
> Key: LUCENE-10633
> URL: https://issues.apache.org/jira/browse/LUCENE-10633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> LUCENE-9280 introduced the ability to dynamically prune non-competitive hits 
> when sorting by a numeric field, by leveraging the points index to skip 
> documents that do not compare better than the top of the priority queue 
> maintained by the field comparator.
> However queries sorted by a SORTED(_SET) field still look at all hits, which 
> is disappointing. Could we leverage the terms index to skip hits?






[jira] [Commented] (LUCENE-10663) KnnVectorQuery explain incorrect when multiple segments

2022-07-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572575#comment-17572575
 ] 

ASF subversion and git services commented on LUCENE-10663:
--

Commit bb752c774ca0264a02a60e9b8568addb7b6722d3 in lucene's branch 
refs/heads/main from Shiming Li
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=bb752c774ca ]

LUCENE-10663: Fix KnnVectorQuery explain with multiple segments (#1050)

If the index has multiple segments, KnnVectorQuery explain had a bug in
locating the doc ID: the doc ID passed to explain is segment-relative and
does not include the segment's docBase, while the doc IDs stored in
KnnVectorQuery.DocAndScoreQuery are global, offset by each segment's docBase.
'DocAndScoreQuery.explain' therefore needs to add the segment's docBase.

Co-authored-by: Julie Tibshirani 

> KnnVectorQuery explain incorrect when multiple segments
> ---
>
> Key: LUCENE-10663
> URL: https://issues.apache.org/jira/browse/LUCENE-10663
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 9.0, 9.1, 9.2
>Reporter: Shiming Li
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If the index has multiple segments, KnnVectorQuery explain has a bug in 
> locating the doc ID: the doc ID passed to explain is segment-relative and 
> does not include the segment's docBase, while the doc IDs stored in 
> KnnVectorQuery.DocAndScoreQuery are global, offset by each segment's 
> docBase. The two doc IDs are not in the same coordinate space, so 
> 'DocAndScoreQuery.explain' needs to add the segment's docBase before looking 
> up the document.






[jira] [Commented] (LUCENE-10661) Reduce memory copy in BytesStore

2022-07-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571961#comment-17571961
 ] 

ASF subversion and git services commented on LUCENE-10661:
--

Commit 0ff987562aefa1dfc5d86e4e5908f6121aa50956 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=0ff987562ae ]

LUCENE-10661: Move CHANGES entry to 9.4.


> Reduce memory copy in BytesStore
> ----------------------------------------
>
> Key: LUCENE-10661
> URL: https://issues.apache.org/jira/browse/LUCENE-10661
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: LuYunCheng
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is derived from 
> [LUCENE-10627](https://github.com/apache/lucene/pull/987) AND 
> [LUCENE-10657](https://github.com/apache/lucene/pull/1034)
> The abstract method copyBytes in DataOutput has to copy from the input into 
> a copyBuffer and then write into BytesStore.blocks; it is called during FST 
> initialization when reading from metaIn. 
> Although only a few bytes are copied (3-10 bytes in the test case), I think 
> we can avoid this memory copy, and also avoid having DataOutput.copyBytes 
> allocate a new 16384-byte copyBuffer.
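The two copy paths can be sketched as follows (hypothetical types, not Lucene's BytesStore): the generic path stages bytes in an intermediate copyBuffer before writing them into the destination block, while a block-aware path reads straight into the destination and skips both the extra copy and the buffer allocation.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative comparison of the two copy strategies discussed above.
class CopyPaths {
    // Generic path: input -> copyBuffer -> block (extra allocation + copy).
    static void copyViaBuffer(InputStream in, byte[] block, int len) throws IOException {
        byte[] copyBuffer = new byte[16384];
        int read = in.read(copyBuffer, 0, len);
        System.arraycopy(copyBuffer, 0, block, 0, read);
    }

    // Direct path: read straight into the destination block, no staging.
    static void copyDirect(InputStream in, byte[] block, int len) throws IOException {
        in.read(block, 0, len);
    }
}
```

Both paths produce the same bytes in the destination; the direct path simply does one read instead of a read plus an arraycopy through a freshly allocated 16 KB buffer.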






[jira] [Commented] (LUCENE-10661) Reduce memory copy in BytesStore

2022-07-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571960#comment-17571960
 ] 

ASF subversion and git services commented on LUCENE-10661:
--

Commit e1a91aef51ac8fd55c7f4cb26ae471b53d4879da in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e1a91aef51a ]

LUCENE-10661: Move CHANGES entry to 9.4.


> Reduce memory copy in BytesStore
> ----------------------------------------
>
> Key: LUCENE-10661
> URL: https://issues.apache.org/jira/browse/LUCENE-10661
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: LuYunCheng
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is derived from 
> [LUCENE-10627](https://github.com/apache/lucene/pull/987) AND 
> [LUCENE-10657](https://github.com/apache/lucene/pull/1034)
> The abstract method copyBytes in DataOutput has to copy from the input into 
> a copyBuffer and then write into BytesStore.blocks; it is called during FST 
> initialization when reading from metaIn. 
> Although only a few bytes are copied (3-10 bytes in the test case), I think 
> we can avoid this memory copy, and also avoid having DataOutput.copyBytes 
> allocate a new 16384-byte copyBuffer.






[jira] [Commented] (LUCENE-10661) Reduce memory copy in BytesStore

2022-07-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571959#comment-17571959
 ] 

ASF subversion and git services commented on LUCENE-10661:
--

Commit 169af9c6511576486bad249f702ce63b2d1ed3ab in lucene's branch 
refs/heads/branch_9x from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=169af9c6511 ]

LUCENE-10661: Reduce memory copy in BytesStore (#1047)



> Reduce memory copy in BytesStore
> ----------------------------------------
>
> Key: LUCENE-10661
> URL: https://issues.apache.org/jira/browse/LUCENE-10661
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: LuYunCheng
>Priority: Major
> Fix For: 9.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is derived from 
> [LUCENE-10627](https://github.com/apache/lucene/pull/987) AND 
> [LUCENE-10657](https://github.com/apache/lucene/pull/1034)
> The abstract method copyBytes in DataOutput has to copy from the input into 
> a copyBuffer and then write into BytesStore.blocks; it is called during FST 
> initialization when reading from metaIn. 
> Although only a few bytes are copied (3-10 bytes in the test case), I think 
> we can avoid this memory copy, and also avoid having DataOutput.copyBytes 
> allocate a new 16384-byte copyBuffer.






[jira] [Commented] (LUCENE-10661) Reduce memory copy in BytesStore

2022-07-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571958#comment-17571958
 ] 

ASF subversion and git services commented on LUCENE-10661:
--

Commit 107747f359162f12b8f1561b2ea7da071232ab00 in lucene's branch 
refs/heads/main from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=107747f3591 ]

LUCENE-10661: Reduce memory copy in BytesStore (#1047)



> Reduce memory copy in BytesStore
> ----------------------------------------
>
> Key: LUCENE-10661
> URL: https://issues.apache.org/jira/browse/LUCENE-10661
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: LuYunCheng
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is derived from 
> [LUCENE-10627](https://github.com/apache/lucene/pull/987) AND 
> [LUCENE-10657](https://github.com/apache/lucene/pull/1034)
> The abstract method copyBytes in DataOutput has to copy from the input into 
> a copyBuffer and then write into BytesStore.blocks; it is called during FST 
> initialization when reading from metaIn. 
> Although only a few bytes are copied (3-10 bytes in the test case), I think 
> we can avoid this memory copy, and also avoid having DataOutput.copyBytes 
> allocate a new 16384-byte copyBuffer.






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-07-26 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571367#comment-17571367
 ] 

ASF subversion and git services commented on LUCENE-10151:
--

Commit be81cd79346e869da94d9db89e1b863bfaabbd65 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=be81cd79346 ]

LUCENE-10151: Some fixes to query timeouts. (#996)

I noticed some minor bugs in the original PR #927 that this PR should fix:
 - When a timeout is set, we would no longer catch
   `CollectionTerminatedException`.
 - I added randomization to `LuceneTestCase` to randomly set a timeout; it
   would have caught the above bug.
 - Fixed visibility of `TimeLimitingBulkScorer`.
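The timeout idea itself can be sketched with a simple deadline check (a hedged illustration; Lucene's actual logic lives in classes like TimeLimitingBulkScorer): score documents in small chunks and test the deadline between chunks, so a search can stop early and return partial results instead of running to completion.

```java
// Minimal deadline-check sketch for budget-limited search loops.
class Deadline {
    private final long deadlineNanos;

    Deadline(long budgetNanos) {
        this.deadlineNanos = System.nanoTime() + budgetNanos;
    }

    // Subtraction (rather than direct comparison) stays correct even if
    // nanoTime wraps around.
    boolean expired() {
        return System.nanoTime() - deadlineNanos >= 0;
    }
}
```

A collector loop would call `expired()` between batches of documents and, on timeout, stop collecting and mark the hit count as a lower bound.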

> Add timeout support to IndexSearcher
> ----------------------------------------
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.






[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

2022-07-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570929#comment-17570929
 ] 

ASF subversion and git services commented on LUCENE-10592:
--

Commit b15bcd11c333a96c043a3cc1e3498b8b09e7d6a2 in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b15bcd11c33 ]

LUCENE-10592 Strengthen 
TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults

This test occasionally fails if knn search returns only 1 document
in the index, as we have an assertion that returned doc IDs from
sorted and unsorted index must be different.

This patch ensures that we have many documents in the index, so
that knn search always returns enough results.


> Should we build HNSW graph on the fly during indexing
> -
>
> Key: LUCENE-10592
> URL: https://issues.apache.org/jira/browse/LUCENE-10592
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Assignee: Mayya Sharipova
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors 
> in memory, and on flush during segment construction we build an HNSW graph. 
> As building an HNSW graph is very expensive, this makes the flush operation 
> take a lot of time. It also makes overall indexing performance quite 
> unpredictable (since the number of flushes is determined by memory use and 
> the presence of concurrent searches): some indexing operations return almost 
> instantly while others that trigger a flush take a lot of time. 
> Building the HNSW graph on the fly as we index vectors avoids this problem 
> and spreads the load of HNSW graph construction evenly across indexing.
> This will also supersede LUCENE-10194






[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

2022-07-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570912#comment-17570912
 ] 

ASF subversion and git services commented on LUCENE-10592:
--

Commit 2efc204a390044b67bcfb85683d82a9ea2f852a2 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2efc204a390 ]

LUCENE-10592 Strengthen 
TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults

This test occasionally fails if knn search returns only 1 document
in the index, as we have an assertion that returned doc IDs from
sorted and unsorted index must be different.

This patch ensures that we have many documents in the index, so
that knn search always returns enough results.


> Should we build HNSW graph on the fly during indexing
> -
>
> Key: LUCENE-10592
> URL: https://issues.apache.org/jira/browse/LUCENE-10592
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Assignee: Mayya Sharipova
>Priority: Minor
> Fix For: 9.4
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors 
> in memory, and on flush during segment construction we build an HNSW graph. 
> As building an HNSW graph is very expensive, this makes the flush operation 
> take a lot of time. It also makes overall indexing performance quite 
> unpredictable (since the number of flushes is determined by memory use and 
> the presence of concurrent searches): some indexing operations return almost 
> instantly while others that trigger a flush take a lot of time. 
> Building the HNSW graph on the fly as we index vectors avoids this problem 
> and spreads the load of HNSW graph construction evenly across indexing.
> This will also supersede LUCENE-10194






[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

2022-07-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570112#comment-17570112
 ] 

ASF subversion and git services commented on LUCENE-10592:
--

Commit bd06cebfc2815bb508314ed8a4215e9da7f36de6 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=bd06cebfc28 ]

Add change log for LUCENE-10592





[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

2022-07-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570098#comment-17570098
 ] 

ASF subversion and git services commented on LUCENE-10592:
--

Commit a65a41855c7f7e93a5852d8af34d37fa01e0972b in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a65a41855c7 ]

LUCENE-10592 Build HNSW Graph on indexing  (#1043)

Currently, when indexing knn vectors, we buffer them in memory and
on flush during segment construction we build an HNSW graph.
As building an HNSW graph is very expensive, this makes the flush
operation take a lot of time. It also makes overall indexing
performance quite unpredictable: some indexing operations return
almost instantly while others that trigger a flush take a lot of time.
This happens because flushes are unpredictable, triggered by memory
use, the presence of concurrent searches, etc.

Building the HNSW graph as we index documents avoids these problems,
as the load of graph construction is spread evenly across indexing.

Co-authored-by: Adrien Grand 




[jira] [Commented] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

2022-07-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570081#comment-17570081
 ] 

ASF subversion and git services commented on LUCENE-10592:
--

Commit ba4bc0427146669ffd1c41fc0151db33e5a5be33 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ba4bc042714 ]

LUCENE-10592 Build HNSW Graph on indexing (#992)

Currently, when indexing knn vectors, we buffer them in memory and
on flush during segment construction we build an HNSW graph.
As building an HNSW graph is very expensive, this makes the flush
operation take a lot of time. It also makes overall indexing
performance quite unpredictable: some indexing operations return
almost instantly while others that trigger a flush take a lot of time.
This happens because flushes are unpredictable, triggered by memory
use, the presence of concurrent searches, etc.

Building the HNSW graph as we index documents avoids these problems,
as the load of graph construction is spread evenly across indexing.

Co-authored-by: Adrien Grand 




[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges

2022-07-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569345#comment-17569345
 ] 

ASF subversion and git services commented on LUCENE-10583:
--

Commit 1884a8730a315e1e51e6ad0b43774e6714a3b9d1 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1884a8730a3 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring

* git tidy

> Deadlock with MMapDirectory while waitForMerges
> ---
>
> Key: LUCENE-10583
> URL: https://issues.apache.org/jira/browse/LUCENE-10583
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 8.11.1
> Environment: Java 17
> OS: Windows 2016
>Reporter: Thomas Hoffmann
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hello,
> a deadlock occurred in our application. We are using MMapDirectory 
> on Windows 2016 and got the following stack trace:
> {code:java}
> "https-openssl-nio-443-exec-30" #166 daemon prio=5 os_prio=0 cpu=78703.13ms 
> elapsed=81248.18s tid=0x2860af10 nid=0x237c in Object.wait()  
> [0x413fc000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(java.base@17.0.2/Native Method)
>     - waiting on 
>     at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4983)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at 
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2697)
>     - locked <0x0006ef1fc020> (a org.apache.lucene.index.IndexWriter)
>     at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1236)
>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1278)
>     at 
> com.speed4trade.ebs.module.search.SearchService.updateSearchIndex(SearchService.java:1723)
>     - locked <0x0006d5c00208> (a org.apache.lucene.store.MMapDirectory)
>     at 
> com.speed4trade.ebs.module.businessrelations.ticket.TicketChangedListener.postUpdate(TicketChangedListener.java:142)
> ...{code}
> All threads were waiting to lock <0x0006d5c00208>, which was never 
> released.
> A Lucene thread was also blocked; I don't know if this is relevant:
> {code:java}
> "Lucene Merge Thread #0" #18466 daemon prio=5 os_prio=0 cpu=15.63ms 
> elapsed=3499.07s tid=0x459453e0 nid=0x1f8 waiting for monitor entry  
> [0x5da9e000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.lucene.store.FSDirectory.deletePendingFiles(FSDirectory.java:346)
>     - waiting to lock <0x0006d5c00208> (a 
> org.apache.lucene.store.MMapDirectory)
>     at 
> org.apache.lucene.store.FSDirectory.maybeDeletePendingFiles(FSDirectory.java:363)
>     at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:248)
>     at 
> org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:44)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:289)
>     at 
> org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:43)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.(CompressingStoredFieldsWriter.java:121)
>     at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsWriter(CompressingStoredFieldsFormat.java:130)
>     at 
> org.apache.lucene.codecs.lucene87.Lucene87StoredFieldsFormat.fieldsWriter(Lucene87StoredFieldsFormat.java:141)
>     at 
> org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:227)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
>     at 
> org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
>     at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684){code}
> It looks like the merge operation never finished and never released the lock.
> Is there any option to prevent this deadlock, or how can we investigate it further?
> Unfortunately, a load test didn't reproduce this problem.
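The fix for this issue documents a locking rule that the stack traces above violate. As a rough illustration with plain objects (hypothetical names, no real Lucene classes): the application thread held the MMapDirectory monitor while IndexWriter.close() waited for merges, and the merge thread needed that same monitor internally, so neither could proceed. The safe pattern is to guard application state with a private, application-owned lock that the library can never acquire.

```java
// Hedged sketch of the locking advice, with no Lucene dependency.
public class LockOwnershipDemo {
  // Application-owned monitor: nothing inside a library can lock on this.
  private final Object indexUpdateLock = new Object();

  // Anti-pattern (the shape of the reported deadlock), shown as a comment only:
  //   synchronized (mmapDirectory) {   // the library also locks this internally
  //     indexWriter.close();           // waits for merges -> deadlock
  //   }

  // Safe pattern: serialize our own update logic on our own lock, and never
  // hold a library-owned monitor while calling into the library.
  public void updateSearchIndex(Runnable closeWriterAndSwap) {
    synchronized (indexUpdateLock) {
      closeWriterAndSwap.run();
    }
  }
}
```

Here updateSearchIndex and closeWriterAndSwap are stand-ins for the application's own methods; the point is only which object is used as the monitor.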




[jira] [Commented] (LUCENE-10583) Deadlock with MMapDirectory while waitForMerges

2022-07-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569344#comment-17569344
 ] 

ASF subversion and git services commented on LUCENE-10583:
--

Commit 25a842d87198af7b930d890a93b63093d9ca93c3 in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=25a842d8719 ]

LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)

* add locking warning to docstring

* git tidy


[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568798#comment-17568798
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit 8ebb3305648aea8f551c2dd144d5a527b8982638 in lucene's branch 
refs/heads/branch_9x from Zach Chen
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8ebb3305648 ]

LUCENE-10480: (Backporting) Use BulkScorer to limit BMMScorer to only top-level 
disjunctions (#1037)



> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue for 
> scorers that are behind, and another for scorers that are ahead. All of this 
> could be simplified in the two-clause case, which seems worth specializing for, 
> as it is very common for end users to enter queries with only two terms.
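For intuition, a two-clause disjunction needs no heap or linked list at all: the next matching doc is simply the minimum of the two iterators' current docs. This toy iterator over sorted postings arrays (a hypothetical sketch, not Lucene's WANDScorer or DISI classes) shows the simplification the issue is after.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-clause disjunction over two sorted postings lists.
public class TwoClauseDisjunction {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] a, b;
  private int ai = 0, bi = 0;

  public TwoClauseDisjunction(int[] a, int[] b) {
    this.a = a;
    this.b = b;
  }

  private int docA() { return ai < a.length ? a[ai] : NO_MORE_DOCS; }
  private int docB() { return bi < b.length ? b[bi] : NO_MORE_DOCS; }

  // Returns the next doc id present in either list, or NO_MORE_DOCS.
  public int nextDoc() {
    int da = docA(), db = docB();
    int doc = Math.min(da, db);
    if (doc == NO_MORE_DOCS) {
      return doc;
    }
    if (da == doc) ai++; // advance whichever iterator(s) sit on the match
    if (db == doc) bi++;
    return doc;
  }

  // Convenience: collect all matching doc ids.
  public static List<Integer> drain(TwoClauseDisjunction d) {
    List<Integer> out = new ArrayList<>();
    for (int doc = d.nextDoc(); doc != NO_MORE_DOCS; doc = d.nextDoc()) {
      out.add(doc);
    }
    return out;
  }
}
```

A real specialized scorer would additionally combine block-max score bounds from the two clauses, but the iteration itself reduces to this min-of-two step.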






[jira] [Commented] (LUCENE-10656) It is unnecessary that using `limit` to check boundary

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568782#comment-17568782
 ] 

ASF subversion and git services commented on LUCENE-10656:
--

Commit 39e7597f6e83e10136323e5a67cbdf45a13c4f2b in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=39e7597f6e8 ]

LUCENE-10656: It is unnecessary that using `limit` to check boundary (#1027)



> It is unnecessary that using `limit` to check boundary
> --
>
> Key: LUCENE-10656
> URL: https://issues.apache.org/jira/browse/LUCENE-10656
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> follow-up discussion in [https://github.com/apache/lucene/pull/1021]






[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568781#comment-17568781
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit 28ce8abb5105dba5bc08b7f800f86f3741268bc9 in lucene's branch 
refs/heads/main from Zach Chen
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=28ce8abb510 ]

LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions 
(#1018)






[jira] [Commented] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568742#comment-17568742
 ] 

ASF subversion and git services commented on LUCENE-10653:
--

Commit 42729b46c48308405f0575aa41cc655f94b549f0 in lucene's branch 
refs/heads/branch_9x from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=42729b46c48 ]

LUCENE-10653: Heapify in BMMScorer (#1022)


> Should BlockMaxMaxscoreScorer rebuild its heap in bulk?
> ---
>
> Key: LUCENE-10653
> URL: https://issues.apache.org/jira/browse/LUCENE-10653
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> BMMScorer has to frequently rebuild its heap, and does so by clearing it and 
> then iteratively calling {{{}add{}}}. It would be more efficient to heapify 
> in bulk. This is more academic than anything right now, though, since 
> BMMScorer is only used with two-clause disjunctions, so it's a somewhat silly 
> optimization unless it supports a greater number of clauses.
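The bulk-heapify idea can be illustrated with java.util.PriorityQueue: its collection constructor performs a bottom-up heapify in O(n), while clearing and re-adding n elements costs O(n log n) sift-ups. The mapping to BMMScorer's internal heap is only an analogy; this is not the scorer's actual code.

```java
import java.util.List;
import java.util.PriorityQueue;

// Two ways to rebuild a heap from scratch.
public class HeapifyDemo {
  // Iterative rebuild: n individual sift-ups, O(n log n) overall.
  public static PriorityQueue<Integer> rebuildIteratively(List<Integer> items) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int x : items) {
      pq.add(x);
    }
    return pq;
  }

  // Bulk rebuild: the collection constructor heapifies bottom-up in O(n).
  public static PriorityQueue<Integer> rebuildInBulk(List<Integer> items) {
    return new PriorityQueue<>(items);
  }
}
```

Both produce an equivalent heap; the difference only matters when rebuilds are frequent and the heap is large, which is why the issue calls the optimization academic for two clauses.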






[jira] [Commented] (LUCENE-10653) Should BlockMaxMaxscoreScorer rebuild its heap in bulk?

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568729#comment-17568729
 ] 

ASF subversion and git services commented on LUCENE-10653:
--

Commit 3d7d85f245381f84c46c766119695a8645cde2b8 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3d7d85f2453 ]

LUCENE-10653: Heapify in BMMScorer (#1022)






[jira] [Commented] (LUCENE-10657) CopyBytes now saves one memory copy on ByteBuffersDataOutput

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568399#comment-17568399
 ] 

ASF subversion and git services commented on LUCENE-10657:
--

Commit 7328ad2dafc2e78bf2950cb4bdd2c8785f31f7b9 in lucene's branch 
refs/heads/branch_9x from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7328ad2dafc ]

LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput 
(#1034)

The abstract copyBytes method has to copy from the input into a buffer and then 
write into ByteBuffersDataOutput. I think this is unnecessary; we can override 
it to copy directly from input into output.


> CopyBytes now saves one memory copy on ByteBuffersDataOutput
> 
>
> Key: LUCENE-10657
> URL: https://issues.apache.org/jira/browse/LUCENE-10657
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Reporter: LuYunCheng
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is derived from 
> [LUCENE-10627|https://github.com/apache/lucene/pull/987]
> Code: [https://github.com/apache/lucene/pull/1034] 
> The abstract method `copyBytes` in DataOutput has to copy from the input into 
> a copyBuffer and then write into ByteBuffersDataOutput.blocks. I think this is 
> unnecessary; we can override it to copy directly from input into output.
> Overriding this method:
>  # Reduces a memory copy in `Lucene90CompressingStoredFieldsWriter#copyOneDoc` 
> -> `bufferedDocs.copyBytes(DataInput input)`
>  # Reduces a memory copy in `Lucene90CompoundFormat.writeCompoundFile` -> 
> `data.copyBytes` when the input is a `BufferedChecksumIndexInput` and the 
> output is a `ByteBuffersDataOutput`
>  # Reduces a memory copy in `IndexWriter#copySegmentAsIs` -> copyFrom -> 
> copyBytes
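The one-less-copy idea can be sketched with hypothetical stand-ins for DataInput/DataOutput (these are not Lucene's real classes): the generic path stages bytes in a scratch buffer before writing them out, while an output that owns its storage blocks can read input bytes straight into them.

```java
// Hedged sketch: counting copy passes to show what the override saves.
public class CopyBytesDemo {
  public interface Input {
    void readBytes(byte[] dst, int off, int len);
  }

  // Wrap a byte[] as a sequential input source.
  public static Input wrap(byte[] data) {
    return new Input() {
      int pos = 0;

      public void readBytes(byte[] dst, int off, int len) {
        System.arraycopy(data, pos, dst, off, len);
        pos += len;
      }
    };
  }

  public final byte[] block; // the output's own storage block
  public int upto = 0;
  public int copies = 0; // counts byte-copy passes, for illustration

  public CopyBytesDemo(int size) {
    block = new byte[size];
  }

  // Generic implementation: input -> scratch -> block (two passes).
  public void copyBytesBuffered(Input in, int numBytes) {
    byte[] scratch = new byte[numBytes];
    in.readBytes(scratch, 0, numBytes);                   // pass 1
    System.arraycopy(scratch, 0, block, upto, numBytes);  // pass 2
    upto += numBytes;
    copies += 2;
  }

  // Overridden idea: read straight into the output's block (one pass).
  public void copyBytesDirect(Input in, int numBytes) {
    in.readBytes(block, upto, numBytes);                  // single pass
    upto += numBytes;
    copies += 1;
  }
}
```

In the real change, the block-management details (block boundaries, growth) are what the override has to handle; the saving is the same elimination of the intermediate scratch copy shown here.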






[jira] [Commented] (LUCENE-10657) CopyBytes now saves one memory copy on ByteBuffersDataOutput

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568394#comment-17568394
 ] 

ASF subversion and git services commented on LUCENE-10657:
--

Commit 11e7fe66182690ce518c85c50ffa4094366f3299 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=11e7fe66182 ]

LUCENE-10657: Move CHANGES entry to 9.3.





[jira] [Commented] (LUCENE-10657) CopyBytes now saves one memory copy on ByteBuffersDataOutput

2022-07-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568393#comment-17568393
 ] 

ASF subversion and git services commented on LUCENE-10657:
--

Commit e5bf76b84304b0a85951e43eaf887bd46c82fad4 in lucene's branch 
refs/heads/main from luyuncheng
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e5bf76b8430 ]

LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput 
(#1034)

The abstract copyBytes method has to copy from the input into a buffer and then 
write into ByteBuffersDataOutput. I think this is unnecessary; we can override 
it to copy directly from input into output.




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-07-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568321#comment-17568321
 ] 

ASF subversion and git services commented on LUCENE-10557:
--

Commit 781edf442b200145e4fc3fc59de554b7a0c0b57b in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=781edf442b2 ]

LUCENE-10557: Refine issue label texts (#1036)



> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: Screen Shot 2022-06-29 at 11.02.35 AM.png, 
> image-2022-06-29-13-36-57-365.png, screenshot-1.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> A few (not the majority of) Apache projects already use GitHub issues instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it would be technically possible for us to move to GitHub issues. I 
> have little knowledge of how to proceed, so I'd like to discuss whether we 
> should migrate and, if so, how to handle the migration smoothly.
> The major tasks would be:
>  * (/) Get a consensus about the migration among committers
>  * (/) Choose issues that should be moved to GitHub - We'll migrate all 
> issues towards an atomic switch to GitHub if no major technical obstacles 
> show up.
>  ** Discussion thread 
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
>  ** -Conclusion for now: We don't migrate any issues. Only new issues should 
> be opened on GitHub.-
>  ** Write a prototype migration script - the decision could be made on that. 
> Things to consider:
>  *** version numbers - labels or milestones?
>  *** add a comment/ prepend a link to the source Jira issue on github side,
>  *** add a comment/ prepend a link on the jira side to the new issue on 
> github side (for people who access jira from blogs, mailing list archives and 
> other sources that will have stale links),
>  *** convert cross-issue automatic links in comments/ descriptions (as 
> suggested by Robert),
>  *** strategy to deal with sub-issues (hierarchies),
>  *** maybe prefix (or postfix) the issue title on github side with the 
> original LUCENE-XYZ key so that it is easier to search for a particular issue 
> there?
>  *** how to deal with user IDs (author, reporter, commenters)? Do they have 
> to be github users? Will information about people not registered on github be 
> lost?
>  *** create an extra mapping file of old-issue-new-issue URLs for any 
> potential future uses.
>  *** what to do with issue numbers in git/svn commits? These could be 
> rewritten, but it'd change the entire git history tree - while doable, I 
> don't think this is practical.
> * Prepare a complete migration tool
> ** See https://github.com/apache/lucene-jira-archive/issues/5 
> * Build the convention for issue label/milestone management
>  ** See [https://github.com/apache/lucene-jira-archive/issues/6]
>  ** Do some experiments on a sandbox repository 
> [https://github.com/mocobeta/sandbox-lucene-10557]
>  ** Make documentation for metadata (label/milestone) management 
>  * (/) Enable GitHub issues on the Lucene repository
>  ** Raise an issue on INFRA
>  ** (Create an issue-only private repository for sensitive issues if it's 
> needed and allowed)
>  ** Set a mail hook to 
> [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to 
> the general mail group name)
>  * Set a schedule for migration
>  ** See [https://github.com/apache/lucene-jira-archive/issues/7]
>  ** Give some time to committers to play around with issues/labels/milestones 
> before the actual migration
>  ** Make an announcement on the mail lists
>  ** Show some text messages when opening a new Jira issue (in issue template?)
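
One of the checklist items above is converting cross-issue automatic links in comments and descriptions. A minimal sketch of that kind of rewrite, assuming archived Jira URLs stay reachable (the class name and the Markdown-link output format are invented for illustration, not the migration tool's actual behavior):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn bare "LUCENE-XYZ" mentions in migrated comment
// text into Markdown links back to the source Jira issue.
public class CrossLinkRewriter {
    private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-(\\d+)\\b");

    static String rewrite(String text) {
        Matcher m = JIRA_KEY.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String key = m.group();
            // appendReplacement copies the text before the match, then the link.
            m.appendReplacement(sb,
                "[" + key + "](https://issues.apache.org/jira/browse/" + key + ")");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Related to LUCENE-10557 and LUCENE-9583."));
    }
}
```

A real migration script would also have to handle keys inside code blocks and existing URLs, which this sketch ignores.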



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10649) Failure in TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField

2022-07-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567972#comment-17567972
 ] 

ASF subversion and git services commented on LUCENE-10649:
--

Commit 41f7618535448e14365f26aebdf6db443a2d0cea in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=41f76185354 ]

LUCENE-10649: Fix failures in TestDemoParallelLeafReader (#1025)



> Failure in TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField
> ---
>
> Key: LUCENE-10649
> URL: https://issues.apache.org/jira/browse/LUCENE-10649
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Failing Build Link: 
> [https://jenkins.thetaphi.de/job/Lucene-main-Linux/35617/testReport/junit/org.apache.lucene.index/TestDemoParallelLeafReader/testRandomMultipleSchemaGensSameField/]
> Repro:
> {code:java}
> gradlew test --tests 
> TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField 
> -Dtests.seed=A7496D7D3957981A -Dtests.multiplier=3 -Dtests.locale=sr-Latn-BA 
> -Dtests.timezone=Etc/GMT-7 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 
> {code}
> Error:
> {code:java}
> java.lang.AssertionError: expected:<103> but was:<2147483647>
>     at 
> __randomizedtesting.SeedInfo.seed([A7496D7D3957981A:F71866BCCEA1C903]:0)
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:647)
>     at org.junit.Assert.assertEquals(Assert.java:633)
>     at 
> org.apache.lucene.index.TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField(TestDemoParallelLeafReader.java:1347)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>  {code}






[jira] [Commented] (LUCENE-10649) Failure in TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField

2022-07-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567968#comment-17567968
 ] 

ASF subversion and git services commented on LUCENE-10649:
--

Commit 30a7c52e6c7ed4ffd67a64a13c3f3b25b34853d5 in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=30a7c52e6c7 ]

LUCENE-10649: Fix failures in TestDemoParallelLeafReader (#1025)



> Failure in TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField
> ---
>
> Key: LUCENE-10649
> URL: https://issues.apache.org/jira/browse/LUCENE-10649
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Failing Build Link: 
> [https://jenkins.thetaphi.de/job/Lucene-main-Linux/35617/testReport/junit/org.apache.lucene.index/TestDemoParallelLeafReader/testRandomMultipleSchemaGensSameField/]
> Repro:
> {code:java}
> gradlew test --tests 
> TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField 
> -Dtests.seed=A7496D7D3957981A -Dtests.multiplier=3 -Dtests.locale=sr-Latn-BA 
> -Dtests.timezone=Etc/GMT-7 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 
> {code}
> Error:
> {code:java}
> java.lang.AssertionError: expected:<103> but was:<2147483647>
>     at 
> __randomizedtesting.SeedInfo.seed([A7496D7D3957981A:F71866BCCEA1C903]:0)
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:647)
>     at org.junit.Assert.assertEquals(Assert.java:633)
>     at 
> org.apache.lucene.index.TestDemoParallelLeafReader.testRandomMultipleSchemaGensSameField(TestDemoParallelLeafReader.java:1347)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>  {code}






[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-07-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567825#comment-17567825
 ] 

ASF subversion and git services commented on LUCENE-10557:
--

Commit 8938e6a3fab56e1037feb86db398d71e269dcf34 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8938e6a3fab ]

LUCENE-10557: Add GitHub issue templates (#1024)



> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: Screen Shot 2022-06-29 at 11.02.35 AM.png, 
> image-2022-06-29-13-36-57-365.png, screenshot-1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A few Apache projects (not the majority) already use GitHub issues instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it would be technically possible for us to move to GitHub issues. 
> I have little knowledge of how to proceed, so I'd like to discuss whether we 
> should migrate and, if so, how to handle the migration smoothly.
> The major tasks would be:
>  * (/) Get a consensus about the migration among committers
>  * (/) Choose issues that should be moved to GitHub - We'll migrate all 
> issues towards an atomic switch to GitHub if no major technical obstacles 
> show up.
>  ** Discussion thread 
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
>  ** -Conclusion for now: We don't migrate any issues. Only new issues should 
> be opened on GitHub.-
>  ** Write a prototype migration script - the decision could be made on that. 
> Things to consider:
>  *** version numbers - labels or milestones?
>  *** add a comment or prepend a link to the source Jira issue on the GitHub 
> side,
>  *** add a comment or prepend a link on the Jira side to the new issue on 
> the GitHub side (for people who reach Jira from blogs, mailing list archives, 
> and other sources that will have stale links),
>  *** convert cross-issue automatic links in comments/descriptions (as 
> suggested by Robert),
>  *** strategy to deal with sub-issues (hierarchies),
>  *** maybe prefix (or postfix) the issue title on the GitHub side with the 
> original LUCENE-XYZ key so that it is easier to search for a particular issue 
> there?
>  *** how to deal with user IDs (author, reporter, commenters)? Do they have 
> to be GitHub users? Will information about people not registered on GitHub 
> be lost?
>  *** create an extra mapping file of old-issue/new-issue URLs for any 
> potential future uses.
>  *** what to do with issue numbers in git/svn commits? These could be 
> rewritten, but that would change the entire git history tree - I don't think 
> this is practical, though doable.
> * Prepare a complete migration tool
> ** See https://github.com/apache/lucene-jira-archive/issues/5 
> * Build the convention for issue label/milestone management
>  ** See [https://github.com/apache/lucene-jira-archive/issues/6]
>  ** Do some experiments on a sandbox repository 
> [https://github.com/mocobeta/sandbox-lucene-10557]
>  ** Make documentation for metadata (label/milestone) management 
>  * (/) Enable GitHub issues on the Lucene repository
>  ** Raise an issue on INFRA
>  ** (Create an issue-only private repository for sensitive issues if it's 
> needed and allowed)
>  ** Set a mail hook to 
> [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to 
> the general mail group name)
>  * Set a schedule for migration
>  ** See [https://github.com/apache/lucene-jira-archive/issues/7]
>  ** Give some time to committers to play around with issues/labels/milestones 
> before the actual migration
>  ** Make an announcement on the mail lists
>  ** Show some text messages when opening a new Jira issue (in issue template?)






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-07-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567328#comment-17567328
 ] 

ASF subversion and git services commented on LUCENE-10151:
--

Commit aa082b46f669f71cd0deb2e409c62be863f17091 in lucene's branch 
refs/heads/branch_9x from Deepika0510
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=aa082b46f66 ]

LUCENE-10151: Adding Timeout Support to IndexSearcher  (#927)

Authored-by: Deepika Sharma 

> Add timeout support to IndexSearcher
> 
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.
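
The timeout idea can be illustrated with a toy collector-style loop that stops when a time budget is exhausted and flags the results as partial. This is only a sketch under invented names, not Lucene's actual implementation:

```java
// Illustrative sketch: scan a list of "docs" under a time budget. When the
// deadline passes, stop early and record that the results are partial.
public class TimeBudgetScan {
    static int collected;
    static boolean timedOut;

    static void scan(int[] docs, long budgetNanos) {
        long deadline = System.nanoTime() + budgetNanos;
        collected = 0;
        timedOut = false;
        for (int doc : docs) {
            // Real code would check the clock periodically, not on every hit.
            if (System.nanoTime() > deadline) {
                timedOut = true;  // partial results collected so far remain usable
                return;
            }
            collected++;
        }
    }

    public static void main(String[] args) {
        scan(new int[] {1, 2, 3}, 1_000_000_000L);
        System.out.println(collected + " hits, timedOut=" + timedOut);
    }
}
```

The `timedOut` flag corresponds to the point above about reporting `GREATER_THAN_OR_EQUAL_TO` in `TotalHits#Relation`: the caller must be told the count is a lower bound.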






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-07-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567329#comment-17567329
 ] 

ASF subversion and git services commented on LUCENE-10151:
--

Commit 5cd6eda8caba5a93eeaf60215885ec3171707449 in lucene's branch 
refs/heads/branch_9x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5cd6eda8cab ]

CHANGES entry for LUCENE-10151


> Add timeout support to IndexSearcher
> 
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566614#comment-17566614
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit 9b185b99c429290c80bac5be0bcc2398f58b58db in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9b185b99c42 ]

LUCENE-10603: Remove SSDV#NO_MORE_ORDS definition (#1021)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Now that SortedSetDocValues#docValueCount was added in Lucene 9.2, should we 
> refactor ord iteration to use docValueCount instead of NO_MORE_ORDS, similar 
> to how SortedNumericDocValues works?
> From 
> {code:java}
> for (long ord = values.nextOrd();ord != SortedSetDocValues.NO_MORE_ORDS; ord 
> = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}
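
The docValueCount-bounded style can be modeled with a small mock, where `MockValues` stands in for `SortedSetDocValues` (it is not the Lucene class):

```java
// Toy model of the count-bounded iteration proposed in the issue: consume
// exactly docValueCount() ords via nextOrd(), with no sentinel check.
public class OrdIteration {
    static final long NO_MORE_ORDS = -1;

    static class MockValues {
        private final long[] ords;
        private int pos;
        MockValues(long... ords) { this.ords = ords; }
        int docValueCount() { return ords.length; }
        long nextOrd() { return pos < ords.length ? ords[pos++] : NO_MORE_ORDS; }
    }

    static long sumByCount(MockValues values) {
        long sum = 0;
        // The loop bound replaces the NO_MORE_ORDS sentinel comparison.
        for (int i = 0; i < values.docValueCount(); i++) {
            sum += values.nextOrd();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumByCount(new MockValues(3, 7, 11)));
    }
}
```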






[jira] [Commented] (LUCENE-10648) Fix TestAssertingPointsFormat.testWithExceptions failure

2022-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566444#comment-17566444
 ] 

ASF subversion and git services commented on LUCENE-10648:
--

Commit ca7917472b4d7518b71bbf74498a3c6fac259e11 in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca7917472b4 ]

LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions 
(#1012)

* Fix failures in TestAssertingPointsFormat.testWithExceptions

* remove redundant finally block

* tidy

* remove TODO as it is done now

> Fix TestAssertingPointsFormat.testWithExceptions failure
> 
>
> Key: LUCENE-10648
> URL: https://issues.apache.org/jira/browse/LUCENE-10648
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are seeing build failures due to 
> TestAssertingPointsFormat.testWithExceptions. I am able to repro this on my 
> box with the random seed. Tracking the issue here.
> Sample Failing Build: 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-main/6057/






[jira] [Commented] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter

2022-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566443#comment-17566443
 ] 

ASF subversion and git services commented on LUCENE-10523:
--

Commit f014c97aa26cb269e63a82c538918a2fa37bb4a0 in lucene's branch 
refs/heads/branch_9x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f014c97aa26 ]

LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method (#821)

(cherry picked from commit 56462b5f9628ba1d465fa005e5106c55494a2011)


> facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter
> ---
>
> Key: LUCENE-10523
> URL: https://issues.apache.org/jira/browse/LUCENE-10523
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
> then less {{getFieldHighlighter}} code would need to be duplicated if one 
> wanted to use a custom {{FieldHighlighter}}.
> Proposed change: https://github.com/apache/lucene/pull/821
> A possible usage scenario:
>  * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
> could be stripped at document ingestion time but this may not suit all use 
> cases
>  * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
> escaped at document search time when returning highlighting snippets but this 
> may not suit all use cases
>  * extension illustration: https://github.com/apache/solr/pull/811
>  ** i.e. at document search time remove any HTML markup prior to highlight 
> snippet extraction
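
The protected-factory-method pattern the issue proposes can be sketched generically; the class and method names below are invented for illustration and are not Lucene's API:

```java
// Sketch of the extension point: a protected factory method lets a subclass
// swap in a custom helper without duplicating the surrounding logic.
public class FactoryMethodPattern {
    static class Highlighter {
        protected String newFieldHighlighter() { return "default"; }
        // The shared logic calls the factory method, so subclasses only
        // override the one hook instead of copying this whole method.
        String highlight() { return newFieldHighlighter() + ":snippet"; }
    }

    static class StrippingHighlighter extends Highlighter {
        @Override protected String newFieldHighlighter() { return "html-stripping"; }
    }

    public static void main(String[] args) {
        System.out.println(new StrippingHighlighter().highlight());
    }
}
```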






[jira] [Commented] (LUCENE-10523) facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter

2022-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566441#comment-17566441
 ] 

ASF subversion and git services commented on LUCENE-10523:
--

Commit 56462b5f9628ba1d465fa005e5106c55494a2011 in lucene's branch 
refs/heads/main from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=56462b5f962 ]

LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method (#821)



> facilitate UnifiedHighlighter extension w.r.t. FieldHighlighter
> ---
>
> Key: LUCENE-10523
> URL: https://issues.apache.org/jira/browse/LUCENE-10523
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If the {{UnifiedHighlighter}} had a protected {{newFieldHighlighter}} method 
> then less {{getFieldHighlighter}} code would need to be duplicated if one 
> wanted to use a custom {{FieldHighlighter}}.
> Proposed change: https://github.com/apache/lucene/pull/821
> A possible usage scenario:
>  * e.g. via Solr's {{HTMLStripFieldUpdateProcessorFactory}} any HTML markup 
> could be stripped at document ingestion time but this may not suit all use 
> cases
>  * e.g. via Solr's {{hl.encoder=html}} parameter any HTML markup could be 
> escaped at document search time when returning highlighting snippets but this 
> may not suit all use cases
>  * extension illustration: https://github.com/apache/solr/pull/811
>  ** i.e. at document search time remove any HTML markup prior to highlight 
> snippet extraction






[jira] [Commented] (LUCENE-10619) Optimize the writeBytes in TermsHashPerField

2022-07-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565873#comment-17565873
 ] 

ASF subversion and git services commented on LUCENE-10619:
--

Commit 9f9786122b487f992119f45c5d8a51a8d9d4a6f8 in lucene's branch 
refs/heads/branch_9x from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9f9786122b4 ]

LUCENE-10619: Optimize the writeBytes in TermsHashPerField (#966)



> Optimize the writeBytes in TermsHashPerField
> 
>
> Key: LUCENE-10619
> URL: https://issues.apache.org/jira/browse/LUCENE-10619
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tang donghai
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Because we don't know the length of the slice, writeBytes always writes 
> bytes one at a time instead of copying a whole block.
> Maybe we could return both offset and length from ByteBlockPool#allocSlice?
> 1. BYTE_BLOCK_SIZE is 32768, so the offset fits in 15 bits.
> 2. The slice size is at most 200, so it fits in 8 bits.
> So we could pack them together into a single int: offset | length.
> There are only two places where this function is used, so the cost of 
> changing it is relatively small.
> If allocSlice returned the offset and length of the new slice, we could 
> change writeBytes like below:
> {code:java}
> // copy a block of bytes on each iteration
> while (remaining > 0) {
>    int offsetAndLength = allocSlice(bytes, offset);
>    length = min(remaining, (offsetAndLength & 0xff) - 1);
>    offset = offsetAndLength >> 8;
>    System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
>    srcPos += length; // advance the source position past the copied bytes
>    remaining -= length;
>    offset += (length + 1);
> }
> {code}
> If it could work, I'd like to raise a pr.
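
The proposed offset | length packing can be sketched on its own; the class and helper names are illustrative, not Lucene's:

```java
// Sketch of the packed int from the issue: a 15-bit offset (BYTE_BLOCK_SIZE
// is 32768) in the high bits and an 8-bit slice length in the low bits.
public class PackedSlice {
    static int pack(int offset, int length) {
        assert offset < (1 << 15) && length < (1 << 8);
        return (offset << 8) | length;
    }

    static int offset(int packed) { return packed >>> 8; }

    static int length(int packed) { return packed & 0xff; }

    public static void main(String[] args) {
        int p = pack(32767, 200);  // largest values the scheme must hold
        System.out.println(offset(p) + " " + length(p));
    }
}
```

Since 15 + 8 = 23 bits, both fields fit comfortably in one int with room to spare.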






[jira] [Commented] (LUCENE-10619) Optimize the writeBytes in TermsHashPerField

2022-07-12 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565872#comment-17565872
 ] 

ASF subversion and git services commented on LUCENE-10619:
--

Commit d7c2def019b8c1318d3c37a7065569e8d1a1af1f in lucene's branch 
refs/heads/main from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d7c2def019b ]

LUCENE-10619: Optimize the writeBytes in TermsHashPerField (#966)



> Optimize the writeBytes in TermsHashPerField
> 
>
> Key: LUCENE-10619
> URL: https://issues.apache.org/jira/browse/LUCENE-10619
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tang donghai
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Because we don't know the length of the slice, writeBytes always writes 
> bytes one at a time instead of copying a whole block.
> Maybe we could return both offset and length from ByteBlockPool#allocSlice?
> 1. BYTE_BLOCK_SIZE is 32768, so the offset fits in 15 bits.
> 2. The slice size is at most 200, so it fits in 8 bits.
> So we could pack them together into a single int: offset | length.
> There are only two places where this function is used, so the cost of 
> changing it is relatively small.
> If allocSlice returned the offset and length of the new slice, we could 
> change writeBytes like below:
> {code:java}
> // copy a block of bytes on each iteration
> while (remaining > 0) {
>    int offsetAndLength = allocSlice(bytes, offset);
>    length = min(remaining, (offsetAndLength & 0xff) - 1);
>    offset = offsetAndLength >> 8;
>    System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
>    srcPos += length; // advance the source position past the copied bytes
>    remaining -= length;
>    offset += (length + 1);
> }
> {code}
> If it could work, I'd like to raise a pr.






[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565081#comment-17565081
 ] 

ASF subversion and git services commented on LUCENE-10614:
--

Commit d6dbe4374a5229b827613b85066f3a4da91d5f27 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d6dbe4374a5 ]

Move LUCENE-10614 CHANGES entry to 10.0 and add MIGRATE entry


> Properly support getTopChildren in RangeFacetCounts
> ---
>
> Key: LUCENE-10614
> URL: https://issues.apache.org/jira/browse/LUCENE-10614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 10.0 (main)
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> As mentioned in LUCENE-10538, {{RangeFacetCounts}} is not implementing 
> {{getTopChildren}}. Instead of returning "top" ranges, it returns all 
> user-provided ranges in the order the user specified them when instantiating. 
> This is probably more useful functionality, but it would be nice to support 
> {{getTopChildren}} as well.
> LUCENE-10550 is introducing the concept of {{getAllChildren}}, so once that 
> lands, we can replace the current implementation of {{getTopChildren}} with 
> an actual "top children" implementation and direct users to 
> {{getAllChildren}} if they want to maintain the current behavior.






[jira] [Commented] (LUCENE-10614) Properly support getTopChildren in RangeFacetCounts

2022-07-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565075#comment-17565075
 ] 

ASF subversion and git services commented on LUCENE-10614:
--

Commit 5ef7e5025def61cf20442806486c8f6102ebcdc4 in lucene's branch 
refs/heads/main from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5ef7e5025de ]

LUCENE-10614: Properly support getTopChildren in RangeFacetCounts (#974)



> Properly support getTopChildren in RangeFacetCounts
> ---
>
> Key: LUCENE-10614
> URL: https://issues.apache.org/jira/browse/LUCENE-10614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 10.0 (main)
>Reporter: Greg Miller
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> As mentioned in LUCENE-10538, {{RangeFacetCounts}} is not implementing 
> {{getTopChildren}}. Instead of returning "top" ranges, it returns all 
> user-provided ranges in the order the user specified them when instantiating. 
> This is probably more useful functionality, but it would be nice to support 
> {{getTopChildren}} as well.
> LUCENE-10550 is introducing the concept of {{getAllChildren}}, so once that 
> lands, we can replace the current implementation of {{getTopChildren}} with 
> an actual "top children" implementation and direct users to 
> {{getAllChildren}} if they want to maintain the current behavior.






[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-07-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564900#comment-17564900
 ] 

ASF subversion and git services commented on LUCENE-10647:
--

Commit 190cfbc65c66be807d6c61291500a6fdcf9a975e in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=190cfbc65c6 ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> --
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent builds are intermittently failing on 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/






[jira] [Commented] (LUCENE-10647) Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler

2022-07-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564896#comment-17564896
 ] 

ASF subversion and git services commented on LUCENE-10647:
--

Commit 128869d63aef6a448af991fa2768113a560a8dbc in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=128869d63ae ]

LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)

Ensure mergeScheduler.sync() gets called before we rollback the writer.

> Failure in TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler
> --
>
> Key: LUCENE-10647
> URL: https://issues.apache.org/jira/browse/LUCENE-10647
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recent builds are intermittently failing on 
> TestMergeSchedulerExternal.testSubclassConcurrentMergeScheduler. Example:
> https://jenkins.thetaphi.de/job/Lucene-main-Linux/35576/testReport/junit/org.apache.lucene/TestMergeSchedulerExternal/testSubclassConcurrentMergeScheduler/






[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564326#comment-17564326
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit 090cbc50dd7e5659494149f470378ab7f6a90cf1 in lucene's branch 
refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=090cbc50dd7 ]

LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve 
disjunction within conjunction (#1006) (#1008)

(cherry picked from commit da8143bfa38cd5fadae4b4712b9e639e79016021)

> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?






[jira] [Commented] (LUCENE-10563) Unable to Tessellate polygon

2022-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563996#comment-17563996
 ] 

ASF subversion and git services commented on LUCENE-10563:
--

Commit 8926732a32823be168267fe2ed39eb804d1030f1 in lucene's branch 
refs/heads/branch_9x from Nhat Nguyen
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8926732a328 ]

LUCENE-10563: Fix CHANGES list (#1009)

The CHANGES of 10.0 were accidentally merged into 9x CHANGES in b7231bb.

> Unable to Tessellate polygon
> 
>
> Key: LUCENE-10563
> URL: https://issues.apache.org/jira/browse/LUCENE-10563
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Yixun Xu
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.3
>
> Attachments: polygon-1.json, polygon-2.json, polygon-3.json
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Following up to LUCENE-10470, I found some more polygons that cause 
> {{Tessellator.tessellate}} to throw "Unable to Tessellate shape", which are 
> not covered by the fix to LUCENE-10470. I attached the geojson of 3 failing 
> shapes that I got, and this is the 
> [branch|https://github.com/apache/lucene/compare/main...yixunx:yx/reproduce-tessellator-error?expand=1#diff-5e8e8052af8b8618e7e4325b7d69def4d562a356acbfea3e983198327c7c8d18R17-R19]
>  I am testing on that demonstrates the tessellation failures. 
>  
> [^polygon-1.json]
> [^polygon-2.json]
> [^polygon-3.json]






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563901#comment-17563901
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit c46e1f03901ebaac9e010862acbb0cf460d807ef in lucene's branch 
refs/heads/branch_9x from Stefan Vodita
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c46e1f03901 ]

LUCENE-10603: Stop using SortedSetDocValues.NO_MORE_ORDS in tests (#1004)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> After SortedSetDocValues#docValueCount was added in Lucene 9.2, should we 
> refactor the implementation of ords iteration to use docValueCount instead of 
> NO_MORE_ORDS, similar to how SortedNumericDocValues does?
> From 
> {code:java}
> for (long ord = values.nextOrd();ord != SortedSetDocValues.NO_MORE_ORDS; ord 
> = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}
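To make the two loop shapes above concrete, here is a minimal standalone sketch. `StubOrdIterator` is a hypothetical stand-in for a per-document ordinal iterator, not Lucene's actual `SortedSetDocValues`; the point is only to contrast the sentinel-terminated loop with the count-bounded one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-document ordinal iterator (not a Lucene class).
class StubOrdIterator {
    static final long NO_MORE_ORDS = -1;
    private final long[] ords;
    private int pos = 0;
    StubOrdIterator(long... ords) { this.ords = ords; }
    int docValueCount() { return ords.length; }
    long nextOrd() { return pos < ords.length ? ords[pos++] : NO_MORE_ORDS; }
}

public class OrdIteration {
    // Old style: iterate until the NO_MORE_ORDS sentinel is returned.
    static List<Long> sentinelLoop(StubOrdIterator values) {
        List<Long> out = new ArrayList<>();
        for (long ord = values.nextOrd();
             ord != StubOrdIterator.NO_MORE_ORDS;
             ord = values.nextOrd()) {
            out.add(ord);
        }
        return out;
    }

    // New style: the per-document count is known up front, so no sentinel is needed.
    static List<Long> countLoop(StubOrdIterator values) {
        List<Long> out = new ArrayList<>();
        int count = values.docValueCount();
        for (int i = 0; i < count; i++) {
            out.add(values.nextOrd());
        }
        return out;
    }
}
```

Both loops visit the same ordinals; the count-bounded form simply avoids the extra sentinel comparison per value.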






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563876#comment-17563876
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit dd4e8b82d711b8f665e91f0d74f159ef1e63939f in lucene's branch 
refs/heads/main from Stefan Vodita
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=dd4e8b82d71 ]

LUCENE-10603: Stop using SortedSetDocValues.NO_MORE_ORDS in tests (#1004)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> After SortedSetDocValues#docValueCount was added in Lucene 9.2, should we 
> refactor the implementation of ords iteration to use docValueCount instead of 
> NO_MORE_ORDS, similar to how SortedNumericDocValues does?
> From 
> {code:java}
> for (long ord = values.nextOrd();ord != SortedSetDocValues.NO_MORE_ORDS; ord 
> = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}






[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563627#comment-17563627
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit da8143bfa38cd5fadae4b4712b9e639e79016021 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=da8143bfa38 ]

LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve 
disjunction within conjunction (#1006)



> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?






[jira] [Commented] (LUCENE-10216) Add concurrency to addIndexes(CodecReader…) API

2022-07-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563440#comment-17563440
 ] 

ASF subversion and git services commented on LUCENE-10216:
--

Commit 698f40ad51af0c42b0a4a8321ab89968e8d0860b in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=698f40ad51a ]

LUCENE-10216: Use MergeScheduler and MergePolicy to run 
addIndexes(CodecReader[]) merges. (#633)

* Use merge policy and merge scheduler to run addIndexes merges

* wrapped reader does not see deletes - debug

* Partially fixed tests in TestAddIndexes

* Use writer object to invoke addIndexes merge

* Use merge object info

* Add javadocs for new methods

* TestAddIndexes passing

* verify field info schemas upfront from incoming readers

* rename flag to track pooled readers

* Keep addIndexes API transactional

* Maintain transactionality - register segments with iw after all merges 
complete

* fix checkstyle

* PR comments

* Fix pendingDocs - numDocs mismatch bug

* Tests with 1-1 merges and partial merge failures

* variable renaming and better comments

* add test for partial merge failures. change tests to use 1-1 findmerges

* abort pending merges gracefully

* test null and empty merge specs

* test interim files are deleted

* test with empty readers

* test cascading merges triggered

* remove nocommits

* gradle check errors

* remove unused line

* remove printf

* spotless apply

* update TestIndexWriterOnDiskFull to accept mergeException from failing 
addIndexes calls

* return singleton reader mergespec in NoMergePolicy

* rethrow exceptions seen in merge threads on failure

* spotless apply

* update test to new exception type thrown

* spotlessApply

* test for maxDoc limit in IndexWriter

* spotlessApply

* Use DocValuesIterator instead of DocValuesFieldExistsQuery for counting soft 
deletes

* spotless apply

* change exception message for closed IW

* remove non-essential comments

* update api doc string

* doc string update

* spotless

* Changes file entry

* simplify findMerges API, add 1-1 merges to MockRandomMergePolicy

* update merge policies to new api

* remove unused imports

* spotless apply

* move changes entry to end of list

* fix testAddIndicesWithSoftDeletes

* test with 1-1 merge policy always enabled

* please spotcheck

* tidy

* test - never use 1-1 merge policy

* use 1-1 merge policy randomly

* Remove concurrent addIndexes findMerges from MockRandomMergePolicy

* Bug Fix: RuntimeException in addIndexes

Aborted pending merges were slipping through the merge exception check in
API, and getting caught later in the RuntimeException check.

* tidy

* Rebase on main. Move changes to 10.0

* Synchronize IW.AddIndexesMergeSource on outer class IW object

* tidy

> Add concurrency to addIndexes(CodecReader…) API
> ---
>
> Key: LUCENE-10216
> URL: https://issues.apache.org/jira/browse/LUCENE-10216
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Vigya Sharma
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> I work at Amazon Product Search, and we use Lucene to power search for the 
> e-commerce platform. I’m working on a project that involves applying 
> metadata+ETL transforms and indexing documents on n different _indexing_ 
> boxes, combining them into a single index on a separate _reducer_ box, and 
> making it available for queries on m different _search_ boxes (replicas). 
> Segments are asynchronously copied from indexers to reducers to searchers as 
> they become available for the next layer to consume.
> I am using the addIndexes API to combine multiple indexes into one on the 
> reducer boxes. Since we also have taxonomy data, we need to remap facet field 
> ordinals, which means I need to use the {{addIndexes(CodecReader…)}} version 
> of this API. The API leverages {{SegmentMerger.merge()}} to create segments 
> with new ordinal values while also merging all provided segments in the 
> process.
> _This is however a blocking call that runs in a single thread._ Until we have 
> written segments with new ordinal values, we cannot copy them to searcher 
> boxes, which increases the time to make documents available for search.
> I was playing around with the API by creating multiple concurrent merges, 
> each with only a single reader, creating a concurrently running 1:1 
> conversion from old segments to new ones (with new ordinal values). We follow 
> this up with non-blocking background merges. This lets us copy the segments 
> to searchers and replicas as soon as they are available, and later replace 
> them with merged segments as background jobs complete. On the Amazon dataset 
> I profiled, this gave us around 2.5 to 3x 
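The fan-out/fan-in shape described above — convert each incoming reader in its own task, then register results only after every task succeeds, preserving transactionality — can be sketched generically. `Reader` and `Segment` below are hypothetical placeholders, not Lucene's classes, and `convert` stands in for a 1:1 `SegmentMerger.merge()` over a single reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch of concurrent 1:1 addIndexes merges (placeholder types).
public class ConcurrentAddIndexesSketch {
    record Reader(String name) {}
    record Segment(String name) {}

    // Stand-in for a single-reader merge that remaps ordinals while copying.
    static Segment convert(Reader r) {
        return new Segment(r.name() + "-remapped");
    }

    static List<Segment> addIndexes(List<Reader> readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fan out: one independent conversion task per incoming reader.
            List<Future<Segment>> futures = new ArrayList<>();
            for (Reader r : readers) {
                futures.add(pool.submit(() -> convert(r)));
            }
            // Fan in: collect all results; Future.get rethrows any task
            // failure, so on error nothing is registered (transactional).
            List<Segment> segments = new ArrayList<>();
            for (Future<Segment> f : futures) {
                segments.add(f.get());
            }
            return segments; // register with the writer only at this point
        } finally {
            pool.shutdown();
        }
    }
}
```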

[jira] [Commented] (LUCENE-10626) Hunspell: add tools to aid dictionary editing: analysis introspection, stem expansion and stem/flag suggestion

2022-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562821#comment-17562821
 ] 

ASF subversion and git services commented on LUCENE-10626:
--

Commit d537013e70872015364c745e5f320727efc034b7 in lucene's branch 
refs/heads/main from Peter Gromov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d537013e708 ]

LUCENE-10626: Hunspell: add tools to aid dictionary editing: analysis 
introspection, stem expansion and stem/flag suggestion (#975)



> Hunspell: add tools to aid dictionary editing: analysis introspection, stem 
> expansion and stem/flag suggestion
> --
>
> Key: LUCENE-10626
> URL: https://issues.apache.org/jira/browse/LUCENE-10626
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Peter Gromov
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The following tools would be nice to have when editing and appending an 
> existing dictionary:
> 1. See how Hunspell analyzes a given word, with all the involved affix flags: 
> `Hunspell.analyzeSimpleWord`
> 2. See all forms that the given stem can produce with the given flags: 
> `Hunspell.expandRoot`, `WordFormGenerator.expandRoot`
> 3. Given a number of word forms, suggest a stem and a set of flags that 
> produce these word forms: `Hunspell.compress`, `WordFormGenerator.compress`.






[jira] [Commented] (LUCENE-10636) Could the partial score sum from essential list scores be cached?

2022-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562464#comment-17562464
 ] 

ASF subversion and git services commented on LUCENE-10636:
--

Commit 2d05f5c623e06b8bafa1f7b1d6be813c14550690 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2d05f5c623e ]

LUCENE-10636: Avoid computing the same scores multiple times. (#1005)

`BlockMaxMaxscoreScorer` would previously compute the score twice for essential
scorers.

Co-authored-by: zacharymorn 

> Could the partial score sum from essential list scores be cached?
> -
>
> Key: LUCENE-10636
> URL: https://issues.apache.org/jira/browse/LUCENE-10636
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Zach Chen
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a follow-up issue from discussion 
> [https://github.com/apache/lucene/pull/972#discussion_r909300200] . Currently 
> in the implementation of BlockMaxMaxscoreScorer, there's duplicated 
> computation of summing up scores from essential list scorers. We would like 
> to see if this duplicated computation can be cached without introducing much 
> overhead or a data structure that might outweigh the benefit of caching.






[jira] [Commented] (LUCENE-10636) Could the partial score sum from essential list scores be cached?

2022-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562453#comment-17562453
 ] 

ASF subversion and git services commented on LUCENE-10636:
--

Commit 3dd9a5487c2c3994abdaf5ab0553a3d78ebe50ab in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3dd9a5487c2 ]

LUCENE-10636: Avoid computing the same scores multiple times. (#1005)

`BlockMaxMaxscoreScorer` would previously compute the score twice for essential
scorers.

Co-authored-by: zacharymorn 

> Could the partial score sum from essential list scores be cached?
> -
>
> Key: LUCENE-10636
> URL: https://issues.apache.org/jira/browse/LUCENE-10636
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Zach Chen
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is a follow-up issue from discussion 
> [https://github.com/apache/lucene/pull/972#discussion_r909300200] . Currently 
> in the implementation of BlockMaxMaxscoreScorer, there's duplicated 
> computation of summing up scores from essential list scorers. We would like 
> to see if this duplicated computation can be cached without introducing much 
> overhead or a data structure that might outweigh the benefit of caching.






[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562266#comment-17562266
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit a5c99aca1abc9b73a0c68d4f23533311382b718c in lucene's branch 
refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a5c99aca1ab ]

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972) (#1002)

(cherry picked from commit 503ec5597331454bf8b6af79b9701cfdccf5)

> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562216#comment-17562216
 ] 

ASF subversion and git services commented on LUCENE-10151:
--

Commit 81d4a7a69f1c9085e40df412be87de22d0aa8cd6 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=81d4a7a69f1 ]

LUCENE-10151: Some fixes to query timeouts. (#996)

I noticed some minor bugs in the original PR #927 that this PR should fix:
 - When a timeout is set, we would no longer catch
   `CollectionTerminatedException`.
 - I added randomization to `LuceneTestCase` to randomly set a timeout, it
   would have caught the above bug.
 - Fixed visibility of `TimeLimitingBulkScorer`.

> Add timeout support to IndexSearcher
> 
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.
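The core mechanism behind a time-limiting bulk scorer — the idea the commit above touches with `TimeLimitingBulkScorer` — can be sketched as scoring in fixed-size windows and checking the clock between windows, so that a timeout still leaves usable partial results. `ScoreOneDoc` and the window size here are hypothetical illustrations, not Lucene's actual API.

```java
import java.util.function.LongSupplier;

// Sketch of a chunked time-budget check for bulk scoring (placeholder types).
public class TimeBudgetLoop {
    interface ScoreOneDoc { void collect(int doc); }

    /**
     * Scores docs in [minDoc, maxDoc), checking the clock once per window.
     * Returns false if the deadline passed, in which case the caller reports
     * partial hits (e.g. with relation GREATER_THAN_OR_EQUAL_TO).
     */
    static boolean scoreRange(int minDoc, int maxDoc, long deadlineNanos,
                              LongSupplier clock, ScoreOneDoc collector) {
        final int window = 1024; // check the clock per window, not per doc
        for (int start = minDoc; start < maxDoc; start += window) {
            if (clock.getAsLong() > deadlineNanos) {
                return false; // timed out: stop with partial results
            }
            int end = Math.min(start + window, maxDoc);
            for (int doc = start; doc < end; doc++) {
                collector.collect(doc);
            }
        }
        return true; // completed within budget
    }
}
```

Checking the clock per window rather than per document keeps the overhead negligible when no timeout is set close to expiry.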






[jira] [Commented] (LUCENE-10577) Quantize vector values

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561790#comment-17561790
 ] 

ASF subversion and git services commented on LUCENE-10577:
--

Commit 359b495129c68403e7aa36b0a1455e75a3a033e1 in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=359b495129c ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format.
This commit adds the test and fixes issues it uncovered in toString.

> Quantize vector values
> --
>
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The {{KnnVectorField}} api handles vectors with 4-byte floating point values. 
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest-neighbor search
> The main point of this addition was to provide the search capability, and to 
> support that it is not really necessary to store vectors in full precision. 
> Perhaps users may also be willing to retrieve values in lower precision for 
> whatever purpose those serve, if they are able to store more samples. We know 
> that 8 bits is enough to provide a very near approximation to the same 
> recall/performance tradeoff that is achieved with the full-precision vectors. 
> I'd like to explore how we could enable 4:1 compression of these fields by 
> reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide 
> their data in reduced-precision format and give control over the quantization 
> to them. It would have a major impact on the Lucene API surface though, 
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would 
> require no or perhaps very limited change to the existing API to enable the 
> feature.
> I've been exploring (2), and what I find is that we can achieve very good 
> recall results using dot-product similarity scoring by simple linear scaling 
> + quantization of the vector values, so long as  we choose the scale that 
> minimizes the quantization error. Dot-product is amenable to this treatment 
> since vectors are required to be unit-length when used with that similarity 
> function. 
>  Even still there is variability in the ideal scale over different data sets. 
> A good choice seems to be max(abs(min-value), abs(max-value)), but of course 
> this assumes that the data set doesn't have a few outlier data points. A 
> theoretical range can be obtained by 1/sqrt(dimension), but this is only 
> useful when the samples are normally distributed. We could in theory 
> determine the ideal scale when flushing a segment and manage this 
> quantization per-segment, but then numerical error could creep in when 
> merging.
> I'll post a patch/PR with an experimental setup I've been using for 
> evaluation purposes. It is pretty self-contained and simple, but has some 
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant 
> that I have been playing with)
> 2. Converts from byte/float when computing dot-product instead of directly 
> computing on byte values
> I'd like to get people's feedback on the approach and whether in general we 
> should think about doing this compression under the hood, or expose a 
> byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty 
> compelling and we should pursue something.
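The linear scaling + quantization idea described above — pick scale = max(abs(min-value), abs(max-value)), map onto signed bytes, and compute dot products directly on the quantized values — can be sketched as follows. This is an illustrative sketch of the approach under discussion, not the codec's actual implementation (it derives the scale per vector rather than per segment, and ignores the outlier problem noted above).

```java
// Sketch of 4:1 vector compression by linear scaling to signed bytes.
public class VectorQuantization {
    static byte[] quantize(float[] vector) {
        // Scale = max(abs(min), abs(max)) == max absolute component value.
        float scale = 0f;
        for (float v : vector) {
            scale = Math.max(scale, Math.abs(v));
        }
        byte[] out = new byte[vector.length];
        if (scale == 0f) {
            return out; // zero vector quantizes to all zeros
        }
        for (int i = 0; i < vector.length; i++) {
            // Map [-scale, scale] linearly onto [-127, 127].
            out[i] = (byte) Math.round(vector[i] / scale * 127f);
        }
        return out;
    }

    // Integer dot product directly on quantized bytes; callers rescale the
    // resulting score by (scaleA * scaleB / 127^2) to approximate the float one.
    static int dotProduct(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

Computing directly on bytes addresses drawback (2) above; choosing the scale automatically (here, per vector) is one possible answer to drawback (1), at the cost of storing one scale per vector.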






[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561787#comment-17561787
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit 503ec5597331454bf8b6af79b9701cfdccf5 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=503ec559733 ]

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972)



> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, another one for scorers that are ahead. All this 
> could be simplified in the 2-clauses case, which feels worth specializing for 
> as it's very common that end users enter queries that only have two terms?






[jira] [Commented] (LUCENE-10577) Quantize vector values

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561773#comment-17561773
 ] 

ASF subversion and git services commented on LUCENE-10577:
--

Commit 187f843e2a49f37f5fa1d50107f32be895146e21 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=187f843e2a4 ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format.
This commit adds the test and fixes issues it uncovered in toString.

> Quantize vector values
> --
>
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The {{KnnVectorField}} api handles vectors with 4-byte floating point values. 
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest-neighbor search
> The main point of this addition was to provide the search capability, and to 
> support that it is not really necessary to store vectors in full precision. 
> Perhaps users may also be willing to retrieve values in lower precision for 
> whatever purpose those serve, if they are able to store more samples. We know 
> that 8 bits is enough to provide a very near approximation to the same 
> recall/performance tradeoff that is achieved with the full-precision vectors. 
> I'd like to explore how we could enable 4:1 compression of these fields by 
> reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide 
> their data in reduced-precision format and give control over the quantization 
> to them. It would have a major impact on the Lucene API surface though, 
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would 
> require no or perhaps very limited change to the existing API to enable the 
> feature.
> I've been exploring (2), and what I find is that we can achieve very good 
> recall results using dot-product similarity scoring by simple linear scaling 
> + quantization of the vector values, so long as  we choose the scale that 
> minimizes the quantization error. Dot-product is amenable to this treatment 
> since vectors are required to be unit-length when used with that similarity 
> function. 
>  Even still there is variability in the ideal scale over different data sets. 
> A good choice seems to be max(abs(min-value), abs(max-value)), but of course 
> this assumes that the data set doesn't have a few outlier data points. A 
> theoretical range can be obtained by 1/sqrt(dimension), but this is only 
> useful when the samples are normally distributed. We could in theory 
> determine the ideal scale when flushing a segment and manage this 
> quantization per-segment, but then numerical error could creep in when 
> merging.
> I'll post a patch/PR with an experimental setup I've been using for 
> evaluation purposes. It is pretty self-contained and simple, but has some 
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant 
> that I have been playing with)
> 2. Converts from byte/float when computing dot-product instead of directly 
> computing on byte values
> I'd like to get people's feedback on the approach and whether in general we 
> should think about doing this compression under the hood, or expose a 
> byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty 
> compelling and we should pursue something.






[jira] [Commented] (LUCENE-10563) Unable to Tessellate polygon

2022-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561376#comment-17561376
 ] 

ASF subversion and git services commented on LUCENE-10563:
--

Commit 1fd67682f20c41c67dc2d0854d71d4c0c4bddc31 in lucene-solr's branch 
refs/heads/branch_8_11 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1fd67682f20 ]

LUCENE-10563: Fix failure to tessellate complex polygon (#933) (#2666)



> Unable to Tessellate polygon
> 
>
> Key: LUCENE-10563
> URL: https://issues.apache.org/jira/browse/LUCENE-10563
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Yixun Xu
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.3
>
> Attachments: polygon-1.json, polygon-2.json, polygon-3.json
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Following up on LUCENE-10470, I found some more polygons that cause 
> {{Tessellator.tessellate}} to throw "Unable to Tessellate shape" and are 
> not covered by the fix to LUCENE-10470. I have attached the geojson of the 3 
> failing shapes, and this is the 
> [branch|https://github.com/apache/lucene/compare/main...yixunx:yx/reproduce-tessellator-error?expand=1#diff-5e8e8052af8b8618e7e4325b7d69def4d562a356acbfea3e983198327c7c8d18R17-R19]
>  I am testing on that demonstrates the tessellation failures. 
>  
> [^polygon-1.json]
> [^polygon-2.json]
> [^polygon-3.json]






[jira] [Commented] (LUCENE-10563) Unable to Tessellate polygon

2022-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561371#comment-17561371
 ] 

ASF subversion and git services commented on LUCENE-10563:
--

Commit 41203f412762e858f62223524ae9b4af6cfe32f8 in lucene-solr's branch 
refs/heads/LUCENE-10563 from Craig Taverner
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=41203f41276 ]

LUCENE-10563: Fix failure to tessellate complex polygon (#933)

# Conflicts:
#   lucene/CHANGES.txt
#   lucene/core/src/java/org/apache/lucene/geo/Tessellator.java
#   lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10563-1.geojson.gz
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10563-2.geojson.gz
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10563-3.geojson.gz


> Unable to Tessellate polygon
> 
>
> Key: LUCENE-10563
> URL: https://issues.apache.org/jira/browse/LUCENE-10563
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.1
>Reporter: Yixun Xu
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.3
>
> Attachments: polygon-1.json, polygon-2.json, polygon-3.json
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Following up on LUCENE-10470, I found some more polygons that cause 
> {{Tessellator.tessellate}} to throw "Unable to Tessellate shape" and are 
> not covered by the fix to LUCENE-10470. I have attached the geojson of the 3 
> failing shapes, and this is the 
> [branch|https://github.com/apache/lucene/compare/main...yixunx:yx/reproduce-tessellator-error?expand=1#diff-5e8e8052af8b8618e7e4325b7d69def4d562a356acbfea3e983198327c7c8d18R17-R19]
>  I am testing on that demonstrates the tessellation failures. 
>  
> [^polygon-1.json]
> [^polygon-2.json]
> [^polygon-3.json]






[jira] [Commented] (LUCENE-10470) Unable to Tessellate polygon

2022-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561368#comment-17561368
 ] 

ASF subversion and git services commented on LUCENE-10470:
--

Commit 41ffac45b49dfc9e4ed6979c95ca9a8c14617a54 in lucene-solr's branch 
refs/heads/branch_8_11 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=41ffac45b49 ]

LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges 
(#756) (#2665)

Check if polygon has been successfully tessellated before we fail (we are 
failing some valid
tessellations) and allow filtering edges that fold on top of the previous one

> Unable to Tessellate polygon
> 
>
> Key: LUCENE-10470
> URL: https://issues.apache.org/jira/browse/LUCENE-10470
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.0
>Reporter: Yixun Xu
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.2
>
> Attachments: image-2022-03-16-18-12-43-411.png, 
> image-2022-03-31-16-06-33-051.png, image-2022-04-04-17-33-52-454.png, 
> image-2022-04-04-17-34-41-971.png, polygon2.geojson, polygon3.geojson, 
> vertices-latest-lucene.txt, vertices-lucene-820.txt
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I have a polygon that causes {{Tessellator.tessellate}} to throw an "Unable 
> to Tessellate shape" error. I tried several versions of Lucene, and the issue 
> does not happen with Lucene 8.2.0, but seems to happen with all Lucene 
> versions >=8.3.0, including the latest main branch.
> I created a branch that reproduces the issue: 
> [https://github.com/apache/lucene/compare/main...yixunx:yx/reproduce-tessellator-error?expand=1]
> This is the polygon rendered on geojson.io:
> !image-2022-03-16-18-12-43-411.png|width=379,height=234!
> Is this a bug in the Tessellator logic, or is there anything wrong with this 
> polygon that maybe wasn't caught by Lucene 8.2.0?






[jira] [Commented] (LUCENE-10470) Unable to Tessellate polygon

2022-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561362#comment-17561362
 ] 

ASF subversion and git services commented on LUCENE-10470:
--

Commit 4f67eabb3c08501d4691beee731f8f7dba1262f0 in lucene-solr's branch 
refs/heads/LUCENE-10470 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4f67eabb3c0 ]

LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges 
(#756)

Check if polygon has been successfully tessellated before we fail (we are 
failing some valid
tessellations) and allow filtering edges that fold on top of the previous one
# Conflicts:
#   lucene/CHANGES.txt
#   lucene/core/src/java/org/apache/lucene/geo/Tessellator.java
#   lucene/core/src/test/org/apache/lucene/geo/TestTessellator.java
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10470-2.geojson.gz
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10470-3.geojson.gz
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10470.geojson.gz
#   
lucene/test-framework/src/resources/org/apache/lucene/geo/lucene-10470.wkt.gz


> Unable to Tessellate polygon
> 
>
> Key: LUCENE-10470
> URL: https://issues.apache.org/jira/browse/LUCENE-10470
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 9.0
>Reporter: Yixun Xu
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.2
>
> Attachments: image-2022-03-16-18-12-43-411.png, 
> image-2022-03-31-16-06-33-051.png, image-2022-04-04-17-33-52-454.png, 
> image-2022-04-04-17-34-41-971.png, polygon2.geojson, polygon3.geojson, 
> vertices-latest-lucene.txt, vertices-lucene-820.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> I have a polygon that causes {{Tessellator.tessellate}} to throw an "Unable 
> to Tessellate shape" error. I tried several versions of Lucene, and the issue 
> does not happen with Lucene 8.2.0, but seems to happen with all Lucene 
> versions >=8.3.0, including the latest main branch.
> I created a branch that reproduces the issue: 
> [https://github.com/apache/lucene/compare/main...yixunx:yx/reproduce-tessellator-error?expand=1]
> This is the polygon rendered on geojson.io:
> !image-2022-03-16-18-12-43-411.png|width=379,height=234!
> Is this a bug in the Tessellator logic, or is there anything wrong with this 
> polygon that maybe wasn't caught by Lucene 8.2.0?






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561189#comment-17561189
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit 3e268805024cf98abb11f6de45b32403b088eb5b in lucene's branch 
refs/heads/branch_9x from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3e268805024 ]

LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in 
production code (#1000)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Now that SortedSetDocValues#docValueCount is available since Lucene 9.2, 
> should we refactor the implementation of ord iteration to use docValueCount 
> instead of NO_MORE_ORDS, similar to what SortedNumericDocValues does?
> From 
> {code:java}
> for (long ord = values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; 
> ord = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}
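The two loop styles quoted above can be contrasted with a minimal stand-in for the per-document ord iterator (a hypothetical sketch, not Lucene's actual SortedSetDocValues class), showing they visit the same ords:

```java
// Hypothetical stand-in for a per-document ord iterator, used to contrast the
// NO_MORE_ORDS sentinel loop with the docValueCount-based loop.
import java.util.ArrayList;
import java.util.List;

public class OrdIterationSketch {
  static final long NO_MORE_ORDS = -1;

  static class FakeOrds {
    private final long[] ords;
    private int pos = 0;

    FakeOrds(long... ords) { this.ords = ords; }

    // Returns the next ord, or NO_MORE_ORDS once the document is exhausted.
    long nextOrd() { return pos < ords.length ? ords[pos++] : NO_MORE_ORDS; }

    // Number of ords for the current document.
    int docValueCount() { return ords.length; }

    void reset() { pos = 0; }
  }

  public static void main(String[] args) {
    FakeOrds values = new FakeOrds(3, 7, 42);

    // Old style: iterate until the NO_MORE_ORDS sentinel appears.
    List<Long> sentinel = new ArrayList<>();
    for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
      sentinel.add(ord);
    }

    // New style: iterate exactly docValueCount() times, no sentinel needed.
    values.reset();
    List<Long> counted = new ArrayList<>();
    for (int i = 0; i < values.docValueCount(); i++) {
      counted.add(values.nextOrd());
    }

    System.out.println(sentinel.equals(counted)); // prints true
  }
}
```

The count-based loop also lets callers size buffers up front, which the sentinel loop cannot do.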






[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues

2022-06-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561187#comment-17561187
 ] 

ASF subversion and git services commented on LUCENE-10603:
--

Commit 5f2a4998a079278ada89ce7bfa3992673a91c5c9 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5f2a4998a07 ]

LUCENE-10603: Migrate remaining SSDV iteration to use docValueCount in 
production code (#995)



> Improve iteration of ords for SortedSetDocValues
> 
>
> Key: LUCENE-10603
> URL: https://issues.apache.org/jira/browse/LUCENE-10603
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Trivial
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Now that SortedSetDocValues#docValueCount is available since Lucene 9.2, 
> should we refactor the implementation of ord iteration to use docValueCount 
> instead of NO_MORE_ORDS, similar to what SortedNumericDocValues does?
> From 
> {code:java}
> for (long ord = values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; 
> ord = values.nextOrd()) {
> }{code}
> to
> {code:java}
> for (int i = 0; i < values.docValueCount(); i++) {
>   long ord = values.nextOrd();
> }{code}






[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-06-29 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10151:
--

Commit 95de554b65bece9697396eeb4a5e78a8352f58d0 in lucene's branch 
refs/heads/main from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=95de554b65b ]

CHANGES entry for LUCENE-10151


[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-06-29 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10151:
--

Commit af05550ebfe3dc1bc40aeb2318c132a9b12e37a2 in lucene's branch 
refs/heads/main from Deepika0510
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=af05550ebfe ]

LUCENE-10151: Adding Timeout Support to IndexSearcher (#927)

Authored-by: Deepika Sharma


[jira] [Commented] (LUCENE-10593) VectorSimilarityFunction reverse removal

2022-06-29 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10593:
--

Commit b3b7098cd9636c5ad2516055f768dd29b795a05d in lucene's branch 
refs/heads/branch_9x from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b3b7098cd96 ]

LUCENE-10593: VectorSimilarityFunction reverse removal (#926)

* Vector Similarity Function reverse property removed
* NeighborQueue tie-breaking fixed (node id + node score encoding)
* NeighborQueue readability refactor
* BoundChecker removal (now it's only in backward-codecs)


[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-28 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10557:
--

Commit 64321114e1d8579e52376a97f5eb3e4cd13338e8 in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=64321114e1d ]

LUCENE-10557: temprarily enable github issue (#988)


[jira] [Commented] (LUCENE-10593) VectorSimilarityFunction reverse removal

2022-06-28 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10593:
--

Commit 8cf694fed2131c71679c24277fbb76e0d981d564 in lucene's branch 
refs/heads/main from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8cf694fed21 ]

LUCENE-10593: VectorSimilarityFunction reverse removal (#926)

* Vector Similarity Function reverse property removed
* NeighborQueue tie-breaking fixed (node id + node score encoding)
* NeighborQueue readability refactor
* BoundChecker removal (now it's only in backward-codecs)


[jira] [Commented] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-27 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10623:
--

Commit fb261e6ff48e5a57d9dff7fd960e21ec2634294d in lucene's branch 
refs/heads/branch_9x from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb261e6ff48 ]

LUCENE-10623: Error implementation of docValueCount for 
SortingSortedSetDocValues (#967)


[jira] [Commented] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-27 Thread ASF subversion and git services (Jira)


ASF subversion and git services commented on LUCENE-10623:
--

Commit d8fb47b67480afe5fffca68f1565774ef6874d60 in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d8fb47b6748 ]

LUCENE-10623: Error implementation of docValueCount for 
SortingSortedSetDocValues (#967)


[jira] [Commented] (LUCENE-9580) Tessellator failure for a certain polygon

2022-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17559043#comment-17559043
 ] 

ASF subversion and git services commented on LUCENE-9580:
-

Commit 6a3f50539587cdabe5efe199bc06f6375f1d092a in lucene-solr's branch 
refs/heads/branch_8_11 from Hugo Mercier
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a3f5053958 ]

LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear 
edges during polygon splitting (#2452) (#2664)

Co-authored-by: Ignacio Vera 

> Tessellator failure for a certain polygon
> -
>
> Key: LUCENE-9580
> URL: https://issues.apache.org/jira/browse/LUCENE-9580
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 8.5, 8.6
>Reporter: Iurii Vyshnevskyi
>Assignee: Ignacio Vera
>Priority: Major
> Fix For: 9.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This bug was discovered while using ElasticSearch (checked with versions 
> 7.6.2 and 7.9.2).
> But I've created an isolated test case just for Lucene: 
> [https://github.com/apache/lucene-solr/pull/2006/files]
>  
> The unit test fails with "java.lang.IllegalArgumentException: Unable to 
> Tessellate shape".
>  
> The polygon contains two holes that share the same vertex and one more 
> standalone hole.
> Removing any of them makes the unit test pass. 
>  
> Changing the least significant digit in any coordinate of the "common vertex" 
> in any of two first holes, so that these vertices become different in each 
> hole - also makes unit test pass.





