[GitHub] [lucene-jira-archive] mocobeta closed issue #13: Test issue

2022-07-02 Thread GitBox


mocobeta closed issue #13: Test issue
URL: https://github.com/apache/lucene-jira-archive/issues/13


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #14: Investigate import failure of LUCENE-1498

2022-07-02 Thread GitBox


mocobeta opened a new issue, #14:
URL: https://github.com/apache/lucene-jira-archive/issues/14

   https://issues.apache.org/jira/browse/LUCENE-1498 won't be imported.
   
   ```
   [2022-06-26 18:38:25,394] ERROR:import_github_issues: Import GitHub issue /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-1498.json was failed. status=failed, errors=[{'location': '/issue', 'resource': 'Issue', 'field': None, 'value': None, 'code': 'error'}]
   ```
   
   Maybe the body contains character sequences that are not acceptable to 
GitHub?





[GitHub] [lucene-jira-archive] mocobeta commented on issue #7: Make a detailed migration plan

2022-07-02 Thread GitBox


mocobeta commented on issue #7:
URL: 
https://github.com/apache/lucene-jira-archive/issues/7#issuecomment-1173010491

   #13 confirmed that my account was able to update issues in an ASF repo 
without modifying the author. We can do step 7 (the second pass, which is the 
most time-consuming part) ourselves.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue

2022-07-02 Thread GitBox


mocobeta commented on issue #13:
URL: 
https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173010229

   OK, we can update issues ourselves with scripts.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue

2022-07-02 Thread GitBox


mocobeta commented on issue #13:
URL: 
https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173007101

   Thanks! 





[GitHub] [lucene-jira-archive] ikawaha commented on issue #13: Test issue

2022-07-02 Thread GitBox


ikawaha commented on issue #13:
URL: 
https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173006810

   I'm a cat. There is a cat. Nice to meet you.





[GitHub] [lucene-jira-archive] mocobeta commented on issue #13: Test issue

2022-07-02 Thread GitBox


mocobeta commented on issue #13:
URL: 
https://github.com/apache/lucene-jira-archive/issues/13#issuecomment-1173006761

   Thank you @ikawaha.
   Can you please also add a fake comment?





[GitHub] [lucene-jira-archive] ikawaha opened a new issue, #13: Test issue

2022-07-02 Thread GitBox


ikawaha opened a new issue, #13:
URL: https://github.com/apache/lucene-jira-archive/issues/13

   This is a test to check that issue operations work.





[jira] [Commented] (LUCENE-10577) Quantize vector values

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561790#comment-17561790
 ] 

ASF subversion and git services commented on LUCENE-10577:
--

Commit 359b495129c68403e7aa36b0a1455e75a3a033e1 in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=359b495129c ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format.
This commit adds the test and fixes issues it uncovered in toString.

> Quantize vector values
> --
>
> Key: LUCENE-10577
> URL: https://issues.apache.org/jira/browse/LUCENE-10577
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The {{KnnVectorField}} API handles vectors with 4-byte floating-point values. 
> These fields can be used (via {{KnnVectorsReader}}) in two main ways:
> 1. The {{VectorValues}} iterator enables retrieving values
> 2. Approximate nearest-neighbor search
> The main point of this addition was to provide the search capability, and to 
> support that, it is not really necessary to store vectors in full precision. 
> Users may also be willing to retrieve values in lower precision for 
> whatever purpose those serve, if they are able to store more samples. We know 
> that 8 bits is enough to provide a very near approximation to the same 
> recall/performance tradeoff that is achieved with the full-precision vectors. 
> I'd like to explore how we could enable 4:1 compression of these fields by 
> reducing their precision.
> A few ways I can imagine this would be done:
> 1. Provide a parallel byte-oriented API. This would allow users to provide 
> their data in reduced-precision format and give control over the quantization 
> to them. It would have a major impact on the Lucene API surface though, 
> essentially requiring us to duplicate all of the vector APIs.
> 2. Automatically quantize the stored vector data when we can. This would 
> require no or perhaps very limited change to the existing API to enable the 
> feature.
> I've been exploring (2), and what I find is that we can achieve very good 
> recall results using dot-product similarity scoring by simple linear scaling 
> and quantization of the vector values, as long as we choose the scale that 
> minimizes the quantization error. Dot-product is amenable to this treatment 
> since vectors are required to be unit-length when used with that similarity 
> function. 
> Even so, there is variability in the ideal scale across different data sets. 
> A good choice seems to be max(abs(min-value), abs(max-value)), but of course 
> this assumes that the data set doesn't have a few outlier data points. A 
> theoretical range can be obtained by 1/sqrt(dimension), but this is only 
> useful when the samples are normally distributed. We could in theory 
> determine the ideal scale when flushing a segment and manage this 
> quantization per-segment, but then numerical error could creep in when 
> merging.
> I'll post a patch/PR with an experimental setup I've been using for 
> evaluation purposes. It is pretty self-contained and simple, but has some 
> drawbacks that need to be addressed:
> 1. No automated mechanism for determining quantization scale (it's a constant 
> that I have been playing with)
> 2. Converts from byte to float when computing the dot product instead of 
> computing directly on byte values
> I'd like to get people's feedback on the approach and whether in general we 
> should think about doing this compression under the hood, or expose a 
> byte-oriented API. Whatever we do I think a 4:1 compression ratio is pretty 
> compelling and we should pursue something.
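For illustration, the linear scaling described above might look roughly like the following minimal Java sketch. It assumes symmetric scaling by max(abs(min-value), abs(max-value)) into signed bytes in [-127, 127], and it computes the dot product directly on the byte values (the second drawback listed above). The class and method names are hypothetical and are not Lucene APIs.

```java
/**
 * Hypothetical sketch (not a Lucene API): symmetric linear quantization of float vectors
 * to signed bytes, shrinking each dimension from 4 bytes to 1 (the 4:1 ratio).
 */
final class ByteQuantizerSketch {
  private ByteQuantizerSketch() {}

  /** Largest absolute value seen, i.e. max(abs(min-value), abs(max-value)); outliers inflate it. */
  static float computeScale(float[][] vectors) {
    float scale = 0f;
    for (float[] v : vectors) {
      for (float x : v) {
        scale = Math.max(scale, Math.abs(x));
      }
    }
    return scale;
  }

  /** Maps [-scale, scale] linearly onto [-127, 127], rounding to the nearest step. */
  static byte[] quantize(float[] v, float scale) {
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      int q = Math.round(v[i] / scale * 127f);
      // Clamp values that fall outside the range the scale was chosen from.
      out[i] = (byte) Math.max(-127, Math.min(127, q));
    }
    return out;
  }

  /** Dot product computed directly on the bytes; each term is at most 127 * 127 in magnitude. */
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Rescales the integer dot product back to an approximation of the float dot product. */
  static float approxFloatDotProduct(byte[] a, byte[] b, float scale) {
    float step = scale / 127f;
    return dotProduct(a, b) * step * step;
  }
}
```

Because one scale is shared by all vectors, a single outlier value shrinks the resolution available to every other value, which is the outlier concern raised in the description.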



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10635) Ensure test coverage for WANDScorer after additional scorers get added

2022-07-02 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561789#comment-17561789
 ] 

Zach Chen commented on LUCENE-10635:


I like this idea! This approach should also be able to preserve most of the 
assertions in the test utilities. I can give it a try and see how things might 
look.

> Ensure test coverage for WANDScorer after additional scorers get added
> --
>
> Key: LUCENE-10635
> URL: https://issues.apache.org/jira/browse/LUCENE-10635
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Zach Chen
>Priority: Major
>
> This is a follow-up issue from the discussions in 
> [https://github.com/apache/lucene/pull/972#issuecomment-1170684358] and 
> [https://github.com/apache/lucene/pull/972#pullrequestreview-1024377641].
>
> As additional scorers such as BlockMaxMaxscoreScorer get added, some tests in 
> TestWANDScorer that used to test WANDScorer now test BlockMaxMaxscoreScorer 
> instead, reducing test coverage for WANDScorer. We would like to see how we 
> can ensure TestWANDScorer reliably tests WANDScorer, perhaps by instantiating 
> the scorer directly inside the tests?






[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-07-02 Thread GitBox


zacharymorn commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1172956347

   Thanks again @jpountz! I've created the PR above to backport these changes 
to `branch_9x`. 





[GitHub] [lucene] zacharymorn opened a new pull request, #1002: LUCENE-10480: (Backporting) Use BMM scorer for 2 clauses disjunction

2022-07-02 Thread GitBox


zacharymorn opened a new pull request, #1002:
URL: https://github.com/apache/lucene/pull/1002

   This PR backports PR https://github.com/apache/lucene/pull/972 to `branch_9x`





[jira] [Commented] (LUCENE-10480) Specialize 2-clauses disjunctions

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561787#comment-17561787
 ] 

ASF subversion and git services commented on LUCENE-10480:
--

Commit 503ec5597331454bf8b6af79b9701cfdccf5 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=503ec559733 ]

LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972)



> Specialize 2-clauses disjunctions
> -
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its 
> invariants: one linked list for the current candidates, one priority queue of 
> scorers that are behind, and another one for scorers that are ahead. All of this 
> could be simplified in the two-clause case, which feels worth specializing for, 
> as it's very common for end users to enter queries that have only two terms?
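As a rough illustration of why two clauses are easy to specialize, here is a hypothetical Java sketch of the MaxScore idea that a block-max MaxScore (BMM) scorer builds on, over in-memory postings arrays. It is a simplification: it uses a single maximum score per clause instead of per-block maxima, and none of the names correspond to Lucene classes. Once the top-k heap is full and the weaker clause's maximum score can no longer beat the k-th best score, that clause becomes non-essential, so documents matching only it are skipped without being scored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a MaxScore-style top-k disjunction over exactly two clauses. */
final class TwoClauseMaxScoreSketch {
  private TwoClauseMaxScoreSketch() {}

  /** In-memory postings: sorted doc IDs with one precomputed score per doc. */
  record Clause(int[] docs, float[] scores) {
    float maxScore() {
      float max = 0f;
      for (float s : scores) {
        max = Math.max(max, s);
      }
      return max;
    }
  }

  record Hit(int doc, float score) {}

  /** Top-k hits of (a OR b); on equal scores the earlier doc ID is kept. */
  static List<Hit> topK(Clause a, Clause b, int k) {
    if (k <= 0) {
      return new ArrayList<>();
    }
    Clause low = a.maxScore() <= b.maxScore() ? a : b;
    Clause high = (low == a) ? b : a;
    float lowMax = low.maxScore();
    // "Worst" hit first: lower score, or equal score with a higher doc ID.
    Comparator<Hit> worstFirst = (x, y) -> x.score() != y.score()
        ? Float.compare(x.score(), y.score())
        : Integer.compare(y.doc(), x.doc());
    PriorityQueue<Hit> heap = new PriorityQueue<>(worstFirst);
    float minCompetitive = Float.NEGATIVE_INFINITY;
    int i = 0; // cursor into high
    int j = 0; // cursor into low
    while (i < high.docs().length || j < low.docs().length) {
      int hiDoc = i < high.docs().length ? high.docs()[i] : Integer.MAX_VALUE;
      int loDoc = j < low.docs().length ? low.docs()[j] : Integer.MAX_VALUE;
      int doc;
      if (lowMax > minCompetitive) {
        // Both clauses are essential: visit the union of their docs.
        doc = Math.min(hiDoc, loDoc);
      } else {
        // A doc matching only 'low' scores at most lowMax and cannot beat the heap head,
        // so lead with 'high' and only advance 'low' to the candidate doc.
        if (hiDoc == Integer.MAX_VALUE) {
          break;
        }
        doc = hiDoc;
        while (j < low.docs().length && low.docs()[j] < doc) {
          j++;
        }
        loDoc = j < low.docs().length ? low.docs()[j] : Integer.MAX_VALUE;
      }
      float score = 0f;
      if (hiDoc == doc) {
        score += high.scores()[i++];
      }
      if (loDoc == doc) {
        score += low.scores()[j++];
      }
      if (heap.size() < k) {
        heap.add(new Hit(doc, score));
      } else if (score > heap.peek().score()) {
        heap.poll();
        heap.add(new Hit(doc, score));
      }
      if (heap.size() == k) {
        minCompetitive = heap.peek().score();
      }
    }
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort(worstFirst.reversed());
    return hits;
  }
}
```

For example, with one clause scoring 3 on docs {1, 3, 5, 7} and another scoring 1 on docs {2, 3, 4, 6}, a top-2 search never scores docs 4 and 6 once the heap holds two hits.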






[GitHub] [lucene] zacharymorn merged pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction

2022-07-02 Thread GitBox


zacharymorn merged PR #972:
URL: https://github.com/apache/lucene/pull/972





[jira] [Commented] (LUCENE-10577) Quantize vector values

2022-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561773#comment-17561773
 ] 

ASF subversion and git services commented on LUCENE-10577:
--

Commit 187f843e2a49f37f5fa1d50107f32be895146e21 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=187f843e2a4 ]

LUCENE-10577: Add vectors format unit test and fix toString (#998)

We forgot to add this unit test when introducing the new 9.3 vectors format.
This commit adds the test and fixes issues it uncovered in toString.




[GitHub] [lucene] jtibshirani merged pull request #998: LUCENE-10577: Add vectors format unit test and fix toString

2022-07-02 Thread GitBox


jtibshirani merged PR #998:
URL: https://github.com/apache/lucene/pull/998





[GitHub] [lucene] jtibshirani commented on pull request #998: LUCENE-10577: Add vectors format unit test and fix toString

2022-07-02 Thread GitBox


jtibshirani commented on PR #998:
URL: https://github.com/apache/lucene/pull/998#issuecomment-1172928401

   No problem! 





[GitHub] [lucene] jtibshirani commented on a diff in pull request #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester

2022-07-02 Thread GitBox


jtibshirani commented on code in PR #932:
URL: https://github.com/apache/lucene/pull/932#discussion_r912381221


##
lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java:
##
@@ -480,11 +537,15 @@ private int[][] getNN(Path docPath, Path queryPath) throws IOException {
 String hash = Integer.toString(Objects.hash(docPath, queryPath, numDocs, numIters, topK), 36);
 String nnFileName = "nn-" + hash + ".bin";
 Path nnPath = Paths.get(nnFileName);
-if (Files.exists(nnPath) && isNewer(nnPath, docPath, queryPath)) {
+if (Files.exists(nnPath)

Review Comment:
   Oops I read the logic incorrectly, thanks for clarifying!
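For context, the check in the hunk above reuses a cached nearest-neighbor file whose name is derived from a hash of the inputs, and only when that file is newer than the document and query files. A minimal standalone sketch of that logic follows. It assumes `isNewer` compares last-modified times via `java.nio.file.Files.getLastModifiedTime`, and `canReuse` is a hypothetical helper, so the actual KnnGraphTester code may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Objects;

/** Hypothetical sketch of a hash-keyed, timestamp-checked result cache. */
final class NnCacheSketch {
  private NnCacheSketch() {}

  /** Same naming scheme as the hunk: base-36 hash of every input that affects the result. */
  static Path nnPath(Path docPath, Path queryPath, int numDocs, int numIters, int topK) {
    String hash = Integer.toString(Objects.hash(docPath, queryPath, numDocs, numIters, topK), 36);
    return Paths.get("nn-" + hash + ".bin");
  }

  /** True if {@code path} was modified strictly after every one of {@code others}. */
  static boolean isNewer(Path path, Path... others) throws IOException {
    FileTime modified = Files.getLastModifiedTime(path);
    for (Path other : others) {
      if (Files.getLastModifiedTime(other).compareTo(modified) >= 0) {
        return false;
      }
    }
    return true;
  }

  /** The cached file is reused only if it exists and neither input changed after it was written. */
  static boolean canReuse(Path nnPath, Path docPath, Path queryPath) throws IOException {
    return Files.exists(nnPath) && isNewer(nnPath, docPath, queryPath);
  }
}
```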






[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase

2022-07-02 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561772#comment-17561772
 ] 

Greg Miller commented on LUCENE-10639:
--

{quote}I suspected there was some overhead to two-phase iteration but not as 
much as this.
{quote}
Right. Yeah, I guess I was so surprised by the performance shift that I assumed 
there must be an interesting second phase happening. But from what you're 
saying, it sounds like these {{OrHighLow/Med/High}} tasks aren't doing that, 
and that the performance change is purely a side effect of running the two 
phases instead of doing all the checks in the first phase. I should have dug 
into what these tasks are doing.
{quote}Hotspot was not always able to optimize "if (liveDocs == null)" checks
{quote}
Interesting. Seems worth a shot.

 

Thanks for the quick thoughts!

> WANDScorer performs better without two-phase
> 
>
> Key: LUCENE-10639
> URL: https://issues.apache.org/jira/browse/LUCENE-10639
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Major
>
> After looking at the recent improvement [~jpountz] made to WAND scoring in 
> LUCENE-10634, which does additional work during match confirmation to avoid 
> confirming a match whose score wouldn't be competitive, I wanted to see how 
> performance would shift if we squashed the two-phase iteration completely and 
> only returned true matches (that were also known to be competitive by score) 
> in the "approximation" phase. I was a bit surprised to find that luceneutil 
> benchmarks (run with {{wikimediumall}}) improve significantly on some 
> disjunction tasks and show no significant regressions anywhere else.
> Note that I used LUCENE-10634 as a baseline, and built my candidate change on 
> top of that. The diff can be seen here: 
> [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
> A simple conclusion here might be that we shouldn't do two-phase iteration in 
> WANDScorer, but I'm pretty sure that's not right. I wonder if what's really 
> going on is that we're underestimating the cost of confirming a match? Right 
> now we just return the tail size as the cost. While the cost of confirming a 
> match is proportional to the tail size, the actual work involved can be quite 
> significant (having to advance tail iterators to new blocks and decompress 
> them). I wonder if the WAND second phase is being run too early on 
> approximate candidates, and whether less expensive (and possibly even more 
> restrictive) second phases could or should run first?
> I'm raising this here as more of a curiosity to see if it sparks ideas on how 
> to move forward. Again, I'm not proposing we do away with two-phase 
> iteration, but it seems we might be able to improve things. Maybe I'll 
> explore changing the cost heuristic next. Also, maybe there's some different 
> benchmarking that would be useful here that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
>                      Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff                p-value
>      HighTermTitleBDVSort         22.52  (18.9%)          21.66  (15.6%)     -3.8% ( -32% -  37%)   0.485
>                   Prefix3          9.38   (9.2%)           9.09  (10.6%)     -3.1% ( -20% -  18%)   0.326
>         HighTermMonthSort         25.37  (16.0%)          24.87  (17.1%)     -2.0% ( -30% -  37%)   0.710
>      MedTermDayTaxoFacets          9.62   (4.2%)           9.51   (4.1%)     -1.2% (  -9% -   7%)   0.368
>                TermDTSort         74.69  (18.0%)          74.13  (18.2%)     -0.7% ( -31% -  43%)   0.897
>     HighTermDayOfYearSort         52.64  (16.1%)          52.32  (15.4%)     -0.6% ( -27% -  36%)   0.903
>     BrowseMonthTaxoFacets          8.64  (19.1%)           8.59  (19.8%)     -0.6% ( -33% -  47%)   0.926
>      BrowseDateSSDVFacets          0.86   (9.5%)           0.86  (13.1%)     -0.4% ( -20% -  24%)   0.914
>                  PKLookup        147.18   (3.9%)         146.66   (3.3%)     -0.3% (  -7% -   7%)   0.759
> BrowseDayOfYearSSDVFacets          3.47   (4.5%)           3.45   (4.8%)     -0.3% (  -9% -   9%)   0.822
>                  Wildcard         36.36   (4.4%)          36.26   (5.2%)     -0.3% (  -9% -   9%)   0.866
>     BrowseMonthSSDVFacets          4.15  (12.7%)           4.13  (12.8%)     -0.3% ( -22% -  28%)   0.950
>   AndHighMedDayTaxoFacets         15.21   (2.7%)          15.18   (2.9%)     -0.2% (  -5% -   5%)   0.819
>                    Fuzzy1         68.33   (1.8%)          68.22   (2.0%)     -0.2% (  -3% -   3%)

[jira] [Commented] (LUCENE-10635) Ensure test coverage for WANDScorer after additional scorers get added

2022-07-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561754#comment-17561754
 ] 

Adrien Grand commented on LUCENE-10635:
---

Thinking out loud, maybe one way to do this would be to have a specialized 
WANDQuery in the test folder that is guaranteed to produce a WANDScorer?




[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase

2022-07-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561753#comment-17561753
 ] 

Adrien Grand commented on LUCENE-10639:
---

On a recent PR, [~ChrisHegarty] found that HotSpot was not always able to 
optimize "if (liveDocs == null)" checks within for loops 
(https://github.com/apache/lucene/pull/812#discussion_r851301618). Since then 
I've been wondering whether DefaultBulkScorer is affected by this. If it is, we 
could look into the performance benefit of moving the {{if (liveDocs == null)}} 
check out of the for loop here: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/Weight.java#L311-L317.
 This might also help the compiler figure out that the approximation and 
TwoPhaseIterator's matches run in sequence?
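To make the suggestion concrete, the following simplified Java sketch shows what hoisting a loop-invariant `liveDocs == null` check looks like. The `LiveDocs` interface stands in for Lucene's `Bits`, and iterating over an `int[]` of doc IDs is an assumption for illustration; this is not the actual `DefaultBulkScorer` code.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch: hoisting a loop-invariant null check out of a collection loop. */
final class LiveDocsLoopSketch {
  private LiveDocsLoopSketch() {}

  /** Stand-in for Lucene's Bits: true if the doc is live (not deleted). */
  interface LiveDocs {
    boolean get(int doc);
  }

  /** Before: the null check is re-evaluated for every document. */
  static void collectChecked(int[] docs, LiveDocs liveDocs, IntConsumer collector) {
    for (int doc : docs) {
      if (liveDocs == null || liveDocs.get(doc)) {
        collector.accept(doc);
      }
    }
  }

  /** After: the check runs once, and each loop body has a single fixed shape. */
  static void collectHoisted(int[] docs, LiveDocs liveDocs, IntConsumer collector) {
    if (liveDocs == null) {
      for (int doc : docs) {
        collector.accept(doc);
      }
    } else {
      for (int doc : docs) {
        if (liveDocs.get(doc)) {
          collector.accept(doc);
        }
      }
    }
  }
}
```

Both methods collect the same documents; the hoisted version only gives each loop a fixed shape, which the JIT may find easier to optimize. Whether it helps in practice would need to be measured.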


[jira] [Commented] (LUCENE-10639) WANDScorer performs better without two-phase

2022-07-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561751#comment-17561751
 ] 

Adrien Grand commented on LUCENE-10639:
---

I suspected there was some overhead to two-phase iteration, but not this much. 
Two-phase iteration isn't meant to improve the performance of queries on their 
own, but of queries that are combined with other queries through conjunctions: 
conjunctions make sure to reach agreement across approximations before they 
proceed with the match phase. This is the feature that makes Lucene perform 
better than other search libraries on the query `+"the who" +uk` at 
https://tantivy-search.github.io/bench/, because Lucene makes sure that 
documents contain all of "the", "who" and "uk" before it starts checking 
positions. I would also expect two-phase iteration to help on [AndMedOrHighHigh 
on nightly benchmarks|https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html], 
since WANDScorer will do less work to return the next candidate on or beyond 
the lead doc ID produced by the "Med" term.
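The following is a toy model of the agreement step described above. It is plain Java, not Lucene's actual ConjunctionDISI or TwoPhaseIterator classes, and all names are made up for illustration. Approximations are sorted doc-ID arrays that stand in for the postings of "the", "who" and "uk". The second phase is a predicate that stands in for the positions check. The expensive check only runs on documents that every approximation agrees on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy two-phase conjunction: leapfrog over approximations, confirm only on agreement. */
final class TwoPhaseConjunctionSketch {

  /**
   * @param approximations at least one sorted doc-ID array per clause (the cheap first phase)
   * @param matches the expensive second phase, e.g. a positions check
   * @param matchCalls single-element counter incremented on every second-phase call
   * @return doc IDs accepted by every approximation and by {@code matches}
   */
  static List<Integer> conjunction(int[][] approximations, IntPredicate matches, int[] matchCalls) {
    int[] pos = new int[approximations.length];
    List<Integer> hits = new ArrayList<>();
    int target = 0;
    outer:
    while (true) {
      for (int i = 0; i < approximations.length; i++) {
        int[] docs = approximations[i];
        // Advance this approximation to the first doc >= target.
        while (pos[i] < docs.length && docs[pos[i]] < target) {
          pos[i]++;
        }
        if (pos[i] == docs.length) {
          break outer; // One clause is exhausted, so no further doc can match.
        }
        if (docs[pos[i]] > target) {
          target = docs[pos[i]]; // Disagreement: restart with the higher doc as the new target.
          continue outer;
        }
      }
      // Every approximation is positioned on target: only now run the second phase.
      matchCalls[0]++;
      if (matches.test(target)) {
        hits.add(target);
      }
      target++;
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] the = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int[] who = {2, 4, 6, 8, 10};
    int[] uk = {4, 8, 11};
    int[] calls = new int[1];
    // Pretend the phrase "the who" only occurs in doc 8.
    List<Integer> hits = conjunction(new int[][] {the, who, uk}, doc -> doc == 8, calls);
    System.out.println(hits + " after " + calls[0] + " second-phase calls"); // [8] after 2 second-phase calls
  }
}
```

With this sample data, the positions stand-in runs on docs 4 and 8 only. Running it on every doc that contains both "the" and "who" would take five calls (2, 4, 6, 8, 10).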

> WANDScorer performs better without two-phase
> 
>
> Key: LUCENE-10639
> URL: https://issues.apache.org/jira/browse/LUCENE-10639
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Major
>
> After looking at the recent improvement [~jpountz] made to WAND scoring in 
> LUCENE-10634, which does additional work during match confirmation to avoid 
> confirming a match whose score wouldn't be competitive, I wanted to see how 
> performance would shift if we squashed the two-phase iteration completely and 
> only returned true matches (that were also known to be competitive by score) 
> in the "approximation" phase. I was a bit surprised to find that luceneutil 
> benchmarks (run with {{wikimediumall}}) improve significantly on some 
> disjunction tasks and don't show significant regressions anywhere else.
> Note that I used LUCENE-10634 as a baseline, and built my candidate change on 
> top of that. The diff can be seen here: 
> [DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]
> A simple conclusion here might be that we shouldn't do two-phase iteration in 
> WANDScorer, but I'm pretty sure that's not right. I wonder if what's really 
> going on is that we're underestimating the cost of confirming a match? Right 
> now we just return the tail size as the cost. While the cost of confirming a 
> match is proportional to the tail size, the actual work involved can be quite 
> significant (having to advance tail iterators to new blocks and decompress 
> them). I wonder if the WAND second phase is being run too early on 
> approximate candidates, and if less expensive (and possibly even more 
> restrictive) second phases could or should be running first?
> I'm raising this here as more of a curiosity to see if it sparks ideas on how 
> to move forward. Again, I'm not proposing we do away with two-phase 
> iteration, but it seems we might be able to improve things. Maybe I'll 
> explore changing the cost heuristic next. Also, maybe there's some different 
> benchmarking that would be useful here that I may not be familiar with?
> Benchmark results on wikimediumall:
> {code:java}
> Task                      QPS baseline   StdDev QPS candidate   StdDev  Pct diff                 p-value
> HighTermTitleBDVSort             22.52  (18.9%)         21.66  (15.6%)     -3.8% ( -32% -  37%)    0.485
> Prefix3                           9.38   (9.2%)          9.09  (10.6%)     -3.1% ( -20% -  18%)    0.326
> HighTermMonthSort                25.37  (16.0%)         24.87  (17.1%)     -2.0% ( -30% -  37%)    0.710
> MedTermDayTaxoFacets              9.62   (4.2%)          9.51   (4.1%)     -1.2% (  -9% -   7%)    0.368
> TermDTSort                       74.69  (18.0%)         74.13  (18.2%)     -0.7% ( -31% -  43%)    0.897
> HighTermDayOfYearSort            52.64  (16.1%)         52.32  (15.4%)     -0.6% ( -27% -  36%)    0.903
> BrowseMonthTaxoFacets             8.64  (19.1%)          8.59  (19.8%)     -0.6% ( -33% -  47%)    0.926
> BrowseDateSSDVFacets              0.86   (9.5%)          0.86  (13.1%)     -0.4% ( -20% -  24%)    0.914
> PKLookup                        147.18   (3.9%)        146.66   (3.3%)     -0.3% (  -7% -   7%)    0.759
> BrowseDayOfYearSSDVFacets         3.47   (4.5%)          3.45   (4.8%)     -0.3% (  -9% -   9%)    0.822
> Wildcard                         36.36   (4.4%)         36.26   (5.2%)     -0.3% (  -9% -   9%)    0.866
> BrowseMonthSSDVFacets             4.15  (12.7%)          4.13

[jira] [Created] (LUCENE-10639) WANDScorer performs better without two-phase

2022-07-02 Thread Greg Miller (Jira)
Greg Miller created LUCENE-10639:


 Summary: WANDScorer performs better without two-phase
 Key: LUCENE-10639
 URL: https://issues.apache.org/jira/browse/LUCENE-10639
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Greg Miller


After looking at the recent improvement [~jpountz] made to WAND scoring in 
LUCENE-10634, which does additional work during match confirmation to avoid 
confirming a match whose score wouldn't be competitive, I wanted to see how 
performance would shift if we squashed the two-phase iteration completely and 
only returned true matches (that were also known to be competitive by score) in 
the "approximation" phase. I was a bit surprised to find that luceneutil 
benchmarks (run with {{wikimediumall}}) improve significantly on some 
disjunction tasks and don't show significant regressions anywhere else.

Note that I used LUCENE-10634 as a baseline, and built my candidate change on 
top of that. The diff can be seen here: 
[DIFF|https://github.com/gsmiller/lucene/compare/b2d46440998fe4a972e8cc8c948580111359ed0f..c5bab794c92dbc66e70f9389948c1bdfe9b45231]

A simple conclusion here might be that we shouldn't do two-phase iteration in 
WANDScorer, but I'm pretty sure that's not right. I wonder if what's really 
going on is that we're underestimating the cost of confirming a match? Right 
now we just return the tail size as the cost. While the cost of confirming a 
match is proportional to the tail size, the actual work involved can be quite 
significant (having to advance tail iterators to new blocks and decompress 
them). I wonder if the WAND second phase is being run too early on approximate 
candidates, and if less expensive (and possibly even more restrictive) second 
phases could or should be running first?

I'm raising this here as more of a curiosity to see if it sparks ideas on how 
to move forward. Again, I'm not proposing we do away with two-phase iteration, 
but it seems we might be able to improve things. Maybe I'll explore changing 
the cost heuristic next. Also, maybe there's some different benchmarking that 
would be useful here that I may not be familiar with?
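The cost question above can be illustrated with a back-of-the-envelope model in plain Java. The model assumes that second phases are tried in ascending order of their reported match cost, and that confirmation stops at the first phase that rejects a candidate. The "wand" and "other" phases, their costs and their selectivities are invented for illustration. They are not taken from luceneutil or from WANDScorer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model: what happens when a second phase under-reports its cost. */
final class SecondPhaseOrderingSketch {

  /** A hypothetical second-phase check: the cost it reports vs. the work it really does per call. */
  static final class Phase {
    final float reportedCost;
    final int actualWork;
    final IntPredicate matches;

    Phase(float reportedCost, int actualWork, IntPredicate matches) {
      this.reportedCost = reportedCost;
      this.actualWork = actualWork;
      this.matches = matches;
    }
  }

  /** Runs phases cheapest-reported-first on candidates 0..n-1; returns {matches, totalActualWork}. */
  static long[] confirmAll(List<Phase> phases, int numCandidates) {
    List<Phase> ordered = new ArrayList<>(phases);
    ordered.sort(Comparator.comparingDouble((Phase p) -> p.reportedCost));
    long matches = 0;
    long work = 0;
    for (int doc = 0; doc < numCandidates; doc++) {
      boolean confirmed = true;
      for (Phase p : ordered) {
        work += p.actualWork;
        if (!p.matches.test(doc)) {
          confirmed = false; // Stop at the first rejecting phase.
          break;
        }
      }
      if (confirmed) {
        matches++;
      }
    }
    return new long[] {matches, work};
  }

  /** "wand" really costs 20 units and keeps half the docs; "other" costs 5 and keeps 10%. */
  static long[] demo(float wandReportedCost) {
    Phase wand = new Phase(wandReportedCost, 20, doc -> doc % 2 == 0);
    Phase other = new Phase(5f, 5, doc -> doc % 10 == 0);
    return confirmAll(List.of(wand, other), 1000);
  }

  public static void main(String[] args) {
    long[] under = demo(2f); // Reported cost far below the real work: "wand" runs first.
    long[] fair = demo(20f); // Reported cost matches the real work: the cheaper, more selective phase runs first.
    System.out.println("under-reported: matches=" + under[0] + " work=" + under[1]);
    System.out.println("fair:           matches=" + fair[0] + " work=" + fair[1]);
  }
}
```

With these made-up numbers, under-reporting the WAND cost makes confirmation more than three times as expensive (22,500 vs. 7,000 work units) for the same 100 matches.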

Benchmark results on wikimediumall:
{code:java}
Task                      QPS baseline   StdDev QPS candidate   StdDev  Pct diff                 p-value
HighTermTitleBDVSort             22.52  (18.9%)         21.66  (15.6%)     -3.8% ( -32% -  37%)    0.485
Prefix3                           9.38   (9.2%)          9.09  (10.6%)     -3.1% ( -20% -  18%)    0.326
HighTermMonthSort                25.37  (16.0%)         24.87  (17.1%)     -2.0% ( -30% -  37%)    0.710
MedTermDayTaxoFacets              9.62   (4.2%)          9.51   (4.1%)     -1.2% (  -9% -   7%)    0.368
TermDTSort                       74.69  (18.0%)         74.13  (18.2%)     -0.7% ( -31% -  43%)    0.897
HighTermDayOfYearSort            52.64  (16.1%)         52.32  (15.4%)     -0.6% ( -27% -  36%)    0.903
BrowseMonthTaxoFacets             8.64  (19.1%)          8.59  (19.8%)     -0.6% ( -33% -  47%)    0.926
BrowseDateSSDVFacets              0.86   (9.5%)          0.86  (13.1%)     -0.4% ( -20% -  24%)    0.914
PKLookup                        147.18   (3.9%)        146.66   (3.3%)     -0.3% (  -7% -   7%)    0.759
BrowseDayOfYearSSDVFacets         3.47   (4.5%)          3.45   (4.8%)     -0.3% (  -9% -   9%)    0.822
Wildcard                         36.36   (4.4%)         36.26   (5.2%)     -0.3% (  -9% -   9%)    0.866
BrowseMonthSSDVFacets             4.15  (12.7%)          4.13  (12.8%)     -0.3% ( -22% -  28%)    0.950
AndHighMedDayTaxoFacets          15.21   (2.7%)         15.18   (2.9%)     -0.2% (  -5% -   5%)    0.819
Fuzzy1                           68.33   (1.8%)         68.22   (2.0%)     -0.2% (  -3% -   3%)    0.783
OrHighMedDayTaxoFacets            2.90   (4.1%)          2.89   (4.0%)     -0.1% (  -7% -   8%)    0.930
MedPhrase                        52.81   (2.3%)         52.76   (1.8%)     -0.1% (  -4% -   4%)    0.878
Respell                          36.80   (1.9%)         36.78   (1.9%)     -0.1% (  -3% -   3%)    0.933
Fuzzy2                           63.06   (1.9%)         63.05   (2.1%)     -0.0% (  -3% -   4%)    0.971
LowPhrase                        74.60   (1.9%)         74.61   (1.8%)      0.0% (  -3% -   3%)    0.987
AndHighHighDayTaxoFacets          4.54   (2.3%)          4.55   (2.0%)      0.0% (  -4% -   4%)    0.960
HighPhrase                      353.13   (2.6%)        353.28   (2.5%)      0.0% (  -4% -   5%)    0.958
OrNotHighHigh                   761.72   (4.0%)        762.48   (3.6%)      0.1% (  -7% -   8%)    0.935
OrHighNotLow                   1129.94   (4.1%)       1131.56

[GitHub] [lucene-jira-archive] mocobeta commented on issue #12: Make a test set for improving markup conversion quality

2022-07-02 Thread GitBox


mocobeta commented on issue #12:
URL: 
https://github.com/apache/lucene-jira-archive/issues/12#issuecomment-1172897263

   Supported block elements in [Jira Text Formatting](https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all)
   
   - `-`: not included
   - `o`: correctly rendered
   - `x`: broken
   
   |Issue|Quote|Bullet list|Numbered list|Noformat/Code|Table|
   |-|-|-|-|-|-|
   |[LUCENE-10560](https://github.com/mocobeta/sandbox-lucene-10557/issues/10793)|-|-|-|o|-|
   |[LUCENE-10559](https://github.com/mocobeta/sandbox-lucene-10557/issues/10792)|-|-|-|-|x|
   |[LUCENE-10557](https://github.com/mocobeta/migration-test-2/issues/155)|x|x|x|o|x|
   |[LUCENE-10544](https://github.com/mocobeta/sandbox-lucene-10557/issues/10777)|x|x|-|o|-|



[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts

2022-07-02 Thread GitBox


gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r912362078


##
lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java:
##
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }
 
   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {

Review Comment:
   Thanks! I don't think this is quite what I was suggesting, but maybe it's 
OK? Or maybe I haven't had enough coffee yet and am misreading your change? But 
the idea I was proposing would be to facet on one-hour buckets, not a 
trailing window as you have. So each range would only be one hour "wide", but 
we'd cover an entire week's worth of them. This type of log analysis is 
something you might want to do in the real world, where you want to understand 
how many errors occurred in each hour over the last week. So you're essentially 
creating a histogram of error counts over a week-long time period (in one-hour 
granularity). And then you're answering the question, "what are the top-5 hours 
that contained the most errors?" Does this make sense?
   
   I think what you have here is a "trailing window" sort of thing that grows 
but the endpoint is always at "now"? I'm not sure this is all that interesting 
of an example, since the larger ranges will _always_ contain more errors, right? 
Sorry if I'm misreading your new change.
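The hourly histogram described in this comment can be sketched in plain Java, independent of Lucene's facet classes. It uses 168 one-hour buckets covering the week before `now`, counts each bucket separately, and then ranks them to answer "which five hours had the most errors?". The class name, the half-open `[start, end)` bucket convention and the tie-breaking rule (older hour first) are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hourly error histogram over the trailing week, plus a top-N query over its buckets. */
final class HourlyErrorHistogram {
  static final long HOUR_MS = TimeUnit.HOURS.toMillis(1);
  static final int HOURS_PER_WEEK = 7 * 24;

  /** Counts timestamps into one-hour buckets covering [now - 1 week, now); bucket 0 is the oldest hour. */
  static long[] countByHour(long[] timestampsMs, long nowMs) {
    long weekStart = nowMs - HOURS_PER_WEEK * HOUR_MS;
    long[] counts = new long[HOURS_PER_WEEK];
    for (long t : timestampsMs) {
      if (t < weekStart || t >= nowMs) {
        continue; // Outside the week being analyzed.
      }
      counts[(int) ((t - weekStart) / HOUR_MS)]++;
    }
    return counts;
  }

  /** Bucket indexes of the topN non-empty hours by descending count; ties go to the older hour. */
  static List<Integer> topHours(long[] counts, int topN) {
    List<Integer> hours = new ArrayList<>();
    for (int i = 0; i < counts.length; i++) {
      if (counts[i] > 0) {
        hours.add(i);
      }
    }
    hours.sort((a, b) -> counts[a] != counts[b] ? Long.compare(counts[b], counts[a]) : Integer.compare(a, b));
    return new ArrayList<>(hours.subList(0, Math.min(topN, hours.size())));
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long[] errors = {now - 1, now - 2, now - HOUR_MS - 1, now - 3 * HOUR_MS};
    System.out.println(topHours(countByHour(errors, now), 5)); // [167, 165, 166]
  }
}
```

Because these buckets are disjoint, the top-N ranking carries information. With nested trailing windows, as the comment points out, the widest window would always win.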




[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments

2022-07-02 Thread GitBox


mocobeta commented on issue #2:
URL: 
https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172890744

   > This is the first time I've learned of this cool `dust` command! What a great name too.
   
   Yes, it's one of my favorite CLI tools (a bit slower than `du` but fine for 
my usage).



[GitHub] [lucene-jira-archive] mikemccand commented on issue #2: Archive all Jira attachments

2022-07-02 Thread GitBox


mikemccand commented on issue #2:
URL: 
https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172888349

   Thanks @mocobeta! This is the first time I've learned of this cool `dust` command! What a great name too.



[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments

2022-07-02 Thread GitBox


mocobeta commented on issue #2:
URL: 
https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172884299

   Links that point to the files in the `attachments` branch are also fine.
   
   ![Screenshot from 2022-07-02 20-39-13](https://user-images.githubusercontent.com/1825333/176999240-b77f9e0f-ebeb-43b2-85d2-93ec2333e5e7.png)
   



[GitHub] [lucene-jira-archive] mocobeta commented on issue #2: Archive all Jira attachments

2022-07-02 Thread GitBox


mocobeta commented on issue #2:
URL: 
https://github.com/apache/lucene-jira-archive/issues/2#issuecomment-1172880262

   Instead of `main`, I committed all attachments (the latest snapshot) to the 
`attachments` branch.
   The download script seems to work fine. I'm closing this.



[GitHub] [lucene-jira-archive] mocobeta closed issue #2: Archive all Jira attachments

2022-07-02 Thread GitBox


mocobeta closed issue #2: Archive all Jira attachments
URL: https://github.com/apache/lucene-jira-archive/issues/2



[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #12: Make a test set for #1

2022-07-02 Thread GitBox


mocobeta opened a new issue, #12:
URL: https://github.com/apache/lucene-jira-archive/issues/12

   There are too many ways markup can break, so we can't be sure that an 
ad-hoc fix for one problem doesn't introduce another bug.
   Instead of choosing examples arbitrarily, it'd be good to have a carefully 
selected test set that includes various markup combinations.



[GitHub] [lucene-jira-archive] mocobeta commented on issue #11: Selectively create github cross-issue link

2022-07-02 Thread GitBox


mocobeta commented on issue #11:
URL: 
https://github.com/apache/lucene-jira-archive/issues/11#issuecomment-1172875774

   It's awkward but may not be a blocker. At the least, we need to check that this 
doesn't break hyperlinks.



[GitHub] [lucene-jira-archive] mocobeta opened a new issue, #11: Selectively create github cross-issue link

2022-07-02 Thread GitBox


mocobeta opened a new issue, #11:
URL: https://github.com/apache/lucene-jira-archive/issues/11

   Currently, we blindly create cross-issue links. For example, 
"LUCENE-.patch" is incorrectly replaced with "LUCENE- (#YYY).patch". 
   Ideally, only Jira issue keys should be replaced. As an ad-hoc solution, we 
could check whether there are spaces before/after issue keys, though this can 
produce false negatives: if an issue key is in a table cell, it may be 
surrounded by `|` rather than spaces.
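The space-or-`|` heuristic above can be expressed as one regular expression with lookarounds. This minimal Java sketch assumes that keys only carry the `LUCENE-` prefix. It treats a key as standalone when the key sits between start/end of text, whitespace, `|`, parentheses, common punctuation, or a sentence-ending period. The class name and the key-to-issue map are hypothetical and are not part of the migration scripts.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Appends a GitHub issue reference only after standalone Jira keys. */
final class IssueKeyLinker {
  // Standalone = preceded by start, whitespace, '|' or '(' and followed by end, whitespace,
  // '|', ')', ',', ';', ':', '!', '?' or a period that ends a sentence.
  // So "LUCENE-1234.patch" and ".../browse/LUCENE-1234" do not match.
  private static final Pattern STANDALONE_KEY =
      Pattern.compile("(?<![^\\s|(])(LUCENE-\\d+)(?=$|[\\s|),;:!?]|\\.(?:\\s|$))");

  static String link(String text, Map<String, Integer> keyToGithubIssue) {
    Matcher m = STANDALONE_KEY.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String key = m.group(1);
      Integer number = keyToGithubIssue.get(key);
      // Keys without a known mapping are written back unchanged.
      String replacement = number == null ? key : key + " (#" + number + ")";
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, Integer> hypothetical = Map.of("LUCENE-1498", 101, "LUCENE-10557", 102);
    System.out.println(link("See LUCENE-1498. Patch: LUCENE-1498.patch |LUCENE-10557|", hypothetical));
    // See LUCENE-1498 (#101). Patch: LUCENE-1498.patch |LUCENE-10557 (#102)|
  }
}
```

A key preceded by `/` is not treated as standalone. As a result, this sketch also leaves Jira browse URLs such as `https://issues.apache.org/jira/browse/LUCENE-1498` untouched.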

