[jira] [Commented] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search
[ https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929280#comment-16929280 ] Atri Sharma commented on LUCENE-8978: - Both the runs are for wikimedium2m with concurrent searches enabled > "Max Bottom" Based Early Termination For Concurrent Search > -- > > Key: LUCENE-8978 > URL: https://issues.apache.org/jira/browse/LUCENE-8978 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > When running a search concurrently, collectors which have collected the > number of hits requested locally i.e. their local priority queue is full can > then globally publish their bottom hit's score, and other collectors can then > use that score as the filter. If multiple collectors have full priority > queues, the maximum of all bottom scores will be considered as the global > bottom score. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search
[ https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929277#comment-16929277 ]

Atri Sharma commented on LUCENE-8978:

Run with propagating global minimum scores

||Task ('HighSpanNear', None)||P50 Base 63.640386||P50 Cmp 65.506369||Pct Diff 2.93207366781||P90 Base 68.082931||P90 Cmp 68.719427||Pct Diff 0.9348833704||P99 Base 98.544661||P99 Cmp 90.023821||Pct Diff -8.64667848418||P999 Base 98.544661||P999 Cmp 90.023821||Pct Diff -8.64667848418||P100 Base 120.85372||P100 Cmp 115.88214||Pct Diff -4.1137169795
||Task ('BrowseDayOfYearSSDVFacets', None)||P50 Base 25.833619||P50 Cmp 25.713409||Pct Diff -0.465323886677||P90 Base 28.549801||P90 Cmp 35.187339||Pct Diff 23.2489816654||P99 Base 34.097888||P99 Cmp 61.883127||Pct Diff 81.4866862135||P999 Base 34.097888||P999 Cmp 61.883127||Pct Diff 81.4866862135||P100 Base 214.305793||P100 Cmp 275.876451||Pct Diff 28.7302816868
||Task ('HighTermDayOfYearSort', 'DayOfYear')||P50 Base 4.600415||P50 Cmp 5.241538||Pct Diff 13.9361992342||P90 Base 54.632331||P90 Cmp 41.589045||Pct Diff -23.8746649855||P99 Base 140.777103||P99 Cmp 113.980705||Pct Diff -19.0346280957||P999 Base 140.777103||P999 Cmp 113.980705||Pct Diff -19.0346280957||P100 Base 212.259622||P100 Cmp 232.746881||Pct Diff 9.65198128922
||Task ('HighTerm', None)||P50 Base 0.707935||P50 Cmp 0.767744||Pct Diff 8.44837449766||P90 Base 2.481444||P90 Cmp 2.45366||Pct Diff -1.11967064338||P99 Base 2.819463||P99 Cmp 3.250364||Pct Diff 15.283087595||P999 Base 2.819463||P999 Cmp 3.250364||Pct Diff 15.283087595||P100 Base 5.743958||P100 Cmp 67.726682||Pct Diff 1079.09431093
||Task ('LowTerm', None)||P50 Base 0.662316||P50 Cmp 0.730491||Pct Diff 10.2934248908||P90 Base 1.215188||P90 Cmp 3.100794||Pct Diff 155.169899637||P99 Base 10.361147||P99 Cmp 8.509808||Pct Diff -17.8680893148||P999 Base 10.361147||P999 Cmp 8.509808||Pct Diff -17.8680893148||P100 Base 40.860202||P100 Cmp 43.746191||Pct Diff 7.06308059857
||Task ('AndHighLow', None)||P50 Base 1.000578||P50 Cmp 1.001309||Pct Diff 0.0730577726074||P90 Base 1.841719||P90 Cmp 1.74311||Pct Diff -5.35418269562||P99 Base 2.803872||P99 Cmp 7.829637||Pct Diff 179.243738659||P999 Base 2.803872||P999 Cmp 7.829637||Pct Diff 179.243738659||P100 Base 8.888941||P100 Cmp 26.286796||Pct Diff 195.724721314
||Task ('MedTerm', None)||P50 Base 0.702324||P50 Cmp 0.760572||Pct Diff 8.29360807832||P90 Base 1.789433||P90 Cmp 5.539351||Pct Diff 209.559005562||P99 Base 4.193817||P99 Cmp 14.309771||Pct Diff 241.211144883||P999 Base 4.193817||P999 Cmp 14.309771||Pct Diff 241.211144883||P100 Base 12.924386||P100 Cmp 69.040778||Pct Diff 434.190003301
||Task ('AndHighHigh', None)||P50 Base 8.716311||P50 Cmp 8.766923||Pct Diff 0.580658491878||P90 Base 22.896812||P90 Cmp 14.794421||Pct Diff -35.3865463891||P99 Base 76.380162||P99 Cmp 27.420985||Pct Diff -64.0993364219||P999 Base 76.380162||P999 Cmp 27.420985||Pct Diff -64.0993364219||P100 Base 192.565741||P100 Cmp 209.282678||Pct Diff 8.68115839982
||Task ('LowSloppyPhrase', None)||P50 Base 2.504543||P50 Cmp 2.496497||Pct Diff -0.321256213209||P90 Base 5.864326||P90 Cmp 17.025432||Pct Diff 190.322059176||P99 Base 17.061955||P99 Cmp 26.972014||Pct Diff 58.0827871132||P999 Base 17.061955||P999 Cmp 26.972014||Pct Diff 58.0827871132||P100 Base 28.311233||P100 Cmp 38.382978||Pct Diff 35.5750842784
||Task ('Wildcard', None)||P50 Base 4.622608||P50 Cmp 4.604615||Pct Diff -0.389239148117||P90 Base 13.902747||P90 Cmp 9.311908||Pct Diff -33.0210928819||P99 Base 212.077852||P99 Cmp 217.640103||Pct Diff 2.62274016242||P999 Base 212.077852||P999 Cmp 217.640103||Pct Diff 2.62274016242||P100 Base 256.120499||P100 Cmp 348.976972||Pct Diff 36.254994568
||Task ('HighSloppyPhrase', None)||P50 Base 40.021589||P50 Cmp 40.71495||Pct Diff 1.73246744401||P90 Base 41.349646||P90 Cmp 42.092274||Pct Diff 1.7959718446||P99 Base 43.137416||P99 Cmp 63.876883||Pct Diff 48.0776757699||P999 Base 43.137416||P999 Cmp 63.876883||Pct Diff 48.0776757699||P100 Base 889.481117||P100 Cmp 748.568262||Pct Diff -15.8421412559
||Task ('HighIntervalsOrdered', None)||P50 Base 17.065112||P50 Cmp 17.259941||Pct Diff 1.1416801718||P90 Base 18.188702||P90 Cmp 18.965857||Pct Diff 4.27273479988||P99 Base 18.315874||P99 Cmp 50.189647||Pct Diff 174.022670171||P999 Base 18.315874||P999 Cmp 50.189647||Pct Diff 174.022670171||P100 Base 302.418464||P100 Cmp 329.078973||Pct Diff 8.81576761133
||Task ('IntNRQ', None)||P50 Base 4.603492||P50 Cmp 5.553211||Pct Diff 20.6304040498||P90 Base 61.351885||P90 Cmp 61.48353||Pct Diff 0.214573684248||P99 Base 164.30294||P99 Cmp 163.250118||Pct Diff -0.640780986634||P999 Base 164.30294||P999 Cmp 163.250118||Pct Diff -0.640780986634||P100 Base 224.633428||P100 Cmp 224.348545||Pct Diff -0.126821285032
||Task ('BrowseDayOfYearTaxoFacets', None)||P50 Base 0.121258||P50 Cmp 0.121229||Pct Dif
[jira] [Commented] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search
[ https://issues.apache.org/jira/browse/LUCENE-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929030#comment-16929030 ]

Atri Sharma commented on LUCENE-8978:

||Task ('HighSpanNear', None)||P50 Base 11.060489||P50 Cmp 11.859525||Pct Diff 7.22423755405||P90 Base 15.826127||P90 Cmp 15.409751||Pct Diff -2.63094059589||P99 Base 17.0499||P99 Cmp 15.787728||Pct Diff -7.4028117467||P999 Base 17.0499||P999 Cmp 15.787728||Pct Diff -7.4028117467||P100 Base 369.613225||P100 Cmp 411.489965||Pct Diff 11.3298813916
||Task ('BrowseDayOfYearSSDVFacets', None)||P50 Base 26.011344||P50 Cmp 25.870156||Pct Diff -0.542793944058||P90 Base 27.199846||P90 Cmp 26.948776||Pct Diff -0.923056696718||P99 Base 50.355332||P99 Cmp 62.389047||Pct Diff 23.8975983715||P999 Base 50.355332||P999 Cmp 62.389047||Pct Diff 23.8975983715||P100 Base 265.301527||P100 Cmp 242.147844||Pct Diff -8.72730860686
||Task ('HighTermDayOfYearSort', 'DayOfYear')||P50 Base 4.855392||P50 Cmp 5.073211||Pct Diff 4.48612593999||P90 Base 91.615585||P90 Cmp 90.944365||Pct Diff -0.73264827158||P99 Base 139.177491||P99 Cmp 134.249562||Pct Diff -3.54075142797||P999 Base 139.177491||P999 Cmp 134.249562||Pct Diff -3.54075142797||P100 Base 413.078905||P100 Cmp 399.62664||Pct Diff -3.25658484061
||Task ('IntNRQ', None)||P50 Base 4.003539||P50 Cmp 4.117275||Pct Diff 2.84088652565||P90 Base 68.282386||P90 Cmp 67.613176||Pct Diff -0.980062413168||P99 Base 168.038952||P99 Cmp 162.14838||Pct Diff -3.50548008655||P999 Base 168.038952||P999 Cmp 162.14838||Pct Diff -3.50548008655||P100 Base 183.270534||P100 Cmp 180.209181||Pct Diff -1.67040109132
||Task ('LowTerm', None)||P50 Base 0.736588||P50 Cmp 0.802246||Pct Diff 8.91380255991||P90 Base 1.433158||P90 Cmp 9.655967||Pct Diff 573.754533694||P99 Base 9.67953||P99 Cmp 41.953847||Pct Diff 333.428554899||P999 Base 9.67953||P999 Cmp 41.953847||Pct Diff 333.428554899||P100 Base 57.585597||P100 Cmp 212.693297||Pct Diff 269.351553306
||Task ('AndHighLow', None)||P50 Base 1.54347||P50 Cmp 1.634274||Pct Diff 5.88310754339||P90 Base 2.434604||P90 Cmp 3.283687||Pct Diff 34.8756101608||P99 Base 3.374315||P99 Cmp 10.557446||Pct Diff 212.8767172||P999 Base 3.374315||P999 Cmp 10.557446||Pct Diff 212.8767172||P100 Base 41.81324||P100 Cmp 50.963314||Pct Diff 21.8831977622
||Task ('MedTerm', None)||P50 Base 0.89585||P50 Cmp 0.944529||Pct Diff 5.43383378914||P90 Base 1.404803||P90 Cmp 1.912129||Pct Diff 36.1136757254||P99 Base 1.721718||P99 Cmp 2.879041||Pct Diff 67.2190800119||P999 Base 1.721718||P999 Cmp 2.879041||Pct Diff 67.2190800119||P100 Base 57.913331||P100 Cmp 6.156178||Pct Diff -89.3700156878
||Task ('AndHighHigh', None)||P50 Base 9.298414||P50 Cmp 9.193083||Pct Diff -1.13278458025||P90 Base 17.43996||P90 Cmp 28.767063||Pct Diff 64.9491340576||P99 Base 29.387967||P99 Cmp 36.807631||Pct Diff 25.2472857343||P999 Base 29.387967||P999 Cmp 36.807631||Pct Diff 25.2472857343||P100 Base 109.854089||P100 Cmp 107.673127||Pct Diff -1.98532619027
||Task ('LowSloppyPhrase', None)||P50 Base 5.680762||P50 Cmp 5.562709||Pct Diff -2.0781190974||P90 Base 10.573096||P90 Cmp 8.783411||Pct Diff -16.9267828458||P99 Base 11.119536||P99 Cmp 10.675304||Pct Diff -3.99505878663||P999 Base 11.119536||P999 Cmp 10.675304||Pct Diff -3.99505878663||P100 Base 279.186923||P100 Cmp 253.176147||Pct Diff -9.3166168818
||Task ('Wildcard', None)||P50 Base 5.493537||P50 Cmp 5.347662||Pct Diff -2.65539305551||P90 Base 251.824224||P90 Cmp 242.036414||Pct Diff -3.88676269682||P99 Base 410.472925||P99 Cmp 411.681977||Pct Diff 0.294550974343||P999 Base 410.472925||P999 Cmp 411.681977||Pct Diff 0.294550974343||P100 Base 473.53058||P100 Cmp 467.82275||Pct Diff -1.20537727468
||Task ('HighSloppyPhrase', None)||P50 Base 11.728682||P50 Cmp 11.905609||Pct Diff 1.50849856787||P90 Base 78.56345||P90 Cmp 23.156508||Pct Diff -70.5250876839||P99 Base 165.526231||P99 Cmp 24.095868||Pct Diff -85.4428703811||P999 Base 165.526231||P999 Cmp 24.095868||Pct Diff -85.4428703811||P100 Base 239.459867||P100 Cmp 154.765063||Pct Diff -35.369101746
||Task ('HighIntervalsOrdered', None)||P50 Base 18.723819||P50 Cmp 19.239293||Pct Diff 2.75303878979||P90 Base 20.32576||P90 Cmp 20.59||Pct Diff 2.22377416638||P99 Base 21.323183||P99 Cmp 21.997505||Pct Diff 3.16238902982||P999 Base 21.323183||P999 Cmp 21.997505||Pct Diff 3.16238902982||P100 Base 365.748746||P100 Cmp 306.958046||Pct Diff -16.0740674146
||Task ('HighTerm', None)||P50 Base 0.982074||P50 Cmp 1.08638||Pct Diff 10.6209919008||P90 Base 1.859062||P90 Cmp 4.64411||Pct Diff 149.809312438||P99 Base 2.090176||P99 Cmp 25.399617||Pct Diff 1115.19034761||P999 Base 2.090176||P999 Cmp 25.399617||Pct Diff 1115.19034761||P100 Base 4.26937||P100 Cmp 54.324505||Pct Diff 1172.4243858
||Task ('BrowseDayOfYearTaxoFacets', None)||P50 Base 0.111432||P50 Cmp 0.116611||Pct Diff 4.64767750736||P90 Base 0.177541||P90 Cm
[jira] [Created] (LUCENE-8978) "Max Bottom" Based Early Termination For Concurrent Search
Atri Sharma created LUCENE-8978: --- Summary: "Max Bottom" Based Early Termination For Concurrent Search Key: LUCENE-8978 URL: https://issues.apache.org/jira/browse/LUCENE-8978 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma When running a search concurrently, collectors which have collected the number of hits requested locally i.e. their local priority queue is full can then globally publish their bottom hit's score, and other collectors can then use that score as the filter. If multiple collectors have full priority queues, the maximum of all bottom scores will be considered as the global bottom score. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
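The idea in the description can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions: `SharedMaxBottom` and `SliceCollector` are hypothetical names, scores are bare floats, and ties are always kept because the real tie-break on docID is not modelled here.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for the global bottom score shared by all collectors. */
final class SharedMaxBottom {
    // Float bits kept in an AtomicInteger; NEGATIVE_INFINITY means nothing was published yet.
    private final AtomicInteger bits =
        new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

    /** Publishes a local bottom score; the shared value only ever increases (max of bottoms). */
    void publish(float localBottom) {
        bits.accumulateAndGet(Float.floatToIntBits(localBottom),
            (a, b) -> Float.intBitsToFloat(a) >= Float.intBitsToFloat(b) ? a : b);
    }

    float get() {
        return Float.intBitsToFloat(bits.get());
    }
}

/** Hypothetical per-slice collector that keeps its top numHits scores in a min-heap. */
final class SliceCollector {
    private final int numHits;
    private final SharedMaxBottom shared;
    private final PriorityQueue<Float> queue = new PriorityQueue<>();
    int skipped; // hits rejected by the global filter alone

    SliceCollector(int numHits, SharedMaxBottom shared) {
        this.numHits = numHits;
        this.shared = shared;
    }

    void collect(float score) {
        // Some collector already holds numHits hits at or above the global bottom,
        // so a strictly lower score cannot reach the merged top numHits.
        if (score < shared.get()) {
            skipped++;
            return;
        }
        if (queue.size() < numHits) {
            queue.add(score);
        } else if (score > queue.peek()) {
            queue.poll();
            queue.add(score);
        } else {
            return; // not competitive locally
        }
        // Only a full local queue has a meaningful bottom to publish.
        if (queue.size() == numHits) {
            shared.publish(queue.peek());
        }
    }
}
```

In this sketch a collector with a partially filled queue never publishes, which matches the condition in the description that only full priority queues contribute to the global bottom.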
[jira] [Commented] (LUCENE-7282) search APIs should take advantage of index sort by default
[ https://issues.apache.org/jira/browse/LUCENE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926845#comment-16926845 ]

Atri Sharma commented on LUCENE-7282:

I think LUCENE-7714 does a similar thing for range queries. However, I don’t think we do this optimisation for exact queries yet (I might be mistaken, though). [~jtibshirani], any thoughts here?

> search APIs should take advantage of index sort by default
>
> Key: LUCENE-7282
> URL: https://issues.apache.org/jira/browse/LUCENE-7282
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Priority: Major
>
> Spinoff from LUCENE-6766, where we made it very easy to have Lucene sort
> documents in the index (at merge time).
> An index-time sort is powerful because if you then search that index by the
> same sort (or by a "prefix" of it), you can early-terminate per segment once
> you've collected enough hits. But doing this by default would mean accepting
> an approximate hit count, and could not be used in cases that need to see
> every hit, e.g. if you are also faceting.
> Separately, `TermQuery` on the leading sort field can be very fast since we
> can advance to the first docID, and only match to the last docID for the
> requested value. This would not be approximate, and should be lower risk /
> easier.
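The exact-match case in the quoted description (advance to the first docID and stop at the last docID for the requested value) comes down to two binary searches over a segment whose documents are sorted by the leading sort field. The sketch below is not Lucene code: `SortedFieldRange` is a hypothetical helper, and the per-document values are assumed to be available as a plain `long[]` in docID order.

```java
/** Hypothetical helper: docID range of an exact value in a segment sorted by that field. */
final class SortedFieldRange {

    /**
     * Returns {firstDoc, endDocExclusive} for documents whose value equals target.
     * The range is empty (both entries equal) when the value does not occur.
     */
    static int[] matchingDocRange(long[] sortedValues, long target) {
        int first = bound(sortedValues, target, false);
        int end = bound(sortedValues, target, true);
        return new int[] {first, end};
    }

    // upper == false: first index with value >= target
    // upper == true:  first index with value >  target
    private static int bound(long[] values, long target, boolean upper) {
        int lo = 0;
        int hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < target || (upper && values[mid] == target)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Because every document in the returned range matches, the result is exact, which is the property the description contrasts with the approximate hit counts of early termination.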
[jira] [Created] (LUCENE-8974) Shared Bottom Score Based Early Termination For Concurrent Search
Atri Sharma created LUCENE-8974: --- Summary: Shared Bottom Score Based Early Termination For Concurrent Search Key: LUCENE-8974 URL: https://issues.apache.org/jira/browse/LUCENE-8974 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma Following up to LUCENE-8939, post collection of numHits, we should share a bottom score which can be used to globally filter hits and choose competitive hits -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8970) TopFieldCollector(s) Should Prepopulate Sentinel Objects
[ https://issues.apache.org/jira/browse/LUCENE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926480#comment-16926480 ]

Atri Sharma commented on LUCENE-8970:

I did a prototype of this. It is a bit hairy since, unlike TopDocsCollector, TopFieldCollector does not directly perform comparisons against the bottom but instead uses FieldComparator to do the job. The problem is that a FieldComparator could maintain its own internal queue, which needs to be filled with sentinel values accordingly if the hit queue is prepopulated. This works well with straightforward implementations, but for comparators like RelevanceComparator, which do not use the passed-in slot but instead depend on the presence of the scorer instance to generate the doc to be placed, this can be an issue.

I wonder if it is worth exposing a prePopulate API in FieldComparator which does what it advertises: allows prepopulating the internal structure used for maintaining docID mappings.

> TopFieldCollector(s) Should Prepopulate Sentinel Objects
>
> Key: LUCENE-8970
> URL: https://issues.apache.org/jira/browse/LUCENE-8970
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
>
> We do not prepopulate the hit queue with sentinel values today, thus leading
> to extra checks and extra code.
[jira] [Created] (LUCENE-8970) TopFieldCollector(s) Should Prepopulate Sentinel Objects
Atri Sharma created LUCENE-8970:

Summary: TopFieldCollector(s) Should Prepopulate Sentinel Objects
Key: LUCENE-8970
URL: https://issues.apache.org/jira/browse/LUCENE-8970
Project: Lucene - Core
Issue Type: Improvement
Reporter: Atri Sharma

We do not prepopulate the hit queue with sentinel values today, thus leading to extra checks and extra code.
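To illustrate why prepopulating with sentinels removes checks, here is a minimal sketch of a score queue that starts out full of `Float.NEGATIVE_INFINITY` entries. `SentinelHitQueue` is a hypothetical class, not Lucene's hit queue, and it assumes `numHits >= 1` and that real scores are never negative infinity.

```java
import java.util.Arrays;

/** Hypothetical fixed-size min-heap of scores that is prefilled with sentinel values. */
final class SentinelHitQueue {
    private final float[] heap; // heap[0] is always the current bottom

    SentinelHitQueue(int numHits) {
        heap = new float[numHits];
        // An array of identical values already satisfies the heap property.
        Arrays.fill(heap, Float.NEGATIVE_INFINITY);
    }

    float bottom() {
        return heap[0];
    }

    /** One comparison against the bottom; there is no separate branch for a queue that is not yet full. */
    boolean insert(float score) {
        if (score <= heap[0]) {
            return false;
        }
        heap[0] = score; // overwrite the bottom (a sentinel or the weakest real hit)
        siftDown();
        return true;
    }

    // Restores the min-heap property after the root was replaced.
    private void siftDown() {
        int i = 0;
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int smallest = i;
            if (left < heap.length && heap[left] < heap[smallest]) smallest = left;
            if (right < heap.length && heap[right] < heap[smallest]) smallest = right;
            if (smallest == i) return;
            float tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;
        }
    }
}
```

The collect path is the same before and after the queue holds `numHits` real hits, since a sentinel at the bottom loses to any real score.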
[jira] [Commented] (LUCENE-8963) Allow Collectors To "Publish" If They Can Be Used In Concurrent Search
[ https://issues.apache.org/jira/browse/LUCENE-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922322#comment-16922322 ] Atri Sharma commented on LUCENE-8963: - Yeah, I agree. My only gripe is that in case a collector is not really reducible or has some semantic constraints against concurrency, we do not provide any defense against getting into an unknown state. Maybe it is not an engine problem but more of a user issue – but I wanted to raise this point and see if we have any thoughts about this. > Allow Collectors To "Publish" If They Can Be Used In Concurrent Search > -- > > Key: LUCENE-8963 > URL: https://issues.apache.org/jira/browse/LUCENE-8963 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > There is an implied assumption today that all we need to run a query > concurrently is a CollectorManager implementation. While that is true, there > might be some corner cases where a Collector's semantics do not allow it to > be concurrently executed (think of ES's aggregates). If a user manages to > write a CollectorManager with a Collector that is not really concurrent > friendly, we could end up in an undefined state. > > This Jira is more of a rhetorical discussion, and to explore if we should > allow Collectors to implement an API which simply returns a boolean > signifying if a Collector is parallel ready or not. The default would be > true, until a Collector explicitly overrides it? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8963) Allow Collectors To "Publish" If They Can Be Used In Concurrent Search
Atri Sharma created LUCENE-8963: --- Summary: Allow Collectors To "Publish" If They Can Be Used In Concurrent Search Key: LUCENE-8963 URL: https://issues.apache.org/jira/browse/LUCENE-8963 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma There is an implied assumption today that all we need to run a query concurrently is a CollectorManager implementation. While that is true, there might be some corner cases where a Collector's semantics do not allow it to be concurrently executed (think of ES's aggregates). If a user manages to write a CollectorManager with a Collector that is not really concurrent friendly, we could end up in an undefined state. This Jira is more of a rhetorical discussion, and to explore if we should allow Collectors to implement an API which simply returns a boolean signifying if a Collector is parallel ready or not. The default would be true, until a Collector explicitly overrides it? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
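The boolean API floated above could look roughly like the following. Everything here is hypothetical: `SketchCollector`, `isParallelReady` and `ConcurrencyGuard` are illustrative names rather than Lucene interfaces, and the guard simply refuses concurrency if any collector opts out.

```java
import java.util.List;

/** Hypothetical stand-in for a collector; not Lucene's Collector interface. */
interface SketchCollector {
    void collect(int doc);

    /** Proposed hook: true by default, until a collector explicitly overrides it. */
    default boolean isParallelReady() {
        return true;
    }
}

/** Hypothetical check a searcher could run before fanning a query out across slices. */
final class ConcurrencyGuard {
    static boolean canSearchConcurrently(List<? extends SketchCollector> collectors) {
        // A single collector that opts out forces the sequential path.
        return collectors.stream().allMatch(SketchCollector::isParallelReady);
    }
}
```

A default of `true` keeps existing collectors unchanged, so only collectors with semantic constraints against concurrency need to override the method.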
[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present
[ https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918358#comment-16918358 ] Atri Sharma commented on LUCENE-8403: - David, sorry for the delay in response – this somehow was misplaced by my inbox. I get a NullPointerException when CheckIndex tries to validate term vectors. I understand the approaches – your approach seems to be a longer term solution (I am not sure of the complexity implications though). How do you suggest we approach this? > Support 'filtered' term vectors - don't require all terms to be present > --- > > Key: LUCENE-8403 > URL: https://issues.apache.org/jira/browse/LUCENE-8403 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Braun >Priority: Minor > Attachments: LUCENE-8403.patch > > > The genesis of this was a conversation and idea from [~dsmiley] several years > ago. > In order to optimize term vector storage, we may not actually need all tokens > to be present in the term vectors - and if so, ideally our codec could just > opt not to store them. > I attempted to fork the standard codec and override the TermVectorsFormat and > TermVectorsWriter to ignore storing certain Terms within a field. This > worked, however, CheckIndex checks that the terms present in the standard > postings are also present in the TVs, if TVs enabled. So this then doesn't > work as 'valid' according to CheckIndex. > Can the TermVectorsFormat be made in such a way to support configuration of > tokens that should not be stored (benefits: less storage, more optimal > retrieval per doc)? Is this valuable to the wider community? Is there a way > we can design this to not break CheckIndex's contract while at the same time > lessening storage for unneeded tokens? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8958) Add Shared Count Based Concurrent Early Termination For TopScoreDocCollector
Atri Sharma created LUCENE-8958: --- Summary: Add Shared Count Based Concurrent Early Termination For TopScoreDocCollector Key: LUCENE-8958 URL: https://issues.apache.org/jira/browse/LUCENE-8958 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma LUCENE-8939 implements a shared count early termination collector manager for indices sorted by non relevance fields. This Jira tracks efforts for implementing the same for TopScoreDocCollector when the index is sorted by relevance -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present
[ https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916404#comment-16916404 ] Atri Sharma commented on LUCENE-8403: - Thanks for reviewing, David. I did notice a CheckHits breakage on this patch – I was hoping to get some early feedback on the patch and then seek advice to solve the open problems. Does it make sense for me to adapt the patch to support pattern based filtering? RE: CheckHits fix, how about Hoss's idea to allow the TermVector codec to publish which terms are available? > Support 'filtered' term vectors - don't require all terms to be present > --- > > Key: LUCENE-8403 > URL: https://issues.apache.org/jira/browse/LUCENE-8403 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Braun >Priority: Minor > Attachments: LUCENE-8403.patch > > > The genesis of this was a conversation and idea from [~dsmiley] several years > ago. > In order to optimize term vector storage, we may not actually need all tokens > to be present in the term vectors - and if so, ideally our codec could just > opt not to store them. > I attempted to fork the standard codec and override the TermVectorsFormat and > TermVectorsWriter to ignore storing certain Terms within a field. This > worked, however, CheckIndex checks that the terms present in the standard > postings are also present in the TVs, if TVs enabled. So this then doesn't > work as 'valid' according to CheckIndex. > Can the TermVectorsFormat be made in such a way to support configuration of > tokens that should not be stored (benefits: less storage, more optimal > retrieval per doc)? Is this valuable to the wider community? Is there a way > we can design this to not break CheckIndex's contract while at the same time > lessening storage for unneeded tokens? 
[jira] [Commented] (LUCENE-8403) Support 'filtered' term vectors - don't require all terms to be present
[ https://issues.apache.org/jira/browse/LUCENE-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915472#comment-16915472 ] Atri Sharma commented on LUCENE-8403: - Any thoughts on this one? > Support 'filtered' term vectors - don't require all terms to be present > --- > > Key: LUCENE-8403 > URL: https://issues.apache.org/jira/browse/LUCENE-8403 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Braun >Priority: Minor > Attachments: LUCENE-8403.patch > > > The genesis of this was a conversation and idea from [~dsmiley] several years > ago. > In order to optimize term vector storage, we may not actually need all tokens > to be present in the term vectors - and if so, ideally our codec could just > opt not to store them. > I attempted to fork the standard codec and override the TermVectorsFormat and > TermVectorsWriter to ignore storing certain Terms within a field. This > worked, however, CheckIndex checks that the terms present in the standard > postings are also present in the TVs, if TVs enabled. So this then doesn't > work as 'valid' according to CheckIndex. > Can the TermVectorsFormat be made in such a way to support configuration of > tokens that should not be stored (benefits: less storage, more optimal > retrieval per doc)? Is this valuable to the wider community? Is there a way > we can design this to not break CheckIndex's contract while at the same time > lessening storage for unneeded tokens? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8950) FieldComparators Should Not Maintain Implicit PQs
[ https://issues.apache.org/jira/browse/LUCENE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907093#comment-16907093 ] Atri Sharma commented on LUCENE-8950: - {quote}you would like to introduce a sub class of FieldComparator that hides the fact that it maintains an implicit PQ, and make simple comparators extend this sub class instead of FieldComparator directly? {quote} Yes, exactly. Thanks for validating – I will work on a PR now. > FieldComparators Should Not Maintain Implicit PQs > - > > Key: LUCENE-8950 > URL: https://issues.apache.org/jira/browse/LUCENE-8950 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > While doing some perf tests, I realised that FieldComparators inherently > maintain implicit priority queues for maintaining the sorted order of > documents for the given sort order. This is wasteful especially in the case > of a multi feature sort order and a large number of hits requested. > > We should change this to have FieldComparators maintain only the top and > bottom values, and use them as barriers to compare -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8950) FieldComparators Should Not Maintain Implicit PQs
[ https://issues.apache.org/jira/browse/LUCENE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907075#comment-16907075 ]

Atri Sharma commented on LUCENE-8950:

{quote}This looks like a duplicate of LUCENE-8878?{quote}

Not necessarily. LUCENE-8878 targets refactoring the API to be simpler, whereas this Jira only targets removing the requirement that FieldComparators maintain their own priority queues. I believe this Jira complements LUCENE-8878.

{quote}I think all of us agree on the fact that it would be nice to have a simpler FieldComparator API. The challenge is that we don't want to trade too much efficiency. For instance the API you are proposing wouldn't work well with geo-distance sorting since it would require computing the actual distance for every new document, while the current implementation tries to be smart to first check a bounding box, and then compute a sort key that compares like the actual distance but is much cheaper to compute{quote}

Agreed, that is precisely why I suggested deprecating compare(slot, slot) instead of removing it completely. The idea is that comparators that require access to an internal PQ for whatever reason are free to keep one, but it should not be mandatory, and future comparators should not take on this dependency without understanding the tradeoffs.

> FieldComparators Should Not Maintain Implicit PQs
>
> Key: LUCENE-8950
> URL: https://issues.apache.org/jira/browse/LUCENE-8950
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
>
> While doing some perf tests, I realised that FieldComparators inherently
> maintain implicit priority queues for maintaining the sorted order of
> documents for the given sort order. This is wasteful especially in the case
> of a multi feature sort order and a large number of hits requested.
>
> We should change this to have FieldComparators maintain only the top and
> bottom values, and use them as barriers to compare
[jira] [Commented] (LUCENE-8950) FieldComparators Should Not Maintain Implicit PQs
[ https://issues.apache.org/jira/browse/LUCENE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906978#comment-16906978 ]

Atri Sharma commented on LUCENE-8950:

I confess I do not have a very clean idea as to how this can be implemented: the typical usages of FieldComparator mandate that the user maintain a list of slots into the FieldComparator, which can implicitly be as bad in terms of size as the queue itself. FieldComparator provides a convenient API to allow comparisons between two values of the type maintained in the queue, which can form the basis of this change.

Here is the first cut of the proposal that I have in mind:

1) Deprecate compare(slot, slot) so that new implementations do not depend on this method, but rather use compare(T val, T val).
2) Start with some comparators (Numeric comparators?), get rid of the implicit priority queue and make the user maintain those values.
3) Make Numeric comparators track only the top and bottom values, as needed.

Note that I am treating NumericComparators as the starting point/example, but the approach should extend to other comparators as well.

With [https://github.com/apache/lucene-solr/pull/831], getting values out of leaf comparators should be easy, so the logical step after this PR is to depend on compare(val, val) more than we rely on compare(slot, slot).

Happy to receive feedback and alternate proposals.

> FieldComparators Should Not Maintain Implicit PQs
>
> Key: LUCENE-8950
> URL: https://issues.apache.org/jira/browse/LUCENE-8950
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Atri Sharma
> Priority: Major
>
> While doing some perf tests, I realised that FieldComparators inherently
> maintain implicit priority queues for maintaining the sorted order of
> documents for the given sort order. This is wasteful especially in the case
> of a multi feature sort order and a large number of hits requested.
>
> We should change this to have FieldComparators maintain only the top and
> bottom values, and use them as barriers to compare
[jira] [Created] (LUCENE-8950) FieldComparators Should Not Maintain Implicit PQs
Atri Sharma created LUCENE-8950: --- Summary: FieldComparators Should Not Maintain Implicit PQs Key: LUCENE-8950 URL: https://issues.apache.org/jira/browse/LUCENE-8950 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma While doing some perf tests, I realised that FieldComparators inherently maintain implicit priority queues for maintaining the sorted order of documents for the given sort order. This is wasteful especially in the case of a multi feature sort order and a large number of hits requested. We should change this to have FieldComparators maintain only the top and bottom values, and use them as barriers to compare
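The LUCENE-8950 idea (a comparator that keeps only a bottom value, with the caller owning the queue) can be sketched roughly as below. This is only an illustration under stated assumptions: `BottomOnlyLongComparator` and `CallerOwnedQueueSketch` are hypothetical names, not Lucene classes, and the sketch assumes a single ascending numeric sort key.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical comparator that remembers only the current bottom value (not a Lucene class). */
final class BottomOnlyLongComparator {
    private long bottom;
    private boolean hasBottom;

    /** Value-based comparison, in the spirit of compare(T val, T val). */
    int compare(long a, long b) {
        return Long.compare(a, b);
    }

    void setBottom(long value) {
        bottom = value;
        hasBottom = true;
    }

    /** For an ascending sort, a candidate is competitive if it sorts before the bottom. */
    boolean isCompetitive(long candidate) {
        return !hasBottom || compare(candidate, bottom) < 0;
    }
}

final class CallerOwnedQueueSketch {
    /** Returns the numHits smallest values in ascending order. */
    static long[] topN(long[] values, int numHits) {
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be positive");
        }
        BottomOnlyLongComparator cmp = new BottomOnlyLongComparator();
        // Max-heap owned by the caller: the head is the worst value currently kept.
        PriorityQueue<Long> queue = new PriorityQueue<>(Comparator.reverseOrder());
        for (long v : values) {
            if (queue.size() < numHits) {
                queue.add(v);
                if (queue.size() == numHits) {
                    cmp.setBottom(queue.peek()); // queue just became full
                }
            } else if (cmp.isCompetitive(v)) {
                queue.poll();                    // evict the current worst value
                queue.add(v);
                cmp.setBottom(queue.peek());     // publish the new barrier
            }
        }
        return queue.stream().mapToLong(Long::longValue).sorted().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topN(new long[] {5, 1, 9, 3, 7, 2}, 3)));
    }
}
```

The comparator holds one value regardless of how many hits are requested; all per-hit storage lives in the queue that the caller already maintains.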
[jira] [Created] (LUCENE-8949) Allow LeafFieldComparators to publish feature values
Atri Sharma created LUCENE-8949: --- Summary: Allow LeafFieldComparators to publish feature values Key: LUCENE-8949 URL: https://issues.apache.org/jira/browse/LUCENE-8949 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma We allow LeafFieldComparators to only accept a docID, get the equivalent feature value(s) and compare against the bottom/top of the values set for the comparator. This mandates that the values being compared against the bottom/top should originate from the same comparator. This does not allow use cases such as cross comparator value comparisons i.e. if a user wanted to compute the "global" minimum across multiple comparators. FieldComparators expose an API to get the feature value corresponding to a docID. We should let LeafFieldComparators do the same. A new comparison method is not required since the parent FieldComparator's compare method can be used once the values are retrieved.
[jira] [Commented] (LUCENE-8213) Cache costly subqueries asynchronously
[ https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900160#comment-16900160 ] Atri Sharma commented on LUCENE-8213: - I raised a PR for the same. The performance numbers from the newly enhanced luceneutil for wikimedium10m are:
Latencies:
||Task ('Wildcard', None)||P50 Base 2.045201||P50 Cmp 2.089539||Pct Diff 2.16790427933||P90 Base 18.845334||P90 Cmp 35.346911||Pct Diff 87.5631973411||P99 Base 83.02344||P99 Cmp 48.300884||Pct Diff -41.8225937157||P999 Base 83.02344||P999 Cmp 48.300884||Pct Diff -41.8225937157||P100 Base 249.902876||P100 Cmp 87.512667||Pct Diff -64.9813285862||
||Task ('HighTermDayOfYearSort', 'DayOfYear')||P50 Base 4.295828||P50 Cmp 4.727759||Pct Diff 10.0546623375||P90 Base 9.037488||P90 Cmp 55.639159||Pct Diff 515.648496573||P99 Base 82.149576||P99 Cmp 81.261365||Pct Diff -1.08121191033||P999 Base 82.149576||P999 Cmp 81.261365||Pct Diff -1.08121191033||P100 Base 86.642014||P100 Cmp 168.84768||Pct Diff 94.8796804285||
||Task ('MedSloppyPhrase', None)||P50 Base 9.18549||P50 Cmp 8.683321||Pct Diff -5.46698107559||P90 Base 29.233836||P90 Cmp 30.984274||Pct Diff 5.98771232075||P99 Base 34.303039||P99 Cmp 35.978633||Pct Diff 4.88468091705||P999 Base 34.303039||P999 Cmp 35.978633||Pct Diff 4.88468091705||P100 Base 181.426025||P100 Cmp 261.742214||Pct Diff 44.2693869306||
||Task ('OrHighHigh', None)||P50 Base 20.997779||P50 Cmp 16.938239||Pct Diff -19.3331875719||P90 Base 26.989668||P90 Cmp 29.711731||Pct Diff 10.0855742279||P99 Base 71.1345||P99 Cmp 72.914457||Pct Diff 2.50224152837||P999 Base 71.1345||P999 Cmp 72.914457||Pct Diff 2.50224152837||P100 Base 288.85441||P100 Cmp 203.02949||Pct Diff -29.7121723016||
||Task ('MedPhrase', None)||P50 Base 6.935508||P50 Cmp 6.676061||Pct Diff -3.74085070625||P90 Base 8.834132||P90 Cmp 7.366097||Pct Diff -16.6177616545||P99 Base 61.645788||P99 Cmp 59.423887||Pct Diff -3.60430302229||P999 Base 61.645788||P999 Cmp 59.423887||Pct Diff -3.60430302229||P100 Base 65.592528||P100 Cmp 63.493249||Pct Diff -3.20048496987||
||Task ('LowSpanNear', None)||P50 Base 23.256239||P50 Cmp 23.17936||Pct Diff -0.330573658105||P90 Base 33.890598||P90 Cmp 34.205568||Pct Diff 0.929372801271||P99 Base 34.958863||P99 Cmp 34.857876||Pct Diff -0.288873811485||P999 Base 34.958863||P999 Cmp 34.857876||Pct Diff -0.288873811485||P100 Base 96.937787||P100 Cmp 121.889403||Pct Diff 25.7398242442||
||Task ('Fuzzy2', None)||P50 Base 25.45292||P50 Cmp 25.25128||Pct Diff -0.792207730979||P90 Base 79.376572||P90 Cmp 106.649481||Pct Diff 34.3588899254||P99 Base 108.933154||P99 Cmp 122.051216||Pct Diff 12.0423044026||P999 Base 108.933154||P999 Cmp 122.051216||Pct Diff 12.0423044026||P100 Base 212.373308||P100 Cmp 209.138442||Pct Diff -1.52319800942||
||Task ('OrNotHighHigh', None)||P50 Base 1.903331||P50 Cmp 2.16024||Pct Diff 13.4978624317||P90 Base 4.890325||P90 Cmp 4.723459||Pct Diff -3.4121658581||P99 Base 102.556452||P99 Cmp 102.641448||Pct Diff 0.0828772820651||P999 Base 102.556452||P999 Cmp 102.641448||Pct Diff 0.0828772820651||P100 Base 226.783706||P100 Cmp 308.709148||Pct Diff 36.1249242483||
||Task ('OrHighNotLow', None)||P50 Base 1.434646||P50 Cmp 1.52378||Pct Diff 6.21296124619||P90 Base 3.905319||P90 Cmp 4.569729||Pct Diff 17.0129507986||P99 Base 6.321682||P99 Cmp 7.281513||Pct Diff 15.1831585328||P999 Base 6.321682||P999 Cmp 7.281513||Pct Diff 15.1831585328||P100 Base 7.720665||P100 Cmp 15.035781||Pct Diff 94.7472270847||
||Task ('BrowseMonthSSDVFacets', None)||P50 Base 93.940495||P50 Cmp 93.939183||Pct Diff -0.00139662879145||P90 Base 102.50354||P90 Cmp 98.604983||Pct Diff -3.80333888956||P99 Base 103.572854||P99 Cmp 106.785928||Pct Diff 3.10223564951||P999 Base 103.572854||P999 Cmp 106.785928||Pct Diff 3.10223564951||P100 Base 283.457123||P100 Cmp 244.054099||Pct Diff -13.9008762888||
||Task ('Fuzzy1', None)||P50 Base 26.559456||P50 Cmp 29.050383||Pct Diff 9.37868230434||P90 Base 159.424881||P90 Cmp 171.063113||Pct Diff 7.30013529068||P99 Base 339.7673||P99 Cmp 179.733118||Pct Diff -47.1011136151||P999 Base 339.7673||P999 Cmp 179.733118||Pct Diff -47.1011136151||P100 Base 417.349072||P100 Cmp 395.168736||Pct Diff -5.31457657105||
||Task ('HighSloppyPhrase', None)||P50 Base 9.489382||P50 Cmp 9.980939||Pct Diff 5.18007389733||P90 Base 14.424659||P90 Cmp 15.315198||Pct Diff 6.17372653315||P99 Base 37.046395||P99 Cmp 31.348423||Pct Diff -15.380638251||P999 Base 37.046395||P999 Cmp 31.348423||Pct Diff -15.380638251||P100 Base 51.797966||P100 Cmp 33.660774||Pct Diff -35.0152590934||
||Task ('OrNotHighMed', None)||P50 Base 1.605631||P50 Cmp 1.549948||Pct Diff -3.46798236955||P90 Base 16.030506||P90 Cmp 11.175798||Pct Diff -30.2841844169||P99 Base 63.933462||P99 Cmp 63.33348||Pct Diff -0.938447537848||P999 Base 63.933462||P999 Cmp 63.33348||Pct Diff -0.938447537848||P100 Base 176.946354||P100
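The Pct Diff columns in the LUCENE-8213 latency table appear to be the relative change of the Cmp latency against the Base latency. The comment does not state the formula; the one below is inferred from the numbers, so treat it as an assumption. `PctDiffSketch` is a hypothetical helper, not part of luceneutil.

```java
final class PctDiffSketch {
    /** Relative change of cmp versus base, in percent; negative means a lower Cmp latency. */
    static double pctDiff(double base, double cmp) {
        return (cmp - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // Wildcard P50 and P100 values taken from the table in the comment.
        System.out.println(pctDiff(2.045201, 2.089539));
        System.out.println(pctDiff(249.902876, 87.512667));
    }
}
```

Read this way, the Wildcard row shows a small regression at P50 and a large improvement at P100.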
[jira] [Created] (LUCENE-8946) LRUQueryCache#doCache Should Be More Verbose
Atri Sharma created LUCENE-8946: --- Summary: LRUQueryCache#doCache Should Be More Verbose Key: LUCENE-8946 URL: https://issues.apache.org/jira/browse/LUCENE-8946 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma doCache does not actually cache the query when it is invoked; the actual caching (or checks) happens only during scoring. doCache merely creates the caching weight wrapper around the original weight of the query. We should 1) rename the method and/or 2) update the documentation around the method to call out this behavior explicitly.
[jira] [Created] (LUCENE-8942) Tighten Up LRUQueryCache's Methods
Atri Sharma created LUCENE-8942: --- Summary: Tighten Up LRUQueryCache's Methods Key: LUCENE-8942 URL: https://issues.apache.org/jira/browse/LUCENE-8942 Project: Lucene - Core Issue Type: Improvement Environment: LRUQueryCache's methods have looser visibility than they need, and some take redundant parameters. Reporter: Atri Sharma
[jira] [Updated] (LUCENE-8929) Early Terminating CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma updated LUCENE-8929: Issue Type: Sub-task (was: Improvement) Parent: LUCENE-8940 > Early Terminating CollectorManager > -- > > Key: LUCENE-8929 > URL: https://issues.apache.org/jira/browse/LUCENE-8929 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Atri Sharma >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > We should have an early terminating collector manager which accurately tracks > hits across all of its collectors and determines when there are enough hits, > allowing all the collectors to abort. > The options for the same are: > 1) Shared total count : Global "scoreboard" where all collectors update their > current hit count. At the end of each document's collection, collector checks > if N > threshold, and aborts if true > 2) State Reporting Collectors: Collectors report their total number of counts > collected periodically using a callback mechanism, and get a proceed or abort > decision. > 1) has the overhead of synchronization in the hot path, 2) can collect > unnecessary hits before aborting. > I am planning to work on 2), unless objections -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8939) Shared Hit Count Early Termination
[ https://issues.apache.org/jira/browse/LUCENE-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma updated LUCENE-8939: Summary: Shared Hit Count Early Termination (was: Global Early Termination For Sorted Collections) > Shared Hit Count Early Termination > -- > > Key: LUCENE-8939 > URL: https://issues.apache.org/jira/browse/LUCENE-8939 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Atri Sharma >Priority: Major > > When collecting hits across sorted segments, it should be possible to > terminate early across all slices when enough hits have been collected > globally i.e. hit count > numHits AND hit count < totalHitsThreshold -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8940) Early Termination Across Slices
Atri Sharma created LUCENE-8940: --- Summary: Early Termination Across Slices Key: LUCENE-8940 URL: https://issues.apache.org/jira/browse/LUCENE-8940 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma This JIRA tracks efforts for global early termination when segments are sorted. The cases being chased are: 1) Sorted segments -- hit count > numHits but less than threshold 2) Sorted segments and sort key is non score -- use shared PQ 3) Sorted segments and sort key is score -- propagate global minimum score -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8939) Global Early Termination For Sorted Collections
[ https://issues.apache.org/jira/browse/LUCENE-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma updated LUCENE-8939: Issue Type: Sub-task (was: Improvement) Parent: LUCENE-8940 > Global Early Termination For Sorted Collections > --- > > Key: LUCENE-8939 > URL: https://issues.apache.org/jira/browse/LUCENE-8939 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Atri Sharma >Priority: Major > > When collecting hits across sorted segments, it should be possible to > terminate early across all slices when enough hits have been collected > globally i.e. hit count > numHits AND hit count < totalHitsThreshold -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8939) Global Early Termination For Sorted Collections
Atri Sharma created LUCENE-8939: --- Summary: Global Early Termination For Sorted Collections Key: LUCENE-8939 URL: https://issues.apache.org/jira/browse/LUCENE-8939 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma When collecting hits across sorted segments, it should be possible to terminate early across all slices when enough hits have been collected globally i.e. hit count > numHits AND hit count < totalHitsThreshold -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8936) Add SpanishMinimalStemFilter
[ https://issues.apache.org/jira/browse/LUCENE-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894336#comment-16894336 ] Atri Sharma commented on LUCENE-8936: - Hello Vinod! Welcome to the community. Thank you for your contribution. I would suggest following either of two approaches : 1) Attach a patch to this JIRA or 2) Open a pull request on the Lucene-Solr Github repository. Somebody will review your contribution soon and provide feedback. > Add SpanishMinimalStemFilter > > > Key: LUCENE-8936 > URL: https://issues.apache.org/jira/browse/LUCENE-8936 > Project: Lucene - Core > Issue Type: Improvement >Reporter: vinod kumar >Priority: Major > Attachments: LUCENE-8936.patch > > > SpanishMinimalStemmerFilter is less aggressive stemmer than > SpanishLightStemmerFilter > Ex: > input tokens -> output tokens > 1. camiseta niños -> *camiseta* and *nino* > 2. camisas -> camisa > *camisetas* and *camisas* are t-shirts and shirts respectively. > Stemming both of the tokens to *camis* will match both tokens and returns > both t-shirts and shirts for query camisas(shirts). > SpanishMinimalStemmerFilter will help handling these cases. > And importantly It will preserve gender context with tokens. > Ex: *niños* ,*niñas* *chicos* and *chicas* are stemmed to *nino*, *nina*, > *chico* and *chica* -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-13655) Cut Over Collections.unmodifiableSet usages to Set.*
Atri Sharma created SOLR-13655: -- Summary: Cut Over Collections.unmodifiableSet usages to Set.* Key: SOLR-13655 URL: https://issues.apache.org/jira/browse/SOLR-13655 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Atri Sharma
[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892643#comment-16892643 ] Atri Sharma commented on LUCENE-8929: - OK, so I have been working on this and am wondering what the definition (parameter) of a globally competitive hit should be. Should it be the largest of the worst accepted hits across all collectors, with all collectors using that as the minimum threshold when filtering further hits? > Early Terminating CollectorManager > -- > > Key: LUCENE-8929 > URL: https://issues.apache.org/jira/browse/LUCENE-8929 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > We should have an early terminating collector manager which accurately tracks > hits across all of its collectors and determines when there are enough hits, > allowing all the collectors to abort. > The options for the same are: > 1) Shared total count : Global "scoreboard" where all collectors update their > current hit count. At the end of each document's collection, collector checks > if N > threshold, and aborts if true > 2) State Reporting Collectors: Collectors report their total number of counts > collected periodically using a callback mechanism, and get a proceed or abort > decision. > 1) has the overhead of synchronization in the hot path, 2) can collect > unnecessary hits before aborting. > I am planning to work on 2), unless objections
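The question raised in this comment (use the largest of the per-collector worst accepted scores as a global threshold) can be illustrated with a small sketch. It assumes hits are ranked by score only, and that a collector publishes its bottom score only once its local queue is full. `GlobalBottomScoreSketch` and its methods are hypothetical names; ties are treated as competitive here, which is a simplification since real tie-breaking would involve doc IDs.

```java
import java.util.concurrent.atomic.DoubleAccumulator;

/** Hypothetical holder for the maximum of all published local bottom scores. */
final class GlobalBottomScoreSketch {
    // Lock-free running maximum; starts below every real score.
    private final DoubleAccumulator maxBottom =
        new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);

    /** Called by a collector whose local queue is full, with its worst accepted score. */
    void publishLocalBottom(float localBottomScore) {
        maxBottom.accumulate(localBottomScore);
    }

    float globalBottom() {
        return (float) maxBottom.get();
    }

    /** A hit scoring below the global bottom cannot enter the merged top hits. */
    boolean isGloballyCompetitive(float score) {
        return score >= maxBottom.get();
    }

    public static void main(String[] args) {
        GlobalBottomScoreSketch global = new GlobalBottomScoreSketch();
        global.publishLocalBottom(1.5f);
        global.publishLocalBottom(3.0f);
        global.publishLocalBottom(2.0f);
        System.out.println(global.globalBottom());
        System.out.println(global.isGloballyCompetitive(2.9f));
    }
}
```

The reasoning behind the maximum: a collector with a full queue and bottom score b has already seen the requested number of hits scoring at least b, so any hit scoring below b, on any slice, cannot make the merged result.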
[jira] [Created] (LUCENE-8931) TestTopFieldCollectorEarlyTermination Should Use CheckHits
Atri Sharma created LUCENE-8931: --- Summary: TestTopFieldCollectorEarlyTermination Should Use CheckHits Key: LUCENE-8931 URL: https://issues.apache.org/jira/browse/LUCENE-8931 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma TestTopFieldCollectorEarlyTermination invents a new way of checking equality of hits. That is redundant since CheckHits provides the same functionality and is the de facto standard now.
[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890760#comment-16890760 ] Atri Sharma commented on LUCENE-8929: - bq. So you need to collect each segment at least until {numHits} hits have been collected, or until the last collected hit was not competitive globally (whichever comes first) Yeah, sorry I was not clear. Per collector, we will collect until numHits hits are collected. I have opened a PR implementing the same: https://github.com/apache/lucene-solr/pull/803 Hoping the code gives more clarity > Early Terminating CollectorManager > -- > > Key: LUCENE-8929 > URL: https://issues.apache.org/jira/browse/LUCENE-8929 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > We should have an early terminating collector manager which accurately tracks > hits across all of its collectors and determines when there are enough hits, > allowing all the collectors to abort. > The options for the same are: > 1) Shared total count : Global "scoreboard" where all collectors update their > current hit count. At the end of each document's collection, collector checks > if N > threshold, and aborts if true > 2) State Reporting Collectors: Collectors report their total number of counts > collected periodically using a callback mechanism, and get a proceed or abort > decision. > 1) has the overhead of synchronization in the hot path, 2) can collect > unnecessary hits before aborting. > I am planning to work on 2), unless objections -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890749#comment-16890749 ] Atri Sharma commented on LUCENE-8929: - bq. OK, so if I understand correctly you are still collecting the first numHits hits as today, but you are trying to avoid collecting ${totalHitsThreshold-numHits} additional hits on every slice with this global counter? Yeah, exactly. The first numHits hits can be spread across all the involved collectors, but with the global counter, all collectors will abort once they realize that numHits number of hits have been collected globally, even if the total hit count per collector is, obviously, < numHits. > Early Terminating CollectorManager > -- > > Key: LUCENE-8929 > URL: https://issues.apache.org/jira/browse/LUCENE-8929 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > We should have an early terminating collector manager which accurately tracks > hits across all of its collectors and determines when there are enough hits, > allowing all the collectors to abort. > The options for the same are: > 1) Shared total count : Global "scoreboard" where all collectors update their > current hit count. At the end of each document's collection, collector checks > if N > threshold, and aborts if true > 2) State Reporting Collectors: Collectors report their total number of counts > collected periodically using a callback mechanism, and get a proceed or abort > decision. > 1) has the overhead of synchronization in the hot path, 2) can collect > unnecessary hits before aborting. > I am planning to work on 2), unless objections -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager
[ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890734#comment-16890734 ] Atri Sharma commented on LUCENE-8929: - {quote}What collector do you have in mind? Is it TopFieldCollector? {quote} Yes, that is the one. I did some tests, and am now inclined to go with 1), since that is a less invasive change and allows accurate termination with minimal overhead (< 3% degradation). This is because AtomicInteger is typically implemented with lock-free compare-and-swap instructions rather than a synchronization lock on modern hardware. > Early Terminating CollectorManager > -- > > Key: LUCENE-8929 > URL: https://issues.apache.org/jira/browse/LUCENE-8929 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > We should have an early terminating collector manager which accurately tracks > hits across all of its collectors and determines when there are enough hits, > allowing all the collectors to abort. > The options for the same are: > 1) Shared total count : Global "scoreboard" where all collectors update their > current hit count. At the end of each document's collection, collector checks > if N > threshold, and aborts if true > 2) State Reporting Collectors: Collectors report their total number of counts > collected periodically using a callback mechanism, and get a proceed or abort > decision. > 1) has the overhead of synchronization in the hot path, 2) can collect > unnecessary hits before aborting. > I am planning to work on 2), unless objections
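Option 1) from this comment, a shared hit count backed by an AtomicInteger, could look roughly like the sketch below. It is not the code from the PR: `SharedHitCountSketch` is a hypothetical name, each thread stands in for one slice's collector, and the "documents" are just loop iterations.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedHitCountSketch {
    /** Simulates slices collecting concurrently; returns how many hits were accepted in total. */
    static int run(int slices, int docsPerSlice, int threshold) {
        AtomicInteger globalHits = new AtomicInteger(); // the shared "scoreboard"
        AtomicInteger accepted = new AtomicInteger();   // bookkeeping for this demo only
        Thread[] workers = new Thread[slices];
        for (int i = 0; i < slices; i++) {
            workers[i] = new Thread(() -> {
                for (int doc = 0; doc < docsPerSlice; doc++) {
                    // One atomic increment per document; no lock is taken.
                    if (globalHits.incrementAndGet() > threshold) {
                        return; // enough hits were collected globally, abort this slice
                    }
                    accepted.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return accepted.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000, 100));
    }
}
```

Because `incrementAndGet` hands each caller a distinct value, exactly `threshold` hits are accepted across all slices, however the threads interleave, as long as the slices hold at least that many documents in total.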
[jira] [Created] (LUCENE-8929) Early Terminating CollectorManager
Atri Sharma created LUCENE-8929: --- Summary: Early Terminating CollectorManager Key: LUCENE-8929 URL: https://issues.apache.org/jira/browse/LUCENE-8929 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma We should have an early terminating collector manager which accurately tracks hits across all of its collectors and determines when there are enough hits, allowing all the collectors to abort. The options for the same are: 1) Shared total count : Global "scoreboard" where all collectors update their current hit count. At the end of each document's collection, collector checks if N > threshold, and aborts if true 2) State Reporting Collectors: Collectors report their total number of counts collected periodically using a callback mechanism, and get a proceed or abort decision. 1) has the overhead of synchronization in the hot path, 2) can collect unnecessary hits before aborting. I am planning to work on 2), unless objections -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889970#comment-16889970 ] Atri Sharma commented on LUCENE-8727: - bq. we will have to skip all these docs with smaller doc Ids even if they have the same scores as docs with higher doc Ids and should be selected instead. That should be avoidable: we will need a custom PQ implementation anyway if we decide to share the queue, so the PQ can tie-break the other way round on doc IDs. One advantage of sharing the PQ is that we can skip the merge step during the reduce call of the CollectorManager. I am hesitant to introduce a synchronized block into the collector-level collection mechanism, since it has the potential to blow up in our face and become a performance bottleneck. I wonder whether we should simply have both versions: sharing the PQ/min score, and a CollectorManager that provides callbacks which the dependent Collectors invoke at regular intervals. The former can work well with fewer slices, while the latter can work well with a large number of slices. > IndexSearcher#search(Query,int) should operate on a shared priority queue > when configured with an executor > -- > > Key: LUCENE-8727 > URL: https://issues.apache.org/jira/browse/LUCENE-8727 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > If IndexSearcher is configured with an executor, then the top docs for each > slice are computed separately before being merged once the top docs for all > slices are computed. With block-max WAND this is a bit of a waste of > resources: it would be better if an increase of the min competitive score > could help skip non-competitive hits on every slice and not just the current > one.
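The shared priority queue variant discussed in this thread could be sketched as follows. This is a deliberately simplified illustration, not a proposed implementation: `SharedTopScoresQueue` is a hypothetical class, it stores bare scores instead of (doc, score) pairs, and it ignores doc ID tie-breaking entirely.

```java
import java.util.PriorityQueue;

/** Hypothetical bounded queue shared by all collectors; every access is synchronized. */
final class SharedTopScoresQueue {
    private final PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap: head is the worst kept score
    private final int numHits;

    SharedTopScoresQueue(int numHits) {
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be positive");
        }
        this.numHits = numHits;
    }

    /**
     * Offers a score and returns the current minimum competitive score,
     * which is negative infinity until the queue holds numHits scores.
     */
    synchronized float offer(float score) {
        if (queue.size() < numHits) {
            queue.add(score);
        } else if (score > queue.peek()) {
            queue.poll();     // drop the current worst score
            queue.add(score);
        }
        return queue.size() < numHits ? Float.NEGATIVE_INFINITY : queue.peek();
    }

    public static void main(String[] args) {
        SharedTopScoresQueue shared = new SharedTopScoresQueue(2);
        shared.offer(1f);
        shared.offer(3f);
        System.out.println(shared.offer(2f));
    }
}
```

The `synchronized` keyword on `offer` is exactly the hot-path cost the comment worries about: every collected hit on every slice contends for the same monitor.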
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888766#comment-16888766 ] Atri Sharma commented on LUCENE-8727: - [~jpountz] Here are two thoughts for implementing this: 1) Shared Priority Queue: A shared priority queue held in the parent CollectorManager is used by all Collectors. This flows down naturally: once the top N hits have been collected globally, the minimum competitive score can be increased without Collectors getting involved, and further hits will be ranked accordingly. However, the downside is that the priority queue implementation will have to be synchronized, so there can be a performance hit, as the critical path of segment collection will be affected. 2) Alternatively, for N hits, each slice gets an equal number of prorated hits to start with (M collectors, so N/M hits). Each Collector gets a callback supplier which the Collector will call with the number of hits collected so far and the score of the highest scoring local hit. The callback will return the minimum competitive score seen globally so far, and the Collector will use that score to filter out remaining hits. The point in time at which a Collector invokes the callback can vary, the simplest being after each N/M hits. The callback will be provided by the CollectorManager. The downside of this approach is that there is communication involved between Collectors and the CollectorManager, and some redundant hits can be collected due to the periodic callback invocation. In contrast, the shared priority queue mechanism allows for accurate filtering. WDYT? 
> IndexSearcher#search(Query,int) should operate on a shared priority queue > when configured with an executor > -- > > Key: LUCENE-8727 > URL: https://issues.apache.org/jira/browse/LUCENE-8727 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > If IndexSearcher is configured with an executor, then the top docs for each > slice are computed separately before being merged once the top docs for all > slices are computed. With block-max WAND this is a bit of a waste of > resources: it would be better if an increase of the min competitive score > could help skip non-competitive hits on every slice and not just the current > one. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8927) Cut Over To Set.copyOf and Set.of From Collections.unmodifiableSet
Atri Sharma created LUCENE-8927: --- Summary: Cut Over To Set.copyOf and Set.of From Collections.unmodifiableSet Key: LUCENE-8927 URL: https://issues.apache.org/jira/browse/LUCENE-8927 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma
[jira] [Commented] (LUCENE-8924) Remove Fields Order Checks from CheckIndex?
[ https://issues.apache.org/jira/browse/LUCENE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887340#comment-16887340 ] Atri Sharma commented on LUCENE-8924: - I see. Should we make this more explicit and robust then? For example, since we do not explicitly maintain a sort order but rely on the key set to do the right thing, a change from Collections.unmodifiableSet to Set.copyOf breaks this assertion in CheckIndex (since Set.copyOf explicitly calls out that there is no guarantee on the order of traversal). > Remove Fields Order Checks from CheckIndex? > --- > > Key: LUCENE-8924 > URL: https://issues.apache.org/jira/browse/LUCENE-8924 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > CheckIndex checks the order of fields read from the FieldsEnum for the > posting reader. Since we do not explicitly sort or use a sorted data > structure to represent keys (atleast explicitly), and no FieldsEnum depends > on the order apart from MultiFieldsEnum, which no longer exists. > > Should we remove the check?
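The ordering concern in this comment can be illustrated in isolation. The sketch assumes, purely for illustration, that the field names end up in a sorted set before being wrapped; how Lucene actually stores them is not stated here. `FieldOrderSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class FieldOrderSketch {
    /** An unmodifiable wrapper is only a view, so it iterates in the backing set's order. */
    static List<String> viaSortedWrapper(Collection<String> fields) {
        Set<String> view = Collections.unmodifiableSet(new TreeSet<>(fields));
        return new ArrayList<>(view);
    }

    /** Set.copyOf makes no iteration-order promise, so the order has to be imposed explicitly. */
    static List<String> viaCopyOfThenSort(Collection<String> fields) {
        List<String> sorted = new ArrayList<>(Set.copyOf(fields));
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "body", "id");
        System.out.println(viaSortedWrapper(fields));
        System.out.println(viaCopyOfThenSort(fields));
    }
}
```

Any code that checks field order after such a cut-over either needs the explicit sort or has to stop assuming an order at all, which is the choice this issue asks about.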
[jira] [Created] (LUCENE-8924) Remove Fields Order Checks from CheckIndex?
Atri Sharma created LUCENE-8924: --- Summary: Remove Fields Order Checks from CheckIndex? Key: LUCENE-8924 URL: https://issues.apache.org/jira/browse/LUCENE-8924 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma CheckIndex checks the order of fields read from the FieldsEnum for the posting reader. However, we do not sort the keys or use a sorted data structure to represent them (at least not explicitly), and no FieldsEnum depends on the order apart from MultiFieldsEnum, which no longer exists. Should we remove the check?
[jira] [Commented] (LUCENE-8915) Allow RateLimiter To Have Dynamic Limits
[ https://issues.apache.org/jira/browse/LUCENE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885941#comment-16885941 ] Atri Sharma commented on LUCENE-8915: - [~ab] Thanks, raised a PR doing the same. [https://github.com/apache/lucene-solr/pull/789] > Allow RateLimiter To Have Dynamic Limits > > > Key: LUCENE-8915 > URL: https://issues.apache.org/jira/browse/LUCENE-8915 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > RateLimiter does not allow dynamic configuration of the rate limit today. > This limits the kind of applications that the functionality can be applied > to. This Jira tracks 1) allowing the rate limiter to change limits > dynamically. 2) Add a RateLimiter subclass which exposes the same. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8915) Allow RateLimiter To Have Dynamic Limits
[ https://issues.apache.org/jira/browse/LUCENE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885887#comment-16885887 ] Atri Sharma commented on LUCENE-8915: - Hmm, I do not see a reason why SimpleRateLimiter cannot dynamically set values today (the setter is public). Should we make the rate limit value protected, or update the javadocs/comments to reflect that the limit can already be updated dynamically? > Allow RateLimiter To Have Dynamic Limits > > > Key: LUCENE-8915 > URL: https://issues.apache.org/jira/browse/LUCENE-8915 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > RateLimiter does not allow dynamic configuration of the rate limit today. > This limits the kind of applications that the functionality can be applied > to. This Jira tracks 1) allowing the rate limiter to change limits > dynamically. 2) Add a RateLimiter subclass which exposes the same. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
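To illustrate what a dynamically adjustable limit can look like, here is a standalone sketch in plain Java. It is not Lucene's RateLimiter or SimpleRateLimiter; the class DynamicRateLimit and its methods are hypothetical names chosen for this example, and the only idea borrowed from the discussion is that the limit sits behind a public setter that can be called at any time.

```java
// Hypothetical standalone sketch, not Lucene's RateLimiter API.
final class DynamicRateLimit {
    // volatile so a limit changed by one thread is seen by threads computing pauses.
    private volatile double mbPerSec;

    DynamicRateLimit(double mbPerSec) {
        setMBPerSec(mbPerSec);
    }

    /** Changes the limit; takes effect for the next pause computation. */
    void setMBPerSec(double mbPerSec) {
        if (!(mbPerSec > 0.0)) {
            throw new IllegalArgumentException("mbPerSec must be positive, got " + mbPerSec);
        }
        this.mbPerSec = mbPerSec;
    }

    double getMBPerSec() {
        return mbPerSec;
    }

    /** Nanoseconds that writing the given number of bytes should take at the current limit. */
    long pauseNanos(long bytes) {
        double seconds = bytes / 1024.0 / 1024.0 / mbPerSec;
        return (long) (1_000_000_000L * seconds);
    }
}
```

A caller would consult pauseNanos before each write and sleep accordingly; changing the limit between writes then needs no additional coordination beyond the volatile field.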
[jira] [Created] (LUCENE-8919) Query Metadata Aggregator
Atri Sharma created LUCENE-8919: --- Summary: Query Metadata Aggregator Key: LUCENE-8919 URL: https://issues.apache.org/jira/browse/LUCENE-8919 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma It would be good if there was a mechanism to allow aggregation of metadata for queries (eg, number of clauses, types of clauses, terms involved etc). This is particularly useful for complex queries with multiple levels of nesting and a high degree of branching. This should help debug query performance issues and draw patterns in case a query is misbehaving. With the QueryVisitor being present, this should be doable. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
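The kind of aggregation described above can be sketched on a toy query tree. Everything in this example (QueryStats, Node, Term, Bool) is hypothetical and deliberately does not use Lucene's Query or QueryVisitor classes; it only shows the shape of a recursive walk that gathers clause counts, nesting depth and the terms involved.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model for illustration only; these are not Lucene types.
final class QueryStats {
    interface Node {}

    static final class Term implements Node {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    static final class Bool implements Node {
        final List<Node> clauses;
        Bool(Node... clauses) { this.clauses = List.of(clauses); }
    }

    int leafCount;                              // number of leaf (term) clauses
    int maxDepth;                               // deepest nesting level, root = 1
    final Set<String> terms = new TreeSet<>();  // distinct "field:text" pairs

    static QueryStats collect(Node root) {
        QueryStats stats = new QueryStats();
        stats.visit(root, 1);
        return stats;
    }

    private void visit(Node node, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        if (node instanceof Term) {
            Term t = (Term) node;
            leafCount++;
            terms.add(t.field + ":" + t.text);
        } else if (node instanceof Bool) {
            for (Node clause : ((Bool) node).clauses) {
                visit(clause, depth + 1);
            }
        }
    }
}
```

In a real implementation the walk would presumably be driven by the visitor mechanism mentioned in the issue rather than by instanceof checks; the sketch only fixes what gets counted.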
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884939#comment-16884939 ] Atri Sharma commented on LUCENE-8811: - [~jpountz] Yeah, that is what I was thinking of, but I see your view point. I will raise a PR shortly > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: 8.2 > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884930#comment-16884930 ] Atri Sharma commented on LUCENE-8811: - [~jpountz] I had originally raised a patch which implemented your suggested approach, should we commit that for 8.2, and let all other branches have the actual change introduced by this JIRA? > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: 8.2 > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8915) Allow RateLimiter To Have Dynamic Limits
[ https://issues.apache.org/jira/browse/LUCENE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma updated LUCENE-8915: Description: RateLimiter does not allow dynamic configuration of the rate limit today. This limits the kind of applications that the functionality can be applied to. This Jira tracks 1) allowing the rate limiter to change limits dynamically. 2) Add a RateLimiter subclass which exposes the same. (was: While working on multi range queries, I realised that it would be good to specialize for cases where all clauses in a query are ORed together. MultiTermQuery springs to mind, when all terms are basically disjuncted.) > Allow RateLimiter To Have Dynamic Limits > > > Key: LUCENE-8915 > URL: https://issues.apache.org/jira/browse/LUCENE-8915 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > RateLimiter does not allow dynamic configuration of the rate limit today. > This limits the kind of applications that the functionality can be applied > to. This Jira tracks 1) allowing the rate limiter to change limits > dynamically. 2) Add a RateLimiter subclass which exposes the same. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8915) Allow RateLimiter To Have Dynamic Limits
Atri Sharma created LUCENE-8915: --- Summary: Allow RateLimiter To Have Dynamic Limits Key: LUCENE-8915 URL: https://issues.apache.org/jira/browse/LUCENE-8915 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma While working on multi range queries, I realised that it would be good to specialize for cases where all clauses in a query are ORed together. MultiTermQuery springs to mind, when all terms are basically disjuncted. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8905) TopDocsCollector Should Have Better Error Handling For Illegal Arguments
Atri Sharma created LUCENE-8905: --- Summary: TopDocsCollector Should Have Better Error Handling For Illegal Arguments Key: LUCENE-8905 URL: https://issues.apache.org/jira/browse/LUCENE-8905 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma While writing some tests, I realised that TopDocsCollector does not behave well when illegal arguments are passed in (for example, requesting more hits than the number of hits collected). Instead, we return a TopDocs instance with 0 hits. This can be problematic when queries are being formed by applications. This can hide bugs where malformed queries return no hits and that is surfaced upstream to client applications. I found a TODO at the relevant place in the code, so I believe it is time to fix the problem and throw an IllegalArgumentException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
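The fail-fast behaviour proposed above can be sketched as follows. This is a hypothetical helper (HitWindow.slice), not TopDocsCollector's actual code, and exactly which argument combinations should count as illegal is a policy choice the issue text does not settle; the sketch assumes that a negative start, a non-positive count, or a start beyond the collected hits is rejected.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class HitWindow {
    /** Returns hits[start, start + howMany), throwing instead of silently returning nothing. */
    static <T> List<T> slice(List<T> hits, int start, int howMany) {
        if (start < 0 || howMany <= 0) {
            throw new IllegalArgumentException(
                "start must be >= 0 and howMany > 0, got start=" + start + " howMany=" + howMany);
        }
        if (start > hits.size()) {
            // Assumed policy: a window starting past the collected hits is a caller bug.
            throw new IllegalArgumentException(
                "start=" + start + " is beyond the " + hits.size() + " collected hits");
        }
        // Use long arithmetic so start + howMany cannot overflow.
        int end = (int) Math.min((long) hits.size(), (long) start + howMany);
        return List.copyOf(hits.subList(start, end));
    }
}
```

The point of throwing is that a malformed request becomes visible at the call site instead of looking like a legitimate empty result.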
[jira] [Resolved] (LUCENE-8829) TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma resolved LUCENE-8829. - Resolution: Fixed Merged to master > TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved > - > > Key: LUCENE-8829 > URL: https://issues.apache.org/jira/browse/LUCENE-8829 > Project: Lucene - Core > Issue Type: Bug >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, > LUCENE-8829.patch > > > While investigating LUCENE-8819, I understood that the order of results from > TopDocs#merge is indirectly dependent on the number of collectors involved in the > merge. This is troubling because 1) The number of collectors involved in a > merge is cost based and directly dependent on the number of slices created > for the parallel searcher case. 2) The TopN hits code path will invoke merge with > a single Collector, so running the same TopN query with a single > threaded and a parallel searcher can produce different orders of > results, which breaks an invariant users would expect to hold. > > The reason why this happens is the subtle way TopDocs#merge sets > shardIndex on each ScoreDoc while populating the priority queue > used for merging. ShardIndex is essentially set to the ordinal of the > collector which generates the hit. This means that the shardIndex is > dependent on the number of collectors, even for the same set of hits. > > In case of no sort order specified, shardIndex is used for tie breaking when > scores are equal. This translates to different orders for the same hits with > different shardIndices. > > I propose that we remove shardIndex from the default tie breaking mechanism > and replace it with docID. DocID order is the de facto order that is expected > during collection, so it might make sense to use the same factor during tie > breaking when scores are the same. 
> > CC: [~ivera] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
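The effect of the proposed tie break can be shown with a small standalone sketch. The classes below (MergeByScoreThenDoc, Hit) are hypothetical and are not Lucene's TopDocs or ScoreDoc; the sketch only demonstrates that ordering by score and then by docID gives the same result no matter how the hits were partitioned across collectors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only; not Lucene's TopDocs#merge.
final class MergeByScoreThenDoc {
    static final class Hit {
        final int doc;
        final float score;
        final int shardIndex; // carried along but deliberately ignored when ordering
        Hit(int doc, float score, int shardIndex) {
            this.doc = doc;
            this.score = score;
            this.shardIndex = shardIndex;
        }
    }

    // Higher score first; equal scores fall back to ascending docID.
    static final Comparator<Hit> ORDER = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Merges per-collector hit lists and returns the docIDs of the top N hits. */
    static List<Integer> mergedDocs(List<List<Hit>> perCollector, int topN) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perCollector) {
            all.addAll(hits);
        }
        all.sort(ORDER);
        List<Integer> docs = new ArrayList<>();
        for (Hit hit : all.subList(0, Math.min(topN, all.size()))) {
            docs.add(hit.doc);
        }
        return docs;
    }
}
```

Because shardIndex never enters the comparator, a single-collector run and a multi-collector run over the same hits order tied scores identically.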
[jira] [Resolved] (LUCENE-8794) Cost Based Slice Allocation Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma resolved LUCENE-8794. - Resolution: Fixed Merged to master > Cost Based Slice Allocation Algorithm > - > > Key: LUCENE-8794 > URL: https://issues.apache.org/jira/browse/LUCENE-8794 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > In https://issues.apache.org/jira/browse/LUCENE-8757, the idea of a cost > based and dynamically adjusting slice allocation algorithm was conceived. We > should ideally have a hard cap on the number of threads that can be consumed > by a single query, and have static cost factors associated with segments and > assign them to threads in a fair manner. We will also need to ensure that we > do not end up assigning individual threads to small segments, or creating more > threads than needed (thread context switching could outweigh the benefits). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
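One simple way to avoid giving every small segment its own thread is to pack segments into slices under a document budget. The sketch below is an assumption-laden illustration, not the algorithm that was merged: SliceAllocator, maxDocsPerSlice and maxSegmentsPerSlice are hypothetical names, the cost of a segment is taken to be its document count, and the hard cap on threads mentioned in the issue is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; segment cost is approximated by document count.
final class SliceAllocator {
    /** Groups segment sizes into slices; each slice would be searched by one thread. */
    static List<List<Integer>> allocate(int[] segmentDocCounts, int maxDocsPerSlice, int maxSegmentsPerSlice) {
        int[] sorted = segmentDocCounts.clone();
        Arrays.sort(sorted); // ascending; iterate from the end to visit largest first

        List<List<Integer>> slices = new ArrayList<>();
        List<Integer> current = null;
        long docsInCurrent = 0;

        for (int i = sorted.length - 1; i >= 0; i--) {
            int size = sorted[i];
            if (size > maxDocsPerSlice) {
                // A large segment is expensive enough to deserve its own slice.
                slices.add(List.of(size));
                continue;
            }
            if (current == null) {
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
            current.add(size);
            docsInCurrent += size;
            // Close the slice once it is "full" by either budget.
            if (current.size() >= maxSegmentsPerSlice || docsInCurrent > maxDocsPerSlice) {
                slices.add(current);
                current = null;
            }
        }
        if (current != null) {
            slices.add(current);
        }
        return slices;
    }
}
```

For example, with segment sizes 300, 10, 20, 120 and 90, a budget of 200 documents and at most 3 segments per slice, the large segment gets its own slice while the small ones share slices.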
[jira] [Commented] (LUCENE-8762) Lucene50PostingsReader should specialize reading docs+freqs with impacts
[ https://issues.apache.org/jira/browse/LUCENE-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877864#comment-16877864 ] Atri Sharma commented on LUCENE-8762: - I will take a crack at this and post a patch soon. > Lucene50PostingsReader should specialize reading docs+freqs with impacts > > > Key: LUCENE-8762 > URL: https://issues.apache.org/jira/browse/LUCENE-8762 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Currently if you ask for impacts, we only have one implementation that is > able to expose everything: docs, freqs, positions and offsets. In contrast, > if you don't need impacts, we have specialization for docs+freqs, > docs+freqs+positions and docs+freqs+positions+offsets. > Maybe we should add specialization for the docs+freqs case with impacts, > which should be the most common case, and remove specialization for > docs+freqs+positions when impacts are not requested? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877232#comment-16877232 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] Yes, I ran the Solr suite twice. The first time, failures with tracer not able to close were seen. The second time, the entire suite came in clean. I also ran ant precommit – came in clean. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-8899) Implementation of MultiTermQuery for ORed Queries
[ https://issues.apache.org/jira/browse/LUCENE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma resolved LUCENE-8899. - Resolution: Not A Problem > Implementation of MultiTermQuery for ORed Queries > - > > Key: LUCENE-8899 > URL: https://issues.apache.org/jira/browse/LUCENE-8899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > While working on multi range queries, I realised that it would be good to > specialize for cases where all clauses in a query are ORed together. > MultiTermQuery springs to mind, when all terms are basically disjuncted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8899) Implementation of MultiTermQuery for ORed Queries
[ https://issues.apache.org/jira/browse/LUCENE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877219#comment-16877219 ] Atri Sharma commented on LUCENE-8899: - Hmm, true. I was thinking of a query type just for the disjunctives, but looks like TermInSetQuery already covers it. Thanks for pointing it out! > Implementation of MultiTermQuery for ORed Queries > - > > Key: LUCENE-8899 > URL: https://issues.apache.org/jira/browse/LUCENE-8899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > While working on multi range queries, I realised that it would be good to > specialize for cases where all clauses in a query are ORed together. > MultiTermQuery springs to mind, when all terms are basically disjuncted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877156#comment-16877156 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] Thanks for confirming. I wanted to ensure that no unsuspecting user gets bitten :) > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8899) Implementation of MultiTermQuery for ORed Queries
[ https://issues.apache.org/jira/browse/LUCENE-8899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877154#comment-16877154 ] Atri Sharma commented on LUCENE-8899: - The way I am thinking of this is by using the fact that MultiTermQueryConstantScoreWrapper will always convert to a BooleanQuery with each clause as SHOULD. So it should be a simple matter to use that logic. The main change will be introduction of a new TermsEnum implementation which can filter the input terms based on a filter built from the terms list given in the query. Does this seem like a reasonable approach? > Implementation of MultiTermQuery for ORed Queries > - > > Key: LUCENE-8899 > URL: https://issues.apache.org/jira/browse/LUCENE-8899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > While working on multi range queries, I realised that it would be good to > specialize for cases where all clauses in a query are ORed together. > MultiTermQuery springs to mind, when all terms are basically disjuncted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13597) TopGroups Should Respect the API in Lucene's TopDocs.merge
[ https://issues.apache.org/jira/browse/SOLR-13597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma resolved SOLR-13597. Resolution: Not A Problem This can be done at Lucene level itself, given the usage pattern of Solr for TopDocs.merge > TopGroups Should Respect the API in Lucene's TopDocs.merge > -- > > Key: SOLR-13597 > URL: https://issues.apache.org/jira/browse/SOLR-13597 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Atri Sharma >Priority: Major > > In LUCENE-8857, TopDocs.merge loses the ability to set shard indices, so > callers have to set shard indices themselves before calling merge, or use > docID based tie breaker. > > TopGroups uses this non existent capability of Lucene, hence the > corresponding tests break. This Jira tracks the efforts to fix TopGroups to > respect the new API, and should be merged post merge of LUCENE-8857 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876965#comment-16876965 ] Atri Sharma commented on LUCENE-8857: - Since this is a breaking API change, is there a way we can highlight this to existing users in a "louder" manner, or is MIGRATE.txt entry enough? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8899) Implementation of MultiTermQuery for ORed Queries
Atri Sharma created LUCENE-8899: --- Summary: Implementation of MultiTermQuery for ORed Queries Key: LUCENE-8899 URL: https://issues.apache.org/jira/browse/LUCENE-8899 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma While working on multi range queries, I realised that it would be good to specialize for cases where all clauses in a query are ORed together. MultiTermQuery springs to mind, when all terms are basically disjuncted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876773#comment-16876773 ] Atri Sharma commented on LUCENE-8857: - JFYI The latest iteration on PR also fixes the compilation failure in Solr, introduced in SOLR-13404 > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876759#comment-16876759 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] I have pushed the latest iteration to the new PR. It passes ant test: {code:java} [junit4:tophints] 59.58s | org.apache.lucene.search.suggest.document.TestSuggestField [junit4:tophints] 17.10s | org.apache.lucene.search.suggest.DocumentDictionaryTest [junit4:tophints] 14.56s | org.apache.lucene.search.suggest.fst.FSTCompletionTest [junit4:tophints] 14.21s | org.apache.lucene.search.suggest.analyzing.FuzzySuggesterTest -check-totals: common.test: -check-totals: test: BUILD SUCCESSFUL Total time: 74 minutes 29 seconds f01898a404cf:lucene atris$ {code} It also passes the offending Solr test: ant test -Dtestcase=TestDistributedGrouping -Dtests.method=test -Dtests.seed=B5D95BEAE23E9468 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=nl-AW -Dtests.timezone=Asia/Jayapura -Dtests.asserts=true -Dtests.file.encoding=UTF-8 {code:java} 27429 INFO (closeThreadPool-74-thread-4) [ ] o.e.j.s.AbstractConnector Stopped ServerConnector@3e6caf50{HTTP/1.1,[http/1.1, h2c]}{127.0.0.1:0} 27430 INFO (closeThreadPool-74-thread-4) [ ] o.e.j.s.h.ContextHandler Stopped o.e.j.s.ServletContextHandler@169e0265{/,null,UNAVAILABLE} 27430 INFO (closeThreadPool-74-thread-4) [ ] o.e.j.s.session node0 Stopped scavenging 27431 INFO (closeThreadPool-74-thread-1) [ ] o.e.j.s.AbstractConnector Stopped ServerConnector@1be02e89{HTTP/1.1,[http/1.1, h2c]}{127.0.0.1:0} 27431 INFO (closeThreadPool-74-thread-1) [ ] o.e.j.s.h.ContextHandler Stopped o.e.j.s.ServletContextHandler@6b6f3dda{/,null,UNAVAILABLE} 27432 INFO (closeThreadPool-74-thread-1) [ ] o.e.j.s.session node0 Stopped scavenging 27432 INFO (closeThreadPool-74-thread-5) [ ] o.e.j.s.AbstractConnector Stopped ServerConnector@4052b482{HTTP/1.1,[http/1.1, h2c]}{127.0.0.1:0} 27432 INFO (closeThreadPool-74-thread-5) [ ] 
o.e.j.s.h.ContextHandler Stopped o.e.j.s.ServletContextHandler@7063254f{/,null,UNAVAILABLE} 27432 INFO (closeThreadPool-74-thread-5) [ ] o.e.j.s.session node0 Stopped scavenging 27436 INFO (SUITE-TestDistributedGrouping-seed#[C817F4DEFFC8F2A7]-worker) [ ] o.a.s.SolrTestCaseJ4 --- Done waiting for tracked resources to be released{code} > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8882) Add State To QueryVisitor
[ https://issues.apache.org/jira/browse/LUCENE-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876754#comment-16876754 ] Atri Sharma commented on LUCENE-8882: - My idea was not to replace IndexOrDocValues, but to allow it to be more generally applicable. For eg, taking the specific example of the optimized query which is applicable for limited cases in which the index is sorted, we would ideally be better off if we used that query over point values (even though that query is a docvalues based implementation). However, the query is too specialized for IndexOrDocValues to factor in. What I was envisioning was a state where, at the start of the query, IndexSearcher creates a QueryVisitor, sees that the index is sorted by key X, and populates a property in the QueryVisitor's metadata (INDEX_SORTED_KEY=X). IndexOrDocValuesQuery, then, instead of making an immediate decision as to whether to use Points or DocValues, passes on the visitor to both of the branches. Further down the line, the sorted index query type will see the metadata in the visitor and volunteer itself (by adding another property in the metadata of the visitor (SORTED_PLAN_AVAILABLE=true or something). In the end, IndexOrDocValues will perform an evaluation, which includes the costing which it does today + the metadata state gathered from both the branches, and then choose the branch to execute. This will allow new query types for specific use cases to be added easily (just add a new property type and a listener query for it), and let the engine take better decisions as to when to execute what queries, which can potentially lead to better query performance. Thoughts? 
> Add State To QueryVisitor > - > > Key: LUCENE-8882 > URL: https://issues.apache.org/jira/browse/LUCENE-8882 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > QueryVisitor has no state passed in either up or down recursion. This limits > the range of decisions that can be taken during visitation by QueryVisitor. For > example, for LUCENE-8881, we need a way to specify whether the visitor is a rewriter > visitor. > > This Jira proposes adding a property bag model to QueryVisitor, which can > then be referred to by the Query instance being visited by QueryVisitor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
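The property bag idea can be sketched in isolation. The class VisitorProperties and the helper volunteerIfSorted below are hypothetical and do not exist on Lucene's QueryVisitor; the key names INDEX_SORTED_KEY and SORTED_PLAN_AVAILABLE are taken from the comment above, where they are themselves only suggestions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical property bag; not part of Lucene's QueryVisitor.
final class VisitorProperties {
    static final String INDEX_SORTED_KEY = "INDEX_SORTED_KEY";
    static final String SORTED_PLAN_AVAILABLE = "SORTED_PLAN_AVAILABLE";

    private final Map<String, Object> bag = new HashMap<>();

    VisitorProperties set(String key, Object value) {
        bag.put(key, value);
        return this;
    }

    /** Typed lookup; empty when the key is absent or holds a value of another type. */
    <T> Optional<T> get(String key, Class<T> type) {
        Object value = bag.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    /** What a sorted-index query might do when visited: volunteer if the sort key matches its field. */
    static void volunteerIfSorted(VisitorProperties props, String queryField) {
        boolean sortedOnField = props.get(INDEX_SORTED_KEY, String.class)
            .map(queryField::equals)
            .orElse(false);
        if (sortedOnField) {
            props.set(SORTED_PLAN_AVAILABLE, Boolean.TRUE);
        }
    }
}
```

In this model the searcher would populate INDEX_SORTED_KEY before visitation, each branch would add what it can offer, and the parent query would read the bag back together with its usual cost estimate before choosing a branch.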
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876711#comment-16876711 ] Atri Sharma commented on LUCENE-8857: - [~munendrasn] Thanks for the compilation fix. Yes, the test will fail. I fixed that test failure – will update the PR once my local test suite run completes > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876708#comment-16876708 ] Atri Sharma commented on LUCENE-8857: - Ok, updating the PR now. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857-compile-fix.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876699#comment-16876699 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] Yes, we will. I did not want to add the fix for Solr in this PR since that kind of muddles up (going across two modules). I can raise a separate PR just for the Solr fixes, though, if that works. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876692#comment-16876692 ] Atri Sharma commented on LUCENE-8857: - I have opened https://issues.apache.org/jira/browse/SOLR-13597 to track fixes to Solr to use the new API (that is what is causing the Solr test to fail). I will raise a PR for that Jira post the merging of this PR. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-13597) TopGroups Should Respect the API in Lucene's TopDocs.merge
Atri Sharma created SOLR-13597: -- Summary: TopGroups Should Respect the API in Lucene's TopDocs.merge Key: SOLR-13597 URL: https://issues.apache.org/jira/browse/SOLR-13597 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Atri Sharma In LUCENE-8857, TopDocs.merge loses the ability to set shard indices, so callers have to set shard indices themselves before calling merge, or use a docID-based tie breaker. TopGroups relies on this now-removed capability of Lucene, hence the corresponding tests break. This Jira tracks the effort to fix TopGroups to respect the new API; the fix should be merged after LUCENE-8857 is merged. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
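To make the "callers have to set shard indices themselves before calling merge" point in the SOLR-13597 description concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's classes: `Hit`, `tagShard` and `mergeTopN` are hypothetical stand-ins, and the merge is a simple sort rather than whatever TopDocs.merge does internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a scored hit; NOT Lucene's ScoreDoc.
final class Hit {
    final int doc;
    final float score;
    int shardIndex = -1; // -1 means "not set yet" in this sketch

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class ShardMergeSketch {
    // The caller, not the merge routine, stamps each hit with the shard it came from.
    static void tagShard(List<Hit> hits, int shardIndex) {
        for (Hit h : hits) {
            h.shardIndex = shardIndex;
        }
    }

    // Merge per-shard hits: higher score first, ties resolved by the supplied comparator.
    static List<Hit> mergeTopN(int topN, List<List<Hit>> shards, Comparator<Hit> tieBreaker) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shards) {
            all.addAll(shard);
        }
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        all.sort(byScoreDesc.thenComparing(tieBreaker));
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```

A caller that tags shard 0 and shard 1 and then passes a "shard index, then doc ID" comparator gets deterministic ordering among equal scores. A caller that never tags its hits would have to fall back to a doc-ID-only comparator, which is the other option the description mentions.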
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876378#comment-16876378 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] Ran ant test 5 times again; all runs came in clean. I have raised a new PR with testGrouping fixes: [https://github.com/apache/lucene-solr/pull/757] Can we merge it, if it looks fine? {code:java} [junit4:tophints] 54.39s | org.apache.lucene.search.suggest.document.TestSuggestField [junit4:tophints] 16.93s | org.apache.lucene.search.suggest.DocumentDictionaryTest [junit4:tophints] 16.63s | org.apache.lucene.search.suggest.analyzing.FuzzySuggesterTest [junit4:tophints] 16.42s | org.apache.lucene.search.suggest.fst.FSTCompletionTest -check-totals: common.test: -check-totals: test: BUILD SUCCESSFUL Total time: 45 minutes 8 seconds f01898a404cf:lucene atris$ {code} > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876335#comment-16876335 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] I investigated this and it turned out to be a test limitation (testGrouping assumed that TopDocs.merge was setting the shard indices). It took a while to reproduce since it was the randomized test which was failing (thanks for providing the seed!). I have fixed the test and ran ant test a couple of times – it came in clean. Can we push this in now? {code:java} [junit4:tophints] 49.54s | org.apache.lucene.search.suggest.document.TestSuggestField [junit4:tophints] 21.55s | org.apache.lucene.search.suggest.analyzing.FuzzySuggesterTest [junit4:tophints] 21.51s | org.apache.lucene.search.suggest.DocumentDictionaryTest [junit4:tophints] 15.45s | org.apache.lucene.search.spell.TestSpellChecker -check-totals: common.test: -check-totals: test: BUILD SUCCESSFUL Total time: 49 minutes 49 seconds f01898a404cf:lucene atris$ {code} [~munendrasn] I am not very familiar with Solr's internals, but looking at the error you pointed to, it looks like the test is not setting shard indices or hit indices. This points to an assumption in the test – that TopDocs.merge is setting the shard indices. Can you check {code:java} search/grouping/distributed/responseprocessor/TopGroupsShardResponseProcessor.java{code} where the TopDocs.merge call is done? We can set shard indices for all TopHits based on the QueryCommandResult they come from.
> Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4h 20m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876253#comment-16876253 ] Atri Sharma commented on LUCENE-8857: - I did – I was not able to see any failures (probably due to seeds?). I will try with the seed in your command now. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4h 20m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876218#comment-16876218 ] Atri Sharma commented on LUCENE-8857: - [~jpountz] Thanks for committing and reviewing, [~simonw] Thanks for your constructive inputs! > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 4h 10m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8862) Collector Level Dynamic Memory Accounting
[ https://issues.apache.org/jira/browse/LUCENE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876130#comment-16876130 ] Atri Sharma commented on LUCENE-8862: - [~jpountz] Thanks for pushing and reviewing! > Collector Level Dynamic Memory Accounting > - > > Key: LUCENE-8862 > URL: https://issues.apache.org/jira/browse/LUCENE-8862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Inspired from LUCENE-8855, I am thinking of adding a new interface which > tracks dynamic memory used by Collectors. This shall allow users to get an > accountability as to the memory usage of their Collectors and better plan > their resource capacity. This shall also allow us to add Collector level > limits for memory usage, thus allowing users a finer control over their > resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8897) Allow Callbacks For Events In Collectors/ CollectorManagers
Atri Sharma created LUCENE-8897: --- Summary: Allow Callbacks For Events In Collectors/ CollectorManagers Key: LUCENE-8897 URL: https://issues.apache.org/jira/browse/LUCENE-8897 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma It would be good to allow Collectors and CollectorManagers to register callbacks for specific events (such as the collection of N doc IDs across all Collectors of a CollectorManager). This would enable things like more accurate early termination. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
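One way to picture the LUCENE-8897 idea is a shared event object that every collector of a manager reports into, with a callback that fires once when the combined count reaches N. This is only a sketch under that assumption; `HitCountEvent` and `CountingCollector` are hypothetical names and nothing here is Lucene API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types exist in Lucene.
final class HitCountEvent {
    private final int threshold;
    private final Runnable callback;
    private final AtomicInteger collected = new AtomicInteger();
    private final AtomicBoolean fired = new AtomicBoolean();

    HitCountEvent(int threshold, Runnable callback) {
        this.threshold = threshold;
        this.callback = callback;
    }

    // Called by every collector for each doc ID it collects; safe to call from many threads.
    void onCollect() {
        if (collected.incrementAndGet() >= threshold && fired.compareAndSet(false, true)) {
            // Runs at most once, on whichever thread first observes the threshold being reached.
            callback.run();
        }
    }

    int collectedSoFar() {
        return collected.get();
    }
}

// Stand-in for a per-slice collector that shares the manager's event object.
final class CountingCollector {
    private final HitCountEvent event;

    CountingCollector(HitCountEvent event) {
        this.event = event;
    }

    void collect(int doc) {
        event.onCollect();
    }
}
```

With a threshold of 3 and two collectors sharing one event, the callback runs on the third `collect` call overall, regardless of which collector made it. That callback is where an early-termination flag could be set.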
[jira] [Commented] (LUCENE-8896) Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries
[ https://issues.apache.org/jira/browse/LUCENE-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876037#comment-16876037 ] Atri Sharma commented on LUCENE-8896: - Does PointRangeQuery not already have its custom intersects implementation? > Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, > byte[]) for several queries > -- > > Key: LUCENE-8896 > URL: https://issues.apache.org/jira/browse/LUCENE-8896 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > > In LUCENE-8885, it was introduced a new method on the {{IntersectsVisitor}} > interface. It contains a default implementation but queries can override it > and therefore benefit when there are several documents on a leaf associated > to the same point. > In this issue the following queries are proposed to override the default > implementation > * LatLonShapeQuery > * RangeFieldQuery > * LatLonPointInPolygonQuery > * LatLonPointDistanceQuery > * PointRangeQuery > * PointInSetQuery -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876026#comment-16876026 ] Atri Sharma commented on LUCENE-8857: - Should we push the latest iteration on the PR, if it looks fine? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874125#comment-16874125 ] Atri Sharma commented on LUCENE-8857: - Updated the PR with latest comments, removing merge functionality as well. Happy to iterate further > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8862) Collector Level Dynamic Memory Accounting
[ https://issues.apache.org/jira/browse/LUCENE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874013#comment-16874013 ] Atri Sharma commented on LUCENE-8862: - Updated the PR with latest comments and moved to misc module. Happy to iterate further. > Collector Level Dynamic Memory Accounting > - > > Key: LUCENE-8862 > URL: https://issues.apache.org/jira/browse/LUCENE-8862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Inspired from LUCENE-8855, I am thinking of adding a new interface which > tracks dynamic memory used by Collectors. This shall allow users to get an > accountability as to the memory usage of their Collectors and better plan > their resource capacity. This shall also allow us to add Collector level > limits for memory usage, thus allowing users a finer control over their > resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
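The LUCENE-8862 description talks about an interface that tracks dynamic memory used by Collectors and allows per-Collector limits. A rough, single-threaded sketch of that shape follows. `MemoryTracker`, `LimitedMemoryTracker` and `DocIdBufferCollector` are made-up names for illustration, and the byte counts only cover the array payload (no object headers, and the old array is ignored while it is being copied).

```java
import java.util.Arrays;

// Hypothetical sketch; interface and class names are illustrative only.
interface MemoryTracker {
    void add(long bytes); // record newly allocated bytes

    long usedBytes();     // bytes recorded so far
}

final class LimitedMemoryTracker implements MemoryTracker {
    private final long limitBytes;
    private long used;

    LimitedMemoryTracker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    @Override
    public void add(long bytes) {
        if (used + bytes > limitBytes) {
            throw new IllegalStateException(
                "collector memory limit exceeded: " + (used + bytes) + " > " + limitBytes);
        }
        used += bytes;
    }

    @Override
    public long usedBytes() {
        return used;
    }
}

// Stand-in collector that buffers doc IDs and reports every array growth to the tracker.
final class DocIdBufferCollector {
    private final MemoryTracker tracker;
    private int[] docs = new int[4];
    private int size;

    DocIdBufferCollector(MemoryTracker tracker) {
        this.tracker = tracker;
        tracker.add((long) docs.length * Integer.BYTES);
    }

    void collect(int doc) {
        if (size == docs.length) {
            int grownLength = docs.length * 2;
            // Account for the extra slots before allocating, so the limit is enforced up front.
            tracker.add((long) (grownLength - docs.length) * Integer.BYTES);
            docs = Arrays.copyOf(docs, grownLength);
        }
        docs[size++] = doc;
    }
}
```

Checking the limit before the allocation means a collector that would blow its budget fails fast instead of allocating first and reporting afterwards.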
[jira] [Commented] (LUCENE-8889) Remove Dead Code From PointRangeQuery
[ https://issues.apache.org/jira/browse/LUCENE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874002#comment-16874002 ] Atri Sharma commented on LUCENE-8889: - [~jim.ferenczi] Call me old school, but I believe that APIs should have at least one user within the library code base (for purely external-facing APIs, tests are the way to go, as you suggested). I have raised a PR to beef up equality tests using the said API, let me know if it looks fine > Remove Dead Code From PointRangeQuery > - > > Key: LUCENE-8889 > URL: https://issues.apache.org/jira/browse/LUCENE-8889 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > PointRangeQuery has accessors for the underlying points in the query but > those are never accessed. We should remove them -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8889) Remove Dead Code From PointRangeQuery
Atri Sharma created LUCENE-8889: --- Summary: Remove Dead Code From PointRangeQuery Key: LUCENE-8889 URL: https://issues.apache.org/jira/browse/LUCENE-8889 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma PointRangeQuery has accessors for the underlying points in the query but those are never accessed. We should remove them -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8881) Query.rewrite Should Move To QueryVisitor
[ https://issues.apache.org/jira/browse/LUCENE-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873984#comment-16873984 ] Atri Sharma commented on LUCENE-8881: - [~romseygeek] Agreed, however, we could use QueryVisitor's recursion mechanism to get query specific rewrites done (please see my PR to add metadata state to QueryVisitor). We could add a boolean property saying DO_REWRITE=true and fire a visitor, and each query checks for that property. My main point is that it seems incorrect for two query tree traversal mechanisms to exist independently. This Jira is primarily opened to trade thoughts on that front, and maybe see if we can draw a common baseline between the two existing mechanisms. WDYT? > Query.rewrite Should Move To QueryVisitor > - > > Key: LUCENE-8881 > URL: https://issues.apache.org/jira/browse/LUCENE-8881 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > Now that we have QueryVisitor, the rewrite functionality should belong there, > since rewrite is essentially a recursive visitation of underlying queries, > which sounds exactly as what QueryVisitor is designed to be. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873207#comment-16873207 ] Atri Sharma commented on LUCENE-8811: - Thanks [~romseygeek] for pushing! A small nit: I think git somehow botched up the patch during commit ? (I see your name as both author and committer). > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: 8.2 > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8882) Add State To QueryVisitor
[ https://issues.apache.org/jira/browse/LUCENE-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873002#comment-16873002 ] Atri Sharma commented on LUCENE-8882: - I think this is useful even outside LUCENE-8881 – This allows upper queries to collect metadata about the lower leaf level queries and make decisions (motivated by the excellent work done recently to use the property of a sorted index to perform binary searches on docIDs). So we could use a property such as INDEX_SORTED, which is populated at some query and visible to the entire query tree, and then a query looks at the property and decides to use a specific type of query. This can even be ingested in the cost of the query, but in a localised form so that not all heuristics are crammed in one specialized query (IndexOrDocValues?) Objections/Thoughts/Comments? > Add State To QueryVisitor > - > > Key: LUCENE-8882 > URL: https://issues.apache.org/jira/browse/LUCENE-8882 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > QueryVisitor has no state passed in either up or down recursion. This limits > the width of decisions that can be taken by visitation of QueryVisitor. For > eg, for LUCENE-8881, we need a way to specify is the visitor is a rewriter > visitor. > > This Jira proposes adding a property bag model to QueryVisitor, which can > then be referred to by the Query instance being visited by QueryVisitor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8882) Add State To QueryVisitor
Atri Sharma created LUCENE-8882: --- Summary: Add State To QueryVisitor Key: LUCENE-8882 URL: https://issues.apache.org/jira/browse/LUCENE-8882 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma QueryVisitor has no state passed in either up or down recursion. This limits the range of decisions that can be made while a QueryVisitor traverses a query tree. For example, for LUCENE-8881, we need a way to specify whether the visitor is a rewriter visitor. This Jira proposes adding a property bag model to QueryVisitor, which can then be referred to by the Query instance being visited by QueryVisitor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
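A small sketch may help show what a "property bag" on a visitor could look like. It assumes a plain string-keyed map and uses stand-in types (`StatefulVisitor`, `QueryNode`, `LeafNode`, `CompoundNode`) rather than Lucene's Query and QueryVisitor. The property name `DO_REWRITE` is borrowed from the LUCENE-8881 discussion, and everything else is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are stand-ins, not Lucene's Query/QueryVisitor.
class StatefulVisitor {
    private final Map<String, Object> properties = new HashMap<>();
    final List<String> visitedLeaves = new ArrayList<>();

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    Object getProperty(String key) {
        return properties.get(key);
    }

    void visitLeaf(String description) {
        visitedLeaves.add(description);
    }
}

interface QueryNode {
    void visit(StatefulVisitor visitor);
}

final class LeafNode implements QueryNode {
    private final String term;

    LeafNode(String term) {
        this.term = term;
    }

    @Override
    public void visit(StatefulVisitor visitor) {
        // The visited query consults the visitor's state to decide how to behave.
        boolean rewriting = Boolean.TRUE.equals(visitor.getProperty("DO_REWRITE"));
        visitor.visitLeaf(rewriting ? "rewritten:" + term : term);
    }
}

final class CompoundNode implements QueryNode {
    private final List<QueryNode> children;

    CompoundNode(List<QueryNode> children) {
        this.children = children;
    }

    @Override
    public void visit(StatefulVisitor visitor) {
        // The same visitor, and therefore the same state, is seen at every depth.
        for (QueryNode child : children) {
            child.visit(visitor);
        }
    }
}
```

The point of the sketch is only that state set once on the visitor is visible to every node in the tree, so a leaf can react to it without the traversal code knowing about the property.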
[jira] [Created] (LUCENE-8881) Query.rewrite Should Move To QueryVisitor
Atri Sharma created LUCENE-8881: --- Summary: Query.rewrite Should Move To QueryVisitor Key: LUCENE-8881 URL: https://issues.apache.org/jira/browse/LUCENE-8881 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma Now that we have QueryVisitor, the rewrite functionality should belong there, since rewrite is essentially a recursive visitation of underlying queries, which is exactly what QueryVisitor is designed for. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872550#comment-16872550 ] Atri Sharma commented on LUCENE-8811: - Any chance we could push this one? Happy to make any changes > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries? > See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
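The LUCENE-8811 description points out that nested boolean queries can exceed the clause limit even when no single boolean query does. A toy illustration of checking the whole tree at once is below. It uses hypothetical stand-in types (`ClauseNode`, `TermClause`, `BoolClause`, `ClauseLimitCheck`) and plain recursion, not Lucene's visitor API or its actual exception type.

```java
import java.util.List;

// Hypothetical stand-ins for a query tree; not Lucene classes.
interface ClauseNode {
    int countLeaves();
}

final class TermClause implements ClauseNode {
    @Override
    public int countLeaves() {
        return 1;
    }
}

final class BoolClause implements ClauseNode {
    private final List<ClauseNode> children;

    BoolClause(List<ClauseNode> children) {
        this.children = children;
    }

    @Override
    public int countLeaves() {
        int total = 0;
        for (ClauseNode child : children) {
            total += child.countLeaves(); // recurse so nested boolean queries are counted too
        }
        return total;
    }
}

final class ClauseLimitCheck {
    // Checks the leaf count of the whole tree, not of each boolean node separately.
    static void check(ClauseNode root, int maxClauses) {
        int total = root.countLeaves();
        if (total > maxClauses) {
            throw new IllegalArgumentException(
                "too many clauses: " + total + " > " + maxClauses);
        }
    }
}
```

A boolean node with two direct children looks harmless on its own, but if one child is another boolean node with two terms the tree has three leaves, and a tree-level check with a limit of two rejects it.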
[jira] [Created] (LUCENE-8880) Add a TopDocsCollector which does not sort by score
Atri Sharma created LUCENE-8880: --- Summary: Add a TopDocsCollector which does not sort by score Key: LUCENE-8880 URL: https://issues.apache.org/jira/browse/LUCENE-8880 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma We assume that the user cares about the underlying hits being ordered by score. This Jira explores adding a collector which does not make this guarantee, and therefore does not need a priority queue as its collection data structure. This should help with the large-hits case, where the heap’s rebalancing can become a bottleneck -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
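One possible reading of "a collector which does not make this guarantee" is an append-only buffer: each hit costs O(1) and no heap is rebalanced. The sketch below assumes that reading. `UnorderedHitsCollector` is a hypothetical name, and it keeps the first `maxHits` hits in arrival order, which is a different contract from "best N by score".

```java
// Hypothetical sketch of a collector that keeps hits in arrival order; not a Lucene class.
final class UnorderedHitsCollector {
    private final int[] docs;
    private final float[] scores;
    private int size;
    private int totalHits;

    UnorderedHitsCollector(int maxHits) {
        docs = new int[maxHits];
        scores = new float[maxHits];
    }

    // O(1) per hit: append while there is room, otherwise only count the hit.
    void collect(int doc, float score) {
        totalHits++;
        if (size < docs.length) {
            docs[size] = doc;
            scores[size] = score;
            size++;
        }
    }

    int size() {
        return size;
    }

    int totalHits() {
        return totalHits;
    }

    int doc(int i) {
        return docs[i];
    }

    float score(int i) {
        return scores[i];
    }
}
```

The trade-off is explicit: when more than `maxHits` documents match, the later ones are dropped no matter how well they score, so this only suits callers that truly do not care about score order.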
[jira] [Commented] (LUCENE-8877) TopDocsCollector Should Not Depend on Priority Queue
[ https://issues.apache.org/jira/browse/LUCENE-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872544#comment-16872544 ] Atri Sharma commented on LUCENE-8877: - Any thoughts on this? I am envisioning eventually getting to a state where the underlying data structure used is opaque to IndexSearcher API. This should allow an abstraction with high degree of flexibility > TopDocsCollector Should Not Depend on Priority Queue > > > Key: LUCENE-8877 > URL: https://issues.apache.org/jira/browse/LUCENE-8877 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > TopDocsCollector is tightly coupled to the notion of priority queue, which is > not necessarily a good abstraction to have since the collector really just > needs an interface to iterate on and hold docID and score, with possibly > shard indexes. > > We should rewrite this to a more simplistic interface with priority queue > being the default implementation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?
[ https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871620#comment-16871620 ] Atri Sharma commented on LUCENE-8875: - I meant Elasticsearch aggregates (although, on second thought, I am not sure whether the proposed collector directly improves things on that front). The core of my point is that the least invasive path is to introduce a new collector which clearly calls out that it is meant for cases when N is very large (>10k?), and lists the benefits and trade-offs clearly. Are there any catches that are applicable here? > Should TopScoreDocCollector Always Populate Sentinel Values? > > > Key: LUCENE-8875 > URL: https://issues.apache.org/jira/browse/LUCENE-8875 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > TopScoreDocCollector always initializes HitQueue as the PQ implementation, > and instruct HitQueue to populate with sentinels. While this is a great > safety mechanism, for very large datasets where the query's selectivity is > high, the sentinel population can be redundant and can become a large enough > bottleneck in itself. Does it make sense to introduce a new parameter in > TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and > does not populate sentinels? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?
[ https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871543#comment-16871543 ] Atri Sharma commented on LUCENE-8875: - While I agree that collecting a very large number of hits is not what top-N collection is intended for, some increasingly popular use cases lean in that direction (bucket aggregates?). I think it would be fair to allow such users to use a different Collector which optimises their case without muddling the commonly used code path. WDYT? > Should TopScoreDocCollector Always Populate Sentinel Values? > > > Key: LUCENE-8875 > URL: https://issues.apache.org/jira/browse/LUCENE-8875 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > TopScoreDocCollector always initializes HitQueue as the PQ implementation, > and instructs HitQueue to populate itself with sentinels. While this is a great > safety mechanism, for very large datasets where the query's selectivity is > high, the sentinel population can be redundant and can become a large enough > bottleneck in itself. Does it make sense to introduce a new parameter in > TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and > does not populate sentinels?
[jira] [Created] (LUCENE-8877) TopDocsCollector Should Not Depend on Priority Queue
Atri Sharma created LUCENE-8877: --- Summary: TopDocsCollector Should Not Depend on Priority Queue Key: LUCENE-8877 URL: https://issues.apache.org/jira/browse/LUCENE-8877 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma TopDocsCollector is tightly coupled to the notion of a priority queue, which is not necessarily a good abstraction to have since the collector really just needs an interface to iterate on and hold docID and score, with possibly shard indexes. We should rewrite this to use a simpler interface, with the priority queue being the default implementation.
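The decoupling proposed above can be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: `HitStore`, `PriorityQueueHitStore` and `SimpleTopCollector` are hypothetical names, and shard indexes are left out for brevity. The collector sees only the interface, and the priority queue is one implementation behind it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical storage abstraction: the collector only needs to offer hits and read them back.
interface HitStore {
    void offer(int doc, float score);
    int size();
    int[] docsBestFirst();
}

// Default implementation backed by a bounded priority queue (weakest hit at the head).
final class PriorityQueueHitStore implements HitStore {
    private static final class Entry {
        final int doc;
        final float score;
        Entry(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Lower score is weaker; on equal scores the higher doc ID is weaker.
    private static final Comparator<Entry> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    private final PriorityQueue<Entry> queue = new PriorityQueue<>(WEAKEST_FIRST);
    private final int capacity;

    PriorityQueueHitStore(int capacity) { this.capacity = capacity; }

    @Override
    public void offer(int doc, float score) {
        Entry e = new Entry(doc, score);
        if (queue.size() < capacity) {
            queue.add(e);
        } else if (capacity > 0 && WEAKEST_FIRST.compare(e, queue.peek()) > 0) {
            queue.poll(); // evict the weakest retained hit
            queue.add(e);
        }
    }

    @Override
    public int size() { return queue.size(); }

    @Override
    public int[] docsBestFirst() {
        List<Entry> sorted = new ArrayList<>(queue);
        sorted.sort(WEAKEST_FIRST.reversed());
        int[] docs = new int[sorted.size()];
        for (int i = 0; i < docs.length; i++) docs[i] = sorted.get(i).doc;
        return docs;
    }
}

// The collector depends only on HitStore, so another data structure can be swapped in.
final class SimpleTopCollector {
    private final HitStore store;
    SimpleTopCollector(HitStore store) { this.store = store; }
    void collect(int doc, float score) { store.offer(doc, score); }
    int[] topDocs() { return store.docsBestFirst(); }
}
```

A caller would write `new SimpleTopCollector(new PriorityQueueHitStore(10))`; swapping in an array-backed or otherwise specialised store would not change the collector.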
[jira] [Commented] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?
[ https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870679#comment-16870679 ] Atri Sharma commented on LUCENE-8875: - Another thing to explore is to have a sleek set of arrays instead of ScoreDocs: [https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/] Maybe have a new implementation of a PQ using this idea, and a new Collector which combines threshold-based sentinel filling with the new PQ? It would only be used for very large N. > Should TopScoreDocCollector Always Populate Sentinel Values? > > > Key: LUCENE-8875 > URL: https://issues.apache.org/jira/browse/LUCENE-8875 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > TopScoreDocCollector always initializes HitQueue as the PQ implementation, > and instructs HitQueue to populate itself with sentinels. While this is a great > safety mechanism, for very large datasets where the query's selectivity is > high, the sentinel population can be redundant and can become a large enough > bottleneck in itself. Does it make sense to introduce a new parameter in > TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and > does not populate sentinels?
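As a rough illustration of the "arrays instead of ScoreDocs" idea, here is a bounded binary min-heap kept in two parallel primitive arrays, so no per-hit object is allocated. `ArrayHitHeap` is a hypothetical name and this is a plain-Java sketch, not the implementation from the linked post or from Lucene.

```java
// Bounded min-heap over parallel arrays: slot 0 always holds the weakest retained hit.
final class ArrayHitHeap {
    private final float[] scores;
    private final int[] docs;
    private int size;

    ArrayHitHeap(int capacity) {
        scores = new float[capacity];
        docs = new int[capacity];
    }

    // Slot i is weaker than slot j if it scores lower, or ties on score with a higher doc ID.
    private boolean weaker(int i, int j) {
        return scores[i] < scores[j] || (scores[i] == scores[j] && docs[i] > docs[j]);
    }

    private void swap(int i, int j) {
        float s = scores[i]; scores[i] = scores[j]; scores[j] = s;
        int d = docs[i]; docs[i] = docs[j]; docs[j] = d;
    }

    void insert(int doc, float score) {
        if (size < scores.length) {
            // Heap not yet full: append and restore heap order upwards.
            scores[size] = score;
            docs[size] = doc;
            siftUp(size);
            size++;
        } else if (size > 0 && (score > scores[0] || (score == scores[0] && doc < docs[0]))) {
            // Heap full and the new hit beats the weakest one: overwrite the root.
            scores[0] = score;
            docs[0] = doc;
            siftDown(0);
        }
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) >>> 1;
            if (!weaker(i, parent)) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int weakest = i;
            if (left < size && weaker(left, weakest)) weakest = left;
            if (right < size && weaker(right, weakest)) weakest = right;
            if (weakest == i) return;
            swap(i, weakest);
            i = weakest;
        }
    }

    int size() { return size; }
    float bottomScore() { return scores[0]; }
    int bottomDoc() { return docs[0]; }
}
```

The trade-off is that ordering logic has to be written against slot indexes rather than objects, which is one reason it may only pay off for very large N.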
[jira] [Created] (LUCENE-8875) Should TopScoreDocCollector Always Populate Sentinel Values?
Atri Sharma created LUCENE-8875: --- Summary: Should TopScoreDocCollector Always Populate Sentinel Values? Key: LUCENE-8875 URL: https://issues.apache.org/jira/browse/LUCENE-8875 Project: Lucene - Core Issue Type: Improvement Reporter: Atri Sharma TopScoreDocCollector always initializes HitQueue as the PQ implementation, and instructs HitQueue to populate itself with sentinels. While this is a great safety mechanism, for very large datasets where the query's selectivity is high, the sentinel population can be redundant and can become a large enough bottleneck in itself. Does it make sense to introduce a new parameter in TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and does not populate sentinels?
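To make the trade-off concrete, here is a small plain-Java sketch of a top-N score heap with and without sentinel prepopulation. `TopScores` is a hypothetical class built on `java.util.PriorityQueue`, not Lucene's HitQueue, and it assumes real scores are never negative infinity. Both modes retain the same hits; the prepopulated mode pays for N insertions up front but needs no "is the queue full yet?" branch per hit.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class TopScores {
    private final PriorityQueue<Float> heap; // min-heap: head is the weakest retained score
    private final int n;
    private final boolean prePopulated;

    TopScores(int n, boolean prePopulate) {
        this.n = n;
        this.prePopulated = prePopulate;
        this.heap = new PriorityQueue<>(Math.max(1, n));
        if (prePopulate) {
            // Sentinel fill: O(n) work even if the query matches only a handful of documents.
            for (int i = 0; i < n; i++) heap.add(Float.NEGATIVE_INFINITY);
        }
    }

    void collect(float score) {
        if (!prePopulated && heap.size() < n) {
            heap.add(score); // lazy mode: grow until the queue is full
            return;
        }
        if (n > 0 && score > heap.peek()) {
            heap.poll(); // replace the weakest entry (a sentinel or a real hit)
            heap.add(score);
        }
    }

    // Retained real scores in ascending order; leftover sentinels are dropped.
    float[] topAscending() {
        Float[] boxed = heap.toArray(new Float[0]);
        Arrays.sort(boxed);
        int skip = 0;
        while (skip < boxed.length && boxed[skip] == Float.NEGATIVE_INFINITY) skip++;
        float[] out = new float[boxed.length - skip];
        for (int i = 0; i < out.length; i++) out[i] = boxed[skip + i];
        return out;
    }
}
```

A heuristic such as the one suggested above (skip prepopulation when N exceeds some threshold) would then amount to choosing the constructor flag based on N.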
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868909#comment-16868909 ] Atri Sharma commented on LUCENE-8857: - [~simonw] I have added the default tie breaker, which breaks ties by shard index first and then by docID, as suggested. The new PR has the latest iteration; please let me know if it seems fine. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
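A default tie breaker of the kind described (shard index first, then docID) could look roughly like the sketch below. `Hit` and `TieBreakers` are hypothetical stand-ins written in plain Java so the example is self-contained; they are not the types or method names used in the patch.

```java
import java.util.Comparator;

// Hypothetical stand-in for a scored hit carrying a shard index.
final class Hit {
    final int doc;
    final float score;
    final int shardIndex; // -1 when no shard index has been set

    Hit(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }
}

final class TieBreakers {
    // Default: lower shard index wins, then lower doc ID.
    static final Comparator<Hit> SHARD_THEN_DOC =
        Comparator.<Hit>comparingInt(h -> h.shardIndex).thenComparingInt(h -> h.doc);

    // Orders hits by descending score; equal scores fall through to the supplied tie breaker.
    static Comparator<Hit> byScoreThen(Comparator<Hit> tieBreaker) {
        return (a, b) -> {
            int c = Float.compare(b.score, a.score);
            return c != 0 ? c : tieBreaker.compare(a, b);
        };
    }
}
```

If every hit has `shardIndex == -1`, the first comparison is always a tie and the ordering degrades to docID only, which matches the observation quoted elsewhere in this thread. A caller wanting different behaviour passes its own `Comparator<Hit>` to `byScoreThen`.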
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868717#comment-16868717 ] Atri Sharma commented on LUCENE-8857: - {quote}Any chance we can select the tie-breaker based on if one of the TopDocs has a shardIndex != -1 and assert that all of them have it or not? Another option would be to have only one comparator and first tie-break on shardIndex and then on doc since we don't set the shard index it should be fine since they are all -1? {quote} Would that not defeat the purpose of passing in the custom tie breaker? I thought the reason we added the Comparator parameter was to allow users to specify their own tie-breaking algorithms. Am I missing something? > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868479#comment-16868479 ] Atri Sharma commented on LUCENE-8857: - Does this iteration look fine? Happy to iterate further if needed. > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
[jira] [Commented] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867411#comment-16867411 ] Atri Sharma commented on LUCENE-8857: - Updated patch with improved javadocs and removal of now redundant methods [^LUCENE-8857.patch] > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw]
[jira] [Updated] (LUCENE-8857) Refactor TopDocs#Merge To Take In Custom Tie Breakers
[ https://issues.apache.org/jira/browse/LUCENE-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Atri Sharma updated LUCENE-8857: Attachment: LUCENE-8857.patch > Refactor TopDocs#Merge To Take In Custom Tie Breakers > - > > Key: LUCENE-8857 > URL: https://issues.apache.org/jira/browse/LUCENE-8857 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8857.patch, LUCENE-8857.patch, LUCENE-8857.patch, > LUCENE-8857.patch, LUCENE-8857.patch > > > In LUCENE-8829, the idea of having lambdas passed in to the API to allow > finer control over the process was discussed. > This JIRA tracks adding a parameter to the API which allows passing in > lambdas to define custom tie breakers, thus allowing users to do custom > algorithms when required. > CC: [~jpountz] [~simonw] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8862) Collector Level Dynamic Memory Accounting
[ https://issues.apache.org/jira/browse/LUCENE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867325#comment-16867325 ] Atri Sharma commented on LUCENE-8862: - I have opened a PR for this. Please let me know if it looks fine. Once we merge this, I am planning to open a Jira to enable Solr's facet collector to account for memory. For default cases, the limit can be Long.MAX_VALUE. Thoughts? > Collector Level Dynamic Memory Accounting > - > > Key: LUCENE-8862 > URL: https://issues.apache.org/jira/browse/LUCENE-8862 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Inspired by LUCENE-8855, I am thinking of adding a new interface which > tracks dynamic memory used by Collectors. This will give users an > account of the memory usage of their Collectors and let them better plan > their resource capacity. It will also allow us to add Collector level > limits for memory usage, thus allowing users finer control over their > resources.
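One way to picture collector-level accounting with a limit is the sketch below. `MemoryTracker` and `AccountingCollector` are hypothetical names in plain Java, and the per-hit byte estimate is a deliberately crude assumption; none of this is taken from the PR. The default, unlimited case uses `Long.MAX_VALUE` as mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracker: counts estimated bytes against a configurable limit.
final class MemoryTracker {
    private final long limitBytes;
    private long usedBytes;

    MemoryTracker(long limitBytes) { this.limitBytes = limitBytes; }

    // Default case: effectively unlimited.
    static MemoryTracker unlimited() { return new MemoryTracker(Long.MAX_VALUE); }

    void reserve(long bytes) {
        long next = Math.addExact(usedBytes, bytes);
        if (next > limitBytes) {
            throw new IllegalStateException(
                "collector memory limit exceeded: " + next + " > " + limitBytes);
        }
        usedBytes = next;
    }

    long usedBytes() { return usedBytes; }
}

final class AccountingCollector {
    // Crude per-hit estimate (one int doc ID plus one float score); ignores object overhead.
    static final long BYTES_PER_HIT = Integer.BYTES + Float.BYTES;

    private final MemoryTracker tracker;
    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();

    AccountingCollector(MemoryTracker tracker) { this.tracker = tracker; }

    void collect(int doc, float score) {
        tracker.reserve(BYTES_PER_HIT); // fail before storing, so the limit is never overshot
        docs.add(doc);
        scores.add(score);
    }

    int hitCount() { return docs.size(); }
}
```

Sharing one tracker across several collectors would give a per-query budget, while one tracker per collector gives the per-collector limit described in the issue.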
[jira] [Commented] (LUCENE-8864) Add Query Memory Estimation Ability in QueryVisitor
[ https://issues.apache.org/jira/browse/LUCENE-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866933#comment-16866933 ] Atri Sharma commented on LUCENE-8864: - Right, the purpose of this Jira was twofold: 1) To throw out thoughts about making memory accounting a first class citizen within QueryVisitor. I think it would be good if we added a method which returned the overall size of the underlying query. This fits in nicely with QueryVisitor's model since queries can be nested, so it is good to get the "deep" memory usage of the parent query. As you said, the new method could return the Accountable's estimate, or the shallow size if Accountable is not supported. 2) Borrow ideas from the QueryVisitor design to see if we can improve Accountable itself. While this is orthogonal and I have not really thought through every corner case, my instinct says that there might be opportunities to make Accountable's APIs more recursive in nature. For example, there are a ton of instanceof checks present today, one for each Query type. Should we think about delegating some of that calculation to a visitor-style model which localizes the per-query calculation to the query's own scope? > Add Query Memory Estimation Ability in QueryVisitor > --- > > Key: LUCENE-8864 > URL: https://issues.apache.org/jira/browse/LUCENE-8864 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > > In LUCENE-8855, there is a discussion around adding memory accounting > capabilities to QueryVisitor to allow estimation of memory consumption by > queries. > This Jira tracks the effort.
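The "deep" estimate from point 1 can be sketched with a toy query tree. Everything here is hypothetical and in plain Java (`SelfSized`, `QueryNode`, `MemoryVisitor` and the 16-byte shallow size are placeholders), so it only mirrors the shape of the idea rather than Lucene's QueryVisitor or Accountable: each node reports its own estimate if it has one, otherwise a shallow size, and the traversal sums over nested queries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical "accountable" contract: a node that can estimate its own memory.
interface SelfSized {
    long ramBytesUsed();
}

// Accumulates memory as the tree is walked.
final class MemoryVisitor {
    private long totalBytes;

    void consume(QueryNode node) {
        // Use the node's own estimate when available, else fall back to the shallow size.
        totalBytes += (node instanceof SelfSized)
            ? ((SelfSized) node).ramBytesUsed()
            : node.shallowBytes();
    }

    long totalBytes() { return totalBytes; }
}

abstract class QueryNode {
    static final long SHALLOW_BYTES = 16L; // placeholder constant, not a measured value

    long shallowBytes() { return SHALLOW_BYTES; }

    List<QueryNode> children() { return Collections.emptyList(); }

    // Visits this node and then its children, so nested queries are counted.
    final void visit(MemoryVisitor visitor) {
        visitor.consume(this);
        for (QueryNode child : children()) child.visit(visitor);
    }
}

final class TermNode extends QueryNode implements SelfSized {
    private final String term;
    TermNode(String term) { this.term = term; }
    @Override
    public long ramBytesUsed() { return SHALLOW_BYTES + 2L * term.length(); }
}

// A node type with no estimate of its own.
final class OpaqueNode extends QueryNode {}

final class BooleanNode extends QueryNode {
    private final List<QueryNode> clauses;
    BooleanNode(QueryNode... clauses) { this.clauses = Arrays.asList(clauses); }
    @Override
    List<QueryNode> children() { return clauses; }
}
```

Note that the per-type calculation lives inside each node (`TermNode.ramBytesUsed`), which is the localisation point 2 hints at; the only remaining type check is the single "has its own estimate or not" test in the visitor.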
[jira] [Commented] (LUCENE-8769) Range Query Type With Logically Connected Ranges
[ https://issues.apache.org/jira/browse/LUCENE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866884#comment-16866884 ] Atri Sharma commented on LUCENE-8769: - Thinking more about this, I think what can be done is: 1) Introduce NOT semantics by translating NOT (a, b) to (-infinity, a) AND (b, infinity) 2) Introduce a RangeClause which contains a bunch of ranges and associated AND and NOT clauses (not OR). Each RangeClause will be executed independently, and the final results then ANDed or ORed. For example: (a AND b) OR (c NOT d) converts to two RangeClauses: {a, b, AND}, {c, d, NOT}, where the RangeClauses are connected by OR, so the independent results of both clauses are then ORed to give the final result. Does this seem like a useful and doable approach? > Range Query Type With Logically Connected Ranges > > > Key: LUCENE-8769 > URL: https://issues.apache.org/jira/browse/LUCENE-8769 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Priority: Major > Attachments: LUCENE-8769.patch, LUCENE-8769.patch, LUCENE-8769.patch > > > Today, we visit BKD tree for each range specified for PointRangeQuery. It > would be good to have a range query type which can take multiple ranges > logically ANDed or ORed.
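The RangeClause example, (a AND b) OR (c NOT d), can be sketched in plain Java as below. `Range`, `RangeClause` and `RangeClauses` are hypothetical names, ranges are assumed to be inclusive one-dimensional long intervals, and matching is done value by value purely to show the semantics; an actual implementation would evaluate against the BKD tree instead.

```java
// Inclusive one-dimensional range (an assumption; the comment does not fix the bounds).
final class Range {
    final long min;
    final long max;

    Range(long min, long max) {
        this.min = min;
        this.max = max;
    }

    boolean contains(long value) { return value >= min && value <= max; }
}

// Two ranges joined by AND or NOT, evaluated independently of other clauses.
final class RangeClause {
    enum Op { AND, NOT }

    private final Range left;
    private final Range right;
    private final Op op;

    RangeClause(Range left, Range right, Op op) {
        this.left = left;
        this.right = right;
        this.op = op;
    }

    boolean matches(long value) {
        boolean inLeft = left.contains(value);
        boolean inRight = right.contains(value);
        // AND: inside both ranges. NOT: inside the left range but outside the right one.
        return op == Op.AND ? (inLeft && inRight) : (inLeft && !inRight);
    }
}

final class RangeClauses {
    // Clauses connected by OR: a value matches if any clause matches.
    static boolean matchesAny(long value, RangeClause... clauses) {
        for (RangeClause clause : clauses) {
            if (clause.matches(value)) return true;
        }
        return false;
    }
}
```

With a = [0, 10], b = [5, 20], c = [30, 40] and d = [35, 50], the two clauses {a, b, AND} and {c, d, NOT} accept 5..10 and 30..34 respectively, and ORing them accepts either interval.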