[jira] [Commented] (LUCENE-10134) TestSortedSetDocValuesFacets fails with Bits shared between threads
[ https://issues.apache.org/jira/browse/LUCENE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423047#comment-17423047 ] Ankur commented on LUCENE-10134: [~dweiss] Thanks for cleaning up the resource leaks on error. Here is a PR with the fix. [https://github.com/apache/lucene/pull/345/files] The issue did not surface in my local testing using `./gradlew precommit` and `./gradlew test`. I wonder if I am missing a test step that could have helped catch this error during development. Was this error produced by randomized tests running in Lucene nightly benchmarks? > TestSortedSetDocValuesFacets fails with Bits shared between threads > --- > > Key: LUCENE-10134 > URL: https://issues.apache.org/jira/browse/LUCENE-10134 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Dawid Weiss >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Repro:
> {code}
> gradlew -p lucene\facet test -Ptests.seed=5E8A2F2BBCCBDF1B
> {code}
> {code}
> org.apache.lucene.facet.sortedset.TestSortedSetDocValuesFacets > testCountAll FAILED
> java.lang.AssertionError: Bits are only supposed to be consumed in the thread in which they have been acquired. But was acquired in Thread[TEST-TestSortedSetDocValuesFacets.testCountAll-seed#[5E8A2F2BBCCBDF1B],5,TGRP-TestSortedSetDocValuesFacets] and consumed in Thread[TestIndexSearcher-2-thread-1,5,TGRP-TestSortedSetDocValuesFacets].
>     at __randomizedtesting.SeedInfo.seed([5E8A2F2BBCCBDF1B:FFB0A39BEF474EA3]:0)
>     at org.apache.lucene.index.AssertingLeafReader.assertThread(AssertingLeafReader.java:43)
>     at org.apache.lucene.index.AssertingLeafReader$AssertingBits.get(AssertingLeafReader.java:1374)
>     at org.apache.lucene.facet.FacetUtils$1.doNext(FacetUtils.java:62)
>     at org.apache.lucene.facet.FacetUtils$1.nextDoc(FacetUtils.java:70)
>     at org.apache.lucene.facet.sortedset.ConcurrentSortedSetDocValuesFacetCounts$CountOneSegment.call(ConcurrentSortedSetDocValuesFacetCounts.java:260)
>     at org.apache.lucene.facet.sortedset.ConcurrentSortedSetDocValuesFacetCounts$CountOneSegment.call(ConcurrentSortedSetDocValuesFacetCounts.java:159)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
>     at java.base/java.lang.Thread.run(Thread.java:832)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
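The assertion in the trace above fires because the bits were acquired on the test thread and then read on an executor thread. The rule being enforced can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `ThreadConfinedBits` and `ThreadConfinementSketch` are hypothetical stand-ins, not Lucene's `AssertingLeafReader` code and not the change made in the linked PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo class; contrasts sharing bits across threads with acquiring them in the task. */
final class ThreadConfinementSketch {
    private static int countSet(ThreadConfinedBits bits, int length) {
        int count = 0;
        for (int i = 0; i < length; i++) {
            if (bits.get(i)) count++;
        }
        return count;
    }

    /** Runs the task on one worker thread; returns its result, or null if it tripped the thread check. */
    private static Integer runOnWorker(Callable<Integer> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof AssertionError) return null;
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Problem pattern: bits acquired on the calling thread, consumed on the worker. */
    static Integer countWithSharedBits(boolean[] values) {
        ThreadConfinedBits bits = new ThreadConfinedBits(values);
        return runOnWorker(() -> countSet(bits, values.length));
    }

    /** Confined pattern: the worker acquires the bits itself before consuming them. */
    static Integer countWithBitsAcquiredInTask(boolean[] values) {
        return runOnWorker(() -> countSet(new ThreadConfinedBits(values), values.length));
    }

    public static void main(String[] args) {
        boolean[] values = {true, false, true, true};
        System.out.println("shared:   " + countWithSharedBits(values));
        System.out.println("confined: " + countWithBitsAcquiredInTask(values));
    }
}

/** Hypothetical stand-in for a thread-checking bit set; remembers the acquiring thread. */
final class ThreadConfinedBits {
    private final Thread owner = Thread.currentThread(); // thread that acquired the bits
    private final boolean[] values;

    ThreadConfinedBits(boolean[] values) {
        this.values = values;
    }

    boolean get(int index) {
        if (Thread.currentThread() != owner) {
            throw new AssertionError("acquired in " + owner.getName()
                + " but consumed in " + Thread.currentThread().getName());
        }
        return values[index];
    }
}
```

In this sketch the shared variant returns null because the check fails on the worker, while the confined variant counts normally. The general remedy it illustrates is to acquire per-segment state inside the task that consumes it.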
[jira] [Comment Edited] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations
[ https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413450#comment-17413450 ] Ankur edited comment on LUCENE-10070 at 9/11/21, 1:12 AM: -- [~gsmiller] Thanks for taking a look at the above PR. I incorporated the feedback in a different PR due to GIT related issues at my end. Request folks to continue the conversation there. [https://github.com/apache/lucene/pull/293/files] was (Author: goankur): [~gsmiller] Thanks for taking a look at the above PR. I incorporated your feedback into the changes capture in a different PR due to GIT related issues at my end. Request you to continue the conversation there. [https://github.com/apache/lucene/pull/293/files] > "count all" faceting functionality counts deleted docs for multiple > implementations > --- > > Key: LUCENE-10070 > URL: https://issues.apache.org/jira/browse/LUCENE-10070 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > A few different {{Facets}} implementations supporting a "count all" style > constructor that allows the user to not pass in a {{FacetsCollector}} > instance. It advertises that it's equivalent to using a {{FacetsCollector}} > populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, > with the exception of {{FastTaxonomyFacetCounts}}, none of the > implementations correctly account for deleted documents (have a look at > {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs." > From what I can tell, the affected implementations are: > * SortedSetDocValueFacetCounts > * ConcurrentSortedSetDocValueFacetCounts > * LongValueFacetCounts > * StringValueFacetCounts > I'll attach a PR shortly illustrating unit tests I wrote that confirm the bug. 
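To make the bug described above concrete, here is a minimal sketch of "count all" faceting with and without a live-docs check. It is an assumption-laden illustration, not code from any of the affected classes: `CountAllSketch`, `countAll`, and the `int[][]` ordinal layout are hypothetical, and a `java.util.BitSet` stands in for live docs. The only convention borrowed from Lucene is that a null live-docs value means the segment has no deletions.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: counts facet ordinals over all docs, optionally skipping deleted ones. */
final class CountAllSketch {
    /**
     * @param ordsPerDoc facet ordinals for each document, indexed by doc id (illustrative layout)
     * @param numOrds    number of distinct ordinals
     * @param liveDocs   set bit = live document; null means no document is deleted
     */
    static int[] countAll(int[][] ordsPerDoc, int numOrds, BitSet liveDocs) {
        int[] counts = new int[numOrds];
        for (int doc = 0; doc < ordsPerDoc.length; doc++) {
            if (liveDocs != null && !liveDocs.get(doc)) {
                continue; // deleted document: must not contribute to the counts
            }
            for (int ord : ordsPerDoc[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] ordsPerDoc = {{0, 1}, {1}, {0}};
        BitSet liveDocs = new BitSet();
        liveDocs.set(0);
        liveDocs.set(2); // doc 1 is deleted

        // Passing null here mimics an implementation that never consults live docs.
        System.out.println("ignoring deletes: " + Arrays.toString(countAll(ordsPerDoc, 2, null)));
        System.out.println("respecting deletes: " + Arrays.toString(countAll(ordsPerDoc, 2, liveDocs)));
    }
}
```

The two calls differ only for ordinal 1, which is carried by the deleted document; a test for the bug can assert exactly that kind of difference.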
[jira] [Comment Edited] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations
[ https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413450#comment-17413450 ] Ankur edited comment on LUCENE-10070 at 9/11/21, 1:11 AM: -- [~gsmiller] Thanks for taking a look at the above PR. I incorporated your feedback into the changes capture in a different PR due to GIT related issues at my end. Request you to continue the conversation there. [https://github.com/apache/lucene/pull/293/files] was (Author: goankur): [~gsmiller] Thanks for taking a look the the above PR. I incorporated your feedback into the changes capture in a different PR due to GIT related issues at my end. Request you to continue the conversation there. https://github.com/apache/lucene/pull/293/files > "count all" faceting functionality counts deleted docs for multiple > implementations > --- > > Key: LUCENE-10070 > URL: https://issues.apache.org/jira/browse/LUCENE-10070 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > A few different {{Facets}} implementations supporting a "count all" style > constructor that allows the user to not pass in a {{FacetsCollector}} > instance. It advertises that it's equivalent to using a {{FacetsCollector}} > populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, > with the exception of {{FastTaxonomyFacetCounts}}, none of the > implementations correctly account for deleted documents (have a look at > {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs." > From what I can tell, the affected implementations are: > * SortedSetDocValueFacetCounts > * ConcurrentSortedSetDocValueFacetCounts > * LongValueFacetCounts > * StringValueFacetCounts > I'll attach a PR shortly illustrating unit tests I wrote that confirm the bug. 
[jira] [Commented] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations
[ https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413450#comment-17413450 ] Ankur commented on LUCENE-10070: [~gsmiller] Thanks for taking a look the the above PR. I incorporated your feedback into the changes capture in a different PR due to GIT related issues at my end. Request you to continue the conversation there. https://github.com/apache/lucene/pull/293/files > "count all" faceting functionality counts deleted docs for multiple > implementations > --- > > Key: LUCENE-10070 > URL: https://issues.apache.org/jira/browse/LUCENE-10070 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > A few different {{Facets}} implementations supporting a "count all" style > constructor that allows the user to not pass in a {{FacetsCollector}} > instance. It advertises that it's equivalent to using a {{FacetsCollector}} > populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, > with the exception of {{FastTaxonomyFacetCounts}}, none of the > implementations correctly account for deleted documents (have a look at > {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs." > From what I can tell, the affected implementations are: > * SortedSetDocValueFacetCounts > * ConcurrentSortedSetDocValueFacetCounts > * LongValueFacetCounts > * StringValueFacetCounts > I'll attach a PR shortly illustrating unit tests I wrote that confirm the bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations
[ https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409785#comment-17409785 ] Ankur commented on LUCENE-10070: New PR available - [https://github.com/apache/lucene/pull/282/files] > "count all" faceting functionality counts deleted docs for multiple > implementations > --- > > Key: LUCENE-10070 > URL: https://issues.apache.org/jira/browse/LUCENE-10070 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > A few different {{Facets}} implementations supporting a "count all" style > constructor that allows the user to not pass in a {{FacetsCollector}} > instance. It advertises that it's equivalent to using a {{FacetsCollector}} > populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, > with the exception of {{FastTaxonomyFacetCounts}}, none of the > implementations correctly account for deleted documents (have a look at > {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs." > From what I can tell, the affected implementations are: > * SortedSetDocValueFacetCounts > * ConcurrentSortedSetDocValueFacetCounts > * LongValueFacetCounts > * StringValueFacetCounts > I'll attach a PR shortly illustrating unit tests I wrote that confirm the bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10070) "count all" faceting functionality counts deleted docs for multiple implementations
[ https://issues.apache.org/jira/browse/LUCENE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409708#comment-17409708 ] Ankur commented on LUCENE-10070: Let me pick it up > "count all" faceting functionality counts deleted docs for multiple > implementations > --- > > Key: LUCENE-10070 > URL: https://issues.apache.org/jira/browse/LUCENE-10070 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Reporter: Greg Miller >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > A few different {{Facets}} implementations supporting a "count all" style > constructor that allows the user to not pass in a {{FacetsCollector}} > instance. It advertises that it's equivalent to using a {{FacetsCollector}} > populated with a {{MatchAllDocsQuery}}, but more efficient. It looks like, > with the exception of {{FastTaxonomyFacetCounts}}, none of the > implementations correctly account for deleted documents (have a look at > {{FastTaxonomyFacetCounts}} for a correct example that consults "live docs." > From what I can tell, the affected implementations are: > * SortedSetDocValueFacetCounts > * ConcurrentSortedSetDocValueFacetCounts > * LongValueFacetCounts > * StringValueFacetCounts > I'll attach a PR shortly illustrating unit tests I wrote that confirm the bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10048) Bypass total frequency check if field uses custom term frequency
[ https://issues.apache.org/jira/browse/LUCENE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398355#comment-17398355 ] Ankur commented on LUCENE-10048: Thanks for your response [~rcmuir]. Let me try to explain the use case at a high level # An offline (map-reduce style) batch process consumes a set of indexable documents. # The process also consumes terms and metadata information from external data sources. # For each indexable document, the batch-process computes a set of term-doc scores and add this set to a document field (to be indexed later). # A document will only have a small number of such terms in a field, *less than 10K*. # There could be *many such fields* in a single document populated by different offline processes, all of which scale these values arbitrarily (due to historical reasons) but still make sure a single value fits in 4-bytes. # The document also has usual textual fields (title, description etc) for which Lucene computes term/field statistics and produces BM25 scores. # All of these scores are used by a ranking method. You are referring to [Payloads|https://cwiki.apache.org/confluence/display/LUCENE/Payloads] right? It is a viable option but less space efficient (no delta compression) compared to storing these values directly as term-frequencies. So only for fields that are populated by an external process, I am hoping we can come up with a mechanism to ignore the overflow checks on term/field statistics. > Bypass total frequency check if field uses custom term frequency > > > Key: LUCENE-10048 > URL: https://issues.apache.org/jira/browse/LUCENE-10048 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tony Xu >Priority: Minor > > For all fields whose index option is not *IndexOptions.NONE*. There is a > check on per field total token count (i.e. field-length) to ensure we don't > index too many tokens. 
This is done by accumulating each token's > *TermFrequencyAttribute.* > > Lucene currently allows a custom term frequency to be attached to each > token, and applications may use that frequency for arbitrary, potentially very large values. It is therefore possible to > have the following case, where the check fails with only a few tokens that > have large frequencies. When that happens, Lucene currently skips indexing the whole > document. > *"foo| bar|"* > > What would be the way to inform the indexing chain not to check the field length? > A related observation: when a custom term frequency is in use, the user is not > likely to use the similarity for this field. Maybe we can offer a way to > specify that, too?
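The overflow discussed in this issue can be illustrated with a small sketch: each custom term frequency fits in an int, yet their running sum does not. This is not Lucene's indexing-chain code; `FieldLengthCheckSketch` and its methods are hypothetical names, and the checked addition uses the standard `Math.addExact`.

```java
/** Hypothetical sketch of a per-field "total term frequency" check that rejects int overflow. */
final class FieldLengthCheckSketch {
    /** Sums per-token frequencies; throws ArithmeticException if the sum exceeds Integer.MAX_VALUE. */
    static int checkedFieldLength(int[] termFreqs) {
        int length = 0;
        for (int tf : termFreqs) {
            if (tf < 1) {
                throw new IllegalArgumentException("term frequency must be >= 1, got " + tf);
            }
            length = Math.addExact(length, tf); // fails on int overflow instead of wrapping around
        }
        return length;
    }

    /** Returns true when the document would be rejected by the check above. */
    static boolean overflows(int[] termFreqs) {
        try {
            checkedFieldLength(termFreqs);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] ordinary = {3, 4};
        int[] custom = {1_500_000_000, 1_500_000_000}; // two tokens, each below Integer.MAX_VALUE
        System.out.println("ordinary overflows: " + overflows(ordinary));
        System.out.println("custom overflows:   " + overflows(custom));
    }
}
```

Two tokens are enough to trip the check here, which is the situation the issue asks to bypass for fields populated with externally computed values.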
[jira] [Commented] (LUCENE-10048) Bypass total frequency check if field uses custom term frequency
[ https://issues.apache.org/jira/browse/LUCENE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397771#comment-17397771 ] Ankur commented on LUCENE-10048: @[~rcmuir] Consider the case where these term-document level scoring factors are computed in an offline process, indexed in Lucene and accessed at query time by a ranking function that does not rely on Lucene's [Scorer|https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/Scorer.html] and [Similarity|https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/similarities/Similarity.html] abstractions. What is considered reasonable is up to the offline process that serves the needs of the ranking function and is outside our control. A single term-document scoring factor can still be less than {{Integer.MAX_VALUE}} but the sum of all such factors for a document could easily exceed the {{Integer.MAX_VALUE}} range. Without this our only option (I think) is to use {{BinaryDocValues}} and implement mechanisms to serialize/deserialize term-document level scoring factors at indexing and searching time ourselves. With this we don't get the space efficiencies that come with the use of highly optimized terms dictionary and the integer compression techniques used to encode postings data (at least not without significant work). Maybe we can keep the restriction on the custom term frequency to be less than {{Integer.MAX_VALUE}} but relax the check on per field total token count for the expert use case ? > Bypass total frequency check if field uses custom term frequency > > > Key: LUCENE-10048 > URL: https://issues.apache.org/jira/browse/LUCENE-10048 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tony Xu >Priority: Minor > > For all fields whose index option is not *IndexOptions.NONE*. There is a > check on per field total token count (i.e. field-length) to ensure we don't > index too many tokens. 
This is done by accumulating each token's > *TermFrequencyAttribute.* > > Lucene currently allows a custom term frequency to be attached to each > token, and applications may use that frequency for arbitrary, potentially very large values. It is therefore possible to > have the following case, where the check fails with only a few tokens that > have large frequencies. When that happens, Lucene currently skips indexing the whole > document. > *"foo| bar|"* > > What would be the way to inform the indexing chain not to check the field length? > A related observation: when a custom term frequency is in use, the user is not > likely to use the similarity for this field. Maybe we can offer a way to > specify that, too?
[jira] [Commented] (LUCENE-9385) Skip indexing facet drill down terms
[ https://issues.apache.org/jira/browse/LUCENE-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304324#comment-17304324 ] Ankur commented on LUCENE-9385: --- Hi Zach, I am not currently working on it. Feel free to grab it :) > Skip indexing facet drill down terms > > > Key: LUCENE-9385 > URL: https://issues.apache.org/jira/browse/LUCENE-9385 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/facet >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Labels: easyfix > > FacetsConfig creates index terms from the Facet dimension and path > automatically for the purpose of supporting drill-down queries. > An application that does not need drill-down ends up paying the index cost of > the extra terms. > Ideally an option to skip indexing these drill down terms should be exposed > to the application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9838) simd version of VectorUtil.dotProduct
[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302157#comment-17302157 ] Ankur edited comment on LUCENE-9838 at 3/16/21, 2:23 AM: - This is cool - [~rcmuir]. I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHZ DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading *OpenJDK build 16+36-2231* and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project. I copied over the old dotProduct implementation and the new one from your patch to _MyBenchmark.java_ in the JMH project space. Here are the results I got
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error   Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run after adding annotations to disable TieredCompilation and vector bounds check. Looks like for small vector size (*16 elements*) we see *10%* improvement but for large vectors (*128 or more* elements) the improvement is *_4X or higher._* was (Author: goankur): This is cool - [~rcmuir].
I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHZ DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading [OpenJDK build 16+36-2231|https://download.java.net/java/GA/jdk16/7863447f0ab643c585b9bdebf67c69db/36/GPL/openjdk-16_osx-x64_bin.tar.gz] and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project. I copied over the old dotProduct implementation and the new one from your patch to _MyBenchmark.java_ in the JMH project space. Here are the results I got
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error   Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run after adding annotations to disable TieredCompilation and vector bounds check.
Looks like for small vector size (*16 elements*) we see *10%* improvement but for large vectors (*128 or more* elements) the improvement is *_4X or higher._* > simd version of VectorUtil.dotProduct > - > > Key: LUCENE-9838 > URL: https://issues.apache.org/jira/browse/LUCENE-9838 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9838.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > Followup to LUCENE-9837 > Let's explore using JDK 16 vector API to speed this up more. It might be a > hassle to try to MR-JAR/package up for users (adding commandline flags and > stuff), but it gives good performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
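The improvement figures quoted in the comment can be checked directly against the reported throughput numbers. The sketch below only divides the "New" score by the "Old" score for each vector size, using the values from the table in the comment; the class and method names are hypothetical and the error margins are ignored.

```java
/** Hypothetical helper: speedup of dotProductNew over dotProductOld from the reported JMH scores. */
final class DotProductSpeedup {
    static final int[] SIZES = {16, 32, 64, 128, 256, 512, 1024};
    // Throughput in ops/us, copied from the benchmark table in the comment.
    static final double[] OLD = {90.896, 53.563, 29.792, 16.906, 8.360, 4.231, 2.104};
    static final double[] NEW = {100.901, 97.610, 73.499, 65.068, 42.595, 26.283, 14.389};

    /** Ratio of new to old throughput for the i-th vector size. */
    static double speedup(int i) {
        return NEW[i] / OLD[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("size %4d: %.2fx%n", SIZES[i], speedup(i));
        }
    }
}
```

By this calculation the ratio is about 1.11x at 16 elements, about 3.85x at 128 elements, and about 6.84x at 1024 elements, so "4X" is a slight rounding up at 128 elements and a lower bound from 256 elements onward.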
[jira] [Commented] (LUCENE-9838) simd version of VectorUtil.dotProduct
[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302157#comment-17302157 ] Ankur commented on LUCENE-9838: --- This is cool - [~rcmuir]. I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHZ DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading [OpenJDK build 16+36-2231|https://download.java.net/java/GA/jdk16/7863447f0ab643c585b9bdebf67c69db/36/GPL/openjdk-16_osx-x64_bin.tar.gz] and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project. I copied over the old dotProduct implementation and the new one from your patch to _MyBenchmark.java_ in the JMH project space. Here are the results I got
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error   Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run after adding annotations to disable TieredCompilation and vector bounds check.
Looks like for small vector size (*16 elements*) we see *10%* improvement but for large vectors (*128 or more* elements) the improvement is *_4X or higher._* > simd version of VectorUtil.dotProduct > - > > Key: LUCENE-9838 > URL: https://issues.apache.org/jira/browse/LUCENE-9838 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9838.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > Followup to LUCENE-9837 > Let's explore using JDK 16 vector API to speed this up more. It might be a > hassle to try to MR-JAR/package up for users (adding commandline flags and > stuff), but it gives good performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9838) simd version of VectorUtil.dotProduct
[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302157#comment-17302157 ] Ankur edited comment on LUCENE-9838 at 3/16/21, 2:10 AM: - This is cool - [~rcmuir]. I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHZ DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading [OpenJDK build 16+36-2231|https://download.java.net/java/GA/jdk16/7863447f0ab643c585b9bdebf67c69db/36/GPL/openjdk-16_osx-x64_bin.tar.gz] and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project. I copied over the old dotProduct implementation and the new one from your patch to _MyBenchmark.java_ in the JMH project space. Here are the results I got
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error   Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run after adding annotations to disable TieredCompilation and vector bounds check. Looks like for small vector size (*16 elements*) we see *10%* improvement but for large vectors (*128 or more* elements) the improvement is *_4X or higher._* was (Author: goankur): This is cool - [~rcmuir].
I played with this a little on my MacBook Pro (2019, *Memory*: 32 GB 2667 MHZ DDR4; *Processor*: 2.6 GHz 6-Core Intel Core i7) after downloading [OpenJDK build 16+36-2231|https://download.java.net/java/GA/jdk16/7863447f0ab643c585b9bdebf67c69db/36/GPL/openjdk-16_osx-x64_bin.tar.gz] and setting up a standalone [JMH benchmark|https://github.com/openjdk/jmh] project. I copied over the old dotProduct implementation and the new one from your patch to _MyBenchmark.java_ in the JMH project space. Here are the results I got
{code:java}
Benchmark                  (size)   Mode  Cnt    Score   Error   Units
MyBenchmark.dotProductOld      16  thrpt    5   90.896 ± 5.302  ops/us
MyBenchmark.dotProductNew      16  thrpt    5  100.901 ± 5.105  ops/us
MyBenchmark.dotProductOld      32  thrpt    5   53.563 ± 2.378  ops/us
MyBenchmark.dotProductNew      32  thrpt    5   97.610 ± 5.393  ops/us
MyBenchmark.dotProductOld      64  thrpt    5   29.792 ± 1.246  ops/us
MyBenchmark.dotProductNew      64  thrpt    5   73.499 ± 3.640  ops/us
MyBenchmark.dotProductOld     128  thrpt    5   16.906 ± 0.751  ops/us
MyBenchmark.dotProductNew     128  thrpt    5   65.068 ± 3.986  ops/us
MyBenchmark.dotProductOld     256  thrpt    5    8.360 ± 0.125  ops/us
MyBenchmark.dotProductNew     256  thrpt    5   42.595 ± 2.958  ops/us
MyBenchmark.dotProductOld     512  thrpt    5    4.231 ± 0.158  ops/us
MyBenchmark.dotProductNew     512  thrpt    5   26.283 ± 0.640  ops/us
MyBenchmark.dotProductOld    1024  thrpt    5    2.104 ± 0.093  ops/us
MyBenchmark.dotProductNew    1024  thrpt    5   14.389 ± 0.720  ops/us
{code}
These benchmarks were run after adding annotations to disable TieredCompilation and vector bounds check.
Looks like for small vector size (*16 elements*) we see *10%* improvement but for large vectors (*128 or more* elements) the improvement is *_4X or higher._* > simd version of VectorUtil.dotProduct > - > > Key: LUCENE-9838 > URL: https://issues.apache.org/jira/browse/LUCENE-9838 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9838.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > Followup to LUCENE-9837 > Let's explore using JDK 16 vector API to speed this up more. It might be a > hassle to try to MR-JAR/package up for users (adding commandline flags and > stuff), but it gives good performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249980#comment-17249980 ] Ankur commented on LUCENE-9444: --- [~mikemccand], Sorry for the late response. Yes we can resolve this one now. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Fix For: master (9.0) > > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
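The three lookup steps listed in the issue, and the single `getLabels(docId)` call that would hide them, can be sketched without any Lucene dependency. Everything below is a hypothetical stand-in: the two maps replace the per-document ordinals decoded from doc values and the taxonomy's ordinal-to-path lookup, and joining path components with "/" is an illustrative choice rather than the behaviour of any Lucene class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a "labels for a document" facade over ordinal and taxonomy lookups. */
final class FacetLabelsSketch {
    private final Map<Integer, int[]> ordinalsByDoc;     // stand-in for decoded per-document ordinals
    private final Map<Integer, String[]> pathByOrdinal;  // stand-in for the taxonomy's ordinal -> path lookup

    FacetLabelsSketch(Map<Integer, int[]> ordinalsByDoc, Map<Integer, String[]> pathByOrdinal) {
        this.ordinalsByDoc = ordinalsByDoc;
        this.pathByOrdinal = pathByOrdinal;
    }

    /** Resolves every ordinal stored for the document to its label, in stored order. */
    String[] getLabels(int docId) {
        int[] ordinals = ordinalsByDoc.getOrDefault(docId, new int[0]);
        List<String> labels = new ArrayList<>();
        for (int ordinal : ordinals) {
            String[] path = pathByOrdinal.get(ordinal);
            if (path != null) {
                labels.add(String.join("/", path)); // e.g. dimension followed by its value
            }
        }
        return labels.toArray(new String[0]);
    }

    public static void main(String[] args) {
        FacetLabelsSketch sketch = new FacetLabelsSketch(
            Map.of(7, new int[] {1, 2}),
            Map.of(1, new String[] {"Author", "Bob"}, 2, new String[] {"Publish Date", "2010"}));
        System.out.println(Arrays.toString(sketch.getLabels(7)));
    }
}
```

The point of the facade is that callers supply only a document id; ordinal decoding and taxonomy lookups stay behind one method, which is the shape of API the issue asks for.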
[jira] [Resolved] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur resolved LUCENE-9444. --- Resolution: Fixed > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Fix For: master (9.0) > > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203559#comment-17203559 ] Ankur edited comment on LUCENE-9444 at 9/29/20, 12:06 AM: -- Thanks [~mikemccand] for merging the [PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files] I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} did not exercise the API to get facet labels for specific dimension - {{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} so I added the required changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] Re-opening the issue so that you can take a look. was (Author: goankur): Thanks [~mikemccand] for merging the [PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files] I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} did not exercise the API to get facet labels for specific dimension - {{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} so I added the required changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] Re-opening the issue so that you can take a look. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Fix For: master (9.0), 8.7 > > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203559#comment-17203559 ] Ankur edited comment on LUCENE-9444 at 9/29/20, 12:05 AM: -- Thanks [~mikemccand] for merging the [PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files] I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} did not exercise the API to get facet labels for specific dimension - {{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} so I added the required changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] Re-opening the issue so that you can take a look. was (Author: goankur): Thanks [~mikemccand] for merging the [PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files] I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} did not exercise the API to get facet labels for specific dimension - {{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} so I made changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] Can you please take a look ? > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Fix For: master (9.0), 8.7 > > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur reopened LUCENE-9444: --- Thanks [~mikemccand] for merging the [PR-1893.|https://github.com/apache/lucene-solr/pull/1893/files] I just realized that the changes in {{TestTaxonomyFacetCounts.testRandom()}} did not exercise the API to get facet labels for specific dimension - {{TaxonomyFacetLabels.nextFacetLabel(docId, facetDimension)}} so I made changes in [PR-1928.|https://github.com/apache/lucene-solr/pull/1928/files] Can you please take a look ? > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Fix For: master (9.0), 8.7 > > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. 
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200529#comment-17200529 ] Ankur edited comment on LUCENE-9444 at 9/25/20, 5:50 PM: - Thanks [~mikemccand], I incorporated the code review feedback and * Replaced {{assert}} with {{IllegalArgumentException}} for invalid inputs * Added javadoc notes explaining that returned _FacetLabels_ may not be in the same order in which they were indexed. * Enhanced {{TestTaxonomyFacetCount.testRandom()}} method to exercise the API to get facet labels for each matching document. * Here is the updated PR - https://github.com/apache/lucene-solr/pull/1893/commits/75ff251ebac9034c93edbb43dcf5d8dd0f1058ae was (Author: goankur): Thanks [~mikemccand], I incorporated the code review feedback and * Replaced {{assert} with {{IllegalArgumentException}} for invalid inputs * Added javadoc notes explaining that returned _FacetLabels_ may not be in the same order in which they were indexed. * Enhanced {{TestTaxonomyFacetCount.testRandom()}} method to exercise the API to get facet labels for each matching document. * Here is the updated PR - https://github.com/apache/lucene-solr/pull/1893/commits/75ff251ebac9034c93edbb43dcf5d8dd0f1058ae > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200529#comment-17200529 ] Ankur edited comment on LUCENE-9444 at 9/23/20, 4:34 AM: - Thanks [~mikemccand], I incorporated the code review feedback and * Replaced {{assert}} with {{IllegalArgumentException}} for invalid inputs * Added javadoc notes explaining that returned _FacetLabels_ may not be in the same order in which they were indexed. * Enhanced {{TestTaxonomyFacetCounts.testRandom()}} method to exercise the API to get facet labels for each matching document. * Here is the updated PR - https://github.com/apache/lucene-solr/pull/1893/commits/75ff251ebac9034c93edbb43dcf5d8dd0f1058ae was (Author: goankur): [~mikemccand], I incorporated the code review feedback and * Replaced {{assert}} with {{IllegalArgumentException}} for invalid inputs * Added javadoc notes explaining that returned _FacetLabels_ may not be in the same order in which they were indexed. * Enhanced {{TestTaxonomyFacetCounts.testRandom()}} method to exercise the API to get facet labels for each matching document. * Here is the updated PR - https://github.com/apache/lucene-solr/pull/1893/commits/75ff251ebac9034c93edbb43dcf5d8dd0f1058ae > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200529#comment-17200529 ] Ankur commented on LUCENE-9444: --- [~mikemccand], I incorporated the code review feedback and * Replaced {{assert}} with {{IllegalArgumentException}} for invalid inputs * Added javadoc notes explaining that returned _FacetLabels_ may not be in the same order in which they were indexed. * Enhanced {{TestTaxonomyFacetCounts.testRandom()}} method to exercise the API to get facet labels for each matching document. * Here is the updated PR - https://github.com/apache/lucene-solr/pull/1893/commits/75ff251ebac9034c93edbb43dcf5d8dd0f1058ae > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. 
> I am opening this issue to get community feedback and suggestions.
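The switch from {{assert}} to {{IllegalArgumentException}} mentioned in the comment above matters because Java only evaluates {{assert}} statements when assertions are enabled (for example with the `-ea` JVM flag), while an explicit check always runs. A minimal sketch of the difference, using a hypothetical class `ArgumentCheckSketch` and a made-up sentinel value rather than the code from the pull request:

```java
// Hypothetical illustration; the names and the sentinel value are not taken from the pull request.
final class ArgumentCheckSketch {
  static final int INVALID_ORDINAL = -1; // assumed sentinel for "no such category"

  /** Checked only when the JVM runs with assertions enabled (-ea); otherwise the bad value passes through. */
  static int checkWithAssert(int ordinal, String dimension) {
    assert ordinal != INVALID_ORDINAL : "no ordinal for facet dimension: " + dimension;
    return ordinal;
  }

  /** Always checked, regardless of JVM flags. */
  static int checkWithException(int ordinal, String dimension) {
    if (ordinal == INVALID_ORDINAL) {
      throw new IllegalArgumentException("no ordinal for facet dimension: " + dimension);
    }
    return ordinal;
  }

  public static void main(String[] args) {
    System.out.println(checkWithException(3, "Author"));
    try {
      checkWithException(INVALID_ORDINAL, "Author");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

For input that comes from callers, the explicit exception is the more predictable choice, since production JVMs usually run without `-ea`.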
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198780#comment-17198780 ] Ankur edited comment on LUCENE-9444 at 9/19/20, 6:30 PM: - Thanks [~mikemccand] for making those changes. I made a couple of minor edits to _TaxonomyFacetLabels.java_ * Removed the reference to \{@link java.util.Iterator} as it is no longer used. * Fixed typo in javadoc. * Replaced {code:java} if (parentOrd == INVALID_ORDINAL) { throw new AssertionError("Root ordinal not found for facet dimension: " + facetDimension); }{code} with single line {code:java} assert parentOrd != INVALID_ORDINAL : "Category ordinal not found for facet dimension: " + facetDimension; {code} in method {code:java} public FacetLabel nextFacetLabel(int docId, String facetDimension) throws IOException{code} * Created a pull request as you suggested in one of your earlier comments :) ** [https://github.com/apache/lucene-solr/pull/1893/commits/bf8eaf98901cbe83f23067bea90dfb2f3102603a] Can you take a look and see if it's ready to be committed ? was (Author: goankur): Thanks [~mikemccand] for making those changes. I made a couple of minor edits * Fixed typo in javadoc * Replaced {code:java} if (parentOrd == INVALID_ORDINAL) { throw new AssertionError("Root ordinal not found for facet dimension: " + facetDimension); }{code} with single line {code:java} assert parentOrd != INVALID_ORDINAL : "Category ordinal not found for facet dimension: " + facetDimension; {code} in method {code:java} public FacetLabel nextFacetLabel(int docId, String facetDimension) throws IOException{code} * Created a pull request as you suggested in one of your earlier comments :) ** [https://github.com/apache/lucene-solr/pull/1893/commits/bf8eaf98901cbe83f23067bea90dfb2f3102603a] Can you take a look and see if it's ready to be committed ? 
> Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198780#comment-17198780 ] Ankur commented on LUCENE-9444: --- Thanks [~mikemccand] for making those changes. I made a couple of minor edits * Fixed typo in javadoc * Replaced {code:java} if (parentOrd == INVALID_ORDINAL) { throw new AssertionError("Root ordinal not found for facet dimension: " + facetDimension); }{code} with single line {code:java} assert parentOrd != INVALID_ORDINAL : "Category ordinal not found for facet dimension: " + facetDimension; {code} in method {code:java} public FacetLabel nextFacetLabel(int docId, String facetDimension) throws IOException{code} * Created a pull request as you suggested in one of your earlier comments :) ** [https://github.com/apache/lucene-solr/pull/1893/commits/bf8eaf98901cbe83f23067bea90dfb2f3102603a] Can you take a look and see if it's ready to be committed ? > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.patch, > LUCENE-9444.v2.patch > > Time Spent: 10m > Remaining Estimate: 0h > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. 
This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions.
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193350#comment-17193350 ] Ankur edited comment on LUCENE-9444 at 9/10/20, 5:06 PM: - Thanks [~mikemccand]. I uploaded a new patch that incorporates the code review feedback. The patch * Makes {{FacetLabelReader}} public as suggested. * Adds javadoc explaining why {{FacetLabelReader}} is not thread-safe. * Eliminates {{Iterator}} and replaces {{lookupLabels()}} methods with {{nextFacetLabel()}} methods that just return {{null}} if no more FacetLabels exist for the input docId. * Adds {{@lucene.experimental}} to class level javadocs. * Enhances {{TestTaxonomyLabels.testBasic()}} method to check that fetching FacetLabels in decreasing docId order throws {{AssertionError}}. was (Author: goankur): Thanks [~mikemccand]. I uploaded a new patch that incorporates the code review feedback. The patch * Makes {{FacetLabelReader}} public as suggested. * Adds javadoc to explaining that {{FacetLabelReader}} is not thread-safe. * Eliminates {{Iterator}} and replaces {{lookupLabels()}} methods with {{nextFacetLabel()}} methods that just return {{null}} if no more FacetLabels exist for the input docId. * Adds {{@lucene.experimental}} to class level javadocs. * Enhances {{TestTaxonomyLabels.testBasic()}} method to check that fetching FacetLabels in decreasing docId order throws {{AssertionError}}. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.v2.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193350#comment-17193350 ] Ankur commented on LUCENE-9444: --- Thanks [~mikemccand]. I uploaded a new patch that incorporates the code review feedback. The patch * Makes {{FacetLabelReader}} public as suggested. * Adds javadoc explaining that {{FacetLabelReader}} is not thread-safe. * Eliminates {{Iterator}} and replaces {{lookupLabels()}} methods with {{nextFacetLabel()}} methods that just return {{null}} if no more FacetLabels exist for the input docId. * Adds {{@lucene.experimental}} to class level javadocs. * Enhances {{TestTaxonomyLabels.testBasic()}} method to check that fetching FacetLabels in decreasing docId order throws {{AssertionError}}. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.v2.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. 
> I am opening this issue to get community feedback and suggestions.
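The calling pattern described in the comment above (repeated {{nextFacetLabel()}} calls that return {{null}} when a document has no more labels, docIds visited in non-decreasing order, no thread-safety) can be sketched as follows. This is a hypothetical in-memory stand-in, not the {{FacetLabelReader}} from the patch; the class name, the constructor and the string labels are assumptions. The patch's test expects {{AssertionError}} on out-of-order docIds, whereas this sketch throws {{IllegalArgumentException}} so that the check does not depend on the `-ea` flag.

```java
import java.util.Map;

// Hypothetical stand-in for a forward-only label reader; not the class from the patch.
final class ForwardOnlyLabelReaderSketch {
  private final Map<Integer, String[]> labelsByDoc;
  // Mutable cursor state: this is what makes a reader like this unsafe to share across threads.
  private int currentDocId = -1;
  private int nextIndex = 0;

  ForwardOnlyLabelReaderSketch(Map<Integer, String[]> labelsByDoc) {
    this.labelsByDoc = labelsByDoc;
  }

  /** Returns the next label for docId, or null when there are none left. docIds must not decrease. */
  String nextFacetLabel(int docId) {
    if (docId < currentDocId) {
      throw new IllegalArgumentException(
          "docs out of order: previous docId=" + currentDocId + ", current docId=" + docId);
    }
    if (docId != currentDocId) { // moved on to a new document: restart the cursor
      currentDocId = docId;
      nextIndex = 0;
    }
    String[] labels = labelsByDoc.get(docId);
    if (labels == null || nextIndex >= labels.length) {
      return null;
    }
    return labels[nextIndex++];
  }

  public static void main(String[] args) {
    ForwardOnlyLabelReaderSketch reader = new ForwardOnlyLabelReaderSketch(
        Map.of(0, new String[] {"Author/Bob", "Publish Date/2010"}));
    for (String label = reader.nextFacetLabel(0); label != null; label = reader.nextFacetLabel(0)) {
      System.out.println(label);
    }
  }
}
```

Returning {{null}} instead of handing out an {{Iterator}} keeps the per-document loop allocation-free, which is presumably why the review asked for it; that motivation is an inference, not something stated in the thread.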
[jira] [Updated] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9444: -- Attachment: LUCENE-9444.v2.patch > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch, LUCENE-9444.v2.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190342#comment-17190342 ] Ankur edited comment on LUCENE-9444 at 9/3/20, 6:09 PM: Patch has been available for 1+ day, not sure why automated patch testing has not picked it up yet. was (Author: goankur): Patch has been available for 1+ day, not sure why automated patch testing has picked it up yet. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190342#comment-17190342 ] Ankur commented on LUCENE-9444: --- The patch has been available for more than a day; I am not sure why automated patch testing has not picked it up yet. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189618#comment-17189618 ] Ankur edited comment on LUCENE-9444 at 9/2/20, 6:35 PM: Here is a patch that adds a new utility class {{TaxonomyFacetLabels}} with a single method {{getFacetLabelReader(LeafReaderContext)}} that returns an instance of nested class {{FacetLabelReader}}. It uses an instance of {{OrdinalsSegmentReader}} to fetch and decode ordinals for input docid into a reusable buffer and returns an {{Iterator}} that uses {{TaxonomyReader}} to lookup and return {{FacetLabels}} for each ordinal. The patch also adds a new test case {{TestTaxonomyLabels}} demonstrating the usage. was (Author: goankur): Here is a patch that adds a new utility class {{TaxonomyFacetLabels}} with a single method {{getFacetLabelReader(LeafReaderContext)}} that returns an instance of nested class {{FacetLabelReader}}. It uses an instance of {{OrdinalsSegmentReader}} to fetch and decode ordinals for input docid into a reusable buffer and returns an {{Iterator}} that uses {{TaxonomyReader}} to lookup and return {{FacetLabels}} for each ordinal. The patch also adds a new test case {{TestTaxonomyLabels}} demonstrating the usage. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189618#comment-17189618 ] Ankur edited comment on LUCENE-9444 at 9/2/20, 6:33 PM: Here is a patch that adds a new utility class {{TaxonomyFacetLabels}} with a single method {{getFacetLabelReader(LeafReaderContext)}} that returns an instance of nested class {{FacetLabelReader}}. It uses an instance of {{OrdinalsSegmentReader}} to fetch and decode ordinals for input docid into a reusable buffer and returns an {{Iterator}} that uses {{TaxonomyReader}} to lookup and return {{FacetLabels}} for each ordinal. The patch also adds a new test case {{TestTaxonomyLabels}} demonstrating the usage. was (Author: goankur): Here is a patch that adds a new utility class __TaxonomyFacetLabels__ with a single method {code:java} getFacetLabelReader(LeafReaderContext){code} that returns an instance of nested class {code:java} FacetLabelReader{code} It uses an instance of {code:java} OrdinalsSegmentReader{code} to fetch and decode ordinals for the input docid into a reusable buffer and returns an {code:java} Iterator{code} that uses {code:java} TaxonomyReader{code} to lookup {code:java} FacetLabels{code} for each ordinal. The patch also adds a new test case {code:java} TestTaxonomyLabels{code} demonstrating the usage. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189618#comment-17189618 ] Ankur edited comment on LUCENE-9444 at 9/2/20, 6:28 PM: Here is a patch that adds a new utility class __TaxonomyFacetLabels__ with a single method {code:java} getFacetLabelReader(LeafReaderContext){code} that returns an instance of nested class {code:java} FacetLabelReader{code} It uses an instance of {code:java} OrdinalsSegmentReader{code} to fetch and decode ordinals for the input docid into a reusable buffer and returns an {code:java} Iterator{code} that uses {code:java} TaxonomyReader{code} to lookup {code:java} FacetLabels{code} for each ordinal. The patch also adds a new test case {code:java} TestTaxonomyLabels{code} demonstrating the usage. was (Author: goankur): Here is a patch that adds a new utility class {code:java} {code} _TaxonomyFacetLabels_with a single method {code:java} getFacetLabelReader(LeafReaderContext){code} that returns an instance of nested class {code:java} FacetLabelReader{code} It uses an instance of {code:java} OrdinalsSegmentReader{code} to fetch and decode ordinals for the input docid into a reusable buffer and returns an {code:java} Iterator{code} that uses {code:java} TaxonomyReader{code} to lookup {code:java} FacetLabels{code} for each ordinal. The patch also adds a new test case {code:java} TestTaxonomyLabels{code} demonstrating the usage. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189618#comment-17189618 ] Ankur edited comment on LUCENE-9444 at 9/2/20, 6:27 PM: Here is a patch that adds a new utility class {code:java} {code} _TaxonomyFacetLabels_with a single method {code:java} getFacetLabelReader(LeafReaderContext){code} that returns an instance of nested class {code:java} FacetLabelReader{code} It uses an instance of {code:java} OrdinalsSegmentReader{code} to fetch and decode ordinals for the input docid into a reusable buffer and returns an {code:java} Iterator{code} that uses {code:java} TaxonomyReader{code} to lookup {code:java} FacetLabels{code} for each ordinal. The patch also adds a new test case {code:java} TestTaxonomyLabels{code} demonstrating the usage. was (Author: goankur): Here is a patch that adds a new utility class - `TaxonomyFacetLabels` with the method - `getFacetLabelReader(LeafReaderContext)` that returns an instance of nested class - `FacetLabelReader`. `FacetLabelReader` uses an instance of `OrdinalsSegmentReader` to fetch and decode ordinals for the input docid into a reusable buffer and returns an `Iterator` that uses `TaxonomyReader` instance to lookup `FacetLabels`. The patch also adds a new test case `TestTaxonomyLabels` demonstrating the usage. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189618#comment-17189618 ] Ankur commented on LUCENE-9444: --- Here is a patch that adds a new utility class - `TaxonomyFacetLabels` with the method - `getFacetLabelReader(LeafReaderContext)` that returns an instance of nested class - `FacetLabelReader`. `FacetLabelReader` uses an instance of `OrdinalsSegmentReader` to fetch and decode ordinals for the input docid into a reusable buffer and returns an `Iterator` that uses `TaxonomyReader` instance to lookup `FacetLabels`. The patch also adds a new test case `TestTaxonomyLabels` demonstrating the usage. > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. 
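The design described in the comment above (decode a document's ordinals into a reusable buffer, then hand back an {{Iterator}} that resolves labels lazily) might look roughly like the sketch below. It is not the code from the attached patch: {{LabelReaderSketch}}, {{OrdinalDecoder}} and the fixed-size buffer are assumptions made for illustration, and plain strings stand in for {{FacetLabel}}.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Stand-in sketch of a per-segment label reader; not the class from the patch. */
class LabelReaderSketch {

  /** Hypothetical decoder: writes the ordinals of docId into buffer and returns how many were written. */
  interface OrdinalDecoder {
    int decode(int docId, int[] buffer);
  }

  private final OrdinalDecoder decoder;
  private final IntFunction<String> taxonomy; // ordinal -> label text
  private final int[] buffer = new int[16];   // reused for every document; fixed size for brevity

  LabelReaderSketch(OrdinalDecoder decoder, IntFunction<String> taxonomy) {
    this.decoder = decoder;
    this.taxonomy = taxonomy;
  }

  /** Decodes docId into the shared buffer and returns a lazy iterator over its labels. */
  Iterator<String> labels(int docId) {
    final int count = decoder.decode(docId, buffer);
    return new Iterator<String>() {
      private int pos = 0;

      @Override
      public boolean hasNext() {
        return pos < count;
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return taxonomy.apply(buffer[pos++]); // taxonomy lookup happens only when asked
      }
    };
  }

  public static void main(String[] args) {
    // Made-up decoder: every document carries two consecutive ordinals.
    LabelReaderSketch reader = new LabelReaderSketch(
        (docId, buf) -> { buf[0] = docId; buf[1] = docId + 1; return 2; },
        ord -> "label-" + ord);
    for (Iterator<String> it = reader.labels(3); it.hasNext(); ) {
      System.out.println(it.next());
    }
  }
}
```

One consequence of sharing the buffer in this sketch is that an iterator should be fully consumed before {{labels}} is called for the next document, since the next call overwrites the ordinals.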
[jira] [Updated] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9444: -- Attachment: LUCENE-9444.patch Lucene Fields: New,Patch Available (was: New) Labels: facet (was: ) Status: Open (was: Open) > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9444: -- Status: Patch Available (was: Open) > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > Labels: facet > Attachments: LUCENE-9444.patch > > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
[ https://issues.apache.org/jira/browse/LUCENE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187105#comment-17187105 ] Ankur commented on LUCENE-9489: --- Thanks, Tomoko Uchida, for taking a look. It looks like my local git repo was in a weird state where it would not get the latest updates despite multiple 'git pull --rebase origin master' attempts. Doing a fresh git clone solved the problem for me. Sorry for the false alarm. > Compilation failure due to broken link reference in javadoc > --- > > Key: LUCENE-9489 > URL: https://issues.apache.org/jira/browse/LUCENE-9489 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Affects Versions: master (9.0) >Reporter: Ankur >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9489.patch > > > Javadoc for method > {code:java} > org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, > IntsRef ordinals){code} > has a broken link reference causing compilation failure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
[ https://issues.apache.org/jira/browse/LUCENE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9489: -- Description: Javadoc for method {code:java} org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, IntsRef ordinals){code} has a broken link reference causing compilation error. was: Javadoc for method {code:java} org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, IntsRef ordinals){code} has a broken link reference in javadoc causing compilation error. > Compilation failure due to broken link reference in javadoc > --- > > Key: LUCENE-9489 > URL: https://issues.apache.org/jira/browse/LUCENE-9489 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Affects Versions: master (9.0) >Reporter: Ankur >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9489.patch > > > Javadoc for method > {code:java} > org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, > IntsRef ordinals){code} > has a broken link reference causing compilation error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
[ https://issues.apache.org/jira/browse/LUCENE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9489: -- Status: Patch Available (was: Open) > Compilation failure due to broken link reference in javadoc > --- > > Key: LUCENE-9489 > URL: https://issues.apache.org/jira/browse/LUCENE-9489 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Affects Versions: master (9.0) >Reporter: Ankur >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9489.patch > > > Javadoc for method > {code:java} > org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, > IntsRef ordinals){code} > has a broken link reference in javadoc causing compilation error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
[ https://issues.apache.org/jira/browse/LUCENE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9489: -- Description: Javadoc for method {code:java} org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, IntsRef ordinals){code} has a broken link reference causing compilation failure. was: Javadoc for method {code:java} org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, IntsRef ordinals){code} has a broken link reference causing compilation error. > Compilation failure due to broken link reference in javadoc > --- > > Key: LUCENE-9489 > URL: https://issues.apache.org/jira/browse/LUCENE-9489 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Affects Versions: master (9.0) >Reporter: Ankur >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9489.patch > > > Javadoc for method > {code:java} > org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, > IntsRef ordinals){code} > has a broken link reference causing compilation failure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
[ https://issues.apache.org/jira/browse/LUCENE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9489: -- Attachment: LUCENE-9489.patch Fix Version/s: master (9.0) Lucene Fields: New,Patch Available (was: New) Status: Open (was: Open) > Compilation failure due to broken link reference in javadoc > --- > > Key: LUCENE-9489 > URL: https://issues.apache.org/jira/browse/LUCENE-9489 > Project: Lucene - Core > Issue Type: Bug > Components: modules/facet >Affects Versions: master (9.0) >Reporter: Ankur >Priority: Trivial > Fix For: master (9.0) > > Attachments: LUCENE-9489.patch > > > Javadoc for method > {code:java} > org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, > IntsRef ordinals){code} > has a broken link reference in javadoc causing compilation error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9489) Compilation failure due to broken link reference in javadoc
Ankur created LUCENE-9489: - Summary: Compilation failure due to broken link reference in javadoc Key: LUCENE-9489 URL: https://issues.apache.org/jira/browse/LUCENE-9489 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: master (9.0) Reporter: Ankur Javadoc for method {code:java} org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader.decode(BytesRef buf, IntsRef ordinals){code} has a broken link reference in javadoc causing compilation error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171759#comment-17171759 ] Ankur commented on LUCENE-9444: --- > ...and it returns another class for actually iterating over the {{FacetLabel}} for each document in that segment? Should this class extend {{DocIdSetIterator}} to allow intersection with another {{DocIdSetIterator}} created from {{FacetsCollector.MatchingDocs.bits}}? Making {{dim}} part of the constructor feels a bit restrictive. How about providing two separate APIs, one that accepts a dimension and another that does not? > Instead of FacetLabel[] maybe ... How about returning a {{java.util.Iterator}} instead of {{FacetLabel[]}}? > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. 
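The two API shapes floated in the comment above (one returning every label of a document, one restricted to a dimension, both returning an {{Iterator}} rather than an array) could be sketched as a pair of overloads. The class and method names are hypothetical, label paths are modelled as plain {{String[]}} values, and the sketch assumes the first path component is the dimension.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Stand-in sketch of the two lookup variants; names are illustrative only. */
class LabelsByDimension {

  /** Variant without a dimension: iterates over every label path of the document. */
  static Iterator<String[]> labels(List<String[]> docLabels) {
    return docLabels.iterator();
  }

  /** Variant with a dimension: keeps only paths whose first component equals dim. */
  static Iterator<String[]> labels(List<String[]> docLabels, String dim) {
    List<String[]> matching = new ArrayList<>();
    for (String[] path : docLabels) {
      if (path.length > 0 && path[0].equals(dim)) {
        matching.add(path);
      }
    }
    return matching.iterator();
  }

  public static void main(String[] args) {
    // Made-up label paths for a single document.
    List<String[]> doc = List.of(
        new String[] {"Author", "Bob"},
        new String[] {"Publish Date", "2010", "10"});
    for (Iterator<String[]> it = labels(doc, "Publish Date"); it.hasNext(); ) {
      System.out.println(String.join("/", it.next()));
    }
  }
}
```

Keeping the dimension as a method argument rather than a constructor argument lets one reader instance serve several dimensions, which is the flexibility the comment asks about.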
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171215#comment-17171215 ] Ankur edited comment on LUCENE-9444 at 8/5/20, 1:41 AM: Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, I was initially thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_*, with a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim)*. That way a caller would not need to know whether a _*BinaryDocValues*_ field exists at all. The downside is that the caller would need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext_*. On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels()*. To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. The proposed API signature would look like this: {{public FacetLabel[] getLabels(int docId, String dim, BinaryDocValues dv)}} was (Author: goankur): Thanks for your response [~mikemccand] Yes, having _dim_ as additional parameter makes sense. Regarding _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation - *_TaxonomyFacetsLabels_* of abstract class *_TaxonomyFacets_* and add a constructor that accepts a *_LeafReaderContext_* which will then be used to instantiate and reuse the BinaryDocValues iterator between multiple calls to *getLabels(docId, dim).* That way a caller does not need to know if a _*BinaryDocValues*_ field existed at all. 
The downside is that caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* But thinking more, I feel its simpler to pass BinaryDocValues iterator as a 3rd argument to *getLabels().* In order to take care of hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. The proposed API signature would look like this {{public FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy.** > # Lastly TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
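The proposed signature above passes the doc-values iterator in as an argument so that it can be reused across documents of one segment. A rough stand-in sketch of that call shape follows. {{ForwardOnlyOrdinals}} is a hypothetical class that only imitates a forward-only iterator, the extra {{taxonomy}} parameter is an assumption added to keep the sketch self-contained, and {{String[]}} paths stand in for {{FacetLabel}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in sketch of getLabels(docId, dim, dv); not the Lucene API. */
class GetLabelsSketch {

  /** Hypothetical forward-only per-segment values, imitating a doc-values iterator. */
  static final class ForwardOnlyOrdinals {
    private final int[][] ordinalsByDoc;
    private int current = -1;

    ForwardOnlyOrdinals(int[][] ordinalsByDoc) {
      this.ordinalsByDoc = ordinalsByDoc;
    }

    /** Moves to docId; returns false if the document has no ordinals. */
    boolean advanceExact(int docId) {
      if (docId < current) {
        throw new IllegalArgumentException("cannot move backwards: " + docId + " < " + current);
      }
      current = docId;
      return docId < ordinalsByDoc.length && ordinalsByDoc[docId].length > 0;
    }

    int[] ordinals() {
      return ordinalsByDoc[current];
    }
  }

  /** taxonomy[ord] is a label path whose first component is assumed to be the dimension. */
  static String[][] getLabels(int docId, String dim, ForwardOnlyOrdinals dv, String[][] taxonomy) {
    List<String[]> out = new ArrayList<>();
    if (dv.advanceExact(docId)) {
      for (int ord : dv.ordinals()) {
        if (taxonomy[ord][0].equals(dim)) {
          out.add(taxonomy[ord]);
        }
      }
    }
    return out.toArray(new String[0][]);
  }

  public static void main(String[] args) {
    // Made-up taxonomy and per-document ordinals.
    String[][] taxonomy = { {"Author", "Bob"}, {"Author", "Lisa"}, {"Publish Date", "2010"} };
    ForwardOnlyOrdinals dv = new ForwardOnlyOrdinals(new int[][] { {0, 2}, {}, {1} });
    for (int docId = 0; docId < 3; docId++) {
      System.out.println(docId + " -> " + getLabels(docId, "Author", dv, taxonomy).length + " Author label(s)");
    }
  }
}
```

The trade-off in this shape is that the caller owns the iterator: documents have to be visited in increasing docId order within a segment, and a fresh iterator is needed for each segment.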
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171215#comment-17171215 ] Ankur edited comment on LUCENE-9444 at 8/5/20, 1:40 AM: Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_* and adding a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim).* That way a caller does not need to know whether a _*BinaryDocValues*_ field exists at all. The downside is that the caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels().* To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. The proposed API signature would look like this: {{public FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} was (Author: goankur): Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_* and adding a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim).* That way a caller does not need to know whether a _*BinaryDocValues*_ field exists at all. 
The downside is that the caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels().* To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. The proposed API signature would look like this: {{public static FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke the > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions.
[jira] [Comment Edited] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171215#comment-17171215 ] Ankur edited comment on LUCENE-9444 at 8/5/20, 1:38 AM: Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_* and adding a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim).* That way a caller does not need to know whether a _*BinaryDocValues*_ field exists at all. The downside is that the caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels().* To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. The proposed API signature would look like this: {{public static FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} was (Author: goankur): Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_* and adding a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim).* That way a caller does not need to know whether a _*BinaryDocValues*_ field exists at all. 
The downside is that the caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels().* To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. One last thing: should we make the API _*static*_? The proposed API signature would look like this: {{public static FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke the > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions.
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171215#comment-17171215 ] Ankur commented on LUCENE-9444: --- Thanks for your response, [~mikemccand]. Yes, having _dim_ as an additional parameter makes sense. Regarding the _*BinaryDocValues*_ iterator, initially I was thinking of providing a concrete implementation, *_TaxonomyFacetsLabels_*, of the abstract class *_TaxonomyFacets_* and adding a constructor that accepts a *_LeafReaderContext_*, which would then be used to instantiate the BinaryDocValues iterator and reuse it across multiple calls to *getLabels(docId, dim).* That way a caller does not need to know whether a _*BinaryDocValues*_ field exists at all. The downside is that the caller will need to create a new instance of *_TaxonomyFacetsLabels_* for each different *_LeafReaderContext._* On further thought, I feel it's simpler to pass the BinaryDocValues iterator as a third argument to *getLabels().* To handle hierarchical fields, I think it makes sense to return FacetLabel[] instead of String[]. One last thing: should we make the API _*static*_? The proposed API signature would look like this: {{public static FacetLabel[] getLabels(int docId, String dim, BinaryDocValues)}} > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. 
> In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke the > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions.
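The trade-off discussed in the comment above (the caller owns the per-segment iterator and passes it as the third argument) can be sketched roughly as follows. This is only an illustration under stated assumptions: LeafOrdinals is a hypothetical stand-in for a forward-only BinaryDocValues iterator, TAXONOMY is a made-up ordinal-to-path table, and String[] paths stand in for FacetLabel. None of these are Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

public class GetLabelsShapeSketch {
    /** Hypothetical stand-in for a forward-only, per-segment doc values iterator. */
    static final class LeafOrdinals {
        private final int[][] ordsByDoc;
        private int lastDoc = -1;

        LeafOrdinals(int[][] ordsByDoc) {
            this.ordsByDoc = ordsByDoc;
        }

        /** Returns the ordinals for docId; doc IDs must be requested in increasing order. */
        int[] advanceTo(int docId) {
            if (docId <= lastDoc) {
                throw new IllegalArgumentException("doc IDs must increase: " + docId);
            }
            lastDoc = docId;
            return ordsByDoc[docId];
        }
    }

    /** Hypothetical taxonomy: ordinal -> label path (ordinal 0 is the root). */
    static final String[][] TAXONOMY = {
        {},
        {"Author", "Bob"},
        {"Publish Date", "2010", "10"}
    };

    /**
     * Shape of the proposed call: the caller supplies the iterator as the third
     * argument, and only paths under the requested dimension are returned.
     */
    static List<String[]> getLabels(int docId, String dim, LeafOrdinals dv) {
        List<String[]> result = new ArrayList<>();
        for (int ord : dv.advanceTo(docId)) {
            String[] path = TAXONOMY[ord];
            if (path.length > 0 && path[0].equals(dim)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

With this shape the caller creates one iterator per segment and reuses it across increasing doc IDs; no per-segment wrapper object is required, but the caller has to know that the iterator exists.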
[jira] [Commented] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
[ https://issues.apache.org/jira/browse/LUCENE-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169319#comment-17169319 ] Ankur commented on LUCENE-9444: --- Any thoughts, folks? > Need an API to easily fetch facet labels for a field in a document > -- > > Key: LUCENE-9444 > URL: https://issues.apache.org/jira/browse/LUCENE-9444 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet >Affects Versions: 8.6 >Reporter: Ankur >Priority: Major > > A facet field may be included in the list of fields whose values are to be > returned for each hit. > In order to get the facet labels for each hit we need to > # Create an instance of _DocValuesOrdinalsReader_ and invoke the > _getReader(LeafReaderContext context)_ method to obtain an instance of > _OrdinalsSegmentReader()_ > # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then > used to fetch and decode the binary payload in the document's BinaryDocValues > field. This provides the ordinals that refer to facet labels in the > taxonomy. > # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be > returned. > > Ideally there should be a simple API - *String[] getLabels(docId)* that hides > all the above details and gives us the string labels. This can be part of > *TaxonomyFacets* but that's just one idea. > I am opening this issue to get community feedback and suggestions.
[jira] [Created] (LUCENE-9444) Need an API to easily fetch facet labels for a field in a document
Ankur created LUCENE-9444: - Summary: Need an API to easily fetch facet labels for a field in a document Key: LUCENE-9444 URL: https://issues.apache.org/jira/browse/LUCENE-9444 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 8.6 Reporter: Ankur A facet field may be included in the list of fields whose values are to be returned for each hit. In order to get the facet labels for each hit we need to # Create an instance of _DocValuesOrdinalsReader_ and invoke the _getReader(LeafReaderContext context)_ method to obtain an instance of _OrdinalsSegmentReader()_ # The _OrdinalsSegmentReader.get(int docID, IntsRef ordinals)_ method is then used to fetch and decode the binary payload in the document's BinaryDocValues field. This provides the ordinals that refer to facet labels in the taxonomy. # Lastly, TaxonomyReader.getPath(ord) is used to fetch the labels to be returned. Ideally there should be a simple API - *String[] getLabels(docId)* that hides all the above details and gives us the string labels. This can be part of *TaxonomyFacets* but that's just one idea. I am opening this issue to get community feedback and suggestions.
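As a rough illustration of what the requested convenience call would hide, the sketch below collapses the three steps from the description into one method. It does not use Lucene at all: ORDINALS_BY_DOC is a hypothetical stand-in for the ordinals decoded from the BinaryDocValues payload, TAXONOMY is a hypothetical stand-in for the taxonomy lookup, and joining a path with "/" is purely for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FacetLabelSketch {
    // Hypothetical stand-in for steps 1 and 2: the ordinals that would be decoded
    // from each document's binary payload.
    static final Map<Integer, int[]> ORDINALS_BY_DOC = Map.of(
        0, new int[] {1, 3},
        1, new int[] {2});

    // Hypothetical stand-in for step 3: ordinal -> label path in the taxonomy.
    static final String[][] TAXONOMY = {
        {},                        // ordinal 0: root
        {"Author", "Bob"},
        {"Author", "Lisa"},
        {"Publish Date", "2010"}
    };

    /** The single call the issue asks for: doc ID in, string labels out. */
    static List<String> getLabels(int docId) {
        List<String> labels = new ArrayList<>();
        int[] ords = ORDINALS_BY_DOC.getOrDefault(docId, new int[0]);
        for (int ord : ords) {
            labels.add(String.join("/", TAXONOMY[ord]));
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(getLabels(0)); // prints [Author/Bob, Publish Date/2010]
    }
}
```

A document with no facet ordinals simply yields an empty list, so a caller does not need to check beforehand whether the field was present.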
[jira] [Commented] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162422#comment-17162422 ] Ankur commented on LUCENE-9437: --- Thanks [~mikemccand], I updated the patch fixing javadoc comments as suggested. > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Status: Patch Available (was: Open) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Status: Open (was: Patch Available) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Attachment: LUCENE-9437.patch > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Attachment: (was: LUCENE-9437.patch) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Description: Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is set to 'protected'. This prevents the method from being used outside this class in a setting where BinaryDocValues reader is instantiated outside the class and binary payload containing ordinals still needs to be decoded. (was: Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is set to 'protected'. This prevents the method from being used outside this class in a setting where BinaryDocValues reader is instantiated outside the class and binary payload containing ordinals still need to be decoded.) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still needs to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Status: Patch Available (was: Open) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still need to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161587#comment-17161587 ] Ankur commented on LUCENE-9437: --- Simple code change to raise the visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method from 'protected' to 'public'. Also added javadoc > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still need to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
[ https://issues.apache.org/jira/browse/LUCENE-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9437: -- Attachment: LUCENE-9437.patch Lucene Fields: New,Patch Available (was: New) Affects Version/s: 8.6 Status: Open (was: Open) > Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly > accessible > - > > Key: LUCENE-9437 > URL: https://issues.apache.org/jira/browse/LUCENE-9437 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.6 >Reporter: Ankur >Priority: Trivial > Attachments: LUCENE-9437.patch > > > Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is > set to 'protected'. This prevents the method from being used outside this > class in a setting where BinaryDocValues reader is instantiated outside the > class and binary payload containing ordinals still need to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9437) Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible
Ankur created LUCENE-9437: - Summary: Make DocValuesOrdinalsReader.decode(BytesRef, IntsRef) method publicly accessible Key: LUCENE-9437 URL: https://issues.apache.org/jira/browse/LUCENE-9437 Project: Lucene - Core Issue Type: Improvement Reporter: Ankur Visibility of _DocValuesOrdinalsReader.decode(BytesRef, IntsRef)_ method is set to 'protected'. This prevents the method from being used outside this class in a setting where BinaryDocValues reader is instantiated outside the class and binary payload containing ordinals still need to be decoded. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
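To make the visibility point concrete, here is a minimal sketch of the workaround a caller in another package would need while the method stays protected: subclass the reader and widen the method. OrdinalsReader and its trivial decode are hypothetical stand-ins, not the Lucene class or its actual encoding.

```java
public class VisibilitySketch {
    /** Hypothetical stand-in for a reader whose decode step is protected. */
    static class OrdinalsReader {
        protected int[] decode(byte[] payload) {
            // Toy decoding for illustration only: one ordinal per byte.
            int[] ords = new int[payload.length];
            for (int i = 0; i < payload.length; i++) {
                ords[i] = payload[i];
            }
            return ords;
        }
    }

    /**
     * Workaround for callers outside the reader's package: subclass and widen
     * the method to public. (Within a single file, Java's package access would
     * allow the call anyway; the subclass is what other packages need.)
     */
    static class ExposedReader extends OrdinalsReader {
        @Override
        public int[] decode(byte[] payload) {
            return super.decode(payload);
        }
    }
}
```

Making the method public in the library removes the need for a subclass whose only purpose is to re-expose it.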
[jira] [Updated] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9392: -- Attachment: (was: LUCENE-9392.patch) > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Attachments: LUCENE-9392.patch > > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Updated] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9392: -- Attachment: LUCENE-9392.patch Lucene Fields: New,Patch Available (was: New) Status: Patch Available (was: Patch Available) Incorporate code review feedback from Mike McCandless > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Attachments: LUCENE-9392.patch, LUCENE-9392.patch > > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Commented] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128678#comment-17128678 ] Ankur commented on LUCENE-9392: --- Attached a patch with a fix that raises the visibility of FacetsConfig.DELIM_CHAR to public. > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Attachments: LUCENE-9392.patch > > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Updated] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9392: -- Status: Patch Available (was: Open) > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Attachments: LUCENE-9392.patch > > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Updated] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9392: -- Attachment: LUCENE-9392.patch > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > Attachments: LUCENE-9392.patch > > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Created] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
Ankur created LUCENE-9392: - Summary: Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public Key: LUCENE-9392 URL: https://issues.apache.org/jira/browse/LUCENE-9392 Project: Lucene - Core Issue Type: Improvement Affects Versions: 8.5.2 Reporter: Ankur FacetsConfig.DELIM_CHAR is marked as private. An application that wants to use this delimiter (in a unit test for example) is forced to re-declare it in the application code. This can break the application if DELIM_CHAR is changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
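The risk described above can be illustrated with a small sketch. LibraryConfig is a hypothetical stand-in for a library class that keeps its delimiter private, and the delimiter value used here is an assumption chosen for illustration; APP_DELIM is the copy that an application is forced to declare on its own.

```java
public class DelimiterSketch {
    /** Hypothetical stand-in for a library class that keeps its delimiter private. */
    static final class LibraryConfig {
        // Assumed value, for illustration only.
        private static final char DELIM_CHAR = '\u001F';

        /** Joins a dimension and its path components with the private delimiter. */
        static String pathToString(String dim, String... path) {
            StringBuilder sb = new StringBuilder(dim);
            for (String p : path) {
                sb.append(DELIM_CHAR).append(p);
            }
            return sb.toString();
        }
    }

    // Application-side copy of the delimiter. It only works for as long as it
    // happens to match the library's private value.
    static final char APP_DELIM = '\u001F';

    /** Splits an indexed path back into components using the application's copy. */
    static String[] split(String indexed) {
        return indexed.split(String.valueOf(APP_DELIM));
    }
}
```

If the library later changed its private value, split would silently return the whole string as a single component. Referencing a public constant instead of a copy avoids that drift.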
[jira] [Updated] (LUCENE-9392) Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public
[ https://issues.apache.org/jira/browse/LUCENE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated LUCENE-9392: -- Description: FacetsConfig.DELIM_CHAR is marked as private. An application that wants to use this delimiter (in a unit test for example) is forced to re-declare it in the application code. This can break the application if the value of DELIM_CHAR is changed in FacetsConfig. (was: FacetsConfig.DELIM_CHAR is marked as private. An application that wants to use this delimiter (in a unit test for example) is forced to re-declare it in the application code. This can break the application if DELIM_CHAR is changed.) > Change the visibility of o.a.l.f.FacetsConfig.DELIM_CHAR to public > --- > > Key: LUCENE-9392 > URL: https://issues.apache.org/jira/browse/LUCENE-9392 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 8.5.2 >Reporter: Ankur >Priority: Minor > > FacetsConfig.DELIM_CHAR is marked as private. An application that wants to > use this delimiter (in a unit test for example) is forced to re-declare it in > the application code. This can break the application if the value of > DELIM_CHAR is changed in FacetsConfig.
[jira] [Created] (LUCENE-9385) Skip indexing facet drill down terms
Ankur created LUCENE-9385: - Summary: Skip indexing facet drill down terms Key: LUCENE-9385 URL: https://issues.apache.org/jira/browse/LUCENE-9385 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Affects Versions: 8.5.2 Reporter: Ankur FacetsConfig creates index terms from the Facet dimension and path automatically for the purpose of supporting drill-down queries. An application that does not need drill-down ends up paying the index cost of the extra terms. Ideally an option to skip indexing these drill down terms should be exposed to the application. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
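A minimal sketch of the kind of option being requested, under stated assumptions: the indexDrillDownTerms flag is hypothetical (the issue only asks that such an option exist), the sketch assumes one drill-down term per path prefix, and "/" is used as a readable separator rather than the real delimiter.

```java
import java.util.ArrayList;
import java.util.List;

public class DrillDownOptionSketch {
    /**
     * Returns the drill-down terms that would be indexed for one facet path.
     * The indexDrillDownTerms flag is a hypothetical per-application option.
     */
    static List<String> termsFor(String dim, String[] path, boolean indexDrillDownTerms) {
        List<String> terms = new ArrayList<>();
        if (!indexDrillDownTerms) {
            return terms; // the application opted out, so no extra terms are indexed
        }
        // Assumption: one term for the dimension and one for each longer path prefix.
        StringBuilder sb = new StringBuilder(dim);
        terms.add(sb.toString());
        for (String component : path) {
            sb.append('/').append(component);
            terms.add(sb.toString());
        }
        return terms;
    }
}
```

Under these assumptions a path with two components produces three terms when the option is on and none when it is off, which is the index cost the issue would like applications to be able to avoid.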