[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879908#comment-16879908 ]

Haoyu Zhai commented on LUCENE-8878:
------------------------------------

[~rcmuir] If I understand LatLonPointDistanceComparator correctly, the `copy` method is not optimized, so once we use this comparator's inner storage (the `values` field) we always incur the full cost, since we always need to call `copy` first to store the values? The actual optimization happens before `copy` is called: we can call `compareBottom` to filter out bad points at a lower cost. So I guess the `values` field is not necessary to keep the optimization, as `compareBottom` does not use `values` anyway?

I guess that to keep the optimization for LatLonPointDistanceComparator we need `compareBottom` and `setBottom` and their related fields, but we need not store all of the sort values in the comparator?

Also [~hypothesisx86], I think that rather than having the comparison logic in `SortField`, we could have a comparator class and bind it to `ValueAccessor` to allow easier customization?

> Provide alternative sorting utility from SortField other than FieldComparator
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8878
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8878
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 8.1.1
>            Reporter: Tony Xu
>            Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at
> once. At a high level, the main functionalities of `FieldComparator` are
> * Provide LeafFieldComparator
> * Allocate storage for the requested number of hits
> * Read the values from DocValues/Custom source etc.
> * Compare two values
> There are three major areas for improvement
> # The logic of reading values and the logic of storing them are coupled.
> # Users need to specify the size in order to create a `FieldComparator`, but
> sometimes the size is unknown upfront.
> # From `FieldComparator`'s API, one can't reason about thread-safety, so it
> is not suitable for concurrent search.
> E.g. can two concurrent threads use the same `FieldComparator` to call
> `getLeafComparator` for two different segments they are working on? In fact,
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
> # {color:#14892c}int compare(Object v1, Object v2){color} – this is to
> compare two values from different docs for this field
> # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext
> leaf){color} – this encapsulates the logic for obtaining the right
> implementation in order to read the field values.
> `ValueAccessor` should be accessed in a similar way to `DocValues`,
> providing the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using
> `FieldComparator`, because today the users store either the sort values or at
> least the slot number in addition to the storage allocated by
> `FieldComparator` itself. Ideally, only one copy of the values should be stored.
> The proposed API is also friendlier to concurrent search since it provides
> the `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared
> if more than one thread is working on the same leaf, at least each thread can
> initialize its own `ValueAccessor`.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
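The two methods proposed in the issue description can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `SortSpec` is a hypothetical stand-in for the two methods proposed for `SortField`, `ValueAccessor` is given a hypothetical advance-and-read shape, and a plain `long[]` plays the role of a `LeafReaderContext`. None of these names or signatures exist in Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical: reads the sort value of a document in an advance-and-read fashion.
interface ValueAccessor {
  boolean advanceExact(int doc); // true if the document has a value
  Object value();                // only valid after advanceExact returned true
}

// Hypothetical stand-in for the two methods proposed for SortField.
// A long[] plays the role of a LeafReaderContext to keep the sketch self-contained.
interface SortSpec {
  int compare(Object v1, Object v2);
  ValueAccessor newValueAccessor(long[] leafValues);
}

final class LongSortSpec implements SortSpec {
  @Override
  public int compare(Object v1, Object v2) {
    return Long.compare((Long) v1, (Long) v2);
  }

  @Override
  public ValueAccessor newValueAccessor(long[] leafValues) {
    return new ValueAccessor() {
      private int current = -1;

      @Override
      public boolean advanceExact(int doc) {
        current = doc;
        return doc >= 0 && doc < leafValues.length;
      }

      @Override
      public Object value() {
        return leafValues[current];
      }
    };
  }
}

final class TopValues {
  // Returns the n smallest sort values across all leaves, in sorted order.
  // The caller owns the only copy of the values; the SortSpec stores nothing.
  static List<Object> collect(SortSpec spec, long[][] leaves, int n) {
    if (n <= 0) {
      return new ArrayList<>();
    }
    // Reverse the comparison so the worst retained value sits at the head of the queue.
    PriorityQueue<Object> queue = new PriorityQueue<>((a, b) -> spec.compare(b, a));
    for (long[] leaf : leaves) {
      ValueAccessor accessor = spec.newValueAccessor(leaf); // one accessor per leaf
      for (int doc = 0; doc < leaf.length; doc++) {
        if (!accessor.advanceExact(doc)) {
          continue;
        }
        Object value = accessor.value();
        if (queue.size() < n) {
          queue.add(value);
        } else if (spec.compare(value, queue.peek()) < 0) {
          queue.poll();
          queue.add(value);
        }
      }
    }
    List<Object> result = new ArrayList<>(queue);
    result.sort(spec::compare);
    return result;
  }
}
```

In this shape the queue size does not have to be known when the `SortSpec` is created, and each thread can create its own accessor for the leaf it works on. The sketch does not address the bottom-based optimizations raised in the comments.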
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879274#comment-16879274 ]

Michael McCandless commented on LUCENE-8878:
--------------------------------------------

{quote}I believe you are talking about Scorer#setMinCompetitiveScore, i.e. changing the FieldComparator API to only track the bottom bucket as opposed to every bucket? If this is the case I agree that it sounds like a good idea to explore.
{quote}
Ahh, yes, that ;) +1
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874810#comment-16874810 ]

Adrien Grand commented on LUCENE-8878:
--------------------------------------

[~mikemccand] ImpactsEnum is mostly about exposing maximum scores per block. I believe you are talking about Scorer#setMinCompetitiveScore, i.e. changing the FieldComparator API to only track the bottom bucket as opposed to every bucket? If this is the case I agree that it sounds like a good idea to explore.
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874573#comment-16874573 ]

Michael McCandless commented on LUCENE-8878:
--------------------------------------------

The recently added impacts have a similar use case, where we need to express to the {{ImpactsEnum}} what the "bottom" of our PQ is, I think? Maybe we could take inspiration from that to simplify the comparator APIs or make them similar to how {{ImpactsEnum}} does it?
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874425#comment-16874425 ]

Robert Muir commented on LUCENE-8878:
-------------------------------------

Yes, please don't let me discourage you from attempting to simplify the API. I just wanted to point out that for a search engine, there are totally valid use-cases for the sort comparator to exploit the priority queue to go faster. I think the distance one is "reasonable" in that sense.

The comparison-by-ordinal stuff we do for strings is more extreme; it is kind of a separate issue from that? It's related. There might be other ways to do it and still have good performance. I know there was a lot of investigation and benchmarking in past JIRA issues on that.
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874413#comment-16874413 ]

Tony Xu commented on LUCENE-8878:
---------------------------------

[~rcmuir] Thank you Robert for bringing up the implementation details of LatLonPointDistanceComparator, I didn't know about that! I took a look at it and found:
* The compare(int slot, int slot) method still compares the distances.
* The setBottom(int slot) method sets the bottom distance (a double) and computes the bounding box in a sampling fashion.
* The optimization lies in the compareBottom(int doc) method. It grabs the lat/long out of the document and tries to reject the doc if the lat/long is outside the bounding box.

I also noted there are compareTop/setTopValue methods used for paging.

With all that, I will need to rethink and propose a different API.
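The bounding-box rejection described in the bullets above can be illustrated with a small standalone sketch. It is not the Lucene implementation: the class name `DistanceBottomFilter`, the method `isCompetitive`, the counter field, and the way the box is derived are all assumptions made for illustration, and the sampling behaviour of the real `setBottom` is left out.

```java
// Illustrative only: a bounding box derived from the current bottom distance
// lets most documents be rejected without a full distance computation.
final class DistanceBottomFilter {
  private static final double EARTH_RADIUS_METERS = 6_371_008.8; // mean Earth radius

  private final double originLat;
  private final double originLon;
  private double bottomMeters = Double.POSITIVE_INFINITY;
  private double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
  int fullDistanceComputations = 0; // exposed so the saving can be observed

  DistanceBottomFilter(double originLat, double originLon) {
    this.originLat = originLat;
    this.originLon = originLon;
  }

  // Called when the worst hit in the queue changes: recompute the bounding box.
  void setBottom(double bottomMeters) {
    this.bottomMeters = bottomMeters;
    double radius = bottomMeters / EARTH_RADIUS_METERS; // angular radius in radians
    double latRad = Math.toRadians(originLat);
    minLat = Math.max(-90, originLat - Math.toDegrees(radius));
    maxLat = Math.min(90, originLat + Math.toDegrees(radius));
    minLon = -180;
    maxLon = 180;
    if (radius < Math.PI / 2 - Math.abs(latRad)) {
      // The circle does not reach a pole, so its longitude extent is bounded.
      double lonDelta = Math.toDegrees(Math.asin(Math.sin(radius) / Math.cos(latRad)));
      if (originLon - lonDelta >= -180 && originLon + lonDelta <= 180) {
        minLon = originLon - lonDelta;
        maxLon = originLon + lonDelta;
      }
      // Otherwise the box would cross the dateline; keep the full longitude range.
    }
  }

  // True if the point is strictly closer to the origin than the current bottom.
  boolean isCompetitive(double lat, double lon) {
    if (lat < minLat || lat > maxLat || lon < minLon || lon > maxLon) {
      return false; // cheap rejection, no trigonometry needed
    }
    fullDistanceComputations++;
    return haversineMeters(originLat, originLon, lat, lon) < bottomMeters;
  }

  static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1);
    double dLon = Math.toRadians(lon2 - lon1);
    double sinLat = Math.sin(dLat / 2);
    double sinLon = Math.sin(dLon / 2);
    double a = sinLat * sinLat
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
  }
}
```

The point of the sketch is that the cheap check depends only on the bottom value, not on the values stored for every slot, which is why a plain `compare(Object, Object)` cannot express it.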
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873284#comment-16873284 ]

Robert Muir commented on LUCENE-8878:
-------------------------------------

Please don't forget about the distance sort comparator, it really needs hooks into its priority queue too:
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/document/LatLonPointDistanceComparator.java

In this case the performance difference is not small: if we were to simplify the API, it would become much slower.

I'm afraid {{compare(Object,Object)}} isn't going to cut it.
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873003#comment-16873003 ]

Adrien Grand commented on LUCENE-8878:
--------------------------------------

bq. Is it the case today? I wonder whether the ordinals are comparable across segments (likely not...);

Indeed, ordinals are not comparable across segments. Have a look at TermOrdValComparator#setBottom: it looks up the bottom term in the terms dictionary of the current segment to get an ordinal that may be used for comparison.

I'm afraid the API would need to be a bit more complex than what you are proposing, but hopefully not as complicated as the current API.
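The lookup described above can be sketched in isolation. This is a simplified illustration, not the code of `TermOrdValComparator`: the classes `SegmentTerms` and `OrdinalBottomComparator` are hypothetical, terms are plain Java strings, and a sorted array stands in for a segment's terms dictionary.

```java
import java.util.Arrays;

// Hypothetical stand-in for one segment: a sorted term dictionary (index = ordinal)
// and the ordinal of each document's value.
final class SegmentTerms {
  final String[] sortedTerms;
  final int[] docToOrd;

  SegmentTerms(String[] sortedTerms, int[] docToOrd) {
    this.sortedTerms = sortedTerms;
    this.docToOrd = docToOrd;
  }
}

// Translates the global bottom term into this segment's ordinal space once,
// so that every later comparison in the segment is a cheap int comparison.
final class OrdinalBottomComparator {
  private final SegmentTerms segment;
  private int bottomOrd;
  private boolean bottomInSegment;

  OrdinalBottomComparator(SegmentTerms segment) {
    this.segment = segment;
  }

  void setBottom(String bottomTerm) {
    int index = Arrays.binarySearch(segment.sortedTerms, bottomTerm);
    if (index >= 0) {
      bottomOrd = index;        // the bottom term exists in this segment
      bottomInSegment = true;
    } else {
      bottomOrd = -index - 2;   // ordinal of the greatest term smaller than bottom (-1 if none)
      bottomInSegment = false;
    }
  }

  // Sign of compare(bottomTerm, term of doc), computed on ordinals only.
  int compareBottom(int doc) {
    int docOrd = segment.docToOrd[doc];
    if (bottomInSegment) {
      return Integer.compare(bottomOrd, docOrd);
    }
    // The bottom term falls strictly between bottomOrd and bottomOrd + 1.
    return docOrd <= bottomOrd ? 1 : -1;
  }
}
```

The per-segment translation step is the part that a bare `compare(Object, Object)` on `SortField` has no place for, which is one reason the API may need to be somewhat richer than the original proposal.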
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872734#comment-16872734 ]

Tony Xu commented on LUCENE-8878:
---------------------------------

> As long as we can keep comparing strings using their ordinals instead of their actual values, it should be good.

Is it the case today? I wonder whether the ordinals are comparable across segments (likely not...); To support this, I think the {{ValueAccessor}} for {{SortField.Type.STRING}} needs to return a 3-tuple (segmentId, ord, byteRef) so the compare logic has enough context to compare ords when possible.

> I was hoping we could soon replace FunctionValues with the new oal.search.LongValues/DoubleValues.

+1. I'm still exploring the whole code base, but I'm already overwhelmed by the number of classes for value sources and value representations that descend from org.apache.lucene.queries.function.ValueSource...

Any suggestion on which class/interface to extend/implement for a non-numeric {{ValueAccessor}}?
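The 3-tuple idea from this comment could look roughly like the sketch below. The class `StringSortValue` and its fields are hypothetical, a `byte[]` stands in for the term bytes, and it is an assumption of the sketch that byte-wise unsigned order agrees with ordinal order inside a segment.

```java
import java.util.Arrays;

// Hypothetical sort value for a string field: the segment it came from,
// its ordinal inside that segment, and the term bytes themselves.
final class StringSortValue {
  final int segmentId;
  final int ord;
  final byte[] bytes;

  StringSortValue(int segmentId, int ord, byte[] bytes) {
    this.segmentId = segmentId;
    this.ord = ord;
    this.bytes = bytes;
  }

  static int compare(StringSortValue a, StringSortValue b) {
    if (a.segmentId == b.segmentId) {
      // Ordinals are only meaningful within one segment.
      return Integer.compare(a.ord, b.ord);
    }
    // Across segments fall back to comparing the term bytes (unsigned, lexicographic).
    return Arrays.compareUnsigned(a.bytes, b.bytes);
  }
}
```

A trade-off of this shape is that every retained value carries its bytes, so the cheap ordinal path only helps when both values come from the same segment.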
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872534#comment-16872534 ]

Adrien Grand commented on LUCENE-8878:
--------------------------------------

+1 to simplify, even at the cost of some performance. As long as we can keep comparing strings using their ordinals instead of their actual values, it should be good.

bq. To access the values can we somehow use the existing FunctionValues classes?

I was hoping we could soon replace FunctionValues with the new oal.search.LongValues/DoubleValues. :)
[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
[ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872270#comment-16872270 ]

Michael McCandless commented on LUCENE-8878:
--------------------------------------------

+1 to simplify Lucene's comparator APIs – they are crazy complicated because they are "hiding" a priority queue underneath. They look nothing like you'd expect a comparator to look like!

They were designed this way to sometimes enable int ordinal comparisons when sorting by string fields ({{DocValuesType.SORTED}}) but I'm not sure all that API complexity is really worth the performance.

To access the values can we somehow use the existing {{FunctionValues}} classes?