[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-07-07 Thread Haoyu Zhai (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879908#comment-16879908
 ] 

Haoyu Zhai commented on LUCENE-8878:


[~rcmuir] If I understand LatLonPointDistanceComparator correctly, `copy` 
method is not optimized, so once we make use of this comparator's inner storage 
(`values` field), we'll always need to incur the full cost (as we'll always 
want to `copy` first to store the values)? And the actual optimization is 
happened before we call `copy` operation, we could make a call to 
`compareBottom` to filter out bad points in a lower cost. So I guess it is not 
necessary to have `values` field to keep the optimization, as `compareBottom` 
is not using `values` anyway?I guess to keep the optimization for 
LatLonPointDistanceComparator, we need to have a `compareBottom` and 
`setBottom` and also related fields, but need not to keep storage of whole sort 
values in the comparator?

Also [~hypothesisx86], I think rather than having comparison logic in 
`SortField`, we could have a comparator class and bind this class with 
`ValueAccessor` to enable easier customization?

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-07-05 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879274#comment-16879274
 ] 

Michael McCandless commented on LUCENE-8878:


{quote}I believe you are talking about Scorer#setMinCompetitiveScore, ie. 
changing the FieldComparator API to only track the bottom bucket as opposed to 
every bucket? If this is the case I agree that it sounds like a good idea to 
explore.
{quote}
Ahh, yes, that ;)  +1

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874810#comment-16874810
 ] 

Adrien Grand commented on LUCENE-8878:
--

[~mikemccand] ImpactsEnum as mostly about exposing maximum scores per block. I 
believe you are talking about Scorer#setMinCompetitiveScore, ie. changing the 
FieldComparator API to only track the bottom bucket as opposed to every bucket? 
If this is the case I agree that it sounds like a good idea to explore.

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-27 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874573#comment-16874573
 ] 

Michael McCandless commented on LUCENE-8878:


The recently added impacts have a similar use case, where we need to express to 
the {{ImpactsEnum}} what the "bottom" of our PQ is, I think?  Maybe we could 
take inspiration from that to simplify the comparator APIs or make them similar 
to how {{ImpactsEnum}} does it?

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-27 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874425#comment-16874425
 ] 

Robert Muir commented on LUCENE-8878:
-

Yes, please don't let me discourage you from attempting to simplify the API.

I just wanted to point out that for a search engine, there are totally valid 
use-cases for the sort comparator to exploit the priority queue to go faster. I 
think the distance one is "reasonable" in that sense.

The comparison-by-ordinal stuff we do for strings is more extreme, it is kind 
of a separate issue from that? Its related, There might be other ways to do it 
and still have good performance. I know there was a lot of investigation and 
benchmarking in past JIRA issues on that.

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-27 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874413#comment-16874413
 ] 

Tony Xu commented on LUCENE-8878:
-

[~rcmuir] Thank you Robert for bring the implementation detail about 
LatLongPointDistanceComparator, I didn't know about that! Took a look at it I 
found –
* compare(int slot, int slot) method still compare the distance
* the setBottom(int slot) method set's the bottom distance (double) and 
computes the bounding box in a sampling fashion
* The optimization lies in compareBottom(int doc) method. It grabs the lat/long 
out of document and tries to reject the doc if the lat/long is out of bounding 
box.


I also noted there are compareTop/setTopValue methods used for paging. With all 
that, I will need to rethink and propose a different API


> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-26 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873284#comment-16873284
 ] 

Robert Muir commented on LUCENE-8878:
-

Please don't forget about the distance sort comparator, it really needs hooks 
into its priority queue too: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/document/LatLonPointDistanceComparator.java

In this case, it is not a small performance difference if we were to simplify 
the API, it would become much slower. I'm afraid {{compare(Object,Object)}} 
isn't going to cut it.

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873003#comment-16873003
 ] 

Adrien Grand commented on LUCENE-8878:
--

bq. Is it the case today? I wonder whether the ordinals are comparable across 
segments (likely not...);

Indeed ordinals are not comparable across segments. Have a look at 
TermOrdValComparator#setBottom, it looks up the bottom term in the terms 
dictionary of the current segment to get an ordinal that may be used for 
comparison. I'm afraid the API would need to be a bit more complex than what 
you are proposing, but hopefully not as complicated as the current API.

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872734#comment-16872734
 ] 

Tony Xu commented on LUCENE-8878:
-

>  As long as we can keep comparing strings using their ordinals instead of 
>their actual values, it should be good.
Is it the case today? I wonder whether the ordinals are comparable across 
segments (likely not...); To support this I think the the {{ValueAccessor}} for 
{{SortField.Type.String}} needs to return a 3-tuple (segmentId, ord, byteRef) 
so the compare logic has enough context to compare ord if possible.
 

> I was hoping we could soon replace FunctionValues with the new 
>oal.search.LongValues/DoubleValues.
+1. I'm still exploring the whole code base but I'm already overwhelmed by the 
number of classes for valueSource and values representations which are 
descendants of org.apache.lucene.queries.function.ValueSource... Any suggestion 
on which class/interface to extend/implement for non-numeric {{ValueAccessor}}?



> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872534#comment-16872534
 ] 

Adrien Grand commented on LUCENE-8878:
--

+1 to simplify, even at the cost of some performance. As long as we can keep 
comparing strings using their ordinals instead of their actual values, it 
should be good.

bq. To access the values can we somehow use the existing FunctionValues classes?

I was hoping we could soon replace FunctionValues with the new 
oal.search.LongValues/DoubleValues. :)

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872270#comment-16872270
 ] 

Michael McCandless commented on LUCENE-8878:


+1 to simplify Lucene's comparator APIs – they are crazy complicated because 
they are "hiding" a priority queue underneath.  They look nothing like you'd 
expect a comparator to look like!  They were designed this way to sometimes 
enable int ordinal comparisons when sorting by string fields 
({{DocValuesType.SORTED}}) but I'm not sure all that API complexity is really 
worth the performance.

To access the values can we somehow use the existing {{FunctionValues}} classes?

> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At high level the main functionalities of `FieldComparator` are
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are two major areas for improvement
>  # The logic of reading values and storing them are coupled.
>  # User need to specify the size in order to create a `FieldComparator` but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety so it 
> is not suitable for concurrent search.
>  E.g. Can two concurrent thread use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs
>  # {color:#14892c}int compare(Object v1, Object v2){color} – this is to 
> compare two values from different docs for this field
>  # {color:#14892c}ValueAccessor newValueAccessor(LeafReaderContext 
> leaf){color} – This encapsulate the logic for obtaining the right 
> implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way as `DocValues` to 
> provide the sort value for a document in an advance & read fashion.
> With this API, hopefully we can reduce the memory usage when using 
> `FieldComparator` because the users either store the sort values or at least 
> the slot number besides the storage allocated by `FieldComparator` itself. 
> Ideally, only once copy of the values should be stored.
> The proposed API is also more friendly to concurrent search since it provides 
> the `ValueAccessor` per leaf. Although same `ValueAccessor` can't be shared 
> if there are more than one thread working on the same leaf, at least they can 
> initialize their own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org