[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-27 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874413#comment-16874413
 ] 

Tony Xu commented on LUCENE-8878:
-

[~rcmuir] Thank you, Robert, for bringing up the implementation details of 
LatLonPointDistanceComparator; I didn't know about that! I took a look at it and 
found:
* The compare(int slot1, int slot2) method still compares the distances.
* The setBottom(int slot) method sets the bottom distance (a double) and 
computes the bounding box in a sampling fashion.
* The optimization lies in the compareBottom(int doc) method. It grabs the lat/lon 
out of the document and tries to reject the doc if the lat/lon is outside the 
bounding box.


I also noticed there are compareTop/setTopValue methods used for paging. With all 
that in mind, I will need to rethink and propose a different API.
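
For readers following along, the bounding-box rejection described above can be sketched roughly as follows. This is a standalone illustration and not Lucene's actual comparator code: the class name BoundingBoxFilterSketch, its fields, the isCompetitive method, and the simplified box computation (which falls back to all longitudes near a pole or the dateline) are assumptions made for the example.

```java
/** Hypothetical sketch; not Lucene's LatLonPointDistanceComparator. */
final class BoundingBoxFilterSketch {
  private static final double EARTH_RADIUS_METERS = 6_371_008.8;

  private final double originLat;
  private final double originLon;
  private double bottomMeters = Double.POSITIVE_INFINITY;
  // Box that contains every point within bottomMeters of the origin.
  private double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
  /** Number of exact distance computations, to show what the box check saves. */
  int exactDistanceCalls = 0;

  BoundingBoxFilterSketch(double originLat, double originLon) {
    this.originLat = originLat;
    this.originLon = originLon;
  }

  /** Records the worst competitive distance and recomputes the box around the origin. */
  void setBottom(double bottomMeters) {
    this.bottomMeters = bottomMeters;
    double angular = bottomMeters / EARTH_RADIUS_METERS; // angular distance in radians
    double latDelta = Math.toDegrees(angular);
    minLat = Math.max(-90, originLat - latDelta);
    maxLat = Math.min(90, originLat + latDelta);
    double ratio = Math.sin(angular) / Math.cos(Math.toRadians(originLat));
    double lonDelta = Math.toDegrees(Math.asin(Math.min(1.0, ratio)));
    minLon = originLon - lonDelta;
    maxLon = originLon + lonDelta;
    // Simplification: near a pole or the dateline, accept all longitudes.
    if (angular >= Math.PI / 2 || ratio >= 1.0 || minLat == -90 || maxLat == 90
        || minLon < -180 || maxLon > 180) {
      minLon = -180;
      maxLon = 180;
    }
  }

  /** Returns true if the point is closer to the origin than the current bottom. */
  boolean isCompetitive(double lat, double lon) {
    if (lat < minLat || lat > maxLat || lon < minLon || lon > maxLon) {
      return false; // cheap rejection, no trigonometry needed
    }
    exactDistanceCalls++;
    return haversineMeters(originLat, originLon, lat, lon) < bottomMeters;
  }

  static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
    double dLat = Math.toRadians(lat2 - lat1);
    double dLon = Math.toRadians(lon2 - lon1);
    double a = Math.pow(Math.sin(dLat / 2), 2)
        + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.pow(Math.sin(dLon / 2), 2);
    return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
  }
}
```

The point of the sketch is only that the box check is conservative: it may let a non-competitive point through to the exact computation, but it never rejects a point that is within the bottom distance.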


> Provide alternative sorting utility from SortField other than FieldComparator
> -
>
> Key: LUCENE-8878
> URL: https://issues.apache.org/jira/browse/LUCENE-8878
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 8.1.1
>Reporter: Tony Xu
>Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at 
> once. At a high level, the main functions of `FieldComparator` are:
>  * Provide LeafFieldComparator
>  * Allocate storage for the requested number of hits
>  * Read the values from DocValues/custom sources etc.
>  * Compare two values
> There are three major areas for improvement:
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator`, but 
> sometimes the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it 
> is not suitable for concurrent search.
>  E.g. can two concurrent threads use the same `FieldComparator` to call 
> `getLeafComparator` for two different segments they are working on? In fact, 
> almost all existing implementations of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs:
>  # `int compare(Object v1, Object v2)`: compares two values from different 
> docs for this field.
>  # `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates the 
> logic for obtaining the right implementation to read the field values.
>  `ValueAccessor` should be accessed in a similar way to `DocValues`, 
> providing the sort value for a document in an advance-and-read fashion.
> With this API, hopefully we can reduce the memory usage seen with 
> `FieldComparator`, where users store either the sort values or at least 
> the slot numbers in addition to the storage allocated by `FieldComparator` 
> itself. Ideally, only one copy of the values should be stored.
> The proposed API is also friendlier to concurrent search since it provides 
> a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared 
> if more than one thread is working on the same leaf, at least each thread can 
> initialize its own `ValueAccessor`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-25 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872734#comment-16872734
 ] 

Tony Xu commented on LUCENE-8878:
-

>  As long as we can keep comparing strings using their ordinals instead of 
>their actual values, it should be good.
Is that the case today? I wonder whether the ordinals are comparable across 
segments (likely not...). To support this, I think the {{ValueAccessor}} for 
{{SortField.Type.STRING}} needs to return a 3-tuple (segmentId, ord, byteRef) 
so the compare logic has enough context to compare ords when possible.
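
A minimal sketch of that 3-tuple idea, assuming ordinals are only meaningful within one segment: compare by ord when both values come from the same segment, otherwise fall back to the term bytes. The names SegmentOrdValue and SegmentOrdComparatorSketch are hypothetical, and a plain byte[] stands in for Lucene's BytesRef.

```java
import java.util.Arrays;

/** Hypothetical value carrying (segmentId, ord, bytes) for one document. */
final class SegmentOrdValue {
  final int segmentId;
  final int ord;
  final byte[] bytes; // stand-in for a BytesRef

  SegmentOrdValue(int segmentId, int ord, byte[] bytes) {
    this.segmentId = segmentId;
    this.ord = ord;
    this.bytes = bytes;
  }
}

/** Hypothetical compare logic: ords within a segment, bytes across segments. */
final class SegmentOrdComparatorSketch {
  static int compare(SegmentOrdValue a, SegmentOrdValue b) {
    if (a.segmentId == b.segmentId) {
      // Same segment: ordinals follow the term order, so no byte comparison is needed.
      return Integer.compare(a.ord, b.ord);
    }
    // Different segments: ordinals are unrelated, so compare the bytes (unsigned order).
    return Arrays.compareUnsigned(a.bytes, b.bytes);
  }
}
```

The cost of this design is that the bytes have to be available for every stored value, even though they are only consulted for cross-segment comparisons.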
 

> I was hoping we could soon replace FunctionValues with the new 
>oal.search.LongValues/DoubleValues.
+1. I'm still exploring the whole code base, but I'm already overwhelmed by the 
number of classes for value sources and value representations that are 
descendants of org.apache.lucene.queries.function.ValueSource... Any suggestions 
on which class/interface to extend/implement for a non-numeric {{ValueAccessor}}?









[jira] [Updated] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-24 Thread Tony Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Xu updated LUCENE-8878:

Description: 
The `FieldComparator` has many responsibilities and users get all of them at 
once. At a high level, the main functions of `FieldComparator` are:
 * Provide LeafFieldComparator
 * Allocate storage for the requested number of hits
 * Read the values from DocValues/custom sources etc.
 * Compare two values

There are three major areas for improvement:
 # The logic for reading values and the logic for storing them are coupled.
 # Users need to specify the size in order to create a `FieldComparator`, but 
sometimes the size is unknown upfront.
 # From `FieldComparator`'s API, one can't reason about thread-safety, so it is 
not suitable for concurrent search.
 E.g. can two concurrent threads use the same `FieldComparator` to call 
`getLeafComparator` for two different segments they are working on? In fact, 
almost all existing implementations of `FieldComparator` are not thread-safe.

The proposal is to enhance `SortField` with two APIs:
 # `int compare(Object v1, Object v2)`: compares two values from different 
docs for this field.
 # `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates the 
logic for obtaining the right implementation to read the field values.
 `ValueAccessor` should be accessed in a similar way to `DocValues`, providing 
the sort value for a document in an advance-and-read fashion.

With this API, hopefully we can reduce the memory usage seen with 
`FieldComparator`, where users store either the sort values or at least 
the slot numbers in addition to the storage allocated by `FieldComparator` 
itself. Ideally, only one copy of the values should be stored.

The proposed API is also friendlier to concurrent search since it provides 
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared if 
more than one thread is working on the same leaf, at least each thread can 
initialize its own `ValueAccessor`.

  was:
The `FieldComparator` has many responsibilities and users get all of them at 
once. At a high level, the main functions of `FieldComparator` are:
* Manage LeafFieldComparator
* Allocate storage for the requested number of hits
* Read the values from DocValues/custom sources etc.
* Compare two values

There are two major areas for improvement:
1. The logic for reading values and the logic for storing them are coupled.
2. From `FieldComparator`'s API, one can't reason about thread-safety, so it 
is not suitable for concurrent search. 
E.g. can two concurrent threads use the same `FieldComparator` to call 
`getLeafComparator` for two different segments they are working on? In fact, 
almost all existing implementations of `FieldComparator` are not thread-safe.


The proposal is to enhance `SortField` with two APIs:
1. `int compare(Object v1, Object v2)`: compares two values from 
different docs for this field.
2. `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates 
the logic for obtaining the right implementation to read the field 
values.
`ValueAccessor` should be accessed in a similar way to `DocValues`, providing 
the sort value for a document in an advance-and-read fashion.


With this API, hopefully we can reduce the memory usage seen with 
`FieldComparator`, where users store either the sort values or at least 
the slot numbers in addition to the storage allocated by `FieldComparator` 
itself. Ideally, only one copy of the values should be stored.

The proposed API is also friendlier to concurrent search since it provides 
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared if 
more than one thread is working on the same leaf, at least each thread can 
initialize its own `ValueAccessor`.



[jira] [Created] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator

2019-06-24 Thread Tony Xu (JIRA)
Tony Xu created LUCENE-8878:
---

 Summary: Provide alternative sorting utility from SortField other 
than FieldComparator
 Key: LUCENE-8878
 URL: https://issues.apache.org/jira/browse/LUCENE-8878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: 8.1.1
Reporter: Tony Xu


The `FieldComparator` has many responsibilities and users get all of them at 
once. At a high level, the main functions of `FieldComparator` are:
* Manage LeafFieldComparator
* Allocate storage for the requested number of hits
* Read the values from DocValues/custom sources etc.
* Compare two values

There are two major areas for improvement:
1. The logic for reading values and the logic for storing them are coupled.
2. From `FieldComparator`'s API, one can't reason about thread-safety, so it 
is not suitable for concurrent search. 
E.g. can two concurrent threads use the same `FieldComparator` to call 
`getLeafComparator` for two different segments they are working on? In fact, 
almost all existing implementations of `FieldComparator` are not thread-safe.


The proposal is to enhance `SortField` with two APIs:
1. `int compare(Object v1, Object v2)`: compares two values from 
different docs for this field.
2. `ValueAccessor newValueAccessor(LeafReaderContext leaf)`: encapsulates 
the logic for obtaining the right implementation to read the field 
values.
`ValueAccessor` should be accessed in a similar way to `DocValues`, providing 
the sort value for a document in an advance-and-read fashion.


With this API, hopefully we can reduce the memory usage seen with 
`FieldComparator`, where users store either the sort values or at least 
the slot numbers in addition to the storage allocated by `FieldComparator` 
itself. Ideally, only one copy of the values should be stored.

The proposed API is also friendlier to concurrent search since it provides 
a `ValueAccessor` per leaf. Although the same `ValueAccessor` can't be shared if 
more than one thread is working on the same leaf, at least each thread can 
initialize its own `ValueAccessor`.
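
To make the proposal more concrete, here is a rough, self-contained sketch of how the two APIs might be used by a caller that owns its own storage. Nothing here is an existing Lucene API: ValueAccessor with advanceExact/value, LeafSketch (a stand-in for LeafReaderContext holding one long per doc), LongSortFieldSketch and TopValuesSketch are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical per-leaf reader, used in an advance-and-read fashion. */
interface ValueAccessor {
  boolean advanceExact(int doc);
  Object value();
}

/** Hypothetical stand-in for a segment: one long sort value per doc. */
final class LeafSketch {
  final long[] values;
  LeafSketch(long... values) { this.values = values; }
}

/** Hypothetical slice of SortField with the two proposed methods. */
final class LongSortFieldSketch {
  int compare(Object v1, Object v2) {
    return Long.compare((Long) v1, (Long) v2);
  }

  ValueAccessor newValueAccessor(LeafSketch leaf) {
    return new ValueAccessor() {
      private int doc = -1;
      public boolean advanceExact(int target) {
        doc = target;
        return target >= 0 && target < leaf.values.length;
      }
      public Object value() { return leaf.values[doc]; }
    };
  }
}

final class TopValuesSketch {
  /** Returns the n smallest sort values across all leaves, in ascending order. */
  static Object[] topN(LongSortFieldSketch field, int n, LeafSketch... leaves) {
    if (n <= 0) return new Object[0];
    // Max-heap in field order: the worst kept value sits on top. The caller
    // owns this storage, so each value is stored exactly once.
    PriorityQueue<Object> heap = new PriorityQueue<>((a, b) -> field.compare(b, a));
    for (LeafSketch leaf : leaves) {
      ValueAccessor accessor = field.newValueAccessor(leaf); // one accessor per leaf
      for (int doc = 0; doc < leaf.values.length; doc++) {
        if (!accessor.advanceExact(doc)) continue;
        Object v = accessor.value();
        if (heap.size() < n) {
          heap.add(v);
        } else if (field.compare(v, heap.peek()) < 0) {
          heap.poll();
          heap.add(v);
        }
      }
    }
    Object[] out = heap.toArray();
    Arrays.sort(out, field::compare);
    return out;
  }
}
```

In this sketch the number of hits is only a property of the caller's heap; the sort field itself never needs to know it, which is the decoupling the description asks for.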






[jira] [Commented] (LUCENE-8865) Use incoming thread for execution if IndexSearcher has an executor

2019-06-24 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871853#comment-16871853
 ] 

Tony Xu commented on LUCENE-8865:
-

I came across this issue by chance; this is a nice change! Were you able to 
measure the improvement?

>  Use incoming thread for execution if IndexSearcher has an executor
> ---
>
> Key: LUCENE-8865
> URL: https://issues.apache.org/jira/browse/LUCENE-8865
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Today we don't utilize the incoming thread for a search when IndexSearcher
> has an executor. This thread is only idling but can be used to execute a 
> search once all other collectors are dispatched.






[jira] [Commented] (LUCENE-7882) Maybe expression compiler should cache recently compiled expressions?

2018-08-31 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599076#comment-16599076
 ] 

Tony Xu commented on LUCENE-7882:
-

Oh I forgot to paste the test output.

 

2
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@3973f90b
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@3973f90b
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@3973f90b
3.0
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@2a44f939
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@2a44f939
Loading class  , using Loaderorg.apache.lucene.expressions.js.JavascriptCompiler$Loader@2a44f939
3.0

> Maybe expression compiler should cache recently compiled expressions?
> -
>
> Key: LUCENE-7882
> URL: https://issues.apache.org/jira/browse/LUCENE-7882
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/expressions
>Reporter: Michael McCandless
>Priority: Major
> Attachments: demo.patch
>
>
> I've been running search performance tests using a simple expression 
> ({{_score + ln(1000+unit_sales)}}) for sorting and hit this odd bottleneck:
> {noformat}
> "pool-1-thread-30" #70 prio=5 os_prio=0 tid=0x7eea7000a000 nid=0x1ea8a waiting for monitor entry [0x7eea867dd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.lucene.expressions.js.JavascriptCompiler$CompiledExpression.evaluate(_score + ln(1000+unit_sales))
>   at org.apache.lucene.expressions.ExpressionFunctionValues.doubleValue(ExpressionFunctionValues.java:49)
>   at com.amazon.lucene.OrderedVELeafCollector.collectInternal(OrderedVELeafCollector.java:123)
>   at com.amazon.lucene.OrderedVELeafCollector.collect(OrderedVELeafCollector.java:108)
>   at org.apache.lucene.search.MultiCollectorManager$Collectors$LeafCollectors.collect(MultiCollectorManager.java:102)
>   at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:241)
>   at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:184)
>   at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
>   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
>   at org.apache.lucene.search.IndexSearcher$5.call(IndexSearcher.java:600)
>   at org.apache.lucene.search.IndexSearcher$5.call(IndexSearcher.java:597)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I couldn't see any {{synchronized}} in the sources here, so I'm not sure 
> which object monitor it's blocked on.
> I was accidentally compiling a new expression for every query, and that 
> bottleneck would cause overall QPS to slow down drastically (~4X slower after 
> ~1 hour of redline tests), as if the JVM is getting slower and slower to 
> evaluate each expression the more expressions I had compiled.
> I tested JDK 9-ea and it also kept slowing down over time as the performance 
> test ran.
> Maybe we should put a small cache in front of the expressions compiler to 
> make it less trappy?  Or maybe we can get to the root cause of why the JVM 
> slows down more and more, the more expressions you compile?
> I won't have time to work on this in the near future so if anyone else feels 
> the itch, please scratch it!






[jira] [Commented] (LUCENE-7882) Maybe expression compiler should cache recently compiled expressions?

2018-08-31 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599071#comment-16599071
 ] 

Tony Xu commented on LUCENE-7882:
-

To explain why Mike saw many threads getting blocked on a monitor, I wrote a 
unit test to demonstrate the issue. 

Basically, each compiled expression is defined in a new ClassLoader, and when the 
expression's evaluation needs to invoke an external method (in this case *ln*, 
which maps to *java.lang.Math#log*), the new classloader needs to load that 
class. The classloading work is delegated to the parent classloader, which uses 
itself as a monitor for synchronization. 

The problem was that the defining classloaders of many compiled expressions 
shared the same parent and were all trying to load a class via that parent in the 
evaluate method. This led to more contention on the parent classloader's monitor. 

 

 

 [^demo.patch]
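
Separately from the attached patch, the delegation step can be illustrated with a small standalone sketch. It only shows that fresh child classloaders sharing one parent all resolve a JDK class through that parent; it does not reproduce the contention itself, and ChildLoader and loadVia are hypothetical names, not the JavascriptCompiler loader.

```java
/** Hypothetical demo of parent delegation; not the code from demo.patch. */
final class SharedParentLoaderSketch {
  /** A loader that defines no classes itself, standing in for a per-expression loader. */
  static final class ChildLoader extends ClassLoader {
    ChildLoader(ClassLoader parent) {
      super(parent);
    }
  }

  /** Creates a brand-new child loader and asks it for the named class. */
  static Class<?> loadVia(ClassLoader parent, String className) {
    try {
      // loadClass consults the parent first, so every new child ends up
      // going through the same shared parent loader.
      return new ChildLoader(parent).loadClass(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because every child funnels its lookups into the same parent, whatever locking the parent performs during loadClass is shared by all of them.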







[jira] [Updated] (LUCENE-7882) Maybe expression compiler should cache recently compiled expressions?

2018-08-31 Thread Tony Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Xu updated LUCENE-7882:

Attachment: demo.patch







[jira] [Commented] (LUCENE-7882) Maybe expression compiler should cache recently compiled expressions?

2018-08-30 Thread Tony Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597756#comment-16597756
 ] 

Tony Xu commented on LUCENE-7882:
-

A simple idea here is to cache the compiled Expression instances.

One way of doing it is to have a class that wraps the *JavascriptCompiler* and 
makes sure each unique expression text only gets compiled once and is then cached. 
Note that errors can be cached, too, because the same invalid expression should 
fail at exactly the same point (e.g. the ParseException)! Future compilations of 
the same expression should return the cached compilation result.

This new class can be static, so that compilations are cached at JVM scope, or it 
can be instantiable so that the caller decides the scope.
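
A minimal sketch of the instantiable variant, under a few assumptions: the compile step is passed in as a plain Function rather than calling JavascriptCompiler directly, failures are modeled as RuntimeException (the real ParseException is a checked exception and would need wrapping), and the cache is unbounded. CompilationCacheSketch and Outcome are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache in front of a compile function; caches results and failures. */
final class CompilationCacheSketch<T> {
  /** Either a compiled value or the error thrown while compiling it. */
  private static final class Outcome<T> {
    final T value;
    final RuntimeException error;

    Outcome(T value, RuntimeException error) {
      this.value = value;
      this.error = error;
    }
  }

  private final ConcurrentMap<String, Outcome<T>> cache = new ConcurrentHashMap<>();
  private final Function<String, T> compiler;

  CompilationCacheSketch(Function<String, T> compiler) {
    this.compiler = compiler;
  }

  /** Compiles the text at most once; later calls replay the cached result or error. */
  T compile(String text) {
    Outcome<T> outcome = cache.computeIfAbsent(text, t -> {
      try {
        return new Outcome<>(compiler.apply(t), null);
      } catch (RuntimeException e) {
        return new Outcome<>(null, e); // remember the failure as well
      }
    });
    if (outcome.error != null) {
      throw outcome.error;
    }
    return outcome.value;
  }
}
```

A production version would presumably bound the cache (for example with an LRU policy), since the issue title asks for recently compiled expressions only.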

 







[jira] [Created] (LUCENE-8319) A Time-limiting collector that works with CollectorManagers

2018-05-16 Thread Tony Xu (JIRA)
Tony Xu created LUCENE-8319:
---

 Summary: A Time-limiting collector that works with 
CollectorManagers
 Key: LUCENE-8319
 URL: https://issues.apache.org/jira/browse/LUCENE-8319
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Tony Xu


Currently Lucene has *TimeLimitingCollector* to support time-bound collection, 
and it throws *TimeExceededException* if a timeout happens. This only works 
nicely with the single-threaded low-level API of IndexSearcher. The method 
signature is:

*void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)*

The intended use is to always enclose the searcher.search(query, collector) 
call in a try ... catch and handle the timeout exception. Unfortunately, when 
working with a *CollectorManager* in the multi-threaded search context, the 
*TimeExceededException* thrown while collecting one leaf slice is 
re-thrown by *IndexSearcher* without calling *CollectorManager*'s reduce(), 
even if the other slices were collected successfully. The signature 
of the search API with *CollectorManager* is:

*<C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager)*
 
The good news is that IndexSearcher handles *CollectionTerminatedException* 
gracefully by ignoring it. We can either wrap TimeLimitingCollector and throw 
*CollectionTerminatedException* when a timeout happens, or simply replace 
*TimeExceededException* with *CollectionTerminatedException*. Either way, we 
also need to maintain a flag that indicates whether a timeout occurred, so that 
the user knows it's a partial collection.
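
The wrap-and-flag idea can be sketched without the Lucene classes as follows. Everything in this block is a hypothetical stand-in: CollectionTerminatedSketch plays the role of CollectionTerminatedException, LeafCollectorSketch the role of a leaf collector, and collectLeaf mimics a searcher that silently ignores the termination exception. The clock is injected so the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

/** Stand-in for Lucene's CollectionTerminatedException (hypothetical). */
final class CollectionTerminatedSketch extends RuntimeException {}

/** Stand-in for a per-leaf collector (hypothetical). */
interface LeafCollectorSketch {
  void collect(int doc);
}

/** Stops collecting after a deadline and records that the result is partial. */
final class TimeLimitedCollectorSketch implements LeafCollectorSketch {
  private final LeafCollectorSketch delegate;
  private final LongSupplier clock;
  private final long deadline;
  private final AtomicBoolean timedOut; // shared by all slices of one search

  TimeLimitedCollectorSketch(LeafCollectorSketch delegate, LongSupplier clock,
                             long deadline, AtomicBoolean timedOut) {
    this.delegate = delegate;
    this.clock = clock;
    this.deadline = deadline;
    this.timedOut = timedOut;
  }

  @Override
  public void collect(int doc) {
    if (clock.getAsLong() - deadline >= 0) {
      timedOut.set(true); // lets the caller know the collection is partial
      throw new CollectionTerminatedSketch();
    }
    delegate.collect(doc);
  }

  /** Mimics a searcher that ignores the termination exception for this leaf. */
  static void collectLeaf(LeafCollectorSketch collector, int maxDoc) {
    try {
      for (int doc = 0; doc < maxDoc; doc++) {
        collector.collect(doc);
      }
    } catch (CollectionTerminatedSketch e) {
      // Ignored: whatever was collected so far is kept and can still be reduced.
    }
  }
}
```

With a shared flag, the reduce step still runs over whatever each slice collected, and the caller can check the flag afterwards to report a partial result.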


