[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-04-22 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823262#comment-16823262
 ] 

Atri Sharma commented on LUCENE-8675:
-

Repeating the earlier results in a more human-readable form:

 
||Task||P50 Base||P50 Cmp||P50 Pct||P90 Base||P90 Cmp||P90 Pct||
|('Wildcard', None)|9.993697|11.906981|19.1449070349|14.431318|13.953923|-3.3080485095|
|('HighTermDayOfYearSort', 'DayOfYear')|39.556908|44.389095|12.2157854198|62.421873|49.214184|-21.1587515165|
|('AndHighHigh', None)|3.814074|2.459326|-35.5197093711|5.045984|7.932029|57.1948900353|
|('OrHighHigh', None)|9.586193|5.846643|-39.0097507947|14.978843|7.078967|-52.7402283341|
|('MedPhrase', None)|3.210464|2.276356|-29.0957319565|4.217049|3.852337|-8.64851226533|
|('LowSpanNear', None)|11.247447|4.986828|-55.6625783611|16.095342|6.121194|-61.9691585305|
|('Fuzzy2', None)|23.636902|20.959304|-11.3280412128|112.5086|105.188025|-6.50668037821|
|('OrNotHighHigh', None)|4.225917|2.62127|-37.9715692476|6.11225|8.525249|39.4780809031|
|('OrHighNotLow', None)|4.015982|2.250697|-43.956496817|10.636566|3.134868|-70.5274427856|
|('BrowseMonthSSDVFacets', None)|66.920633|66.986841|0.0989351072038|67.230757|76.011531|13.0606502021|
|('Fuzzy1', None)|14.779783|12.559705|-15.0210459788|46.329521|218.272906|371.131367838|
|('HighSloppyPhrase', None)|21.362967|10.563982|-50.5500242546|33.009649|15.74507|-52.3016133858|
|('OrNotHighMed', None)|2.032775|1.584332|-22.0606314029|2.529475|2.044107|-19.1884877297|
|('LowPhrase', None)|4.937747|2.8876|-41.5198875115|6.910574|5.159077|-25.345173932|
|('AndHighLow', None)|1.097696|1.416176|29.0134973617|3.426081|13.987273|308.258678064|
|('LowTerm', None)|0.787595|1.038949|31.9141182968|1.12006|39.639455|3439.04746174|
|('BrowseDayOfYearSSDVFacets', None)|80.006624|80.215023|0.260477182489|80.610476|81.187614|0.71595905227|
|('Prefix3', None)|3.347358|3.219213|-3.82824305019|6.716371|5.21174|-22.4024402464|
|('HighTermMonthSort', 'Month')|20.684075|19.601521|-5.23375592092|21.341383|20.092673|-5.85112033274|
|('HighTerm', None)|2.991271|1.891199|-36.7760727798|4.058212|2.320309|-42.8243522024|
|Respell|17.33154|17.397468|0.38039320222|99.071728|66.75552|-32.6190010535|
|('MedTerm', None)|3.011125|1.793175|-40.4483374154|4.206761|2.392798|-43.1201820118|
|('MedSloppyPhrase', None)|5.896878|3.304889|-43.9552759952|8.044708|4.881775|-39.316939782|
|('HighSpanNear', None)|20.981466|9.533211|-54.5636563241|28.98951|11.087743|-61.7525684291|
|('LowSloppyPhrase', None)|12.841091|6.075233|-52.6891211969|18.539534|6.825001|-63.1867715769|
|('OrHighNotHigh', None)|11.822146|6.645646|-43.786466518|17.02398|7.935497|-53.3863585366|
|('OrNotHighLow', None)|0.782455|1.06583|36.2161402253|1.668578|13.200645|691.131430476|
|('MedSpanNear', None)|3.161032|2.154472|-31.8427652741|5.386012|5.665401|5.18730741781|
|('BrowseDateTaxoFacets', None)|444.971146|444.674024|-0.066773318376|447.81169|445.950713|-0.415571330887|
|('HighPhrase', None)|7.464241|4.644244|-37.7800904338|25.153245|7.548758|-69.9889298578|
|('OrHighLow', None)|6.344855|3.590218|-43.4152868742|8.425453|15.578677|84.9001709463|
|('BrowseDayOfYearTaxoFacets', None)|0.16655|0.184125|10.55238

[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-02-27 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779287#comment-16779287
 ] 

Atri Sharma commented on LUCENE-8675:
-

Here are the results from luceneutil, patched to report P50 and P90 latencies 
and to run concurrent searches within IndexSearcher. The patch is posted to the 
luceneutil repo.

Adrien has a valid point about costly scorers not benefiting from this 
approach. Specifically, range queries can take a hit: the BKD tree's scorer is 
two-phase and expensive to construct, so constructing it once per portion of a 
segment increases latency, as is evident from the increase in P90 latency in 
these results. I am evaluating how to tackle this problem and will post any 
ideas that look viable. These benchmarks measure the change on the "happy" 
path, i.e. large indexes at low QPS. Luceneutil was configured accordingly (low 
number of search threads, impacts turned off).

In summary, the queries scanning a higher amount of data and having higher read 
latencies tend to have the maximum improvement. Term queries and queries 
involving term queries on higher frequency terms get a reasonable latency 
reduction.

The following are P50 and P90 latencies calculated by Luceneutil. P50 Base is 
the P50 latency of the base, P50 Cmp is the P50 latency of the competitor 
(patched version), and the same for P90.

Note: The QPS jumps are not real. Since Luceneutil was configured to run a 
single searcher thread, the QPS jump is proportional to the latency drop for 
each task.
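
For readers checking the numbers: the Pct columns appear to be the relative change of Cmp against Base, which reproduces the posted values. The sketch below is only an illustration and is not part of luceneutil. The function names are made up, and the QPS ratio assumes a single searcher thread issuing queries back to back, as the note above describes.

```python
def pct_change(base: float, cmp: float) -> float:
    """Relative change of the competitor against the base, in percent."""
    return (cmp - base) / base * 100.0


def qps_ratio(base_latency: float, cmp_latency: float) -> float:
    """Throughput ratio of one thread running queries back to back.

    With a single searcher thread, QPS is the inverse of latency, so the
    ratio of QPS values is the inverse ratio of the latencies.
    """
    return base_latency / cmp_latency


# P50 values copied from the Wildcard and LowSpanNear rows.
wildcard_p50 = pct_change(9.993697, 11.906981)       # slower: positive
low_span_near_p50 = pct_change(11.247447, 4.986828)  # faster: negative
low_span_near_qps = qps_ratio(11.247447, 4.986828)   # apparent QPS gain
```

A negative Pct therefore means the patched version was faster for that task, and the apparent QPS gain is just the same latency drop seen from the other side.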

Luceneutil results:

||Task||P50 Base||P50 Cmp||P50 Pct||P90 Base||P90 Cmp||P90 Pct||
|('Wildcard', None)|9.993697|11.906981|19.1449070349|14.431318|13.953923|-3.3080485095|
|('HighTermDayOfYearSort', 'DayOfYear')|39.556908|44.389095|12.2157854198|62.421873|49.214184|-21.1587515165|
|('AndHighHigh', None)|3.814074|2.459326|-35.5197093711|5.045984|7.932029|57.1948900353|
|('OrHighHigh', None)|9.586193|5.846643|-39.0097507947|14.978843|7.078967|-52.7402283341|
|('MedPhrase', None)|3.210464|2.276356|-29.0957319565|4.217049|3.852337|-8.64851226533|
|('LowSpanNear', None)|11.247447|4.986828|-55.6625783611|16.095342|6.121194|-61.9691585305|
|('Fuzzy2', None)|23.636902|20.959304|-11.3280412128|112.5086|105.188025|-6.50668037821|
|('OrNotHighHigh', None)|4.225917|2.62127|-37.9715692476|6.11225|8.525249|39.4780809031|
|('OrHighNotLow', None)|4.015982|2.250697|-43.956496817|10.636566|3.134868|-70.5274427856|
|('BrowseMonthSSDVFacets', None)|66.920633|66.986841|0.0989351072038|67.230757|76.011531|13.0606502021|
|('Fuzzy1', None)|14.779783|12.559705|-15.0210459788|46.329521|218.272906|371.131367838|
|('HighSloppyPhrase', None)|21.362967|10.563982|-50.5500242546|33.009649|15.74507|-52.3016133858|
|('OrNotHighMed', None)|2.032775|1.584332|-22.0606314029|2.529475|2.044107|-19.1884877297|
|('LowPhrase', None)|4.937747|2.8876|-41.5198875115|6.910574|5.159077|-25.345173932|
|('AndHighLow', None)|1.097696|1.416176|29.0134973617|3.426081|13.987273|308.258678064|
|('LowTerm', None)|0.787595|1.038949|31.9141182968|1.12006|39.639455|3439.04746174|
|('BrowseDayOfYearSSDVFacets', None)|80.006624|80.215023|0.260477182489|80.610476|81.187614|0.71595905227|
|('Prefix3', None)|3.347358|3.219213|-3.82824305019|6.716371|5.21174|-22.4024402464|
|('HighTermMonthSort', 'Month')|20.684075|19.601521|-5.23375592092|21.341383|20.092673|-5.85112033274|
|('HighTerm', None)|2.991271|1.891199|-36.7760727798|4.058212|2.320309|-42.8243522024|
|Respell|17.33154|17.397468|0.38039320222|99.071728|66.75552|-32.6190010535|
|('MedTerm', None)|3.011125|1.793175|-40.4483374154|4.206761|2.392798|-43.120

[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-02-03 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759440#comment-16759440
 ] 

Michael McCandless commented on LUCENE-8675:


{quote}If some segments are getting large enough that intra-segment parallelism 
becomes appealing, then maybe an easier and more efficient way to increase 
parallelism is to instead reduce the maximum segment size so that inter-segment 
parallelism has more potential for parallelizing query execution.
{quote}
Yeah that is a good workaround given how Lucene works today.

It's essentially the same as your original suggestion ("make more shards and 
search them concurrently"), just at the segment instead of shard level.

But this still adds some costs -- the per-segment fixed cost for each query. 
That cost should be less than the per shard fixed cost in the sharded case, but 
it's still adding some cost.

If instead Lucene had a way to divide large segments into multiple work units 
(and I agree there are challenges with that! -- not just BKD and multi-term 
queries, but e.g. how would early termination work?) then we could pay that 
per-segment fixed cost once for such segments then let multiple threads share 
the variable cost work of finding and ranking hits.
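
A minimal sketch of that idea, assuming a segment's documents can be split into contiguous doc ID ranges. This is neither the patch under discussion nor Lucene API: `doc_id_ranges`, `search_segment`, and the `matches` predicate are hypothetical names, with `matches` standing in for per-segment query state that is built once (the fixed cost) and then shared read-only by the workers (the variable cost).

```python
from concurrent.futures import ThreadPoolExecutor


def doc_id_ranges(max_doc, num_slices):
    """Split [0, max_doc) into contiguous, mutually exclusive [start, end) ranges."""
    num_slices = max(1, min(num_slices, max_doc))
    base, extra = divmod(max_doc, num_slices)
    ranges, start = [], 0
    for i in range(num_slices):
        # The first `extra` ranges take one additional document each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def search_segment(matches, max_doc, num_slices, pool):
    """Count hits in one segment, one worker task per doc ID range."""
    def count_range(bounds):
        start, end = bounds
        return sum(1 for doc in range(start, end) if matches(doc))

    return sum(pool.map(count_range, doc_id_ranges(max_doc, num_slices)))
```

Python threads are used here only to show the structure. The sketch also ignores early termination, which is one of the open challenges mentioned above.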

In our recently launched production index we see sizable jumps in the P99+ 
query latencies when large segment merges finish and replicate, because we 
are using "thread per segment" concurrency. We are hoping to improve that by 
pushing thread concurrency into individual large segments.

> Divide Segment Search Amongst Multiple Threads
> --
>
> Key: LUCENE-8675
> URL: https://issues.apache.org/jira/browse/LUCENE-8675
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> Segment search is a single threaded operation today, which can be a 
> bottleneck for large analytical queries which index a lot of data and have 
> complex queries which touch multiple segments (imagine a composite query with 
> range query and filters on top). This ticket is for discussing the idea of 
> splitting a single segment into multiple threads based on mutually exclusive 
> document ID ranges.
> This will be a two phase effort, the first phase targeting queries returning 
> all matching documents (collectors not terminating early). The second phase 
> patch will introduce staged execution and will build on top of this patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-02-01 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758862#comment-16758862
 ] 

Atri Sharma commented on LUCENE-8675:
-

{quote}If some segments are getting large enough that intra-segment parallelism 
becomes appealing, then maybe an easier and more efficient way to increase 
parallelism is to instead reduce the maximum segment size so that inter-segment 
parallelism has more potential for parallelizing query execution.
{quote}
Would that not lead to a much higher number of segments than required? That 
could lead to issues such as a lot of open file handles and too many threads 
required for scanning (although we would assign multiple small segments to a 
single thread).

Thanks for the point about range queries; that is an important consideration. 
I will follow up with a separate patch on top of this one that does the first 
phase of BKD iteration and shares the generated bitset across N parallel 
threads, where N is the number of remaining clauses and each thread intersects 
one clause with the bitset.
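
As a rough illustration of that follow-up idea (an assumption about its shape, not the eventual patch), the sketch below builds the range bitset once and hands it to one worker per remaining clause. Python integers stand in for bitsets, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def build_range_bitset(values, low, high):
    """Stand-in for the costly first phase: set bit i if doc i is in [low, high]."""
    bits = 0
    for doc_id, value in enumerate(values):
        if low <= value <= high:
            bits |= 1 << doc_id
    return bits


def intersect_clauses(range_bits, clause_bitsets, pool):
    """Intersect the shared range bitset with each clause on its own worker."""
    return list(pool.map(lambda clause: range_bits & clause, clause_bitsets))


def conjunction(range_bits, clause_bitsets, pool):
    """Docs matching the range and every clause: AND of the partial results."""
    partials = intersect_clauses(range_bits, clause_bitsets, pool)
    return reduce(lambda a, b: a & b, partials, range_bits)
```

The expensive step runs once per segment. Only the cheap per-clause intersections are spread across threads.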


[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-02-01 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758489#comment-16758489
 ] 

Adrien Grand commented on LUCENE-8675:
--

If some segments are getting large enough that intra-segment parallelism 
becomes appealing, then maybe an easier and more efficient way to increase 
parallelism is to instead reduce the maximum segment size so that inter-segment 
parallelism has more potential for parallelizing query execution.


[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-02-01 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758451#comment-16758451
 ] 

Michael McCandless commented on LUCENE-8675:


I think it'd be interesting to explore intra-segment parallelism, but I agree 
w/ [~jpountz] that there are challenges :)

If you pass an {{ExecutorService}} to {{IndexSearcher}} today you can already 
use multiple threads to answer one query, but the concurrency is tied to your 
segment geometry and annoyingly a supposedly "optimized" index gets no 
concurrency ;)
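
To make the segment-geometry point concrete, here is a toy model in which every segment becomes exactly one task. It is not the IndexSearcher implementation. Segments are modelled as plain lists of documents, and `search_per_segment` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def search_per_segment(segments, matches, pool):
    """Submit one task per segment; return (number of tasks, total hits)."""
    def count(segment):
        return sum(1 for doc in segment if matches(doc))

    futures = [pool.submit(count, segment) for segment in segments]
    return len(futures), sum(f.result() for f in futures)


# The same six documents, once spread over three segments and once
# force-merged into a single "optimized" segment.
many_segments = [[1, 2, 3], [4, 5], [6]]
one_segment = [[1, 2, 3, 4, 5, 6]]
```

Both layouts return the same hits, but the merged index yields a single task, so extra threads in the pool have nothing to do.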

But if you do have many segments, this can give a nice reduction to query 
latencies when QPS is well below the searcher's red-line capacity (probably at 
the expense of some hopefully small loss of red-line throughput because of the 
added overhead of thread scheduling).  For certain use cases (large index, low 
typical query rate) this is a powerful approach.

It's true that one can also divide an index into more shards and run each shard 
concurrently but then you are also multiplying the fixed query setup cost which 
in some cases can be relatively significant.
{quote}Parallelizing based on ranges of doc IDs is problematic for some 
queries, for instance the cost of evaluating a range query over an entire 
segment or only over a specific range of doc IDs is exactly the same given 
that it uses data-structures that are organized by value rather than by doc ID.
{quote}
Yeah that's a real problem – these queries traverse the BKD tree per-segment 
while creating the scorer, which is/can be the costly part, and then produce a 
bit set which is very fast to iterate over.  This phase is not separately 
visible to the caller, unlike e.g. rewrite that MultiTermQueries use to 
translate into simpler queries, so it'd be tricky to build intra-segment 
concurrency on top ...


[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-01-31 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757614#comment-16757614
 ] 

Atri Sharma commented on LUCENE-8675:
-

Thanks for the comments.

Having a multi shard approach makes sense, but a search is still bottlenecked 
by the largest segment it needs to scan. If there are many segments of that 
type, that might become a problem.

While I agree that range queries might not directly benefit from parallel 
scans, other queries (such as TermQueries) might benefit from a 
segment-parallel scan. In a typical Elasticsearch interactive query, we see 
latency spikes when a large segment is hit. Such cases can be optimized with 
parallel scans.

We should have a method of deciding whether a scan should be parallelized or 
not, and then let the execution operator get a set of nodes to execute. That is 
probably outside the scope of this JIRA, but I wanted to open this thread to 
get the conversation going.


[jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads

2019-01-31 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757568#comment-16757568
 ] 

Adrien Grand commented on LUCENE-8675:
--

The best way to address such issues is on top of Lucene by having multiple 
shards whose results can be merged with TopDocs#merge.
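
As an illustration of the merge step only (a stand-in, not Lucene's TopDocs#merge itself), the sketch below combines per-shard hit lists that are already sorted by descending score. The `(score, shard_index, doc_id)` tuple layout and the function name are assumptions made for the example.

```python
import heapq
from itertools import islice


def merge_top_hits(shard_hits, top_n):
    """Merge per-shard hits, each sorted by descending score, into a global top N."""
    # Negating the score makes the descending inputs ascending for heapq.merge.
    merged = heapq.merge(*shard_hits, key=lambda hit: -hit[0])
    return list(islice(merged, top_n))


shard0 = [(3.0, 0, 7), (1.0, 0, 2)]
shard1 = [(2.5, 1, 4), (2.0, 1, 9)]
top3 = merge_top_hits([shard0, shard1], 3)
```

Each shard can be searched by its own thread, and only the small sorted hit lists need to be combined at the end.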

Parallelizing based on ranges of doc IDs is problematic for some queries, for 
instance the cost of evaluating a range query over an entire segment or only 
over a specific range of doc IDs is exactly the same given that it uses 
data-structures that are organized by value rather than by doc ID.
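
A toy model of why this happens, using a flat list sorted by value instead of a real BKD tree (a deliberate simplification; all names are hypothetical). Restricting the query to a doc ID range shrinks the result but not the number of entries that must be visited.

```python
import bisect


def range_query(sorted_by_value, low, high, doc_start, doc_end):
    """Return (matching doc IDs, entries visited) for values in [low, high],
    restricted to doc IDs in [doc_start, doc_end)."""
    values = [value for value, _ in sorted_by_value]
    lo = bisect.bisect_left(values, low)
    hi = bisect.bisect_right(values, high)
    # Every entry in the value range is visited, whatever its doc ID.
    visited = hi - lo
    docs = sorted(
        doc for _, doc in sorted_by_value[lo:hi] if doc_start <= doc < doc_end
    )
    return docs, visited


# Index of (value, doc_id) pairs ordered by value, built from per-doc values.
index = sorted((value, doc) for doc, value in enumerate([50, 10, 40, 20, 30, 60]))
whole_segment = range_query(index, 20, 50, 0, 6)
first_half = range_query(index, 20, 50, 0, 3)
```

Splitting the segment into two doc ID ranges would run this traversal twice, each time visiting the same entries.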
