[
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-8673:
Description:
Following the advice of [~jpountz] in LUCENE-8623, I have investigated using
radix selection when merging segments instead of sorting the data at the
beginning. The results are pretty promising when running Lucene geo benchmarks:
||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
edited: table formatting to be a jira table
In 2D, indexing throughput is more or less equal, but for higher dimensions the
impact is quite big. In all cases the merging process requires much less disk
space. I am attaching plots showing the different behaviour and I am opening a
pull request.
was:
Following the advice of [~jpountz] in LUCENE-8623, I have investigated using
radix selection when merging segments instead of sorting the data at the
beginning. The results are pretty promising when running Lucene geo benchmarks:
{code:java}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader
heap (MB)||
||Dev||Base||Diff ||Dev ||Base ||diff
||Dev||Base||Diff||Dev||Base||Diff ||
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|{code}
In 2D the index throughput is more or less equal but for higher dimensions the
impact is quite big. In all cases the merging process requires much less disk
space, I am attaching plots showing the different behaviour and I am opening a
pull request.
> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: Geo3D.png, LatLonPoint.png, LatLonShape.png
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Following the advice of [~jpountz] in LUCENE-8623, I have investigated using
> radix selection when merging segments instead of sorting the data at the
> beginning. The results are pretty promising when running Lucene geo
> benchmarks:
>
> ||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>
> edited: table formatting to be a jira table
>
> In 2D, indexing throughput is more or less equal, but for higher dimensions
> the impact is quite big. In all cases the merging process requires much less
> disk space. I am attaching plots showing the different behaviour and I am
> opening a pull request.
>
>
>
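For readers unfamiliar with the technique named in the description, here is a
minimal in-memory sketch of radix selection over fixed-length byte keys. It is
not the code from the patch or pull request: the class name RadixSelectSketch,
the helpers encode and decode, and the 4-byte key layout are all assumptions
made for illustration, and the sketch uses a temporary heap copy where a real
merge would have to deal with point data held on disk. The idea it shows is
that, to split points around position k, only the bucket containing k has to
be refined at each byte, so the data never needs to be fully sorted.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: MSB radix selection over fixed-length byte[] keys. */
final class RadixSelectSketch {

  /**
   * Rearranges keys[from, to) so that keys[k] is the key a full sort (unsigned
   * byte order) would place at index k. Keys before k compare <= keys[k] and
   * keys after k compare >= keys[k]. Requires from <= k < to.
   */
  static void select(byte[][] keys, int from, int to, int k, int depth, int keyLength) {
    while (to - from > 1 && depth < keyLength) {
      // Histogram of the byte at the current depth.
      int[] counts = new int[256];
      for (int i = from; i < to; i++) {
        counts[keys[i][depth] & 0xFF]++;
      }
      // starts[b] is the first index of bucket b; starts[256] == to.
      int[] starts = new int[257];
      starts[0] = from;
      for (int b = 0; b < 256; b++) {
        starts[b + 1] = starts[b] + counts[b];
      }
      // Distribute keys into their buckets (temporary copy for simplicity).
      byte[][] copy = Arrays.copyOfRange(keys, from, to);
      int[] next = Arrays.copyOf(starts, 256);
      for (byte[] key : copy) {
        keys[next[key[depth] & 0xFF]++] = key;
      }
      // Only the bucket that contains index k needs further refinement.
      int bucket = 0;
      while (starts[bucket + 1] <= k) {
        bucket++;
      }
      from = starts[bucket];
      to = starts[bucket + 1];
      depth++;
    }
  }

  /** Hypothetical helper: big-endian encoding, order-preserving for non-negative ints. */
  static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  /** Hypothetical helper: inverse of encode. */
  static int decode(byte[] key) {
    return ByteBuffer.wrap(key).getInt();
  }

  public static void main(String[] args) {
    int[] values = {500, 3, 70000, 42, 258, 3, 9};
    byte[][] keys = new byte[values.length][];
    for (int i = 0; i < values.length; i++) {
      keys[i] = encode(values[i]);
    }
    select(keys, 0, keys.length, 3, 0, 4);
    System.out.println(decode(keys[3])); // prints 42, the value at sorted index 3
  }
}
```

Each pass touches only the range that still contains k, so the work shrinks
quickly once the leading bytes separate the keys. A full sort would instead
order every bucket, including those that end up entirely on one side of the
split.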
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)