[jira] [Updated] (LUCENE-8673) Use radix partitioning when merging dimensional points

2019-01-31 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-8673:

Description: 
Following the advise of [~jpountz] in LUCENE-8623I have investigated using 
radix selection when merging segments instead of sorting the data at the 
beginning. The results are pretty promising when running Lucene geo benchmarks:

 
||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
 
edited: table formatting to be a jira table
 

In 2D the index throughput is more or less equal but for higher dimensions the 
impact is quite big. In all cases the merging process requires much less disk 
space, I am attaching plots showing the different behaviour and I am opening a 
pull request.

 

 

 

  was:
Following the advise of [~jpountz] in LUCENE-8623I have investigated using 
radix selection when merging segments instead of sorting the data at the 
beginning. The results are pretty promising when running Lucene geo benchmarks:

 
{code:java}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader 
heap (MB)||
          ||Dev||Base||Diff ||Dev  ||Base  ||diff   
||Dev||Base||Diff||Dev||Base||Diff ||
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|{code}
 

 

In 2D the index throughput is more or less equal but for higher dimensions the 
impact is quite big. In all cases the merging process requires much less disk 
space, I am attaching plots showing the different behaviour and I am opening a 
pull request.

 

 

 


> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: Geo3D.png, LatLonPoint.png, LatLonShape.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Following the advise of [~jpountz] in LUCENE-8623I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> ||Approach||Index time (sec): Dev||Index Time (sec): Base||Index Time: 
> Diff||Force merge time (sec): Dev||Force Merge time (sec): Base||Force Merge 
> Time: Diff||Index size (GB): Dev||Index size (GB): Base||Index Size: 
> Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
>  
> edited: table formatting to be a jira table
>  
> In 2D the index throughput is more or less equal but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space, I am attaching plots showing the different behaviour and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8673) Use radix partitioning when merging dimensional points

2019-01-31 Thread Ignacio Vera (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-8673:
-
Summary: Use radix partitioning when merging dimensional points  (was: Use 
radix sorting when merging dimensional points)

> Use radix partitioning when merging dimensional points
> --
>
> Key: LUCENE-8673
> URL: https://issues.apache.org/jira/browse/LUCENE-8673
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: Geo3D.png, LatLonPoint.png, LatLonShape.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Following the advise of [~jpountz] in LUCENE-8623I have investigated using 
> radix selection when merging segments instead of sorting the data at the 
> beginning. The results are pretty promising when running Lucene geo 
> benchmarks:
>  
> {code:java}
> ||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader 
> heap (MB)||
>           ||Dev||Base||Diff ||Dev  ||Base  ||diff   
> ||Dev||Base||Diff||Dev||Base||Diff ||
> |points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
> |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
> |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 
> 0%|{code}
>  
>  
> In 2D the index throughput is more or less equal but for higher dimensions 
> the impact is quite big. In all cases the merging process requires much less 
> disk space, I am attaching plots showing the different behaviour and I am 
> opening a pull request.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org