[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741811#comment-16741811
 ] 

ASF subversion and git services commented on LUCENE-8623:
-

Commit 74ee4ddf4eb7b6c7f60c3e1fb73da0427c0085ac in lucene-solr's branch 
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74ee4dd ]

LUCENE-8623: Decrease I/O pressure when merging high dimensional points


> Decrease I/O pressure when merging high dimensional points
> --
>
> Key: LUCENE-8623
> URL: https://issues.apache.org/jira/browse/LUCENE-8623
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Attachments: Geo3D.png, LUCENE-8623.patch, LUCENE-8623.patch, 
> LUCENE-8623.patch, LUCENE-8623.patch, LatLonPoint.png, LatLonShape.png
>
>
> Related to LUCENE-8619: after indexing 60 million shapes (~1.65 billion 
> triangles) using {{LatLonShape}}, the index directory grew to 265 GB while 
> segments were being merged. Once merging had finished, the index size was 
> 57 GB.
> As an example, imagine we are merging several segments into a new segment of 
> size 10 GB (4 dimensions). The BKD tree merging logic will create the 
> following files:
> 1) Level 0: 4 copies of the data, each one sorted by one dimension: 40 GB
> 2) Level 1: 6 copies of half of the data, left and right: 30 GB
> 3) Level 2: 6 copies of one quarter of the data, left and right: 15 GB
> 4) Level 3: 6 more copies halving the previous level, left and right: 7.5 GB
> 5) Level 4: 6 more copies halving the previous level, left and right: 3.75 GB
>  
> and so on. So merging that segment requires around 100 GB. 
> This issue proposes delaying the creation of sorted copies until they are 
> needed, which reduces the total space required to half of what is needed 
> now. 
>  
>  
>  
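The level-by-level accounting in the description can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Lucene code: the function name is hypothetical, and the generalization to 2 * (dims - 1) copies per level is an assumption read off the "6 copies, left and right" figure given for 4 dimensions.

```python
def bkd_merge_temp_space_gb(segment_gb, num_dims, levels):
    """Sum the temporary space used by the old merge logic, level by level.

    Level 0 holds one sorted copy of the data per dimension. Each later
    level k holds 2 * (num_dims - 1) copies (left and right) of
    segment_gb / 2**k. The copy count per level is an assumption that
    generalizes the "6 copies" quoted for 4 dimensions.
    """
    total = num_dims * segment_gb  # level 0: 4 * 10 GB = 40 GB in the example
    for level in range(1, levels + 1):
        copies = 2 * (num_dims - 1)  # 6 for 4 dimensions
        total += copies * segment_gb / (2 ** level)
    return total
```

For the 10 GB, 4-dimension example, levels 0 through 4 add up to 96.25 GB, and the sum approaches 100 GB as more levels are included, which matches the "around 100 GB" estimate.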



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741810#comment-16741810
 ] 

ASF subversion and git services commented on LUCENE-8623:
-

Commit 8762b071bb04b2e391749e6a064966ecfe932862 in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8762b07 ]

LUCENE-8623: Decrease I/O pressure when merging high dimensional points





[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741812#comment-16741812
 ] 

ASF subversion and git services commented on LUCENE-8623:
-

Commit 35955b3891ed6621d5faa1c2c20ce0a333bc7b83 in lucene-solr's branch 
refs/heads/branch_7x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=35955b3 ]

LUCENE-8623: Decrease I/O pressure when merging high dimensional points





[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741286#comment-16741286
 ] 

Adrien Grand commented on LUCENE-8623:
--

+1 patch looks good.

Given how much this helps, I'm thinking we should look into handling merging 
more like we handle flushing: by recursively partitioning around the median 
value of each dimension. It should be possible to implement something like an 
offline version of org.apache.lucene.util.RadixSelector.
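To illustrate what "recursively partitioning around the median value of each dimension" could look like, here is a small in-memory Python sketch. It is not Lucene's code and not the offline selector suggested above: it uses a plain quickselect instead of a radix selector, the function names are hypothetical, and cycling through the dimensions round-robin is an assumption made only to keep the example short.

```python
def select_median(points, lo, hi, dim):
    """Quickselect on points[lo:hi] along dimension dim.

    Afterwards the point at index k = (lo + hi) // 2 is in its sorted
    position for that dimension: everything before k is <= it and
    everything after k is >= it. No full sort is performed.
    """
    k = (lo + hi) // 2
    left, right = lo, hi - 1
    while left < right:
        pivot = points[(left + right) // 2][dim]
        i, j = left, right
        while i <= j:
            while points[i][dim] < pivot:
                i += 1
            while points[j][dim] > pivot:
                j -= 1
            if i <= j:
                points[i], points[j] = points[j], points[i]
                i += 1
                j -= 1
        if k <= j:
            right = j      # k lies in the left part
        elif k >= i:
            left = i       # k lies in the right part
        else:
            break          # k sits between j and i, already in place
    return k


def build_leaves(points, lo, hi, num_dims, leaf_size, depth=0, leaves=None):
    """Recursively split points[lo:hi] at the median until leaves are small."""
    if leaves is None:
        leaves = []
    if hi - lo <= leaf_size:
        leaves.append(points[lo:hi])
        return leaves
    dim = depth % num_dims  # assumption: round-robin choice of split dimension
    mid = select_median(points, lo, hi, dim)
    build_leaves(points, lo, mid, num_dims, leaf_size, depth + 1, leaves)
    build_leaves(points, mid, hi, num_dims, leaf_size, depth + 1, leaves)
    return leaves
```

Because each level only rearranges one array in place, no per-dimension sorted copies are needed. An offline variant would have to do the same partitioning over files rather than a list.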




[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-08 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737374#comment-16737374
 ] 

Lucene/Solr QA commented on LUCENE-8623:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | compile | 0m 34s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 0m 28s | the patch passed |
| +1 | javac | 0m 28s | the patch passed |
| +1 | Release audit (RAT) | 0m 28s | the patch passed |
| +1 | Check forbidden APIs | 0m 29s | the patch passed |
| +1 | Validate source patterns | 0m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 13m 25s | core in the patch passed. |
| | | 16m 37s | |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954126/LUCENE-8623.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / a37e2c6 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/149/testReport/ |
| modules | C: lucene/core U: lucene/core |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/149/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-07 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735916#comment-16735916
 ] 

Lucene/Solr QA commented on LUCENE-8623:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | compile | 1m 39s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 1m 31s | the patch passed |
| +1 | javac | 1m 31s | the patch passed |
| +1 | Release audit (RAT) | 1m 31s | the patch passed |
| +1 | Check forbidden APIs | 1m 31s | the patch passed |
| +1 | Validate source patterns | 1m 31s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 29m 30s | core in the patch passed. |
| | | 35m 30s | |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953955/LUCENE-8623.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / e015afa |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/148/testReport/ |
| modules | C: lucene/core U: lucene/core |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/148/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2019-01-07 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735581#comment-16735581
 ] 

Adrien Grand commented on LUCENE-8623:
--

Wow this is a significant reduction indeed.




[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2018-12-28 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730521#comment-16730521
 ] 

Lucene/Solr QA commented on LUCENE-8623:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | compile | 0m 29s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 0m 27s | the patch passed |
| +1 | javac | 0m 27s | the patch passed |
| +1 | Release audit (RAT) | 0m 27s | the patch passed |
| +1 | Check forbidden APIs | 0m 27s | the patch passed |
| +1 | Validate source patterns | 0m 27s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 14m 23s | core in the patch failed. |
| | | 17m 25s | |

|| Reason || Tests ||
| Failed junit tests | lucene.search.TestIntRangeFieldQueries |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953244/LUCENE-8623.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 345a655 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
| unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/145/artifact/out/patch-unit-lucene_core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/145/testReport/ |
| modules | C: lucene/core U: lucene/core |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/145/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (LUCENE-8623) Decrease I/O pressure when merging high dimensional points

2018-12-27 Thread Lucene/Solr QA (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729635#comment-16729635
 ] 

Lucene/Solr QA commented on LUCENE-8623:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | compile | 0m 45s | master passed |
|| || || || Patch Compile Tests ||
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| +1 | Release audit (RAT) | 0m 20s | the patch passed |
| +1 | Check forbidden APIs | 0m 20s | the patch passed |
| +1 | Validate source patterns | 0m 20s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 29m 11s | core in the patch passed. |
| | | 34m 57s | |

|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953120/LUCENE-8623.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 106d300 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | 1.8.0_191 |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/143/testReport/ |
| modules | C: lucene/core U: lucene/core |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/143/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |


This message was automatically generated.


