[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031711#comment-17031711
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 6a380798a27e1ce777843a4322afba463e383acc in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a38079 ]

LUCENE-9147: Make sure temporary files get deleted on all code paths.
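
A minimal sketch of the general pattern this commit message refers to, assuming plain `java.nio.file` rather than Lucene's `Directory` API. The class `TempFileCleanupSketch`, its methods, and the `failMidway` flag are hypothetical names used for illustration only; this is not the code from the commit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration of "delete temporary files on all code paths".
final class TempFileCleanupSketch {

    /** Buffers lines in a temporary file, then copies them to dest. */
    static void writeViaTempFile(Path dir, Path dest, List<String> lines, boolean failMidway)
            throws IOException {
        Path tmp = Files.createTempFile(dir, "buffer", ".tmp");
        try {
            Files.write(tmp, lines);
            if (failMidway) {
                throw new IOException("simulated failure while buffering");
            }
            Files.copy(tmp, dest);
        } finally {
            // Runs on success and on failure, so the temporary file never outlives this call.
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException e) {
                // Best effort: do not mask the original exception, if any.
            }
        }
    }

    /** Runs the method in a fresh directory and counts the ".tmp" files left behind. */
    static long leftoverTempFiles(boolean failMidway) {
        try {
            Path dir = Files.createTempDirectory("sketch");
            try {
                writeViaTempFile(dir, dir.resolve("final.dat"), List.of("a", "b"), failMidway);
            } catch (IOException e) {
                // Failure path: cleanup has already run in the finally block.
            }
            try (Stream<Path> files = Files.list(dir)) {
                return files.filter(p -> p.toString().endsWith(".tmp")).count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("left after success: " + leftoverTempFiles(false));
        System.out.println("left after failure: " + leftoverTempFiles(true));
    }
}
```

The point of the `finally` block is that the success path and the exception path share one cleanup step, so neither can leave a buffer file behind. The demo directory itself is left in place for inspection.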


> Move the stored fields index off-heap
> -------------------------------------
>
> Key: LUCENE-9147
> URL: https://issues.apache.org/jira/browse/LUCENE-9147
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: 8.5
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Now that the terms index is off-heap by default, it's almost embarrassing 
> that many indices spend most of their memory usage on the stored fields index 
> or the term vectors index, which are much less performance-sensitive than the 
> terms index. We should move them off-heap too?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031712#comment-17031712
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 85dba7356f32da6d577550a6dd6c5e6244556d87 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=85dba73 ]

LUCENE-9147: Make sure temporary files get deleted on all code paths.





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031411#comment-17031411
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 3246b2605869549dfbcedef21ea24d7101c20eee in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3246b26 ]

LUCENE-9147: Fix codec excludes.





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-06 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031412#comment-17031412
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit fdf5ade727ea8a5a6232d421a33b3fa1495d93b3 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fdf5ade ]

LUCENE-9147: Fix codec excludes.





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031333#comment-17031333
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 1b882246d70e1b67c2c438092ea627f7baff3249 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1b88224 ]

LUCENE-9147: Avoid reusing file names with FileSwitchDirectory or 
NRTCachingDirectory and IOContext randomization.





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030916#comment-17030916
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 597141df6b6a017fced16ec27b8fd180e9a6fcc2 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=597141d ]

LUCENE-9147: Move the stored fields index off-heap. (#1179)

This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number
of values to write up front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but that have index headers
and footers to make sure any corruption in these files doesn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata to avoid going to the IndexInput as much as possible.
In the common case, it only goes to a single sub `DirectReader`, which,
combined with the block size of 1k values, helps bound the number of page
faults to 2.
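
The idea described above can be illustrated with a simplified, in-memory sketch. It is not Lucene's `DirectMonotonicWriter` or `DirectMonotonicReader` code: the class `MonotonicBlockSketch`, its fields, and the `dataReads` counter are hypothetical, and the per-value deltas are kept in plain arrays where the real implementation would read packed values from an `IndexInput`. Each block of 1024 values stores a minimum, an average increment, and a maximum delta as metadata; the binary search uses those to bound a value without touching the deltas whenever the bounds already decide the comparison.

```java
// Simplified in-memory sketch; class and field names are hypothetical, not Lucene's.
final class MonotonicBlockSketch {
    static final int BLOCK_SHIFT = 10;               // 1024 values per block
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

    // Per-block metadata, small enough to stay in memory.
    final long[] min;
    final float[] avgInc;
    final long[] maxDelta;
    // Per-value deltas; stands in for the data that would be read from a file.
    final long[][] deltas;
    final int size;
    int dataReads = 0;                               // counts accesses to `deltas`

    MonotonicBlockSketch(long[] sorted) {
        size = sorted.length;
        int numBlocks = (size + BLOCK_SIZE - 1) >>> BLOCK_SHIFT;
        min = new long[numBlocks];
        avgInc = new float[numBlocks];
        maxDelta = new long[numBlocks];
        deltas = new long[numBlocks][];
        for (int b = 0; b < numBlocks; b++) {
            int start = b << BLOCK_SHIFT;
            int len = Math.min(BLOCK_SIZE, size - start);
            float inc = len == 1 ? 0f
                    : (float) (sorted[start + len - 1] - sorted[start]) / (len - 1);
            long[] d = new long[len];
            long lowest = Long.MAX_VALUE;
            for (int i = 0; i < len; i++) {
                d[i] = sorted[start + i] - (long) (inc * i);   // distance from the straight line
                lowest = Math.min(lowest, d[i]);
            }
            long highest = 0;
            for (int i = 0; i < len; i++) {
                d[i] -= lowest;                                // make every delta non-negative
                highest = Math.max(highest, d[i]);
            }
            min[b] = lowest;
            avgInc[b] = inc;
            maxDelta[b] = highest;
            deltas[b] = d;
        }
    }

    /** Decodes one value; this is the only place that touches the delta data. */
    long get(int index) {
        int b = index >>> BLOCK_SHIFT;
        int i = index & (BLOCK_SIZE - 1);
        dataReads++;
        return min[b] + (long) (avgInc[b] * i) + deltas[b][i];
    }

    /** Same contract as Arrays.binarySearch: index of key, or -(insertionPoint) - 1. */
    int binarySearch(long key) {
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int b = mid >>> BLOCK_SHIFT;
            int i = mid & (BLOCK_SIZE - 1);
            long lower = min[b] + (long) (avgInc[b] * i);     // value at mid is >= lower
            long upper = lower + maxDelta[b];                 // value at mid is <= upper
            if (key < lower) {
                hi = mid - 1;                                 // decided from metadata alone
            } else if (key > upper) {
                lo = mid + 1;                                 // decided from metadata alone
            } else {
                long value = get(mid);                        // bounds inconclusive: read data
                if (value < key) lo = mid + 1;
                else if (value > key) hi = mid - 1;
                else return mid;
            }
        }
        return -(lo + 1);
    }

    public static void main(String[] args) {
        long[] values = new long[5000];
        for (int i = 0; i < values.length; i++) values[i] = 3L * i + (i % 3);
        MonotonicBlockSketch index = new MonotonicBlockSketch(values);
        int pos = index.binarySearch(values[4321]);
        System.out.println("found at " + pos + " after " + index.dataReads + " data reads");
    }
}
```

Because the values grow almost linearly, the deltas stay small and the metadata bounds are tight, so most steps of the search never reach `get`. How closely this mirrors the access pattern of the real reader is an assumption of the sketch.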





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-02-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030849#comment-17030849
 ] 

ASF subversion and git services commented on LUCENE-9147:
-

Commit 136dcbdbbced7c2d32b4d244ca99ace2c59baee8 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=136dcbd ]

LUCENE-9147: Move the stored fields index off-heap. (#1179)

This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number
of values to write up front, so incoming doc IDs and file pointers are buffered
on disk using temporary files that never get fsynced, but that have index headers
and footers to make sure any corruption in these files doesn't propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
leverages the metadata to avoid going to the IndexInput as much as possible.
In the common case, it only goes to a single sub `DirectReader`, which,
combined with the block size of 1k values, helps bound the number of page
faults to 2.





[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-01-17 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018022#comment-17018022
 ] 

Adrien Grand commented on LUCENE-9147:
--

[~erickerickson] Yeah I have similar motivations, with many users who want to 
open terabytes of indices on rather small nodes. In my case the main heap user 
is usually the terms index of a primary/foreign key, so the ability to load the 
terms index off-heap addresses most of the problem. But since it should be an 
even less contentious move for stored fields and term vectors, I thought we 
should do it! :)




[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap

2020-01-17 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018007#comment-17018007
 ] 

Erick Erickson commented on LUCENE-9147:


If you only knew how much of my time with clients is spent dealing with "how 
much memory should I allocate" ;). So while I don't have an opinion on the 
technical aspects, anything we can do to reduce heap requirements is welcome.
