[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031711#comment-17031711 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 6a380798a27e1ce777843a4322afba463e383acc in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a38079 ]

LUCENE-9147: Make sure temporary files get deleted on all code paths.

> Move the stored fields index off-heap
> -------------------------------------
>
>                 Key: LUCENE-9147
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9147
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 8.5
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now that the terms index is off-heap by default, it's almost embarrassing
> that many indices spend most of their memory usage on the stored fields index
> or the term vectors index, which are much less performance-sensitive than the
> terms index. We should move them off-heap too?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031712#comment-17031712 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 85dba7356f32da6d577550a6dd6c5e6244556d87 in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=85dba73 ]

LUCENE-9147: Make sure temporary files get deleted on all code paths.
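The temporary files in this fix are the ones the original change uses to buffer doc IDs and file pointers until their count is known. What follows is a minimal sketch of that pattern in Python, not Lucene's Java code: the function name `buffer_then_write`, the `write_final` callback, and the `tmp_dir` parameter are all hypothetical, and a plain CRC32 footer stands in for Lucene's index headers and footers. It shows one way to guarantee that the temporary file is removed whether the write succeeds or fails.

```python
import os
import struct
import tempfile
import zlib


def buffer_then_write(values, write_final, tmp_dir):
    """Buffer a stream of ints of unknown length in a temporary file, then
    pass the count and the values to `write_final` (hypothetical callback)."""
    fd, path = tempfile.mkstemp(suffix=".tmp", dir=tmp_dir)
    try:
        count, crc = 0, 0
        with os.fdopen(fd, "wb") as out:
            for v in values:
                chunk = struct.pack("<q", v)   # 8-byte signed integer
                out.write(chunk)
                crc = zlib.crc32(chunk, crc)   # running checksum
                count += 1
            out.write(struct.pack("<I", crc))  # footer: CRC32 of the payload

        # Read the buffer back and verify it before trusting its contents.
        with open(path, "rb") as inp:
            payload = inp.read(8 * count)
            (stored_crc,) = struct.unpack("<I", inp.read(4))
        if zlib.crc32(payload) != stored_crc:
            raise IOError("temporary buffer is corrupt")

        buffered = [item[0] for item in struct.iter_unpack("<q", payload)]
        write_final(count, buffered)  # the count is now known up front
    finally:
        # Runs on success and on every failure path after the file exists.
        if os.path.exists(path):
            os.remove(path)
```

Putting the deletion in a single `finally` block means a failure while buffering, verifying, or writing the final output all go through the same cleanup.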
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031411#comment-17031411 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 3246b2605869549dfbcedef21ea24d7101c20eee in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3246b26 ]

LUCENE-9147: Fix codec excludes.
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031412#comment-17031412 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit fdf5ade727ea8a5a6232d421a33b3fa1495d93b3 in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fdf5ade ]

LUCENE-9147: Fix codec excludes.
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031333#comment-17031333 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 1b882246d70e1b67c2c438092ea627f7baff3249 in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1b88224 ]

LUCENE-9147: Avoid reusing file names with FileSwitchDirectory or NRTCachingDirectory and IOContext randomization.
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030916#comment-17030916 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 597141df6b6a017fced16ec27b8fd180e9a6fcc2 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=597141d ]

LUCENE-9147: Move the stored fields index off-heap. (#1179)

This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number of
values to write up front, so incoming doc IDs and file pointers are buffered on
disk in temporary files. These files never get fsynced, but they have index
headers and footers so that any corruption in them cannot propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
uses the metadata to avoid going to the IndexInput whenever possible. In the
common case it only goes to a single sub `DirectReader`, which, combined with
blocks of 1k values, helps bound the number of page faults to 2.
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030849#comment-17030849 ]

ASF subversion and git services commented on LUCENE-9147:
---------------------------------------------------------

Commit 136dcbdbbced7c2d32b4d244ca99ace2c59baee8 in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=136dcbd ]

LUCENE-9147: Move the stored fields index off-heap. (#1179)

This replaces the index of stored fields and term vectors with two
`DirectMonotonic` arrays. `DirectMonotonicWriter` needs to know the number of
values to write up front, so incoming doc IDs and file pointers are buffered on
disk in temporary files. These files never get fsynced, but they have index
headers and footers so that any corruption in them cannot propagate to the
index.

`DirectMonotonicReader` gets a specialized `binarySearch` implementation that
uses the metadata to avoid going to the IndexInput whenever possible. In the
common case it only goes to a single sub `DirectReader`, which, combined with
blocks of 1k values, helps bound the number of page faults to 2.
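The metadata-driven binary search described in the commit message can be illustrated with a small standalone sketch. This is Python rather than Lucene's Java, and it is a simplified model, not the actual `DirectMonotonicReader`. The class `MonotonicArray`, the `BlockMeta` fields, and the `data_reads` counter are hypothetical names, and `max_residual` is an assumed stand-in for the bits-per-value information a real block would carry. Each block of 1k values keeps a small on-heap summary, and the search consults the per-value residuals (the stand-in for off-heap data) only when that summary cannot decide the comparison.

```python
from dataclasses import dataclass

BLOCK_SIZE = 1024  # "blocks of 1k values"


@dataclass
class BlockMeta:
    base: int          # smallest residual in the block, kept on heap
    avg_inc: float     # average increment between consecutive values
    max_residual: int  # largest stored residual (assumed stand-in for bits per value)


class MonotonicArray:
    def __init__(self, values):
        self.size = len(values)
        self.meta = []        # small per-block metadata, on heap
        self.residuals = []   # per-value data, stand-in for off-heap storage
        self.data_reads = 0   # number of times the per-value data was touched
        for start in range(0, len(values), BLOCK_SIZE):
            block = values[start:start + BLOCK_SIZE]
            n = len(block)
            avg_inc = (block[-1] - block[0]) / (n - 1) if n > 1 else 0.0
            raw = [v - int(avg_inc * i) for i, v in enumerate(block)]
            base = min(raw)
            res = [r - base for r in raw]  # non-negative residuals
            self.meta.append(BlockMeta(base, avg_inc, max(res)))
            self.residuals.append(res)

    def _expected(self, index):
        m = self.meta[index // BLOCK_SIZE]
        return m, m.base + int(m.avg_inc * (index % BLOCK_SIZE))

    def bounds(self, index):
        """Lower and upper bound of values[index], from metadata only."""
        m, low = self._expected(index)
        return low, low + m.max_residual

    def get(self, index):
        """Exact value; this is the access that may cause a page fault."""
        self.data_reads += 1
        _, low = self._expected(index)
        return low + self.residuals[index // BLOCK_SIZE][index % BLOCK_SIZE]

    def binary_search(self, key):
        """Index of key, or -(insertion point) - 1 if key is absent."""
        lo, hi = 0, self.size - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            lower, upper = self.bounds(mid)
            if upper < key:        # decided from metadata alone
                lo = mid + 1
            elif lower > key:      # decided from metadata alone
                hi = mid - 1
            else:                  # metadata is inconclusive: read the value
                value = self.get(mid)
                if value < key:
                    lo = mid + 1
                elif value > key:
                    hi = mid - 1
                else:
                    return mid
        return -1 - lo
```

Comparing `data_reads` before and after a call to `binary_search` shows how many iterations had to touch the per-value data. The tighter the residual bounds in each block, the more comparisons are settled from the on-heap metadata.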
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018022#comment-17018022 ]

Adrien Grand commented on LUCENE-9147:
--------------------------------------

[~erickerickson] Yeah, I have similar motivations, with many users who want to
open terabytes of indices on rather small nodes. In my case the main heap user
is usually the terms index of a primary/foreign key, so the ability to load the
terms index off-heap addresses most of the problem. But since it should be an
even less contentious move for stored fields and term vectors, I thought we
should do it! :)
[jira] [Commented] (LUCENE-9147) Move the stored fields index off-heap
[ https://issues.apache.org/jira/browse/LUCENE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018007#comment-17018007 ]

Erick Erickson commented on LUCENE-9147:
----------------------------------------

If you only knew how much of my time with clients is spent dealing with "how
much memory should I allocate" ;). So while I don't have an opinion on the
technical aspects, anything we can do to reduce heap requirements is welcome.