[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713648#comment-13713648 ]

ASF subversion and git services commented on LUCENE-5119:
---------------------------------------------------------

Commit 1504873 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1504873 ]

LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap

> DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
> --------------------------------------------------------------
>
>                 Key: LUCENE-5119
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5119
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5119.patch
>
>
> These are accessed sequentially when e.g. faceting, and can be a fairly large
> amount of data (based on # of docs and # of unique terms).
> I think this was done so that conceptually "random" access to a specific
> docid would be faster than eg. stored fields, but I think we should instead
> target the DV datastructures towards real use cases
> (faceting,sorting,grouping,...)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713641#comment-13713641 ]

ASF subversion and git services commented on LUCENE-5119:
---------------------------------------------------------

Commit 1504868 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1504868 ]

LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713611#comment-13713611 ]

Michael McCandless commented on LUCENE-5119:
--------------------------------------------

+1 to move ords to disk.
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713414#comment-13713414 ]

Adrien Grand commented on LUCENE-5119:
--------------------------------------

David, I think your use-case would still work pretty well with this change. In particular, if you had enough memory to hold your ordinals mapping on the heap, then the file-system cache will likely be able to cache the whole ordinals mapping as well (you may just need to slightly decrease the amount of memory given to the JVM), so random access should remain fast?
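
To illustrate the point about the file-system cache, here is a minimal sketch of a doc-to-ord mapping that lives in a file and is read through a memory map instead of a heap array. It is not Lucene's DiskDV format or API: the fixed 4-byte ordinal width, the file layout, and the names write_ords, MappedOrds and get_ord are all assumptions made for this example.

```python
import mmap
import os
import struct
import tempfile

ORD_WIDTH = 4  # bytes per ordinal; a fixed width is an assumption of this sketch


def write_ords(path, ords):
    """Write one little-endian 32-bit ordinal per document, in docid order."""
    with open(path, "wb") as f:
        for ord_ in ords:
            f.write(struct.pack("<i", ord_))


class MappedOrds:
    """Hypothetical reader: doc-to-ord entries come from a memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // ORD_WIDTH

    def get_ord(self, doc_id):
        # Random access is a single offset computation into the mapping.
        return struct.unpack_from("<i", self._map, doc_id * ORD_WIDTH)[0]

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    ords = [3, 0, 2, 2, 1, 0]  # made-up ordinals for docids 0..5
    fd, path = tempfile.mkstemp(suffix=".ords")
    os.close(fd)
    try:
        write_ords(path, ords)
        mapped = MappedOrds(path)
        try:
            # Sequential scan, as in faceting or sorting.
            sequential = [mapped.get_ord(d) for d in range(len(mapped))]
            # Scattered lookups, as in a post-filter over selected docids.
            random_hits = [mapped.get_ord(d) for d in (4, 1, 3)]
        finally:
            mapped.close()
    finally:
        os.remove(path)
    return sequential, random_hits
```

Both access patterns go through the same lookup. Whether the scattered lookups stay fast depends on whether the operating system keeps the mapped pages cached, which is why leaving memory to the file-system cache rather than to the JVM heap matters here.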
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713408#comment-13713408 ]

Adrien Grand commented on LUCENE-5119:
--------------------------------------

+1 I think it makes sense to make DiskDV deserve its name and store everything on disk.
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712558#comment-13712558 ]

Robert Muir commented on LUCENE-5119:
-------------------------------------

I don't plan to do this. That's why we have a codec API...
[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
[ https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712551#comment-13712551 ]

David Smiley commented on LUCENE-5119:
--------------------------------------

Would it be easy to add random access as an option? Looking at your patch, which was pretty simple, it doesn't appear that it'd be hard to support random access should an application want this. A realistic example in my mind is a spatial filter in which a potentially large binary geometry representation of a shape is encoded for each document into DiskDV. Some fast leading filters narrow down the applicable documents, but some documents' shape geometries need to be consulted in the DiskDV afterwards. Does that make sense?