[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713408#comment-13713408
 ] 

Adrien Grand commented on LUCENE-5119:
--

+1 I think it makes sense to make DiskDV deserve its name and store everything 
on disk.

 DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory
 --

 Key: LUCENE-5119
 URL: https://issues.apache.org/jira/browse/LUCENE-5119
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5119.patch


 These are accessed sequentially when e.g. faceting, and can be a fairly large 
 amount of data (based on # of docs and # of unique terms). 
 I think this was done so that conceptually random access to a specific 
 docid would be faster than e.g. stored fields, but I think we should instead 
 target the DV data structures towards real use cases 
 (faceting, sorting, grouping, ...)
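
The sequential access pattern mentioned above can be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: the `DocToOrd` interface and the `countByOrd` helper are hypothetical names, and the ordinals are held in a small array purely for the demo.

```java
import java.util.Arrays;

public class FacetCountSketch {
    /** Hypothetical stand-in for a per-segment doc-to-ord source; not a Lucene interface. */
    public interface DocToOrd {
        int ord(int docId);
    }

    /** Counts documents per ordinal, visiting doc ids in increasing order. */
    public static int[] countByOrd(DocToOrd docToOrd, int maxDoc, int valueCount) {
        int[] counts = new int[valueCount];
        for (int doc = 0; doc < maxDoc; doc++) {
            // Each lookup touches the next entry, so the access is sequential.
            counts[docToOrd.ord(doc)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ords = {2, 0, 2, 1, 2}; // demo data: one ordinal per document
        int[] counts = countByOrd(doc -> ords[doc], ords.length, 3);
        System.out.println(Arrays.toString(counts)); // prints [1, 1, 3]
    }
}
```

Because the loop only ever moves forward through the mapping, it does not matter much to this kind of consumer whether the ordinals sit on the heap or are read from disk.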

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-19 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713414#comment-13713414
 ] 

Adrien Grand commented on LUCENE-5119:
--

David, I think your use-case would still work pretty well with this change. In 
particular, if you had enough memory to hold your ordinals mapping on the heap, 
the file-system cache will likely be able to cache the whole ordinals mapping 
as well (you may just need to slightly decrease the amount of memory given to 
the JVM), so random access should remain fast.
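
To make the file-system cache argument concrete, here is a minimal sketch using plain `java.nio` memory mapping. It is not how DiskDV is implemented; the file layout (one 4-byte ordinal per document) and the names `MappedOrdsSketch`, `writeOrds`, `mapOrds` and `roundTrip` are assumptions made up for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedOrdsSketch {
    /** Writes one 4-byte big-endian ordinal per document (illustrative layout only). */
    static void writeOrds(Path file, int[] ords) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(ords.length * Integer.BYTES);
        for (int ord : ords) {
            buf.putInt(ord);
        }
        Files.write(file, buf.array());
    }

    /** Maps the file read-only; the pages live in the OS cache, not on the Java heap. */
    static IntBuffer mapOrds(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
        }
    }

    /** Writes the ordinals to a temp file, maps it, and reads back the ordinal of one doc. */
    public static int roundTrip(int[] ords, int docId) {
        try {
            Path file = Files.createTempFile("ords", ".bin");
            file.toFile().deleteOnExit();
            writeOrds(file, ords);
            return mapOrds(file).get(docId); // random access by doc id
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new int[] {2, 0, 2, 1, 2}, 3)); // prints 1
    }
}
```

Once the mapped pages are resident in the file-system cache, a lookup like `get(docId)` does not need to touch the disk, which is the reason for giving the JVM a little less heap and leaving the rest to the operating system.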




[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713611#comment-13713611
 ] 

Michael McCandless commented on LUCENE-5119:


+1 to move ords to disk.




[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713641#comment-13713641
 ] 

ASF subversion and git services commented on LUCENE-5119:
-

Commit 1504868 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1504868 ]

LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap




[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713648#comment-13713648
 ] 

ASF subversion and git services commented on LUCENE-5119:
-

Commit 1504873 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1504873 ]

LUCENE-5119: DiskDV SortedDocValues shouldnt hold doc-to-ord in heap




[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712551#comment-13712551
 ] 

David Smiley commented on LUCENE-5119:
--

Would it be easy to add random access as an option?  Looking at your patch, 
which was pretty simple, it doesn't appear that it'd be hard to support random 
access should an application want this.

A realistic example in my mind is a spatial filter in which a potentially large 
binary geometry representation of a shape is encoded for each document into 
DiskDV.  Some fast leading filters narrow down the applicable documents, but 
some documents' shape geometry needs to be consulted in the DiskDV afterwards.  
Does that make sense?
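
As a rough sketch of the two-phase pattern described above: cheap leading filters run first, and the per-document binary value is fetched only for the survivors, which is where the random access comes from. Everything here is hypothetical plain Java (`BinaryLookup`, `filter`, and the toy predicates are made-up names), not Lucene or spatial-module code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class TwoPhaseFilterSketch {
    /** Hypothetical random-access lookup of a per-document binary value; not a Lucene interface. */
    public interface BinaryLookup {
        byte[] get(int docId);
    }

    /** Returns the doc ids that pass the cheap filter and then the exact check on their stored bytes. */
    public static List<Integer> filter(int maxDoc, IntPredicate leadingFilter,
                                       BinaryLookup shapes, Predicate<byte[]> exactCheck) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!leadingFilter.test(doc)) {
                continue; // rejected cheaply, the stored value is never read
            }
            byte[] shape = shapes.get(doc); // random access for the remaining candidates only
            if (exactCheck.test(shape)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy data: each doc's "shape" is a single byte equal to its doc id.
        List<Integer> hits = filter(6, doc -> doc % 2 == 0,
                doc -> new byte[] {(byte) doc}, shape -> shape[0] >= 2);
        System.out.println(hits); // prints [2, 4]
    }
}
```

The lookups happen in increasing doc id order but skip over rejected documents, so the access is sparse rather than a full sequential scan.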




[jira] [Commented] (LUCENE-5119) DiskDV SortedDocValues shouldnt hold doc-to-ord in heap memory

2013-07-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712558#comment-13712558
 ] 

Robert Muir commented on LUCENE-5119:
-

I don't plan to do this. That's why we have a codec API...
