[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state
[ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747532#comment-13747532 ]

Robert Muir commented on LUCENE-5148:

Right: I'm still convinced the trap only impacts committers writing unit tests that compare against slow-wrappers :)

The patch seems to have a very large number of changes for such a small thing... is there some reformatting happening?

If we can't implement this without major changes, then I don't think we should do it.

SortedSetDocValues caching / state

Key: LUCENE-5148
URL: https://issues.apache.org/jira/browse/LUCENE-5148
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5148.patch

I just spent some time digging into a bug which was due to the fact that SORTED_SET doc values are stateful (setDocument/nextOrd) and are cached per thread. So if you try to get two instances from the same field in the same thread, you will actually get the same instance and won't be able to iterate over ords of two documents in parallel.

This is not necessarily a bug; this behavior can be documented, but I think it would be nice if the API could prevent such mistakes by storing the state in a separate object or cloning the SortedSetDocValues object in SegmentCoreReaders.getSortedSetDocValues?

What do you think?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
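For readers following the thread, the trap described in the issue can be reproduced with a minimal sketch. The classes below (StatefulOrds, Reader) are hypothetical stand-ins written for illustration and are not Lucene's actual classes; they only assume what the issue states, namely a stateful setDocument/nextOrd object that is cached per thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: hypothetical stand-ins, not Lucene's real classes. */
public class SharedStateTrap {
    static final long NO_MORE_ORDS = -1;

    /** Stateful iterator: setDocument positions it, nextOrd advances it. */
    static final class StatefulOrds {
        private final long[][] ordsPerDoc;
        private int doc;
        private int pos;

        StatefulOrds(long[][] ordsPerDoc) { this.ordsPerDoc = ordsPerDoc; }

        void setDocument(int docId) { doc = docId; pos = 0; }

        long nextOrd() {
            return pos < ordsPerDoc[doc].length ? ordsPerDoc[doc][pos++] : NO_MORE_ORDS;
        }
    }

    /** Hypothetical reader that hands out one cached instance per thread. */
    static final class Reader {
        private final ThreadLocal<StatefulOrds> cache;

        Reader(long[][] data) {
            this.cache = ThreadLocal.withInitial(() -> new StatefulOrds(data));
        }

        StatefulOrds getOrds() { return cache.get(); }
    }

    /** Tries to hold two documents open at once; returns the ords actually read via 'a'. */
    static List<Long> readFirstDocWhileInterleaving(Reader reader, int docA, int docB) {
        StatefulOrds a = reader.getOrds();
        StatefulOrds b = reader.getOrds(); // same object as 'a' within this thread
        a.setDocument(docA);
        b.setDocument(docB);               // silently repositions 'a' as well
        List<Long> seen = new ArrayList<>();
        for (long ord = a.nextOrd(); ord != NO_MORE_ORDS; ord = a.nextOrd()) {
            seen.add(ord);
        }
        return seen;
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new long[][] { {0, 2}, {1, 3, 4} });
        // Asked for document 0, but the shared cursor was moved to document 1.
        System.out.println(readFirstDocWhileInterleaving(reader, 0, 1)); // prints [1, 3, 4]
    }
}
```

In this sketch the caller believes it holds two independent cursors, yet both variables refer to the same cached object, so the second setDocument call wins.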
[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state
[ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747538#comment-13747538 ]

Robert Muir commented on LUCENE-5148:

and FieldCache should be consistent as well.
[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state
[ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722449#comment-13722449 ]

Simon Willnauer commented on LUCENE-5148:

+1 on removing the trap. Yet, it would be nice to make this object entirely stateless if possible. I can think of 2 options:

{noformat}
public LongsRef getOrds(int docId, LongsRef spare)
{noformat}

This has the advantage that we can easily reuse a LongsRef on top, which is kind of consistent with other APIs in Lucene. Or maybe add an OrdsIterator like this:

{noformat}
public OrdsIter getOrds(int docId, OrdsIter spare)

// Iterate like this:
int ord;
while ((ord = ordsIter.nextOrd()) != NO_MORE_ORDS) {
  ...
}
{noformat}

I'm mainly thinking about consistency with other APIs here, but I don't like the stateful API we have right now.
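The second option above could look roughly like the following sketch. Everything here is hypothetical: OrdsSource and the body of OrdsIter are invented for illustration, the ordinals are typed as long rather than int (an assumption, chosen to match the LongsRef in the first option), and each document's ordinals are assumed to be stored in ascending order.

```java
/** Illustration only: a hypothetical stateless source with iteration state held by the caller. */
public class StatelessOrdsSketch {
    static final long NO_MORE_ORDS = -1;

    /** Holds all iteration state, so the source itself keeps no cursor. */
    static final class OrdsIter {
        private long[] ords = new long[0];
        private int pos;

        void reset(long[] ords) { this.ords = ords; this.pos = 0; }

        long nextOrd() { return pos < ords.length ? ords[pos++] : NO_MORE_ORDS; }
    }

    /** Hypothetical source: immutable data, no per-document state. */
    static final class OrdsSource {
        private final long[][] ordsPerDoc;

        OrdsSource(long[][] ordsPerDoc) { this.ordsPerDoc = ordsPerDoc; }

        /** Reuses 'spare' when one is given, otherwise allocates a new iterator. */
        OrdsIter getOrds(int docId, OrdsIter spare) {
            OrdsIter it = spare != null ? spare : new OrdsIter();
            it.reset(ordsPerDoc[docId]);
            return it;
        }
    }

    /** Counts ordinals shared by two documents by walking both ascending lists in parallel. */
    static int countCommonOrds(OrdsSource source, int docA, int docB) {
        OrdsIter a = source.getOrds(docA, null);
        OrdsIter b = source.getOrds(docB, null);
        long x = a.nextOrd();
        long y = b.nextOrd();
        int common = 0;
        while (x != NO_MORE_ORDS && y != NO_MORE_ORDS) {
            if (x == y) {
                common++;
                x = a.nextOrd();
                y = b.nextOrd();
            } else if (x < y) {
                x = a.nextOrd();
            } else {
                y = b.nextOrd();
            }
        }
        return common;
    }

    public static void main(String[] args) {
        OrdsSource source = new OrdsSource(new long[][] { {0, 2, 5}, {2, 3, 5} });
        System.out.println(countCommonOrds(source, 0, 1)); // prints 2
    }
}
```

Note that passing the same spare for both documents would bring the aliasing back, so in this sketch the responsibility for keeping cursors separate moves to the caller.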
[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state
[ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722475#comment-13722475 ]

Robert Muir commented on LUCENE-5148:

These other options have downsides too.

LongsRef has all the disadvantages of the *Ref APIs (e.g. reuse bugs), and also requires reading all the ordinals into RAM at once.

Adding an additional iterator just pushes the problem into a different place to me, and makes the API more complex.

The current threadlocal + state is at least simple, consistent with all of the other docvalues, and documented that it works this way.

If we want to change the API, then I think we need to consider all of these issues.
[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state
[ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722481#comment-13722481 ]

Robert Muir commented on LUCENE-5148:

{quote}
This is not necessarily a bug, this behavior can be documented, but I think it would be nice if the API could prevent from such mistakes by storing the state in a separate object or cloning the SortedSetDocValues object in SegmentCoreReaders.getSortedSetDocValues?
{quote}

An auto-clone could also cause traps, e.g. if someone is calling this method multiple times and it's refilling buffers and so on.

But adding clone to the API (so someone could do this explicitly for these expert cases) might be a good solution too.
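The explicit-clone idea could be sketched as follows. StatefulOrds is again a hypothetical stand-in rather than Lucene's class, and the clone() method shown is an assumption about how such an API might look, not something the patch is known to contain. The cached instance stays shared by default, and only callers that need a second cursor pay for a copy.

```java
/** Illustration only: a hypothetical stateful iterator with an explicit, opt-in clone. */
public class ExplicitCloneSketch {
    static final long NO_MORE_ORDS = -1;

    static final class StatefulOrds implements Cloneable {
        private final long[][] ordsPerDoc; // never modified, so clones can share it
        private int doc;
        private int pos;

        StatefulOrds(long[][] ordsPerDoc) { this.ordsPerDoc = ordsPerDoc; }

        void setDocument(int docId) { doc = docId; pos = 0; }

        long nextOrd() {
            return pos < ordsPerDoc[doc].length ? ordsPerDoc[doc][pos++] : NO_MORE_ORDS;
        }

        /** Copies the cursor fields; the underlying data array is shared. */
        @Override
        public StatefulOrds clone() {
            try {
                return (StatefulOrds) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class is Cloneable
            }
        }
    }

    /** Reads the first ordinal of two documents through two independent cursors. */
    static long[] firstOrds(StatefulOrds cached, int docA, int docB) {
        StatefulOrds a = cached;
        StatefulOrds b = cached.clone(); // explicit opt-in for the expert case
        a.setDocument(docA);
        b.setDocument(docB);             // does not disturb 'a'
        return new long[] { a.nextOrd(), b.nextOrd() };
    }

    public static void main(String[] args) {
        StatefulOrds cached = new StatefulOrds(new long[][] { {0, 2}, {1, 3, 4} });
        long[] first = firstOrds(cached, 0, 1);
        System.out.println(first[0] + " " + first[1]); // prints 0 1
    }
}
```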