[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state

2013-08-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747532#comment-13747532
 ] 

Robert Muir commented on LUCENE-5148:
-

Right: I'm still convinced the trap only impacts committers writing unit tests 
that compare against slow-wrappers :)

The patch seems to have a very large number of changes for such a small 
thing... is there some reformatting happening?

If we can't implement this without major changes, then I don't think we should 
do it.

 SortedSetDocValues caching / state
 --

 Key: LUCENE-5148
 URL: https://issues.apache.org/jira/browse/LUCENE-5148
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5148.patch


 I just spent some time digging into a bug which was due to the fact that 
 SORTED_SET doc values are stateful (setDocument/nextOrd) and are cached per 
 thread. So if you try to get two instances from the same field in the same 
 thread, you will actually get the same instance and won't be able to iterate 
 over ords of two documents in parallel.
 This is not necessarily a bug, and the behavior can be documented, but I think 
 it would be nice if the API could prevent such mistakes by storing the 
 state in a separate object or by cloning the SortedSetDocValues object in 
 SegmentCoreReaders.getSortedSetDocValues.
 What do you think?
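
The trap described in this issue can be reproduced with a small toy model. This is only a sketch: ToySortedSet, ToyReader and TrapDemo are hypothetical names and not Lucene classes. The only assumptions carried over from the description are the setDocument/nextOrd style of iteration and the per-thread caching of a single instance.

```java
import java.util.Arrays;

/** Toy stand-in for a stateful SORTED_SET view; NOT the Lucene class. */
class ToySortedSet {
    static final long NO_MORE_ORDS = -1L;
    private final long[][] ordsByDoc;     // ords for each document
    private long[] current = new long[0]; // ords of the document set last
    private int pos;                      // cursor into current

    ToySortedSet(long[][] ordsByDoc) {
        this.ordsByDoc = ordsByDoc;
    }

    void setDocument(int docId) {
        current = ordsByDoc[docId];
        pos = 0;
    }

    long nextOrd() {
        return pos < current.length ? current[pos++] : NO_MORE_ORDS;
    }
}

/** Toy reader that caches one instance per thread (assumed behaviour). */
class ToyReader {
    private final ThreadLocal<ToySortedSet> cache;

    ToyReader(long[][] ordsByDoc) {
        this.cache = ThreadLocal.withInitial(() -> new ToySortedSet(ordsByDoc));
    }

    ToySortedSet getSortedSetDocValues() {
        return cache.get(); // same object on every call from the same thread
    }
}

class TrapDemo {
    /** Tries to read the first ord of doc 0 and of doc 1 "in parallel". */
    static long[] firstOrds(ToyReader reader) {
        ToySortedSet a = reader.getSortedSetDocValues();
        ToySortedSet b = reader.getSortedSetDocValues(); // same instance as a
        a.setDocument(0);
        b.setDocument(1); // silently repositions a as well
        return new long[] { a.nextOrd(), b.nextOrd() };
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(new long[][] { { 1, 2 }, { 7, 8 } });
        // Prints [7, 8] instead of the intended [1, 7].
        System.out.println(Arrays.toString(firstOrds(reader)));
    }
}
```

Both variables point at the same cursor, so the second setDocument call wins and both reads come from document 1.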

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state

2013-08-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747538#comment-13747538
 ] 

Robert Muir commented on LUCENE-5148:
-

and FieldCache should be consistent as well.




[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state

2013-07-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722449#comment-13722449
 ] 

Simon Willnauer commented on LUCENE-5148:
-

+1 on removing the trap. Yet, it would be nice to make this object entirely 
stateless if possible. I can think of 2 options:

{noformat}

public LongsRef getOrds(int docId, LongsRef spare)

{noformat}

This has the advantage that we can easily reuse a LongsRef on top, which is 
consistent with other APIs in Lucene.

Or maybe add an OrdsIter like this:

{noformat}

public OrdsIter getOrds(int docId, OrdsIter spare)

// Iterate like this:
long ord; // ords are longs in SortedSetDocValues
while ((ord = ordsIter.nextOrd()) != NO_MORE_ORDS) {
  ...
}
{noformat}

I'm mainly thinking about consistency with other APIs here, but I don't like 
the stateful API we have right now.
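
A minimal sketch of the second option, assuming the iteration state lives entirely in the iterator that the caller owns. Only the getOrds(docId, spare) signature and the OrdsIter name come from the comment above. The class bodies, StatelessOrds and OrdsIterDemo are hypothetical and use plain arrays instead of Lucene data structures.

```java
import java.util.Arrays;

/** Hypothetical per-caller cursor; all iteration state lives here. */
final class OrdsIter {
    static final long NO_MORE_ORDS = -1L;
    long[] ords = new long[0];
    int pos;

    long nextOrd() {
        return pos < ords.length ? ords[pos++] : NO_MORE_ORDS;
    }
}

/** Hypothetical stateless values object: it never remembers a document. */
final class StatelessOrds {
    private final long[][] ordsByDoc;

    StatelessOrds(long[][] ordsByDoc) {
        this.ordsByDoc = ordsByDoc;
    }

    /** Positions spare on docId, or allocates a new iterator if spare is null. */
    OrdsIter getOrds(int docId, OrdsIter spare) {
        OrdsIter it = (spare != null) ? spare : new OrdsIter();
        it.ords = ordsByDoc[docId];
        it.pos = 0;
        return it;
    }
}

class OrdsIterDemo {
    /** Reads two ords from doc 0 and doc 1, alternating between the two. */
    static long[] interleaved(StatelessOrds values) {
        OrdsIter a = values.getOrds(0, null);
        OrdsIter b = values.getOrds(1, null); // independent cursor
        return new long[] { a.nextOrd(), b.nextOrd(), a.nextOrd(), b.nextOrd() };
    }

    public static void main(String[] args) {
        StatelessOrds values = new StatelessOrds(new long[][] { { 1, 2 }, { 7, 8 } });
        // Prints [1, 7, 2, 8]: the two cursors do not disturb each other.
        System.out.println(Arrays.toString(interleaved(values)));
    }
}
```

In this sketch, passing the same spare to two calls would still alias the two iterations, so the reuse parameter moves the responsibility to the caller instead of removing it.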




[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state

2013-07-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722475#comment-13722475
 ] 

Robert Muir commented on LUCENE-5148:
-

These other options have downsides too.

LongsRef has all the disadvantages of the *Ref APIs (e.g. reuse bugs), and it 
also requires reading all the ordinals into RAM at once.

To me, adding an additional iterator just pushes the problem to a different 
place and makes the API more complex.

The current threadlocal + state is at least simple, consistent with all of the 
other docvalues, and documented that it works this way.

If we want to change the API, then I think we need to consider all of these 
issues.
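
The kind of reuse bug mentioned above for *Ref-style APIs can be sketched as follows. This is a toy model under assumptions: ToyLongsRef only imitates the longs/offset/length layout of a ref object, and RefStyleOrds and ReuseBugDemo are hypothetical names, not Lucene code.

```java
import java.util.Arrays;

/** Toy ref object with a longs/offset/length layout; NOT Lucene's LongsRef. */
final class ToyLongsRef {
    long[] longs = new long[0];
    int offset;
    int length;
}

/** Hypothetical ref-style accessor that fills a caller-supplied spare. */
final class RefStyleOrds {
    private final long[][] ordsByDoc;

    RefStyleOrds(long[][] ordsByDoc) {
        this.ordsByDoc = ordsByDoc;
    }

    ToyLongsRef getOrds(int docId, ToyLongsRef spare) {
        long[] src = ordsByDoc[docId];
        if (spare.longs.length < src.length) {
            spare.longs = new long[src.length];
        }
        // Every ord of the document is copied into memory up front.
        System.arraycopy(src, 0, spare.longs, 0, src.length);
        spare.offset = 0;
        spare.length = src.length;
        return spare;
    }
}

class ReuseBugDemo {
    /** Keeps the first result while reusing the same spare: a reuse bug. */
    static long[] firstOrdOfEach(RefStyleOrds values) {
        ToyLongsRef spare = new ToyLongsRef();
        ToyLongsRef a = values.getOrds(0, spare);
        ToyLongsRef b = values.getOrds(1, spare); // overwrites what a points to
        return new long[] { a.longs[a.offset], b.longs[b.offset] };
    }

    public static void main(String[] args) {
        RefStyleOrds values = new RefStyleOrds(new long[][] { { 1, 2 }, { 7, 8 } });
        // Prints [7, 7] instead of the intended [1, 7].
        System.out.println(Arrays.toString(firstOrdOfEach(values)));
    }
}
```

The copy in getOrds also illustrates the second downside: the whole set of ordinals for a document is materialized before the caller reads the first one.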




[jira] [Commented] (LUCENE-5148) SortedSetDocValues caching / state

2013-07-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722481#comment-13722481
 ] 

Robert Muir commented on LUCENE-5148:
-

{quote}
This is not necessarily a bug, and the behavior can be documented, but I think 
it would be nice if the API could prevent such mistakes by storing the state 
in a separate object or by cloning the SortedSetDocValues object in 
SegmentCoreReaders.getSortedSetDocValues.
{quote}

An auto-clone could also cause traps, e.g. if someone is calling this method 
multiple times and it's refilling buffers and so on.

But adding clone to the API (so someone could do this explicitly for these 
expert cases) might be a good solution too.
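
The explicit-clone idea could look roughly like the sketch below. CloneableOrds and CloneDemo are hypothetical names, and the toy class uses plain arrays. It only shows the general shape (shared read-only data, private cursor per clone) and makes no claim about how the attached patch does it.

```java
import java.util.Arrays;

/** Toy stateful values object with an explicit clone(); NOT Lucene code. */
final class CloneableOrds implements Cloneable {
    static final long NO_MORE_ORDS = -1L;
    private final long[][] ordsByDoc;     // shared, read-only data
    private long[] current = new long[0]; // per-instance cursor state
    private int pos;

    CloneableOrds(long[][] ordsByDoc) {
        this.ordsByDoc = ordsByDoc;
    }

    void setDocument(int docId) {
        current = ordsByDoc[docId];
        pos = 0;
    }

    long nextOrd() {
        return pos < current.length ? current[pos++] : NO_MORE_ORDS;
    }

    /** Shares the data but starts with a fresh, unpositioned cursor. */
    @Override
    public CloneableOrds clone() {
        try {
            CloneableOrds copy = (CloneableOrds) super.clone();
            copy.current = new long[0];
            copy.pos = 0;
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: class is Cloneable
        }
    }
}

class CloneDemo {
    /** Expert caller clones explicitly to walk two documents in parallel. */
    static long[] interleaved(CloneableOrds cached) {
        CloneableOrds a = cached;
        CloneableOrds b = cached.clone(); // opt-in, paid only when needed
        a.setDocument(0);
        b.setDocument(1);
        return new long[] { a.nextOrd(), b.nextOrd(), a.nextOrd(), b.nextOrd() };
    }

    public static void main(String[] args) {
        CloneableOrds cached = new CloneableOrds(new long[][] { { 1, 2 }, { 7, 8 } });
        // Prints [1, 7, 2, 8].
        System.out.println(Arrays.toString(interleaved(cached)));
    }
}
```

Because the clone is requested by the caller, the common single-iteration path keeps using the cached instance and does not allocate anything extra.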
