[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082898#comment-14082898 ]

David Smiley commented on LUCENE-5156:

I can understand why this change was made: better not to support it at all than to support an optional operation that callers expect to be fast and then implement it slowly. What if it were made fast, along with seekCeil(), which is also implemented slowly right now? For example, the first time either seekCeil() or an ord method is called, build an array of term start positions by ordinal, which otherwise wouldn't be built. Then you could do a binary search for seekCeil() and a direct lookup for seekExact(). The lazily created array could also be shared across repeated invocations that get Terms for the current document.

Why bother, you might ask? I'm working on a means of having the default highlighter search directly against the Terms from term vectors instead of re-inverting into a MemoryIndex. I'll post a separate issue for that with code, of course. The code works, but it isn't as efficient as it could be because seekCeil() on term vectors' Terms is O(N).

CompressingTermVectors termsEnum should probably not support seek-by-ord

Key: LUCENE-5156
URL: https://issues.apache.org/jira/browse/LUCENE-5156
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Fix For: 4.5, 5.0
Attachments: LUCENE-5156.patch

Just like term vectors before it, it has an O(n) seek-by-term. But this one also advertises a seek-by-ord, and that is also O(n). This could cause e.g. checkindex to be very slow, because if termsenum supports ord it does a bunch of seeking tests. (Another solution would be to leave it, and add a boolean so checkindex never does seeking tests for term vectors, only real fields.) However, I think it's also kind of a trap: in my opinion, if seek-by-ord is supported anywhere, you kinda expect it to be faster than linear time...?
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
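The lazy ordinal index proposed in the comment above can be sketched roughly as follows. This is a minimal illustration, not code from the patch or from Lucene: `PackedTerms`, its length-prefixed byte layout, and all method names are hypothetical stand-ins for one document's term vector. Terms are assumed to be sorted, ASCII, and shorter than 256 bytes, so that `String.compareTo` agrees with byte order.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for one document's term vector: sorted terms packed as [length byte][UTF-8 bytes]. */
final class PackedTerms {
    private final byte[] data;
    private final int termCount;
    private int[] starts; // start offset of each term by ordinal; stays null until a seek needs it

    PackedTerms(String... sortedTerms) {
        int size = 0;
        for (String t : sortedTerms) {
            size += 1 + t.getBytes(StandardCharsets.UTF_8).length;
        }
        data = new byte[size];
        int pos = 0;
        for (String t : sortedTerms) {
            byte[] b = t.getBytes(StandardCharsets.UTF_8);
            data[pos++] = (byte) b.length; // assumption: every term is shorter than 256 bytes
            System.arraycopy(b, 0, data, pos, b.length);
            pos += b.length;
        }
        termCount = sortedTerms.length;
    }

    boolean indexBuilt() {
        return starts != null;
    }

    /** Baseline: O(n) scan from the first term, with no extra memory. */
    int seekCeilLinear(String target) {
        int pos = 0;
        for (int ord = 0; ord < termCount; ord++) {
            int len = data[pos] & 0xFF;
            String term = new String(data, pos + 1, len, StandardCharsets.UTF_8);
            if (term.compareTo(target) >= 0) {
                return ord;
            }
            pos += 1 + len;
        }
        return -1;
    }

    /** One O(n) pass, done only on the first seek, records where each term starts. */
    private void ensureIndex() {
        if (starts != null) {
            return;
        }
        starts = new int[termCount];
        int pos = 0;
        for (int ord = 0; ord < termCount; ord++) {
            starts[ord] = pos;
            pos += 1 + (data[pos] & 0xFF);
        }
    }

    /** Seek-by-ord: a direct lookup once the offset array exists. */
    String termAt(int ord) {
        ensureIndex();
        int start = starts[ord];
        return new String(data, start + 1, data[start] & 0xFF, StandardCharsets.UTF_8);
    }

    /** Binary search: ordinal of the smallest term >= target, or -1 if there is none. */
    int seekCeil(String target) {
        ensureIndex();
        int lo = 0, hi = termCount - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (termAt(mid).compareTo(target) >= 0) {
                result = mid;
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return result;
    }
}
```

The trade-off is visible in `ensureIndex()`: the offset array costs one int per term in the document, which is the memory growth that later comments in this thread object to for large documents.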
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083026#comment-14083026 ]

Robert Muir commented on LUCENE-5156:

That's unrelated to term vectors. We shouldn't have such caching in the default codec; it can easily blow up on a large document.
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083029#comment-14083029 ]

Robert Muir commented on LUCENE-5156:

Personally, I would do such a thing with a FilterTerms + FilterReader: you just check if docid == lastDocID and you have your cache thing. But I don't think it should be in the default codec. I also happen to think term vectors aren't a good data structure for highlighting anyway.
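The "docid == lastDocID" check mentioned above amounts to a single-entry cache kept outside the codec. The sketch below shows only that idea in plain Java. `LastDocCache` and its loader function are hypothetical names, not Lucene classes, and a real version would presumably sit inside a filtering reader wrapper rather than stand alone. It assumes non-negative doc IDs and single-threaded use.

```java
import java.util.function.IntFunction;

/** Hypothetical single-entry cache: remembers only the value loaded for the most recent doc ID. */
final class LastDocCache<T> {
    private final IntFunction<T> loader; // e.g. something that builds per-document term data
    private int lastDocID = -1;          // assumption: real doc IDs are never negative
    private T lastValue;

    LastDocCache(IntFunction<T> loader) {
        this.loader = loader;
    }

    /** Reloads only when the requested document differs from the previous one. */
    T get(int docID) {
        if (docID != lastDocID) {
            lastValue = loader.apply(docID);
            lastDocID = docID;
        }
        return lastValue;
    }
}
```

Because only one entry is ever held, memory use is bounded by the largest single document, and applications that do not wrap their reader pay nothing.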
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083065#comment-14083065 ]

Robert Muir commented on LUCENE-5156:

I also think it's OK if we fix the codec to have a faster seekExact (not by copying stuff into a large array on the first call though, just by fixing the data structure / how it accesses data). That would solve the actual problem you have here in a clean way.
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083064#comment-14083064 ]

David Smiley commented on LUCENE-5156:

I agree on the caching point, that is, the part of my suggestion about asking for Terms for the same document again. Never mind that part; as I thought about it I realized I didn't need it after all.

bq. But I don't think it should be in the default codec. I also happen to think term vectors aren't a good data structure for highlighting anyway.

The default highlighter fully respects the positions and other aspects of the user's query, unlike the other highlighters. Some applications demand that a highlight is accurate to the query, even if the query uses custom span queries that do tricks with payloads, etc. It would be nice if the other highlighters supported accurate highlights for such queries, but they don't, so today this is the applicable one for accurate highlights of complex queries. The default highlighter requires a Terms instance reflecting the current document. It currently gets one by re-inverting into a MemoryIndex, but it can be hacked to accept a Terms directly from term vectors.

So you don't like the idea of enhancing the performance of term vector seekCeil in the default codec? Is that a -1 or a -0? The change I propose seems harmless: the code would not build up the new offset array unless consuming code calls seekCeil or the ord methods.
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083072#comment-14083072 ]

Robert Muir commented on LUCENE-5156:

Sorry David, it's not about being against speeding something up; it's about how you propose implementing it. Copying all the data from the entire document into another array on the first read for the doc is a really trashy thing to do here. Instead, we should just fix it correctly, so that seekCeil() is not linear time.
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083079#comment-14083079 ]

Robert Muir commented on LUCENE-5156:

Also, this should be discussed somewhere other than on an unrelated, closed, year-old issue, for example on its own issue. (Sorry, it's not really related to seek-by-ord; your problem is a more general one, and it wasn't created by this issue, nor even by compressing term vectors, but is older than that... this issue is closed.)
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730673#comment-13730673 ]

Michael McCandless commented on LUCENE-5156:

+1

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730688#comment-13730688 ]

Adrien Grand commented on LUCENE-5156:

Patch looks good, +1
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730869#comment-13730869 ]

ASF subversion and git services commented on LUCENE-5156:

Commit 1511009 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1511009 ]

LUCENE-5156: remove seek-by-ord from CompressingTermVectors
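What "remove seek-by-ord" means for callers can be sketched in plain Java. The types below (`SimpleTermsEnum`, `LinearTermsEnum`, `OrdTermsEnum`, `OrdProbe`) are hypothetical and are not the Lucene classes or the committed patch. The sketch assumes that an enum signals missing ord support by throwing `UnsupportedOperationException`, so a checker can probe once and skip its ord-based seeking tests.

```java
import java.util.List;

/** Hypothetical, simplified terms enum: ord support is optional and off by default. */
interface SimpleTermsEnum {
    boolean next();

    String term();

    /** Implementations that cannot do this faster than linear time simply do not override it. */
    default long ord() {
        throw new UnsupportedOperationException("ord() not supported");
    }
}

/** Walks a sorted list front to back and advertises no ord support. */
class LinearTermsEnum implements SimpleTermsEnum {
    private final List<String> terms;
    protected int pos = -1;

    LinearTermsEnum(List<String> terms) {
        this.terms = terms;
    }

    @Override
    public boolean next() {
        return ++pos < terms.size();
    }

    @Override
    public String term() {
        return terms.get(pos);
    }
}

/** Variant that does advertise ord, for contrast. */
final class OrdTermsEnum extends LinearTermsEnum {
    OrdTermsEnum(List<String> terms) {
        super(terms);
    }

    @Override
    public long ord() {
        return pos;
    }
}

final class OrdProbe {
    /** A checker would call this once and run ord-based seeking tests only when it returns true. */
    static boolean supportsOrd(SimpleTermsEnum te) {
        if (!te.next()) {
            return false; // empty enum: nothing to probe
        }
        try {
            te.ord();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

With ord unsupported, the expensive seeking tests described in the issue are never attempted for that enum, and no caller can mistake a linear-time operation for a fast one.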
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730880#comment-13730880 ]

ASF subversion and git services commented on LUCENE-5156:

Commit 1511014 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1511014 ]

LUCENE-5156: remove seek-by-ord from CompressingTermVectors
[jira] [Commented] (LUCENE-5156) CompressingTermVectors termsEnum should probably not support seek-by-ord
[ https://issues.apache.org/jira/browse/LUCENE-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727443#comment-13727443 ]

Adrien Grand commented on LUCENE-5156:

+1 this is trappy