[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482924#comment-14482924 ]

ASF subversion and git services commented on LUCENE-6352:

Commit 1671777 from [~martijn.v.groningen] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1671777 ]

LUCENE-6352: Improved tests for global ordinal join

Add global ordinal based query time join

Key: LUCENE-6352
URL: https://issues.apache.org/jira/browse/LUCENE-6352
Project: Lucene - Core
Issue Type: Improvement
Reporter: Martijn van Groningen
Attachments: LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch

Global ordinal based query time join as an alternative to the current query time join. The implementation is faster for subsequent joins between reopens, but requires an OrdinalMap to be built.

This join has certain restrictions and requirements:
* A document can only refer to one other document (but can be referred to by one or more documents).
* A type field must exist on all documents, and each document must be categorized to a type. This is to distinguish between the from and to side.
* There must be a single sorted doc values field used by both the from and to documents. By encoding the join into a single doc values field, it is trivial to build an ordinals map from it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482915#comment-14482915 ]

ASF subversion and git services commented on LUCENE-6352:

Commit 1671774 from [~martijn.v.groningen] in branch 'dev/trunk'
[ https://svn.apache.org/r1671774 ]

LUCENE-6352: Improved tests for global ordinal join
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393544#comment-14393544 ]

ASF subversion and git services commented on LUCENE-6352:

Commit 1670990 from [~martijn.v.groningen] in branch 'dev/trunk'
[ https://svn.apache.org/r1670990 ]

LUCENE-6352: Added a new query time join to the join module that uses global ordinals, which is faster for subsequent joins between reopens.
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393569#comment-14393569 ]

ASF subversion and git services commented on LUCENE-6352:

Commit 1670991 from [~martijn.v.groningen] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1670991 ]

LUCENE-6352: Added a new query time join to the join module that uses global ordinals, which is faster for subsequent joins between reopens.
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390204#comment-14390204 ]

Adrien Grand commented on LUCENE-6352:

Or maybe we could just document that this feature expects that the join field stores utf8 string values?
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390223#comment-14390223 ]

Adrien Grand commented on LUCENE-6352:

Just had another look at the patch and found two issues:
- Occurrences still allocates blocks using MAX_VALUE instead of the number of docs per segment
- Scores allocates using '(valueCount + arraySize - 1) / arraySize' but I think we need to cast to a long before the addition and then back to an int after the division in order to avoid overflows if the doc count in the segment is greater than MAX_VALUE - arraySize. So this would be: '(int) (((long) valueCount + arraySize - 1) / arraySize)'

Otherwise +1 to commit! This is an interesting usage of two-phase iteration.
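The overflow described in the second point can be reproduced in isolation. The sketch below is not taken from the patch: the class and method names (BlockCount, naive, safe) and the block size of 4096 are hypothetical, and only the two arithmetic expressions come from the comment above.

```java
public final class BlockCount {
    // Ceiling division done entirely in int arithmetic: the addition wraps
    // around when valueCount > Integer.MAX_VALUE - (arraySize - 1).
    static int naive(int valueCount, int arraySize) {
        return (valueCount + arraySize - 1) / arraySize;
    }

    // Widen to long before the addition, narrow back to int after the division.
    static int safe(int valueCount, int arraySize) {
        return (int) (((long) valueCount + arraySize - 1) / arraySize);
    }

    public static void main(String[] args) {
        int arraySize = 4096; // assumed block size, for illustration only
        int valueCount = Integer.MAX_VALUE - 10;
        System.out.println("naive: " + naive(valueCount, arraySize));
        System.out.println("safe:  " + safe(valueCount, arraySize));
    }
}
```

With these inputs the int-only form produces a negative block count, while the widened form returns 524288. For small inputs both forms agree, which is why the problem only shows up on very large segments.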
[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join
[ https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386461#comment-14386461 ]

Adrien Grand commented on LUCENE-6352:

Thanks Martijn! I had a look at the patch and it looks very clean, I like it.

{code}
Query rewrittenFromQuery = fromQuery.rewrite(indexReader); (JoinUtil.java)
{code}

I think you should rather call searcher.rewrite(fromQuery) here, which will take care of rewriting until rewrite returns 'this'.

{code}
final float[][] blocks = new float[Integer.MAX_VALUE / arraySize][];
{code}

Instead of allocating based on Integer.MAX_VALUE, maybe it should use the number of unique values? i.e. '(int) (((long) valueCount + arraySize - 1) / arraySize)'?

{code}
return new ComplexExplanation(true, score, "Score based on join value " + joinValue.utf8ToString());
{code}

I don't think it is safe to convert to a string, as we have no idea whether the value represents a UTF-8 string.

In BaseGlobalOrdinalScorer, you are caching the current doc ID; maybe we should not? When I worked on approximations, caching the current doc ID proved to be quite error-prone, and it was often better to just call approximation.docID() when the current doc ID was needed.

Another thing I'm wondering about is the equals/hashCode impl of this global ordinal query: since the documents that match depend on what happens in other segments, this query cannot be cached per segment. So maybe it should include the current IndexReader in its equals/hashCode comparison in order to work correctly with query caches? In the read-only case, this would still allow this query to be cached, since the current reader never changes. In the read/write case, this query is unlikely to be cached, given that the query cache will notice that it does not get reused.
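The equals/hashCode suggestion in the last paragraph can be sketched without any Lucene types. Everything in this block is hypothetical: ReaderBoundQuery stands in for the global ordinal query, joinField is an invented property, and a plain Object stands in for the top-level IndexReader. The actual patch may address the caching concern differently.

```java
import java.util.Objects;

public final class ReaderBoundQuery {
    private final String joinField; // hypothetical: name of the join field
    private final Object topReader; // stand-in for the top-level IndexReader

    public ReaderBoundQuery(String joinField, Object topReader) {
        this.joinField = Objects.requireNonNull(joinField);
        this.topReader = Objects.requireNonNull(topReader);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ReaderBoundQuery)) return false;
        ReaderBoundQuery that = (ReaderBoundQuery) other;
        // Compare the reader by identity: two queries are interchangeable
        // only if they were built against the very same reader instance.
        return joinField.equals(that.joinField) && topReader == that.topReader;
    }

    @Override
    public int hashCode() {
        // identityHashCode keeps hashCode consistent with the identity check above.
        return 31 * joinField.hashCode() + System.identityHashCode(topReader);
    }
}
```

Under this scheme a cache keyed on the query would keep hitting for as long as the same reader is in use, and would treat the query as new after every reopen, which matches the read-only versus read/write behaviour described in the comment.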