[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482924#comment-14482924
 ] 

ASF subversion and git services commented on LUCENE-6352:
-

Commit 1671777 from [~martijn.v.groningen] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1671777 ]

LUCENE-6352: Improved tests for global ordinal join

 Add global ordinal based query time join 
 -

 Key: LUCENE-6352
 URL: https://issues.apache.org/jira/browse/LUCENE-6352
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Martijn van Groningen
 Attachments: LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, 
 LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch, LUCENE-6352.patch


 Global ordinal based query time join as an alternative to the current query 
 time join. The implementation is faster for subsequent joins between reopens, 
 but requires an OrdinalMap to be built.
 This join has certain restrictions and requirements:
 * A document can only refer to one other document (but can be referred to by 
 one or more documents).
 * A type field must exist on all documents and each document must be 
 categorized to a type. This is to distinguish between the from and to side.
 * There must be a single sorted doc values field used by both the from and 
 to documents. By encoding the join into a single doc values field it is trivial 
 to build an ordinals map from it.
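
To illustrate why a single shared doc values field makes the ordinals map easy to build, here is a minimal sketch. It is not the code from the patch: `GlobalOrdinalsSketch` and its members are hypothetical names, it assumes each segment exposes its join values as a sorted, de-duplicated list of strings, and Lucene's real OrdinalMap is far more compact than plain int arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration only; not Lucene's OrdinalMap. */
final class GlobalOrdinalsSketch {
    /** segmentToGlobal[s][segOrd] is the global ordinal for that segment-local ordinal. */
    final int[][] segmentToGlobal;
    final List<String> globalTerms;

    GlobalOrdinalsSketch(List<List<String>> sortedTermsPerSegment) {
        // Merge all per-segment terms into one sorted, de-duplicated dictionary.
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> terms : sortedTermsPerSegment) {
            merged.addAll(terms);
        }
        globalTerms = new ArrayList<>(merged);

        // Both sides are sorted the same way, so one forward pass per segment suffices.
        segmentToGlobal = new int[sortedTermsPerSegment.size()][];
        for (int s = 0; s < sortedTermsPerSegment.size(); s++) {
            List<String> terms = sortedTermsPerSegment.get(s);
            int[] map = new int[terms.size()];
            int global = 0;
            for (int segOrd = 0; segOrd < terms.size(); segOrd++) {
                while (!globalTerms.get(global).equals(terms.get(segOrd))) {
                    global++;
                }
                map[segOrd] = global;
            }
            segmentToGlobal[s] = map;
        }
    }

    /** Translates a segment-local ordinal into the index-wide ordinal. */
    int globalOrd(int segment, int segmentOrd) {
        return segmentToGlobal[segment][segmentOrd];
    }
}
```

With this mapping, a from-side document and a to-side document in different segments that share a join value resolve to the same global ordinal, so the join can compare integers instead of terms.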



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482915#comment-14482915
 ] 

ASF subversion and git services commented on LUCENE-6352:
-

Commit 1671774 from [~martijn.v.groningen] in branch 'dev/trunk'
[ https://svn.apache.org/r1671774 ]

LUCENE-6352: Improved tests for global ordinal join




[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393544#comment-14393544
 ] 

ASF subversion and git services commented on LUCENE-6352:
-

Commit 1670990 from [~martijn.v.groningen] in branch 'dev/trunk'
[ https://svn.apache.org/r1670990 ]

LUCENE-6352: Added a new query time join to the join module that uses global 
ordinals, which is faster for subsequent joins between reopens.




[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393569#comment-14393569
 ] 

ASF subversion and git services commented on LUCENE-6352:
-

Commit 1670991 from [~martijn.v.groningen] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1670991 ]

LUCENE-6352: Added a new query time join to the join module that uses global 
ordinals, which is faster for subsequent joins between reopens.




[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390204#comment-14390204
 ] 

Adrien Grand commented on LUCENE-6352:
--

Or maybe we could just document that this feature expects that the join field 
stores utf8 string values?




[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-04-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390223#comment-14390223
 ] 

Adrien Grand commented on LUCENE-6352:
--

Just had another look at the patch and found two issues:
 - Occurrences still allocates blocks using MAX_VALUE instead of the number of 
docs per segment
 - Scores allocates using '(valueCount + arraySize - 1) / arraySize' but I 
think we need to cast to a long before the addition and then back to an int 
after the division in order to avoid overflows if the doc count in the segment 
is greater than MAX_VALUE - arraySize. So this would be: '(int) (((long) 
valueCount + arraySize - 1) / arraySize)'

Otherwise +1 to commit! This is an interesting usage of two-phase iteration.
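
The overflow concern in the second point can be sketched as follows. `BlockCount` and its method names are hypothetical and not taken from the patch; the two expressions are the ones quoted above.

```java
/** Hypothetical helper; the names are not taken from the patch. */
final class BlockCount {
    /** Naive form: the int addition can wrap around when valueCount is large. */
    static int naive(int valueCount, int arraySize) {
        return (valueCount + arraySize - 1) / arraySize;
    }

    /** Widen to long before the addition, narrow back to int after the division. */
    static int safe(int valueCount, int arraySize) {
        return (int) (((long) valueCount + arraySize - 1) / arraySize);
    }
}
```

For small inputs both forms agree. They differ only when `valueCount + arraySize - 1` exceeds `Integer.MAX_VALUE`, where the naive form yields a negative block count.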





[jira] [Commented] (LUCENE-6352) Add global ordinal based query time join

2015-03-30 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386461#comment-14386461
 ] 

Adrien Grand commented on LUCENE-6352:
--

Thanks Martijn! I had a look at the patch and it looks very clean, I like it.

{code}
Query rewrittenFromQuery = fromQuery.rewrite(indexReader); (JoinUtil.java)
{code}

I think you should rather call searcher.rewrite(fromQuery) here, which will 
take care of rewriting until rewrite returns 'this'.
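
The "rewrite until rewrite returns 'this'" behaviour can be sketched without Lucene. Everything below is a hypothetical stand-in (`RewritableQuery`, `Wrapper`, `Leaf`, `RewriteLoop`), not Lucene's Query or IndexSearcher; it only shows why a single rewrite call may stop too early.

```java
/** Hypothetical stand-in for a rewritable query; not Lucene's Query class. */
interface RewritableQuery {
    /** Returns a simpler query, or this when no further rewriting applies. */
    RewritableQuery rewrite();
}

/** A query that rewrites to the query it wraps. */
final class Wrapper implements RewritableQuery {
    private final RewritableQuery inner;

    Wrapper(RewritableQuery inner) {
        this.inner = inner;
    }

    @Override
    public RewritableQuery rewrite() {
        return inner;
    }
}

/** A primitive query that cannot be rewritten any further. */
final class Leaf implements RewritableQuery {
    @Override
    public RewritableQuery rewrite() {
        return this;
    }
}

final class RewriteLoop {
    /** Keeps rewriting until a rewrite step returns the same instance. */
    static RewritableQuery rewriteFully(RewritableQuery original) {
        RewritableQuery current = original;
        for (RewritableQuery next = current.rewrite(); next != current; next = current.rewrite()) {
            current = next;
        }
        return current;
    }
}
```

A doubly wrapped leaf needs two rewrite steps, so one direct `rewrite()` call returns an intermediate query while the loop reaches the leaf.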

{code}
final float[][] blocks = new float[Integer.MAX_VALUE / arraySize][];
{code}

Instead of allocating based on Integer.MAX_VALUE, maybe it should use the 
number of unique values? ie. '(int) (((long) valueCount + arraySize - 1) / 
arraySize)' ?
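
A minimal sketch of what sizing the outer array by the number of unique values could look like. `PagedScores` is a hypothetical class, not the one in the patch, and the lazy allocation of inner blocks is an assumption made for the example.

```java
/** Hypothetical paged score store; a sketch, not the class from the patch. */
final class PagedScores {
    private final int arraySize;
    private final float[][] blocks;

    PagedScores(int valueCount, int arraySize) {
        this.arraySize = arraySize;
        // Size the outer array from the number of unique values, not Integer.MAX_VALUE.
        int blockCount = (int) (((long) valueCount + arraySize - 1) / arraySize);
        this.blocks = new float[blockCount][];
    }

    void set(int ord, float score) {
        int block = ord / arraySize;
        if (blocks[block] == null) {
            blocks[block] = new float[arraySize]; // inner blocks are allocated on first use
        }
        blocks[block][ord % arraySize] = score;
    }

    float get(int ord) {
        float[] block = blocks[ord / arraySize];
        return block == null ? 0f : block[ord % arraySize];
    }

    int blockCount() {
        return blocks.length;
    }
}
```

With 10 unique values and blocks of 4 entries, the outer array holds 3 slots rather than `Integer.MAX_VALUE / 4`.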

{code}
return new ComplexExplanation(true, score, "Score based on join value " + 
joinValue.utf8ToString());
{code}

I don't think it is safe to convert to a string, since we have no idea whether 
the value represents a UTF-8 string.

In BaseGlobalOrdinalScorer, you are caching the current doc ID, maybe we should 
not? When I worked on approximations, caching the current doc ID proved to be 
quite error-prone and it was often better to just call approximation.docID() 
when the current doc ID was needed.

Another thing I'm wondering about is the equals/hashCode impl of this global 
ordinal query: since documents that match depend on what happens in other 
segments, this query cannot be cached per segment. So maybe it should include 
the current IndexReader in its equals/hashCode comparison in order to work 
correctly with query caches? In the read-only case, this would still allow this 
query to be cached since the current reader never changes, while in the 
read/write case this query is unlikely to be cached given that the query cache 
will notice that it does not get reused.
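
One way to picture this suggestion is a key whose equality includes the reader instance. The class below is a hypothetical sketch, not the query from the patch: `ReaderBoundJoinKey` is an invented name and a plain `Object` stands in for the IndexReader.

```java
/** Hypothetical cache key; not the query class from the patch. */
final class ReaderBoundJoinKey {
    private final String joinField;
    private final Object topLevelReader; // stands in for the IndexReader instance

    ReaderBoundJoinKey(String joinField, Object topLevelReader) {
        this.joinField = joinField;
        this.topLevelReader = topLevelReader;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ReaderBoundJoinKey)) {
            return false;
        }
        ReaderBoundJoinKey other = (ReaderBoundJoinKey) o;
        // The reader is compared by identity: a reopened reader is a different key.
        return joinField.equals(other.joinField) && topLevelReader == other.topLevelReader;
    }

    @Override
    public int hashCode() {
        return 31 * joinField.hashCode() + System.identityHashCode(topLevelReader);
    }
}
```

Two keys built against the same reader compare equal and can share a cache entry, while a key built after a reopen does not match the old entry.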
