[jira] [Commented] (SOLR-4787) Join Contrib

2017-05-10 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004864#comment-16004864
 ] 

Erick Erickson commented on SOLR-4787:
--

Samuel:

In a word "no". Patches are built against whatever version of the source 
happens to be current when developing them. Until they're committed, no jars 
are built or archived. Once committed, the JIRA will be closed and marked 
"fixed". This particular source patch is over 2 years old and hasn't been kept 
up to date for whatever reason.

Your choices are
1> check out code from that time frame
2> update the patch to build against the current code base.

If you do the latter, please contribute the patch back!

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch, 
> SOLR-4787-with-testcase-fix.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> 

[jira] [Commented] (SOLR-4787) Join Contrib

2017-05-10 Thread Samuel Tatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004758#comment-16004758
 ] 

Samuel Tatipamula commented on SOLR-4787:
-

Not able to build jar using patches . Can I get solr join contrib jar ? 

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch, 
> SOLR-4787-with-testcase-fix.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registed in the solrconfig.xml.
>  
> After issuing the "ant dist" command from inside the solr directory the joins 
> contrib 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-20 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507330#comment-15507330
 ] 

Mikhail Khludnev commented on SOLR-4787:


right. it seems SOLR-8395 sank. You can estimate score=none search time. It's 
dominated by from side result size. You need to run from side query, then build 
long \{!terms} query from result keys, and then measure its' execution time. 
Sum of these times gives you rough JoinUtil query time, and let judge whether 
or not to reindex.  

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-20 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507057#comment-15507057
 ] 

David Smiley commented on SOLR-4787:


The algorithm in JoinUtil is very different from Solr's default join.  IMO 
JoinUtil will be faster for most use-cases but understand it definitely 
depends.  I've twice added a QParser to use JoinUtil in my projects.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-20 Thread Vadim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506968#comment-15506968
 ] 

Vadim Ivanov commented on SOLR-4787:


Hi, Mikhail
Recently I've installed SOLR 6.2 and did several tests.
score=none parameter is very tricky. As you 've mentioned here 
http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
"Enabling score and even specifying {!join … score=none}.. picks the different 
algorithm ... which expects "from" field to be string docValues".
So, as my id is NUMERIC (bjoin requires it) I receive error: "unexpected 
docvalues type NUMERIC for field 'id' (expected one of [SORTED, SORTED_SET]). 
Re-index with correct docvalues type."
Do you think that changing schema (id type to string docvalues) and further 
reindexing worth doing? And  Lucene’s JoinUtil with clause score=none might 
perform considerably better than the original Solr’s algorithm?

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-19 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503587#comment-15503587
 ] 

Mikhail Khludnev commented on SOLR-4787:


sure. now we have a wunderwaffe: *filter()*
ie. 
{code}
..=-{!join from=id fromIndex=hdq to=hdquotes 
v=$hqquery}=qt_release:1 filter(qt_cnid:4 AND qt_disabled:0)...
{code}

beware that space in subordinate clause after \{!...} sometimes might not be 
recognized, thus I used {{v=$hqquery}} trick

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-19 Thread Vadim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503559#comment-15503559
 ] 

Vadim Ivanov commented on SOLR-4787:


Thak you, Mikhail.
But I still have some doubts:
1. Will SOLR use filter cache, when "join" is inside ? It seems to me that 
regular join is not cached. 
2. Bjoin has fq sentence and, for example, clause like this could be written:

...=-{!bjoin from=id fromIndex=hdq to=hdquotes fq=$qq}qt_release:1
=(qt_cnid:4 AND qt_disabled:0)...

Regular solr join is without fq, so cold you drop a hint of rewriting this 
clause?



> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-19 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503138#comment-15503138
 ] 

Mikhail Khludnev commented on SOLR-4787:


Note: adding {{score=none}} as a local parameter can speedup join on extremely 
large indexes. 

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registed in the solrconfig.xml.
>  
> After issuing the "ant dist" command from inside the solr directory 

[jira] [Commented] (SOLR-4787) Join Contrib

2016-09-19 Thread Vadim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503072#comment-15503072
 ] 

Vadim Ivanov commented on SOLR-4787:


How to upgrade to the latest version of SOLR and be able to perform fast joins 
? 
(Bjoin or something as fast as it)
SOLR is developing and 6.x version is already available but I'm stuck on 4.10 
because of joins
I’m using bjoins to join collections with 2-3 billion documents (set of 
products) to collections of availability (less than 10 million documents)  and 
managed to get response time about 200 ms, which is acceptable. 
But, using regular joins, leads to response time greater than 10 sec
Are there any changes in 6x versions allowing join to be as fast as bjoin?


> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 6.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the 

[jira] [Commented] (SOLR-4787) Join Contrib

2015-12-14 Thread Marcus Bergner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056220#comment-15056220
 ] 

Marcus Bergner commented on SOLR-4787:
--

I've been trying out the patches in this ticket for a while (with Solr 4.9.1) 
and most recently the patches by [~krantiparisa] that uses UnInvertedLongField 
to handle multi-value "to"/"from" in a hjoin query. After stumbling on a couple 
of issues I think I have something that works reasonably well. The last thing 
that had me puzzled was that after indexing slightly more data than a very 
trivial test set in the index I was getting NullPointerException in the 
UnInvertedLongField constructor:

{noformat}
Terms terms = reader.terms(field);  // returns null
...
TermsEnum termsEnum = terms.iterator(null);   // throws NullPointerException
{noformat}

I've added various null checks in the code, but I don't really understand 
how/why reader.terms(field) would ever return null if the field name is defined 
in solrconfig.xml? Also, how bad would it be to simply have empty arrays in 
UnInvertedLongField and ignore any docId values >= length of the internal 
arrays if reader.terms(field) for some reason does return null? My tests so far 
look promising with such a patch but it gives me a somewhat bad feeling.

This was my tiny test hierarchy that worked, the ids were large integers (19 
digits).

||ID||Members||Comment||
|id1|id2|Level A|
|id2|id3, id4|Level B|
|id3| |Level C|
|id4| |Level C|
|id5| |Level C, no parent, added later after initial tests|

Using the above documents 1-4 a \{!hjoin fromIndex=coll from=memberslvlB 
to=idlvlC\}... worked and swapping to/from for the reversed search also worked. 
Adding an additional document to the index (id5) that was not a member of level 
B in this case caused the same queries to fail with NullPointerException in 
UnInvertedLongField constructor. Note that each "level" in the hierarchy here 
has their own field names both for their id fields and member list field 
(basically "$type.id" and "$type.member"). I first thought it could be related 
to dynamic fields but after changing my indexing and Solr schema to use real 
fields I could see the same problem.


> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: Trunk
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with 

Re: [jira] [Commented] (SOLR-4787) Join Contrib

2015-12-14 Thread Erick Erickson
bq: reader.terms(field) would ever return null if the field name is
defined in solrconfig.xml

I'm a little out of my depth here, but assuming you mean schema.xml above, just
because a field is defined there doesn't mean that it's ever actually
used in any
of the index structures unless at least one document adds a value for
that field.
Lucene doesn't know that schema.xml exists at all, that's a convention imposed
by Solr. Could it be that no document in your corpus has any term for
that particular
field?

Best,
Erick

On Mon, Dec 14, 2015 at 8:24 AM, Marcus Bergner (JIRA)  wrote:
>
> [ 
> https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056220#comment-15056220
>  ]
>
> Marcus Bergner commented on SOLR-4787:
> --
>
> I've been trying out the patches in this ticket for a while (with Solr 4.9.1) 
> and most recently the patches by [~krantiparisa] that uses 
> UnInvertedLongField to handle multi-value "to"/"from" in a hjoin query. After 
> stumbling on a couple of issues I think I have something that works 
> reasonably well. The last thing that had me puzzled was that after indexing 
> slightly more data than a very trivial test set in the index I was getting 
> NullPointerException in the UnInvertedLongField constructor:
>
> {noformat}
> Terms terms = reader.terms(field);  // returns null
> ...
> TermsEnum termsEnum = terms.iterator(null);   // throws NullPointerException
> {noformat}
>
> I've added various null checks in the code, but I don't really understand 
> how/why reader.terms(field) would ever return null if the field name is 
> defined in solrconfig.xml? Also, how bad would it be to simply have empty 
> arrays in UnInvertedLongField and ignore any docId values >= length of the 
> internal arrays if reader.terms(field) for some reason does return null? My 
> tests so far look promising with such a patch but it gives me a somewhat bad 
> feeling.
>
> This was my tiny test hierarchy that worked, the ids were large integers (19 
> digits).
>
> ||ID||Members||Comment||
> |id1|id2|Level A|
> |id2|id3, id4|Level B|
> |id3| |Level C|
> |id4| |Level C|
> |id5| |Level C, no parent, added later after initial tests|
>
> Using the above documents 1-4 a \{!hjoin fromIndex=coll from=memberslvlB 
> to=idlvlC\}... worked and swapping to/from for the reversed search also 
> worked. Adding an additional document to the index (id5) that was not a 
> member of level B in this case caused the same queries to fail with 
> NullPointerException in UnInvertedLongField constructor. Note that each 
> "level" in the hierarchy here has their own field names both for their id 
> fields and member list field (basically "$type.id" and "$type.member"). I 
> first thought it could be related to dynamic fields but after changing my 
> indexing and Solr schema to use real fields I could see the same problem.
>
>
>> Join Contrib
>> 
>>
>> Key: SOLR-4787
>> URL: https://issues.apache.org/jira/browse/SOLR-4787
>> Project: Solr
>>  Issue Type: New Feature
>>  Components: search
>>Affects Versions: 4.2.1
>>Reporter: Joel Bernstein
>>Priority: Minor
>> Fix For: Trunk
>>
>> Attachments: SOLR-4787-deadlock-fix.patch, 
>> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
>> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
>> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
>> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
>> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
>> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
>> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>>
>>
>> This contrib provides a place where different join implementations can be 
>> contributed to Solr. This contrib currently includes 3 join implementations. 
>> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
>> the FieldCache API this patch will only build with Solr 4.2 or above.
>> *HashSetJoinQParserPlugin aka hjoin*
>> The hjoin provides a join implementation that filters results in one core 
>> based on the results of a search in another core. This is similar in 
>> functionality to the JoinQParserPlugin but the implementation differs in a 
>> couple of important ways.
>> The first way is that the hjoin is designed to work with int and long join 
>> keys only. So, in order to use hjoin, int or long join keys must be included 
>> in both the to and from core.
>> The second difference is that the hjoin builds memory structures that are 
>> used to quickly connect the join keys. So, the hjoin will need more memory 
>> then the JoinQParserPlugin to perform the join.
>> The main advantage of the hjoin is that it can scale to join millions of 
>> keys between cores and provide sub-second response time. The hjoin 

[jira] [Commented] (SOLR-4787) Join Contrib

2015-07-30 Thread Gopal Patwa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648285#comment-14648285
 ] 

Gopal Patwa commented on SOLR-4787:
---

if this is not committed to solr 5.x, does anyone has this join patch for solr 
5.x ?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will 

[jira] [Commented] (SOLR-4787) Join Contrib

2015-04-08 Thread Tom Winch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485282#comment-14485282
 ] 

Tom Winch commented on SOLR-4787:
-

See also the new external source join, xjoin, at 
https://issues.apache.org/jira/browse/SOLR-7341

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in 

[jira] [Commented] (SOLR-4787) Join Contrib

2015-03-03 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345727#comment-14345727
 ] 

Bill Bell commented on SOLR-4787:
-

To be consistent can we add FQ?

Based on post by Yonik:

The join qparser has no fq parameter, so that is ignored. 

-Yonik 
http://heliosearch.org - native code faceting, facet functions, 
sub-facets, off-heap data 


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib 

[jira] [Commented] (SOLR-4787) Join Contrib

2015-03-02 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344601#comment-14344601
 ] 

Bill Bell commented on SOLR-4787:
-

This seems like a no-brainer.  Can we commit this into 5.xxx ?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-12-18 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252796#comment-14252796
 ] 

Kranti Parisa commented on SOLR-4787:
-

[~aheaven] Did you apply this patch to test the joins with fq? 
If you tried with the default solr join, then fq is not a supported param for 
the default solr joins. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-07-29 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077928#comment-14077928
 ] 

Alexander S. commented on SOLR-4787:


It seems join doesn't work as expected, please have a look: 
http://lucene.472066.n3.nabble.com/Search-results-inconsistency-when-using-joins-td4149810.html

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-04-10 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965413#comment-13965413
 ] 

Kranti Parisa commented on SOLR-4787:
-

Arul, thanks for posting the findings.

I don't think LONG fields are supported by bjoin.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-04-07 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961733#comment-13961733
 ] 

Alexander S. commented on SOLR-4787:


@Kranti Parisa, hi, any lick with this?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the WEB-INF/lib directory 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-04-04 Thread Arul Kalaipandian (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13960120#comment-13960120
 ] 

Arul Kalaipandian commented on SOLR-4787:
-

Last week, we tried the patch(SOLR-4787) in our test system   performance of 
hjoin is quite better than the standard join. 

But with following issues,

1) With 'int' join fields, bjoin throws  ArrayIndexOutOfBoundsException

{code:title=bjoin throws ArrayIndexOutOfBoundsException}

Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 48 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: -1
at 
org.apache.solr.joins.BitSetJoinQParserPlugin$BitSetJoinQuery.createWeight(BitSetJoinQParserPlugin.java:282)
at 
org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:664)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
at 
org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1122)
at 
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:825)
at 
org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:942)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1399)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
... 48 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.lucene.util.OpenBitSet.get(OpenBitSet.java:174)
at 
org.apache.solr.joins.BitSetJoinQParserPlugin$BitSetJoinQuery.createWeight(BitSetJoinQParserPlugin.java:273)
... 61 more
{code}  

2) Tescases with both 'bjoin'  'hjoin' are fails with thread leaks.

{code:title=Both hjoin  bjoin (With or witout localparam 'threads')  }
Thread[id=29, name=commitScheduler-7-thread-1, 
state=TIMED_WAITING, group=TGRP-VolatileQueryTest]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)
{code}


3) 'bjoin' throws NumberFormatException  for 'long' join fields.
   It would be nice to validate the field's type before executing the join 
query.

{code:title=Exception with 'long' join fields }
Caused by: java.lang.NumberFormatException: Invalid shift value in prefixCoded 
bytes (is encoded value really an INT?)
at 
org.apache.lucene.util.NumericUtils.getPrefixCodedIntShift(NumericUtils.java:210)
at org.apache.lucene.util.NumericUtils$2.accept(NumericUtils.java:493)
at 
org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:241)
at 
org.apache.lucene.search.FieldCacheImpl$Uninvert.uninvert(FieldCacheImpl.java:308)
at 
org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:653)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:571)
at 
org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:619)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:212)
at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:571)
at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:546)
at 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-21 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943153#comment-13943153
 ] 

Alexander S. commented on SOLR-4787:


Kranti Parisa

Did you try to apply this patch to 4.7.0? I was trying to download it here: 
http://www.apache.org/dyn/closer.cgi/lucene/solr/4.7.0 and then did the next 
steps:
* ant compile
* ant ivy-bootstrap
* ant dist
And then created a package for my Linux distributive, but no luck, Solr fails 
to initialize with
queryParser name=hjoin 
class=org.apache.solr.search.joins.HashSetJoinQParserPlugin/

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-21 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943154#comment-13943154
 ] 

Kranti Parisa commented on SOLR-4787:
-

Alex, I will try that tonight or tomorrow and post my findings.


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941765#comment-13941765
 ] 

Kranti Parisa commented on SOLR-4787:
-

Gopal, you can't get the values using joins. you will need to make a second 
call with the result (potentially sorted and paginated on firstCore). Using FQs 
in the first join call, you can hit the caches in the second call. if you need 
more details, describe your use case

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Gopal Patwa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941860#comment-13941860
 ] 

Gopal Patwa commented on SOLR-4787:
---

Thanks Kranti, here is my usecase 

Event Collection:
eventId=1
title=Lady Gaga
date=06/03/2014

EventTicketStats Collection
eventId=1
minPrice=200
minQuantity=5

When user search for lady gaga on event document using hjoin with 
EventTicketStats then result should include min price and qty data from join 
core.

Final Result for Event Collection:
eventId=1
title=Lady Gaga
date=06/03/2014
minPrice=200
minQuantity=5

And user has option to filter result for price and qty like show events for 
minPrice  100
The reason we have EventStats in separate document that our ticket data changes 
every 5 seconds but Event data changes are like twice a day

I thought using Updatable Numeric DocValue after denormalizing Event document 
with min price and qty fields But Solr does not have support for that feature 
yet. So I need to rely on using join


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-20 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942635#comment-13942635
 ] 

Kranti Parisa commented on SOLR-4787:
-

so for any query you might return one or more EVENTS matching the title search 
terms + filters. 

say you have 30 events matching the given criteria but your pagination is 1-10, 
so you would be displaying the top 10 most relevant EVENTS.. this would be the 
docList of your first query.. and from the ResponseWriter you would need to 
make a call to TICKETS core, by using the original filters + the 10 event ids 
and execute that request (you might need to use LocalSolrQueryRequest and 
pre-processed filters etc to hit the caches of the first query). and collect 
the field info you need for each EVENT..

From the joins implementation point of view, there is no such thing to fetch 
the values or scores from the secondCore.. it would be very costly to do 
that.. you would need to do write some custom ResponseWriters etc which does 
this stuff.. especially considering your requirement of maintaing EVENTS and 
TICKETS separately. There is also a new feature Collapse, Expand results.. but 
then I am not sure about using them for your use case..



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-19 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940570#comment-13940570
 ] 

Kranti Parisa commented on SOLR-4787:
-

can you post the query

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the WEB-INF/lib directory of the solr 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-19 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940740#comment-13940740
 ] 

Alexander S. commented on SOLR-4787:


Any query fails, seems I am doing something wrong (perhaps the patch was 
applied incorrectly). I see this error:
{quote}
SolrCore Initialization Failures
crm-dev: 
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
{quote}
when trying to access the web interface.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 

Re: [jira] [Commented] (SOLR-4787) Join Contrib

2014-03-19 Thread Erick Erickson
I'd guess you have an older jar in your
classpath somewhere.

Did you apply the patch to a fresh checkout of the code?

Best,
Erick

On Wed, Mar 19, 2014 at 10:45 AM, Alexander S. (JIRA) j...@apache.org wrote:

 [ 
 https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940740#comment-13940740
  ]

 Alexander S. commented on SOLR-4787:
 

 Any query fails, seems I am doing something wrong (perhaps the patch was 
 applied incorrectly). I see this error:
 {quote}
 SolrCore Initialization Failures
 crm-dev: 
 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
 Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
 {quote}
 when trying to access the web interface.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of 
 keys between cores and provide sub-second response time. The hjoin should 
 work well with up to two million results from the fromIndex and tens of 
 millions of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query 
 if the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate 
 a list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-19 Thread Gopal Patwa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941359#comment-13941359
 ] 

Gopal Patwa commented on SOLR-4787:
---

I am trying to use this patch (27/Jan/14 12:26) but getting below exception 
using hjoin

Solr 4.6

Schema.xml id field between parent and child
   field name=id type=long indexed=true stored=true required=true 
multiValued=false/ 

Caused by: java.lang.IllegalStateException: Type mismatch: id was indexed as 
SORTED
at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:896)
at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:875)
at 
org.apache.solr.joins.HashSetJoinQParserPlugin$LongJoinRunner.init(HashSetJoinQParserPlugin.java:504)
at 
org.apache.solr.joins.HashSetJoinQParserPlugin$LongHashSetJoinQuery.createWeight(HashSetJoinQParserPlugin.java:313)

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937732#comment-13937732
 ] 

Alexander S. commented on SOLR-4787:


Thank you, Kranti Parisa, I am far from java development, how can I apply this 
patch and build solr for linux? I tried to patch, it creates a new folder 
joins in solr/contrib, installed ivy and launched ant compile but got this 
error:
{quote}
common.compile-core:
[mkdir] Created dir: 
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] Compiling 3 source files to 
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883:
 error: reached end of file while parsing
[javac]   return this.delegate.acceptsDocsOutOfOrder();
[javac]^
[javac] 
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884:
 error: reached end of file while parsing
[javac] 2 errors
[javac] 1 warning

BUILD FAILED
/home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred 
while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error 
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error 
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following 
error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed; 
see the compiler error output for details.

Total time: 8 minutes 55 seconds
{quote}

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937747#comment-13937747
 ] 

Alexander S. commented on SOLR-4787:


Nvm, there were 3 missing } at the end of HashSetJoinQParserPlugin.java, the 
build was successful, testing now.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937845#comment-13937845
 ] 

Alexander S. commented on SOLR-4787:


Kranti,

Do I need to update anything in my solr config/schema? I've just tried the 
patched version and it still ignores the fq parameter. I was using solr 4.7.0.

Thanks,
Alex

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937868#comment-13937868
 ] 

Kranti Parisa commented on SOLR-4787:
-

Alex,

Are you using HashSetJoin? Did you configure in solrconfig.xml?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937887#comment-13937887
 ] 

Alexander S. commented on SOLR-4787:


Hi, I am using simple join, this way: {!join from=profile_ids_im to=id_i 
fq=$joinFilter1 v=$joinQuery1}.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937895#comment-13937895
 ] 

Kranti Parisa commented on SOLR-4787:
-

NestedJoins (fqs) are implemented in HashSetJoin. so after applying the patch 
you will need to configure it in solrconfig.xml

queryParser name=hjoin 
class=org.apache.solr.search.joins.HashSetJoinQParserPlugin/

and use {!hjoin from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}, so 
you are trying to do a self join on the same core?


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937921#comment-13937921
 ] 

Alexander S. commented on SOLR-4787:


Ok, thx, I'll try with hjoin. And yes, I am trying to do it on the same core.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938117#comment-13938117
 ] 

Alexander S. commented on SOLR-4787:


Getting this error:
{code}
RSolr::Error::Http - 500 Internal Server Error
Error: {msg=SolrCore 'crm-dev' is not available due to init failure: Error 
loading class 
'org.apache.solr.search.joins.HashSetJoinQParserPlugin',trace=org.apache.solr.common.SolrException:
 SolrCore 'crm-dev' is not available due to init failure: Error loading class 
'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:309)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
{code}

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920798#comment-13920798
 ] 

Alexander S. commented on SOLR-4787:


Hi Joel, thanks, I seems need to perform a nested join inside a single 
collection, but need fq inside join as it is shown here: 
https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of 
document. 3 types of documents: Profile, Site, and SiteSource.
When searching for Profiles I have to look in SiteSource content, so I need 
something like this:
{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → 
Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # 
Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = type:Site
joinFilter2 = type:SiteSource
{code}

Right now this works only partially, fq inside {!join} is ignored.
When to expect this patch to be merged? Also, will it work in the way I've 
explained or do I understand it wrong?

Thank you,
Alex

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-05 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920802#comment-13920802
 ] 

Kranti Parisa commented on SOLR-4787:
-

Alex,

I will try to create a patch today for nested joins and post here. Which 
version of solr are you using?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920805#comment-13920805
 ] 

Alexander S. commented on SOLR-4787:


Hi, 4.4 and 4.7

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the WEB-INF/lib directory of the solr webapplication. 
 This will ensure that the top level Solr 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915768#comment-13915768
 ] 

Alexander S. commented on SOLR-4787:


Just tried 4.7.0 and it does not work either.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the WEB-INF/lib directory of the solr webapplication. 
 This will 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916095#comment-13916095
 ] 

Joel Bernstein commented on SOLR-4787:
--

Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parser

Joel

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-26 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912786#comment-13912786
 ] 

Alexander S. commented on SOLR-4787:


Which release does have support for {!join} with fq parameter? I was trying 
with 4.5.1 but fq seems does not have any effect.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-01-28 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884156#comment-13884156
 ] 

Upayavira commented on SOLR-4787:
-

Happy to be ignored, but wouldn't {!bitsetjoin} and {!hashjoin} be more 
descriptive and therefore more useful? It would mean people would get a more 
intuitive sense of what this is doing before they have to resort to 
documentation.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-01-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884248#comment-13884248
 ] 

David Smiley commented on SOLR-4787:


+1 to \{!bitsetjoin} and \{!hashjoin}

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 solr-joins-4.*-.jar  in the WEB-INF/lib directory of the solr webapplication. 
 This will ensure that 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})aQ=(f1:false)BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin like how solr supports multiple FQ params within the same request.

Any feedback for the syntax? is comma separated FQs sounds ok?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.6

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-10-01 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783110#comment-13783110
 ] 

Peter Keegan commented on SOLR-4787:


Thanks, just tried the latest patch. For identical queries in my test index, 
'bjoin' is twice as fast as 'hjoin' for both small and large inner set sizes.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the the 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-10-01 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783124#comment-13783124
 ] 

Kranti Parisa commented on SOLR-4787:
-

Yes, but you might see different results (especially the memory) when you have 
long keys. If you don't have memory restrictions then yes bjoin should 
perform better.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib lib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-30 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782179#comment-13782179
 ] 

Peter Keegan commented on SOLR-4787:


I'm seeing ByteArray compilation problem, too. Where would I find this class?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782184#comment-13782184
 ] 

Joel Bernstein commented on SOLR-4787:
--

Yes, I noticed the latest patch is reffering to ByteArray which isn't present. 
I'm going to be putting up new patch shortly to resolve this. It will also 
include the latest work done on the BitSet join.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-15 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768078#comment-13768078
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have implemented multi-value keys for hjoin using a new field 
UnIvertedLongField. Sanity checks looks good. Also tested with FQs (nested 
Joins). I will prepare a patch sometime tomorrow and post here.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-13 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766601#comment-13766601
 ] 

Kranti Parisa commented on SOLR-4787:
-

That's cool. Once it is up, I will run some performance tests and post my 
findings. So it also supports FQs for nested joins and uses filter caches, 
right?


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-13 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766609#comment-13766609
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti, new patch is up with the bjoin that supports multi-value fields. It 
supports nested fq and filter caching as well.

I won't be able to work on the hjoin for a while, so feel free to port the 
multi-value field support to hjoin.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-13 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766446#comment-13766446
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti, the bjoin now supports multi-value joins. I'll work on getting the 
patch up here today.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 
 faster and can provide 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-13 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766657#comment-13766657
 ] 

Kranti Parisa commented on SOLR-4787:
-

Yes, will first test the bjoin for multi-valued fields and then try to extend 
hjoin for multi-value fields.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-13 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767361#comment-13767361
 ] 

Kranti Parisa commented on SOLR-4787:
-

Something missing in the Patch? I am seeing ByteArray compilation problem. Also 
does bjoin needs any specific types of field configs in schema.xml ?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-12 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765915#comment-13765915
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel, I modified the JoinQParserPlugin (default one) to allow FQs. It seems to 
be working fine, I will need to test more for caches/performance. Do you have 
any updates for supporting multi-valued keys with hjoin or bjoin?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751190#comment-13751190
 ] 

Joel Bernstein commented on SOLR-4787:
--

Are all join keys single value? Currently the hjoin and bjoin only support 
single value join keys.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 
 faster and can provide 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-27 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751201#comment-13751201
 ] 

Kranti Parisa commented on SOLR-4787:
-

The values in to fields are single values but from fields are multi valued. 
does this mean that the implementation need significant changes to support 
multi values? I will take a look at the code today.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751206#comment-13751206
 ] 

Joel Bernstein commented on SOLR-4787:
--

Right now there aren't really efficient memory structures to perform the joins 
on multi-value fields. Our best bet right now would be the SORTED_SET docValues 
described http://wiki.apache.org/solr/DocValues. But this is not really 
designed for integers or longs.

I think the best way to handle is to fully normalize the data so that the join 
keys are single valued. Basically model the data the way you would in a 
relational database then use the nested joins to join the normalized indexes 
together. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-27 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751237#comment-13751237
 ] 

Kranti Parisa commented on SOLR-4787:
-

I have 5 million documents (might increase in future) each having 10 values in 
the parentid field. So if we normalize the size of the index would become 50 
million documents which would slow down the indexing as well search. I don't 
mind trying with String keys with DocValues (SORTED_SET) if it works.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751240#comment-13751240
 ] 

Joel Bernstein commented on SOLR-4787:
--

The current implementation doesn't support the multi-value fields though. So it 
will need to be implemented.



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 
 faster and 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-26 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750854#comment-13750854
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Seems there is something wrong with nested hjoin.

Example:

{code}/masterCore/select?q=*:*fq=({!hjoin fromIndex=ACore from=parentid to=id 
v=$aQ fq=$BJoinQ})aQ=(f1:false)BJoinQ=({!join fromIndex=BCore from=bid 
to=aid}tag:abc){code}

The above query gives me 25558 results

and when I try both joins with hjoin, as follows, it gives me 1 document

{code}/masterCore/select?q=*:*fq=({!hjoin fromIndex=ACore from=parentid to=id 
v=$aQ fq=$BJoinQ})aQ=(f1:false)BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc){code}

am I missing anything?


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-25 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749757#comment-13749757
 ] 

Joel Bernstein commented on SOLR-4787:
--

These are great additions. Not sure I can apply them here though because I'm 
not setting the bits in order. I'm going to need a random access sparse 
implementation. I've seen some but they are LGPL. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-24 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749431#comment-13749431
 ] 

Joel Bernstein commented on SOLR-4787:
--

Any recommendations for a good sparse bitset implementation?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 *BitSetJoinQParserPlugin aka bjoin*
 The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
 HashSet to perform the underlying join. Because of this the bjoin is much 
 faster and can provide sub-second response times on result sets 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-24 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749462#comment-13749462
 ] 

David Smiley commented on SOLR-4787:


See these exciting additions to Lucene 4.5:
* LUCENE-5084: Added new Elias-Fano encoder, decoder and DocIdSet
  implementations. (Paul Elschot via Adrien Grand)

* LUCENE-5081: Added WAH8DocIdSet, an in-memory doc id set implementation based
  on word-aligned hybrid encoding. (Adrien Grand)

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost*  99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then the you can avoid hashset resizing which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather then join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=hjoin 
 class=org.apache.solr.joins.HashSetJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
  lib dir=../../../contrib/joins/lib regex=.*\.jar /
  lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-22 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747907#comment-13747907
 ] 

Joel Bernstein commented on SOLR-4787:
--

That's exactly the syntax. I'm just working out the caching details and then 
I'll put up the code.

Getting the queryResultCache and FilterCache to play nicely with nested joins 
is tricky.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747912#comment-13747912
 ] 

Kranti Parisa commented on SOLR-4787:
-

Cool, will you be able to put the code up here sometime tomorrow? I want to 
apply that patch and see how it performs.


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-22 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747915#comment-13747915
 ] 

Joel Bernstein commented on SOLR-4787:
--

Yes. I'm very close now. I'll also need to write some quick docs because these 
joins have a lot more functionality. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747919#comment-13747919
 ] 

Kranti Parisa commented on SOLR-4787:
-

Awesome!

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Steven Bower (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746126#comment-13746126
 ] 

Steven Bower commented on SOLR-4787:


When using the pJoin is relevance still applied to the results on the left side 
of the join?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746647#comment-13746647
 ] 

Joel Bernstein commented on SOLR-4787:
--

Yes, the main query still has relevance applied. The filter join query does not 
impact relevance though.

There is going to be a large set of code changes to this ticket, probably next 
week. The implementation of the pjoin is changing and another more scalable 
join will be added as well.

The vjoin is going to remain the same for the time being.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746912#comment-13746912
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel:

That's cool, thanks. Did you ever thought of supporting fq parameter for the 
Join, like how we support for v for query. the idea is to use filter cache on 
the child core while executing the join query. 



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747147#comment-13747147
 ] 

David Smiley commented on SOLR-4787:


Excellent suggestion [~kranti2006] on supporting filter queries on the 
from-side of the join.  I implemented a custom join query and it has a 
from-side filter query feature.  It can help a great deal with performance.  It 
wasn't that hard to add, either.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747155#comment-13747155
 ] 

Kranti Parisa commented on SOLR-4787:
-

That's awesome! Yes, for sure it would be a big deal for performance especially 
it will allow us to cache the majority of the join query which could be common 
for all the cases. Once it's filter cached, we can run additional clases thru 
normal queries super fast!

I was thinking to add a new local-param to support fq. If you already have 
something, do you want to share?


 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747166#comment-13747166
 ] 

David Smiley commented on SOLR-4787:


I don't have the rights to it but I'll share the most pertinent line of code:
{code}
SolrIndexSearcher.ProcessedFilter processedFilter =
searcher.getProcessedFilter(null, filters);
{code}

See, Solr does most of the work with that one line call; everything else, such 
as parsing the queries is easy/common stuff.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747168#comment-13747168
 ] 

Joel Bernstein commented on SOLR-4787:
--

I'll put up what I've been working on tomorrow. One of the joins I'll be adding 
supports the fq on the from side of join. These joins also support both 
PostFilter and traditional filter query joins. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747170#comment-13747170
 ] 

Joel Bernstein commented on SOLR-4787:
--

Yeah, that's the exact approach I took David.



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747172#comment-13747172
 ] 

Kranti Parisa commented on SOLR-4787:
-

Great, thanks!

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-08-21 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747175#comment-13747175
 ] 

Kranti Parisa commented on SOLR-4787:
-

Nested Joins! That's exactly what I am trying :) and thought about adding fq to 
the solr joins.

Using local-param in the first join: v=$joinQjoinQ=(field:123 AND 
_query_={another join}). So here another join could be passed as a FQ and it 
should get results faster!!

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716342#comment-13716342
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

Let me know how the pjoin is performing for you. I'm going to be testing out 
some different data structures for the pjoin to see if I can get better 
performance.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716380#comment-13716380
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Initial performance results looks like:
(Restarted solr - hence no caches at the beginning)

- with no cache: pjoin is 2-3 times faster than join
- with cache: pjoin is 3-4 times slower than join

Agree with your idea, we should try with other data structures and may be a 
look at the caching strategy used in pjoin.

Are the queries already running in parallel to find the intersection?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717540#comment-13717540
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

Odd that the pjoin cache is making things slower. I'll do some testing and see 
if I can turn up the same results. 

The join query runs first and builds a data structure in memory that is used to 
post filter the main query. The main query then runs and the post filter is 
applied.

I'm exploring another scenario that will perform 5x faster then the current 
pjoin. But the tradeoff is a longer warmup time when a new searcher is opened.

Do you have real-time indexing requirements or can you live with some warm-up 
time.

 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717553#comment-13717553
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Thanks for the details. Yes, we do some real-time indexing. Say, every 30min we 
get deltas. how much warmup time that we are looking at for 5M docs?

Also, if we have more than one pjoins in the fq, each points to their own 
cores, can those pjoins be executed in parallel and find the intersection which 
will finally be applied as a filter for the main query?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715925#comment-13715925
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

I wanted to try implementing pjoin for the following use case.

masterCore = 1M keys (id field, long)
childCore = 5M documents (with parentid field, long, whose values are equal to 
the values of id field in the masterCore

And syntax looks like
http://localhost:8180/solr/masterCore/select?q=title:afq=({!pjoin%20fromIndex=childCore%20from=parentid%20to=id%20v=$childQ})childQ=(fieldOne:somevalue
 AND fieldTwo:[1 TO 100])

I am getting SyntaxError, but the same syntax works with normal join. any 
ideas?

Also it seems currently pjoin supports only int keys, can you please update 
pjoin to allow for longs

-
Kranti



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715944#comment-13715944
 ] 

Kranti Parisa commented on SOLR-4787:
-

I did review the PostFilterJoinQParserPlugin code and found that it was 
expecting fromCore instead of fromIndex (like in normal Join)

after trying fromCore now getting the expected error related to INT keys, as 
my keys are Long.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-28 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695802#comment-13695802
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Yes, sort implementation would be costly, I did review the code. 
I think having the option for MIN/MAX would help in few cases, and if we pass 
that as null then we are same as with current implementation.

Having LongLongOpenHashMap would really help.

-
Kranti

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-27 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694851#comment-13694851
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

I am able to get the fields from the other core, pretty cool. (there are minor 
gaps in the README.txt and ant build scripts to include joins, hpcc jar files)

Seems we have limitation of returning only INTs, can we support LONGs? means 
can we change the implementation to use LongLongOpenHashMap?

- Kranti

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-27 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694902#comment-13694902
 ] 

Kranti Parisa commented on SOLR-4787:
-

if we have the following case

PARENT CORE

ID Field
--
p1
p2
p3

CHILD CORE

ID Field  Parent ID field   ValueToFetch Field
-
c1  p1 1234
c2  p1 3456


Then ValueToFetch will be the last one (in this case: 3456)

May be we should think of doing 

1. Allow sort parameter to the vjoin function

2. consider the Sort when executing the query

3. for each PARENT ID, pick up the first ValueToFetch and ignore the rest (as 
we are specifying the sort preference, we are saying to collect the top 
document)

4. sort param could have multiple values like general solr sort

so vjoin will look like 
vjoin(joinCore, foreignKey, foreignVal, primaryKey, query, $vSort)

vSort=field1 desc, field2 asc

5. if no sort param is specified then current implementation works (picking the 
the value from the last document)

Joel, your ideas? 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695221#comment-13695221
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

Glad it's working for you. The one-to-many join is tricky as it is in a 
relational database.

The sort idea has some implementation problems. Currently the query on the 
fromCore only collects the BitSet containing the matching docs. So the normal 
sorting collectors aren't used here. Putting them in play will cause 
scalability issues.

One thing we could do is either take the MIN or MAX value. There would be a 
performance hit here as well but not nearly as much as the sort approach. The 
syntax could be 

vjoin(fromCore, fromKey, fromValue, toKey, query, MIN|MAX)

I have no problems switching to the LongLongOpenHashMap.

The next thing I planned to do on this ticket is support both integers and 
longs.

Joel




 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-25 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693215#comment-13693215
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

The latest patch will apply cleanly on Solr 4x. The README.txt file has the 
configuration steps that are needed to get the vjoin working.

The pjoin is not covered in the README.txt but it will be shortly.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-25 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693218#comment-13693218
 ] 

Joel Bernstein commented on SOLR-4787:
--

Steven,

Thanks for the patch. I'll add this to the main patch this week. 

I'm currently working on a third, more scalable, join implementation.

If anyone knows of a good sparse bitset implementation please let me know. 

Joel 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-24 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691990#comment-13691990
 ] 

Kranti Parisa commented on SOLR-4787:
-

cool, thanks.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-23 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691612#comment-13691612
 ] 

Kranti Parisa commented on SOLR-4787:
-

I am trying to apply this patch on branch_4x code 
(http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x). but getting 
the following error, any idea?

wget https://issues.apache.org/jira/secure/attachment/12587067/SOLR-4787.patch 
-O - | patch -p0 --dry-run

--2013-06-23 17:35:32--  
https://issues.apache.org/jira/secure/attachment/12587067/SOLR-4787.patch
Resolving issues.apache.org... 140.211.11.121
Connecting to issues.apache.org|140.211.11.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26641 (26K) [text/x-patch]
Saving to: `STDOUT'

100%[===] 26,641   
143K/s   in 0.2s

2013-06-23 17:35:32 (143 KB/s) - written to stdout [26641/26641]

patching file solr/example/solr/collection1/conf/solrconfig.xml
Hunk #1 FAILED at 81.
Hunk #2 succeeded at 515 (offset -2 lines).
Hunk #3 succeeded at 564 (offset -2 lines).
Hunk #4 succeeded at 1535 (offset 6 lines).
Hunk #5 succeeded at 1548 (offset 6 lines).
Hunk #6 succeeded at 1788 (offset 6 lines).
Hunk #7 succeeded at 1803 (offset 6 lines).
1 out of 7 hunks FAILED -- saving rejects to file 
solr/example/solr/collection1/conf/solrconfig.xml.rej
patching file solr/example/exampledocs/mem.xml
patching file solr/contrib/joins/ivy.xml
patching file solr/contrib/joins/src/java/org/apache/solr/joins/CacheSet.java
patching file 
solr/contrib/joins/src/java/org/apache/solr/joins/SegmentBitSetCollector.java
patching file 
solr/contrib/joins/src/java/org/apache/solr/joins/PostFilterJoinQParserPlugin.java
patching file 
solr/contrib/joins/src/java/org/apache/solr/joins/ValueSourceJoinParserPlugin.java
patching file solr/contrib/joins/README.txt
patching file solr/contrib/joins/build.xml

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691661#comment-13691661
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

I'm going to remove the solrconfig.xml changes so the patch does not complain 
with 4.x branch. I'll make sure the README.txt explains the changes that need 
to be made in the solconfig.xml. I should be able to put this up tomorrow.



 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-07 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678038#comment-13678038
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Thanks for the information. I shall update the plugin and test the same. 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.ValueSourceJoinParserPlugin /

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13677149#comment-13677149
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. Java's array.sort() can sort a random array of integers much 
faster then the LongToInt hashmap impl used by vjoin.

For personalized relevance though this should be plenty because each user will 
have a custom set of data to join to. You don't want or need to have a 
relevance record for each document. If a document is not available in the 
relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.





 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(joinCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-03 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673478#comment-13673478
 ] 

Joel Bernstein commented on SOLR-4787:
--

Thanks for the recommendations, FastUtil looks great. I'm going to switch the 
JoinValueSourceParserPlugin2 over to use a hash join on long keys.
 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
 in the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *JoinValueSourceParserPlugin aka vjoin*
 The second implementation is the JoinValueSourceParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return values from a 
 second core based on join keys. This allows relevance data to be stored in a 
 separate core and then joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore.
 As with the pjoin, both the fromKey and toKey must be integers. Also like 
 the pjoin, the join SolrCache is used to hold the join memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.JoinValueSourceParserPlugin /
 *JoinValueSourceParserPlugin2 aka vjoin2 aka Personalized ValueSource Join*
 vjoin2 supports personalized ValueSource joins. The syntax is similar to 
 vjoin but adds an extra parameter so a query can be specified to join a 
 specific record set from the fromCore. This is designed to allow customer 
 specific relevance information to be added to the fromCore and then joined at 
 query time.
 Syntax:
 bf=vjoin2(fromCore,fromKey,fromVal,toKey,query)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-06-03 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673481#comment-13673481
 ] 

Kranti Parisa commented on SOLR-4787:
-

great, thanks Joel.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
 in the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *JoinValueSourceParserPlugin aka vjoin*
 The second implementation is the JoinValueSourceParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return values from a 
 second core based on join keys. This allows relevance data to be stored in a 
 separate core and then joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore.
 As with the pjoin, both the fromKey and toKey must be integers. Also like 
 the pjoin, the join SolrCache is used to hold the join memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.JoinValueSourceParserPlugin /
 *JoinValueSourceParserPlugin2 aka vjoin2 aka Personalized ValueSource Join*
 vjoin2 supports personalized ValueSource joins. The syntax is similar to 
 vjoin but adds an extra parameter so a query can be specified to join a 
 specific record set from the fromCore. This is designed to allow customer 
 specific relevance information to be added to the fromCore and then joined at 
 query time.
 Syntax:
 bf=vjoin2(fromCore,fromKey,fromVal,toKey,query)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-05-31 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671199#comment-13671199
 ] 

Dawid Weiss commented on SOLR-4787:
---

Pull a class or two in source code form from fastutil or from HPPC. These are 
nearly identical these days, fastutil has support for Java collections 
interfaces (HPPC has its own API not stemming from JUC). Both of these are 
equally fast.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
 in the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather then join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 queryParser name=pjoin 
 class=org.apache.solr.joins.PostFilterJoinQParserPlugin/
 And the join contrib jars must be registed in the solrconfig.xml.
 lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  cache name=join
   class=solr.LRUCache
   size=4096
   initialSize=1024
   /
 *JoinValueSourceParserPlugin aka vjoin*
 The second implementation is the JoinValueSourceParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return values from a 
 second core based on join keys. This allows relevance data to be stored in a 
 separate core and then joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore.
 As with the pjoin, both the fromKey and toKey must be integers. Also like 
 the pjoin, the join SolrCache is used to hold the join memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 valueSourceParser name=vjoin 
 class=org.apache.solr.joins.JoinValueSourceParserPlugin /
 *JoinValueSourceParserPlugin2 aka vjoin2 aka Personalized ValueSource Join*
 vjoin2 supports personalized ValueSource joins. The syntax is similar to 
 vjoin but adds an extra parameter so a query can be specified to join a 
 specific record set from the fromCore. This is designed to allow customer 
 specific relevance information to be added to the fromCore and then joined at 
 query time.
 Syntax:
 bf=vjoin2(fromCore,fromKey,fromVal,toKey,query)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  1   2   >