[jira] [Commented] (SOLR-6568) Join Discovery Contrib

2016-03-03 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178638#comment-15178638
 ] 

Joel Bernstein commented on SOLR-6568:
--

Yes, the distributed joins in the Streaming API have superseded this ticket. 

> Join Discovery Contrib
> --
>
> Key: SOLR-6568
> URL: https://issues.apache.org/jira/browse/SOLR-6568
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 5.0
>
>
> This contribution was commissioned by the *NCBI* (National Center for 
> Biotechnology Information). 
> The Join Discovery Contrib is a set of Solr plugins that support large-scale 
> joins and "join facets" between Solr cores. 
> There are two different Join implementations included in this contribution. 
> Both implementations are designed to work with integer join keys. It is very 
> common in large bioinformatic and genomic databases to use integer primary 
> and foreign keys. Integer keys allow bioinformatic and genomic search engines 
> and discovery tools to perform complex operations on large data sets very 
> efficiently. 
> The Join Discovery Contrib provides features that will be applicable to 
> anyone working with the freely available databases from the NCBI and likely a 
> large number of other bioinformatic and genomic databases. These features are 
> not specific to bioinformatics and genomics, though; they can be used with any 
> dataset where integer keys define the primary and foreign keys.
> What is included in this contrib:
> 1) A new JoinComponent. This component is used instead of the standard 
> QueryComponent. It facilitates very large scale relational joins between two 
> Solr indexes (cores). The join algorithm used in this component is known as a 
> *parallel partitioned merge join*. This is an algorithm which partitions the 
> results from both sides of the join and then sorts and merges the partitions 
> in parallel. 
>  Below are some of its features:
> * Sub-second performance on very large joins. The parallel join algorithm is 
> capable of sub-second performance on joins with tens of millions of records 
> on both sides of the join.
> * The JoinComponent returns "tuples" with fields from both sides of the join. 
> The initial release returns the primary keys from both sides of the join and 
> the join key. 
> * The tuples also include, and are ranked by, a combined score from both 
> sides of the join.
> * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This 
> makes it possible to join an entire index with a sub-set of another index 
> with sub-second performance. 
> * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast 
> many-to-many joins make it possible to join between indexes on multi-value 
> fields. 
> 2) A new JoinFacetComponent. This component provides facets for both indexes 
> involved in the join. 
> 3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on 
> bitsets that supports arbitrarily deep nesting. It can be used as a filter 
> query in combination with the JoinComponent or with the standard query 
> component. 
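
For readers unfamiliar with the term, the "parallel partitioned merge join" described above can be sketched roughly as follows. This is only an illustration of the general technique in plain Python, not the JoinComponent's actual code. The function names, the (join_key, primary_key, score) row layout, the modulo partitioning and the use of a sum as the "combined score" are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter


def partition(rows, num_partitions):
    """Split (join_key, primary_key, score) rows by join_key modulo num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row[0] % num_partitions].append(row)
    return parts


def merge_join(left, right):
    """Sort both partitions on the integer join key, then merge them in one pass."""
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side (covers many-to-many joins).
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lpk, lscore in left[i:i_end]:
                for _, rpk, rscore in right[j:j_end]:
                    # Assumption: the combined score is the sum of both scores.
                    out.append((lpk, rpk, lk, lscore + rscore))
            i, j = i_end, j_end
    return out


def parallel_partitioned_merge_join(left_rows, right_rows, num_partitions=4):
    """Partition both sides the same way, join each pair of partitions in a
    worker, then rank all tuples by combined score (highest first)."""
    left_parts = partition(left_rows, num_partitions)
    right_parts = partition(right_rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(merge_join, left_parts, right_parts)
    tuples = [t for part in results for t in part]
    # Ties are broken by the primary keys so the output order is deterministic.
    tuples.sort(key=lambda t: (-t[3], t[0], t[1]))
    return tuples
```

Because both sides are partitioned with the same function, matching keys always land in the same partition pair, so each pair can be sorted and merged independently. In CPython the thread pool only shows the structure, since CPU-bound threads do not run in parallel there.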



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6568) Join Discovery Contrib

2016-03-02 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177046#comment-15177046
 ] 

Otis Gospodnetic commented on SOLR-6568:


Hey [~joel.bernstein], has this been superseded by other joins?




[jira] [Commented] (SOLR-6568) Join Discovery Contrib

2015-04-08 Thread Tom Winch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485281#comment-14485281
 ] 

Tom Winch commented on SOLR-6568:
-

See also the new external source join, xjoin, at 
https://issues.apache.org/jira/browse/SOLR-7341




[jira] [Commented] (SOLR-6568) Join Discovery Contrib

2014-09-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150589#comment-14150589
 ] 

Jack Krupansky commented on SOLR-6568:
--

This sounds quite interesting, but... it's tagged as minor, so... what's the 
catch or limitation that prevents this from being a major?

Does it work well or at all for indexes that are not 100% memory resident? What 
about SSD?

Does it only work with integer join keys? Is that a restriction that could be 
relaxed? Or could there be two parallel components, one that is super fast for 
integer keys and one that is only reasonably fast for non-integer keys? Might it 
be possible to build an off-heap map from non-integer key to a temporary integer 
key?
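
To make the last idea concrete, here is a minimal sketch of mapping non-integer keys to temporary integer keys. It uses an ordinary in-memory Python dict rather than an off-heap structure, and the KeyEncoder class, the encode_rows helper and the sample keys are all hypothetical names invented for this example. Nothing here is taken from the contrib itself.

```python
class KeyEncoder:
    """Assigns a dense temporary integer id to each distinct non-integer key."""

    def __init__(self):
        self._ids = {}

    def encode(self, key):
        # Return the id already assigned to this key, or assign the next
        # unused one. len() is evaluated before the key is inserted.
        return self._ids.setdefault(key, len(self._ids))


def encode_rows(rows, encoder):
    """Replace the string join key in (join_key, primary_key) rows with its id."""
    return [(encoder.encode(key), pk) for key, pk in rows]


# The same encoder must be used for both sides of the join so that equal
# string keys receive the same temporary integer key.
encoder = KeyEncoder()
left = encode_rows([("gene-a", "doc1"), ("gene-b", "doc2")], encoder)
right = encode_rows([("gene-b", "doc9"), ("gene-c", "doc8")], encoder)
```

The ids are only meaningful for the lifetime of the encoder, which is why they are "temporary": they would have to be rebuilt, or persisted alongside the index, whenever the set of keys changes.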

