[jira] [Comment Edited] (SOLR-6568) Join Discovery Contrib

Jack Krupansky (JIRA) Sat, 27 Sep 2014 06:55:21 -0700

    [ https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150589#comment-14150589 ]


Jack Krupansky edited comment on SOLR-6568 at 9/27/14 1:53 PM:
---------------------------------------------------------------

This sounds quite interesting, but... it's tagged as "minor", so... what's the 
catch or limitation that prevents this from being a "major"?

Does it work well or at all for indexes that are not 100% memory resident? What 
about SSD?

Does it only work with "integer" join keys? Is that a restriction that could be 
relaxed? Or could there be two parallel components, one that is super fast for 
integer keys and one that is only reasonably fast for non-integer keys? Might 
it be possible to build an off-heap map from non-integer key to a temporary 
integer key?
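
To make the last question concrete, here is a minimal sketch of mapping 
non-integer keys to temporary integer keys. It is only an illustration of the 
idea, not anything from the contrib: the class name KeyDictionary and its 
methods are hypothetical, and it uses an ordinary on-heap Python dict rather 
than an off-heap structure.

```python
class KeyDictionary:
    """Hypothetical helper: assigns a dense temporary integer ID to each
    distinct non-integer key (on-heap here; off-heap storage is not shown)."""

    def __init__(self):
        self._ids = {}    # key -> temporary integer ID
        self._keys = []   # temporary integer ID -> key

    def encode(self, key):
        """Return the existing ID for key, or assign the next free one."""
        existing = self._ids.get(key)
        if existing is not None:
            return existing
        new_id = len(self._keys)
        self._ids[key] = new_id
        self._keys.append(key)
        return new_id

    def decode(self, key_id):
        """Map a temporary integer ID back to the original key."""
        return self._keys[key_id]
```

With such a map, string keys on both sides of a join could be encoded once 
and the integer-only join path reused, at the cost of building and holding 
the map.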



> Join Discovery Contrib
> ----------------------
>
>                 Key: SOLR-6568
>                 URL: https://issues.apache.org/jira/browse/SOLR-6568
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>
> This contribution was commissioned by the *NCBI* (National Center for 
> Biotechnology Information). 
> The Join Discovery Contrib is a set of Solr plugins that support large scale 
> joins and "join facets" between Solr cores. 
> There are two different Join implementations included in this contribution. 
> Both implementations are designed to work with integer join keys. It is very 
> common in large bioinformatic and genomic databases to use integer primary 
> and foreign keys. Integer keys allow bioinformatic and genomic search engines 
> and discovery tools to perform complex operations on large data sets very 
> efficiently. 
> The Join Discovery Contrib provides features that will be applicable to 
> anyone working with the freely available databases from the NCBI and likely a 
> large number of other bioinformatic and genomic databases. These features are 
> not specific to bioinformatics and genomics, though; they can be used with 
> any dataset where integer keys define the primary and foreign keys.
> What is included in this contrib:
> 1) A new JoinComponent. This component is used instead of the standard 
> QueryComponent. It facilitates very large scale relational joins between two 
> Solr indexes (cores). The join algorithm used in this component is known as a 
> *parallel partitioned merge join*. This is an algorithm which partitions the 
> results from both sides of the join and then sorts and merges the partitions 
> in parallel. 
> Below are some of its features:
> * Sub-second performance on very large joins. The parallel join algorithm is 
> capable of sub-second performance on joins with tens of millions of records 
> on both sides of the join.
> * The JoinComponent returns "tuples" with fields from both sides of the join. 
> The initial release returns the primary keys from both sides of the join and 
> the join key. 
> * The tuples also include, and are ranked by, a combined score from both 
> sides of the join.
> * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This 
> makes it possible to join an entire index with a sub-set of another index 
> with sub-second performance. 
> * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast 
> many-to-many joins make it possible to join between indexes on multi-value 
> fields. 
> 2) A new JoinFacetComponent. This component provides facets for both indexes 
> involved in the join. 
> 3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on 
> bitsets that supports infinite levels of nesting. It can be used as a filter 
> query in combination with the JoinComponent or with the standard query 
> component. 
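
The description of the *parallel partitioned merge join* above can be 
illustrated with a small sketch. This is not the contrib's code: the function 
names, the (join_key, primary_key) row layout, and the modulo partitioning 
scheme are all assumptions made for illustration. It partitions both sides by 
integer join key, then sorts and merges each pair of partitions in a worker 
pool, emitting (left primary key, right primary key, join key) tuples.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Bucket (join_key, primary_key) rows by join_key modulo num_partitions."""
    buckets = [[] for _ in range(num_partitions)]
    for join_key, primary_key in rows:
        buckets[join_key % num_partitions].append((join_key, primary_key))
    return buckets


def merge_join(left, right):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left)
    right = sorted(right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit every pairing,
            # which covers one-to-one, one-to-many and many-to-many matches.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, left_pk in left[i:i_end]:
                for _, right_pk in right[j:j_end]:
                    out.append((left_pk, right_pk, lk))
            i, j = i_end, j_end
    return out


def parallel_partitioned_merge_join(left_rows, right_rows, num_partitions=4):
    """Join two lists of (join_key, primary_key) rows partition by partition."""
    left_parts = partition(left_rows, num_partitions)
    right_parts = partition(right_rows, num_partitions)
    # Equal keys always land in the same partition index on both sides,
    # so each pair of partitions can be joined independently.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(merge_join, left_parts, right_parts)
    return [row for part in results for row in part]
```

Because equal join keys hash to the same partition on both sides, no 
cross-partition comparison is needed. In CPython the thread pool mainly shows 
the structure; it does not give real CPU parallelism for this kind of work.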



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
