[ https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150589#comment-14150589 ]
Jack Krupansky edited comment on SOLR-6568 at 9/27/14 1:53 PM:
---------------------------------------------------------------

This sounds quite interesting, but... it's tagged as "minor", so... what's the catch or limitation that prevents this from being a "major"? Does it work well or at all for indexes that are not 100% memory resident? What about SSD? Does it only work with "integer" join keys? Is that a restriction that could be relaxed? Or possibly have two parallel components, one that is super fast for integer keys and only reasonably fast for non-integer keys. Might it be possible to build an off-heap map from non-integer key to a temporary integer key?


was (Author: jkrupan):
This sounds quite interesting, but... it's tagged as "minor", so... what's the catch or limitation that prevents this from being a "major"? Does it well well or at all for indexes that are not 100% memory resident? What about SSD? Does it only work with "integer" join keys? Is that a restriction that could be relaxed? Or possibly have two parallel components, one that is super fast for integer keys and only reasonably fast for non-integer keys. Might it be possible to build an off-heap map from non-integer key to a temporary integer key?

> Join Discovery Contrib
> ----------------------
>
>                 Key: SOLR-6568
>                 URL: https://issues.apache.org/jira/browse/SOLR-6568
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>
> This contribution was commissioned by the *NCBI* (National Center for
> Biotechnology Information).
> The Join Discovery Contrib is a set of Solr plugins that support large scale
> joins and "join facets" between Solr cores.
> There are two different Join implementations included in this contribution.
> Both implementations are designed to work with integer join keys. It is very
> common in large bioinformatics and genomics databases to use integer primary
> and foreign keys.
> Integer keys allow bioinformatics and genomics search engines and discovery
> tools to perform complex operations on large data sets very efficiently.
> The Join Discovery Contrib provides features that will be applicable to
> anyone working with the freely available databases from the NCBI and likely a
> large number of other bioinformatics and genomics databases. These features
> are not specific to bioinformatics and genomics, though; they can be used
> with any dataset where integer keys define the primary and foreign keys.
> What is included in this contrib:
> 1) A new JoinComponent. This component is used instead of the standard
> QueryComponent. It facilitates very large scale relational joins between two
> Solr indexes (cores). The join algorithm used in this component is known as a
> *parallel partitioned merge join*. This algorithm partitions the results
> from both sides of the join and then sorts and merges the partitions in
> parallel.
> Below are some of its features:
> * Sub-second performance on very large joins. The parallel join algorithm is
> capable of sub-second performance on joins with tens of millions of records
> on both sides of the join.
> * The JoinComponent returns "tuples" with fields from both sides of the join.
> The initial release returns the primary keys from both sides of the join and
> the join key.
> * The tuples also include, and are ranked by, a combined score from both
> sides of the join.
> * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This
> makes it possible to join an entire index with a subset of another index
> with sub-second performance.
> * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast
> many-to-many joins make it possible to join between indexes on multi-value
> fields.
> 2) A new JoinFacetComponent. This component provides facets for both indexes
> involved in the join.
> 3) The BitSetJoinQParserPlugin.
> A very fast parallel filter join based on bitsets that supports unlimited
> levels of nesting. It can be used as a filter query in combination with the
> JoinComponent or with the standard query component.
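
For readers unfamiliar with the term, the "parallel partitioned merge join" named in the issue description might be sketched roughly as below. This is only an illustration under stated assumptions, not the contrib's code: the class `PartitionedMergeJoinSketch` and its methods are hypothetical, the two sides are plain `int[]` arrays of join keys rather than Solr result sets, and partitioning by key modulo the partition count is an assumed scheme. Scores, primary keys and ranking are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch; not the Join Discovery Contrib's implementation.
final class PartitionedMergeJoinSketch {

    /** Returns one {leftKey, rightKey} pair per match; duplicate keys yield a cross product. */
    static List<int[]> join(int[] left, int[] right, int partitions) {
        int[][] l = partition(left, partitions);
        int[][] r = partition(right, partitions);
        // Equal keys always land in the same partition number, so each
        // pair of partitions can be sorted and merged independently.
        return IntStream.range(0, partitions)
                .parallel()
                .mapToObj(p -> mergeJoin(l[p], r[p]))
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Assumed scheme: bucket = key mod n (floorMod keeps negative keys in range).
    private static int[][] partition(int[] keys, int n) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (int k : keys) buckets.get(Math.floorMod(k, n)).add(k);
        int[][] out = new int[n][];
        for (int i = 0; i < n; i++) {
            out[i] = buckets.get(i).stream().mapToInt(Integer::intValue).toArray();
        }
        return out;
    }

    // Sort both partitions, then advance two cursors in step.
    private static List<int[]> mergeJoin(int[] a, int[] b) {
        Arrays.sort(a);
        Arrays.sort(b);
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                int key = a[i];
                int iEnd = i, jEnd = j;
                while (iEnd < a.length && a[iEnd] == key) iEnd++;
                while (jEnd < b.length && b[jEnd] == key) jEnd++;
                // Emit the cross product of the two runs of equal keys
                // (this is what makes many-to-many joins work).
                for (int x = i; x < iEnd; x++) {
                    for (int y = j; y < jEnd; y++) {
                        out.add(new int[] {a[x], b[y]});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Calling `PartitionedMergeJoinSketch.join(leftKeys, rightKeys, 8)` would join the two key arrays using eight partitions. The sketch also hints at why integer keys matter here: partitioning, sorting and comparing primitive `int` values is cheap compared with doing the same over strings.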