Hello,

I’ve received user feedback regarding collocated join
<https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections>:
> The documents in the system are split into two collections with a 1:1
> relationship: one holds the text plus frequently used metadata, the
> other holds the extended metadata.
> We expect about 40 million documents, usually split into 8 shards; the
> average document, including its extended metadata, is about 2KB.
> That is, without sharding such an index would be very roughly 80GB
> (40M × 2KB), but with 8 shards each shard is only about 10GB 😎
> This helps mmap a lot when searching the main collection (the impact is
> hard to quantify, but it's unlikely to have gotten worse 😅)
> All JOIN queries are expected to get faster almost in proportion to the
> number of shards (they used to take up to a minute, now up to a dozen
> seconds).
> As a bonus, indexing is also faster (we didn't measure it on real data,
> since it wasn't a problem area, but network monitoring shows much lower
> traffic).
> It's been in commercial operation for over a year, by the way 😉

Is anyone else using it in production?

On Thu, Jun 1, 2023 at 12:19 PM Mikhail Khludnev <[email protected]> wrote:
> Hello,
> I think I'm done with the code and tests. This feature allows joining
> collections with multiple shards on both sides for the sake of
> scalability.
> It requires me to introduce AffinityPlacementPlugin.withCollectionShards,
> which I'm quite uncertain about and where I need good advice.
> https://issues.apache.org/jira/browse/SOLR-16717
> https://github.com/apache/solr/pull/1550/
>
> Thanks!
> --
> Sincerely yours
> Mikhail Khludnev
>

--
Sincerely yours
Mikhail Khludnev
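To illustrate why a join over two sharded collections can speed up roughly with the shard count, here is a minimal Python sketch of a co-partitioned ("collocated") join. It assumes both collections have the same number of shards and route documents by the same function of the join key, so matching documents always land in the same shard number. The names `shard_for`, `collocated_join`, `doc_id` and `NUM_SHARDS` are hypothetical, and `zlib.crc32` is only a stand-in for Solr's compositeId hashing (MurmurHash3 over hash ranges). This is not Solr's implementation; the actual requirements are in SOLR-16717 and the linked reference guide section.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 8  # hypothetical: both collections use the same shard count


def shard_for(route_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stand-in router: hash the route key and map it to a shard number.

    crc32 modulo is NOT Solr's compositeId routing; it only shows that both
    collections must share one routing function for the join to stay local.
    """
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


def partition(docs, key_field):
    """Group documents by the shard their join key routes to."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc[key_field])].append(doc)
    return shards


def local_join(from_docs, to_docs, from_field, to_field):
    """Return 'to' docs whose to_field matches a from_field value in from_docs."""
    keys = {d[from_field] for d in from_docs}
    return [d for d in to_docs if d[to_field] in keys]


def collocated_join(from_docs, to_docs, from_field, to_field):
    """Join shard i of 'from' only with shard i of 'to'; nothing crosses shards."""
    from_shards = partition(from_docs, from_field)
    to_shards = partition(to_docs, to_field)
    result = []
    for shard in range(NUM_SHARDS):
        # dict.get does not trigger the defaultdict factory
        result.extend(local_join(from_shards.get(shard, []),
                                 to_shards.get(shard, []),
                                 from_field, to_field))
    return result


if __name__ == "__main__":
    # Hypothetical 1:1 data: main collection keyed by "id",
    # extended metadata keyed by "doc_id"
    main = [{"id": f"doc{i}"} for i in range(100)]
    ext = [{"doc_id": f"doc{i}", "lang": "en" if i % 3 == 0 else "de"}
           for i in range(100)]
    matching = [d for d in ext if d["lang"] == "en"]
    hits = collocated_join(matching, main, "doc_id", "id")
    print(len(hits))  # 34
```

Because each shard pair is joined independently, the per-shard joins can run in parallel, ideally on nodes that host both matching shards (presumably what placing shards with AffinityPlacementPlugin.withCollectionShards is meant to ensure), which fits the near-proportional speedup described in the feedback.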
