Re: SolrCloud and Joins

2013-07-31 Thread David Larochelle
Thanks, Walter. Existing media sets will rarely change, but new media sets will be added relatively frequently. (There is a many-to-many relationship between media sets and media sources.) Given the size of the data, a new media set that includes only 1% of the collection would cover 6 million rows.

SolrCloud and Joins

2013-07-29 Thread David Larochelle
I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en. We have a number of stories from different media, and we treat each sentence as a separate document because we need to run sentence level

Re: SolrCloud and Joins

2013-07-29 Thread Walter Underwood
Denormalize. Add media_set_id to each sentence document. Done. wunder On Jul 29, 2013, at 7:58 AM, David Larochelle wrote: I'm setting up SolrCloud with around 600 million documents. The basic structure of each document is: stories_id: integer, media_id: integer, sentence: text_en. We
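
For readers following along, here is a minimal sketch of what "denormalize" could look like before the documents are sent to Solr. It is plain Python with no Solr calls. The multi-valued media_set_id field, the sample ids, and the helper names invert_mapping and denormalize are assumptions for illustration, not something taken from the poster's schema.

```python
from collections import defaultdict


def invert_mapping(media_set_to_sources):
    """Turn {media_set_id: [media_id, ...]} into {media_id: [media_set_id, ...]}."""
    source_to_sets = defaultdict(set)
    for media_set_id, media_ids in media_set_to_sources.items():
        for media_id in media_ids:
            source_to_sets[media_id].add(media_set_id)
    # Sort so the output is deterministic.
    return {media_id: sorted(sets) for media_id, sets in source_to_sets.items()}


def denormalize(sentence_doc, source_to_sets):
    """Return a copy of the sentence document with its media set ids attached."""
    doc = dict(sentence_doc)
    # Assumption: media_set_id is a multi-valued field, because one source
    # can belong to several media sets.
    doc["media_set_id"] = source_to_sets.get(doc["media_id"], [])
    return doc


# Hypothetical sample data: two media sets that share source 11.
media_set_to_sources = {1: [10, 11], 2: [11, 12]}
source_to_sets = invert_mapping(media_set_to_sources)
doc = denormalize(
    {"stories_id": 500, "media_id": 11, "sentence": "Example sentence."},
    source_to_sets,
)
```

With the set ids stored on each sentence, a filter on media_set_id replaces the join at query time. The cost is that changing the mapping means touching every affected sentence document.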

Re: SolrCloud and Joins

2013-07-29 Thread David Larochelle
We'd like to be able to easily update the media-set-to-source mapping. I'm concerned that if we store the media_sets_id in the sentence documents, it will be very difficult to add additional media-set-to-source mappings. I imagine that adding a new media set would either require reimporting all 600

Re: SolrCloud and Joins

2013-07-29 Thread Walter Underwood
A join may seem clean, but it will be slow and (currently) doesn't work in a cluster. You find all the sentences in a media set by searching for that set id and requesting only the sentence_id (yes, you need that). Then you reindex them. With small documents like this, it is probably fairly
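
A rough sketch of the two steps described in this message: build a query that returns only sentence_id for one media set, then prepare update documents for those ids. The base URL, the collection name, and the helper names are hypothetical. The update step uses Solr's atomic-update "add" modifier, which is a swapped-in technique rather than the full reindex the message mentions, and it assumes sentence_id is the uniqueKey and that all fields are stored.

```python
import json
from urllib.parse import urlencode


def select_url(base_url, field, value, rows=1000):
    """Build a select URL that asks only for the sentence_id field."""
    params = {
        "q": f"{field}:{value}",
        "fl": "sentence_id",  # request only the id, as the message suggests
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


def atomic_add_updates(sentence_ids, new_media_set_id):
    """Build atomic-update documents that add one media set id to each sentence."""
    return [
        {"sentence_id": sid, "media_set_id": {"add": new_media_set_id}}
        for sid in sentence_ids
    ]


# Hypothetical host and collection name.
base = "http://localhost:8983/solr/collection1"
url = select_url(base, "media_set_id", 7)

# Pretend these ids came back from the query above.
updates = atomic_add_updates([101, 102], 9)
payload = json.dumps(updates)  # body for a POST to the update handler
```

The same select_url helper works with media_id as the field when the sentences to tag belong to a source that is joining a new set.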