Re: distributed search limitations via SolrCloud
5.x will still build a war file that you can deploy on Tomcat. But support for that is going away eventually, certainly by 6.0, so you do have to make the decision sometime before 6.0 at least.

Best,
Erick

On Wed, May 27, 2015 at 1:24 PM, Vishal Swaroop vishal@gmail.com wrote:
Thanks a lot Erick... great inputs... Currently our deployment is on Tomcat 7, and I think SOLR 5.x does not support Tomcat but runs on its own Jetty server, right? I will discuss this with the team. Thanks again.

Regards
Vishal
distributed search limitations via SolrCloud
Currently, we have SOLR configured on a single Linux server (24 GB physical memory) with multiple cores. We are using SOLR joins (https://wiki.apache.org/solr/Join) across cores on this single server.

But as our data will grow to ~2 billion, we need to assess whether we'll need to run SolrCloud, given that "In a DistributedSearch environment, you can not Join across cores on multiple nodes."

At what point or index size should we start considering SolrCloud?

Regards
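For readers unfamiliar with the cross-core join described above, here is a minimal sketch of what such a query can look like when built from Python. The `{!join from=... to=... fromIndex=...}` local-params syntax is Solr's join query parser; the core names (`products`, `manufacturers`), the field names, and the host and port are all hypothetical and not taken from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def cross_core_join_params(from_field, to_field, from_core, inner_query):
    """Build request parameters for a Solr join whose inner query runs on another core."""
    join_q = (
        "{!join from=" + from_field
        + " to=" + to_field
        + " fromIndex=" + from_core + "}"
        + inner_query
    )
    return {"q": join_q, "wt": "json"}


# Hypothetical schema: products.id matches manufacturers.manu_id.
params = cross_core_join_params("manu_id", "id", "manufacturers", "country:US")

# Hypothetical single-node URL; both cores must live on this same node.
url = "http://localhost:8983/solr/products/select?" + urlencode(params)

# Round-trip check that URL encoding preserved the join expression.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```

The request goes to the "to" core (`products` here), and `fromIndex` names the other core on the same node. That same-node requirement is the restriction the question is worried about once the index is spread across multiple nodes.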
Re: distributed search limitations via SolrCloud
Hard to say. I've seen 20M docs be the place you need to consider sharding/SolrCloud. I've seen 300M docs be the place you need to start sharding. That said, I'm quite sure you'll need to shard before you get to 2B. There's no good reason to delay that process.

You'll have to do something about the join issue though; that's the problem you might want to solve first. The new streaming aggregation stuff might help there, you'll have to figure that out. The first thing I'd explore is whether you can denormalize your way out of the need to join, or whether you can use block joins instead.

Best,
Erick
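As a rough illustration of the "denormalize your way out of the need to join" suggestion, the sketch below copies the fields you would otherwise join on into each document before indexing, so a plain filter on one collection replaces the join. The document shapes, field names, and the `denormalize` helper are hypothetical, and this is plain Python run at indexing time, not a Solr API.

```python
def denormalize(children, parents, key, fields, prefix="manu_"):
    """Return copies of `children` with selected parent fields merged in.

    `key` is the child field holding the parent's id. Children whose
    parent is missing are passed through unchanged.
    """
    parent_by_id = {p["id"]: p for p in parents}
    flattened = []
    for child in children:
        doc = dict(child)  # copy so the input documents are not mutated
        parent = parent_by_id.get(child.get(key))
        if parent is not None:
            for field in fields:
                if field in parent:
                    doc[prefix + field] = parent[field]
        flattened.append(doc)
    return flattened


# Hypothetical data standing in for two cores that are currently joined.
manufacturers = [{"id": "m1", "name": "Acme", "country": "US"}]
products = [
    {"id": "p1", "name": "Widget", "manu_id": "m1"},
    {"id": "p2", "name": "Gadget", "manu_id": "m9"},  # no matching parent
]

flat_docs = denormalize(products, manufacturers, "manu_id", ["country"])
print(flat_docs)
```

With documents flattened this way, a query such as `manu_country:US` needs no join and so works across shards. The trade-off is that a change to a parent record means reindexing every child that copied its fields.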
Re: distributed search limitations via SolrCloud
Thanks a lot Erick... You are right, we should not delay moving to sharding/SolrCloud.

Since you are all experts: we are currently using SOLR 4.7. Do you suggest we move to the latest SOLR release, 5.1.0, or can we manage the above issue using SOLR 4.7?

Regards
Vishal
Re: distributed search limitations via SolrCloud
I'd move to Solr 4.10.3 at least, but preferably Solr 5.x. Solr 5.2 is being readied for release as we speak; it'll probably be available in a week or so barring unforeseen problems, and that's the one I'd go with by preference.

Do be aware, though, that the 5.x Solr world deprecates using a war file. It's still actually produced, but Solr is moving towards start scripts instead. This is something new to get used to. See: https://wiki.apache.org/solr/WhyNoWar

Best,
Erick