Re: distributed search limitations via SolrCloud

2015-05-28 Thread Erick Erickson
5.x will still build a war file that you can deploy on Tomcat. However,
support for that is going away eventually, certainly by 6.0, so you
do have to make the decision sometime before 6.0 at least.

Best,
Erick

On Wed, May 27, 2015 at 1:24 PM, Vishal Swaroop vishal@gmail.com wrote:
 Thanks a lot Erick... great input...

 Currently our deployment is on Tomcat 7, and I think Solr 5.x does not
 support Tomcat but runs on its own Jetty server, right?
 I will discuss this with the team.

 Thanks again.

 Regards
 Vishal

 On Wed, May 27, 2015 at 4:16 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 I'd move to Solr 4.10.3 at least, but preferably Solr 5.x. Solr 5.2 is
 being readied for release as we speak; it'll probably be available in
 a week or so, barring unforeseen problems, and that's the one I'd go
 with by preference.

 Do be aware, though, that the 5.x Solr world deprecates using a war
 file. It's still actually produced, but Solr is moving towards start
 scripts instead. This is something new to get used to. See:
 https://wiki.apache.org/solr/WhyNoWar

 Best,
 Erick

 On Wed, May 27, 2015 at 12:51 PM, Vishal Swaroop vishal@gmail.com
 wrote:
  Thanks a lot Erick... You are right, we should not delay moving to
  sharding/SolrCloud.
  
  As you are all experts... we are currently using Solr 4.7. Do you suggest
  we move to the latest Solr release, 5.1.0, or can we manage the above
  issue using Solr 4.7?
 
  Regards
  Vishal
 
  On Wed, May 27, 2015 at 2:21 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
  Hard to say. I've seen 20M docs be the point where you need to consider
  sharding/SolrCloud. I've seen 300M docs be the point where you need to start
  sharding. That said, I'm quite sure you'll need to shard before you get
  to 2B. There's no good reason to delay that process.
  
  You'll have to do something about the join issue, though; that's the
  problem you might want to solve first. The new streaming aggregation
  stuff might help there; you'll have to figure that out.
  
  The first thing I'd explore is whether you can denormalize your way
  out of the need to join, or whether you can use block joins instead.
 
  Best,
  Erick
 
  On Wed, May 27, 2015 at 11:15 AM, Vishal Swaroop vishal@gmail.com
  wrote:
   Currently, we have Solr configured on a single Linux server (24 GB
   physical memory) with multiple cores.
   We are using Solr joins (https://wiki.apache.org/solr/Join) across
   cores on this single server.
   
   But as our data grows to ~2 billion, we need to assess whether we'll
   need to run SolrCloud, because "In a DistributedSearch environment,
   you can not Join across cores on multiple nodes."
   
   Please suggest at what point, or at what index size, we should start
   considering running SolrCloud.
   
   Regards
 



distributed search limitations via SolrCloud

2015-05-27 Thread Vishal Swaroop
Currently, we have Solr configured on a single Linux server (24 GB physical
memory) with multiple cores.
We are using Solr joins (https://wiki.apache.org/solr/Join) across cores on
this single server.

But as our data grows to ~2 billion, we need to assess whether we'll need
to run SolrCloud, because "In a DistributedSearch environment, you can not
Join across cores on multiple nodes."
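For reference, the single-node cross-core join described above is a `{!join}` query with `fromIndex` naming the other core. A minimal Python sketch that only builds the request URL; the host, the core names (`customers`, `orders`) and the field names are hypothetical. The core named in `fromIndex` has to be served by the same Solr instance as the core being queried, which is why this kind of join does not carry over once the cores are spread across nodes.

```python
from urllib.parse import urlencode

# Hypothetical host; core and field names below are illustrative only.
SOLR_BASE = "http://localhost:8983/solr"


def cross_core_join_url(to_core, from_core, from_field, to_field, inner_query):
    """Build a /select URL that joins matches in `from_core` onto `to_core`.

    Uses the {!join} local-params syntax. `inner_query` runs against
    `from_core`; the `from_field` values of its hits are matched against
    `to_field` in `to_core`. Both cores must live in the same Solr instance.
    """
    q = "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_core, inner_query)
    params = {"q": q, "wt": "json"}
    return "%s/%s/select?%s" % (SOLR_BASE, to_core, urlencode(params))


# Customers who have at least one order for "widget" (hypothetical schema).
url = cross_core_join_url("customers", "orders", "customer_id", "id", "product:widget")
print(url)
```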

Please suggest at what point, or at what index size, we should start
considering running SolrCloud.

Regards


Re: distributed search limitations via SolrCloud

2015-05-27 Thread Erick Erickson
Hard to say. I've seen 20M docs be the point where you need to consider
sharding/SolrCloud. I've seen 300M docs be the point where you need to start
sharding. That said, I'm quite sure you'll need to shard before you get
to 2B. There's no good reason to delay that process.
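To put those numbers in perspective, a back-of-the-envelope shard count is the projected document count divided by whatever per-shard ceiling load-testing on your own hardware and queries shows is comfortable. The 20M and 300M figures are just the two data points mentioned above, not Solr limits. The one hard limit involved is Lucene's: a single index (one core or shard) tops out at just under 2^31, about 2.1 billion, documents, so ~2B in one core is already at the edge. A minimal Python sketch of the arithmetic:

```python
# Lucene's hard per-index limit is just under 2**31 documents (~2.1 billion);
# this rounded value is only used as a sanity check here.
LUCENE_MAX_DOCS_APPROX = 2**31 - 1


def shards_needed(total_docs, docs_per_shard):
    """Smallest shard count such that no shard holds more than docs_per_shard docs."""
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    if docs_per_shard > LUCENE_MAX_DOCS_APPROX:
        raise ValueError("one Lucene index cannot hold that many documents")
    # Integer ceiling division avoids floating-point rounding at large counts.
    return -(-total_docs // docs_per_shard)


projected = 2_000_000_000  # the ~2 billion figure from the question
for per_shard in (20_000_000, 300_000_000):  # the two data points above
    print(f"{per_shard:,} docs/shard -> {shards_needed(projected, per_shard)} shards")
```

The arithmetic is the easy part; the per-shard number is what has to be measured.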

You'll have to do something about the join issue, though; that's the
problem you might want to solve first. The new streaming aggregation
stuff might help there; you'll have to figure that out.
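One way streaming could help with joins is that two result sets sorted on the join key can be combined in a single pass, without both sides having to live in one core. As a rough illustration of that idea only (a generic sort-merge join in Python, not Solr's streaming API), two key-sorted lists stand in for sorted result streams; the field names and data are hypothetical:

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner-join two iterables of dicts, each already sorted ascending on `key`.

    Each side is read once; only one key group per side is held at a time.
    """
    left_groups = groupby(left, key=itemgetter(key))
    right_groups = groupby(right, key=itemgetter(key))
    lk, lgrp = next(left_groups, (None, None))
    rk, rgrp = next(right_groups, (None, None))
    while lgrp is not None and rgrp is not None:
        if lk < rk:
            lk, lgrp = next(left_groups, (None, None))
        elif lk > rk:
            rk, rgrp = next(right_groups, (None, None))
        else:
            rows = list(rgrp)  # materialize the right-hand group so it can be reused
            for l in lgrp:
                for r in rows:
                    yield {**l, **r}
            lk, lgrp = next(left_groups, (None, None))
            rk, rgrp = next(right_groups, (None, None))


# Both inputs are sorted on customer_id, as sorted streams would be.
orders = [
    {"customer_id": 1, "product": "widget"},
    {"customer_id": 1, "product": "gadget"},
    {"customer_id": 3, "product": "widget"},
]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Cy"},
]
for row in merge_join(orders, customers, "customer_id"):
    print(row)
```

The sort order on the join key is what makes the single pass possible; without it, one side would have to be buffered in full.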

The first thing I'd explore is whether you can denormalize your way
out of the need to join, or whether you can use block joins instead.
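For the block-join route, each parent is indexed in one block together with its children, and the `{!parent}` query parser maps matching children back to their parents. Because a whole block is routed to a single shard, this kind of join keeps working when the index is sharded, unlike a cross-core join. A minimal Python sketch that only builds the JSON update body (children nested under the `_childDocuments_` key) and the query; the `doc_type` field, the customer/order data and the ids are hypothetical. Denormalizing, that is copying the needed customer fields onto every order document, avoids even this, at the cost of a larger index and re-indexing whenever the copied fields change.

```python
import json

# Hypothetical schema: doc_type distinguishes parents from children.
PARENT_FILTER = "doc_type:parent"


def customer_block(customer_id, name, products):
    """One parent (customer) with its children (orders) nested in the same block."""
    return {
        "id": f"cust-{customer_id}",
        "doc_type": "parent",
        "name": name,
        "_childDocuments_": [
            {"id": f"cust-{customer_id}-order-{i}", "doc_type": "child", "product": p}
            for i, p in enumerate(products)
        ],
    }


# Body for a POST to the collection's /update handler (Content-Type: application/json).
payload = json.dumps([customer_block(1, "Ann", ["widget", "gadget"])], indent=2)

# Parents with at least one child matching product:widget. The `which` filter
# must match every parent document and no child documents.
params = {"q": '{!parent which="%s"}product:widget' % PARENT_FILTER, "wt": "json"}

print(payload)
print(params["q"])
```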

Best,
Erick


Re: distributed search limitations via SolrCloud

2015-05-27 Thread Vishal Swaroop
Thanks a lot Erick... You are right, we should not delay moving to
sharding/SolrCloud.

As you are all experts... we are currently using Solr 4.7. Do you suggest
we move to the latest Solr release, 5.1.0, or can we manage the above
issue using Solr 4.7?

Regards
Vishal



Re: distributed search limitations via SolrCloud

2015-05-27 Thread Erick Erickson
I'd move to Solr 4.10.3 at least, but preferably Solr 5.x. Solr 5.2 is
being readied for release as we speak; it'll probably be available in
a week or so, barring unforeseen problems, and that's the one I'd go
with by preference.

Do be aware, though, that the 5.x Solr world deprecates using a war
file. It's still actually produced, but Solr is moving towards start
scripts instead. This is something new to get used to. See:
https://wiki.apache.org/solr/WhyNoWar
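With start scripts, a SolrCloud node is launched with something like `bin/solr start` rather than by dropping a war into Tomcat. A minimal Python sketch that just assembles that command line for one node; the install path, port, ZooKeeper addresses and heap size are placeholders, and the flags (`-c` for SolrCloud mode, `-p` port, `-z` ZooKeeper connect string, `-m` heap) are assumed to be the 5.x `bin/solr` options, so verify them with `bin/solr start -help` on your version.

```python
import subprocess

# Placeholder values (assumptions); adjust for your installation.
SOLR_SCRIPT = "/opt/solr/bin/solr"
ZK_HOST = "zk1:2181,zk2:2181,zk3:2181"


def solr_start_cmd(port=8983, heap="4g", zk_host=ZK_HOST):
    """Build the argv that starts one SolrCloud node via the 5.x start script."""
    return [
        SOLR_SCRIPT, "start",
        "-c",              # SolrCloud mode
        "-p", str(port),   # port for this node's bundled Jetty
        "-z", zk_host,     # external ZooKeeper ensemble
        "-m", heap,        # JVM heap size (min and max)
    ]


def start_node(**kwargs):
    """Actually launch the node; raises CalledProcessError if the script fails."""
    subprocess.run(solr_start_cmd(**kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(solr_start_cmd()))
```

Keeping the argv in one function makes it easy to start several nodes on one test box by varying only the port.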

Best,
Erick
