Re: solr with Isilon HDFS

2015-12-05 Thread Gaurav Patel
Got it.  Thanks for the help Erick.



Re: solr with Isilon HDFS

2015-12-05 Thread Erick Erickson
bq:  Should zookeeper be installed along with solr on each box or should be
  installed in separate 2 Virtual machines by itself?

Zookeeper should be installed on an odd number of machines because it requires
a quorum, which is floor(N/2) + 1 for N Zookeeper nodes. With two ZKs the
quorum is two, so if either of them fails you fall below quorum; you're
actually _more_ likely to lose quorum than with one ZK. More generally, an
even number of ZK nodes tolerates no more failures than the odd number just
below it, yet has one more node that can fail.
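
A minimal sketch of that quorum arithmetic in Python, for illustration only.
The function names are made up for this example and are not part of any
Zookeeper or Solr API:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZK nodes that still forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ZK nodes can fail before the ensemble loses quorum."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare small ensembles side by side.
    for n in range(1, 7):
        print(f"{n} ZK nodes: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Running it shows that 3 and 4 nodes both survive exactly one failure, and
that 2 nodes survive none, which is why the even sizes buy you nothing.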

But your question seems odd on another level. The number of Zookeepers you
run is entirely independent of the number of Solr nodes. You should just pick a
number of Zookeepers and install them. Unless your installation is large,
3 Zookeepers is usually enough.

Whether they go on their own VMs or not isn't as interesting as whether they
go on separate physical boxes. They should: that way, if someone pulls the
plug on a whole physical server you still have a quorum, whereas if you put
two ZKs on a single physical box you can lose quorum if that one machine
goes down.
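
To make the physical-box point concrete, here is a rough Python sketch that
checks whether a given placement of ZK nodes keeps a quorum when any one
physical host is lost. The host names and the function name are hypothetical,
and it assumes each host fails as a unit:

```python
from collections import Counter


def survives_any_single_host_loss(zk_hosts):
    """zk_hosts[i] names the physical host that runs ZK node i.

    Returns True if losing any one physical host still leaves
    at least a quorum of ZK nodes running.
    """
    total = len(zk_hosts)
    quorum = total // 2 + 1
    nodes_per_host = Counter(zk_hosts)
    return all(total - count >= quorum for count in nodes_per_host.values())


if __name__ == "__main__":
    # Three ZKs on three boxes vs. two of the three sharing one box.
    print(survives_any_single_host_loss(["box1", "box2", "box3"]))
    print(survives_any_single_host_loss(["box1", "box1", "box2"]))
```

With three ZKs on three separate boxes the check passes; with two of them in
VMs on the same box it fails, because that box takes two of the three nodes
with it.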

Now, all that said, it's perfectly possible to run with just a _single_
Zookeeper on a single box that may or may not also be running Solr. The risk
is that your cluster will be unable to index documents (but may still be able
to search) if that ZK becomes unavailable for any reason. I run this way all
the time for development.

In a small installation where all your servers are physically located
together and you run with a single ZK node, if that ZK node regularly becomes
unavailable, you have problems that adding more ZK nodes probably won't help
with ;)

Best,
Erick



Re: solr with Isilon HDFS

2015-12-05 Thread Gaurav Patel
Thanks Toke.  Your input has been informative and valuable.
I will go through the links you provided and will let you know what we end
up going with.



Re: solr with Isilon HDFS

2015-12-05 Thread Toke Eskildsen
Gaurav Patel  wrote:
> 3 Physical Machines with 60 cpu cores and 512 GB RAM each.
> EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.

We have experimented a little bit with smaller machines, backed by EMC Isilon 
over NFS. That worked surprisingly well, but ultimately did not scale for us as 
we could not justify paying for enterprise SSDs for the Isilon. There is a 
write-up at https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

> Can we use solr cloud for this setup?

Yes. That is independent of the backing storage.

> How many instances of SOLR are recommended per physical machines
> and how much ram should be allocated to it.

"That depends".
http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The amount of RAM for JVMs should be whatever is needed. Or to put it another
way: There are some explicitly configured internal caches in Solr, but just
setting Xmx to a very high number will not help performance. On the contrary,
it will lead to long garbage collection pauses and eat into the precious disk
cache.

There are some rules of thumb for running Solr, but my own meta rule of thumb
is that their applicability goes down when scale goes up. One of the rules of
thumb is to have 1 Solr instance per machine. But running JVMs with very large
heaps (100GB+) has the potential of extremely long garbage collection pauses
and also implies a larger memory overhead due to internal pointer size.
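
As a back-of-the-envelope illustration of that trade-off, here is a small
Python sketch that splits a machine's RAM between Solr heaps and what is left
for the OS disk cache. The function name, the `other_gb` parameter and the
example numbers are assumptions made for this sketch, not recommendations.
The 32 GB figure is the approximate heap size below which HotSpot JVMs can
use compressed object pointers by default; treat it as a rough threshold.

```python
# Approximate heap size below which HotSpot uses compressed object
# pointers by default (assumption: default JVM settings).
COMPRESSED_OOPS_LIMIT_GB = 32


def memory_plan(machine_ram_gb, solr_instances, heap_per_instance_gb,
                other_gb=0):
    """Estimate what is left for the OS disk cache on one machine.

    other_gb is a hypothetical allowance for co-located services
    that compete with Solr for memory.
    """
    total_heap = solr_instances * heap_per_instance_gb
    disk_cache = machine_ram_gb - total_heap - other_gb
    if disk_cache < 0:
        raise ValueError("heaps and other services exceed machine RAM")
    return {
        "total_heap_gb": total_heap,
        "left_for_disk_cache_gb": disk_cache,
        "compressed_pointers": heap_per_instance_gb < COMPRESSED_OOPS_LIMIT_GB,
    }


if __name__ == "__main__":
    # One huge heap vs. several smaller ones on a 512 GB machine.
    print(memory_plan(512, solr_instances=1, heap_per_instance_gb=100))
    print(memory_plan(512, solr_instances=4, heap_per_instance_gb=24))
```

Both layouts leave a similar amount of RAM for disk cache, but only the
second keeps each heap under the compressed-pointer threshold. The right
heap size still has to be found by measuring your own workload.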

> Should zookeeper be installed along with solr on each box or should be
> installed in separate 2 Virtual machines by itself?

I have no opinion on that.

> Can we run Kafka and Cassandra along with solr on each physical machine?

Sure, but they will of course compete with Solr for resources.

> Anybody running Solr with HDFS in production?

It is a recurring theme on this mailing list at least. It can be searched at
https://www.mail-archive.com/solr-user@lucene.apache.org/

- Toke Eskildsen


solr with Isilon HDFS

2015-12-03 Thread Gaurav Patel
Hi

We are facing the challenge below:

Product Use Case: Analytics

Hardware:
3 Physical Machines with 60 cpu cores and 512 GB RAM each.
EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.

Questions:
Can we use SolrCloud for this setup?
How many instances of Solr are recommended per physical machine, and how
much RAM should be allocated to each?
Should Zookeeper be installed along with Solr on each box, or should it be
installed by itself on 2 separate virtual machines?
Can we run Kafka and Cassandra along with Solr on each physical machine?
Is anybody running Solr with HDFS in production?


Thanks
Gaurav