Question regarding possibility of data loss

2013-11-19 Thread adfel70
Hi, we plan to set up a Solr cluster with a ZooKeeper ensemble.
We are going to have 6 Solr servers with 2 instances on each server, and
6 shards with a replication factor of 2. In addition, we'll have 3
ZooKeeper nodes.

Our concern is that we will send documents to be indexed, Solr will fail to
index them without returning any error, and we will silently lose data.

1. Is there any situation that can cause this kind of problem? 
2. Can it happen if some of the ZooKeeper nodes are down? Or some of the
Solr instances?
3. How can we monitor them? Can we do something to prevent this kind of
error?

Thanks in advance 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-regarding-possibility-of-data-loss-tp4101915.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question regarding possibility of data loss

2013-11-19 Thread Shawn Heisey
On 11/19/2013 6:18 AM, adfel70 wrote:
> Hi, we plan to establish an ensemble of solr with zookeeper.
> We gonna have 6 solr servers with 2 instances on each server, also we'll
> have 6 shards with replication factor 2, in addition we'll have 3
> zookeepers.

You'll want to run one Solr instance per machine.  Each Solr instance can
house many cores (shard replicas).  Running more than one instance per
machine will 1) add memory/CPU overhead and 2) make it easy to
accidentally end up with multiple replicas of a single shard on the same
machine.
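
As a rough illustration of the second point, here is a minimal Python sketch
that flags shards whose replicas share a machine. The layout dictionary and
the host names are hypothetical; in practice the placement would have to be
read from the cluster state, which this sketch does not do.

```python
from collections import Counter
from typing import Dict, List


def colocated_replicas(placement: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per shard, the hosts that hold more than one replica of it."""
    problems = {}
    for shard, hosts in placement.items():
        counts = Counter(hosts)
        duplicates = sorted(host for host, n in counts.items() if n > 1)
        if duplicates:
            problems[shard] = duplicates
    return problems


# Hypothetical layout: shard name -> host of each replica.
layout = {
    "shard1": ["host1", "host2"],
    "shard2": ["host3", "host3"],  # both copies on one machine
}
print(colocated_replicas(layout))  # {'shard2': ['host3']}
```

A shard reported here would lose every copy at once if that single machine
went down.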

> Our concern is that we will send documents to index and solr won't index
> them but won't send any error message and we will suffer a data loss
>
> 1. Is there any situation that can cause this kind of problem?
> 2. Can it happen if some of ZKs are down? or some of the solr instances?
> 3. How can we monitor them? Can we do something to prevent these kind of
> errors?

1) If it does become possible for data loss to occur without notifying
your application, it will be considered a very serious bug, and top
priority will be given to fixing it.  A release with the fix will be
made as quickly as possible.  Of course I cannot guarantee that such
bugs don't exist, but I am not aware of any at the moment.

2) You must have a majority (floor(n/2) + 1) of zookeepers operational.  If
you have three or four zookeepers, one zookeeper can be down and
SolrCloud will continue to function perfectly.  With five or six
zookeepers, two can be down.  With seven or eight, three can be down.
As for Solr itself, if one replica of each shard from a collection is
working, then the entire collection will work.  That means you'll want
to have at least replicationFactor=2, so there are two copies of each shard.

3) There are MANY options for monitoring.  Many of them are completely
free, and it is always possible to write your own.  One high-level thing
you can do is make sure the hosts are up and that they are running the
proper number of java processes.  Solr offers a number of API entry
points that will tell you how things are working, and more are added
over time.  I don't think there are any zookeeper-specific informational
capabilities at the moment, but I did file a bug report asking for the
feature.  When I have some time, I will work on a fix for it.  One of
the other committers may decide to work on it as well.
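
The arithmetic in point 2 can be written out as a small Python sketch. The
function names are made up for illustration and are not part of Solr or
ZooKeeper.

```python
from typing import Dict


def zk_quorum(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper nodes that forms a majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """How many ZooKeeper nodes can be down while a majority remains."""
    return ensemble_size - zk_quorum(ensemble_size)


def collection_available(live_replicas_per_shard: Dict[str, int]) -> bool:
    """A collection works as long as every shard has at least one live replica."""
    return all(n >= 1 for n in live_replicas_per_shard.values())


for size in range(3, 9):
    print(size, zk_quorum(size), zk_tolerated_failures(size))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
# 7 4 3
# 8 5 3
```

An even-sized ensemble tolerates no more failures than the odd size just
below it, so the fourth, sixth, or eighth node adds no extra fault tolerance.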

If you want out-of-the-box Solr-specific monitoring and are willing to
pay for it, Sematext offers SPM.  One of Sematext's employees is very
active on this list, and they just added Zookeeper monitoring to their
capabilities.  They do have a free version, but it has extremely limited
monitoring history.

http://sematext.com/
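
If you write your own checks, one building block outside of Solr is
ZooKeeper's own "four-letter word" commands: sending "ruok" to the client
port should return "imok" from a healthy server. Below is a minimal Python
sketch of that probe. The host names and the default port 2181 are
assumptions, and newer ZooKeeper versions may require the command to be
whitelisted in the server configuration before it answers.

```python
import socket


def ask(sock: socket.socket, command: bytes) -> bytes:
    """Send a four-letter command on a connected socket and read until EOF."""
    sock.sendall(command)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server closes the connection after replying
            break
        chunks.append(data)
    return b"".join(chunks)


def is_imok(reply: bytes) -> bool:
    """True only for the exact healthy reply to 'ruok'."""
    return reply.strip() == b"imok"


def zk_is_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Probe one ZooKeeper node; any network error counts as 'not ok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return is_imok(ask(sock, b"ruok"))
    except OSError:
        return False


# Hypothetical usage:
# down = [h for h in ("zk1", "zk2", "zk3") if not zk_is_ok(h)]
```

A monitoring job could alert as soon as the number of failing nodes reaches
the number the ensemble can tolerate, rather than waiting for quorum loss.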

Thanks,
Shawn



Re: Question regarding possibility of data loss

2013-11-19 Thread Daniel Collins
Regarding data loss, Solr returns an error code to the calling app (either
an HTTP error code, or the equivalent in SolrJ), so if it fails to index for
a known reason, you'll know about it.

There are always edge cases though.

If Solr indexes the document (returns success), that means the document is
in the transaction log (and should be in the log for each replica).
If someone pulls the plug on the machines and the hard drives crash, then
the transaction log might not be re-playable when the system comes back
up...

Now Solr won't tell you what's trashed (since it can't possibly know). At
that point your whole collection might be corrupt, but *presumably* you
will have a backup available (onsite or off) and a checkpoint time of when
you took that backup, so you can replay any indexing work that might have
happened since then.

Admittedly that's extreme, but it depends how cast-iron a guarantee you
want :)

But in all seriousness, Shawn is right: Solr is stable, and if it can't
index a doc it will tell you.
If ALL ZooKeeper nodes are down, or all Solr servers for a particular shard
are down, you will get an error when you try to index anything (HTTP 503
Service Unavailable or the SolrJ equivalent).
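
On the client side, the practical consequence is to treat anything other
than a positive acknowledgement as a failure and keep the documents for a
retry. Here is a minimal Python sketch over plain HTTP. The update URL in
the usage comment, the IndexingFailed exception, and the function names are
assumptions for illustration; the check relies on Solr's JSON response
carrying responseHeader.status equal to 0 on success.

```python
import json
import urllib.error
import urllib.request


class IndexingFailed(Exception):
    """Raised whenever an update was not positively acknowledged."""


def check_update_response(http_status: int, body: bytes) -> dict:
    """Return the parsed response, or raise IndexingFailed if it is not a success."""
    if http_status != 200:
        raise IndexingFailed(f"HTTP {http_status}: {body[:200]!r}")
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError as exc:  # covers bad JSON and bad UTF-8
        raise IndexingFailed("unparseable response body") from exc
    if not isinstance(payload, dict):
        raise IndexingFailed("unexpected response shape")
    status = payload.get("responseHeader", {}).get("status")
    if status != 0:
        raise IndexingFailed(f"Solr status {status!r}")
    return payload


def post_update(update_url: str, docs: list, timeout: float = 10.0) -> dict:
    """POST documents as JSON; any error or missing reply raises IndexingFailed."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return check_update_response(response.status, response.read())
    except urllib.error.HTTPError as exc:  # 4xx/5xx, e.g. 503
        return check_update_response(exc.code, exc.read())
    except OSError as exc:  # connection refused, timeout, DNS failure
        raise IndexingFailed(f"no response: {exc}") from exc


# Hypothetical usage:
# post_update("http://localhost:8983/solr/collection1/update?wt=json",
#             [{"id": "doc-1"}])
```

A caller that catches IndexingFailed and re-queues the batch never assumes
a document was indexed without having seen a success response for it.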




Re: Question regarding possibility of data loss

2013-11-19 Thread Mark Miller
I’d recommend you start with the upcoming 4.6 release. It should be out
this week or next.

- Mark
