Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
- Why do you have to keep two nodes on some machines?
- These are very powerful machines (32 cores, 64GB RAM) and our index size
is 1GB. We allocate 7GB to each JVM, so we thought it would be OK to run
two instances on the same machine.

- Physical hardware or virtual machines?
- Physical hardware

- What is the size of this index?
- 1GB

- Is this all on a local network or are there links with potential outages
or failures in between?
- local network

- What is the query load?
- 10K requests per minute.

- Have you had a look at garbage collection?
- GC time is generally 5-10%. I have attached a screenshot.

- Do you use the internal Zookeeper?
- No. We have set up an external ZooKeeper ensemble with 3 instances.
The ZooKeeper configuration is as follows:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 3.

- How many nodes?
- 3
- Any observers?
- I don't know what observers are. Can you please explain?

- What kind of load does Zookeeper show?
- Load is normal I guess. Need to double-check.

- How much RAM do these nodes have available?
- Each Solr node has 7GB allocated. For ZooKeeper, we have not allocated
memory explicitly.

- Do some servers get into swapping?
- Not sure. How do I check that?
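
On Linux, one way to check for swapping is to read the pswpin and pswpout counters in /proc/vmstat twice and compare them. Counters that keep growing between samples mean pages are actively being swapped in or out. The following is a minimal sketch, assuming Linux hosts; the function names and the 5-second interval are made up for illustration.

```python
import time


def read_swap_counters(vmstat_text):
    """Extract the pswpin/pswpout page counters from /proc/vmstat content."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return how many pages were swapped in/out between two samples."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def sample_swap_activity(interval_s=5.0, path="/proc/vmstat"):
    """Sample the counters twice, interval_s seconds apart (Linux only)."""
    with open(path) as handle:
        first = read_swap_counters(handle.read())
    time.sleep(interval_s)
    with open(path) as handle:
        second = read_swap_counters(handle.read())
    return swap_activity(first, second)
```

Calling sample_swap_activity() on each Solr machine while recoveries are happening should return zeros for both counters if the machine is not swapping.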


On Fri, Oct 17, 2014 at 2:04 AM, Jürgen Wagner (DVT) 
juergen.wag...@devoteam.com wrote:

  Hello,
   you have one shard and 11 replicas? Hmm...

 - Why do you have to keep two nodes on some machines?
 - Physical hardware or virtual machines?
 - What is the size of this index?
 - Is this all on a local network or are there links with potential outages
 or failures in between?
 - What is the query load?
 - Have you had a look at garbage collection?
 - Do you use the internal Zookeeper?
 - How many nodes?
 - Any observers?
 - What kind of load does Zookeeper show?
 - How much RAM do these nodes have available?
 - Do some servers get into swapping?
 - ...

 How about some more details in terms of sizing and topology?

 Cheers,
 --Jürgen


 On 16.10.2014 18:41, sachinpkale wrote:

 Hi,

 Recently we shifted from a traditional master-slave configuration to
 SolrCloud (4.10.1). We have only one collection, and it has only one shard.
 The cloud cluster contains 12 nodes in total, on 8 machines (4 of the
 machines run two instances each), one of which is the leader.

 Whenever I check the cluster status at http://IP:PORT/solr/#/~cloud, it
 shows at least one node (sometimes 2-3) as recovering. We are using
 HAProxy as a load balancer, and it also frequently reports nodes as
 recovering. This happens for all nodes in the cluster.

 What could be the problem here? How do I check this in the logs?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Frequent-recovery-of-nodes-in-SolrCloud-tp4164541.html
 Sent from the Solr - User mailing list archive at Nabble.com.



 --

 Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
 уважением
 *i.A. Jürgen Wagner*
 Head of Competence Center Intelligence
  Senior Cloud Consultant

 Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
 Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
 E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
 --
 Managing Board: Jürgen Hatzipantelis (CEO)
 Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
 Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071





Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
On the ZooKeeper side, we have the following configuration:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 3.
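
For context on how these settings interact: a ZooKeeper server does not accept whatever session timeout the client requests. By default it clamps the request to the range from 2 x tickTime to 20 x tickTime, unless minSessionTimeout or maxSessionTimeout are set explicitly. The sketch below only illustrates that clamping rule; the function name is hypothetical, and the requested values in the usage note are examples, not the real setting from this cluster (the value shown above may have been truncated by the archive).

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_session_ms=None, max_session_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server does.

    When minSessionTimeout / maxSessionTimeout are not configured, the
    server defaults are 2 * tickTime and 20 * tickTime respectively.
    """
    lower = 2 * tick_time_ms if min_session_ms is None else min_session_ms
    upper = 20 * tick_time_ms if max_session_ms is None else max_session_ms
    return max(lower, min(requested_ms, upper))
```

With tickTime=2000 as configured above, the effective timeout can never be lower than 4000 ms or higher than 40000 ms: negotiated_session_timeout(15000, 2000) stays at 15000, while a request of 60000 is cut down to 40000.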

On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson erickerick...@gmail.com
wrote:

 And what is your zookeeper timeout? When it's too short that can lead
 to this behavior.

 Best,
 Erick



Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
Also, the PingRequestHandler is configured as:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>
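
To see what a load balancer health check would observe, the ping handler can be queried directly. The sketch below is only an illustration under assumptions: it requests /admin/ping with wt=json and treats a node as healthy when the HTTP status is 200 and the JSON body carries "status":"OK". As far as the Solr documentation describes, a configured healthcheckFile that is missing makes the handler answer with an error status instead. The base URL in the usage note (host, port, and core name) is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ping_body_is_ok(body):
    """Return True if a ping response body (JSON text) reports status OK."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and doc.get("status") == "OK"


def node_is_healthy(core_url, timeout_s=2.0):
    """Call <core_url>/admin/ping and report whether the node looks healthy."""
    url = core_url.rstrip("/") + "/admin/ping?wt=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            body = response.read().decode("utf-8")
            return response.status == 200 and ping_body_is_ok(body)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer such as 503.
        return False
```

For example, node_is_healthy("http://localhost:8983/solr/collection1") could be run in a loop against each node and compared with the times HAProxy marks that node as down.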






Manual leader election in SolrCloud

2014-10-13 Thread Sachin Kale
Is it possible to elect the leader manually in SolrCloud 4.10.1?


-Sachin-


Re: Master-Slave setup using SolrCloud

2014-10-04 Thread Sachin Kale
Apparently, a bug in Solr 4.10.0 was causing the NullPointerExceptions:
SOLR-6501 (https://issues.apache.org/jira/browse/SOLR-6501).
We have updated our production Solr to 4.10.1.






Master-Slave setup using SolrCloud

2014-10-02 Thread Sachin Kale
Hello,

We are trying to move our traditional master-slave Solr configuration to
SolrCloud. As our index is very small (around 1 GB), we have only one
shard.
So basically, we have the same master-slave layout, with one leader and
6 replicas.
We are experimenting with the maxTime of both autoCommit and autoSoftCommit.
Currently, autoCommit maxTime is 15 minutes and autoSoftCommit is 1 minute
(let me know if these values do not make sense).

Caches are set such that warmup time is at most 20 seconds.

We have continuous indexing requests, mostly updates to existing
documents. A few requests add or delete documents.

The problem we are facing is that we get very frequent
NullPointerExceptions.
We get 200-300 such exceptions in a row within a period of 30 seconds, and
then it works fine for the next few minutes.

Stacktrace of NullPointerException:

ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException; null:java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)

I am not sure what is causing it. My guess is that we get these exceptions
whenever Solr tries to replay the tlog. Is anything wrong in my
configuration?


-Sachin-


Re: Master-Slave setup using SolrCloud

2014-10-02 Thread Sachin Kale
If I look into the logs, many times I get only the following line, without
any stack trace:

ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException; java.lang.NullPointerException

These exceptions do not occur continuously, only once every 10-15 minutes.
But once they start, there are 800-1000 of them one after another. Is it
related to cache warmup?

I can provide the following information about the setup:
- We are now using Solr 4.10.0.
- Memory allocated to each Solr instance is 7GB. I guess that is more than
  sufficient for a 1 GB index, right?
- Indexes are stored on a normal, local filesystem.
- I am using three caches:
  Query cache: size 4096, autoWarmCount 2048
  Filter cache: size 8192, autoWarmCount 4096
  Document cache: size 4096

I am experimenting with maxTime for both soft and hard commits, after
reading the following:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Hence, I set the following:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:90}</maxTime>
</autoSoftCommit>
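
The ${name:default} form in these settings is Solr's property substitution: if the named system property is defined (for example via -Dsolr.autoCommit.maxTime=... on the JVM command line), its value is used, otherwise the text after the colon is the default. The following sketch mimics that lookup to make the behaviour concrete. It is an illustration only, not Solr's own code; the function name and the override value in the usage note are hypothetical, and the defaults are copied exactly as they appear in the archived message.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve_properties(text, properties):
    """Replace ${name:default} placeholders using a dict of properties."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return str(properties[name])
        if default is not None:
            return default
        raise KeyError("no value or default for property: " + name)

    return _PLACEHOLDER.sub(substitute, text)
```

With no properties defined, resolve_properties("${solr.autoCommit.maxTime:6}", {}) falls back to the default "6"; passing {"solr.autoCommit.maxTime": "15000"} returns "15000" instead. So the effective commit interval depends on whether the startup script sets these properties.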

Also, we frequently get the following warning:

java.lang.NumberFormatException: For input string: 5193.0

Earlier we were on Solr 4.4.0, and when we upgraded to 4.10.0, we
pointed it at the same index we had been using with 4.4.0.

On Thu, Oct 2, 2014 at 7:11 PM, Shawn Heisey apa...@elyograg.org wrote:

 Your automatic commit settings are fine.  If you had tried to use a very
 small maxTime like 1000 (1 second), I would tell you that it's probably
 too short.

 The tlogs only get replayed when a core is first started or reloaded.
 These appear to be errors during queries, having nothing at all to do
 with indexing.

 I can't be sure with the available information (no Solr version,
 incomplete stacktrace, no info about what request caused and received
 the error), but if I had to guess, I'd say you probably changed your
 schema so that certain fields are now required that weren't required
 before, and didn't reindex, so those fields are not present on every
 document.  Or it might be that you added a uniqueKey and didn't reindex,
 and that field is not present on every document.

 http://wiki.apache.org/solr/HowToReindex

 Thanks,
 Shawn