RE: Solr hardware memory question

2013-12-12 Thread Hoggarth, Gil
Thanks for this - I haven't any previous experience with utilising SSDs in the 
way you suggest, so I guess I need to start learning! And thanks for the 
Danish-webscale URL, looks like very informed reading. (Yes, I think we're 
working in similar industries with similar constraints and expectations).

Compiling my answers into one email. "Curious how many documents per shard 
you were planning? The number of documents per shard and field type will drive 
the amount of RAM needed to sort and facet."
- Number of documents per shard: I think about 200 million, though that's a 
rough estimate based on other Solrs we run. Which I think means we hold a lot 
of data for each document, though I keep arguing to keep this to the truly 
required minimum. We also have many facets, some of which are pretty large 
(I'm stretching my understanding here, but I think most documents have many 
'entries' in many facets, so these really hit us performance-wise).

I try to keep a 1-to-1 ratio of Solr nodes to CPUs, with a few CPUs spare for the 
operating system, and I utilise MMapDirectory to manage memory via the OS. So at 
this moment I'm guessing that we'll have 56 CPUs dedicated to Solr across 2 physical 
32-CPU servers and _hopefully_ 256GB RAM on each. This would give 28 shards, each 
with 5GB of Java memory (in Tomcat), leaving 126GB on each server for the OS and 
MMap. (I believe the Solr theory for this doesn't work out exactly, but we can 
accept the edge cases where this will fail.)
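
(For clarity, by 'utilise MMapDirectory' I mean the directoryFactory setting in 
each shard's solrconfig.xml, along these lines - just a sketch, and I believe the 
default factory already memory-maps on 64-bit JVMs anyway:)

  <!-- solrconfig.xml: explicit memory-mapped index access (illustrative only) -->
  <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>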

I can also see that our hardware requirements will depend on usage as well as 
the volume of data, and I've been pondering how best to structure our index/es 
to support a long-term service (which means that, given it's a lot of data, I 
need to structure the data so that new usage doesn't require re-indexing). But 
at this early stage, as people say, we need to prototype, test, profile etc., 
and to do that I need the hardware to run the trials (policy dictates that I buy 
the production hardware now, before profiling - I get to control much of the 
design and construction so I don't argue with this!)

Thanks for all the comments everyone, all very much appreciated :)
Gil


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 11 December 2013 12:02
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
 We're probably going to be building a Solr service to handle a dataset 
 of ~60TB, which for our data and schema typically gives a Solr index 
 size of 1/10th - i.e., 6TB. Given there's a general rule about the 
 amount of hardware memory required should exceed the size of the Solr 
 index (exceed to also allow for the operating system etc.), how have 
 people handled this situation?

By acknowledging that it is cheaper to buy SSDs instead of trying to compensate 
for slow spinning drives with excessive amounts of RAM. 

Our plan for an estimated 20TB of indexes out of 372TB of raw web data is to
use SSDs controlled by a single machine with 512GB of RAM (or was it 256GB? 
I'll have to ask the hardware guys):
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

As always YMMV, and the numbers you quote elsewhere indicate that your queries 
are quite complex. You might want to do a bit of profiling to see if they are 
heavy enough to make the CPU the bottleneck.

Regards,
Toke Eskildsen, State and University Library, Denmark




Solr hardware memory question

2013-12-10 Thread Hoggarth, Gil
We're probably going to be building a Solr service to handle a dataset
of ~60TB, which for our data and schema typically gives a Solr index
size of 1/10th - i.e., 6TB. Given the general rule that the amount of
hardware memory should exceed the size of the Solr index (exceed it, to
also allow for the operating system etc.), how have people handled this
situation? Do I really need, for example, 12 servers with 512GB RAM, or
are there other techniques for handling this?

 

Many thanks in advance for any general/conceptual/specific
ideas/comments/answers!

Gil

 

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ



RE: Solr hardware memory question

2013-12-10 Thread Hoggarth, Gil
Thanks Shawn. You're absolutely right about the performance balance,
though it's good to hear it from an experienced source (if you don't
mind me calling you that!) Fortunately we don't have a top performance
requirement, and we have a small audience so a low query volume. On
similar systems we're managing to just provide a Solr service with a
3TB index size on 160GB RAM, though we have scripts to handle the
occasionally necessary service restart when someone submits a more
exotic query. This, btw, gives a response time of ~45-90 seconds for
uncached queries. My question, I suppose, comes from my hope that we can
do away with the restart scripts, as I doubt they help the Solr service
(they can, if necessary, just kill processes and restart), and get to
response times of < 20 seconds.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 10 December 2013 17:37
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On 12/10/2013 9:51 AM, Hoggarth, Gil wrote:
 We're probably going to be building a Solr service to handle a dataset

 of ~60TB, which for our data and schema typically gives a Solr index 
 size of 1/10th - i.e., 6TB. Given there's a general rule about the 
 amount of hardware memory required should exceed the size of the Solr 
 index (exceed to also allow for the operating system etc.), how have 
 people handled this situation? Do I really need, for example, 12 
 servers with 512GB RAM, or are there other techniques to handling
this?

That really depends on what kind of query volume you'll have and what
kind of performance you want.  If your query volume is low and you can
deal with slow individual queries, then you won't need that much memory.
 If either of those requirements increases, you'd probably need more
memory, up to the 6TB total -- or 12TB if you need to double the total
index size for redundancy purposes.  If your index is constantly growing
like most are, you need to plan for that too.

Putting the entire index into RAM is required for *top* performance, but
not for base functionality.  It might be possible to put only a fraction
of your index into RAM.  Only testing can determine what you really need
to obtain the performance you're after.

Perhaps you've already done this, but you should try as much as possible
to reduce your index size.  Store as few fields as possible, only just
enough to build a search result list/grid and retrieve the full document
from the canonical data store.  Save termvectors and docvalues on as few
fields as possible.  If you can, reduce the number of terms produced by
your analysis chains.
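
By way of illustration only (these field names are invented, not a recommendation 
for any particular schema), that kind of trimming looks something like this in 
schema.xml:

  <!-- hypothetical fields: only store/enable what a feature actually needs -->
  <field name="id"         type="string"       indexed="true" stored="true"/>
  <field name="url"        type="string"       indexed="true" stored="true"/>
  <!-- searched, but fetched from the canonical store, so not stored, no termVectors -->
  <field name="content"    type="text_general" indexed="true" stored="false" termVectors="false"/>
  <!-- docValues only on the fields you actually facet or sort on -->
  <field name="domain"     type="string"       indexed="true" stored="false" docValues="true"/>
  <field name="crawl_date" type="tdate"        indexed="true" stored="false" docValues="true"/>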

Thanks,
Shawn



RE: How to work with remote solr savely?

2013-11-22 Thread Hoggarth, Gil
We solved this issue outside of Solr. As you've done, restrict Solr to
localhost-only access on the server, add firewall rules to allow your
developers in on port 80, and ProxyPass the allowed port 80 traffic
through to Solr. Remember to include the ProxyPassReverse too.
(This runs on Linux and Apache httpd btw.)
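
A rough sketch of the httpd side (the address range, port and paths here are 
illustrative, not our actual config, and it assumes mod_proxy and mod_proxy_http 
are loaded):

  # e.g. /etc/httpd/conf.d/solr-proxy.conf (illustrative only)
  <Location /solr>
      # Apache 2.2-style access control: only the developers' network
      Order deny,allow
      Deny from all
      Allow from 192.0.2.0/24

      # Forward allowed requests to the Solr port bound to localhost
      ProxyPass        http://localhost:8080/solr
      ProxyPassReverse http://localhost:8080/solr
  </Location>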

-Original Message-
From: Stavros Delisavas [mailto:stav...@delisavas.de] 
Sent: 22 November 2013 14:24
To: solr-user@lucene.apache.org
Subject: How to work with remote solr savely?

Hello Solr-Friends,
I have a question about working with solr which is installed on a remote
server.
I have a php-project with a very big mysql-database of about 10gb and I
am also using solr for about 10,000,000 entries indexed for fast search
and access of the mysql-data.
I have a local copy myself so I can continue to work on the php-project
itself, but I want to make it available for more developers too. How can
I make solr accessable ONLY for those exclusive developers? For mysql
it's no problem to add an additional mysql-user with limited access.

But for Solr it seems difficult to me. I have had my administrator
restrict the java-port 8080 to localhost only. That way no one outside
can access solr or the solr-admin interface.
How can I allow access to other developers without making the whole
solr-interface (port 8080) available to the public?

Thanks,

Stavros


RE: How to work with remote solr savely?

2013-11-22 Thread Hoggarth, Gil
You could also use one of the proxy scripts, such as
http://code.google.com/p/solr-php-client/, which is coincidentally
linked (eventually) from Michael's suggested SolrSecurity URL.

-Original Message-
From: michael.boom [mailto:my_sky...@yahoo.com] 
Sent: 22 November 2013 14:53
To: solr-user@lucene.apache.org
Subject: Re: How to work with remote solr savely?

http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication

Maybe you could achieve write/read access limitation by setting up
path-based authentication:
The update handler /solr/core/update  should be protected by
authentication, with credentials only known to you. But then of course,
your indexing client will need to authenticate in order to add docs to
solr.
Your select handler /solr/core/select could then be open or protected
by http auth with credentials open to developers.

That's the first idea that comes to mind - haven't tested it. 
If you do, please feed back and let us know how it went.
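
Untested, but the path-based part would presumably be a standard servlet
security-constraint in the Solr webapp's web.xml, something like the sketch
below ('core' and the role name are placeholders, and the users/roles still
have to be defined in the container's realm):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>solr-update</web-resource-name>
      <url-pattern>/core/update/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>indexer</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr realm</realm-name>
  </login-config>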



-
Thanks,
Michael
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102618.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Why do people want to deploy to Tomcat?

2013-11-12 Thread Hoggarth, Gil
For me, a side-effect of 'example' is that it suggests it's just that, and not 
appropriate for production. But also, there's the organisation factor beyond Solr 
that is about staff expertise - we don't have any systems that utilise jetty, so 
we're unfamiliar with its configuration, issues, or oddities. Tomcat is our de facto 
container, so it makes sense for us to implement Solr within Tomcat.

If we ruled out these reasons, I'd still be looking for a container that:
- was a standalone installation (i.e., outside of Solr tarball) so that it 
would be managed via yum (we run on RHEL). This separates any issues of Solr 
from issues of jetty, which given a current lack of jetty knowledge would be a 
helpful thing.
- the container service could be managed via standard SysV startup processes 
(roughly along the lines of the sketch after this list). To be fair, I've 
implemented our own for Tomcat and could do this for jetty, but I'd prefer that 
jetty included this (which would suggest it is more prepared for enterprise use).
- Likewise, I assume all of jetty's configuration can be reset to use the normal 
RHEL /etc/ and /var/ directories, but I'd prefer that jetty did this for me (to 
demonstrate again its enterprise-ready status).
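
(For what I mean by a SysV startup process, something roughly like this - entirely 
a sketch, with assumed paths, user and start mechanism, not a tested script:)

  #!/bin/sh
  # /etc/init.d/jetty -- rough sketch only; JETTY_HOME, user and log path are assumptions
  # chkconfig: 345 80 20
  # description: Jetty servlet container running Solr

  JETTY_HOME=/opt/jetty
  JETTY_USER=jetty
  PIDFILE=/var/run/jetty.pid
  LOGFILE=/var/log/jetty/console.log

  case "$1" in
    start)
      # start Jetty as the service user; root records the java PID
      su "$JETTY_USER" -c "cd $JETTY_HOME && nohup java -jar start.jar >> $LOGFILE 2>&1 & echo \$!" > "$PIDFILE"
      ;;
    stop)
      [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
      ;;
    status)
      if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "jetty is running (pid $(cat "$PIDFILE"))"
      else
        echo "jetty is stopped"
      fi
      ;;
    restart)
      "$0" stop
      sleep 2
      "$0" start
      ;;
    *)
      echo "Usage: $0 {start|stop|status|restart}"
      exit 1
      ;;
  esac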

Yes, I could do all the necessary bespoke configuration so that jetty meets the 
above requirements, but the fact that I'd have to makes me question whether it's 
ready for our enterprise setup (which mainly means that our Operations team will 
fight against unusual configurations).

Having added all of this, I have to admit that I like the idea of using jetty 
because you guys tell me that Solr is effectively pre-configured for jetty. But 
then I'd want to know what in particular these jetty configurations are!

BTW Very pleased that this is being discussed - the views can help me argue our 
case to use jetty if it is indeed more beneficial to do so.

Gil

-Original Message-
From: Sebastián Ramírez [mailto:sebastian.rami...@senseta.com] 
Sent: 12 November 2013 13:38
To: solr-user@lucene.apache.org
Subject: Re: Why do people want to deploy to Tomcat?

I agree with Doug; when I started I had to spend some time figuring out what 
was just an example and what I would have to change in a production 
environment... until I found that the whole example was ready for production.

Of course, you commonly have to change the settings, parameters, fields, etc. 
of your Solr system, but the example doesn't have anything that is not for 
production.


Sebastián Ramírez
[image: SENSETA – Capture & Analyze] http://www.senseta.com/


On Tue, Nov 12, 2013 at 8:18 AM, Amit Aggarwal amit.aggarwa...@gmail.com wrote:

 Agreed with Doug
 On 12-Nov-2013 6:46 PM, Doug Turnbull  
 dturnb...@opensourceconnections.com
 wrote:

  As an aside, I think one reason people feel compelled to deviate 
  from the distributed jetty distribution is because the folder is named 
  example.
  I've had to explain to a few clients that this is a bit of a misnomer.
 The
  IT dept especially sees example and feels uncomfortable using that 
  as a starting point for a jetty install. I wish it was called 
  default or
 bin
  or something where its more obviously the default jetty distribution 
  of Solr.
 
 
  On Tue, Nov 12, 2013 at 7:06 AM, Roland Everaert 
  reveatw...@gmail.com
  wrote:
 
   In my case, the first time I had to deploy and configure solr on 
   tomcat (and jboss) it was a requirement to reuse as much as 
   possible the application/web server already in place. The next 
   deployment I also use tomcat, because I was used to deploy on 
   tomcat and I don't know jetty
 at
   all.
  
   I could ask the same question with regard to jetty. Why 
   use/bundle(/ if
  not
   recommend) jetty with solr over other webserver solutions?
  
   Regards,
  
  
   Roland Everaert.
  
  
  
   On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo 
   topor...@gmail.com
   wrote:
  
In my case, the selection of the servlet container has never 
been a
  hard
requirement. I mean, some customers provide us a virtual machine
   configured
with java/tomcat , others have a tomcat installed and want to 
share
 it
   with
solr, others prefer jetty because their sysadmins are used to
 configure
it...  At least in the projects I've been working in, the 
selection
 of
   the
servlet engine has not been a key factor in the project success.
   
Regards.
   
   
On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez
andre.b...@kelkoo.com wrote:
   
 We are using Solr running on Tomcat.

 I think the top reasons for us are :
  - we already have nagios monitoring plugins for tomcat that 
 trace queries ok/error, http codes / response time etc in 
 access logs,
  number
 of threads, jvm memory usage etc
  - start, stop, watchdogs, logs : we also use our standard 
 tools
 for
   that
  - what about security filters ? Is that possible with jetty ?

 André


 On 11/12/2013 04:54 AM, Alexandre Rafalovitch wrote:

 Hello,

 

How to cancel a collection 'optimize'?

2013-11-11 Thread Hoggarth, Gil
We have an internal Solr collection with ~1 billion documents. It's
split across 24 shards and uses ~3.2TB of disk space. Unfortunately
we've triggered an 'optimize' on the collection (via a restarted browser
tab), which has raised the disk usage to 4.6TB, with 130GB left on the
disk volume.

 

As I fully expect Solr to use up all of the disk space as the collection
is more than 50% of the disk volume, how can I cancel this optimize? And
separately, if I were to reissue with maxSegments=(high number, eg 40),
should I still expect the same disk usage? (I'm presuming so as doesn't
it need to gather the whole index to determine which docs should go into
which segments?)
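
(For clarity, by 'reissue' I mean something along the lines of the following - 
host, port and collection name here are just placeholders:)

  curl "http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=40"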

 

Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard.

 

(Great conference last week btw - so much to learn!)

 

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

Tel: 01937 546163

 



RE: How to cancel a collection 'optimize'?

2013-11-11 Thread Hoggarth, Gil
Hi Otis, thanks for the response. I could stop the whole Solr service,
as there's no audience access to it as yet, but might it be left in an
incomplete state and thus try to complete the optimisation when the
service is restarted?

[Yes, we did speak in Dublin - you can see we need that monitoring
service! Must set up the demo version, asap!]

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: 11 November 2013 16:02
To: solr-user@lucene.apache.org
Subject: Re: How to cancel a collection 'optimize'?

Hi Gil,
(we spoke in Dublin, didn't we?)

Short of stopping Solr I have a feeling there isn't much you can do...
hm. Or, I wonder if you could somehow get a thread dump, get the PID
of the thread (since I believe threads in Linux are run as processes),
and then kill that thread... Feels scary and I'm not sure what this
might do to the index, but maybe somebody else can jump in and comment
on this approach or suggest a better one.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 10:44 AM, Hoggarth, Gil gil.hogga...@bl.uk
wrote:
 We have an internal Solr collection with ~1 billion documents. It's 
 split across 24 shards and uses ~3.2TB of disk space. Unfortunately 
 we've triggered an 'optimize' on the collection (via a restarted 
 browser tab), which has raised the disk usage to 4.6TB, with 130GB 
 left on the disk volume.



 As I fully expect Solr to use up all of the disk space as the 
 collection is more than 50% of the disk volume, how can I cancel this 
 optimize? And separately, if I were to reissue with maxSegments=(high 
 number, eg 40), should I still expect the same disk usage? (I'm 
 presuming so as doesn't it need to gather the whole index to determine

 which docs should go into which segments?)



 Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard.



 (Great conference last week btw - so much to learn!)





 Gil Hoggarth

 Web Archiving Technical Services Engineer

 The British Library, Boston Spa, West Yorkshire, LS23 7BQ

 Tel: 01937 546163





RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- -DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- -DnumShards=24 seen on each 'ps' line (two nodes listed here):
 tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start

- The Solr node dashboard shows -DnumShards=24 in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil gil.hogga...@bl.uk
wrote:

 Hi solr-users,



 I'm seeing some confusing behaviour in Solr/zookeeper and hope you can

 shed some light on what's happening/how I can correct it.



 We have two physical servers running automated builds of RedHat 6.4 
 and Solr 4.4.0 that host two separate Solr services. The first server 
 (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
 the second server (ld02) also has 24 shards and hosts a different 
 collection called 'ldwa01'. It's evidently important to note that 
 previously both of these physical servers provided the 'ukdomain'
 collection, but the 'ldwa01' server has been rebuilt for the new 
 collection.



 When I start the ldwa01 solr nodes with their zookeeper configuration 
 (defined in /etc/sysconfig/solrnode* and with collection.configName as
 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes

 initially become shard leaders and then replicas as I'd expect. But if

 I change the ldwa01 solr nodes to point to the zookeeper ensemble also

 used for the ukdomain collection, all ldwa01 solr nodes start on the 
 same shard (that is, the first ldwa01 solr node becomes the shard 
 leader, then every other solr node becomes a replica for this shard). 
 The significant point here is no other ldwa01 shards gain leaders (or
replicas).



 The ukdomain collection uses a zookeeper collection.configName of 
 'ukdomaincfg', and prior to the creation of this ldwa01 service the 
 collection.configName of 'ldwa01cfg' has never previously been used.
So 
 I'm

RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
I think my question is now easier, because I think the problem below was
caused by the fact that the very first startup of the 'ldwa01'
collection/'ldwa01cfg' zk config name didn't specify the number of shards
(and thus it defaulted to 1).

So, how can I change the number of shards for an existing collection/zk
config name, especially when the ZK ensemble in question is the
production one and supports other Solr collections that I do not want to
interrupt? (Which I think means that I can't just delete the
clusterstate.json and restart the ZKs, as this would also lose the other
Solr collections' information.)
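
(If the answer turns out to be "delete and recreate the collection", I presume
that would be via the Collections API, something like the following - untested,
with host/port and the replication factor being just our intended values:)

  curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=ldwa01"
  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&replicationFactor=2&collection.configName=ldwa01cfg"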

Thanks in advance, Gil

-Original Message-
From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] 
Sent: 24 October 2013 10:13
To: solr-user@lucene.apache.org
Subject: RE: New shard leaders or existing shard replicas depends on
zookeeper?

Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- DnumShards=24 seen on each 'ps' line (two nodes listed here):
 tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start

- The Solr node dashboard shows -DnumShards=24 in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/system/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil gil.hogga...@bl.uk
wrote:

 Hi solr-users,



 I'm seeing some confusing behaviour in Solr/zookeeper and hope you can

 shed some light on what's happening/how I can correct it.



 We have two physical servers running automated builds of RedHat 6.4 
 and Solr 4.4.0 that host two separate Solr services. The first server 
 (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
 the second server (ld02) also has 24 shards and hosts a different 
 collection called 'ldwa01'. It's evidently important to note that 
 previously both of these physical servers provided the 'ukdomain'
 collection, but the 'ldwa01' server has been rebuilt for the new 
 collection.



 When I start the ldwa01 solr

New shard leaders or existing shard replicas depends on zookeeper?

2013-10-23 Thread Hoggarth, Gil
Hi solr-users,

 

I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
shed some light on what's happening/how I can correct it.

 

We have two physical servers running automated builds of RedHat 6.4 and
Solr 4.4.0 that host two separate Solr services. The first server
(called ld01) has 24 shards and hosts a collection called 'ukdomain';
the second server (ld02) also has 24 shards and hosts a different
collection called 'ldwa'. It's evidently important to note that
previously both of these physical servers provided the 'ukdomain'
collection, but the 'ldwa' server has been rebuilt for the new
collection.

 

When I start the ldwa solr nodes with their zookeeper configuration
(defined in /etc/sysconfig/solrnode* and with collection.configName as
'ldwacfg') pointing to the development zookeeper ensemble, all nodes
initially become shard leaders and then replicas as I'd expect. But if I
change the ldwa solr nodes to point to the zookeeper ensemble also used
for the ukdomain collection, all ldwa solr nodes start on the same shard
(that is, the first ldwa solr node becomes the shard leader, then every
other solr node becomes a replica for this shard). The significant point
here is no other ldwa shards gain leaders (or replicas).

 

The ukdomain collection uses a zookeeper collection.configName of
'ukdomaincfg', and prior to the creation of this ldwa service the
collection.configName of 'ldwacfg' has never previously been used. So
I'm confused why the ldwa service would differ when the only difference
is which zookeeper ensemble is used (both zookeeper ensembles are
automatedly built using version 3.4.5).

 

If anyone can explain why this is happening and how I can get the ldwa
services to start correctly using the non-development zookeeper
ensemble, I'd be very grateful! If more information or explanation is
needed, just ask.

 

Thanks, Gil

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

 



Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Hi all, I hope you can advise a solution to our incorrect data directory
issue.

 

We have 2 physical servers using Solr 4.3.0, each with 24 separate
tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a
solr shard in each. This configuration means that each shard has its own
data directory declared. (Server OS, tomcat and solr, including shards,
created via automated builds.) 

 

That is, for example,

- tomcat instance, /var/local/tomcat/solrshard3/, port 8985

- corresponding solr instance, /usr/local/solrshard3/, with
/usr/local/solrshard3/collection1/conf/solrconfig.xml

- corresponding solr data directory,
/var/local/solrshard3/collection1/data/

 

We process ~1.5 billion documents, which is why we use 48 shards (24
leaders, 24 replicas). These physical servers are rebooted regularly to
fsck their drives. When rebooted, we always see several (~10-20) shards
failing to start (UI cloud view shows them as 'Down' or 'Recovering'
though they never recover without intervention), though there is not a
pattern to which shards fail to start - we haven't recorded any that
always or never fail. On inspection, the UI dashboard for these failed
shards displays, for example:

- Host       Server1

- Instance   /usr/local/solrshard3/collection1

- Data       /var/local/solrshard6/collection1/data

- Index      /var/local/solrshard6/collection1/data/index

 

To fix such failed shards, I manually restart the shard leader and
replicas, which fixes the issue. However, of course, I would like to
know a permanent cure for this, not a remedy.

 

We use a separate zookeeper service, spread across 3 Virtual Machines
within our private network of ~200 servers (physical and virtual).
Network traffic is constant but relatively little across 1GB bandwidth.

 

Any advice or suggestions greatly appreciated.

Gil

 

Gil Hoggarth

Web Archiving Engineer

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

 



RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your reply Daniel.

The dataDir is set in each solrconfig.xml; each one has been checked to
ensure it points to its corresponding location. The error we see is that
on machine reboot not all of the shards start successfully, and if the
failed shard was a leader, the replicas can't take its place (presumably
because the leader's incorrect data directory is inconsistent with their
own).

More detail that I can add is that the catalina.out log for failed
shards reports:
May 15, 2013 5:56:02 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@524e13f6]) and a
value of type
[org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads
are going to be renewed over time to try and avoid a probable memory
leak.

This doesn't (to me) relate to the problem, but that doesn't necessarily
mean it isn't related. Plus, it's the only SEVERE reported, and it only
appears in the failed shards' catalina.out logs.

Checking the zookeeper logs, we're seeing:
2013-05-16 13:25:46,839 [myid:1] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@762] - Connection broken for id 3, my id = 1, error =
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
2013-05-16 13:25:46,841 [myid:1] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
2013-05-16 13:25:46,842 [myid:1] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
2013-05-16 13:25:46,843 [myid:1] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@688] - Send worker leaving thread

This is, I think, a separate issue, in that it happens immediately after
I restart a zookeeper. (I.e., I see this in a log, restart that
zookeeper, and immediately see a similar issue in one of the other two
zookeeper logs.)



-Original Message-
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: 16 May 2013 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

What actual error do you see in Solr?  Is there an exception and if so,
can you post that?  As I understand it, dataDir is set from the
solrconfig.xml file, so either your instances are picking up the wrong
file, or you have some override which is incorrect. Where do you set
solr.data.dir - in the environment when you start Solr, or in solrconfig?


On 16 May 2013 12:23, Hoggarth, Gil gil.hogga...@bl.uk wrote:

 Hi all, I hope you can advise a solution to our incorrect data 
 directory issue.



 We have 2 physical servers using Solr 4.3.0, each with 24 separate 
 tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a

 solr shard in each. This configuration means that each shard has its 
 own data directory declared. (Server OS, tomcat and solr, including 
 shards, created via automated builds.)



 That is, for example,

 - tomcat instance, /var/local/tomcat/solrshard3/, port 8985

 - corresponding solr instance, /usr/local/solrshard3/, with 
 /usr/local/solrshard3/collection1/conf/solrconfig.xml

 - corresponding solr data directory,
 /var/local/solrshard3/collection1/data/



 We process ~1.5 billion documents, which is why we use so 48 shards 
 (24 leaders, 24 replicas). These physical servers are rebooted 
 regularly to fsck their drives. When rebooted, we always see several 
 (~10-20) shards failing to start (UI cloud view shows them as 'Down'
or 'Recovering'
 though they never recover without intervention), though there is not a

 pattern to which shards fail to start - we haven't recorded any that 
 always or never fail. On inspection, the UI dashboard for these failed

 shards displays, for example:

 - HostServer1

 - Instance/usr/local/sholrshard3/collection1

 - Data/var/local/solrshard6/collection1/data

 - Index

RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your response Shawn, very much appreciated.
Gil

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 16 May 2013 15:59
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

 The dataDir is set in each solrconfig.xml; each one has been checked 
 to ensure it points to its corresponding location. The error we see is

 that on machine reboot not all of the shards start successfully, and 
 if the fail was to be a leader the replicas can't take its place 
 (presumably because the leader incorrect data directory is 
 inconsistent with their own).

Although you can set the dataDir in solrconfig.xml, I would strongly
recommend that you don't.

If you are using the old-style solr.xml (which has <cores> and <core> tags),
then set the dataDir on each <core> tag in solr.xml. This gets read and
set before the core is created, so there's less chance of it getting
scrambled. The solrconfig is read as part of core creation.
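
Something along these lines (the paths here are only an example, matching the
layout you described):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="collection1"
            instanceDir="/usr/local/solrshard3/collection1"
            dataDir="/var/local/solrshard3/collection1/data"/>
    </cores>
  </solr>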

If you are using the new style solr.xml (new with 4.3.0) then you'll
need absolute dataDir paths, and they need to go in each core.properties
file.
Due to a bug, relative paths won't work as expected. I need to see if I
can make sure the fix makes it into 4.3.1.

If moving dataDir out of solrconfig.xml fixes it, then we probably have
a bug.

Your Zookeeper problems might be helped by increasing zkClientTimeout.

Thanks,
Shawn