Hi Shawn,
Without going into excessive detail on our design, I won't be able to
fully justify an answer to your question of why we want to do this.
Suffice it to say we plan to deploy this indexing for our entire
customer base. Because of the size of these document collections and
the way they will grow over time, doubling up on machines is not
feasible in our current infrastructure at this time. It may be
justified later, but not today. It's less expensive to add more CPUs
and RAM than to double up on physical machines. Additionally, there are
further budgetary constraints going into our international datacenters
which prevent us from having identical clusters across the board, thus
requiring doubling up. We're not talking about 2 or 3 machines here.
We're talking 128 running instances of Solr with 64 clusters and many
shards.
However, that doesn't preclude the use of something like Docker or KVM
to allow encapsulation of each Solr environment on a virtual machine
which is hooked to a fast storage subsystem.
I would also suggest that if the recommendation is not to run two
instances side by side, then the documentation regarding how to set this
up should be removed and a strong statement put in its place that
running multiple Solr instances is not a supported configuration. Right
now, the documentation does not state this and, in fact, implies that it
is perfectly fine to run multiple instances side by side as long as
independent disks are used to hold the instances.
Note, this was not my design and I am not a fan of doing this, but I'm
not the person making this decision. I am the person tasked with
implementing this design choice.
Thanks.
On 2/17/16 10:19 PM, Shawn Heisey wrote:
On 2/17/2016 10:38 PM, Brian Wright wrote:
We have a new project to use Solr. Our Solr instance will use Jetty
rather than Tomcat. We plan to extend the Solr core system by adding
additional classes (jar files) to the
/opt/solr/server/solr-webapp/webapp/WEB-INF/lib directory to extend
features. We also plan to run two instances of Solr on each physical
server, preferably from a single Solr installation. I've read the
best practices doc on running two Solr instances, and while it's
detailed about how to set up two instances, it doesn't cover our
specific use case.
Why do you want to run multiple instances on one server? Unless you
have a REALLY good reason to have more than one instance per server,
don't do it. One instance can handle many indexes with no problem.
The only valid reason I can think of to run more than one instance per
machine is when a single instance requires a VERY large heap. In that
case, it *might* be better to run two instances that each have a smaller
heap, so that garbage collection times are lower. I personally would
add more machines, rather than run multiple instances.
Generally the best way to load custom jars (and contrib components like
the dataimport handler) in Solr is to create a "lib" directory in the
solr home (where solr.xml lives) and place all extra jars there. They
will be loaded once when Solr starts, and all cores will have access to
them.
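
As an illustration of the "lib" directory approach described above, here
is a minimal Python sketch. The function name install_shared_jars and
any paths passed to it are hypothetical; the only things taken from the
advice above are that the solr home is the directory containing solr.xml
and that extra jars go in a "lib" directory inside it.

```python
import shutil
from pathlib import Path


def install_shared_jars(solr_home, jar_paths):
    """Copy extra jars into <solr_home>/lib (hypothetical helper)."""
    home = Path(solr_home)
    # The solr home is the directory that holds solr.xml; refuse anything else.
    if not (home / "solr.xml").is_file():
        raise FileNotFoundError(f"no solr.xml in {home}; is this the solr home?")
    lib_dir = home / "lib"
    lib_dir.mkdir(exist_ok=True)
    copied = []
    for jar in map(Path, jar_paths):
        if jar.suffix != ".jar":
            raise ValueError(f"not a jar file: {jar}")
        # copy2 keeps file timestamps, which helps when auditing deployed jars.
        copied.append(Path(shutil.copy2(jar, lib_dir / jar.name)))
    return copied
```

Because the jars are loaded once when Solr starts, a restart would be
needed after copying new jars into place.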
The rest of your email was concerned with running multiple instances.
If you *REALLY* want to go against advice and do this, here's the
recommended way:
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-RunningmultipleSolrnodesperhost
It is very likely possible to run multiple instances out of the same
installation directory, but I am not sure how to do it.
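
One possible shape for this, offered only as an untested sketch: start
each node from the shared installation with its own port and its own
solr home. The sketch assumes the -p (port) and -s (solr home) options
of the bin/solr script as shipped with Solr 5.x. The helper names and
the example paths (/opt/solr, /var/solr/node1, /var/solr/node2) are
hypothetical. It only builds the command lines; it does not run them.

```python
from pathlib import PurePosixPath


def solr_start_command(install_dir, port, solr_home):
    """Build the argv for starting one node from a shared installation."""
    return [
        str(PurePosixPath(install_dir) / "bin" / "solr"),
        "start",
        "-p", str(port),                      # HTTP port for this node
        "-s", str(PurePosixPath(solr_home)),  # this node's solr home (holds solr.xml)
    ]


def plan_nodes(install_dir, nodes):
    """Check that nodes do not collide, then return one command per node."""
    ports = [port for port, _ in nodes]
    homes = [str(PurePosixPath(home)) for _, home in nodes]
    if len(set(ports)) != len(ports):
        raise ValueError("each node needs its own port")
    if len(set(homes)) != len(homes):
        raise ValueError("each node needs its own solr home")
    return [solr_start_command(install_dir, p, h) for p, h in nodes]


if __name__ == "__main__":
    # Hypothetical layout: one install, two nodes with separate homes.
    for cmd in plan_nodes("/opt/solr", [(8983, "/var/solr/node1"),
                                        (8984, "/var/solr/node2")]):
        print(" ".join(cmd))
```

Each solr home would need its own solr.xml and, per the documentation
linked above, ideally its own disk.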
Thanks,
Shawn
--
Brian Wright
Sr. Systems Engineer
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
Email: bri...@marketo.com
Phone: +1.650.539.3530
www.marketo.com