Timothy Potter created SOLR-6743:
------------------------------------

             Summary: Support deploying SolrCloud on YARN
                 Key: SOLR-6743
                 URL: https://issues.apache.org/jira/browse/SOLR-6743
             Project: Solr
          Issue Type: New Feature
          Components: Hadoop Integration, SolrCloud
            Reporter: Timothy Potter


We're seeing Solr running with Hadoop more and more and YARN allows us to 
deploy and manage distributed applications across a cluster of machines. This 
feature will provide support for deploying SolrCloud in YARN. Currently, the 
code is implemented in an open-source project hosted on Lucidworks github, see: 
https://github.com/LucidWorks/yarn-proto

We'd like to submit this to the Apache Solr project as a contrib so it is 
easier to run Solr on YARN right out-of-the-box. There are a few hurdles to get 
over though:

1) Overall approach: There are various options for supporting YARN, such as 
Apache Slider, but I opted to just use the YARN client API directly which 
simply invokes the bin/solr start script under the covers. The YARN specific 
code is quite simple and most of the code is just handling command line 
options/parsing. I'm curious what others think about having a simple native 
solution that ships with Solr (similar to the HdfsDirectoryFactory) vs. 
something more heavy-weight that requires 3rd party tools to be involved.

2) Unit testing - Solr on YARN relies on putting a full Solr bundle into HDFS 
(which you can see how that might work in the SolrYarnTestIT test case). This 
obviously has problems in the Solr build as there is no bundle of Solr 
available during unit testing. I'm thinking about having a mock bundle that 
simulates starting Solr but that limits what we can verify on the cluster once 
it's up.

3) Shutdown - In order to support an orderly shutdown of Solr when the 
application is stopped by the ResourceManager, we need a shutdown handler in 
Jetty/Solr that allows a remote application to request shutdown. The built-in 
Jetty shutdown handler requires the stop request to come from localhost. To 
work-around this, I've introduced a custom ShutdownHandler that can be 
configured using System properties at startup to allow a remote host to request 
shutdown. When YARN starts Solr nodes, I register the address of the SolrMaster 
node with a secret key that will allow the SolrMaster to shutdown Solr 
gracefully. This seems secure since only the SolrMaster can request shutdown 
using the correct key. Other ideas on how to handle graceful shutdown?

4) Additional features: The current implementation is useful for 
starting/stopping SolrCloud nodes in YARN. My thinking is that you'll provision 
the cluster using YARN and then just interact with Solr directly using Solr's 
API , so the YARN layer is quite thin. Other features needed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to