The ultimate answer is that you need to test your configuration against
your expected workload.

However, what should mitigate the remote IO cost is that Solr's HDFS
support includes a block cache which, when tuned correctly, keeps in RAM
the blocks your Solr process needs the most.
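
To make "tuned correctly" a bit more concrete, here is a rough sizing sketch. It assumes the block cache is allocated in slabs of 16384 blocks (the default of solr.hdfs.blockcache.blocksperbank, as I understand it) at 8 KB per block, i.e. 128 MB per slab; please verify both against the docs for your Solr version. The 4 GiB budget is a made-up number for illustration.

```python
# Rough sizing for the Solr HDFS block cache.
# Both constants are assumed defaults; check them for your Solr version.
BLOCK_SIZE_BYTES = 8 * 1024   # assumed cache block size: 8 KB
BLOCKS_PER_SLAB = 16384       # assumed default of solr.hdfs.blockcache.blocksperbank


def slab_bytes(blocks_per_slab=BLOCKS_PER_SLAB, block_size=BLOCK_SIZE_BYTES):
    """Memory taken by one slab of the block cache, in bytes."""
    return blocks_per_slab * block_size


def slabs_for_budget(ram_budget_bytes):
    """Largest whole number of slabs that fits in the given RAM budget."""
    return ram_budget_bytes // slab_bytes()


if __name__ == "__main__":
    budget = 4 * 1024**3  # hypothetical: 4 GiB set aside for the cache
    print(f"slab size: {slab_bytes() // 1024**2} MiB")
    print(f"solr.hdfs.blockcache.slab.count = {slabs_for_budget(budget)}")
```

The resulting count would go into solr.hdfs.blockcache.slab.count. If the cache uses direct memory, the JVM's -XX:MaxDirectMemorySize has to be large enough to hold it.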

Solr on HDFS currently doesn't have any sort of rack locality like you get
with, say, HBase colocated on the HDFS nodes. So even with Solr installed
on the same nodes as your HDFS datanodes, you can expect that there will
be remote IO.



Michael Della Bitta


On Tue, Mar 24, 2015 at 2:47 PM, Joseph Obernberger <j...@lovehorsepower.com> wrote:

> Hi All - does it make sense to run a Solr shard on a node within a Hadoop
> cluster that is not a data node?  In that case all the data that node
> processes would need to come over the network, but you get the benefit of
> more CPU for things like faceting.
> Thank you!
>
> -Joe
>
