Re: Miserable Experience Using Solr. Again.

Shawn Heisey Wed, 14 Sep 2016 06:11:44 -0700

On 9/13/2016 5:42 PM, Aaron Greenspan wrote:
> I get this on digest mode (and wasn’t even sure my initial message
> went through to the list), so please forgive the delay in responding.


I've added you as BCC so you'll get this as soon as I send it.  I wrote
most of it last night, and left it to complete in the morning -- and now
I see that Jan has replied with similar information.

> I think the various reactions to my post suggest that a sizable number
> of users (and by "users" I mean those who are not affiliated with
> Apache and who are not core contributors) find Solr difficult to use.
> For me, this was confirmed many months ago when a family friend—a
> non-technical CEO twice my age of a company recently acquired for a
> very sizable sum—came over for dinner and without any prompting from
> anyone began complaining about this impossible program at work called
> Solr that none of his engineers could get to work. By his telling, he
> had several experienced engineers working on it. 

I've been using Solr for about six years now.  When I first got started,
I spent a HUGE amount of time figuring out the most basic things, and I
asked plenty of dumb questions right here on this list.  I think it took
me about three days to get from that initial download of the 1.4.0
archive to a working server that had something besides "collection1" on
it.  It took another month or so beyond that before I could demonstrate
anything usable to my team, and after that I had to start writing tools
that would actually create the index without manual intervention.  One
of those tools was an init script.  Now Solr will install an init script
on Unix-like operating systems.

My active production indexes are running on a couple of different 4.x
versions.  I have production 5.x indexes on servers serving a hot
standby role, but they have not been fully vetted, so the primaries
remain on older versions.  It'll be a while before I get around to 6.x.

> I’m aware that issues with Java are not Solr’s fault. But most
> programs still manage to gracefully fail when they are missing a
> dependency, and then clearly report what’s missing. If you’re not
> actually a Java programmer, which I am not, "major.minor 52.0" (for
> example) is meaningless gibberish. "Please download and install JRE
> 1.8 to run this software" would be considerably clearer. How is it
> that Solr can search through millions of files, but it can’t do that? 

I know that in the 5.x days, we had Java version detection in the start
script, so that the start would complain if certain buggy versions of
Java 7 were detected.  I think it would even refuse to start if the
version wasn't new enough.  If we have lost that with 6.x, that needs to
go back in, and we will look at that problem immediately.
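
For what it's worth, the "major.minor 52.0" number is the class file
format version.  From Java 5 onward, the Java release is the major
version minus 44, so 52 means Java 8.  Here is a rough Python sketch of
how that number could be turned into a readable message.  The function
names and the message wording are made up for illustration; this is not
what the Solr start script does.

```python
def required_java_release(class_major: int) -> str:
    """Translate a class file major version into a Java release name."""
    if class_major < 49:
        # Versions before Java 5 (major 49) are outside this sketch.
        raise ValueError(f"unsupported class file major version: {class_major}")
    release = class_major - 44
    # Releases 5 through 8 were also labelled 1.5 through 1.8.
    if release <= 8:
        return f"1.{release}"
    return str(release)


def friendly_message(class_major: int) -> str:
    """Hypothetical wording for a clearer startup error."""
    release = required_java_release(class_major)
    return f"Please install Java {release} or later to run this software."


if __name__ == "__main__":
    print(friendly_message(52))
```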

On password security:  I hear you.  Part of the issue is that Solr can't
*directly* do security.  It's sitting behind another piece of software
that handles the network and HTTP -- Jetty.  Until recently, Solr really
didn't touch the servlet container, allowing it to do its thing
according to its config files.  Part of this was due to the fact that
before 5.0, we did not know what container was being used -- the user
had the option of deploying in several different containers, and none of
them handled security in quite the same way.  Since 5.0, the only
officially supported container is the Jetty that Solr includes, so we
CAN put container-specific code into Solr.  This is why 5.3 and later
have good support for authentication.

TL;DR info:  When you password protect Solr, the admin UI actually
doesn't get protected.  It is nothing more than static HTML, CSS,
JavaScript, and images.  The admin UI actually runs in your browser, not
on the server.  What gets password protection is the HTTP API used for
information, queries, and updates.
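
To illustrate the difference, here is a small Python sketch that builds
(but does not send) an API request carrying credentials.  It assumes
HTTP Basic authentication is what protects the API, and the host, port,
path, user name, and password are placeholders I picked for the
example, not values from your install.

```python
import base64
import urllib.request


def build_api_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a request to the HTTP API with a Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Placeholder URL and credentials, assumed for illustration only.
    req = build_api_request(
        "http://localhost:8983/solr/admin/info/system", "someuser", "somepassword"
    )
    # Sending it would be urllib.request.urlopen(req); omitted here.
    print(req.full_url)
    print(req.get_header("Authorization"))
```

The static admin UI files are fetched without credentials; it is the
API calls the UI makes from your browser that need a header like this.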

You're absolutely right that our documentation and error messages are
completely inadequate for a novice user.  The error messages sometimes
aren't even adequate for an experienced Java developer to know what went
wrong, at least not without examining the source code.

> As for Bram Van Dam’s question about how a settings database would
> work, I don’t think it’s worth getting too specific here, but my
> general response would be, if you need a good model for how to widely
> deploy software—not a perfect model, but a good one—look at WordPress.
> A lot of people use WordPress. Like any software, it has its flaws.
> But average people are able to sign in, with a password (!), change
> their admin settings, and save those settings I’m pretty certain to a
> MySQL schema. I’d love to be able to do that with Solr. 

I concur with what Alexandre said about WordPress compared to Solr.  The
target audience and deployment method are quite different ... but I take
your point too -- we can learn a lot from projects like WordPress, which
has had to address "first contact" issues in its documentation.

The addition of ZooKeeper capability to Solr in version 4.0 created
SolrCloud, which automates the job of using multiple Solr machines as a
scalable and redundant cluster.  Unfortunately, ZooKeeper is not trivial
to set up, so we traded a super-hard problem for a different and
slightly less challenging problem.  Managing ZooKeeper is easier than
setting up all of the infrastructure required for a cluster when
SolrCloud is NOT used.

Solr has a running mode where it embeds a ZooKeeper server inside of
Solr itself.  The port number is usually 9983.  The startup script will
set that port number to the Solr port plus 1000, unless a specific port
number is configured.
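
The port rule is simple enough to write down.  This Python sketch only
mirrors the rule as described above; the function and parameter names
are mine, and 8983 is the default Solr port that the usual 9983 is
derived from.

```python
from typing import Optional

# Offset the start script adds to the Solr port, per the description above.
ZK_PORT_OFFSET = 1000


def embedded_zk_port(solr_port: int = 8983,
                     configured_zk_port: Optional[int] = None) -> int:
    """Return the embedded ZooKeeper port for a given Solr port."""
    if configured_zk_port is not None:
        # An explicitly configured port wins over the derived one.
        return configured_zk_port
    return solr_port + ZK_PORT_OFFSET


if __name__ == "__main__":
    print(embedded_zk_port())        # default Solr port
    print(embedded_zk_port(7574))    # a second Solr instance on another port
```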

The cloud example (bin/solr start -e cloud) starts an embedded ZooKeeper
on the first Solr instance.  This arrangement is suitable for a demo or
proof of concept, but a different setup is highly recommended for
production: an external ensemble of three or more ZooKeeper instances,
each running on its own physical box.  At least three servers are
required for ZooKeeper redundancy, and running them outside Solr is
recommended so that the entire ZK ensemble stays up even when a Solr
process is stopped.  ZooKeeper does NOT need to run on separate hardware
from Solr, though a large and busy cluster will perform better if
ZooKeeper uses a different storage volume than Solr does.
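
The reason three is the minimum for redundancy: ZooKeeper needs a strict
majority of the ensemble to be up.  A quick Python sketch of that
arithmetic (the function names are mine, just for illustration):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (1, 2, 3, 4, 5):
        print(size, quorum_size(size), tolerated_failures(size))
```

One or two servers tolerate no failures at all, three tolerate one, and
a fourth server adds nothing over three, which is why odd-sized
ensembles are the usual choice.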

I hate that you've had a bad experience with Solr.  Your feedback has
given us some pointers about specific things we can improve.  I hope
you'll be willing to continue providing guidance.

Thanks,
Shawn
