Re: Miserable Experience Using Solr. Again.

Shawn Heisey Fri, 16 Sep 2016 06:47:53 -0700

Responses inline.  More potentially flimsy excuses coming your way.

On 9/15/2016 9:56 PM, Aaron Greenspan wrote:
> My two cents: I’m glad to see the discussion over improved documentation, but 
> if you give me a choice between better docs and better UI, I’ll choose a 
> better UI every time. If contributors are going to spend real time on the 
> concerns raised in this thread, spend the time on making the software better 
> to the point where more docs are unnecessary. All sorts of things could 
> improve that would make the product far more intuitive (and I know, there are 
> probably JIRA entries on most of these already…).


The UI is not really intended to be the way that Solr gets accessed. 
It's mostly just a way for an admin to peer into what Solr is doing.  At
this time, it's not really a good tool for configuration.  The admin UI
in 1.x and 3.x only had one or two ways to change the system ... and one
of those ways was enable/disable on the ping handler.  Everything else
the UI did back then was informational.

In newer versions of Solr, the UI does have some capability to make
changes to the system state -- which it does by accessing the HTTP API,
same as any program you would write that uses Solr.
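
To make "same as any program you would write" concrete, here is a rough
sketch in Python that builds a request for the /select handler.  The
host, the core name "mycore", and the field names are placeholders I
made up for illustration, not anything from your install.

```python
from urllib.parse import urlencode

def select_url(base, core, query, fields=None, rows=10):
    """Build a URL for the /select request handler on one core."""
    # q, fl, rows and wt are the standard query parameters.
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        params["fl"] = ",".join(fields)
    return f"{base}/{core}/select?{urlencode(params)}"

# Hypothetical host, core, and field names.
url = select_url("http://localhost:8983/solr", "mycore",
                 "title:solr", fields=["id", "title"])
print(url)
# To actually send it: urllib.request.urlopen(url).read()
```

Anything that can make an HTTP request can do the same thing, which is
all the admin UI is doing behind the scenes.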

> - The pseudo-frames in the web UI are the source of all kinds of problems, 
> with lots of weird horizontal scrolling I’ve noticed over the years. It makes 
> the Logging screen in particular infuriating to use. When I click on certain 
> log entries an arbitrary-seeming "false" flips to "true" under the "WARN" 
> statement in the Level column. But on other log entries, it all just goes 
> haywire all over the screen because it’s too big both horizontally and 
> vertically, and then re-condenses as though I’d never clicked, as I mentioned 
> before.

If the UI problems you've found are filed as bugs (or as a single bug
listing the problems), we can take it from there.  My free time is
pretty limited, or I would do it myself.

Checking the actual logfile has always been a better option than the
Logging tab in the UI.  The info in ERROR messages is typically too
large for effective display in the UI, and the UI excludes messages that
are at a severity of INFO or less.  The default level for the logfile is
INFO, and is typically very verbose.

> - The top menu on the left is in plain English. The core menu on the bottom 
> is written as though it’s being viewed by a person who only speaks UNIX. For 
> example, there is no space between "Data" and "Import" in "DataImport" and 
> "Segments info" could just be "Segments". Is "Plugins / Stats" two menus in 
> one?

The Solr feature that this accesses is contained in a class named
DataImportHandler.  The most commonly configured URL path for the
handler is "/dataimport".  Is it more important to be grammatically
correct, or to connect the wording in the admin UI with what the user
actually sees in solrconfig.xml?  You're probably right that it should
have that space, so it doesn't grate on the nerves of those who like
things to be correct.

On "Segments Info" ... brevity counts for a lot in an interface.  That
sounds like a good change.

> - "Ping" in the menu takes you nowhere in particular and shouldn’t really be 
> a menu item. It should be part of the main dashboard with all of the other 
> tech stats (which I do like) or a menu called "Status". (Why would one core 
> ping faster than another anyway? If this is really for "cloud" installations 
> where cores can be split up on different servers, why am I seeing it when 
> everything is local and immediate?)

Good point.  I'm not sure why the time for the ping is so prominently
displayed.  Ping isn't really about speed -- it's about whether or not
the server is up and functional.  It's also a legacy feature that sort
of works with Cloud, but isn't really aware of Cloud.  Plenty of room
for improvement in the UI, in the ping feature itself, and the docs.
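
A monitoring script only needs the status from the ping handler, not
the timing.  A minimal sketch, assuming the handler is at its usual
/admin/ping path and that a healthy core answers with a status of "OK";
the host and the core name "mycore" are placeholders:

```python
import json
import urllib.request

def ping_ok(body):
    """Return True if a JSON ping response reports status OK."""
    return json.loads(body).get("status") == "OK"

def ping(base, core, timeout=5):
    """Ask one core's ping handler whether it is up and functional."""
    url = f"{base}/{core}/admin/ping?wt=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ping_ok(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or an unparseable body.
        return False

# Example call (hypothetical host and core):
# ping("http://localhost:8983/solr", "mycore")
```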

> - On the Data Import page, the expandable icons are [-] when they’re expanded 
> and still [-] when they’re collapsed. Extremely confusing.

That's definitely a bug.  The priority of that bug might be debatable.

> - The Data Import UI makes no mention anywhere of the ability to import from 
> MySQL, which is 99% of what I want to do with this product. It doesn’t tell 
> me how to set up the MySQL connector, doesn’t give me a button that turns it 
> on in some modular fashion, doesn’t tell me if the server connection is 
> successful, doesn’t let me easily enter or edit credentials, doesn’t let me 
> edit my queries anywhere, and doesn’t let me test out a new query and see how 
> it might fit into the Solr schema. These deficiencies are presumably also 
> true for any database data source, e.g. Postgres/DB2/ODBC/whatever—which also 
> are not listed, were I curious to know what Solr can do just by looking at 
> the product itself.
>
> - Nor does the Data Import UI have another section for picking a folder on 
> the filesystem that might contain PDFs I want to import with Tika.

The dataimport section of the UI isn't about configuring dataimport or
telling it where to grab data from.  It's about starting an import with
the already-configured settings and getting visibility into the current
status of an ongoing import.  Configuring the feature must still be done
with the XML config file right now.  Having configuration options for
DIH in the UI would be awesome.
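
Starting an import and checking on it can also be done without the UI,
by sending commands to the handler directly.  This is only a sketch: it
assumes the handler is registered at "/dataimport", and the host and
core name are made up.  Note that it sends clean=false unless told
otherwise, because a full-import cleans the index by default.

```python
from urllib.parse import urlencode

def dih_url(base, core, command, clean=False, commit=True,
            handler="dataimport"):
    """Build a URL that sends one command to the DataImportHandler."""
    params = {"command": command, "wt": "json"}
    if command == "full-import":
        # Be explicit so existing documents are not wiped by accident.
        params["clean"] = str(clean).lower()
        params["commit"] = str(commit).lower()
    return f"{base}/{core}/{handler}?{urlencode(params)}"

base = "http://localhost:8983/solr"  # hypothetical host
start = dih_url(base, "mycore", "full-import")
status = dih_url(base, "mycore", "status")
print(start)
print(status)
```

Request the first URL to kick off the import, then poll the second one
to watch its progress.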

> - There is no field picker on the Query screen, but I just spent all of that 
> time defining my fields in those XML files I can’t edit or auto-generate 
> through the UI. That means I have to do extra work to remember them all. But 
> then there is a field picker on the Analysis screen?

That's a cool idea.  It would require a rework of the query screen.  The
"q" box would still need to be there -- a field picker works great when
a query is only on one field, but the full query syntax lets you query
for information on multiple fields, and writing a UI to deal with THAT
would be very challenging.

The analysis page only works on one field at a time, and even works on a
fieldType, which is also available in that screen's dropdown.

> - How do I restart Solr from the UI? Or change memory allocation settings? 
> Can I?

You can't.  Solr is still implemented as a java webapp.  Although
containers do have the ability to hot deploy their applications, which
is like a restart, using this feature is typically plagued by memory
leaks and other problems.  Restarting the actual *java* process is even
further out of reach for the UI, at least currently.

> - How do I change the port the UI is running on from the UI? Or limit the IP 
> addresses Jetty is binding to? Can I?

This would require changing the system-level config and restarting Java,
which can't be done through the UI currently.

One of the ideas that has been floated is to rewrite Solr as *two* java
processes -- one that is a controller with extremely minimal memory
requirements that starts the *main* java process, and can completely
control every aspect.  That's about the only way that settings like
memory allocation and listen port could be changed in the admin UI. 
Implementing this feature would not be trivial, but I really like the idea.

> - How do I change log settings through the UI? Can I?

Which log settings?  The logging level for each class can already be
changed.  All other settings are configured with log4j.properties,
including permanent adjustments to logging levels for classes.  Many
settings can be changed programmatically, but if we introduce code for
configuring a specific logging framework, then users will not be able to
change the logging framework that Solr uses.  In the future, I think we
might take away the choice of logging framework and make additional
config options for logging available in the admin UI, but for now, the
user can still change the framework.
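
The per-class level change can be scripted too.  My understanding is
that the UI does it by calling the logging handler at
/solr/admin/info/logging with a set=<logger>:<LEVEL> parameter; treat
that path and parameter as assumptions and check them against your
version.  The host and logger name below are only examples.

```python
from urllib.parse import urlencode

LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}

def set_level_url(base, logger, level):
    """Build a URL that asks Solr to change one logger's level."""
    level = level.upper()
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    # Assumed endpoint and parameter; see the note above.
    params = {"set": f"{logger}:{level}", "wt": "json"}
    return f"{base}/admin/info/logging?{urlencode(params)}"

print(set_level_url("http://localhost:8983/solr",
                    "org.apache.solr.core", "warn"))
```

A change made this way lasts only until the next restart.  Permanent
adjustments still belong in log4j.properties.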

> - For those places where technical terms really are necessary (and I’d argue 
> that should be nowhere), tooltips should be pervasive to explain what 
> everything means. q? fq? fl? df?

Agreed.  We've recently changed to a completely new UI that has far more
cloud capability included, and it's possible that some of this has
regressed.  I know the old UI had quite a few tooltips on the Query
screen.  The old UI is still accessible in 6.x versions, at least for now.

> - The Data Import UI currently broadcasts your MySQL database password to the 
> world for whoever is logged in. In a best-case scenario, the legitimate user 
> might have search admin permissions but ideally not full MySQL permissions. 
> In a worst-case scenario, they’re a random stranger who used nmap to find an 
> open port 8983 and just got closer to rooting your server or at least taking 
> all of your data. This feature—showing the password—seems unnecessary. Though 
> I take issue with the kludge of shoving a configuration file in an IFRAME or 
> something like it in the first place, while it’s still part of the product, 
> at least replace whatever comes after the string "password=" and before the 
> next space with some asterisks.

The password can be encrypted.  The capability is NOT well documented,
and I had a hell of a time figuring out how to generate the password,
which I think I did with openssl.  Worse, the *encryption* password is
sitting in the DIH config file in cleartext.  This would defeat a casual
observer, but not a determined attacker with some brains.

The inclusion of the full config is one of the ways that the UI's status
as an add-on, instead of a fully functional configuration point, is
apparent.

> - The Data Import UI always checks the "clean" box by default, which means 
> every time I try to do something I almost erase my entire core, which takes a 
> full day and a lot of CPU cycles to rebuild. I know this has been in JIRA for 
> some time, and it’s still not fixed. And it’s a bug that destroys data!

Yep.  Another problem that has apparently escaped our immediate
attention.  Make some noise on the Jira.  Anyone who's on the dev list
automatically gets a copy of all Jira activity.

Saving the status of that checkbox across browser sessions might be
challenging, but I think that the status within a session could be
handled with cookies.

> - Revealing my true ignorance here, I have no idea how to use the Analysis 
> screen.

It's deceptively simple, and once you DO grok it, it will be your best
friend.  With it, you can determine whether your field analysis will
generate matches when you make specific queries.

Basic usage -- pick a field.  You probably want "verbose" checked.  Type
some query terms into the "query" side.  Type the actual text that will
be indexed into a document for that field into the "index" side.

For the query side, it doesn't have the full query parser capability, so
you can't include operators like "AND".  If it's a phrase query, don't
include the quotes in what you type.  It also can't deal with field
names in the query.  These are things that we'd really like to have, but
the analysis screen accesses a back-end analysis request handler, and
that request handler doesn't deal with those things, at least not yet.

When you press the "Analyze" button, Solr will show you how both sides
are analyzed.  Any terms from the query side that show up on the index
side will be highlighted with a purple color.  You'll have to look at
the last line of "index" analysis output for purple matches and apply
some of your own intelligence to determine whether matches on that
screen actually will result in a query match -- default operator,
operators included in the actual query, etc.  If the query is a phrase
query (includes quotes) then the relative term positions will matter as
well as having the highlighted matches.

Determining whether a multi-field query will result in a match will
require using the analysis screen for each field in the query.
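
If you would rather script that per-field check than click through the
screen, the back-end handler can be called directly.  A sketch, assuming
the field analysis handler is available at "/analysis/field" on your
core; the host, core, field names, and sample text are invented:

```python
from urllib.parse import urlencode

def analysis_url(base, core, field, index_text, query_text):
    """Build a field analysis request for one field."""
    params = {
        "analysis.fieldname": field,        # the field to analyze
        "analysis.fieldvalue": index_text,  # the "index" side
        "analysis.query": query_text,       # the "query" side
        "analysis.showmatch": "true",       # flag matching terms
        "wt": "json",
    }
    return f"{base}/{core}/analysis/field?{urlencode(params)}"

# One request per field, as with the screen itself.
base = "http://localhost:8983/solr"  # hypothetical host
urls = [analysis_url(base, "mycore", f, "Running shoes", "run")
        for f in ("title", "body")]
for u in urls:
    print(u)
```

The same limits apply as on the screen: plain terms only, with no
operators, quotes, or field names in the query text.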

> - The Files screen doesn’t say the name of the parent folder at the top, so 
> it’s not entirely obvious where I even am on my filesystem or why I want the 
> ability to view, but not edit, files through the browser.

Everything in "Files" is found in the core's "conf" directory, or in the
case of SolrCloud, in the configuration that's stored in ZooKeeper.
This could definitely be clearer.  The location of the directory or
ZooKeeper path should be shown there.

At one time, we had the ability to edit the config files in the UI.  A
big player (Red Hat, if I remember right) filed a security bug against
Solr because of this, so we took it out.

> My general feeling is that the UI does mostly things I don’t understand or 
> want, and not the few basic things that I do want. Even with a great sample 
> environment, this would still be true. This is why the documentation ends up 
> being so important. Again, I would point everyone to successful server 
> products that are already out there that have done similar tasks well for 
> decades now. If you don’t like WordPress, or the now aged Netscape Enterprise 
> Server, then there’s phpMyAdmin, Oracle Application Server, Microsoft IIS, 
> etc. They all have web UIs that are for the most part self-explanatory. Good 
> software doesn’t force users to learn how it works. It hides the inner 
> workings under the interface, so that people never even have to worry about 
> it at all.

As already said above, the UI is not the true access point for Solr. 
It's a helper that's been added on.  The true access point is the
various request handlers forming the HTTP API -- /select, /update, etc
... and the HTTP API lacks complete documentation.

I would like to see a future where the admin UI is more than just an
addon ... but even then, I think the HTTP API will *still* be the most
important piece of the system.

Thanks,
Shawn
