Erik Hatcher wrote:
On Nov 16, 2008, at 6:18 PM, Ryan McKinley wrote:
my assumption with solrjs is that you are hitting "read-only" solr
servers that you don't mind if people query directly.
Exactly the assumption I'm going with too.
It would not be appropriate for something where you don't want people
(who really care) to know you are running solr and could execute
arbitrary queries.
Since it is an example, I don't mind leaving the /admin interface open
on:
http://example.solrstuff.org/solrjs/admin/
but /update has a password:
http://example.solrstuff.org/solrjs/update
I have said in the past I like the idea of a "read-only" flag in solr
config that would throw an error if you try to do something with the
UpdateHandler. However there are other ways to do that also.
As the thoughts and ideas of this thread are spread in several emails, let me
just drop my uncoordinated thoughts here:
For solrjs, what exactly is the required information solr has to provide
"directly":
- We need data for several widgets. This data will be in 99% of the cases some
facet information and/or result docs. The result docs will be in suitable
ranges, no webpage will display 100000+ result items at the same time.
- So "potentially dangerous" request params like rows>1000, or handlers other
than the StandardRequestHandler, may be blocked.
- update handlers and admin interface shouldn't be exposed.
Like others mentioned before, I'm not sure this is a task that *has* to be
solved inside Solr. As a standalone servlet, it is very likely that it is NOT
accessible directly in a production environment.
Hiding or password protecting update/admin is an easy task using a proxy like
Apache httpd. It could also be solved by a configurable ServletFilter delivered
with solr, that is initialized inside solr's web.xml. To separate the concerns,
I think it should not be coded "deeper" inside the solr code. The idea of a
"read-only" server can be implemented like that. Optional update urls that are
only accessed inside a firewall or something may also be present.
This servlet filter may also check the request params for things that are not
needed for solrjs and potentially dangerous. It even may check how frequently
urls are accessed (thinking about DoS).
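
To make the parameter check more concrete, here is a minimal sketch of the
decision logic such a filter might apply. It is only an illustration: the class
name SolrRequestPolicy, the whitelist of paths and the limit of 1000 rows are
assumptions of mine, not existing Solr code, and the servlet wiring (a Filter
calling this from doFilter) as well as the access-frequency check are left out.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical policy check; names and limits are illustrative, not part of Solr.
final class SolrRequestPolicy {
    // Assumed limit, taken from the "rows>1000" example above.
    static final int MAX_ROWS = 1000;
    // Assumed whitelist: only the select handler is exposed to clients.
    static final Set<String> ALLOWED_PATHS = Set.of("/select");

    /** Returns null if the request is allowed, otherwise a short rejection reason. */
    static String check(String path, Map<String, String[]> params) {
        if (!ALLOWED_PATHS.contains(path)) {
            return "path not allowed: " + path;
        }
        // "qt" can route the request to another handler, so it is rejected here.
        if (params.containsKey("qt")) {
            return "handler selection via qt is not allowed";
        }
        String[] rows = params.get("rows");
        if (rows != null) {
            for (String value : rows) {
                try {
                    int n = Integer.parseInt(value.trim());
                    if (n < 0 || n > MAX_ROWS) {
                        return "rows out of range: " + value;
                    }
                } catch (NumberFormatException e) {
                    return "rows is not a number: " + value;
                }
            }
        }
        return null;
    }
}
```

A filter would call check() with the servlet path and the parameter map of the
request and answer with an error status whenever the result is not null. Using
a whitelist rather than a list of forbidden paths means /update and /admin are
blocked without being named.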
I think even if it looks like a direct access, using solrjs doesn't have to be
different to "common" solr webapps. Usually these apps take user input, a web
application translates this input into a solr query and translates the result in
a suitable client format. Other solr stuff is blocked indirectly because only
this app has access to solr. Now the last 2 steps are done inside the client.
But if we block stuff that isn't used by the client, we are in control of what
may happen.
If that isn't secure enough, the more complicated solution would be to create
such a stateful servlet that holds the query state of a client, and solrjs only
performs /select/solrjs/?new_query=city:vienna or something. Then the query
generation and all solr related stuff happens again on the server.
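
A rough sketch of what such server-side query state could look like, under my
own assumptions: the class ClientQueryState, the restriction to simple
field:value terms and the fixed rows value are all hypothetical. In a servlet,
one instance per client could be kept in the HTTP session.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-client query state; not an existing Solr or solrjs class.
final class ClientQueryState {
    // Fixed on the server, never taken from the client.
    private static final int ROWS = 10;
    private final List<String> filters = new ArrayList<>();

    /** Accepts only simple field:value terms such as city:vienna (assumed restriction). */
    boolean addQuery(String term) {
        if (term == null || !term.matches("[A-Za-z_]\\w*:[\\w-]+")) {
            return false;
        }
        filters.add(term);
        return true;
    }

    /** Builds the query string the server itself would send to Solr's select handler. */
    String toSolrQueryString() {
        StringBuilder sb = new StringBuilder();
        sb.append("q=").append(URLEncoder.encode("*:*", StandardCharsets.UTF_8));
        sb.append("&rows=").append(ROWS);
        for (String f : filters) {
            sb.append("&fq=").append(URLEncoder.encode(f, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

The client only ever sends the new_query value; everything else in the Solr
request is generated on the server, so arbitrary queries are not possible.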
I think it should be easy to deliver this SecuritySolrFilter with the
standard solr distribution, making it configurable for the user to decide what
urls are blocked/password protected and what request parameters should be
checked for illegal values. On the other hand, existing firewalls and proxies of
the destination system may be used. Therefore some "best-practices" may be
helpful in the solr wiki.
It would be fine by me to help implement a standard security filter for solr.
WDYT?
regards,
matthias