Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Chris Hostetter

: > Ah ... this is the one problem with high volume on an involved thread ...
: > i'm sending replies to messages you write after you've already read other
: > replies to other messages you sent and changed your mind :)

: Should we start a new thread?

I don't think it would make a difference ... we just need to slow down :)

: Ok, now (I think) I see the difference between our ideas.
:
: From your code, it looks like you want the RequestParser to extract
: 'qt' that defines the RequestHandler.  In my proposal, the
: RequestHandler is selected independent of the RequestParser.

no, no, no ... i'm sorry if i gave that impression ... the RequestParser
*only* worries about getting streams, it shouldn't have any way of even
*guessing* what RequestHandler is going to be used.

for reference: http://www.nabble.com/Re%3A-p8438292.html

note that i never mention "qt" .. instead i refer to
"core.execute(solrReq, solrRsp);" doing exactly what it does today ...
core.execute will call getRequestHandler(solrReq.getQueryType()) to pick
the RequestHandler to use.

the Servlet is what creates the SolrRequest object, and puts whatever
SolrParams it wants (including "qt") in that SolrRequest before asking the
SolrCore to take care of it.

: What do you imagine happens in:
: >
: > String p = pickRequestParser(req);

let's use the URL syntax you've been talking about that people seem to
have agreed looks good (assuming i understand correctly) ...

   /servlet/${requesthandler}:${requestparser}?param1=val1&param2=val2

what i was suggesting was that then the servlet which uses that URL
structure might have a utility method called pickRequestParser that would look 
like...

  private String pickRequestParser(HttpServletRequest req) {
    String[] pathParts = req.getPathInfo().split(":");
    if (pathParts.length < 2 || "".equals(pathParts[1]))
      return "default"; // or "standard", or null -- whatever
    return pathParts[1];
  }
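
The splitting logic can be exercised outside a servlet container; this is a stand-alone copy of the sketch above (not actual Solr code) so the edge cases are easy to check:

```java
// Stand-alone copy of the path-splitting sketch above, runnable without
// a servlet container. Illustrative only.
public class PathSplit {
    public static String pickRequestParser(String pathInfo) {
        String[] pathParts = pathInfo.split(":");
        // String.split drops trailing empty strings, so "/select:" also
        // falls through to the default.
        if (pathParts.length < 2 || "".equals(pathParts[1]))
            return "default"; // or "standard", or null -- whatever
        return pathParts[1];
    }

    public static void main(String[] args) {
        System.out.println(pickRequestParser("/my/update/csv:remoteurls")); // remoteurls
        System.out.println(pickRequestParser("/select"));                   // default
        System.out.println(pickRequestParser("/select:"));                  // default
    }
}
```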


: If the RequestHandler is defined by the RequestParser,  I would
: suggest something like:

again, i can't emphasize enough that that's not what i was proposing ... i
am in no way shape or form trying to talk you out of the idea that it
should be possible to specify the RequestParser, the RequestHandler, and
the OutputWriter all as part of the URL, and completely independent of
each other.

the RequestHandler and the OutputWriter could be specified as regular
SolrParams that come from any part of the HTTP request, but the
RequestParser needs to come from some part of the URL that can be
inspected without any risk of affecting the raw post stream (ie: no
HttpServletRequest.getParameter() calls)

: I still don't see why:
:
: >
: > // let the parser preprocess the streams if it wants...
: > Iterable<ContentStream> s = solrParser.preprocess
: >   (getStreamInfo(req), new Pointer<InputStream>() {
: > InputStream get() {
: >   return req.getInputStream();
: > } });
: >
: > SolrParams params = makeSolrRequest(req);
: >
: > // ServletSolrRequest is a basic impl of SolrRequest
: > SolrRequest solrReq = new ServletSolrRequest(params, s);
: >
: > // let the parser decide what to do with the existing streams,
: > // or provide new ones
: > s = solrParser.process(solrReq, s);
: >
:
: can not be contained entirely in:
:
:   SolrRequest solrReq = parser.parse( req );

because then the RequestParser would be defining how the URL is getting
parsed -- the makeSolrRequest utility placeholder i described had the
wrong name, i should have called it makeSolrParams ... it would look
something like this in the URL syntax i described above...

  private SolrParams makeSolrParams(HttpServletRequest req) {
    // this class already in our code base, used as is
    SolrParams p = new ServletSolrParams(req);
    String[] pathParts = req.getPathInfo().split(":");
    if ("".equals(pathParts[0]))
      return p;
    Map<String,String> tmp = new HashMap<String,String>();
    tmp.put("qt", pathParts[0]);
    return new DefaultSolrParams(new MapSolrParams(tmp), p);
  }



the nutshell version of everything i'm trying to say is...

 SolrRequest
   - models all info about a request to solr to do something:
     - the key=val params associated with that request
     - any streams of data associated with that request
 RequestParser(s)
   - different instances for different sources of streams
   - is given two chances to generate ContentStreams:
     - once using the raw stream from the HTTP request
     - once using the params for the SolrRequest
 SolrServlet
   - the only thing with direct access to the HttpServletRequest, shields
     the other interface APIs from the mechanics of HTTP
   - dictates the URL structure
     - determines the name of the RequestParser to use
     - lets parser have the raw input stream
     - determines where SolrParams for request come from
     - lets parser have params to make more streams if it wants to.
 SolrCore
   - does all of the name lookups for processing a SolrRequest:
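
The nutshell above can be sketched end-to-end in plain Java. Every type and method name here is an illustrative stand-in for the interfaces still under discussion, not the real Solr API:

```java
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// All names are stand-ins for the proposed interfaces, not real Solr classes.
class FlowSketch {
    interface ContentStream { InputStream getStream(); }

    // Two chances to produce streams: raw HTTP body first, params second.
    interface RequestParser {
        List<ContentStream> preProcess(Map<String, String> streamInfo, InputStream raw);
        List<ContentStream> process(Map<String, String> params, List<ContentStream> earlier);
    }

    // Models params + streams; knows nothing about HTTP.
    static class SolrRequest {
        final Map<String, String> params;
        final List<ContentStream> streams;
        SolrRequest(Map<String, String> p, List<ContentStream> s) { params = p; streams = s; }
    }

    // What the servlet does: it alone touches the transport, then hands a
    // transport-neutral SolrRequest onward. SolrCore would look up the
    // handler via params.get("qt") and execute.
    static SolrRequest dispatch(RequestParser parser, InputStream rawBody,
                                Map<String, String> params) {
        List<ContentStream> s = parser.preProcess(Collections.emptyMap(), rawBody);
        s = parser.process(params, s);
        return new SolrRequest(params, s);
    }
}
```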

RE: separate log files

2007-01-18 Thread Ben Incani
Hi Solr devs,

I'm running multiple instances of Solr, which all use the same war
file to load from.  To log to separate files I implemented the following
kludge.

-Ben


23d22
< import org.apache.solr.request.SolrQueryResponse;
24a24
> import org.apache.solr.request.SolrQueryResponse;
33a34,36
> 
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
34a38,39
> import java.io.InputStream;
> import java.io.OutputStream;
35a41,42
> import java.util.Properties;
> import java.util.logging.LogManager;
47a55,80
>   /*
>    * switch java.util.logging.Logger appenders
>    *
>    * Add the following to the web context file
>    * 
>    */
>   private void switchAppenders(String prefix) {
>     String logParam = "org.apache.juli.FileHandler.prefix";
>     log.info("switching appender to " + logParam + "=" + prefix);
>     Properties props = new Properties();
>     try {
>       InputStream configStream = getClass().getResourceAsStream("/logging.properties");
>       props.load(configStream);
>       configStream.close();
>       props.setProperty(logParam, prefix);
>       ByteArrayOutputStream os = new ByteArrayOutputStream();
>       props.store((OutputStream)os, "LOGGING PROPERTIES");
>       LogManager.getLogManager().readConfiguration(new ByteArrayInputStream(os.toByteArray()));
>       log.info("props: " + props.toString());
>     }
>     catch(Exception e) {
>       String errMsg = "Error: Cannot load configuration file; Cause: " + e.getMessage();
>       log.info(errMsg);
>     }
>   }
>   
48a82
> 
52c86,91
< 
---
>   
>   // change the logging properties
>   String prefix = (String)c.lookup("java:comp/env/solr/log-prefix");
>   if (prefix != null)
>     switchAppenders(prefix);
>   
64a104
>



> -Original Message-
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 17 January 2007 6:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: separate log files
> 
> 
> : I wonder if jetty or tomcat can be configured to put logging output
> : for different webapps in different log files...
> 
> i've never tried it, but the tomcat docs do talk about tomcat
> providing a custom implementation of java.util.logging
> specifically for this purpose.
> 
> Ben: please take a look at this doc...
> 
> http://tomcat.apache.org/tomcat-5.5-doc/logging.html
> 
> ..specifically the section on java.util.logging (since that's 
> what Solr
> uses) ... I believe you'll want something like the "Example 
> logging.properties file to be placed in common/classes" so 
> that you can control the logging.
> 
> Please let us all know if this works for you ... it would 
> make a great addition to the SolrTomcat wiki page.
> 
> 
> : On 1/15/07, Ben Incani <[EMAIL PROTECTED]> wrote:
> : > Hi Solr users,
> : >
> : > I'm running multiple instances of Solr, which all use the same war
> : > file to load from.
> : >
> : > Below is an example of the servlet context file used for each
> : > application.
> : >
> : >  : > debug="0" crossContext="true" >
> : >  : > value="/var/local/app1" override="true" />
> : > 
> : >
> : > Hence each application is using the same
> : > WEB-INF/classes/logging.properties file to configure logging.
> : >
> : > I would like each instance to log to separate log files such as;
> : > app1-solr.yyyy-mm-dd.log
> : > app2-solr.yyyy-mm-dd.log
> : > ...
> : >
> : > Is there an easy way to append the context path to
> : > org.apache.juli.FileHandler.prefix
> : > E.g.
> : > org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.
> : >
> : > Or would this require a code change?
> : >
> : > Regards
> : >
> : > -Ben
> :
> 
> 
> 
> -Hoss
> 
> 


Re: graduation todo list

2007-01-18 Thread Yonik Seeley

On 1/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: The old website has been redirected to the new, so those links are
: less important.

one hitch to this is that the symlink from when we moved the javadocs is
now gone, so links like this found in the wiki (and in mail archives) no
longer work...

http://incubator.apache.org/solr/docs/api/org/apache/solr/request/DisMaxRequestHandler.html

...instead of re-adding a symlink, we should probably put in a .htaccess
file to do a redirect from http://lucene.apache.org/solr/docs/(.*) to
http://lucene.apache.org/solr/$1


So people don't keep linking to the docs url?  Sounds fine to me...
I'm on my way home, but I'll handle it later if no one else does so
first.

-Yonik


Re: graduation todo list

2007-01-18 Thread Chris Hostetter

: The old website has been redirected to the new, so those links are
: less important.

one hitch to this is that the symlink from when we moved the javadocs is
now gone, so links like this found in the wiki (and in mail archives) no
longer work...

http://incubator.apache.org/solr/docs/api/org/apache/solr/request/DisMaxRequestHandler.html

...instead of re-adding a symlink, we should probably put in a .htaccess
file to do a redirect from http://lucene.apache.org/solr/docs/(.*) to
http://lucene.apache.org/solr/$1


-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Ryan McKinley


: I was...  then you talked me out of it!  You are correct, the client
: should determine the RequestParser independent of the RequestHandler.

Ah ... this is the one problem with high volume on an involved thread ...
i'm sending replies to messages you write after you've already read other
replies to other messages you sent and changed your mind :)



Should we start a new thread?




Here's a more fleshed out version of the pseudo-java i posted earlier,
with all of my addenda inlined and a few simple method calls changed to
try and make the purpose more clear...



Ok, now (I think) I see the difference between our ideas.


From your code, it looks like you want the RequestParser to extract

'qt' that defines the RequestHandler.  In my proposal, the
RequestHandler is selected independent of the RequestParser.

What do you imagine happens in:


String p = pickRequestParser(req);



This looks like you would have to have a standard way (per servlet) of
getting the RequestParser.  How do you envision that?  What would be
the standard way to choose your request parser?


If the RequestHandler is defined by the RequestParser,  I would
suggest something like:

interface SolrRequest
{
 RequestHandler getHandler();
 Iterable<ContentStream> getContentStreams();
 SolrParams getParams();
}

interface RequestParser
{
 SolrRequest getRequest( HttpServletRequest req );

 // perhaps remove getHandler() from SolrRequest and add:
 RequestHandler getHandler();
}

And then configure a servlet or filter with the RequestParser


   <filter>
     <filter-name>SolrRequestFilter</filter-name>
     ...
     <init-param>
       <param-name>RequestParser</param-name>
       <param-value>org.apache.solr.parser.StandardRequestParser</param-value>
     </init-param>
   </filter>


Given that the number of RequestParsers is realistically small (as
Yonik mentioned), I think this could be a good solution.

To update my current proposal:
1. Servlet/Filter defines the RequestParser
2. requestParser parses handler & request from HttpServletRequest
3. handled essentially as before

To update the example URLs, defined by the "StandardRequestParser"
 /path/to/handler/?param
where /path/to/handler is the "name" defined in solrconfig.xml

To use a different RequestParser, it would need to be configured in web.xml
 /customparser/whatever/path/i/like


- - - - - - - - - - - - - -

I still don't see why:



// let the parser preprocess the streams if it wants...
Iterable<ContentStream> s = solrParser.preprocess
  (getStreamInfo(req), new Pointer<InputStream>() {
InputStream get() {
  return req.getInputStream();
} });

SolrParams params = makeSolrRequest(req);

// ServletSolrRequest is a basic impl of SolrRequest
SolrRequest solrReq = new ServletSolrRequest(params, s);

// let the parser decide what to do with the existing streams,
// or provide new ones
s = solrParser.process(solrReq, s);



can not be contained entirely in:

 SolrRequest solrReq = parser.parse( req );

assuming the SolrRequest interface includes

 Iterable<ContentStream> getContentStreams();

the parser can use req.getInputStream() however it likes - either
to make params and/or to build ContentStreams
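
That single-entry-point alternative can be sketched with stand-in types (nothing here is the real servlet or Solr API): the parser owns the whole conversion, so the dispatching code shrinks to one call.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

// Stand-in for the servlet request: just a body and params, no HTTP details.
class FakeRequest {
    final InputStream body;
    final Map<String, String> params;
    FakeRequest(byte[] b, Map<String, String> p) { body = new ByteArrayInputStream(b); params = p; }
}

// Stand-in for SolrRequest: params plus content streams.
class ParsedRequest {
    final Map<String, String> params;
    final List<InputStream> contentStreams;
    ParsedRequest(Map<String, String> p, List<InputStream> s) { params = p; contentStreams = s; }
}

// The whole proposal in one method: the parser decides whether the body
// and/or the params become content streams.
interface OneShotParser {
    ParsedRequest parse(FakeRequest req);
}

// Simplest concrete parser: the raw body is the one and only stream.
class RawBodyParser implements OneShotParser {
    public ParsedRequest parse(FakeRequest req) {
        return new ParsedRequest(req.params, List.of(req.body));
    }
}
```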

- - - - - - - -

good good
ryan


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Ryan McKinley


Cool.  I think i need more examples... concrete is good :-)

I don't quite grok your format below... is it one line or two?
/path/defined/in/solrconfig:parser?params
/${handler}:${parser}

Is that simply

/${handler}:${parser}?params



yes.  the ${} is just to show what is extracted from the request URI,
not a specific example

Imagine you have a CsvUpdateHandler defined in solrconfig.xml with a
"name"="my/update/csv".

The standard RequestParser could extract the parameters and
Iterable<ContentStream> for each of the following requests:

POST: /my/update/csv/?separator=,&fields=foo,bar,baz
(body) "10,20,30"

POST:/my/update/csv/
multipart post with 5 files and 6 form fields defining
(unlike the previous example, the handler would get 5 input streams
rather than 1)

GET: /my/update/csv/?post.remoteURL=http://..&separator=,&fields=foo,bar,baz&;...
fill the stream with the content from a remote URL

GET: /my/update/csv/?post.body=bodycontent,&fields=foo,bar,baz&...
use 'bodycontent' as the input stream.  (note, this does not make much
sense for csv, but is a useful example)

POST: /my/update/csv:remoteurls/?separator=,&fields=foo,bar,baz
(body) http://url1,http://url2,http:/url3...
In this case we would use a custom RequestParser ("remoteurls") that
would read the post body and convert it to a stream of content urls.
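
The core of such a hypothetical "remoteurls" parser is just splitting the post body into URLs, one content stream per URL; this sketch stops short of actually opening connections, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "remoteurls" RequestParser logic: the post body is a list
// of URLs (comma or whitespace separated); each URL would later become
// one content stream, opened via URL.openStream() or similar.
class RemoteUrlsParser {
    static List<String> splitUrls(String postBody) {
        List<String> urls = new ArrayList<>();
        for (String u : postBody.split("[,\\s]+")) {
            if (!u.isEmpty()) urls.add(u);
        }
        return urls;
    }
}
```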

- - - - - - -

The URL path (everything before the ':') would be entirely defined and
configured by solrconfig.xml.  A filter would see if the request path
matches a registered handler - if not it will pass it up the filter
chain.  This would allow custom filters and servlets to co-exist in
the top level URL path.  Consider:

solrconfig.xml
 

web.xml:
 
   <servlet-mapping>
     <servlet-name>MyRestfulDelete</servlet-name>
     <url-pattern>/mydelete/*</url-pattern>
   </servlet-mapping>
 

POST: /delete?id=AAA   would be sent to DeleteHandler
POST: /mydelete/AAA/ would be sent to MyRestfulDelete

Alternatively, you could have:


solrconfig.xml
 

web.xml:
 
   <servlet-mapping>
     <servlet-name>MyRestfulDelete</servlet-name>
     <url-pattern>/delete/*</url-pattern>
   </servlet-mapping>
 

POST: /standard/delete?id=AAA   would be sent to DeleteHandler
POST: /delete/AAA/ would be sent to MyRestfulDelete

I am suggesting we do not try to have the default request servlet/filter
support extracting parameters from the URL.  I think this is a
reasonable tradeoff to be able to have the request path easily user
configurable using the *existing* plugin configuration.

- - - - - - - -

In a previous email, you mentioned changing the URL structure.  With
this proposal, we would continue to support:
/select?wt=XXX

for the Csv example, you would also be able to call:
GET: /select?qt=/my/update/csv/&post.remoteURL=http://..&sepa...

ryan


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Chris Hostetter

: However, I'm not yet convinced the benefits are worth the costs.  If
: the number of RequestParsers remain small, and within the scope of
: being included in the core, that functionality could just be included
: in a single non-pluggable RequestParser.
:
: I'm not convinced it is a bad idea either, but I'd like to hear about
: usecases for new RequestParsers (new ways of generically getting an
: input stream)?

I don't really see it being a very high cost ... and even if we can't
imagine any other potential user written RequestParser, we already know of
at least 4 use cases we want to support out of the box for getting
streams:

 1) raw post body (as a single stream)
 2) multi-part post body (file upload, potentially several streams)
 3) local file(s) specified by path (1 or more streams)
 4) remote resource(s) specified by URL(s) (1 or more streams)

...we could put all that logic in a single class that looks at a
SolrParam to pick what method to use, or we could extract each one into
its own class using a common interface ... either way we can hardcode the
list of viable options if we want to avoid the issue of letting the client
configure them ... but i still think it's worth the effort to talk about
what that common interface might be.
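
The "single class that looks at a SolrParam" variant might dispatch roughly like this. The param names (`stream.file`, `stream.url`) and the content-type check are assumptions made for the sketch, not settled API:

```java
import java.util.Map;

// Sketch of a single non-pluggable parser choosing among the four stream
// sources listed above. Param names and the content-type test are assumed
// for illustration only.
class StreamSourcePicker {
    enum Source { RAW_POST, MULTIPART, LOCAL_FILE, REMOTE_URL }

    static Source pick(Map<String, String> params, String contentType) {
        if (params.containsKey("stream.file")) return Source.LOCAL_FILE;  // case 3
        if (params.containsKey("stream.url"))  return Source.REMOTE_URL;  // case 4
        if (contentType != null && contentType.startsWith("multipart/"))
            return Source.MULTIPART;                                      // case 2
        return Source.RAW_POST;                                           // case 1
    }
}
```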

I think my idea of having both a preProcess and a process method in
RequestParser so it can do things before and after the Servlet has
extracted SolrParams from the URL would work in all of the cases we've
thought of.



-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Chris Hostetter

: I was...  then you talked me out of it!  You are correct, the client
: should determine the RequestParser independent of the RequestHandler.

Ah ... this is the one problem with high volume on an involved thread ...
i'm sending replies to messages you write after you've already read other
replies to other messages you sent and changed your mind :)

: Are you suggesting there would be multiple servlets each with a
: different methods to get the SolrParams from the url?  How does the
: servlet know if it can touch req.getParameter()?

I'm suggesting that there *could* be multiple Servlets with multiple URL
structures ... my worry is not that we need multiple options now, it's
that i don't want to come up with an API for writing plugins that then
has to be thrown out down the road if we want/need to change the URL

: How would the default servlet fill up SolrParams?

prior to calling RequestParser.preProcess, it would only access very
limited parts of the HttpServletRequest -- the bare minimum it needs to
pick a RequestParser ... probably just the path, maybe the HTTP Headers --
but if we had a URL structure where we really wanted to specify the
RequestParser in a URL param it could do it using getQueryString

*after* calling RequestParser.preProcess the Servlet can access any part
of the HttpServletRequest (because if the RequestParser wanted to use the
raw POST InputStream it would have, and if it didn't then it's fair game
to let HttpServletRequest pull data out of the stream).  The Servlet can
call HttpServletRequest.getParameterMap() -- or any of the other
HttpServletRequest methods -- to build up the SolrParams however it wants
based on the URL structure it wants to use ... then RequestParser.process
can use those SolrParams to get any other streams it may want and add them
to the SolrRequest.

Here's a more fleshed out version of the pseudo-java i posted earlier,
with all of my addenda inlined and a few simple method calls changed to
try and make the purpose more clear...



// Simple interface for having a lazy reference to something
interface Pointer<T> {
  T get();
}

interface RequestParser {
  public void init(NamedList nl); // the usual

  /** will be passed the raw input stream from the
   * HttpServletRequest, ... as well as whatever other HttpServletRequest
   * header info we decide is important for the RequestParser to know
   * about the stream, and is safe for Servlets to access and make
   * available to the RequestParser (ie: HTTP method, content-type,
   * content-length, etc...)
   *
   * I'm using a NamedList instance instead of passing the
   * HttpServletRequest to maintain a good abstraction -- only the Servlet
   * knows about HTTP, so if we ever want to write an RMI interface to Solr,
   * the same RequestParser plugins will still work ... in practice it
   * might be better to explicitly spell out every piece of info about
   * the stream we want to pass
   *
   * This is the method where a RequestParser which is going to use the
   * raw POST body to build up either a single stream, or several streams
   * from a multi-part request has the info it needs to do so.
   */
  public Iterable<ContentStream> preProcess(NamedList streamInfo,
Pointer<InputStream> s);

  /** guaranteed that the second arg will be the result from
   * a previous call to preProcess, and that that Iterable from
   * preProcess will not have been inspected or touched in any way, nor
   * will any references to it be maintained after this call.
   *
   * this is the method where a RequestParser which is going to use
   * request params to open streams from local files, or remote URLs
   * can do so -- a particularly ambitious RequestParser could use
   * both the raw POST data *and* remote files specified in params
   * because it has the choice of what to do with the
   * Iterable<ContentStream> it returned from the earlier preProcess call.
   */
  public Iterable<ContentStream> process(SolrRequest request,
 Iterable<ContentStream> i);

}


class SolrUberServlet extends HttpServlet {

  // servlet specific method which does minimal inspection of
  // req to determine the parser name based on the URL
  private String pickRequestParser(HttpServletRequest req) { ... }

  // extracts just the most crucial info about the HTTP Stream from the
  // HttpServletRequest, so it can be passed to RequestParser.preProcess
  // must be careful not to use anything that might access the stream.
  private NamedList getStreamInfo(HttpServletRequest req) { ... }

  // builds the SolrParams for the request using servlet specific URL rules,
  // this method is free to use anything in the HttpServletRequest
  // because it won't be called until after preProcess
  private SolrParams makeSolrRequestParams(HttpServletRequest req) { ... }

  public void service(HttpServletRequest req, HttpServletResponse response) {
    SolrCore core = getCore();
    Solr(Query)Response solrRsp = new Solr(Query)Response();

    String p = pickRequestParser(req);

[jira] Commented: (SOLR-80) negative filter queries

2007-01-18 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465857
 ] 

Yonik Seeley commented on SOLR-80:
--

From an interface point of view, I'm heavily leaning toward getting rid of
the restriction of lucene queries having to be all negative.  This would
allow using a negative-only query anywhere one currently can use a
positive query.

One could simply and naturally do fq=-id:10 to filter out a single document.
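
In set terms, the proposal replaces the explicit `id:[* TO *] -id:101` workaround with an implicit match-all: a purely negative filter means all documents minus the negative matches. A minimal, non-Lucene illustration of that semantics:

```java
import java.util.HashSet;
import java.util.Set;

// Set-based illustration of the proposed semantics: fq=-id:10 behaves as
// (all docs) minus (docs matching id:10), without the client having to
// send the match-all clause itself. Doc ids are just ints here.
class NegativeFilter {
    static Set<Integer> apply(Set<Integer> allDocs, Set<Integer> negativeMatches) {
        Set<Integer> result = new HashSet<>(allDocs); // implicit match-all
        result.removeAll(negativeMatches);            // minus the negative clause
        return result;
    }
}
```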


> negative filter queries
> ---
>
> Key: SOLR-80
> URL: https://issues.apache.org/jira/browse/SOLR-80
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
>
> There is a need for negative filter queries to avoid long filter generation 
> times and large caching requirements.
> Currently, if someone wants to filter out a small number of documents, they 
> must specify the complete set of documents to express those negative 
> conditions against.  
> q=foo&fq=id:[* TO *] -id:101
> In this example, to filter out a single document, the complete set of 
> documents (minus one) is generated, and a large bitset is cached.  You could 
> also add the restriction to the main query, but that doesn't work with the 
> dismax handler which doesn't have a facility for this.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Yonik Seeley

On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:

On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Yes, this proposal would fix the URL structure to be
> > /path/defined/in/solrconfig:parser?params
> > /${handler}:${parser}
> >
> > I *think* this handles most cases cleanly and simply.  The
> > only exception is where you want to extract variables from the URL
> > path.
>
> But that's not a hypothetical case, extracting variables from the URL
> path is something I need now (to add metadata about the data in the
> raw post body, like the CSV separator).
>
> POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
> with a body of "10,20,30"
>

Sorry, by "in the URL" I mean "in the URL path." The RequestParser can
extract whatever it likes from getQueryString()

The url you list above could absolutely be handled with the proposed
format.


Cool.  I think i need more examples... concrete is good :-)

I don't quite grok your format below... is it one line or two?
/path/defined/in/solrconfig:parser?params
/${handler}:${parser}

Is that simply

/${handler}:${parser}?params

Or is it all one line where you actually have params twice?

-Yonik


Re: graduation todo list

2007-01-18 Thread Yonik Seeley

OK, I think I got all the important stuff on
http://wiki.apache.org/solr/TaskList

except subversion pointers from the Wiki.
If you run across broken svn links, please help fix them.

The old website has been redirected to the new, so those links are
less important.

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Ryan McKinley

On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yes, this proposal would fix the URL structure to be
> /path/defined/in/solrconfig:parser?params
> /${handler}:${parser}
>
> I *think* this handles most cases cleanly and simply.  The
> only exception is where you want to extract variables from the URL
> path.

But that's not a hypothetical case, extracting variables from the URL
path is something I need now (to add metadata about the data in the
raw post body, like the CSV separator).

POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
with a body of "10,20,30"



Sorry, by "in the URL" I mean "in the URL path." The RequestParser can
extract whatever it likes from getQueryString()

The url you list above could absolutely be handled with the proposed
format.  The thing that could not be handled is:
http://localhost:8983/solr/csv/foo/bar/baz/
with body "10,20,30"



> There are plenty of ways to rewrite RESTful urls into a
> path+params structure.  If someone absolutely needs RESTful urls, it
> can easily be implemented with a new Filter/Servlet that picks the
> 'handler' and directly creates a SolrRequest from the URL path.

While being able to customize something is good, having really good
defaults is better IMO :-)  We should also be focused on exactly what
we want our standard update URLs to look like in parallel with the
design of how to support them.



again, i totally agree.  My point is that I don't think we need to
make the dispatch filter handle *all* possible ways someone may want
to structure their request.  It should offer the best defaults
possible.  If that is not sufficient, someone can extend it.


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Yonik Seeley

On 1/18/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:

Yes, this proposal would fix the URL structure to be
/path/defined/in/solrconfig:parser?params
/${handler}:${parser}

I *think* this handles most cases cleanly and simply.  The
only exception is where you want to extract variables from the URL
path.


But that's not a hypothetical case, extracting variables from the URL
path is something I need now (to add metadata about the data in the
raw post body, like the CSV separator).

POST to http://localhost:8983/solr/csv?separator=,&fields=foo,bar,baz
with a body of "10,20,30"


There are plenty of ways to rewrite RESTful urls into a
path+params structure.  If someone absolutely needs RESTful urls, it
can easily be implemented with a new Filter/Servlet that picks the
'handler' and directly creates a SolrRequest from the URL path.


While being able to customize something is good, having really good
defaults is better IMO :-)  We should also be focused on exactly what
we want our standard update URLs to look like in parallel with the
design of how to support them.

As a side note, with a change of URLs, we get a "free" chance to
change whatever we want about the parameters or response format...
backward compatibility only applies to the original URLs IMO.

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Ryan McKinley


I'm confused by your sentence "A RequestParser converts a
HttpServletRequest to a SolrRequest." .. i thought you were advocating
that the servlet parse the URL to pick a RequestHandler, and then the
RequestHandler dictates the RequestParser?



I was...  then you talked me out of it!  You are correct, the client
should determine the RequestParser independent of the RequestHandler.



: /path/registered/in/solr/config:requestparser?params
:
: If no ':' is in the URL, use 'standard' parser
:
: 1. The URL path determines the RequestHandler
: 2. The URL path determines the RequestParser
: 3. SolrRequest = RequestParser.parse( HttpServletRequest )
: 4. handler.handleRequest( req, res );
: 5. write the response

do you mean the path before the colon determines the RequestHandler and the
path after the colon determines the RequestParser?


yes, that is my proposal.


fine too ... i was specifically trying to avoid making any design
decisions that required a particular URL structure; in what you propose
we are dictating more than just the "/handler/path:parser" piece of the
URL, we are also dictating that the Parser decides how the rest of the path
and all URL query string data will be interpreted ...


Yes, this proposal would fix the URL structure to be
/path/defined/in/solrconfig:parser?params
/${handler}:${parser}

I *think* this handles most cases cleanly and simply.  The
only exception is where you want to extract variables from the URL
path.  There are plenty of ways to rewrite RESTful urls into a
path+params structure.  If someone absolutely needs RESTful urls, it
can easily be implemented with a new Filter/Servlet that picks the
'handler' and directly creates a SolrRequest from the URL path.  In my
opinion, for this level of customization it is reasonable that people
edit web.xml and put in their own servlets and filters.



what i'm proposing is that the Servlet decide how to get the SolrParams
out of an HttpServletRequest, using whatever URL that servlet wants;


I guess I'm not understanding this yet:

Are you suggesting there would be multiple servlets each with a
different methods to get the SolrParams from the url?  How does the
servlet know if it can touch req.getParameter()?

How would the default servlet fill up SolrParams?




I think i'm getting confused ... i thought you were advocating that
RequestParsers be implemented as ServletFilters (or Servlets) ...


Originally I was... but again, you talked me out of it.  (this time
not totally)  I think the /path:parser format is clear and allows for
most everything off the shelf.  If you want to do something different,
that can easily be a custom filter (or servlet)

Essentially, i think it is reasonable for people to skip
'RequestParsers' in a custom servlet and be able to build the
SolrRequest directly.  This level of customization is reasonable to
handle directly with web.xml


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Yonik Seeley

OK, trying to catch up on this huge thread... I think I see why it's
become more complicated than I originally envisioned.

What I originally thought:
1) add a way to get a Reader or InputStream from SolrQueryRequest, and
then reuse it for updates too
2) use the plugin name in the URL
3) write code that could handle multi-part post, or could grab args
from the URL.
4) profit!

I think the main additional complexity is the idea that RequestParser
(#3) be both pluggable and able to be specified in the actual request.
I hadn't considered that, and it's an interesting idea.

Without a pluggable RequestParser:
- something like the CSV loader would have to check the params for a
"file" param and, if present, open the local file itself

With a pluggable RequestParser:
- the LocalFileRequestParser would be specified in the url (like
/update/csv:local) and it will handle looking for the "file" param and
opening the file.  The CSV plugin can be a little simpler by just
getting a Reader.
- a new way of getting a stream could be developed (a new
RequestParser) and most stream oriented plugins could just use it.
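To make the division of labor concrete, here is a hedged sketch of the LocalFileRequestParser idea: the parser resolves a "file" param to a Reader so a handler like the CSV loader only ever sees a stream.  A plain Map stands in for the request params, and all names here are illustrative, not actual Solr APIs:

```java
// Sketch only: the "local" parser turns a "file" param into a Reader.
// The Map stands in for the URL query-string params; none of these
// class or method names are real Solr interfaces.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class LocalFileRequestParser {
    static Reader openStream(Map<String, String> params) throws IOException {
        String file = params.get("file");
        if (file == null) {
            throw new IllegalArgumentException(
                "the 'local' parser requires a 'file' param");
        }
        return Files.newBufferedReader(Path.of(file));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("docs", ".csv");
        Files.writeString(tmp, "id,name\n1,foo\n");
        try (BufferedReader r =
                 new BufferedReader(openStream(Map.of("file", tmp.toString())))) {
            System.out.println(r.readLine()); // first CSV line
        }
        Files.delete(tmp);
    }
}
```

The point of the sketch is that the CSV plugin never sees the "file" param at all -- it just gets a Reader, whatever parser produced it.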

However, I'm not yet convinced the benefits are worth the costs.  If
the number of RequestParsers remain small, and within the scope of
being included in the core, that functionality could just be included
in a single non-pluggable RequestParser.

I'm not convinced it's a bad idea either, but I'd like to hear about
use cases for new RequestParsers (new ways of generically getting an
input stream).

-Yonik


Re: subversion move

2007-01-18 Thread Yonik Seeley

On 1/18/07, Zaheed Haque <[EMAIL PROTECTED]> wrote:

A minor detail... the logo on the Solr page still says
"incubator"... shouldn't that be taken off now? Or
is there some "Graduation holding period"? :=)


I already changed this yesterday in subversion, but I didn't want to
update the live site until the new download link worked (the download
mirrors take a while to sync).  It should be updated sometime today.

-Yonik


Re: subversion move

2007-01-18 Thread Zaheed Haque

Hi:

A minor detail... the logo on the Solr page still says
"incubator"... shouldn't that be taken off now? Or
is there some "Graduation holding period"? :=)

Cheers

On 1/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

Solr's source in subversion has moved within the ASF repository
to https://svn.apache.org/repos/asf/lucene/solr/
(Thanks Doug!)

The easiest way to change your working directories is to use "svn switch".
For example, if you have the "trunk" of solr checked out, cd to that
directory and execute
svn switch https://svn.apache.org/repos/asf/lucene/solr/trunk

Don't forget to change any SVN paths that may be configured in your IDEs too.

-Yonik



[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-18 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465689
 ] 

Hoss Man commented on SOLR-112:
---

Damn Ryan ... you keep taking on cool features and churning out patches too 
fast for me to read them!

this sounds like a cool idea; the one big caveat is documenting exactly how the 
NamedList "merge" method you wrote is expected to work ... i.e.:
  * what does it do if both named lists have the same key?
  * does it do deep merging of nested named lists/collections?
  * what does it do if one list has an element without a name (first and 
foremost, a 
NamedList is an ordered list after all - the names are optional)?

...as far as unit tests go, the easiest way to test something like this is to 
start by writing a unit test of just the NamedList merging logic -- independent 
of anything else (this class would be a good place to put a test of the 
SOLR-107 changes too, by the way).  

next would be to test that the merge logic is getting used as you expect, with 
a test that uses a config file with several handlers all inheriting various 
properties from one another, and then a test that runs queries against them -- 
the easiest way to validate that the init params are getting inherited 
correctly would probably be to use an "EchoConfigRequestHandler" that looked 
something like this...

   public class EchoConfigRequestHandler implements SolrRequestHandler {
     private NamedList initParams;
     public void init(NamedList nl) { initParams = nl; }
     public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
       rsp.add("initParams", initParams);
     }
   }

the AbstractSolrTestCase makes it easy for you to use any solrconfig file you 
want by overriding a method -- so you could even write one test case using one 
config file with lots of examples of inherited init params and a test method 
for each asserting that the params are what is expected, and then subclass it 
with another test class that's exactly the same except for using a 
different solrconfig file where all of the params are duplicated out explicitly 
-- testing your assumptions, so to speak.


> Hierarchical Handler Config
> ---
>
> Key: SOLR-112
> URL: https://issues.apache.org/jira/browse/SOLR-112
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
>Priority: Minor
> Fix For: 1.2
>
> Attachments: SOLR-112.patch
>
>
> From J.J. Larrea on SOLR-104
> 2. What would make this even more powerful would be the ability to "subclass" 
> (meaning refine and/or extend) request handler configs: If the requestHandler 
> element allowed an attribute extends="" and 
> chained the SolrParams, then one could do something like:
>class="solr.DisMaxRequestHandler" >
> 
>  0.01
>  
> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>  
>  ... much more, per the "dismax" example in the sample solrconfig.xml ...
>   
>   ... and replacing the "partitioned" example ...
>extends="search/products/all" >
> 
>   inStock:true
> 
>   

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




RE: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Chris Hostetter

: > With all this talk about plugins, registries etc., /me can't help
: > thinking that this would be a good time to introduce the Spring IoC
: > container to manage this stuff.

I don't have a lot of familiarity with Spring except for the XML
configuration file used for telling the Spring context what objects you
want it to create on startup, what constructor args to pass them,
what methods to call, and so on -- with an easy ability to tell it to pass
one object you had it construct as a param to another object you are having
it construct.

on the whole, it seems really nice, and eventually using it to replace a
lot of the home-grown configuration in Solr would probably make a lot of
sense ... but i don't think migrating to Spring is necessary as part of
the current push to support more configurable plugins for updates ... Solr
already has a pretty decent set of utilities for allowing class instances
to be specified in the xml config file and have configuration arguments
passed to them on initialization ... it's not as fancy as Spring and it
doesn't support as many features as Spring, but it works well enough that
it should be easy to use with the new plugins we start to add -- switching
to Spring right now would probably only complicate the issues, and
probably wouldn't make adding Update plugins any easier.

equally important: adding a few new types of plugins now probably won't
make it any harder to switch to something like Spring later ... which as i
said, is something i definitely anticipate happening




-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-18 Thread Chris Hostetter

: I think the confusion is that (in my view) the RequestParser is the
: *only* object able to touch the stream.  I don't think anything should
: happen between preProcess() and process();  A RequestParser converts a
: HttpServletRequest to a SolrRequest.  Nothing else will touch the
: servlet request.

that makes it the RequestParser's responsibility to dictate the URL format
(if it's the only one that can touch the HttpServletRequest).  i was
proposing a method by which the Servlet could determine the URL format --
there could in fact be multiple servlets supporting different URL formats
if we had some need for it -- and the RequestParser could generate streams
based on the raw POST data and/or any streams it wants to find based on
the SolrParams generated from the URL (ie: local files, remote resources,
etc)

I'm confused by your sentence "A RequestParser converts a
HttpServletRequest to a SolrRequest." ... i thought you were advocating
that the servlet parse the URL to pick a RequestHandler, and then the
RequestHandler dictates the RequestParser?

: /path/registered/in/solr/config:requestparser?params
:
: If no ':' is in the URL, use 'standard' parser
:
: 1. The URL path determines the RequestHandler
: 2. The URL path determines the RequestParser
: 3. SolrRequest = RequestParser.parse( HttpServletRequest )
: 4. handler.handleRequest( req, res );
: 5. write the response
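
The five quoted steps can be walked through end to end in a short sketch -- this is only an illustration of the proposed flow, with a plain map standing in for the handler registry, a string standing in for the response, and none of the names being real Solr classes:

```java
// Illustrative walk-through of the quoted five-step dispatch flow.
// The registry is a plain map and the "response" is just a String;
// all names here are made up for the sketch.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DispatchSketch {
    static final Map<String, Function<Map<String, String>, String>> HANDLERS =
        new HashMap<>();

    static String dispatch(String pathInfo, Map<String, String> params) {
        // 1 & 2: the URL path determines both the handler and the parser
        String[] parts = pathInfo.split(":", 2);
        String handlerPath = parts[0];
        String parserName =
            (parts.length == 2 && !parts[1].isEmpty()) ? parts[1] : "standard";
        // 3: the parser would build the SolrRequest; here the params map stands in
        // 4: the handler handles the request
        String body = HANDLERS.get(handlerPath).apply(params);
        // 5: write the response (here, just return it)
        return "[" + parserName + "] " + body;
    }

    public static void main(String[] args) {
        HANDLERS.put("/update/csv", p -> "loaded " + p.get("file"));
        System.out.println(dispatch("/update/csv:local", Map.of("file", "books.csv")));
        // prints: [local] loaded books.csv
    }
}
```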

do you mean the path before the colon determines the RequestHandler and the
path after the colon determines the RequestParser? ... that would work
fine too ... i was specifically trying to avoid making any design
decisions that required a particular URL structure.  in what you propose
we are dictating more than just the "/handler/path:parser" piece of the
URL, we are also dictating that the Parser decides how the rest of the path
and all URL query string data will be interpreted -- which means if we
have a PostBodyRequestParser and a LocalFileRequestParser and a
RemoteUrlRequestParser which all use the query string params to get
the SolrParams for the request (and in the case of the last two: to know
what file/url to parse), and then we decide that we want to support a URL
structure that is more REST-like and uses the path for including
information, now we have to write a new version of all of those
RequestParsers (a subclass of each, probably) that knows what our new URL
structure looks like ... even if that never comes up, every RequestParser
(even custom ones written by users that use some crazy proprietary binary
protocols we've never heard of to fetch streams of data) has to worry about
extracting the SolrParams out of the URL.

what i'm proposing is that the Servlet decide how to get the SolrParams
out of an HttpServletRequest, using whatever URL structure that servlet wants; the
RequestParser decides how to get the ContentStreams needed for that
request -- in a way that can work regardless of whether the stream is
actually part of the HttpServletRequest, or just referenced by a param in
the request; the RequestHandler decides what to do with those params
and streams; and the ResponseWriter decides how to format the results
produced by the RequestHandler back to the client.

: > : If anyone needs to customize this chain of events, they could easily
: > : write their own Servlet/Filter

: I don't *think* this would happen often, and the people would only do
: it if they are unhappy with the default URL structure -> behavior
: mapping.  I am not suggesting this would be the normal way to
: configure solr.

I think i'm getting confused ... i thought you were advocating that
RequestParsers be implemented as ServletFilters (or Servlets) ... but if
that were the case it wouldn't just be about changing the URL structure, it
would be about picking new ways to get streams ... but that doesn't seem to
be what you are suggesting, so i'm not sure what i was misunderstanding.



-Hoss