Re: [Solr Wiki] Update of LukeRequestHandler by ryan
Yonik Seeley wrote: A really pedantic, super minor comment, but should docID be docId instead, or are my aesthetics just off? For consistency, you are right. I keep getting myself into trouble because I like ID better than Id... this taste often disagrees with some of the automagic bean getter/setter conventions. Changed in rev 533304.
Re: Do we agree on our RTC way of working? (was: Welcome Ryan McKinley!)
On 4/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: snip-lotsa-good-stuff/ ...My *personal* philosophy is probably more permissive than most:.. Thanks for sharing this, you're totally right that a half-baked patch is better than no patch at all, and that there are different stages which make sense in contributions. Hard rules wouldn't work, but I'm glad we've had this discussion (and I'll go back to my corner now ;-) Also, thanks Hoss for creating http://wiki.apache.org/solr/CommitPolicy, I think it's really good to have this. -Bertrand
Re: solr release planning for 1.2
Yonik Seeley wrote: On 4/5/07, Ryan McKinley [EMAIL PROTECTED] wrote: I'm certainly on board with adding a requestHandler mapping for /update, but i'm not sure how i feel about changing it under the covers ... I'm suggesting we keep /update mapped to SolrUpdateServlet in web.xml, but map: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> +1 I am not sure what we should do with the DispatchFilter handle-select parameter: <init-param> <param-name>handle-select</param-name> <param-value>true</param-value> </init-param> Why do we need this parameter? I thought that /select through DispatchFilter would be backward compatible with the servlet's current handling? If that's the case, just have dispatch handle it and be done with it. Since writing this, I added SOLR-204 - this lets you configure whether the DispatchFilter will handle select in solrconfig.xml rather than web.xml. If the configuration is in solrconfig.xml, we can set the example to use the dispatcher but still leave the option of the 'old' style servlet if that is desired. The only real difference between them is how errors are returned. The dispatcher calls res.sendError( code, msg ) while the servlet writes them out directly (causing them to be hidden by IE/FF). SOLR-204 removes the init-param.
move UpdateParams
I'd like to move UpdateParams from o.a.s.handler to o.a.s.util. The other classes like it are in .util. Objections?
[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select
[ https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492505 ] Yonik Seeley commented on SOLR-204: --- I wanted to try this out to see what sendError() output looks like, but the patch isn't applying cleanly.

$ patch -p0 < c:/dl/SOLR-204*
(Stripping trailing CRs from patch.)
patching file src/test/test-files/solr/conf/solrconfig.xml
(Stripping trailing CRs from patch.)
patching file src/webapp/WEB-INF/web.xml
(Stripping trailing CRs from patch.)
patching file src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java
Hunk #1 FAILED at 56.
1 out of 1 hunk FAILED -- saving rejects to file src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java.rej
(Stripping trailing CRs from patch.)
patching file src/webapp/src/org/apache/solr/servlet/SolrRequestParsers.java
(Stripping trailing CRs from patch.)
patching file example/solr/conf/solrconfig.xml
Hunk #1 succeeded at 231 (offset 8 lines).

Let solrconfig.xml configure the SolrDispatchFilter to handle /select - Key: SOLR-204 URL: https://issues.apache.org/jira/browse/SOLR-204 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Assigned To: Ryan McKinley Attachments: SOLR-204-HandleSelect.patch, SOLR-204-HandleSelect.patch The major reason to make everything use the SolrDispatchFilter is that we would have consistent error handling. Currently, SolrServlet spits back errors using: PrintWriter writer = response.getWriter(); writer.write(msg); and the SolrDispatchFilter spits them back using: res.sendError( code, ex.getMessage() ); Using sendError lets the servlet container format the code so it shows up ok in a browser. Without it, you may have to view source to see the error. Additionally, SolrDispatchFilter is more discerning about including a stack trace: it only includes one for a 500 or an unknown response code. Eventually, the error should probably be formatted in the requested format - SOLR-141.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: solr release planning for 1.2
On 4/28/07, Ryan McKinley [EMAIL PROTECTED] wrote: If the configuration is in solrconfig.xml, we can set the example to use the dispatcher but still leave the option of the 'old' style servlet if that is desired. The only real difference between them is how errors are returned. The dispatcher calls res.sendError( code, msg ) while the servlet writes them out directly (causing them to be hidden by IE/FF) I think only the body of the response changes, since the HTTP error codes were already being used for /select. Since the body of the response was never really specified, and it wasn't in a parseable format, I think using sendError() could be considered backward compatible. -Yonik
Admin interface configuration changes?
As we move to arbitrary path based configuration, the JSP admin pages don't really know where things are and what to link to. In looking into how to replace get-file.jsp and how to have an upload page for /update and /update/csv, I stumbled on the idea that we could have the list of options for what is displayed in the admin interface configured in solrconfig.xml. Perhaps something like:

<admin>
  <defaultQuery>solr</defaultQuery>
  <header>
    <links name="solr">
      <link name="Schema" path="/admin/file?file=schema.xml" />
      <link name="Config" path="/admin/file?file=solrconfig.xml" />
      <link name="Analysis" path="/admin/analysis.jsp" />
      <br/>
      <link name="Statistics" path="/admin/stats.jsp" />
      <link name="Info" path="/admin/registry.jsp" />
      <link name="Distribution" path="/admin/distributiondump.jsp" />
      <link name="Ping" path="/admin/ping" />
      <link name="Logging" path="/admin/logging.jsp" />
    </links>
    <links name="update">
      <link name="Update" path="/admin/?show=update.html" />
      <link name="CSV" path="/admin/?show=updatecsv.html" />
    </links>
    <links name="App server">
      <link name="Properties" path="/admin/properties" />
      <link name="Thread Dump" path="/admin/threaddump.jsp" />
    </links>
  </header>
  ...
</admin>

Thoughts?
[jira] Updated: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select
[ https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-204: --- Attachment: SOLR-204-HandleSelect.patch applies cleanly with trunk
[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select
[ https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492508 ] Ryan McKinley commented on SOLR-204: sendError lets the web app decide how to format the response body. Typically they put HTML with the status code, with a footer naming Jetty or Resin. This is what you get to configure with:

<error-page>
  <exception-type>java.lang.Exception</exception-type>
  <location>/error</location>
</error-page>
<error-page><error-code>404</error-code><location>/error</location></error-page>

etc.
[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select
[ https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492511 ] Yonik Seeley commented on SOLR-204: --- OK cool, for something like an undefined field, it looks fine: undefined field catdsfgsdg But for something like a query parsing error, the only pointer to *what* the error is is in the stack trace, and you don't get that back. You just get: Error parsing Lucene query The logs show:

SEVERE: org.apache.lucene.queryParser.ParseException: Cannot parse 'foo:*': '*' or '?' not allowed as first character in WildcardQuery
  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:149)
  at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:94)
  at org.apache.solr.request.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:85)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)

Hmmm, but I think this is an exception issue. In QueryParsing.java:

} catch (ParseException e) {
  SolrCore.log(e);
  throw new SolrException(400,"Error parsing Lucene query",e);
}

should probably be something more like:

throw new SolrException(400,"Query parsing error: "+e.getMessage(),e);
[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select
[ https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492512 ] Ryan McKinley commented on SOLR-204: should probably be something more like: throw new SolrException(400,"Query parsing error: "+e.getMessage(),e); Yes. The other change is that errors for RequestDispatcher only print the stack trace if the code is >= 500; 400 (bad request) assumes the message will contain a user-useful response.
Luke handler help
I have a few things I'd like to check with the Luke handler; if you all could check some of the assumptions, that would be great.

* I want to print out the document frequency for a term in a given document. Since that term shows up in the given document, I would think the document frequency must be at least 1. I am using: reader.docFreq( t ) [line 236] The results seem reasonable, but *sometimes* it returns zero... is that possible?

* I want to return the lucene field flags for each field. I run through all the field names with: reader.getFieldNames(IndexReader.FieldOption.ALL). Is there a way to get any Fieldable for a given name? IIUC, all terms with the same name will have the same flags. I tried searching for a document with that field; it works, but only for stored fields.

* I just realized that I am only returning stored fields for getDocumentFieldsInfo() (it uses Document.getFields()). How can I find *all* Fieldables for a given document? I have tried following the luke source, but get a bit lost ;)

* Each field gets a boolean attribute cacheableFaceting -- this is true if the number of distinct terms is smaller than the filterCacheSize. I get the filterCacheSize from: solrconfig.xml:query/filterCache/@size and get the distinct term count from counting up the termEnum. Is this logic solid? I know the cacheability changes if you are faceting multiple fields at once, but it's still nice to have a ballpark estimate without needing to know the internals.

thanks for any pointers
ryan
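The cacheableFaceting estimate in the last item boils down to one comparison. Here is a toy, plain-Java model of that logic (the names and the map-based "index" are hypothetical; the real handler walks a Lucene termEnum and reads the cache size from solrconfig.xml):

```java
import java.util.*;

// Toy model of the cacheableFaceting estimate: a field's facets can lean on
// the filterCache only if its distinct term count fits inside the cache.
class FacetCacheEstimate {
    // distinct terms for a field, as counting up a termEnum would yield
    static int distinctTerms(Map<String, List<String>> termsByField, String field) {
        return new HashSet<>(termsByField.getOrDefault(field, List.of())).size();
    }

    static boolean cacheableFaceting(Map<String, List<String>> termsByField,
                                     String field, int filterCacheSize) {
        return distinctTerms(termsByField, field) < filterCacheSize;
    }
}
```

As the thread goes on to note, this is only a ballpark: faceting several fields at once shares the same cache.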
[jira] Updated: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-212: --- Attachment: SOLR-212-DirectSolrConnection.patch Adding dataDir to an optional constructor. Embeddable class to call solr directly -- Key: SOLR-212 URL: https://issues.apache.org/jira/browse/SOLR-212 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch For some embedded applications, it is useful to call solr without running an HTTP server. This class mimics the behavior you would get if you sent the request through an HTTP connection. It is designed to work nicely (ie simple) with JNI the main function is: public class DirectSolrConnection { String request( String pathAndParams, String body ) throws Exception { ... } }
Re: Luke handler help
Yonik Seeley wrote: On 4/28/07, Ryan McKinley [EMAIL PROTECTED] wrote: I have a few things I'd like to check with the Luke handler; if you all could check some of the assumptions, that would be great. * I want to print out the document frequency for a term in a given document. Since that term shows up in the given document, I would think the document frequency must be at least 1. I am using: reader.docFreq( t ) [line 236] The results seem reasonable, but *sometimes* it returns zero... is that possible? Is the field indexed? Did you run the field through the analyzer to get the terms (to match what's in the index)? If both of those are true, it seems like the docFreq should always be greater than 0. aah, that makes sense - now that you mention it, I only see df=0 for non-indexed, stored fields. In an inverted index, terms point to documents. So you have to traverse *all* of the terms of a field across all documents, and keep track of when you run across the document you are interested in. When you do, then get the positions that the term appeared at, and keep track of them. After you have covered all the terms, you can put everything in order. There could be gaps (positionIncrement, stop word removal, etc) and it's also possible for multiple tokens to appear at the same position. For a full-text field with many terms, and a large index, this could take a *long* time. It's probably very useful for debugging though. that must be why luke starts a new thread for 'reconstruct and edit'. For now, I will leave this out of the handler, and leave it open to someone with the need/time in the future. * Each field gets a boolean attribute cacheableFaceting -- this is true if the number of distinct terms is smaller than the filterCacheSize. I get the filterCacheSize from: solrconfig.xml:query/filterCache/@size and get the distinct term count from counting up the termEnum. Is this logic solid?
I know the cacheability changes if you are faceting multiple fields at once, but it's still nice to have a ballpark estimate without needing to know the internals. It could get trickier... I'm about to hack up a quick patch now that will reduce memory usage by only using the filterCache above a certain df threshold. It may increase or decrease the faceting speed - TBD. Also, other alternate faceting schemes are in the works (a month or two out). I'd leave this attribute out and just report on the number of unique terms. ok, that seems reasonable. Some kind of histogram might be really nice though (how many terms under varying df values): 1=412 (412 terms have a df of 1) 2=516 (516 terms have a df of 2) 4=600 8=650 16=670 32=680 64=683 128=685 256=686 11325=690 (the maxDf found) I'll take a look at that. Remember that df is not updated when a document is marked for deletion in Lucene. So you can have a df of 2, do a search, and only come up with one document. that would explain why I'm seeing df 1 for the uniqueKey!
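The doubling-bucket histogram sketched above can be computed in a few lines. A minimal, self-contained version (plain Java over an array of df values, not the Lucene API; the counts are read as cumulative, i.e. terms with df at or below each bucket, which matches the increasing numbers in the example, and the last bucket is capped at the max df found):

```java
import java.util.*;

// Cumulative histogram of document frequencies over doubling buckets
// (1, 2, 4, 8, ... with the final bucket capped at the max df observed).
class DfHistogram {
    static LinkedHashMap<Integer, Integer> histogram(int[] dfs) {
        int maxDf = 0;
        for (int df : dfs) maxDf = Math.max(maxDf, df);
        LinkedHashMap<Integer, Integer> buckets = new LinkedHashMap<>();
        for (int bucket = 1; ; bucket *= 2) {
            int limit = Math.min(bucket, maxDf);   // cap the last bucket at maxDf
            int count = 0;
            for (int df : dfs) if (df <= limit) count++;
            buckets.put(limit, count);
            if (bucket >= maxDf) break;            // final bucket reports all terms
        }
        return buckets;
    }
}
```

With dfs = {1, 1, 2, 3, 5, 9} this yields {1=2, 2=3, 4=4, 8=5, 9=6}: the last entry is the maxDf bucket covering every term, like the 11325=690 line above.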
[jira] Commented: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492518 ] Brian Whitman commented on SOLR-212: Much love from user land on this one. I just successfully put solr in a C app without any webserver running using JNI. After I clean up my JNI calling code I can post an example app here to show how it's done on the client side if anyone is interested?
Re: Luke handler help
In an inverted index, terms point to documents. So you have to traverse *all* of the terms of a field across all documents, and keep track of when you run across the document you are interested in. When you do, then get the positions that the term appeared at, and keep track of them. After you have covered all the terms, you can put everything in order. There could be gaps (positionIncrement, stop word removal, etc) and it's also possible for multiple tokens to appear at the same position. For a full-text field with many terms, and a large index, this could take a *long* time. It's probably very useful for debugging though. I just realized that it's worse... if you specified a field, then you only have to iterate the terms for that field. If you want *all* of the indexed, non-stored fields for a particular document, but don't know what they are, there is no info to help you. You need to iterate over *all* terms in the index. Luckily, there is a patch in the works in Lucene that will make skipTo(myDoc) in TermDocs faster. That should speed things up a little. Remember that df is not updated when a document is marked for deletion in Lucene. So you can have a df of 2, do a search, and only come up with one document. that would explain why I'm seeing df 1 for the uniqueKey! Yep, that's not likely to ever be fixed in Lucene. Again, it's the nature of the inverted index... given a particular docid, you really have no clue what terms in the index point to that docid. -Yonik
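The stale-df behavior Yonik describes can be seen in a toy model (plain Java, not the Lucene API): df is just the size of a term's postings list, deletions live in a side set, and searches filter deleted docs at hit time without touching the postings.

```java
import java.util.*;

// Minimal inverted index: term -> set of docids. Deletions are only marked
// in a side set, so docFreq (the postings size) goes stale until a merge.
class ToyInvertedIndex {
    final Map<String, Set<Integer>> postings = new HashMap<>();
    final Set<Integer> deleted = new HashSet<>();

    void add(int docId, String... terms) {
        for (String t : terms)
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
    }

    void delete(int docId) { deleted.add(docId); }  // mark only; df unchanged

    int docFreq(String term) {
        return postings.getOrDefault(term, Set.of()).size();
    }

    // what a search actually returns: live documents only
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int d : postings.getOrDefault(term, Set.of()))
            if (!deleted.contains(d)) hits.add(d);
        return hits;
    }
}
```

This is exactly the "df of 2, but only one hit" case: updating a document by uniqueKey deletes the old copy and adds a new one, leaving the key's df at 2 until the index is optimized.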
[jira] Commented: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492522 ] Brian Whitman commented on SOLR-212: Since the main use case of SOLR-212 is to embed it in client applications, we should be careful about logging. As of now SOLR-212 will spit stuff all over stderr. I suggest putting this: System.setProperty("java.util.logging.config.file", instanceDir+"/conf/logging.properties"); near line 79 of DirectSolrConnection.java. That way, if a developer/user chooses, they can put a logging.properties file in conf and set direct logging of Solr requests either to their own application logs or a file. If the conf/logging.properties file does not exist, I believe the default logging.properties will be used (which is what happens now.)
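For reference, a minimal conf/logging.properties along the lines Brian suggests might look like this (the log file pattern and limits here are hypothetical; any standard java.util.logging configuration works):

```properties
# Route Solr's java.util.logging output to a file instead of stderr.
handlers = java.util.logging.FileHandler
.level = INFO

# Hypothetical log location; %u avoids collisions between running JVMs.
java.util.logging.FileHandler.pattern = solr-%u.log
java.util.logging.FileHandler.limit = 1000000
java.util.logging.FileHandler.count = 3
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
```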
[jira] Updated: (SOLR-181) Support for Required field Property
[ https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-181: --- Attachment: solr-181-required-fields.patch Finally got a chance to look at this. It looks good. I made a few modifications:
1. changed tabs to spaces
2. Added javadoc comments to make it clear that RequiredFields must contain all fieldsWithDefaultValues
3. The error now contains the document's uniqueKey
4. moved the test to o.a.s.schema
5. I added a non-final flag to SchemaField to say if the field is required.
6. Modified IndexSchema.java to set the uniqueKey as required *unless* it is specified as required="false" in the schema
7. Added required="true" to the example schema.xml
8. Added required="false" to the test schema.xml (one test does not include it)
As a note to anyone else looking at the change log, Greg's patch also modifies AbstractSolrTestCase and TestHarness to be able to check what status is expected from checkUpdateU. I think this offers a good solution to the (mis)feature that you could have a null uniqueKey. This patch lets you have a null uniqueKey, but you have to configure it. Support for Required field Property - Key: SOLR-181 URL: https://issues.apache.org/jira/browse/SOLR-181 Project: Solr Issue Type: Improvement Components: update Reporter: Greg Ludington Priority: Minor Attachments: solr-181-required-fields.patch, solr-181-required-fields.patch In certain situations, it can be helpful to require that every document in your index has a value for a given field. While ideally the indexing client(s) should be responsible enough to add all necessary fields, this patch allows it to be enforced in the Solr schema, by adding a required property to a field entry.
For example, with this in the schema: <field name="name" type="nametext" indexed="true" stored="true" required="true"/> A request to index a document without a name field will result in this response: <result status="1">org.apache.solr.core.SolrException: missing required fields: name (and then, of course, the stack trace)</result> The meat of this patch is that DocumentBuilder.getDoc() throws a SolrException if not all required fields have values; this may not work well as is with SOLR-139, Support updateable/modifiable documents, and may have to be changed depending on that issue's final disposition.
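The check the patch describes can be sketched in isolation. A minimal, hedged version (hypothetical names and a stand-in exception type; the real check lives in DocumentBuilder.getDoc() and throws org.apache.solr.core.SolrException):

```java
import java.util.*;

// Validate a document's fields against the schema's required set and fail
// with a message listing every missing field, as the error output shows.
class RequiredFieldCheck {
    static void validate(Set<String> requiredFields, Map<String, String> doc) {
        List<String> missing = new ArrayList<>();
        for (String f : requiredFields)
            if (!doc.containsKey(f) || doc.get(f) == null) missing.add(f);
        if (!missing.isEmpty())
            throw new IllegalStateException(
                "missing required fields: " + String.join(",", missing));
    }
}
```

Collecting all missing names before throwing (rather than failing on the first one) mirrors the patch's plural "missing required fields" message.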
[jira] Assigned: (SOLR-181) Support for Required field Property
[ https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley reassigned SOLR-181: -- Assignee: Ryan McKinley
[jira] Updated: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-212: --- Attachment: SOLR-212-DirectSolrConnection.patch Updated to take an (optional) logging path
[jira] Created: (SOLR-220) Solr returns HTTP status code=1 in some case
Solr returns HTTP status code=1 in some case -- Key: SOLR-220 URL: https://issues.apache.org/jira/browse/SOLR-220 Project: Solr Issue Type: Bug Components: search Reporter: Koji Sekiguchi If I request the following on the solr example: http://localhost:8080/solr/select?q=ipod%3Bzzz+asc&version=2.2&start=0&rows=10&indent=on I get an exception as I expected, because zzz is undefined, but the HTTP status code is 1. I expected 400 in this case. The reason for this is that the IndexSchema.getField() method throws SolrException(1,...) and QueryParsing.parseSort() doesn't catch it:

// getField could throw an exception if the name isn't found
SchemaField f = schema.getField(part); // <=== makes HTTP status code=1
if (f == null || !f.indexed()){
  throw new SolrException( 400, "can not sort on unindexed field: "+part );
}

There seem to be a couple of ways to solve this problem:
1. IndexSchema.getField() throws SolrException(400,...)
2. IndexSchema.getField() doesn't throw the exception but returns null
3. The caller catches the exception and re-throws SolrException(400,...)
4. The caller catches the exception and re-throws SolrException(400,...,cause) that wraps the cause exception
I think either #3 or #4 will be acceptable. The attached patch is #3 for sort on an undefined field. Other than QueryParsing.parseSort(), IndexSchema.getField() is called by the following classes/methods:
- CSVLoader.prepareFields()
- JSONWriter.writeDoc()
- SimpleFacets.getTermCounts()
- QueryParsing.parseValSource()
I'm not sure these methods require the same patch. Any thoughts? regards,
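Option #4 above (catch and re-throw with a proper status, wrapping the cause) can be sketched with a stand-in exception type; the real code uses org.apache.solr.core.SolrException, and the method names here only mirror the report.

```java
// Stand-in for SolrException: an exception carrying an HTTP status code.
class HttpStatusDemo {
    static class CodedException extends RuntimeException {
        final int code;
        CodedException(int code, String msg, Throwable cause) {
            super(msg, cause);
            this.code = code;
        }
    }

    // schema lookup that signals "not found" with the bogus internal code 1
    static String getField(String name) {
        throw new CodedException(1, "undefined field " + name, null);
    }

    // caller catches and re-throws as a proper 400, preserving the cause (#4)
    static String getSortField(String name) {
        try {
            return getField(name);
        } catch (CodedException e) {
            throw new CodedException(400, "can not sort on undefined field: " + name, e);
        }
    }
}
```

The wrapped cause keeps the original "undefined field" detail available for the logs while the client sees a clean 400.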
[jira] Updated: (SOLR-220) Solr returns HTTP status code=1 in some case
[ https://issues.apache.org/jira/browse/SOLR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-220: Attachment: QueryParsing.patch the patch for sort on undefined field
[jira] Commented: (SOLR-220) Solr returns HTTP status code=1 in some case
[ https://issues.apache.org/jira/browse/SOLR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492531 ]

Ryan McKinley commented on SOLR-220:
------------------------------------

I just checked in a much smaller patch that at least won't throw a status code=1:

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/schema/IndexSchema.java?r1=533449&r2=533448&pathrev=533449

We should probably use your patch so that it has a nice context-specific error, rather than the general "undefined field" one.

As an aside, SOLR-204 will make the request dispatcher the default /select handler. This catches invalid error codes and returns a 500.

thanks
[jira] Commented: (SOLR-181) Support for Required field Property
[ https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492532 ]

Yonik Seeley commented on SOLR-181:
-----------------------------------

Haven't looked at the code, but the description looks fine. +1

 Support for Required field Property
 -----------------------------------
                 Key: SOLR-181
                 URL: https://issues.apache.org/jira/browse/SOLR-181
             Project: Solr
          Issue Type: Improvement
          Components: update
            Reporter: Greg Ludington
         Assigned To: Ryan McKinley
            Priority: Minor
         Attachments: solr-181-required-fields.patch, solr-181-required-fields.patch

In certain situations, it can be helpful to require that every document in your index has a value for a given field. While ideally the indexing client(s) should be responsible enough to add all necessary fields, this patch allows it to be enforced in the Solr schema by adding a required property to a field entry. For example, with this in the schema:

    <field name="name" type="nametext" indexed="true" stored="true" required="true"/>

A request to index a document without a name field will result in this response:

    <result status="1">org.apache.solr.core.SolrException: missing required fields: name
    (and then, of course, the stack trace)
    </result>

The meat of this patch is that DocumentBuilder.getDoc() throws a SolrException if not all required fields have values; this may not work well as-is with SOLR-139 (Support updateable/modifiable documents) and may have to be changed depending on that issue's final disposition.
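The required-field check described above can be sketched with toy classes. These are simplified stand-ins, not Solr's DocumentBuilder or schema objects: the point is just to show the pattern of collecting the missing required fields and failing the whole document in getDoc().

```java
// Minimal sketch (assumed names, not Solr's real code) of the patch's idea:
// getDoc() throws a SolrException listing any required fields never given a value.
import java.util.*;

public class RequiredFieldsSketch {
    // Stand-in for org.apache.solr.core.SolrException
    static class SolrException extends RuntimeException {
        final int code;
        SolrException(int code, String msg) { super(msg); this.code = code; }
    }

    private final Set<String> requiredFields;                       // from the schema
    private final Map<String, String> fieldValues = new HashMap<>();

    RequiredFieldsSketch(Set<String> requiredFields) {
        this.requiredFields = requiredFields;
    }

    void addField(String name, String value) {
        fieldValues.put(name, value);
    }

    // Mirrors the described behavior of DocumentBuilder.getDoc():
    // fail the whole document if any required field is missing.
    Map<String, String> getDoc() {
        List<String> missing = new ArrayList<>();
        for (String f : requiredFields) {
            if (!fieldValues.containsKey(f)) missing.add(f);
        }
        if (!missing.isEmpty()) {
            throw new SolrException(1, "missing required fields: " + String.join(",", missing));
        }
        return fieldValues;
    }

    public static void main(String[] args) {
        RequiredFieldsSketch b = new RequiredFieldsSketch(Collections.singleton("name"));
        b.addField("name", "ipod");
        System.out.println(b.getDoc());   // prints {name=ipod}
    }
}
```

Enforcing this at getDoc() time means the check runs once per document, after all fields have been added, which matches the error shown in the description.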
[jira] Commented: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492534 ]

Otis Gospodnetic commented on SOLR-212:
---------------------------------------

Brian: interested!

 Embeddable class to call solr directly
 --------------------------------------
                 Key: SOLR-212
                 URL: https://issues.apache.org/jira/browse/SOLR-212
             Project: Solr
          Issue Type: Improvement
            Reporter: Ryan McKinley
         Assigned To: Ryan McKinley
            Priority: Minor
         Attachments: SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch

For some embedded applications, it is useful to call solr without running an HTTP server. This class mimics the behavior you would get if you sent the request through an HTTP connection. It is designed to work nicely (i.e. simply) with JNI. The main function is:

    public class DirectSolrConnection {
        String request( String pathAndParams, String body ) throws Exception { ... }
    }
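The calling convention above can be illustrated with a toy stand-in. Nothing below is Solr's implementation: the Handler interface and the splitting logic are assumptions, shown only to make concrete how an embedding application (e.g. via JNI) would hand the path-plus-query string and an optional body to a single request() entry point and get the response back as a String.

```java
// Toy sketch of the DirectSolrConnection calling convention: one
// request(pathAndParams, body) method that takes what a client would
// have sent over HTTP and returns the response as a String.
public class DirectConnectionSketch {
    // Assumed interface standing in for Solr's internal request handling.
    interface Handler {
        String handle(String path, String query, String body);
    }

    private final Handler handler;

    DirectConnectionSketch(Handler handler) { this.handler = handler; }

    // Split "pathAndParams" into the path and the query string, then dispatch.
    String request(String pathAndParams, String body) throws Exception {
        int q = pathAndParams.indexOf('?');
        String path  = (q < 0) ? pathAndParams : pathAndParams.substring(0, q);
        String query = (q < 0) ? "" : pathAndParams.substring(q + 1);
        return handler.handle(path, query, body);
    }

    public static void main(String[] args) throws Exception {
        DirectConnectionSketch conn = new DirectConnectionSketch(
            (path, query, body) -> "handled " + path + " [" + query + "]");
        // prints: handled /select [q=solr]
        System.out.println(conn.request("/select?q=solr", null));
    }
}
```

Keeping the whole exchange in plain Strings is what makes this shape convenient for JNI: no servlet or HTTP types cross the boundary.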
[jira] Updated: (SOLR-221) faceting memory and performance improvement
[ https://issues.apache.org/jira/browse/SOLR-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-221:
------------------------------
    Attachment: facet.patch

 faceting memory and performance improvement
 -------------------------------------------
                 Key: SOLR-221
                 URL: https://issues.apache.org/jira/browse/SOLR-221
             Project: Solr
          Issue Type: Improvement
            Reporter: Yonik Seeley
         Assigned To: Yonik Seeley
         Attachments: facet.patch

1) compare the minimum count currently needed to the term df and avoid unnecessary intersection counts
2) set a minimum term df in order to use the filterCache, otherwise iterate over TermDocs
[jira] Commented: (SOLR-221) faceting memory and performance improvement
[ https://issues.apache.org/jira/browse/SOLR-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492543 ]

Yonik Seeley commented on SOLR-221:
-----------------------------------

The results are slightly surprising. I made up an index, and each document contained 4 random numbers between 1 and 500,000. This is not the distribution one would expect to see in a real index, but we can still learn much.

The synthetic index:
    maxDoc = 500,000
    numDocs = 393,566
    number of segments = 5
    number of unique facet terms = 490,903
    filterCache max size = 1,000,000 entries (more than enough)
    JVM = 1.5.0_09 -server -Xmx200M
    System = WinXP, 3GHz P4, hyperthreaded, 1GB dual channel RAM
    facet type = facet.field, facet.sort=true, facet.limit=10
    maximum df of any term = 15

Warming times were not included; queries were run many times and the lowest time recorded.

Number of documents that match the test base queries (for example, base query #1 matches 175K docs):
    1) 175000  2) 43000  3) 8682  4) 2179  5) 422  6) 1

WITHOUT PATCH (milliseconds to facet each base query):
    1578, 1578, 1547, 1485, 1484, 1422

WITH PATCH (min df comparison w/ term df, minDfFilterCache=0) (all field cache):
    984, 1203, 1391, 1437, 1484, 1420

WITH PATCH (min df comp, minDfFilterCache=30) (no fieldCache at all):
    1406, 2344, 3125, 3015, 3172, 3172

CONCLUSION1: The min df comparison increases faceting speed 60% when the base query matches many documents. With a real term distribution, this could be even greater.

CONCLUSION2: Opting not to use the fieldCache for smaller-df terms can save a lot of memory, but it hurts performance up to 200% for our non-optimized index.

CONCLUSION3: Using the field cache less can significantly speed up warming time (times not shown, but a full warming of the fieldCache took 33 sec).

Now the same index, but optimized
=================================

WITH PATCH (optimized, min df comparison w/ term df, minDfFilterCache=0) (all field cache):
    172, 312, 485, 578, 610, 656

WITH PATCH (optimized, min df comp, minDfFilterCache=30) (no fieldCache at all):
    265, 344, 422, 468, 500, 484

CONCLUSION4: An optimized index increased performance 200-500%.

CONCLUSION5: The fact that the all-fieldCache option was significantly faster on an optimized index probably cannot be totally explained by accurate dfs (no deleted documents to inflate the term df values); it means that just iterating over the terms is *much* faster in an optimized index (a potential Lucene area to look into).
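The first optimization in the patch description (compare the minimum count currently needed against the term's df, skipping the intersection when the term cannot make the top-N) can be sketched with toy data structures. Everything below is illustrative, not Solr's real code: the in-memory postings map stands in for TermEnum/TermDocs, and the second optimization (a df threshold for using the filterCache) is only noted in a comment since caching isn't modeled here.

```java
// Toy sketch of top-N facet counting with the df short-circuit:
// a term whose df is below the smallest count in the current top-N
// can never make the list, so its intersection is skipped entirely.
import java.util.*;

public class FacetSketch {
    // `postings` maps each term to the set of doc ids containing it (toy index).
    // In Solr, terms below a df threshold would be counted by walking TermDocs
    // directly instead of going through the filterCache (optimization #2).
    static List<Map.Entry<String, Integer>> topTerms(
            Map<String, Set<Integer>> postings, Set<Integer> baseDocs, int limit) {

        // min-heap of the current top-`limit` (term, count) pairs
        PriorityQueue<Map.Entry<String, Integer>> top =
            new PriorityQueue<>(Comparator.comparingInt(
                (Map.Entry<String, Integer> e) -> e.getValue()));

        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            int df = e.getValue().size();

            // Optimization #1: the intersection count can never exceed df,
            // so skip the (expensive) intersection when df is too small.
            if (top.size() == limit && df < top.peek().getValue()) continue;

            int count = 0;
            for (int doc : e.getValue()) {              // intersect with base query
                if (baseDocs.contains(doc)) count++;
            }
            if (count == 0) continue;

            top.offer(new AbstractMap.SimpleEntry<>(e.getKey(), count));
            if (top.size() > limit) top.poll();          // evict the smallest
        }

        List<Map.Entry<String, Integer>> out = new ArrayList<>(top);
        out.sort((a, b) -> b.getValue() - a.getValue());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("a", new HashSet<>(Arrays.asList(1, 2, 3)));
        postings.put("b", new HashSet<>(Arrays.asList(2, 3)));
        postings.put("c", new HashSet<>(Arrays.asList(4)));
        Set<Integer> base = new HashSet<>(Arrays.asList(2, 3, 4));
        System.out.println(topTerms(postings, base, 2));
    }
}
```

The short-circuit explains why the gains in CONCLUSION1 grow with the base-query size: the more docs match, the more candidate terms accumulate high counts early, and the more low-df terms can be skipped without counting.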