Re: Solr 1.1 released
*clink* wonderful job, Solr team! Erik On Dec 22, 2006, at 5:14 PM, Bertrand Delacretaz wrote: On 12/22/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Solr 1.1 is now available for download!... Yoo-hooh, congratulations, virtual champagne everybody! -Bertrand
Re: ApacheCon Europe '07 Solr proposals?
On 11/30/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Oh, and people should feel free to pilfer anything that might be useful my last presentation Is anyone submitting an intro to Solr talk? I'm planning to submit a case study of my current project (how to graft Solr on existing CMSes, preparing documents, ajax frontends, etc.), but I think we should have an introductory talk as well. -Bertrand
Re: Solr 1.1 released
Yup, definitely good stuff. Nice, speedy release process too, especially compared to other Incubator release approvals. Yoav On 12/23/06, Erik Hatcher [EMAIL PROTECTED] wrote: *clink* wonderful job, Solr team! Erik On Dec 22, 2006, at 5:14 PM, Bertrand Delacretaz wrote: On 12/22/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Solr 1.1 is now available for download!... Yoo-hooh, congratulations, virtual champagne everybody! -Bertrand
Re: ApacheCon Europe '07 Solr proposals?
I haven't submitted yet, but I'd be happy to submit an intro Solr preso. I've got an interesting Solr/Ruby related project I plan to contribute under client/ruby real soon now that I may want to present instead, or in addition to. Erik On Dec 23, 2006, at 4:40 AM, Bertrand Delacretaz wrote: On 11/30/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Oh, and people should feel free to pilfer anything that might be useful my last presentation Is anyone submitting an intro to Solr talk? I'm planning to submit a case study of my current project (how to graft Solr on existing CMSes, preparing documents, ajax frontends, etc.), but I think we should have an introductory talk as well. -Bertrand
Re: Schema Parsing Failed, fix?
: : Node.ATTRIBUTE_NODE case so it is treated the same as TEXT_NODE and it : : works for resin and the tests pass. : : Hmmm... yeah, this seems to be a mistake in the DOM-Level-3-Core : description of what getText is supposed to do ... it says that for : ATTRIBUTE_NODE you should concat all of the children -- but how would an : ATTRIBUTE ever have children? Did some more reading ... according to DOM-Level-3-Core, an Attr's allowed children are Text and EntityReference. Xerces2-j NodeImpl.getTextContent duplicates the table from the Level-3-Core docs (which is also in the java 1.5 javadocs for org.w3c.dom.Node.getTextContent()) with the notable exception that they move ATTRIBUTE_NODE down into the second row (indicating nodeValue should be used instead of concatenating the children) ... the impl backs this up (AttrImpl inherits getTextContent from NodeImpl, which by default returns this.getNodeValue()) http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/dom/NodeImpl.html#getTextContent() http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Node.html#getTextContent() http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/AttrImpl.java?view=markup http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/NodeImpl.java?view=markup Fortunately, the DOM Spec says that accessing the Attr.nodeValue is defined to be Attr.value, which is documented as... "On retrieval, the value of the attribute is returned as a string. Character and general entity references are replaced with their values." ...so even if someone out there is actually obeying the spec about giving Attrs child nodes, we should still be safe using getNodeValue in the Node.ATTRIBUTE_NODE case since the spec says that needs to work too. -Hoss
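A minimal sketch of the text-extraction rule discussed above, using only the JDK's org.w3c.dom API. The class name DomText and its parse helper are hypothetical and this is not Solr's actual code; it only illustrates treating ATTRIBUTE_NODE like TEXT_NODE by reading getNodeValue() instead of concatenating children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not Solr's implementation.
final class DomText {

    // Returns the text of a node without relying on Node.getTextContent().
    static String getText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
            case Node.ENTITY_REFERENCE_NODE: {
                // Container nodes: concatenate the text of all children.
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    appendText(children.item(i), sb);
                }
                break;
            }
            case Node.ATTRIBUTE_NODE: // handled like text: nodeValue is Attr.value
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                sb.append(node.getNodeValue());
                break;
            default:
                // Comments and processing instructions contribute no text.
                break;
        }
    }

    // Convenience for experiments: parse a string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Because the attribute branch never looks at child nodes, the result should not depend on whether a given parser gives Attr nodes children.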
[jira] Assigned: (SOLR-92) XML parsing error with resin-3.0.21
[ http://issues.apache.org/jira/browse/SOLR-92?page=all ] Hoss Man reassigned SOLR-92: Assignee: Hoss Man should have put this in the bug instead of email... http://www.nabble.com/Schema-Parsing-Failed%2C-fix--tf2868892.html#a8038207 : : Node.ATTRIBUTE_NODE case so it is treated the same as TEXT_NODE and it : : works for resin and the tests pass. : : Hmmm... yeah, this seems to be a mistake in the DOM-Level-3-Core : description of what getText is supposed to do ... it says that for : ATTRIBUTE_NODE you should concat all of the children -- but how would an : ATTRIBUTE ever have children? Did some more reading ... according to DOM-Level-3-Core, an Attr's allowed children are Text and EntityReference. Xerces2-j NodeImpl.getTextContent duplicates the table from the Level-3-Core docs (which is also in the java 1.5 javadocs for org.w3c.dom.Node.getTextContent()) with the notable exception that they move ATTRIBUTE_NODE down into the second row (indicating nodeValue should be used instead of concatenating the children) ... the impl backs this up (AttrImpl inherits getTextContent from NodeImpl, which by default returns this.getNodeValue()) http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/dom/NodeImpl.html#getTextContent() http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Node.html#getTextContent() http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/AttrImpl.java?view=markup http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/NodeImpl.java?view=markup Fortunately, the DOM Spec says that accessing the Attr.nodeValue is defined to be Attr.value, which is documented as... "On retrieval, the value of the attribute is returned as a string. Character and general entity references are replaced with their values." ...so even if someone out there is actually obeying the spec about giving Attrs child nodes, we should still be safe using getNodeValue in the Node.ATTRIBUTE_NODE case since the spec says that needs to work too.
-- ...i'll commit this change along with some more comments explaining it

XML parsing error with resin-3.0.21
---
Key: SOLR-92
URL: http://issues.apache.org/jira/browse/SOLR-92
Project: Solr
Issue Type: Bug
Affects Versions: 1.2
Environment: running resin-3.0.21
Reporter: Ryan McKinley
Assigned To: Hoss Man
Priority: Minor
Attachments: resinXmlParser.patch

When the resin XML parser starts, it gets the following error trying to parse the config file:

[00:25:35.025] Caused by: java.lang.NumberFormatException: empty String
[00:25:35.025] at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
[00:25:35.025] at java.lang.Float.parseFloat(Float.java:394)
[00:25:35.025] at org.apache.solr.core.Config.getFloat(Config.java:174)
[00:25:35.025] at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:273)

see: http://www.mail-archive.com/solr-dev@lucene.apache.org/msg01852.html

-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: filter input from multiple fields
: Imagine I have a form like : <form action="select/" method="get"> ... : <input type="text" name="startDate"/> .. : <input type="text" name="endDate"/> : Using the StandardRequestHandler without prior processing would result : that startDate and endDate would be ignored since they are not : within the query string and are not solr standard param that got : processed. Typically it's not recommended to have your front end users/clients hitting Solr directly as part of an HTML form submit ... the more conventional way to think of it is that Solr is a backend service, which your application can talk to over HTTP. If you were dealing with a database, you wouldn't expect that you could generate an HTML form for your clients and then have them submit that form in some way that resulted in their browser using JDBC (or ODBC) to communicate directly with your database. Their client would communicate with your App, which would validate their input, impose some security checks on the input, and then execute the underlying query to your database. Working with Solr should be very similar: it just so happens that instead of using JDBC or some other binary protocol, Solr uses HTTP, and you *can* talk to it directly from a web browser, but that's really more of a debugging feature than anything else. -Hoss
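As a rough illustration of the "application in the middle" pattern described above, here is a sketch of an application-side helper that validates the two form fields and only then builds the parameters it would send to Solr. The class name, the index field name "date", and the choice to express the range as an fq filter are assumptions made for the example, not anything prescribed in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical application-layer helper; Solr itself is not involved here.
final class DateRangeRequest {

    // Assumed name of a date field in the index.
    static final String DATE_FIELD = "date";

    // Validates both dates (expected as yyyy-MM-dd) and returns a range filter.
    static String buildFilter(String startDate, String endDate) {
        LocalDate start = parseDate("startDate", startDate);
        LocalDate end = parseDate("endDate", endDate);
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("endDate is before startDate");
        }
        return DATE_FIELD + ":[" + start + "T00:00:00Z TO " + end + "T23:59:59Z]";
    }

    // Builds the URL-encoded query string the application would send to Solr.
    static String buildQueryString(String userQuery, String startDate, String endDate) {
        return "q=" + encode(userQuery) + "&fq=" + encode(buildFilter(startDate, endDate));
    }

    private static LocalDate parseDate(String name, String value) {
        try {
            return LocalDate.parse(value); // strict ISO yyyy-MM-dd
        } catch (DateTimeParseException | NullPointerException e) {
            throw new IllegalArgumentException("invalid " + name + ": " + value, e);
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The point of the sketch is only that user input never reaches Solr unchecked: anything that is not a well-formed date is rejected by the application before a request is made.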
Re: filter input from multiple fields
: I did a small hack and it works like a charm without the above mentioned : handler. I only activated variable substitution for the FQ for testing : if you think that is a nice feature I can activate it for the rest. As i said in my other reply ... i think you should reconsider the approach you are taking towards your end goal -- but in general, this idea of allowing variable substitution in the lucene query params seems pretty slick to me ... a more general solution might be to modify the SolrQueryParser directly to have a new void setParamVariables(SolrParams p) method. if it's called (with non null input), then any string that SolrQueryParser instance is asked to parse would first be preprocessed looking for the ${} pattern and pulling the values out of the SolrParams instance. request handlers could then either pass their main params (if they wanted to allow kitchen sink param substitution) or if they want to be more robust (ie: Standard and DisMax), they could do what you describe: have a configured list of param names that would be used to construct a new instance of SolrParams explicitly for the SolrQueryParser -- but i would think that would be a good use for a new separate init param in the solrconfig, it's not the kind of thing you'd ever want to let the client specify. The reason this really seems cool to me is because the format/params passing could work in either order: the format could be specified in the config with params coming from the client, or the config could list a big long list of constant params that the client could then use however they want by specifying a format that used them. -Hoss
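The preprocessing step described above could look roughly like the following. This is only a sketch: a plain Map stands in for SolrParams, the class and method names are hypothetical, and leaving unknown variables untouched is one possible policy, not something decided in this thread.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the proposed SolrQueryParser preprocessing.
final class ParamSubstitution {

    // Matches ${name} and captures the name.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with params.get(name); unknown names are left as-is.
    static String substitute(String template, Map<String, String> params) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a format such as date:[${startDate} TO ${endDate}] held in the config and the two values supplied by the client, the substituted string is what would then be handed to the query parser. Restricting the map to a configured list of param names gives the more robust variant mentioned above.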
Re: [jira] Commented: (SOLR-81) Add Query Spellchecker functionality
: Yeah, I've used the Lucene-based spellchecker before, I just never had : to hook it up with Solr. At this point I'm not interested in the fancy : stuff (cache, RAMDir...), I just want to figure out how to configure it : via schema.xml... But the crux of the issue is that if you are maintaining a second index inside your base Solr installation for the purposes of the Spellchecker class, then you don't want or need to configure it in schema.xml -- it lives outside the schema space. I pointed this out the last time spellchecking came up, there are two extremely different approaches involved when you talk about implementing a spelling/suggestion service with Solr... In the first approach, the main Solr index *is* the suggestion index ... each Document represents a suggested word, with one stored field telling you what the word is, and indexed fields containing the ngrams. you could populate this index from any initial source: a dictionary, logs of popular query terms, or a dump of all terms in your corpus. At query time, your application would query this index separately from querying your main Solr index containing your domain specific data. The second approach is to have the spelling/suggestion index live inside of your Solr index side by side with your main domain specific index, so your Request Handler can talk to it directly, and it can be populated directly using the terms in your corpus -- this sounds like the approach you are taking, but in this approach there is no need for your schema.xml to know anything about the index .. just use the SpellChecker class as is: construct it with an empty RAMDirectory and call indexDictionary on a LuceneDictionary pointed at your main Solr index.
The only code you really need to write is something to run clearIndex and indexDictionary as a newSearcher hook (the easiest way probably being to hang your Spellchecker instance off of a single element Solr cache and write a Regenerator) But like i said: you don't need to worry about making the schema know about your ngrams -- you do that if you're going for the first approach. -Hoss
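To make the first approach a little more concrete, here is a small sketch of the kind of n-gram values each suggestion document would carry alongside the stored word. The class name is hypothetical and this is not the field layout the Lucene SpellChecker actually uses; it only shows how a word breaks into the character n-grams that would be indexed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-gram generation for a suggestion index.
final class NGrams {

    // Returns every contiguous substring of length n, in order of position.
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }
}
```

For example, the word "solr" yields the bigrams so, ol, lr and the trigrams sol, olr. A misspelled query term shares many of these grams with the correctly spelled word, which is why matching on them surfaces suggestions.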