Re: Solr 1.1 released

2006-12-23 Thread Erik Hatcher

*clink*

wonderful job, Solr team!

Erik

On Dec 22, 2006, at 5:14 PM, Bertrand Delacretaz wrote:


On 12/22/06, Yonik Seeley [EMAIL PROTECTED] wrote:

...Solr 1.1 is now available for download!...


Yoo-hooh, congratulations, virtual champagne everybody!
-Bertrand




Re: ApacheCon Europe '07 Solr proposals?

2006-12-23 Thread Bertrand Delacretaz

On 11/30/06, Yonik Seeley [EMAIL PROTECTED] wrote:

...Oh, and people should feel free to pilfer anything that might be
useful my last presentation


Is anyone submitting an intro to Solr talk?

I'm planning to submit a case study of my current project (how to graft
Solr on existing CMSes, preparing documents, ajax frontends, etc.),
but I think we should have an introductory talk as well.

-Bertrand


Re: Solr 1.1 released

2006-12-23 Thread Yoav Shapira

Yup, definitely good stuff.  Nice, speedy release process too,
especially compared to other Incubator release approvals.

Yoav

On 12/23/06, Erik Hatcher [EMAIL PROTECTED] wrote:

*clink*

wonderful job, Solr team!

Erik

On Dec 22, 2006, at 5:14 PM, Bertrand Delacretaz wrote:

 On 12/22/06, Yonik Seeley [EMAIL PROTECTED] wrote:
 ...Solr 1.1 is now available for download!...

 Yoo-hooh, congratulations, virtual champagne everybody!
 -Bertrand




Re: ApacheCon Europe '07 Solr proposals?

2006-12-23 Thread Erik Hatcher
I haven't submitted yet, but I'd be happy to submit an intro Solr  
preso.  I've got an interesting Solr/Ruby related project I plan to  
contribute under client/ruby real soon now that I may want to  
present instead, or in addition to.


Erik


On Dec 23, 2006, at 4:40 AM, Bertrand Delacretaz wrote:


On 11/30/06, Yonik Seeley [EMAIL PROTECTED] wrote:

...Oh, and people should feel free to pilfer anything that might be
useful my last presentation


Is anyone submitting an intro to Solr talk?

I'm planning to submit a case study of my current project (how to graft
Solr on existing CMSes, preparing documents, ajax frontends, etc.),
but I think we should have an introductory talk as well.

-Bertrand




Re: Schema Parsing Failed, fix?

2006-12-23 Thread Chris Hostetter

: : Node.ATTRIBUTE_NODE case so it is treated the same as TEXT_NODE and it
: : works for resin and the tests pass.
:
: Hmmm... yeah, this seems to be a mistake in the DOM-Level-3-Core
: description of what getText is supposed to do ... it says that for
: ATTRIBUTE_NODE you should concat all of the children -- but how would an
: ATTRIBUTE ever have children?

Did some more reading ... according to DOM-Level-3-Core, an Attr's allowed
children are Text and EntityReference.

Xerces2-j's NodeImpl.getTextContent duplicates the table from the
Level-3-Core docs (which is also in the Java 1.5 javadocs for
org.w3c.dom.Node.getTextContent()) with the notable exception that they
move ATTRIBUTE_NODE down into the second row (indicating nodeValue should
be used instead of concatenating the children) ... the impl backs this up
(AttrImpl inherits getTextContent from NodeImpl, which by default returns
this.getNodeValue())

http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/dom/NodeImpl.html#getTextContent()
http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Node.html#getTextContent()
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/AttrImpl.java?view=markup
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/NodeImpl.java?view=markup

Fortunately, the DOM Spec says that accessing the Attr.nodeValue is
defined to be Attr.value, which is documented as...

On retrieval, the value of the attribute is returned as a string.
Character and general entity references are replaced with their values.

...so even if someone out there is actually obeying the spec about
giving Attrs child nodes, we should still be safe using getNodeValue in
the Node.ATTRIBUTE_NODE case since the spec says that needs to work too.
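
A minimal sketch of the kind of text-extraction helper being discussed, assuming a recursive walk over the node types in the DOM-Level-3-Core table. The class and method names (DomText, getText, appendText) are illustrative and are not taken from the Solr source or from the attached patch. The only point is that ATTRIBUTE_NODE is handled in the same branch as TEXT_NODE, via getNodeValue().

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the actual Solr code.
final class DomText {

    /** Returns the text of a node, using nodeValue for attributes. */
    static String getText(Node nd) {
        StringBuilder sb = new StringBuilder();
        appendText(nd, sb);
        return sb.toString();
    }

    private static void appendText(Node nd, StringBuilder sb) {
        switch (nd.getNodeType()) {
            case Node.ELEMENT_NODE:
            case Node.ENTITY_NODE:
            case Node.ENTITY_REFERENCE_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE: {
                // Concatenate the children, skipping comments and
                // processing instructions.
                NodeList children = nd.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node child = children.item(i);
                    short type = child.getNodeType();
                    if (type != Node.COMMENT_NODE
                            && type != Node.PROCESSING_INSTRUCTION_NODE) {
                        appendText(child, sb);
                    }
                }
                break;
            }
            case Node.ATTRIBUTE_NODE:   // treated like TEXT_NODE: use nodeValue
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
            case Node.COMMENT_NODE:
            case Node.PROCESSING_INSTRUCTION_NODE:
                sb.append(nd.getNodeValue());
                break;
            default:
                // DOCUMENT, DOCUMENT_TYPE and NOTATION have no text content;
                // this sketch appends nothing instead of returning null.
                break;
        }
    }
}
```

Because Attr.nodeValue is defined as Attr.value, this branch should behave the same whether or not a given parser gives its Attr nodes Text children.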

-Hoss



[jira] Assigned: (SOLR-92) XML parsing error with resin-3.0.21

2006-12-23 Thread Hoss Man (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-92?page=all ]

Hoss Man reassigned SOLR-92:


Assignee: Hoss Man

should have put this in the bug instead of email...

http://www.nabble.com/Schema-Parsing-Failed%2C-fix--tf2868892.html#a8038207



: : Node.ATTRIBUTE_NODE case so it is treated the same as TEXT_NODE and it
: : works for resin and the tests pass.
:
: Hmmm... yeah, this seems to be a mistake in the DOM-Level-3-Core
: description of what getText is supposed to do ... it says that for
: ATTRIBUTE_NODE you should concat all of the children -- but how would an
: ATTRIBUTE ever have children?

Did some more reading ... according to DOM-Level-3-Core, an Attr's allowed
children are Text and EntityReference.

Xerces2-j's NodeImpl.getTextContent duplicates the table from the
Level-3-Core docs (which is also in the Java 1.5 javadocs for
org.w3c.dom.Node.getTextContent()) with the notable exception that they
move ATTRIBUTE_NODE down into the second row (indicating nodeValue should
be used instead of concatenating the children) ... the impl backs this up
(AttrImpl inherits getTextContent from NodeImpl, which by default returns
this.getNodeValue())

http://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/dom/NodeImpl.html#getTextContent()
http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/Node.html#getTextContent()
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/AttrImpl.java?view=markup
http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/dom/NodeImpl.java?view=markup

Fortunately, the DOM Spec says that accessing the Attr.nodeValue is
defined to be Attr.value, which is documented as...

On retrieval, the value of the attribute is returned as a string.
Character and general entity references are replaced with their values.

...so even if someone out there is actually obeying the spec about
giving Attrs child nodes, we should still be safe using getNodeValue in
the Node.ATTRIBUTE_NODE case since the spec says that needs to work too.

--

...i'll commit this change along with some more comments explaining it

 XML parsing error with resin-3.0.21
 ---

 Key: SOLR-92
 URL: http://issues.apache.org/jira/browse/SOLR-92
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.2
 Environment: running resin-3.0.21
Reporter: Ryan McKinley
 Assigned To: Hoss Man
Priority: Minor
 Attachments: resinXmlParser.patch


 When the resin XML parser starts, it gets the following error trying to parse
 the config file:
 [00:25:35.025] Caused by: java.lang.NumberFormatException: empty String
 [00:25:35.025]  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
 [00:25:35.025]  at java.lang.Float.parseFloat(Float.java:394)
 [00:25:35.025]  at org.apache.solr.core.Config.getFloat(Config.java:174)
 [00:25:35.025]  at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:273)
 see: http://www.mail-archive.com/solr-dev@lucene.apache.org/msg01852.html





Re: filter input from multiple fields

2006-12-23 Thread Chris Hostetter

: Imagine I have a form like
: <form action="select/" method="get">
...
: <input type="text" name="startDate"/>
..
: <input type="text" name="endDate"/>

: Using the StandardRequestHandler without prior processing would mean
: that startDate and endDate are ignored, since they are not
: within the query string and are not standard Solr params that get
: processed.

Typically it's not recommended to have your front end users/clients
hitting Solr directly as part of an HTML form submit ... the more
conventional way to think of it is that Solr is a backend service, which
your application can talk to over HTTP.  If you were dealing with a
database, you wouldn't expect that you could generate an HTML form for
your clients and then have them submit that form in some way that resulted
in their browser using JDBC (or ODBC) to communicate directly with your
database.  Their client would communicate with your app, which would
validate their input, impose some security checks on the input, and then
execute the underlying query against your database.  Working with Solr
should be very similar: it just so happens that instead of using JDBC or
some other binary protocol, Solr uses HTTP, and you *can* talk to it
directly from a web browser, but that's really more of a debugging feature
than anything else.
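
A rough sketch of the "app in the middle" idea, assuming the application receives startDate and endDate as yyyy-MM-dd strings from the form. The class name, the field name "date" and the base URL in the usage note are placeholders invented for illustration. Only the general shape (validate first, then build the request to Solr) comes from the discussion above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical application-side helper; not part of Solr.
final class DateRangeSearch {

    // Assumed field name; use whatever date field your schema defines.
    static final String DATE_FIELD = "date";

    /** Validates the two form inputs and returns a range filter query. */
    static String dateFilter(String startDate, String endDate) {
        LocalDate start = parseDay("startDate", startDate);
        LocalDate end = parseDay("endDate", endDate);
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("endDate is before startDate");
        }
        return DATE_FIELD + ":[" + start + "T00:00:00Z TO " + end + "T23:59:59Z]";
    }

    /** Builds the URL the application (not the browser) sends to Solr. */
    static String selectUrl(String baseUrl, String userQuery, String filter) {
        return baseUrl + "?q=" + encode(userQuery) + "&fq=" + encode(filter);
    }

    private static LocalDate parseDay(String name, String value) {
        if (value == null) {
            throw new IllegalArgumentException(name + " is missing");
        }
        try {
            return LocalDate.parse(value.trim()); // strict ISO yyyy-MM-dd
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(name + " is not a valid date", e);
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For example, selectUrl("http://localhost:8983/solr/select", "solr", dateFilter("2006-01-01", "2006-12-31")) produces a request the app can send on the user's behalf, and malformed dates are rejected before anything reaches Solr.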



-Hoss



Re: filter input from multiple fields

2006-12-23 Thread Chris Hostetter

: I did a small hack and it works like a charm without the above mentioned
: handler. I only activated variable substitution for the FQ for testing
: if you think that is a nice feature I can activate it for the rest.

As i said in my other reply ... i think you should reconsider the approach
you are taking towards your end goal -- but in general, this idea of allowing
variable substitution in the lucene query params seems pretty slick to me
... a more general solution might be to modify the SolrQueryParser
directly to have a new void setParamVariables(SolrParams p) method.  if
it's called (with non-null input), then any string that SolrQueryParser
instance is asked to parse would first be preprocessed, looking for the ${}
pattern and pulling the values out of the SolrParams instance.

request handlers could then either pass their main params (if they wanted
to allow kitchen sink param substitution) or if they want to be more
robust (ie: Standard and DisMax), they could do what you describe: have a
configured list of param names that would be used to construct a new
instance of SolrParams explicitly for the SolrQueryParser -- but i would
think that would be a good use for a new separate init param in the
solrconfig, it's not the kind of thing you'd ever want to let the client
specify.
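
A minimal sketch of the preprocessing step, assuming a plain Map stands in for SolrParams and that unknown variables are left untouched. The class and method names (ParamSubstitution, substitute, restrict) are made up for illustration. Nothing here is an existing Solr API, and the proposed setParamVariables method is not implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a Map<String, String> stands in for SolrParams.
final class ParamSubstitution {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} in the query with the matching param value. */
    static String substitute(String query, Map<String, String> params) {
        if (params == null) {
            return query; // substitution not enabled
        }
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            // Unknown variables are left exactly as written.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Keeps only the params whose names appear in a configured list. */
    static Map<String, String> restrict(Map<String, String> params,
                                        Set<String> allowedNames) {
        Map<String, String> out = new HashMap<>();
        for (String name : allowedNames) {
            String value = params.get(name);
            if (value != null) {
                out.put(name, value);
            }
        }
        return out;
    }
}
```

The "kitchen sink" variant passes the request params straight to substitute. The more robust variant first calls restrict with the list of names read from the config, so a client cannot pull arbitrary params into the query.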

The reason this really seems cool to me is because the format/params
passing could work in either order: the format could be specified in the
config with params coming from the client, or the config could list a big
long list of constant params that the client could then use however they
want by specifying a format that used them.



-Hoss



Re: [jira] Commented: (SOLR-81) Add Query Spellchecker functionality

2006-12-23 Thread Chris Hostetter

: Yeah, I've used the Lucene-based spellchecker before, I just never had
: to hook it up with Solr.  At this point I'm not interested in the fancy
: stuff (cache, RAMDir...), I just want to figure out how to configure it
: via schema.xml...

But the crux of the issue is that if you are maintaining a second index
inside your base Solr installation for the purposes of the Spellchecker
class, then you don't want or need to configure it in schema.xml -- it
lives outside the schema space.

I pointed this out the last time spellchecking came up: there are two
extremely different approaches involved when you talk about implementing
a spelling/suggestion service with Solr...

In the first approach, the main Solr index *is* the suggestion index ...
each Document represents a suggested word, with one stored field telling
you what the word is, and indexed fields containing the ngrams.  you could
populate this index from any initial source: a dictionary, logs of popular
query terms, or a dump of all terms in your corpus.  At query time, your
application would query this index separately from querying your main
Solr index containing your domain specific data.
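
To make the "indexed fields containing the ngrams" part concrete, here is a small sketch of splitting a word into character n-grams. The class and method names are illustrative only. The gram sizes and field layout that the Lucene SpellChecker uses internally are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not the Lucene SpellChecker implementation.
final class NGrams {

    /** Returns every contiguous substring of length n, in order. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }
}
```

In the first approach, each suggested word would be stored once, and the output of something like ngrams(word, 2) and ngrams(word, 3) would go into indexed fields, so a misspelled query that shares most of its grams with a stored word can still match it.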

The second approach is to have the spelling/suggestion index live inside
of your Solr index side by side with your main domain specific index, so
your Request Handler can talk to it directly, and it can be populated
directly using the terms in your corpus -- this sounds like the
approach you are taking, but in this approach there is no need for your
schema.xml to know anything about the index .. just use the SpellChecker
class as is: construct it with an empty RAMDirectory and call
indexDictionary on a LuceneDictionary pointed at your main Solr index.
The only code you really need to write is something to run clearIndex and
indexDictionary as a newSearcher hook  (the easiest way probably being to
hang your SpellChecker instance off of a single element Solr cache and
write a Regenerator)

But like i said: you don't need to worry about making the schema know
about your ngrams -- you do that if you're going for the first approach.



-Hoss