Re: [Solr Wiki] Update of LukeRequestHandler by ryan

2007-04-28 Thread Ryan McKinley

Yonik Seeley wrote:

A really pedantic, super minor comment, but
should docID be docId instead, or are my aesthetics just off?



For consistency, you are right.  I keep getting myself into trouble 
because I like ID better than Id...  this taste often disagrees with 
some of the automagic bean getters/setters.


Changed in rev 533304.


Re: Do we agree on our RTC way of working? (was: Welcome Ryan McKinley!)

2007-04-28 Thread Bertrand Delacretaz

On 4/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:

<snip-lotsa-good-stuff/>


...My *personal* philosophy is probably more permissive than most:..


Thanks for sharing this, you're totally right that a half-baked patch
is better than no patch at all, and that there are different stages
which make sense in contributions.

Hard rules wouldn't work, but I'm glad we've had this discussion (and
I'll go back to my corner now ;-)

Also, thanks Hoss for creating
http://wiki.apache.org/solr/CommitPolicy, I think it's really good to
have this.

-Bertrand


Re: solr release planning for 1.2

2007-04-28 Thread Ryan McKinley

Yonik Seeley wrote:

On 4/5/07, Ryan McKinley [EMAIL PROTECTED] wrote:
 I'm certainly on board with adding a requestHandler mapping for /update,
 but I'm not sure how I feel about changing it under the covers ...

I'm suggesting we keep /update mapped to SolrUpdateServlet in web.xml, 
but map:


  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >


+1

I am not sure what we should do with the DispatchFilter handle-select 
parameter:

<init-param>
  <param-name>handle-select</param-name>
  <param-value>true</param-value>
</init-param>


Why do we need this parameter?  I thought that /select through
DispatchFilter would be backward compatible with the servlet's current
handling?  If that's the case, just have dispatch handle it and be
done with it.



Since writing this, I added SOLR-204 - this lets you configure if the 
DispatchFilter will handle select in solrconfig.xml rather than web.xml


If the configuration is in solrconfig.xml, we can set the example to use 
the dispatcher but still leave the option of the 'old' style servlet if 
that is desired.  The only real difference between them is how errors 
are returned.  The dispatcher calls res.sendError( code, msg ) while the 
servlet writes them out directly (causing them to be hidden by IE/FF).


SOLR-204 removes the init-param


move UpdateParams

2007-04-28 Thread Ryan McKinley


I'd like to move UpdateParams from o.a.s.handler to o.a.s.util

The other classes like it are in .util

objections?


[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select

2007-04-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492505
 ] 

Yonik Seeley commented on SOLR-204:
---

I wanted to try this out to see what sendError() output looks like, but the 
patch isn't applying cleanly.

$ patch -p0 < c:/dl/SOLR-204*
(Stripping trailing CRs from patch.)
patching file src/test/test-files/solr/conf/solrconfig.xml
(Stripping trailing CRs from patch.)
patching file src/webapp/WEB-INF/web.xml
(Stripping trailing CRs from patch.)
patching file src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java
Hunk #1 FAILED at 56.
1 out of 1 hunk FAILED -- saving rejects to file src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java.rej
(Stripping trailing CRs from patch.)
patching file src/webapp/src/org/apache/solr/servlet/SolrRequestParsers.java
(Stripping trailing CRs from patch.)
patching file example/solr/conf/solrconfig.xml
Hunk #1 succeeded at 231 (offset 8 lines).

 Let solrconfig.xml configure the SolrDispatchFilter to handle /select
 -

 Key: SOLR-204
 URL: https://issues.apache.org/jira/browse/SOLR-204
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
 Attachments: SOLR-204-HandleSelect.patch, SOLR-204-HandleSelect.patch


 The major reason to make everything use the SolrDispatchFilter is that we 
 would have consistent error handling.  Currently, 
 SolrServlet spits back errors using:
  PrintWriter writer = response.getWriter();
  writer.write(msg);
 and the SolrDispatchFilter spits them back using:
  res.sendError( code, ex.getMessage() );
 Using sendError lets the servlet container format the code so it shows up 
 ok in a browser.  Without it, you may have to view source to see the error.
 Additionally, SolrDispatchFilter is more discerning about including a stack 
 trace.  It only includes a stack trace for a 500 or an unknown response code.
 Eventually, the error should probably be formatted in the requested format - 
 SOLR-141.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: solr release planning for 1.2

2007-04-28 Thread Yonik Seeley

On 4/28/07, Ryan McKinley [EMAIL PROTECTED] wrote:

If the configuration is in solrconfig.xml, we can set the example to use
the dispatcher but still leave the option of the 'old' style servlet if
that is desired.  The only real difference between them is how errors
are returned.  The dispatcher calls res.sendError( code, msg ) while the
servlet writes them out directly (causing them to be hidden by IE/FF)


I think only the body of the response changes since the HTTP error
codes were already being used for /select

Since the body of the response was never really specified, and it
wasn't in a parseable format, I think using sendError() could be
considered backward compatible.

-Yonik


Admin interface configuration changes?

2007-04-28 Thread Ryan McKinley


As we move to arbitrary path based configuration, the JSP admin pages 
don't really know where things are and what to link to.


In looking into how to replace get-file.jsp and how to have an upload 
page for /update and /update/csv, I stumbled on the idea that we could 
have the list of options for what is displayed in the admin interface 
configured in solrconfig.xml.


Perhaps something like:

<admin>
  <defaultQuery>solr</defaultQuery>
  <header>
    <links name="solr">
      <link name="Schema"       path="/admin/file?file=schema.xml" />
      <link name="Config"       path="/admin/file?file=solrconfig.xml" />
      <link name="Analysis"     path="/admin/analysis.jsp" />
      <br/>
      <link name="Statistics"   path="/admin/stats.jsp" />
      <link name="Info"         path="/admin/registry.jsp" />
      <link name="Distribution" path="/admin/distributiondump.jsp" />
      <link name="Ping"         path="/admin/ping" />
      <link name="Logging"      path="/admin/logging.jsp" />
    </links>
    <links name="update">
      <link name="Update" path="/admin/?show=update.html" />
      <link name="CSV"    path="/admin/?show=updatecsv.html" />
    </links>
    <links name="App server">
      <link name="Properties"   path="/admin/properties" />
      <link name="Thread Dump"  path="/admin/threaddump.jsp" />
    </links>
  </header>
   ...

Thoughts?




[jira] Updated: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select

2007-04-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-204:
---

Attachment: SOLR-204-HandleSelect.patch

applies cleanly with trunk




[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select

2007-04-28 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492508
 ] 

Ryan McKinley commented on SOLR-204:


sendError lets the web app decide how to format the response body.  By default, 
containers typically return an HTML page with the status code and a footer 
naming the container (Jetty, Resin, etc.).

This is what you get to configure with:

  <error-page>
    <exception-type>java.lang.Exception</exception-type>
    <location>/error</location>
  </error-page>
  
  <error-page><error-code>404</error-code><location>/error</location></error-page>
etc




[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select

2007-04-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492511
 ] 

Yonik Seeley commented on SOLR-204:
---

OK cool, for something like an undefined field, it looks fine:
undefined field catdsfgsdg

But for something like a query parsing error, the only pointer to *what* the 
error is is in the stack trace, and you don't get that back.  You just get: 
Error parsing Lucene query

The logs show:
SEVERE: org.apache.lucene.queryParser.ParseException: Cannot parse 'foo:*': '*' 
or '?' not allowed as first character in WildcardQuery
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:149)
at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:94)
at org.apache.solr.request.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:85)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)

Hmmm, but I think this is an exception issue:

In QueryParsing.java:
} catch (ParseException e) {
  SolrCore.log(e);
  throw new SolrException(400,"Error parsing Lucene query",e);
}

should probably be something more like:
  throw new SolrException(400,"Query parsing error: " + e.getMessage() ,e);
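
The effect of that change can be sketched without Solr on the classpath. In the sketch below, StatusException is a hypothetical stand-in for SolrException, and parse() is a made-up parser that only rejects a leading wildcard; neither is the real Solr or Lucene code. The point is only that copying e.getMessage() into the outer message makes the 400 response self-explanatory even when no stack trace is returned.

```java
import java.text.ParseException;

// Hypothetical stand-in for SolrException: an unchecked exception with a status code.
final class StatusException extends RuntimeException {
    final int code;

    StatusException(int code, String message, Throwable cause) {
        super(message, cause);
        this.code = code;
    }
}

final class ParseErrorWrapping {
    // Made-up parser for illustration: rejects a term that starts with '*' or '?'.
    static String parse(String query) throws ParseException {
        int colon = query.indexOf(':');
        String term = colon >= 0 ? query.substring(colon + 1) : query;
        if (term.startsWith("*") || term.startsWith("?")) {
            throw new ParseException(
                "Cannot parse '" + query + "': '*' or '?' not allowed as first character",
                colon + 1);
        }
        return term;
    }

    // Wraps the parser failure in a 400 and copies the parser's message into ours,
    // so the client sees *what* was wrong without needing the stack trace.
    static String parseOrBadRequest(String query) {
        try {
            return parse(query);
        } catch (ParseException e) {
            throw new StatusException(400, "Query parsing error: " + e.getMessage(), e);
        }
    }
}
```

Calling ParseErrorWrapping.parseOrBadRequest("foo:*") throws a StatusException whose message starts with "Query parsing error: Cannot parse 'foo:*'", while the original exception stays available as the cause for the logs.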





[jira] Commented: (SOLR-204) Let solrconfig.xml configure the SolrDispatchFilter to handle /select

2007-04-28 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492512
 ] 

Ryan McKinley commented on SOLR-204:



 
 should probably be something more like:
   throw new SolrException(400,"Query parsing error: " + e.getMessage() ,e);
 

Yes, the other change is that errors for RequestDispatcher only print the stack 
trace if the code is >= 500; a 400 (bad request) assumes the message will 
contain a response that is useful to the user.  
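
As a rough illustration of that rule, here is a minimal sketch of the decision. It is not the SolrDispatchFilter code; the class and method names are hypothetical, and treating any code below 100 as "unknown" is an assumption made for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of Solr.
final class ErrorBodyPolicy {
    // Server errors (>= 500) and codes below the HTTP range (assumed to be
    // "unknown", e.g. the legacy status code 1) get a stack trace.
    static boolean includeStackTrace(int code) {
        return code >= 500 || code < 100;
    }

    // Builds the message that would be handed to sendError(code, msg).
    static String errorMessage(int code, Throwable ex) {
        if (!includeStackTrace(code)) {
            // 4xx: the message itself is expected to be useful to the user.
            return String.valueOf(ex.getMessage());
        }
        StringWriter trace = new StringWriter();
        PrintWriter out = new PrintWriter(trace);
        ex.printStackTrace(out);
        out.flush();
        return ex.getMessage() + "\n" + trace;
    }
}
```

With this policy a 400 for an undefined field returns only the message, while a 500 carries the message followed by the trace.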






Luke handler help

2007-04-28 Thread Ryan McKinley
I have a few things I'd like to check with the Luke handler. If you all 
could check some of the assumptions, that would be great.


* I want to print out the document frequency for a term in a given 
document.  Since that term shows up in the given document, I would think 
the document frequency must be >= 1.  I am using: reader.docFreq( t ) [line 
236]. The results seem reasonable, but *sometimes* it returns zero... is 
that possible?


* I want to return the lucene field flags for each field.  I run through 
all the field names with: 
reader.getFieldNames(IndexReader.FieldOption.ALL).  Is there a way to 
get any Fieldable for a given name?  IIUC, all terms with the same name 
will have the same flags.  I tried searching for a document with that 
field, it works, but only for stored fields.


* I just realized that I am only returning stored fields for 
getDocumentFieldsInfo() (it uses Document.getFields()).  How can I 
find *all* Fieldables for a given document?  I have tried following the 
luke source, but get a bit lost ;)


* Each field gets a boolean attribute cacheableFaceting -- this is true 
if the number of distinct terms is smaller than the filterCacheSize.  I 
get the filterCacheSize from: solrconfig.xml:query/filterCache/@size 
and get the distinctTerm count from counting up the termEnum.  Is this 
logic solid?  I know the cacheability changes if you are faceting 
multiple fields at once, but it's still nice to have a ballpark estimate 
without needing to know the internals.



thanks for any pointers
ryan


[jira] Updated: (SOLR-212) Embeddable class to call solr directly

2007-04-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-212:
---

Attachment: SOLR-212-DirectSolrConnection.patch

Adding dataDir to an optional constructor.

 Embeddable class to call solr directly
 --

 Key: SOLR-212
 URL: https://issues.apache.org/jira/browse/SOLR-212
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-212-DirectSolrConnection.patch, 
 SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch


 For some embedded applications, it is useful to call solr without running an 
 HTTP server.  This class mimics the behavior you would get if you sent the 
 request through an HTTP connection.  It is designed to work nicely (ie 
 simple) with JNI
 the main function is:
 public class DirectSolrConnection 
 {
   String request( String pathAndParams, String body ) throws Exception
   {
 ...
   }
 }
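
To make the shape of the pathAndParams argument concrete, here is a small sketch of how a string such as "/select?q=solr&rows=10" could be split into a handler path and decoded parameters. This is an assumption about the input format, written for illustration only; the class PathAndParams is hypothetical and is not taken from the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits "/path?a=1&b=2" into a path and a parameter map.
final class PathAndParams {
    final String path;
    // Keeps first-seen order; a repeated key keeps only its last value.
    final Map<String, String> params = new LinkedHashMap<>();

    PathAndParams(String pathAndParams) {
        int q = pathAndParams.indexOf('?');
        path = q < 0 ? pathAndParams : pathAndParams.substring(0, q);
        if (q < 0 || q == pathAndParams.length() - 1) {
            return; // no query string
        }
        for (String pair : pathAndParams.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For example, new PathAndParams("/select?q=ipod%3Bzzz+asc&rows=10") yields the path "/select" with q decoded to "ipod;zzz asc" and rows set to "10".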




Re: Luke handler help

2007-04-28 Thread Ryan McKinley

Yonik Seeley wrote:

On 4/28/07, Ryan McKinley [EMAIL PROTECTED] wrote:

I have a few things I'd like to check with the Luke handler. If you all
could check some of the assumptions, that would be great.

* I want to print out the document frequency for a term in a given
document.  Since that term shows up in the given document, I would think
the document frequency must be >= 1.  I am using: reader.docFreq( t ) [line
236]. The results seem reasonable, but *sometimes* it returns zero... is
that possible?


Is the field indexed?
Did you run the field through the analyzer to get the terms (to match
what's in the index)?
If both of those are true, it seems like the docFreq should always be
greater than 0.



aah, that makes sense - now that you mention it, I only see df=0 for 
non-indexed, stored fields.





In an inverted index, terms point to documents.   So you have to
traverse *all* of the terms of a field across all documents, and keep
track of when you run across the document you are interested in.  When
you do, then get the positions that the term appeared at, and keep
track of them.  After you have covered all the terms, you can put
everything in order.  There could be gaps (positionIncrement, stop
word removal, etc) and it's also possible for multiple tokens to
appear at the same position.

For a full-text field with many terms, and a large index, this could
take a *long* time.
It's probably very useful for debugging though.
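
The traversal described above can be sketched with a toy in-memory index. This is not Lucene code: ToyInvertedIndex is a hypothetical class that stores term -> document -> positions, and it does not model position gaps. It only shows why rebuilding one document means visiting every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index for illustration; real Lucene postings look different.
final class ToyInvertedIndex {
    // term -> (docId -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    void add(int docId, String... tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Visits *every* term, keeps those that point at docId, and orders the
    // result by position. Several terms may share one position.
    TreeMap<Integer, List<String>> reconstruct(int docId) {
        TreeMap<Integer, List<String>> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> e : postings.entrySet()) {
            List<Integer> positions = e.getValue().get(docId);
            if (positions == null) {
                continue; // this term does not occur in the document
            }
            for (int pos : positions) {
                byPosition.computeIfAbsent(pos, p -> new ArrayList<>()).add(e.getKey());
            }
        }
        return byPosition;
    }
}
```

The cost is proportional to the number of distinct terms in the index, not to the size of the document, which is why this is slow on a large full-text field.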



that must be why luke starts a new thread for 'reconstruct and edit'. 
For now, I will leave this out of the handler, and leave that open to 
someone with the need/time in the future.




* Each field gets a boolean attribute cacheableFaceting -- this is true
if the number of distinct terms is smaller than the filterCacheSize.  I
get the filterCacheSize from: solrconfig.xml:query/filterCache/@size
and get the distinctTerm count from counting up the termEnum.  Is this
logic solid?  I know the cacheability changes if you are faceting
multiple fields at once, but it's still nice to have a ballpark estimate
without needing to know the internals.


It could get trickier... I'm about to hack up a quick patch now that
will reduce memory usage by only using the filterCache  above a
certain df threshold.  It may increase or
decrease the faceting speed - TBD.

Also, other alternate faceting schemes are in the works (a month or two 
out).

I'd leave this attribute out and just report on the number of unique terms.


ok, that seems reasonable.



Some kind of histogram might be really nice though (how many terms
under varying df values):
 1=412  (412 terms have a df of 1)
 2=516  (516 terms have a df of 2)
 4=600
 8=650
16=670
32=680
64=683
128=685
256=686
11325=690  (the maxDf found)
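
One way such a histogram could be computed is sketched below. It assumes the buckets are cumulative (each value counts terms whose df is at or below the threshold, which is what the increasing numbers above suggest) and that thresholds are powers of two plus the maximum df. The class name DfHistogram and the int[] input are hypothetical; a real handler would collect the df values while walking the term enumeration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr or Lucene.
final class DfHistogram {
    // Returns threshold -> number of terms with 0 < df <= threshold.
    // Thresholds are 1, 2, 4, 8, ... below maxDf, followed by maxDf itself.
    static TreeMap<Integer, Integer> cumulative(int[] docFreqs) {
        TreeMap<Integer, Integer> histogram = new TreeMap<>();
        int maxDf = 0;
        for (int df : docFreqs) {
            maxDf = Math.max(maxDf, df);
        }
        if (maxDf == 0) {
            return histogram; // no indexed terms
        }
        List<Integer> thresholds = new ArrayList<>();
        for (int t = 1; t > 0 && t < maxDf; t <<= 1) { // t > 0 guards against overflow
            thresholds.add(t);
        }
        thresholds.add(maxDf);
        for (int threshold : thresholds) {
            int count = 0;
            for (int df : docFreqs) {
                if (df > 0 && df <= threshold) {
                    count++;
                }
            }
            histogram.put(threshold, count);
        }
        return histogram;
    }
}
```

For the df values 1, 1, 2, 3, 5, 9 this produces the buckets 1, 2, 4, 8 and 9 with counts 2, 3, 4, 5 and 6.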



I'll take a look at that



Remember that df is not updated when a document is marked for deletion
in Lucene.
So you can have a df of 2, do a search, and only come up with one document.



that would explain why I'm seeing df > 1 for the uniqueKey!



[jira] Commented: (SOLR-212) Embeddable class to call solr directly

2007-04-28 Thread Brian Whitman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492518
 ] 

Brian Whitman commented on SOLR-212:


Much love from user land on this one. I just successfully put solr in a C app 
without any webserver running using JNI.

After I clean up my JNI calling code I can post an example app here to show how 
it's done on the client side if anyone is interested?












Re: Luke handler help

2007-04-28 Thread Yonik Seeley

 In an inverted index, terms point to documents.   So you have to
 traverse *all* of the terms of a field across all documents, and keep
 track of when you run across the document you are interested in.  When
 you do, then get the positions that the term appeared at, and keep
 track of them.  After you have covered all the terms, you can put
 everything in order.  There could be gaps (positionIncrement, stop
 word removal, etc) and it's also possible for multiple tokens to
 appear at the same position.

 For a full-text field with many terms, and a large index, this could
 take a *long* time.
 It's probably very useful for debugging though.


I just realized that it's worse... if you specified a field, then you
only have to iterate the terms for that field.  If you want *all* of
the indexed, non-stored fields for a particular document, but don't
know what they are, there is no info to help you.  You need to iterate
over *all* terms in the index.

Luckily, there is a patch in the works in Lucene that will make
skipTo(myDoc) in TermDocs faster.  That should speed things up a
little.


 Remember that df is not updated when a document is marked for deletion
 in Lucene.
 So you can have a df of 2, do a search, and only come up with one document.


that would explain why I'm seeing df > 1 for the uniqueKey!


Yep, that's not likely to ever be fixed in Lucene.  Again, it's the
nature of the inverted index... given a particular docid, you really
have no clue what terms in the index point to that docid.

-Yonik


[jira] Commented: (SOLR-212) Embeddable class to call solr directly

2007-04-28 Thread Brian Whitman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492522
 ] 

Brian Whitman commented on SOLR-212:


Since the main use case of SOLR-212 is to embed it in client applications, we 
should be careful about logging. As of now SOLR-212 will spit stuff all over 
stderr.

I suggest putting this

System.setProperty("java.util.logging.config.file", 
instanceDir+"/conf/logging.properties");

near line 79 of DirectSolrConnection.java. That way, if a developer/user 
chooses, they can put a logging.properties file in conf and direct logging of 
Solr requests either to their own application logs or to a file. If the 
conf/logging.properties file does not exist, I believe the default 
logging.properties will be used (which is what happens now.)
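
A slightly more defensive variant of this suggestion is sketched below: it sets the property only when the file actually exists, so the JDK default configuration stays in effect otherwise. The class and method names are hypothetical and this is not the code in the patch. Note that java.util.logging reads this property when its LogManager initializes, so the call would need to happen before the first logger is created.

```java
import java.io.File;

// Hypothetical helper, not part of DirectSolrConnection.
final class EmbeddedLogging {
    // Points java.util.logging at <instanceDir>/conf/logging.properties if that
    // file exists. Returns true when the property was set, false when the file
    // is missing and the default logging configuration is left untouched.
    static boolean useInstanceLoggingConfig(String instanceDir) {
        File config = new File(instanceDir, "conf/logging.properties");
        if (!config.isFile()) {
            return false;
        }
        System.setProperty("java.util.logging.config.file", config.getPath());
        return true;
    }
}
```

An embedding application could call EmbeddedLogging.useInstanceLoggingConfig(instanceDir) once at startup and ignore the return value, or log a hint when it is false.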






[jira] Updated: (SOLR-181) Support for Required field Property

2007-04-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-181:
---

Attachment: solr-181-required-fields.patch

Finally got a chance to look at this.  It looks good.  I made a few 
modifications:

1. changed tabs to spaces
2. Added javadoc comments to make it clear that RequiredFields must contain all 
fieldsWithDefaultValues
3. The error now contains the document's uniqueKey
4. moved the test to o.a.s.schema
5. I added a non-final flag to SchemaField to say if the field is required.
6. Modified IndexSchema.java to set the uniqueKey as required *unless* it is 
specified as required="false" in the schema
7. Added required="true" to the example schema.xml 
8. Added required="false" to the test schema.xml (one test does not include it)

As a note to anyone else looking at the change log, Greg's patch also modifies 
AbstractSolrTestCase and TestHarness to be able to check what status is 
expected from checkUpdateU


I think this offers a good solution to the (mis)feature that you could have a 
null uniqueKey.  This patch lets you have a null uniqueKey, but you have to 
configure it.



 Support for Required field Property
 -

 Key: SOLR-181
 URL: https://issues.apache.org/jira/browse/SOLR-181
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Greg Ludington
Priority: Minor
 Attachments: solr-181-required-fields.patch, 
 solr-181-required-fields.patch


 In certain situations, it can be helpful to require that every document in 
 your index has a value for a given field.  While ideally the indexing client(s) 
 should be responsible enough to add all necessary fields, this patch allows 
 it to be enforced in the Solr schema, by adding a required property to a 
 field entry.  For example, with this in the schema:
 <field name="name" type="nametext" indexed="true" stored="true" 
  required="true"/>
 A request to index a document without a name field will result in this 
 response:
 <result status="1">org.apache.solr.core.SolrException: missing required 
 fields: name 
 (and then, of course, the stack trace)
 </result>
 The meat of this patch is that DocumentBuilder.getDoc() throws a 
 SolrException if not all required fields have values; this may not work well 
 as is with SOLR-139, Support updateable/modifiable documents, and may have to 
 be changed depending on that issue's final disposition.
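
The check itself can be sketched independently of Solr. The class below is hypothetical and only mirrors the behaviour described here: collect the required field names that have no value and fail with a message in the same "missing required fields: ..." form. It uses a plain Map as the document and IllegalArgumentException in place of SolrException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the DocumentBuilder code from the patch.
final class RequiredFieldCheck {
    // Returns the required field names that are absent or null in the document,
    // in the iteration order of the 'required' set.
    static List<String> missing(Set<String> required, Map<String, ?> doc) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (doc.get(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Throws if any required field has no value.
    static void validate(Set<String> required, Map<String, ?> doc) {
        List<String> missing = missing(required, doc);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "missing required fields: " + String.join(",", missing));
        }
    }
}
```

With required fields id and name, a document that only has an id fails with "missing required fields: name".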




[jira] Assigned: (SOLR-181) Support for Required field Property

2007-04-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-181:
--

Assignee: Ryan McKinley




[jira] Updated: (SOLR-212) Embeddable class to call solr directly

2007-04-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-212:
---

Attachment: SOLR-212-DirectSolrConnection.patch

Updated to take an (optional) logging path




[jira] Created: (SOLR-220) Solr returns HTTP status code=1 in some case

2007-04-28 Thread Koji Sekiguchi (JIRA)
Solr returns HTTP status code=1 in some case
--

 Key: SOLR-220
 URL: https://issues.apache.org/jira/browse/SOLR-220
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Koji Sekiguchi


If I request the following on solr example:

http://localhost:8080/solr/select?q=ipod%3Bzzz+ascversion=2.2start=0rows=10indent=on

I got an exception as I expected because zzz isn't undefined, but HTTP status 
code is 1. I expected 400 in this case.
The reason of this is because IndexSchema.getField() method throws 
SolrException(1,) and QueryParsing.parseSort() doesn't catch it:

// getField could throw an exception if the name isn't found
SchemaField f = schema.getField(part);  // === makes HTTP status code=1
if (f == null || !f.indexed()){
  throw new SolrException( 400, can not sort on unindexed field: 
+part );
}

There seems to be a couple of ways to solve this problem:

1. IndexSchema.getField() method throws SolrException(400,)
2. IndexSchema.getField() method doesn't throw the exception but returns null
3. The caller catches the exception and re-throws SolrException(400,)
4. The caller catches the exception and re-throws SolrException(400,,cause) 
that wraps the cause exception

I think either #3 or #4 would be acceptable. The attached patch implements #3 
for sorting on an undefined field.
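
To make option #4 concrete, here is a rough, self-contained sketch. It does not use the real Solr classes: StatusException, FIELDS, getField() and checkSortField() are hypothetical stand-ins for SolrException, the schema, IndexSchema.getField() and the check inside QueryParsing.parseSort(). The attached patch may well look different.

```java
import java.util.HashMap;
import java.util.Map;

public class SortFieldLookupSketch {

    /** Hypothetical stand-in for SolrException: carries an HTTP status code. */
    public static class StatusException extends RuntimeException {
        public final int code;

        public StatusException(int code, String msg, Throwable cause) {
            super(msg, cause);
            this.code = code;
        }
    }

    /** Hypothetical stand-in for the schema: field name -> indexed flag. */
    static final Map<String, Boolean> FIELDS = new HashMap<String, Boolean>();
    static {
        FIELDS.put("price", true);
        FIELDS.put("payload", false);
    }

    /** Mimics the behaviour described above: an unknown name fails with code 1. */
    static boolean getField(String name) {
        Boolean indexed = FIELDS.get(name);
        if (indexed == null) {
            throw new StatusException(1, "undefined field " + name, null);
        }
        return indexed;
    }

    /** Option #4: catch the lookup failure and re-throw a 400 that wraps the cause. */
    public static void checkSortField(String part) {
        boolean indexed;
        try {
            indexed = getField(part);
        } catch (StatusException e) {
            throw new StatusException(400, "can not sort on undefined field: " + part, e);
        }
        if (!indexed) {
            throw new StatusException(400, "can not sort on unindexed field: " + part, null);
        }
    }

    public static void main(String[] args) {
        try {
            checkSortField("zzz");
        } catch (StatusException e) {
            System.out.println(e.code + " " + e.getMessage());
        }
    }
}
```

Option #3 is the same thing with the cause argument dropped; keeping the cause preserves the original stack trace for debugging.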

Other than QueryParsing.parseSort(), IndexSchema.getField() is called by the 
following classes/methods:

- CSVLoader.prepareFields()
- JSONWriter.writeDoc()
- SimpleFacets.getTermCounts()
- QueryParsing.parseValSource()

I'm not sure whether these methods require the same patch. Any thoughts?

regards,





[jira] Updated: (SOLR-220) Solr returns HTTP status code=1 in some case

2007-04-28 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-220:


Attachment: QueryParsing.patch

the patch for sort on undefined field




[jira] Commented: (SOLR-220) Solr returns HTTP status code=1 in some case

2007-04-28 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492531
 ] 

Ryan McKinley commented on SOLR-220:


I just checked in a much smaller patch that at least won't throw a status 
code=1:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/schema/IndexSchema.java?r1=533449&r2=533448&pathrev=533449

We should probably use your patch so that it has a nice context-specific error, 
rather than the general "undefined field" one.

As an aside, SOLR-204 will make the request dispatcher the default /select 
handler.  This catches invalid error codes and returns a 500.
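
As a rough illustration of that last point, a dispatcher could normalize the code before sending it. This is only a sketch: the class name, the method name and the exact range check are assumptions, not the SOLR-204 code.

```java
// Hypothetical helper, not taken from SOLR-204.
public class StatusCodeSketch {

    /** Returns the code unchanged if it looks like an HTTP status, otherwise 500. */
    public static int normalize(int code) {
        return (code >= 100 && code < 600) ? code : 500;
    }

    public static void main(String[] args) {
        System.out.println(normalize(1));
        System.out.println(normalize(400));
    }
}
```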

thanks






[jira] Commented: (SOLR-181) Support for Required field Property

2007-04-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492532
 ] 

Yonik Seeley commented on SOLR-181:
---

Haven't looked at the code,  but the description looks fine.

+1

 Support for Required field Property
 -

 Key: SOLR-181
 URL: https://issues.apache.org/jira/browse/SOLR-181
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Greg Ludington
 Assigned To: Ryan McKinley
Priority: Minor
 Attachments: solr-181-required-fields.patch, 
 solr-181-required-fields.patch


 In certain situations, it can be helpful to require that every document in 
 your index have a value for a given field.  While ideally the indexing 
 client(s) should be responsible enough to add all necessary fields, this patch 
 allows it to be enforced in the Solr schema, by adding a required property to 
 a field entry.  For example, with this in the schema:
 <field name="name" type="nametext" indexed="true" stored="true" 
  required="true"/>
 A request to index a document without a name field will result in this 
 response:
 <result status="1">org.apache.solr.core.SolrException: missing required 
 fields: name 
 (and then, of course, the stack trace)
 </result>
 The meat of this patch is that DocumentBuilder.getDoc() throws a 
 SolrException if not all required fields have values; this may not work well 
 as is with SOLR-139, "Support updateable/modifiable documents", and may have 
 to be changed depending on that issue's final disposition.
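
The check itself can be sketched in a few lines. This is not the patch's code: RequiredFieldsSketch, missingRequired() and validate() are hypothetical names, a plain Map stands in for the document, and IllegalArgumentException stands in for SolrException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the SOLR-181 patch.
public class RequiredFieldsSketch {

    /** Names of required fields with no value in the document, in sorted order. */
    public static List<String> missingRequired(Set<String> required, Map<String, ?> doc) {
        List<String> missing = new ArrayList<String>();
        for (String name : new TreeSet<String>(required)) {
            if (doc.get(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Throws if any required field is missing, listing all of them at once. */
    public static void validate(Set<String> required, Map<String, ?> doc) {
        List<String> missing = missingRequired(required, doc);
        if (!missing.isEmpty()) {
            StringBuilder msg = new StringBuilder("missing required fields:");
            for (String name : missing) {
                msg.append(' ').append(name);
            }
            throw new IllegalArgumentException(msg.toString());
        }
    }

    public static void main(String[] args) {
        Set<String> required = new HashSet<String>();
        required.add("name");
        Map<String, Object> doc = new HashMap<String, Object>();
        doc.put("id", "1");
        try {
            validate(required, doc);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```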




[jira] Commented: (SOLR-212) Embeddable class to call solr directly

2007-04-28 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492534
 ] 

Otis Gospodnetic commented on SOLR-212:
---

Brian: interested!


 Embeddable class to call solr directly
 --

 Key: SOLR-212
 URL: https://issues.apache.org/jira/browse/SOLR-212
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
 Assigned To: Ryan McKinley
Priority: Minor
 Attachments: SOLR-212-DirectSolrConnection.patch, 
 SOLR-212-DirectSolrConnection.patch, SOLR-212-DirectSolrConnection.patch, 
 SOLR-212-DirectSolrConnection.patch


 For some embedded applications, it is useful to call solr without running an 
 HTTP server.  This class mimics the behavior you would get if you sent the 
 request through an HTTP connection.  It is designed to work nicely (i.e. 
 simply) with JNI.
 The main function is:
 public class DirectSolrConnection 
 {
   String request( String pathAndParams, String body ) throws Exception
   {
 ...
   }
 }




[jira] Updated: (SOLR-221) faceting memory and performance improvement

2007-04-28 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-221:
--

Attachment: facet.patch

 faceting memory and performance improvement
 ---

 Key: SOLR-221
 URL: https://issues.apache.org/jira/browse/SOLR-221
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Assigned To: Yonik Seeley
 Attachments: facet.patch


 1) compare minimum count currently needed to the term df and avoid 
 unnecessary intersection count
 2) set a minimum term df in order to use the filterCache, otherwise iterate 
 over TermDocs
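
The two ideas can be sketched together as follows. This is a toy model rather than the code in facet.patch: Term, FacetPruningSketch and topTerms() are hypothetical names, a HashMap of BitSets stands in for the filterCache, and an int array stands in for TermDocs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Toy model only; not the code in facet.patch.
public class FacetPruningSketch {

    /** A facet term and its postings (ids of the documents containing it). */
    public static final class Term {
        final String text;
        final int[] docs;

        public Term(String text, int... docs) {
            this.text = text;
            this.docs = docs;
        }

        int df() {
            return docs.length;
        }
    }

    // Hypothetical stand-in for the filterCache: term text -> set of matching docs.
    private final Map<String, BitSet> filterCache = new HashMap<String, BitSet>();

    /** Number of intersection counts actually computed. */
    public int intersections = 0;

    private int count(Term t, BitSet base, int minDfFilterCache) {
        intersections++;
        if (t.df() >= minDfFilterCache) {
            // Idea 2: frequent terms go through the cache.
            BitSet cached = filterCache.get(t.text);
            if (cached == null) {
                cached = new BitSet();
                for (int d : t.docs) {
                    cached.set(d);
                }
                filterCache.put(t.text, cached);
            }
            BitSet both = (BitSet) cached.clone();
            both.and(base);
            return both.cardinality();
        }
        // Rare terms: walk the postings directly, caching nothing.
        int n = 0;
        for (int d : t.docs) {
            if (base.get(d)) {
                n++;
            }
        }
        return n;
    }

    /** Top `limit` terms by count within `base`, as "term=count" strings. */
    public List<String> topTerms(List<Term> terms, BitSet base, int limit, int minDfFilterCache) {
        // Min-heap of {count, term index}; the head is the smallest count kept so far.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(new Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return Integer.compare(a[0], b[0]);
            }
        });
        for (int i = 0; i < terms.size(); i++) {
            Term t = terms.get(i);
            int needed = heap.size() < limit ? 0 : heap.peek()[0];
            if (t.df() <= needed) {
                continue; // Idea 1: the count can never exceed df, so skip the intersection.
            }
            int c = count(t, base, minDfFilterCache);
            if (c <= needed) {
                continue;
            }
            if (heap.size() == limit) {
                heap.poll();
            }
            heap.add(new int[] {c, i});
        }
        List<int[]> best = new ArrayList<int[]>(heap);
        best.sort(new Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return a[0] != b[0] ? Integer.compare(b[0], a[0]) : Integer.compare(a[1], b[1]);
            }
        });
        List<String> out = new ArrayList<String>();
        for (int[] e : best) {
            out.add(terms.get(e[1]).text + "=" + e[0]);
        }
        return out;
    }
}
```

The pruning in idea 1 becomes more effective as the heap fills with high counts, which is why it should help most when the base query matches many documents.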




[jira] Commented: (SOLR-221) faceting memory and performance improvement

2007-04-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492543
 ] 

Yonik Seeley commented on SOLR-221:
---

The results are slightly surprising.

I made up an index in which each document contains 4 random numbers between 1 
and 500,000.
This is not the distribution one would expect to see in a real index, but we 
can still learn much.

The synthetic index:
 maxDoc=500,000
 numDocs=393,566
 number of segments = 5
 number of unique facet terms = 490903
 filterCache max size = 1,000,000 entries (more than enough)
 JVM=1.5.0_09 -server -Xmx200M
 System=WinXP, 3GHz P4, hyperthreaded, 1GB dual channel RAM
 facet type = facet.field, facet.sort=true, facet.limit=10
 maximum df of any term = 15
 warming times were not included... queries were run many times and the lowest 
time was recorded.

Number of documents that match test base queries (for example, base query #1 
matches 175K docs):
1) 175000
2) 43000
3) 8682
4) 2179
5) 422
6) 1

WITHOUT PATCH (milliseconds to facet each base query):
1578, 1578, 1547, 1485, 1484, 1422

WITH PATCH (min df comparison w/ term df,  minDfFilterCache=0) (all field cache)
 984,  1203, 1391, 1437, 1484, 1420

WITH PATCH (min df comp, minDfFilterCache=30)  (no fieldCache at all)
1406, 2344, 3125, 3015, 3172, 3172

CONCLUSION1: min df comparison increases faceting speed 60% when the base query 
matches many documents.  With a real term distribution, this could be even 
greater.

CONCLUSION2: opting to not use the fieldCache for smaller df terms can save a 
lot of memory, but it hurts performance up to 200% for our non-optimized index.
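
For anyone who wants to check the arithmetic behind the first two conclusions, the ratios can be recomputed from the three timing rows above. The class and array names are only labels chosen for this sketch.

```java
public class FacetSpeedupSketch {
    // Timings in milliseconds, copied from the three rows above (base queries 1..6).
    static final int[] WITHOUT_PATCH = {1578, 1578, 1547, 1485, 1484, 1422};
    static final int[] MIN_DF_0 = {984, 1203, 1391, 1437, 1484, 1420};
    static final int[] MIN_DF_30 = {1406, 2344, 3125, 3015, 3172, 3172};

    /** How many times longer `slower` takes than `faster`, rounded to two decimals. */
    public static double ratio(int slower, int faster) {
        return Math.round(100.0 * slower / faster) / 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < WITHOUT_PATCH.length; i++) {
            System.out.println("query " + (i + 1)
                + ": unpatched/minDf=0 = " + ratio(WITHOUT_PATCH[i], MIN_DF_0[i])
                + ", minDf=30/minDf=0 = " + ratio(MIN_DF_30[i], MIN_DF_0[i]));
        }
    }
}
```

For base query #1 the first ratio is 1578/984, about 1.6, which is where the 60% figure comes from. The second ratio peaks at 3125/1391, about 2.25, for base query #3.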

CONCLUSION3: using the field cache less can significantly speed up warming time 
(times not shown, but a full warming of the fieldCache took 33 sec)

=== now the same index, but optimized ===
WITH PATCH (optimized, min df comparison w/ term df,  minDfFilterCache=0) (all 
field cache)
 172,  312,  485,  578,  610,  656

WITH PATCH (optimized, min df comp, minDfFilterCache=30)  (no fieldCache at all)
 265,  344,  422,  468,  500,  484  

CONCLUSION4: An optimized index increased performance 200-500%

CONCLUSION5: The all-fieldcache option was significantly faster on the 
optimized index, and more accurate dfs (no deleted documents to inflate the 
term df values) probably cannot explain all of the gain. That suggests that 
just iterating over the terms is *much* faster in an optimized index (a 
potential Lucene area to look into)

