SOLR-58

2006-11-08 Thread Otis Gospodnetic
Hi,

If anyone has any comments on http://issues.apache.org/jira/browse/SOLR-58 
(admin pages XMLized), please post them.

I'd like to try writing some XSLs to convert that XML to HTML, so I need some 
additional eyes on that XML output.  I've never written a single line of XSL, so it 
will take me a while, and I'd love to get SOLR-58 in by the end of the week or 
so.

Thanks,
Otis





[jira] Resolved: (SOLR-2) adding multiple docs at the same time does not produce a well formed response

2006-11-08 Thread Mike Klaas (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-2?page=all ]

Mike Klaas resolved SOLR-2.
---

Resolution: Fixed
  Assignee: Mike Klaas

Fixed in SOLR-65

> adding multiple docs at the same time does not produce a well formed response
> -
>
> Key: SOLR-2
> URL: http://issues.apache.org/jira/browse/SOLR-2
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Assigned To: Mike Klaas
>
> i just realized that when adding multiple docs as part of a single /update, 
> the response isn't legal XML.
> this is easy to see in the example app by running "sh post.sh hd.xml" you 
> get...
> 
> ...with no wrapping root tag.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Resolved: (SOLR-54) Invalid XML response returned when adding a document with a field not declared in solrconfig.xml

2006-11-08 Thread Mike Klaas (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-54?page=all ]

Mike Klaas resolved SOLR-54.


Resolution: Fixed
  Assignee: Mike Klaas

Fixed as part of SOLR-65

> Invalid XML response returned when adding a  document with a field not 
> declared in solrconfig.xml
> -
>
> Key: SOLR-54
> URL: http://issues.apache.org/jira/browse/SOLR-54
> Project: Solr
>  Issue Type: Bug
> Environment: Tomcat 5.17 / Windows XP
>Reporter: Przemyslaw Brzozowski
> Assigned To: Mike Klaas
>
> Below we have a document with three fields. One of them, 'aaa',
> is not defined in solrconfig.xml:
> 
>  
>invoice
>bbb
> name="internal_id">10E3B793A84559081D1EF0BA6BD0BB5E1417573EC5D
>  
> 
> We get an error that resembles XML, but it is not XML because it has no root 
> tag:
> ERROR:unknown field 'aaa' status="1">org.xmlpull.v1.XmlPullParserException: expected START_TAG or 
> END_TAG not END_DOCUMENT (position: END_DOCUMENT seen 
> ...\n\n... @9:1)
>     at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1083)
>     at org.apache.solr.core.SolrCore.update(SolrCore.java:681)
>     at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
>     at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
>     at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
>     at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
>     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
>     at java.lang.Thread.run(Thread.java:595)
>  





[jira] Resolved: (SOLR-65) Multithreaded DirectUpdateHandler2

2006-11-08 Thread Mike Klaas (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-65?page=all ]

Mike Klaas resolved SOLR-65.


Resolution: Fixed

Checked in patch r472720

> Multithreaded DirectUpdateHandler2
> --
>
> Key: SOLR-65
> URL: http://issues.apache.org/jira/browse/SOLR-65
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Attachments: autocommit_patch.diff, autocommit_patch.diff, 
> autocommit_patch.diff, autocommit_patch.diff
>
>
> Basic implementation of autoCommit functionality, plus overhaul of DUH2 
> threading to reduce contention





[jira] Assigned: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Hoss Man (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-68?page=all ]

Hoss Man reassigned SOLR-68:


Assignee: Hoss Man

> Custom ClassLoader for "plugins"
> 
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
> Assigned To: Hoss Man
> Attachments: classloader.patch
>
>
> After beating my head against my desk for a few hours yesterday trying to 
> document how to load custom plugins (ie: Analyzers, RequestHandlers, 
> Similarities, etc...) in the various Servlet Containers -- only to discover 
> that it is apparently impossible unless you use Resin, it occurred to me in the 
> wee hours of last night that since the only time we ever need to load 
> "pluggable" classes is when we explicitly look up the class by name, we could 
> make our own ClassLoader and use it ... so i whipped together a little patch 
> to Config.java that would load JARs out of ${solr.home}/lib and was seriously 
> surprised to discover that it seemed to work.
> In the cold light of day, I am again surprised that I still think this might 
> be a good idea, but i'm not very familiar with ClassLoader semantics, so i'm 
> not sure if what i've done is an abomination or not -- or if the idea is 
> sound, but the implementation is crap.
> I'm also not sure if it works in all cases: more testing of various 
> Containers would be good, as well as testing more complex situations (ie: 
> what if a class explicitly named as a plugin and loaded by this new 
> classloader then uses reflection to load another class from the same Jar 
> using Thread.currentThread().getContextClassLoader() ... will that fail?)
> So far I've done quick and dirty testing with my apachecon JAR under 
> apache-tomcat-5.5.17, the jetty start.jar we use for the example, 
> resin-3.0.21 and jettyplus-5.1.11 -- all of which seemed to work fine except 
> for jettyplus-5.1.11 -- but that may have been because of some other 
> configuration problem I had.
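
For readers who want a concrete picture of the idea in the description above, here is a minimal sketch. It is not the contents of classloader.patch, which is not shown in this thread. The class name PluginLoaderSketch, both method names, and the "solr/lib" default path are assumptions made up for illustration; only java.net.URLClassLoader and Class.forName are real JDK APIs.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class PluginLoaderSketch {

    /** Builds a loader over every *.jar in libDir; lookups delegate to parent first. */
    public static URLClassLoader createLibClassLoader(File libDir, ClassLoader parent) {
        List<URL> urls = new ArrayList<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) { // null when libDir is missing or is not a directory
            for (File jar : jars) {
                try {
                    urls.add(jar.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("Bad JAR path: " + jar, e);
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /** Looks a plugin class up by name through the given loader. */
    public static Class<?> findPluginClass(String className, ClassLoader loader)
            throws ClassNotFoundException {
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical location; pass the real lib directory as the first argument.
        File libDir = new File(args.length > 0 ? args[0] : "solr/lib");
        ClassLoader webapp = PluginLoaderSketch.class.getClassLoader();
        URLClassLoader plugins = createLibClassLoader(libDir, webapp);

        System.out.println("JARs on plugin path: " + plugins.getURLs().length);
        // Classes visible to the parent still resolve through the new loader.
        System.out.println(findPluginClass("java.util.ArrayList", plugins).getName());
    }
}
```

Creating a loader this way does not change the thread's context class loader, so a plugin that calls Thread.currentThread().getContextClassLoader() would still get whatever the container set. Whether that breaks anything in practice is the open question raised in the description.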





[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Mike Klaas (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12448319 ] 

Mike Klaas commented on SOLR-68:



   [[ Old comment, sent from unregistered email on Wed, 8 Nov 2006 14:40:07 
-0800 ]]

Just wanted to comment that I'm +1 on the idea--getting this working
with jetty was an annoyance.  I'm not really qualified to comment on
the approach, but it sounds reasonable.








[jira] Commented: (SOLR-65) Multithreaded DirectUpdateHandler2

2006-11-08 Thread Mike Klaas (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-65?page=comments#action_12448318 ] 

Mike Klaas commented on SOLR-65:



   [[ Old comment, sent from unregistered email on Wed, 8 Nov 2006 13:43:59 
-0800 ]]


Yep, that's a 2-cpu machine.


A good question--it is definitely possible to test such a scenario by
manually creating ram segments and adding them (now that
addIndicesNoOptimize exists).

It still might be possible to improve parallelism on the solr end by
allowing doc adds and doDeletions to run concurrently, but if that was a
major bottleneck then the commit frequency could be reduced anyway.

Thanks for the review,
-Mike








[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Otis Gospodnetic (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12448296 ] 

Otis Gospodnetic commented on SOLR-68:
--

Hoss, the Mortbay guys released Jetty 6.0 and even 6.1 (or 6.0.1, not sure any 
more) a few weeks ago.  Jetty 5.* is becoming obsolete, so it would be best to 
try Jetty 6.* with your class loader.







[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12448278 ] 

Hoss Man commented on SOLR-68:
--

FYI: I just did a fresh install of jetty-5.1.11 and confirmed that (for my 
simple test case of some self contained request handlers in a single JAR) it 
works with jetty and jettyplus.

I'd like to do some more testing with just dropping lucene contrib JARs in, and 
having multiple JARs ... but assuming no surprises pop up, I think i may just 
not worry about some of the more extreme situations i brought up on the mailing 
list since:
  1) those situations will hopefully be rare
  2) i'm not even sure if those situations will cause a problem
  3) if one of those situations does cause a problem, we can always re-address 
the specifics of the ClassLoader independent of the overall "API" of having a 
${solr.home}/lib directory
  4) if someone really does have a situation we can't address because of some 
specific quirks in the ClassLoader of their servlet container, nothing in this 
approach precludes them from unwrapping the WAR and adding their classes 
directly (which is the only option at the moment)








[jira] Commented: (SOLR-65) Multithreaded DirectUpdateHandler2

2006-11-08 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-65?page=comments#action_12448273 ] 

Yonik Seeley commented on SOLR-65:
--

Everything looks good to me.

> While I was at it, I converted the  output for multi-adds to a single 
> xml element. Was more information going to be added to this?

I don't think so.

> Still, the throughput gain is about 20-30%.

Not too bad... is that for a dual CPU machine?  The number of cores per chip 
and cores per server are going up (quad cores, Sun Niagara, etc), so these types 
of optimizations will grow in importance.

I wonder what kind of gains could be had if Lucene could overlap adding of 
documents (via buffering) with merging of segments.







[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12448230 ] 

Hoss Man commented on SOLR-68:
--


>> the most 'naive' solution could be separate shared solr.jar in the same 
>> classpath/classloader as plugins...

Yeah, I considered that briefly, but I had two big reservations about 
attempting that approach...

1) Complicates installation in the "simple" case.  right now users that want an 
"out of the box" experience just need to drop a WAR file in place and create 
two config files.  separating the classes into an external JAR would complicate 
that by requiring them to put that JAR in their container's shared/common 
classpath.

2) I'm 90% certain that would eliminate the ability to run multiple solr 
instances in the same servlet container, because of the way the SolrCore and 
much of the Config information is implemented via Singletons ... if those 
classes were loaded by a common classloader, there would be only one instance 
per container -- not one instance per webapp.






[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-08 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12448229 ] 

Fuad Efendi commented on SOLR-68:
-

understood, thanks...
As I understood (functional requirement): we need to be able to easily add 
plugins without making changes to solr.war, and this patch fixes that. (plugin 
depends on solr classes AND plugin is physically outside of solr.war); the most 
'naive' solution could be separate shared solr.jar in the same 
classpath/classloader as plugins... 






Re: Adding Phonetic Search to Solr

2006-11-08 Thread Walter Underwood

On 11/8/06 10:30 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> : Also, the phonetic matches are ranked a bit high, so I'm trying a
> : sub-1.0 boost. I was expecting the lower idf to fix that automatically.
> : The metaphone will almost always have a lower idf because multiple
> : words are mapped to one metaphone, so the encoded term occurs in more
> : documents than the surface terms.
> 
> That all makes sense, and yet it's not what you are observing ... which
> leads me to believe you (and I, since i want to agree with you) are missing
> something subtle. What does the Explanation look like for two
> documents where you feel like one should score higher than the other but
> it doesn't?

That is my next step. Maybe create some test documents in my corpus and
spend some quality time with Explain and grokking DisMax. I need to
customize Similarity anyway.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Adding Phonetic Search to Solr

2006-11-08 Thread Chris Hostetter

: A naming convention question: should the class names end in
: Filter or TokenFilter (and FilterFactory or TokenFilterFactory)?
: I see both in org.apache.solr.analysis.

Ummm  "yes"  :)

I don't think it makes a big difference ... i'd never noticed the
inconsistency until now.

: I'm a bit disappointed in the performance, though. It is half the
: speed when adding two phonetic fields to search. Dropped from 300

: Could that be from searching extra fields? Indexing is the same
: speed, so it shouldn't be the DoubleMetaphone class. I'm still
: trying to get a feel for Lucene performance after years with the
: Ultraseek engine.

Your indexing speed may already be limited by something else, so you might
not notice lags in the DoubleMetaphone class at index time ... have you
tried some micro benchmarks on the dm.encode method to see how long it
takes per token?
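
A rough sketch of the kind of micro benchmark suggested here, for anyone who wants a starting point. The class name EncodeBenchmarkSketch, the token list, and the round count are assumptions. So that the sketch runs without commons-codec, it times a stand-in encoder; swap in the real DoubleMetaphone call where the comment indicates.

```java
import java.util.function.Function;

public class EncodeBenchmarkSketch {

    // Written once per run so the JIT cannot discard the encoder calls.
    private static volatile long blackhole;

    /** Returns the average nanoseconds per encoder call over the given tokens. */
    public static double nanosPerToken(Function<String, String> encoder,
                                       String[] tokens, int rounds) {
        long sink = 0;
        // Warm-up pass so the timed loop is more likely to measure compiled code.
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                sink += encoder.apply(t).length();
            }
        }
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                sink += encoder.apply(t).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / ((long) rounds * tokens.length);
    }

    public static void main(String[] args) {
        String[] tokens = {"netflix", "underwood", "metaphone", "lucene", "solr"};
        // Stand-in encoder (an assumption). With commons-codec on the classpath,
        // create a DoubleMetaphone dm and pass s -> dm.encode(s) instead.
        Function<String, String> standIn = s -> s.toUpperCase();
        System.out.printf("%.1f ns/token%n", nanosPerToken(standIn, tokens, 200_000));
    }
}
```

Numbers from a loop like this are only a coarse guide; they say whether encoding costs nanoseconds or microseconds per token, not how it behaves inside a full analysis chain.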

: Also, the phonetic matches are ranked a bit high, so I'm trying a
: sub-1.0 boost. I was expecting the lower idf to fix that automatically.
: The metaphone will almost always have a lower idf because multiple
: words are mapped to one metaphone, so the encoded term occurs in more
: documents than the surface terms.

That all makes sense, and yet it's not what you are observing ... which
leads me to believe you (and I, since i want to agree with you) are missing
something subtle. What does the Explanation look like for two
documents where you feel like one should score higher than the other but
it doesn't?


-Hoss



Re: Adding Phonetic Search to Solr

2006-11-08 Thread Walter Underwood
On 11/7/06 5:44 PM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> Grab the code from Lucene in Action, it's got something to get you going, see:
> 
>   http://www.lucenebook.com/search?query=metaphone

Thanks. I thought about looking that up (I have the book), but the
code is really trivial inside Solr. The per-field analyzer takes
care of most of the fuss. The meat is a single line of code in the
token filter using the DoubleMetaphone class from commons codec.

  return new Token(dm.encode(token.termText()),
                   token.startOffset(),
                   token.endOffset());

Everything else is just initialization and declaration.

A naming convention question: should the class names end in
Filter or TokenFilter (and FilterFactory or TokenFilterFactory)?
I see both in org.apache.solr.analysis.

I'm a bit disappointed in the performance, though. It is half the
speed when adding two phonetic fields to search. Dropped from 300
qps to 130. On the other hand, I never thought I'd be complaining
about an engine delivering over 100 qps!

Could that be from searching extra fields? Indexing is the same
speed, so it shouldn't be the DoubleMetaphone class. I'm still
trying to get a feel for Lucene performance after years with the
Ultraseek engine.

Also, the phonetic matches are ranked a bit high, so I'm trying a
sub-1.0 boost. I was expecting the lower idf to fix that automatically.
The metaphone will almost always have a lower idf because multiple
words are mapped to one metaphone, so the encoded term occurs in more
documents than the surface terms.
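
To put numbers on that expectation, here is a small sketch using the idf formula from Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)). The class name IdfSketch and the corpus and document-frequency figures are made up for illustration; they are not measurements from this index.

```java
public class IdfSketch {

    /** idf as in Lucene's DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)). */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 100_000;       // hypothetical corpus size
        int surfaceDocFreq = 50;     // hypothetical: docs containing one surface term
        int metaphoneDocFreq = 400;  // hypothetical: docs containing any word with that code
        System.out.println("surface idf   = " + idf(surfaceDocFreq, numDocs));
        System.out.println("metaphone idf = " + idf(metaphoneDocFreq, numDocs));
    }
}
```

The metaphone idf does come out lower, but idf shrinks only logarithmically as document frequency grows. That might be part of why the correction feels weaker than expected; it is a guess, and the Explanation output is the real way to check.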

One neat trick -- if regular terms are lowercased, they will never
collide with the metaphones, which are all upper case.

wunder
-- 
Walter Underwood
Search Guru, Netflix





[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

2006-11-08 Thread Erik Hatcher (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-69?page=comments#action_12448207 ] 

Erik Hatcher commented on SOLR-69:
--

I love it when features get implemented by others!  :)   Thanks Bertrand!

> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: http://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
> more appropriate ;-) Erik Hatcher's example mentioned in 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html
> To use it, add at least the following parameters to a standard or dismax 
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all 
> documents found in exampledocs in the index (just to show that it seems to 
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> 
>   ...
>   
> 
>   
> 1.5293242
> SOLR1000
>   
> 
> 
>   
> 1.5293242
> UTF8TEST
>   
> 
>   
> I haven't tested this extensively yet, will do in the next few days. But 
> comments are welcome of course.
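
For anyone scripting requests against the patch, here is a minimal sketch that assembles a select URL with the two required parameters from the description above (mlt and mlt.fl). The class name MltUrlSketch and the buildUrl helper are assumptions for illustration; the host, path, and parameter values are copied from the example URLs in the description.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class MltUrlSketch {

    /** URL-encodes one name or value as UTF-8. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends the parameters to base as a query string, in insertion order. */
    public static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "apache");
        params.put("qt", "standard");
        params.put("mlt", "true");         // switch MoreLikeThis on
        params.put("mlt.fl", "manu,cat");  // fields that define similarity
        params.put("fl", "id,score");
        System.out.println(buildUrl("http://localhost:8983/solr/select/", params));
    }
}
```

The comma in the field list is percent-encoded by URLEncoder, which a servlet container decodes before the parameters are read.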





[jira] Updated: (SOLR-69) PATCH:MoreLikeThis support

2006-11-08 Thread Bertrand Delacretaz (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-69?page=all ]

Bertrand Delacretaz updated SOLR-69:


Attachment: lucene-queries-2.0.0.jar

The MoreLikeThis class comes from the lucene-queries jar, I enclose the version 
used for my tests

> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: http://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch





[jira] Updated: (SOLR-69) PATCH:MoreLikeThis support

2006-11-08 Thread Bertrand Delacretaz (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-69?page=all ]

Bertrand Delacretaz updated SOLR-69:


Attachment: SOLR-69.patch

> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: http://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: SOLR-69.patch





[jira] Created: (SOLR-69) PATCH:MoreLikeThis support

2006-11-08 Thread Bertrand Delacretaz (JIRA)
PATCH:MoreLikeThis support
--

 Key: SOLR-69
 URL: http://issues.apache.org/jira/browse/SOLR-69
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Bertrand Delacretaz
Priority: Minor


Here's a patch that implements simple support of Lucene's MoreLikeThis class.

The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
more appropriate ;-) Erik Hatcher's example mentioned in 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html

To use it, add at least the following parameters to a standard or dismax query:

  mlt=true
  mlt.fl=list,of,fields,which,define,similarity

See the MoreLikeThisHelper source code for more parameters.

Here are two URLs that work with the example config, after loading all 
documents found in exampledocs in the index (just to show that it seems to work 
- of course you need a larger corpus to make it interesting):

http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score

http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
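
For anyone who wants to script such requests, here is a minimal sketch of how 
the two URLs above could be assembled. The helper name mlt_url and its 
arguments are hypothetical; only the request parameter names and the 
localhost:8983 base URL are taken from this message, and mlt.mindf is passed 
once.

```python
from urllib.parse import urlencode

# Base URL of the example Solr instance, as used in the URLs above.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def mlt_url(query, handler, fields, mindf=1, returned="id,score"):
    """Build a select URL that asks for MoreLikeThis results.

    mlt_url and its argument names are hypothetical helpers for this sketch;
    only the request parameter names come from the message above.
    """
    params = [
        ("stylesheet", ""),            # left empty, as in the URLs above
        ("q", query),
        ("qt", handler),               # "standard" or "dismax"
        ("mlt", "true"),               # switch MoreLikeThis on
        ("mlt.fl", ",".join(fields)),  # fields which define similarity
        ("mlt.mindf", str(mindf)),
        ("fl", returned),
    ]
    # safe="," keeps the comma-separated lists readable instead of %2C
    return SOLR_SELECT + "?" + urlencode(params, safe=",")


for handler in ("standard", "dismax"):
    print(mlt_url("apache", handler, ["manu", "cat"]))
```

Keeping the parameters in an ordered list rather than a dict makes it easy to 
compare the generated URL with a hand-written one.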

Results are added to the output like this:

  ...
  

  
1.5293242
SOLR1000
  


  
1.5293242
UTF8TEST
  

  

I haven't tested this extensively yet, but will do so in the next few days. 
Comments are welcome, of course.
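
As a rough illustration of consuming the added results on the client side, 
the sketch below pulls (id, score) pairs out of a response. The sample 
document is hypothetical: its element and attribute names (a "moreLikeThis" 
list wrapping result and doc elements) are assumptions modelled on Solr's 
usual XML output, so check them against what the patch really emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment. The element and attribute names are
# assumptions for illustration, not copied from the patch.
SAMPLE = """
<response>
  <lst name="moreLikeThis">
    <result>
      <doc><float name="score">1.5293242</float><str name="id">SOLR1000</str></doc>
    </result>
    <result>
      <doc><float name="score">1.5293242</float><str name="id">UTF8TEST</str></doc>
    </result>
  </lst>
</response>
"""


def similar_docs(xml_text):
    """Return (id, score) pairs for every doc under the assumed MLT list."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for doc in root.findall("./lst[@name='moreLikeThis']//doc"):
        doc_id = doc.findtext("str[@name='id']")
        score = float(doc.findtext("float[@name='score']"))
        pairs.append((doc_id, score))
    return pairs


for doc_id, score in similar_docs(SAMPLE):
    print(doc_id, score)
```

The values in the sample are the ones shown in the output above; only the 
surrounding markup is guessed.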





[jira] Commented: (SOLR-59) Copy request parameters to Solr's response

2006-11-08 Thread Bill Au (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-59?page=comments#action_12448143 ]

Bill Au commented on SOLR-59:
-

>I can help stick in some backward compatible code if we decide it should be 
>there. 

+1 on keeping things backward compatible.

> Copy request parameters to Solr's response
> --
>
> Key: SOLR-59
> URL: http://issues.apache.org/jira/browse/SOLR-59
> Project: Solr
>  Issue Type: Improvement
>Reporter: Bertrand Delacretaz
> Attachments: SOLR-59-20061024.patch, SOLR-59-20061102.patch, 
> SOLR-59-20061103.patch, SOLR-59-20061106-newfiles.tar.gz, 
> SOLR-59-20061106.patch, SOLR-59-new-files-20061102.tar.gz
>
>
> This patch copies the request parameters (explicit ones only, not the 
> defaults) to Solr's XML output.
> It is not configurable yet, it is enabled by default and adds a 
> "queryParameters" list to the responseHeader:
> 
> 0
> 1
> 
> 
> red
> blue
> 
> 10
> 0
> on
> solr
> 
> 2.1
> 
> 
> The above example includes a multi-valued parameter, "multi".
> This might still change a bit, but if someone wants to play with it or 
> improve it, here you go.





Re: incubator report due

2006-11-08 Thread Bill Au

+1

Bill

On 11/7/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: http://wiki.apache.org/incubator/November2006

+1



-Hoss