[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Lars Kotthoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605868#action_12605868
 ] 

Lars Kotthoff commented on SOLR-303:


Yonik, thanks for taking a look at it.

I've investigated this issue further and I believe I know what the root cause 
is now. The line
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
...
{code}
tells the *sender* to encode the data as UTF-8. The way the *receiver* decodes 
the data depends on the charset specified in the Content-Type header. This 
header is currently added automatically by httpclient and is, as you can see in 
the netcat log, "application/x-www-form-urlencoded", i.e. without a charset. 
The default charset is ISO-8859-1 (cf. 
[http://hc.apache.org/httpclient-3.x/charencodings.html]). So the data is 
*encoded* as UTF-8 but *decoded* as ISO-8859-1, which causes the effect I 
described earlier.
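The mismatch is easy to reproduce in isolation with plain JDK classes (a minimal sketch, not Solr code): encode a string as UTF-8, then decode the bytes as ISO-8859-1, as a receiver falling back to httpclient's documented default would.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        // A query term with a non-ASCII character
        String original = "Käse";
        // Sender encodes the POST body as UTF-8 ('ä' becomes two bytes)
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        // Receiver sees no charset in Content-Type, falls back to ISO-8859-1
        String received = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(received); // prints "KÃ¤se": each UTF-8 byte becomes its own character
    }
}
```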

I tried to reproduce this with TestDistributedSearch myself, but for some 
reason it seems to be fine. Perhaps the Jetty configuration is different from my 
Tomcat configuration. I didn't find any parameter to tell Tomcat which default 
encoding to use when the Content-Type header doesn't specify one, though.

The minimal change I had to make to make it work was add a line to set the 
Content-Type header explicitly, i.e.
{code:title=o.a.s.client.solrj.impl.CommonsHttpSolrServer.java}
...
post.getParams().setContentCharset("UTF-8");
post.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; 
charset=UTF-8");
...
{code}
This probably won't work with multi-part requests though. I'm not sure what the 
right way to handle this would be. The stub Content-Type header is set by 
httpclient when the method is executed, i.e. there's no way to let httpclient 
figure out the first part and then append the charset in CommonsHttpSolrServer.

Some other things I've noticed:
* Just before the content charset is set, the parameters of the POST request 
are populated. If the value for a parameter is null, the code attempts to add a 
null parameter. This, however, causes an IllegalArgumentException from 
httpclient (cf. 
[http://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String,
 java.lang.String)]).
* TestDistributedSearch does not exercise the code that refines facet counts. 
Adding another facet request with facet.limit=1 addresses this.
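A guard along the lines implied by the first point could look like the sketch below (the helper name is hypothetical and only JDK classes are used so it stands alone; the real fix would sit where CommonsHttpSolrServer populates the PostMethod):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullParamGuard {
    // Keep only parameters with non-null names and values before handing them
    // to httpclient's PostMethod.addParameter, which throws
    // IllegalArgumentException on null arguments.
    static Map<String, String> dropNullParams(Map<String, String> params) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getKey() != null && e.getValue() != null) {
                safe.put(e.getKey(), e.getValue());
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "solr");
        params.put("fq", null); // would otherwise trigger IllegalArgumentException
        System.out.println(dropNullParams(params)); // prints "{q=solr}"
    }
}
```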

> Distributed Search over HTTP
> 
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sharad Agarwal
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
> distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
> fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605768#action_12605768
 ] 

Hoss Man commented on SOLR-600:
---

bq. It could also be a concurrency bug in Solr that shows up on the IBM JVM 
because the thread scheduler makes different decisions.

it's possible ... but skimming the code in question it seems unlikely.   
DocumentBuilder.java:225 is a foreach loop over a SolrInputField.  The fact 
that AbstractList$SimpleListIterator is being used indicates that the 
SolrInputField 'value' object is something that extends AbstractList (and not a 
single object, so it seems the anonymous Iterator in SolrInputField is off the 
hook) ... by the time  DocumentBuilder.java:225 is reached, there shouldn't be 
anything modifying that SolrInputDocument.

besides: if it was a concurrency problem wouldn't you expect to see 
ConcurrentModificationException instead of NullPointerException?  Even if 
something was mucking with the values of the ArrayList, I could understand 
seeing next() return null ... but not an NPE in hasNext().
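For what it's worth, that fail-fast behaviour is easy to demonstrate with the Sun JDK's ArrayList (the IBM AbstractList$SimpleListIterator may of course behave differently, which is rather the point): structural modification mid-iteration surfaces as ConcurrentModificationException, not as an NPE in hasNext().

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> values = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String v : values) {
                values.add(v + "!"); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            // The fail-fast iterator detects the change on the next next() call
            System.out.println("got ConcurrentModificationException");
        }
    }
}
```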

> XML parser stops working under heavy load
> -
>
> Key: SOLR-600
> URL: https://issues.apache.org/jira/browse/SOLR-600
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
> Environment: Linux 2.6.19.7-ss0 #4 SMP Wed Mar 12 02:56:42 GMT 2008 
> x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
> Tomcat 6.0.16
> SOLR nightly 16 Jun 2008, and versions prior
> JRE 1.6.0
>Reporter: John Smith
>
> Under heavy load, the following is spat out for every update:
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> at java.util.AbstractList$SimpleListIterator.hasNext(Unknown Source)
> at 
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:225)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:66)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:735)




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605751#action_12605751
 ] 

Walter Underwood commented on SOLR-600:
---

It could also be a concurrency bug in Solr that shows up on the IBM JVM because 
the thread scheduler makes different decisions. 




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605747#action_12605747
 ] 

Hoss Man commented on SOLR-600:
---

based on the stack trace it seems more likely the bug would be in the 
java.util.AbstractList provided by the IBM JVM




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605735#action_12605735
 ] 

Yonik Seeley commented on SOLR-600:
---

Certainly sounds like an IBM JVM bug, or perhaps more likely an issue in their 
XML parser that makes it non-thread safe?




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread John Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605729#action_12605729
 ] 

John Smith commented on SOLR-600:
-

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)

Problem disappears when using the Sun version. Shall I then assume it's an IBM 
JVM problem?




[jira] Commented: (SOLR-502) Add search time out support

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605719#action_12605719
 ] 

Yonik Seeley commented on SOLR-502:
---

bq. Do you have a suggestion on how to get it into the response header? That 
isn't available down at the SolrIndexSearcher level as far as I can tell.

Off the top of my head, it seems like it might be cleaner to throw an exception 
in the SolrIndexSearcher method doing the searching (that would have the added 
benefit of  automatically bypassing DocSet/DocList caching, etc).

Catch that exception in the query component and set a flag in the header 
indicating that a timeout happened.

Or if it's easier, pass more info down to the SolrIndexSearcher.

After all, this only handles timeouts at the query level (not query 
expansion/rewriting, highlighting, retrieving stored fields, faceting, or any 
number of other components that will be added in the future).  It also doesn't 
totally handle timeouts even at the query level... one could construct a query 
that takes a lot of time yet matches no documents, so there is never an 
opportunity to time out.  Then there is the issue of false positives (a major 
GC compaction hits and causes all the currently running queries to time out).  
Given these restrictions, and the fact that most people would choose not to get 
partial results, it seems like we should really try to limit the 
impact/invasiveness of this feature.
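The flow described above might be sketched like this (all class and key names here, such as TimeExceededException and partialResults, are illustrative placeholders rather than Solr's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeoutSketch {
    // Hypothetical exception thrown by the searcher when the time limit is hit.
    static class TimeExceededException extends RuntimeException {}

    // Stand-in for the SolrIndexSearcher method doing the search; throwing
    // before a result is built also bypasses DocSet/DocList caching.
    static void search(boolean simulateTimeout) {
        if (simulateTimeout) {
            throw new TimeExceededException();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> responseHeader = new LinkedHashMap<>();
        try {
            search(true);
        } catch (TimeExceededException e) {
            // The query component catches the exception and flags the timeout
            responseHeader.put("partialResults", true);
        }
        System.out.println(responseHeader); // prints "{partialResults=true}"
    }
}
```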


> Add search time out support
> ---
>
> Key: SOLR-502
> URL: https://issues.apache.org/jira/browse/SOLR-502
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sean Timm
>Assignee: Otis Gospodnetic
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-502.patch, SOLR-502.patch, SOLR-502.patch, 
> SOLR-502.patch, solrTimeout.patch, solrTimeout.patch, solrTimeout.patch, 
> solrTimeout.patch, solrTimeout.patch
>
>
> Uses LUCENE-997 to add time out support to Solr.  




[jira] Updated: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-572:
--

Description: 
http://wiki.apache.org/solr/SpellCheckComponent

Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
following features:
* Allow creating a spell index on a given field and make it possible to have 
multiple spell indices -- one for each field
* Give suggestions on a per-field basis
* Given a multi-word query, give only one consistent suggestion
* Process the query with the same analyzer specified for the source field and 
process each token separately
* Allow the user to specify minimum length for a token (optional)

Consistency criteria for a multi-word query can consist of the following:
* Preserve the correct words in the original query as it is
* Never give duplicate words in a suggestion

  was:
Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
following features:
* Allow creating a spell index on a given field and make it possible to have 
multiple spell indices -- one for each field
* Give suggestions on a per-field basis
* Given a multi-word query, give only one consistent suggestion
* Process the query with the same analyzer specified for the source field and 
process each token separately
* Allow the user to specify minimum length for a token (optional)

Consistency criteria for a multi-word query can consist of the following:
* Preserve the correct words in the original query as it is
* Never give duplicate words in a suggestion


> Spell Checker as a Search Component
> ---
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
>  Issue Type: New Feature
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch
>
>
> http://wiki.apache.org/solr/SpellCheckComponent
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion




[jira] Commented: (SOLR-502) Add search time out support

2008-06-17 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605714#action_12605714
 ] 

Sean Timm commented on SOLR-502:


Yonik--

Do you have a suggestion on how to get it into the response header?  That isn't 
available down at the SolrIndexSearcher level as far as I can tell.  It would 
be much easier if the ResponseBuilder or some other object was passed all the 
way down to the searcher level, but I was trying to make the smallest change 
possible.

I think an easy machine-readable value to indicate partial results is 
important.  A descriptive string is optional, but would be a nice 
addition.

-Sean




[jira] Updated: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-572:
-

Attachment: SOLR-572.patch

Thought some more about the comment about the QueryConverter, and decided to 
abstract it as Shalin suggests.  




Hadoop get together @ Berlin

2008-06-17 Thread idrost
Hello,

I am happy to announce the first German Hadoop Meetup in Berlin. We will meet 
at 5 p.m. MESZ next Tuesday (24th of June) at the newthinking store in Berlin 
Mitte:

newthinking store GmbH
Tucholskystr. 48
10117 Berlin

Please see also: http://upcoming.yahoo.com/event/807782/

A big Thanks to the newthinking store for providing a room in the center of 
Berlin for us.

There will be drinks provided by newthinking. You can order pizza if you like. 
There are quite a few good restaurants nearby, so we can go there after the 
official part. 

Talks scheduled so far:

Stefan Groschupf will talk about Hadoop in action in one of his customer 
projects. Of course there will be time to ask him questions on his new 
project katta.

Isabel Drost will talk about the new project Mahout.

There will be a few more slots for talks of about 20 minutes, with another 10 
minutes for discussion. There will be a projector, so feel free to bring some 
slides. If you are interested in giving a talk, please contact me by mail. 
Please also contact me by mail if you plan to attend the Meetup.

Feel free to resend this mail to any communities interested in the meeting.


Isabel

PS: In case this mail reaches the list, I am sorry for the double posting, I 
did not see my first mail arrive and realized only thereafter that I had used 
the wrong From: :(

-- 
The "cutting edge" is getting rather dull.  -- Andy Purshottam




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605700#action_12605700
 ] 

Yonik Seeley commented on SOLR-600:
---

bq. Only thing I can think of to test is to push 60,000 to 100,000 updates in 
the span of a few minutes onto a single machine. 

I've actually recently done that... total of 50M documents, multiple threads 
adding docs via CSV upload.
Result was ~4000 docs/second (these were relatively simple docs).  It didn't 
exercise the XML parser, but it's a nice test case.

bq. IBM J9 VM 

If this is relatively reproducible, you could try Sun's JVM to rule out a JVM 
bug.







[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605699#action_12605699
 ] 

Yonik Seeley commented on SOLR-303:
---

Lars: I'm not yet able to reproduce an issue with SolrJ not encoding the 
parameters properly.

The following code finds the sample solr document:
{code}
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("echoParams", "all");
params.set("q", "+h\u00E9llo");
QueryRequest req = new QueryRequest(params);
req.setMethod(SolrRequest.METHOD.POST);
System.out.println(server.request(req));
{code}

And netcat confirms the encoding looks good, and that the request is in fact 
using POST:
{code}
$ nc -l -p 8983
POST /solr/select HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8983
Content-Length: 53
Content-Type: application/x-www-form-urlencoded

echoParams=all&q=%2Bh%C3%A9llo&wt=javabin&version=2.2
{code}

I'll see if I can reproduce anything with TestDistributedSearch
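The percent-encoding in the netcat log can be sanity-checked independently with java.net.URLEncoder, which shows how the same query string encodes under UTF-8 versus httpclient's ISO-8859-1 default (a standalone sketch, not SolrJ code):

```java
import java.net.URLEncoder;

public class EncodeCheck {
    public static void main(String[] args) throws Exception {
        String q = "+h\u00E9llo";
        // UTF-8: '+' -> %2B, e-acute (U+00E9) -> two bytes -> %C3%A9
        System.out.println(URLEncoder.encode(q, "UTF-8"));      // %2Bh%C3%A9llo
        // ISO-8859-1: e-acute -> a single byte -> %E9
        System.out.println(URLEncoder.encode(q, "ISO-8859-1")); // %2Bh%E9llo
    }
}
```

If the receiver then decodes %C3%A9 as ISO-8859-1, it sees two characters where the sender meant one, which matches the corruption Lars described.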

> Distributed Search over HTTP
> 
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sharad Agarwal
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
> distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
> fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Issue Comment Edited: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread John Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605696#action_12605696
 ] 

wshs edited comment on SOLR-600 at 6/17/08 12:15 PM:
---

Java(TM) SE Runtime Environment (build pxa6460sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Linux amd64-64 
jvmxa6460-20080415_18762 (JIT enabled, AOT enabled)
J9VM - 20080415_018762_LHdSMr
JIT  - r9_20080415_1520
GC   - 20080415_AA)
JCL  - 20080412_01

Only thing I can think of to test is to push 60,000 to 100,000 updates in the 
span of a few minutes onto a single machine. If I keep the updates under 30,000 
per hour, it runs fine. Due to the sensitivity of the data, I cannot provide a 
copy of example data or schema, but I can provide analogs if needed. I can also 
provide scrubbed config files as well. I'm afraid I lack the knowledge to make 
use of Jetty.

Edit: System has 8 cores, 32 gigs of memory. Tomcat is not explicitly 
configured to use multiple cores, however, since it's never been needed.




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread John Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605696#action_12605696
 ] 

John Smith commented on SOLR-600:
-

Java(TM) SE Runtime Environment (build pxa6460sr1-20080416_01(SR1))
IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 Linux amd64-64 
jvmxa6460-20080415_18762 (JIT enabled, AOT enabled)
J9VM - 20080415_018762_LHdSMr
JIT  - r9_20080415_1520
GC   - 20080415_AA)
JCL  - 20080412_01

Only thing I can think of to test is to push 60,000 to 100,000 updates in the 
span of a few minutes onto a single machine. If I keep the updates under 30,000 
per hour, it runs fine. Due to the sensitivity of the data, I cannot provide a 
copy of example data or schema, but I can provide analogs if needed. I can also 
provide scrubbed config files as well. I'm afraid I lack the knowledge to make 
use of Jetty.




[jira] Updated: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter

2008-06-17 Thread Geoffrey Young (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geoffrey Young updated SOLR-14:
---

Attachment: SOLR-14.patch

ok, I've given this a shot.  I'm an open-source guy, even an ASF guy, but 
not a java guy, so forgive my code ;)

the patch should apply cleanly to current trunk.  the last patch before mine 
still had some issues that needed to be worked through wrt term duplication.  
this patch should work a bit better.

all current tests pass when adjusted to 'preserveOriginal=0' (default behavior, 
same as 1.2).  I looked at augmenting the current tests for WDF and 
'preserveOriginal=1' but it's beyond my current java abilities.

installing the patch and running the analyzer yields stuff like this:

  foo => foo
  foo-bar => foo-bar foo bar foobar
  foo-bar baz => foo-bar foo bar foobar baz
  foo! => foo foo!

which seems reasonable to me.
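For context, the option would presumably be switched on in schema.xml along these lines (a hypothetical sketch; the field type name and the other filter attributes are assumptions, only preserveOriginal comes from the patch):

```xml
<fieldType name="text_preserve" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the untouched term (e.g. "foo-bar")
         alongside the split/catenated parts -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```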

a little shepherding would be awesome.

thanks

--Geoff


> Add the ability to preserve the original term when using WordDelimiterFilter
> 
>
> Key: SOLR-14
> URL: https://issues.apache.org/jira/browse/SOLR-14
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Richard "Trey" Hyde
> Attachments: SOLR-14.patch, TokenizerFactory.java, 
> WordDelimiterFilter.patch, WordDelimiterFilter.patch
>
>
> When doing prefix searching, you need to hang on to the original term 
> otherwise you'll miss many matches you should be making.
> Data: ABC-12345
> WordDelimiterFilter may change this into
> ABC 12345 ABC12345
> A user may enter a search such as 
>  ABC\-123*
> Which will fail to find a match given the above scenario.
> The attached patch will allow the use of the "preserveOriginal" option to 
> WordDelimiterFilter and will analyse as
> ABC 12345 ABC12345  ABC-12345 
> in which case we will get a positive match.




[jira] Commented: (SOLR-502) Add search time out support

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605673#action_12605673
 ] 

Yonik Seeley commented on SOLR-502:
---

>From SOLR-303:
bq. Perhaps a string should also be added indicating why all results were not 
able to be returned.

If we had that (perhaps in the response header), would there still be a need to 
have this partial results flag on DocSet/DocList?  It always felt a little 
wrong that this feature wormed its way down to that low a level (DocSet, 
response writers, response parsers, etc.).  Seems like it could/should be much 
simpler.

> Add search time out support
> ---
>
> Key: SOLR-502
> URL: https://issues.apache.org/jira/browse/SOLR-502
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sean Timm
>Assignee: Otis Gospodnetic
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-502.patch, SOLR-502.patch, SOLR-502.patch, 
> SOLR-502.patch, solrTimeout.patch, solrTimeout.patch, solrTimeout.patch, 
> solrTimeout.patch, solrTimeout.patch
>
>
> Uses LUCENE-997 to add time out support to Solr.  




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605670#action_12605670
 ] 

Yonik Seeley commented on SOLR-303:
---

Lars: I committed your fix to the facet.limit value sent to shards, and instead 
of changing ntop when facet.limit<=0, I simply short-circuited checking if 
refinement is needed at all.

Next up: investigate this URL encoding (or lack of it) in the POST body.




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605666#action_12605666
 ] 

Sean Timm commented on SOLR-303:


In SOLR-502, there is the notion of partialResults.  It seems that the same 
flag could be used in this case.  Perhaps a string should also be added 
indicating why all results were not able to be returned.




[jira] Updated: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-572:
-

Attachment: SOLR-572.patch

Fix for the default name issue, add a test for it.

> Spell Checker as a Search Component
> ---
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
>  Issue Type: New Feature
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605660#action_12605660
 ] 

Yonik Seeley commented on SOLR-303:
---

> But shouldn't there be an option to skip over servers that aren't responding 
> or time out?

That does sound like it would be a useful option (though I think it should be 
false by default).

FYI, I'm currently looking into Lars' facet changes.




Re: Wiki loading time

2008-06-17 Thread Grant Ingersoll
OK, FWIW, I asked on the infrastructure list with a subject of "Solr MoinMoin
Wiki Loading/Save Time" for those who have access to that list (I forget if
it is public).


-Grant

On Jun 17, 2008, at 12:55 PM, Chris Hostetter wrote:



: I'm going to open an INFRA bug to see if there is any insight to be had here.
: I'm not a wiki admin, but maybe there is a way to check how many notifications
: are set for that page? (http://wiki.apache.org/solr/SpellCheckComponent)


according to the info page, none...

http://wiki.apache.org/solr/SpellCheckComponent?action=info&general=1

...but obviously there is the generic solr-commits notification (which
being site wide seems to be configured at a lower level)



-Hoss






Re: Wiki loading time

2008-06-17 Thread Shalin Shekhar Mangar
And no emails for the wiki edit notifications have made it to the solr-commits
list.

On Tue, Jun 17, 2008 at 10:25 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> : I'm going to open an INFRA bug to see if there is any insight to be had
> here.
> : I'm not a wiki admin, but maybe there is a way to check how many
> notifications
> : are set for that page? (http://wiki.apache.org/solr/SpellCheckComponent)
>
> according to the info page, none...
>
> http://wiki.apache.org/solr/SpellCheckComponent?action=info&general=1
>
> ...but obviously there is the generic solr-commits notification (which
> being site wide seems to be configured at a lower level)
>
>
>
> -Hoss
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Wiki loading time

2008-06-17 Thread Chris Hostetter

: I'm going to open an INFRA bug to see if there is any insight to be had here.
: I'm not a wiki admin, but maybe there is a way to check how many notifications
: are set for that page? (http://wiki.apache.org/solr/SpellCheckComponent)

according to the info page, none...

http://wiki.apache.org/solr/SpellCheckComponent?action=info&general=1

...but obviously there is the generic solr-commits notification (which 
being site wide seems to be configured at a lower level)



-Hoss



Re: Wiki loading time

2008-06-17 Thread Shalin Shekhar Mangar
I noticed that too while editing that page a while ago. Maybe the email
request from the wiki is timing out?

On Tue, Jun 17, 2008 at 10:20 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:

> Hmmm, I have been editing the SpellCheckComponent this morning and it is
> taking an absurdly long time to save.  I open a different tab, and the
> results are there almost instantaneously.  I do notice that the notices
> are not being sent.  I don't think the page has anything fancy on it in
> terms of wiki syntax.
>
> I'm going to open an INFRA bug to see if there is any insight to be had
> here.  I'm not a wiki admin, but maybe there is a way to check how many
> notifications are set for that page? (
> http://wiki.apache.org/solr/SpellCheckComponent)
>
> -Grant
>
>
> On Jun 1, 2008, at 4:21 PM, Grant Ingersoll wrote:
>
>  +1  Thanks Hoss!
>>
>> On Jun 1, 2008, at 3:09 PM, Chris Hostetter wrote:
>>
>>
>>> FWIW: I removed the dynamic macros from the front page, and it's now
>>> *noticeably* faster.
>>>
>>> : : Is it just me, or does the wiki front page
>>> : : (http://wiki.apache.org/solr/FrontPage) take a really long time to
>>> load?
>>> :
>>> : I suspect that has to do with the two dynamically loaded lists: all
>>> request
>>> : handlers, all response writers.
>>> :
>>> : it makes the wiki easier to manage, but it's probably slower (i'm
>>> guessing
>>> : MoinMoin doesn't cache those "FullSearch" calls.
>>>
>>>
>>> -Hoss
>>>
>>>
>>
>>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Wiki loading time

2008-06-17 Thread Grant Ingersoll
Hmmm, I have been editing the SpellCheckComponent this morning and it
is taking an absurdly long time to save.  I open a different tab, and
the results are there almost instantaneously.  I do notice that the
notices are not being sent.  I don't think the page has anything fancy
on it in terms of wiki syntax.


I'm going to open an INFRA bug to see if there is any insight to be
had here.  I'm not a wiki admin, but maybe there is a way to check how
many notifications are set for that page?
(http://wiki.apache.org/solr/SpellCheckComponent)


-Grant

On Jun 1, 2008, at 4:21 PM, Grant Ingersoll wrote:


+1  Thanks Hoss!

On Jun 1, 2008, at 3:09 PM, Chris Hostetter wrote:



FWIW: I removed the dynamic macros from the front page, and it's now
*noticeably* faster.

: : Is it just me, or does the wiki front page
: : (http://wiki.apache.org/solr/FrontPage) take a really long time to load?
:
: I suspect that has to do with the two dynamically loaded lists: all request
: handlers, all response writers.
:
: it makes the wiki easier to manage, but it's probably slower (i'm guessing
: MoinMoin doesn't cache those "FullSearch" calls.


-Hoss








[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605653#action_12605653
 ] 

Otis Gospodnetic commented on SOLR-303:
---

Ah, yes, I agree with Brian.  I did see this too, but forgot to report it as a 
problem that needs a fix.





[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-06-17 Thread Brian Whitman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605650#action_12605650
 ] 

Brian Whitman commented on SOLR-303:


When I give the following request:

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8984/solr&q=woof

With no server running on 8984 I get an error 500 (naturally).

But shouldn't there be an option to skip over servers that aren't responding or 
time out? I'm envisioning a scenario in which this is used to search across 
possibly redundant uniqueIDs, where a server being down is not cause for an 
exception.







[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605648#action_12605648
 ] 

Yonik Seeley commented on SOLR-486:
---

Fixed (committed) the constructor inconsistency.

> Support binary formats for QueryresponseWriter
> --
>
> Key: SOLR-486
> URL: https://issues.apache.org/jira/browse/SOLR-486
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, search
>Reporter: Noble Paul
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch
>
>
> QueryResponse writer only allows text data to be written.
> So it is not possible to implement a binary protocol . Create another 
> interface which has a method 
> write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605645#action_12605645
 ] 

Grant Ingersoll commented on SOLR-572:
--

{quote}
Why is a WhiteSpaceTokenizer being used for tokenizing the value for a 
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer 
if the index is being built from a Solr field?

The above argument also applies to queryAnalyzerFieldType which is being used 
for QueryConverter
{quote}

My understanding was that the spellcheck.q parameter was already analyzed and 
ready to be checked, so all it needed was a conversion to tokens.  As for 
queryAnalyzerFieldType, that assumes the implementation is the 
IndexBasedSpellChecker or some other field-based one that the 
SpellCheckComponent doesn't have access to; hence my reasoning that it needs to 
be handled separately and explicitly, which is why it isn't part of the 
spellchecker configuration.

{quote}
I see that we can specify our own query converter through the queryConverter 
section in solrconfig.xml. But the SpellCheckComponent uses 
SpellingQueryConverter directly instead of an interface. We should add a 
QueryConverter interface if this needs to be pluggable.
{quote}

I thought about making it an abstract base class, but in my mind it is really 
easy to override the SpellingQueryConverter and the component should know how 
to deal with it.

{quote}
If name is omitted from two dictionaries in solrconfig.xml then both get named 
as Default from the SolrSpellChecker#init method and they overwrite each other 
in the spellCheckers map
{quote}

Hmm, not good.  I will fix.

{quote}
How about building the index in the inform() method? I understand that the 
users can build the index using spellcheck.build=true and they can also use 
QuerySenderListener to build the index but this limits the user to use 
FSDirectory because if we use RAMDirectory and solr is restarted, the 
QuerySenderListener never fires and spell checker is left with no index. It's 
not a major inconvenience to use FSDirectory always but then RAMDirectory 
doesn't bring much to the table.
{quote}

I think this gets back to our early discussions about it not working in inform 
b/c we don't have the reader at that point, or something like that.  I really 
don't know the right answer, but do feel free to try it out.  I do think it 
belongs in inform, but not sure if Solr is ready at that point.  As for the 
QuerySenderListener, seems like it should fire if it is restarted, but I admit 
I don't know a whole lot about that functionality.  


> Spell Checker as a Search Component
> ---
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
>  Issue Type: New Feature
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion




[jira] Commented: (SOLR-601) protected QParser.parse() and subclasses

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605640#action_12605640
 ] 

Yonik Seeley commented on SOLR-601:
---

1) I will run "ant test" before committing no matter how trivial the change 
seems
2) I will run "ant test" before committing no matter how trivial the change 
seems

;-)


> protected QParser.parse() and subclasses
> 
>
> Key: SOLR-601
> URL: https://issues.apache.org/jira/browse/SOLR-601
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: Julien Piquot
> Fix For: 1.3
>
>
> : As QParser.parse is protected and QParser.subQuery is public, everything
> : works fine when I run parse() myself (through unit tests). But when I
> : try to run it through a Solr server, I get :
> all of the concrete impls of QParser in the solr code base declare the
> parse() method as public ... i'm not sure why it's protected in the abstract 
> class ... seems wrong to me.




[jira] Commented: (SOLR-601) protected QParser.parse() and subclasses

2008-06-17 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605637#action_12605637
 ] 

Shalin Shekhar Mangar commented on SOLR-601:


Yonik: FooQParser#parse is not compiling after this change. It should also be 
changed to public:

[javac] Compiling 106 source files to 
/home/shalinsmangar/work/oss/solr-trunk/build/tests
[javac] 
/home/shalinsmangar/work/oss/solr-trunk/src/test/org/apache/solr/search/FooQParserPlugin.java:43:
 parse() in org.apache.solr.search.FooQParser cannot override parse() in 
org.apache.solr.search.QParser; attempting to assign weaker access privileges; 
was public
[javac]   protected Query parse() throws ParseException {
[javac] ^
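The compile error above follows from Java's rule that an override may widen, but never narrow, the access of the method it overrides. A minimal sketch of the rule, using hypothetical names rather than the real QParser classes:

```java
// Minimal illustration of Java's override-access rule: a subclass may widen
// a protected method to public, but narrowing public back to protected fails
// to compile ("attempting to assign weaker access privileges").
// BaseParser/ConcreteParser are hypothetical stand-ins, not Solr classes.
public class AccessWideningDemo {
  abstract static class BaseParser {
    protected abstract String parse(); // protected in the abstract class
  }

  static class ConcreteParser extends BaseParser {
    @Override
    public String parse() { // widening protected -> public is legal
      return "parsed";
    }
  }

  public static void main(String[] args) {
    System.out.println(new ConcreteParser().parse()); // prints "parsed"
  }
}
```

Once the abstract parse() is made public, every subclass must declare it public as well, which is why FooQParser's protected override stops compiling.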

> protected QParser.parse() and subclasses
> 
>
> Key: SOLR-601
> URL: https://issues.apache.org/jira/browse/SOLR-601
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: Julien Piquot
> Fix For: 1.3
>
>
> : As QParser.parse is protected and QParser.subQuery is public, everything
> : works fine when I run parse() myself (through unit tests). But when I
> : try to run it through a Solr server, I get :
> all of the concrete impls of QParser in the solr code base declare the
> parse() method as public ... i'm not sure why it's protected in the abstract 
> class ... seems wrong to me.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-17 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605635#action_12605635
 ] 

Shalin Shekhar Mangar commented on SOLR-572:


A few questions/comments:

# Why is a WhitespaceTokenizer being used for tokenizing the value for a 
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer 
if the index is being built from a Solr field?
# The above argument also applies to queryAnalyzerFieldType which is being used 
for QueryConverter.
# I see that we can specify our own query converter through the queryConverter 
section in solrconfig.xml. But the SpellCheckComponent uses 
SpellingQueryConverter directly instead of an interface. We should add a 
QueryConverter interface if this needs to be pluggable.
# If name is omitted from two dictionaries in solrconfig.xml then both get 
named as Default from the SolrSpellChecker#init method and they overwrite each 
other in the spellCheckers map
# How about building the index in the inform() method? I understand that the 
users can build the index using spellcheck.build=true and they can also use 
QuerySenderListener to build the index but this limits the user to use 
FSDirectory because if we use RAMDirectory and solr is restarted, the 
QuerySenderListener never fires and spell checker is left with no index. It's 
not a major inconvenience to use FSDirectory always but then RAMDirectory 
doesn't bring much to the table.
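The name collision in point 4 is the usual Map overwrite: two configurations that omit "name" fall back to the same default key, so the second put() silently replaces the first. A hypothetical sketch (not the real SolrSpellChecker#init code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the naming collision described in point 4: two spellchecker
// configs that omit "name" both fall back to the same default key, so the
// second put() silently replaces the first entry in the spellCheckers map.
// The default name and register() helper are assumptions for illustration.
public class SpellCheckerNameClash {
  static final String DEFAULT_NAME = "default";

  public static Map<String, String> register(String... configuredNames) {
    Map<String, String> spellCheckers = new HashMap<>();
    for (String name : configuredNames) {
      String key = (name == null || name.isEmpty()) ? DEFAULT_NAME : name;
      // Duplicate keys overwrite: no error, the earlier checker just vanishes.
      spellCheckers.put(key, "checker-for-" + key);
    }
    return spellCheckers;
  }

  public static void main(String[] args) {
    // Two unnamed dictionaries collapse into a single map entry.
    System.out.println(register(null, null).size()); // prints 1, not 2
  }
}
```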



> Spell Checker as a Search Component
> ---
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
>  Issue Type: New Feature
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion




[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter

2008-06-17 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605629#action_12605629
 ] 

Noble Paul commented on SOLR-486:
-

These two constructors are inconsistent
{code:title=CommonsHttpSolrServer.java}
public CommonsHttpSolrServer(URL baseURL) {
  this(baseURL, null, new BinaryResponseParser());
}

public CommonsHttpSolrServer(URL baseURL, HttpClient client) {
  this(baseURL, client, new XMLResponseParser());
}
{code}
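One way to resolve the inconsistency is to have both shorter constructors delegate to a three-argument form with the same default parser. A stubbed sketch of that shape, with stand-in types rather than the real solrj/httpclient classes:

```java
// Illustrative stub of one possible fix: both convenience constructors
// delegate to the three-argument form with the same default parser.
// ResponseParser, BinaryResponseParser and HttpClient here are stand-ins,
// not the real solrj/httpclient classes.
public class ConsistentCtorSketch {
  static class ResponseParser {}
  static class BinaryResponseParser extends ResponseParser {}
  static class HttpClient {}

  final String baseURL;
  final HttpClient client;
  final ResponseParser parser;

  public ConsistentCtorSketch(String baseURL) {
    this(baseURL, null, new BinaryResponseParser());
  }

  public ConsistentCtorSketch(String baseURL, HttpClient client) {
    // Same default as the one-argument constructor, so behavior no longer
    // depends on which overload the caller happened to pick.
    this(baseURL, client, new BinaryResponseParser());
  }

  public ConsistentCtorSketch(String baseURL, HttpClient client, ResponseParser parser) {
    this.baseURL = baseURL;
    this.client = client;
    this.parser = parser;
  }
}
```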

> Support binary formats for QueryresponseWriter
> --
>
> Key: SOLR-486
> URL: https://issues.apache.org/jira/browse/SOLR-486
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, search
>Reporter: Noble Paul
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch
>
>
> QueryResponse writer only allows text data to be written.
> So it is not possible to implement a binary protocol. Create another 
> interface which has a method 
> write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)




[jira] Commented: (SOLR-236) Field collapsing

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605612#action_12605612
 ] 

Yonik Seeley commented on SOLR-236:
---

Since this is adding new interface/API, it would be very nice if one could 
easily review it.  It's very important that the interface and the exact 
semantics are nailed down IMO (there seem to be a lot of options).
Is http://wiki.apache.org/solr/FieldCollapsing up-to-date?

There don't seem to be any tests either.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Otis Gospodnetic
> Attachments: field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605601#action_12605601
 ] 

Yonik Seeley commented on SOLR-486:
---

OK, I just committed the latest changes.

> Support binary formats for QueryresponseWriter
> --
>
> Key: SOLR-486
> URL: https://issues.apache.org/jira/browse/SOLR-486
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java, search
>Reporter: Noble Paul
>Assignee: Yonik Seeley
> Fix For: 1.3
>
> Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch, SOLR-486.patch
>
>
> QueryResponse writer only allows text data to be written.
> So it is not possible to implement a binary protocol. Create another 
> interface which has a method 
> write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)




[jira] Resolved: (SOLR-601) protected QParser.parse() and subclasses

2008-06-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-601.
---

   Resolution: Fixed
Fix Version/s: 1.3

I changed parse() to public.

> protected QParser.parse() and subclasses
> 
>
> Key: SOLR-601
> URL: https://issues.apache.org/jira/browse/SOLR-601
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
>Reporter: Julien Piquot
> Fix For: 1.3
>
>
> : As QParser.parse is protected and QParser.subQuery is public, everything
> : works fine when I run parse() myself (through unit tests). But when I
> : try to run it through a Solr server, I get :
> all of the concrete impls of QParser in the solr code base declare the
> parse() method as public ... i'm not sure why it's protected in the abstract 
> class ... seems wrong to me.




[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605593#action_12605593
 ] 

Yonik Seeley commented on SOLR-600:
---

I have not been able to reproduce this.
Is it possible to create a test case to reproduce?
Does this happen if you use the bundled Jetty?
What is the exact Java version you are using?

> XML parser stops working under heavy load
> -
>
> Key: SOLR-600
> URL: https://issues.apache.org/jira/browse/SOLR-600
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
> Environment: Linux 2.6.19.7-ss0 #4 SMP Wed Mar 12 02:56:42 GMT 2008 
> x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
> Tomcat 6.0.16
> SOLR nightly 16 Jun 2008, and versions prior
> JRE 1.6.0
>Reporter: John Smith
>
> Under heavy load, the following is spat out for every update:
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> at java.util.AbstractList$SimpleListIterator.hasNext(Unknown Source)
> at 
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:225)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:66)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:735)




[jira] Resolved: (SOLR-595) support field level boosting to morelikethis handler.

2008-06-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-595.
--

Resolution: Fixed

Committed revision 668638.

Made one minor change from the patch: check that boostFields is non-empty 
before looping over all the query clauses.
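The dismax-style boost syntax the issue refers to (e.g. "title^2.0 body^0.5 url") can be parsed into a field-to-boost map roughly as follows. parseBoosts is a hypothetical helper for illustration, not the actual Solr utility the patch reuses:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch of parsing a dismax-style qf string such as
// "title^2.0 body^0.5 url" into a field -> boost map. parseBoosts is a
// hypothetical helper, not the real Solr parsing code.
public class QfBoostParseSketch {
  public static Map<String, Float> parseBoosts(String qf) {
    Map<String, Float> boosts = new LinkedHashMap<>();
    if (qf == null) return boosts;
    for (String clause : qf.trim().split("\\s+")) {
      if (clause.isEmpty()) continue;
      int caret = clause.indexOf('^');
      if (caret < 0) {
        boosts.put(clause, 1.0f); // no explicit boost: default to 1
      } else {
        boosts.put(clause.substring(0, caret),
                   Float.parseFloat(clause.substring(caret + 1)));
      }
    }
    return boosts;
  }

  public static void main(String[] args) {
    System.out.println(parseBoosts("title^2.0 body^0.5 url"));
  }
}
```

Each MLT query term's boost would then be looked up in this map by source field and multiplied into the existing term boost.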

> support field level boosting to morelikethis handler.
> -
>
> Key: SOLR-595
> URL: https://issues.apache.org/jira/browse/SOLR-595
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Thomas Morton
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-595.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Allow boosting to be specified for particular fields when using more like 
> this.
> # Parse out the mlt.qf parameter to get boosts in dismax-like format (the 
> existing DisMax param parsing code is used to produce a Map)
> # Iterate through the MLT query terms, get the boost by looking at the field 
> each term came from, and multiply the boost specified in the map by the 
> existing term boost.
> * If mlt.boost=false, you get the same boost values as in the map/mlt.qf 
> parameters.
> * If mlt.boost=true, you get the normalized boost multiplied by the specified 
> boost (which makes sense to me).




[jira] Work started: (SOLR-595) support field level boosting to morelikethis handler.

2008-06-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-595 started by Grant Ingersoll.

> support field level boosting to morelikethis handler.
> -
>
> Key: SOLR-595
> URL: https://issues.apache.org/jira/browse/SOLR-595
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Thomas Morton
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-595.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Allow boosting to be specified for particular fields when using more like 
> this.
> # Parse out the mlt.qf parameter to get boosts in dismax-like format (the 
> existing DisMax param parsing code is used to produce a Map)
> # Iterate through the MLT query terms, get the boost by looking at the field 
> each term came from, and multiply the boost specified in the map by the 
> existing term boost.
> * If mlt.boost=false, you get the same boost values as in the map/mlt.qf 
> parameters.
> * If mlt.boost=true, you get the normalized boost multiplied by the specified 
> boost (which makes sense to me).




[jira] Created: (SOLR-601) protected QParser.parse() and subclasses

2008-06-17 Thread Julien Piquot (JIRA)
protected QParser.parse() and subclasses


 Key: SOLR-601
 URL: https://issues.apache.org/jira/browse/SOLR-601
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Julien Piquot


: As QParser.parse is protected and QParser.subQuery is public, everything
: works fine when I run parse() myself (through unit tests). But when I
: try to run it through a Solr server, I get :

all of the concrete impls of QParser in the solr code base declare the
parse() method as public ... i'm not sure why it's protected in the abstract 
class ... seems wrong to me.

