[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-01 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564690#action_12564690
 ] 

Thomas Peuss commented on SOLR-127:
---

{quote}
* the test classes still need some work, both in terms of the current 
failure mentioned above, and to cover more permutations of options. When all is 
said and done, we'll probably want at least 3 separate sets of test/configs:
 1. default, no httpCaching section in config at all ... should 
generate Last-Mod and ETag headers and do validation; stopping/starting the 
port should make Last-Mod change but not ETag.
 2. never304=false, lastModFrom=dirLastMod ... should generate 
Last-Mod and ETag headers and do validation; no headers should change if we 
stop/start the port.
 3. never304=true ... no Last-Mod or ETag headers, no 304 even if we 
send a crazy old If-Modified-Since
* there's also probably some refactoring that can still be done in the 
tests (I noticed some duplicate code that can be moved up into the Base class)
{quote}

I'll take care of the tests.

{quote}
* it occurred to me while adding the etagSeed that right now the ETag 
caching is a singleton; we'll need to make this core-specific (using a 
WeakHashMap, I guess? I'm not fond of that approach, but these are really tiny 
pieces of info we are caching)
* calcLastModified and calcEtag currently assume they can get 
requestDispatcher/httpCaching config options from SolrConfig ... but this needs 
to be reconciled with SOLR-350, where there is a plan to move all 
requestDispatcher configs to multicore.xml (but I've pointed out in that issue 
that I'm not sure that is necessary or makes sense.)
{quote}

If I remember right, every core has its own classloader. That means every core 
has its own set of static fields (which is why real singletons are not that 
easy to do in Java).
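
For illustration, a minimal sketch of the core-specific ETag cache idea from 
the quoted comment, keyed on the core through a WeakHashMap; the class and 
method names here are hypothetical, not the SOLR-127 patch:

{code}
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class PerCoreEtagCache {
  // Weak keys let a cache entry disappear once its core is unloaded and
  // garbage-collected, which is the appeal of a WeakHashMap here.
  private static final Map<Object, String> ETAGS =
      Collections.synchronizedMap(new WeakHashMap<Object, String>());

  /** Remember the ETag computed for this core (e.g. when a new searcher opens). */
  public static void store(Object core, String etagSeed, long indexVersion) {
    ETAGS.put(core, "\"" + Long.toHexString(etagSeed.hashCode() ^ indexVersion) + "\"");
  }

  /** The ETag for this core, or null if none has been computed yet. */
  public static String lookup(Object core) {
    return ETAGS.get(core);
  }
}
{code}

Note that per-core classloaders only isolate statics when each core actually 
loads the class through its own loader; a map keyed on the core avoids relying 
on that.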

 Make Solr more friendly to external HTTP caches
 ---

 Key: SOLR-127
 URL: https://issues.apache.org/jira/browse/SOLR-127
 Project: Solr
  Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: CacheUnitTest.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch


 An offhand comment I saw recently reminded me of something that really bugged 
 me about the search solution I used *before* Solr -- it didn't play nicely 
 with HTTP caches that might be sitting in front of it.
 At the moment, Solr doesn't put particularly useful info in the HTTP 
 Response headers to aid in caching (i.e. Last-Modified), responds to all HEAD 
 requests with a 400, and doesn't do anything special with If-Modified-Since.
 At the very least, we can set a Last-Modified based on when the current 
 IndexReader was opened (if not the Date on the IndexReader) and use the same 
 info to determine how to respond to If-Modified-Since requests.
 (For the record, I think the reason this hasn't occurred to me in the 2+ years 
 I've been using Solr is that with the internal caching, I've yet to need 
 to put a proxy cache in front of Solr.)
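
For illustration, a minimal sketch (not Solr's actual code) of the 
conditional-request handling described above, deriving Last-Modified from when 
the current IndexReader was opened; the class and parameter names are 
assumptions:

{code}
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CacheHeaderSketch {
  /** Returns true if a 304 was sent and the caller should skip the body. */
  public static boolean handleConditional(HttpServletRequest req,
                                          HttpServletResponse resp,
                                          long readerOpenedAt) {
    // HTTP dates have one-second resolution, so round down before comparing.
    long lastMod = (readerOpenedAt / 1000L) * 1000L;
    long ifModifiedSince = req.getDateHeader("If-Modified-Since");
    if (ifModifiedSince != -1 && lastMod <= ifModifiedSince) {
      resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
      return true;
    }
    resp.setDateHeader("Last-Modified", lastMod);
    return false;
  }
}
{code}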

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-01 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564690#action_12564690
 ] 

tpeuss edited comment on SOLR-127 at 2/1/08 1:09 AM:
---

{quote}
* the test classes still need some work, both in terms of the current 
failure mentioned above, and to cover more permutations of options. When all is 
said and done, we'll probably want at least 3 separate sets of test/configs:
 1. default, no httpCaching section in config at all ... should 
generate Last-Mod and ETag headers and do validation; stopping/starting the 
port should make Last-Mod change but not ETag.
 2. never304=false, lastModFrom=dirLastMod ... should generate 
Last-Mod and ETag headers and do validation; no headers should change if we 
stop/start the port.
 3. never304=true ... no Last-Mod or ETag headers, no 304 even if we 
send a crazy old If-Modified-Since
* there's also probably some refactoring that can still be done in the 
tests (I noticed some duplicate code that can be moved up into the Base class)
{quote}

I'll take care of the tests.

{quote}
* it occurred to me while adding the etagSeed that right now the ETag 
caching is a singleton; we'll need to make this core-specific (using a 
WeakHashMap, I guess? I'm not fond of that approach, but these are really tiny 
pieces of info we are caching)
* calcLastModified and calcEtag currently assume they can get 
requestDispatcher/httpCaching config options from SolrConfig ... but this needs 
to be reconciled with SOLR-350, where there is a plan to move all 
requestDispatcher configs to multicore.xml (but I've pointed out in that issue 
that I'm not sure that is necessary or makes sense.)
{quote}

If I remember right, every core has its own classloader. That means every core 
has its own set of static fields. This is why real singletons are not that easy 
to do in Java.

  was (Author: tpeuss):
{quote}
* the test classes still need some work, both in terms of the current 
failure mentioned above, and to cover more permutations of options. When all is 
said and done, we'll probably want at least 3 separate sets of test/configs:
 1. default, no httpCaching section in config at all ... should 
generate Last-Mod and ETag headers and do validation; stopping/starting the 
port should make Last-Mod change but not ETag.
 2. never304=false, lastModFrom=dirLastMod ... should generate 
Last-Mod and ETag headers and do validation; no headers should change if we 
stop/start the port.
 3. never304=true ... no Last-Mod or ETag headers, no 304 even if we 
send a crazy old If-Modified-Since
* there's also probably some refactoring that can still be done in the 
tests (I noticed some duplicate code that can be moved up into the Base class)
{quote}

I'll take care of the tests.

{quote}
* it occurred to me while adding the etagSeed that right now the ETag 
caching is a singleton; we'll need to make this core-specific (using a 
WeakHashMap, I guess? I'm not fond of that approach, but these are really tiny 
pieces of info we are caching)
* calcLastModified and calcEtag currently assume they can get 
requestDispatcher/httpCaching config options from SolrConfig ... but this needs 
to be reconciled with SOLR-350, where there is a plan to move all 
requestDispatcher configs to multicore.xml (but I've pointed out in that issue 
that I'm not sure that is necessary or makes sense.)
{quote}

If I remember right, every core has its own classloader. That means every core 
has its own set of static fields (which is why real singletons are not that 
easy to do in Java).
  
 Make Solr more friendly to external HTTP caches
 ---

 Key: SOLR-127
 URL: https://issues.apache.org/jira/browse/SOLR-127
 Project: Solr
  Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: CacheUnitTest.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
 HTTPCaching.patch, HTTPCaching.patch


 An offhand comment I saw recently reminded me of something that really bugged 
 me about the search solution I used *before* Solr -- it didn't play nicely 
 with HTTP caches that might be sitting in front of it.
 At the moment, Solr doesn't put particularly useful info in the HTTP 
 Response headers to aid in caching (i.e. Last-Modified), responds to all HEAD 
 requests with a 400, and doesn't do anything special with If-Modified-Since.
 At the very least, we can set a Last-Modified based on when the current 
 IndexReader was opened (if not the Date on the IndexReader) and use the same 
 info to determine how to respond to If-Modified-Since requests.

[jira] Created: (SOLR-469) DB Import RequestHandler

2008-02-01 Thread Noble Paul (JIRA)
DB Import RequestHandler


 Key: SOLR-469
 URL: https://issues.apache.org/jira/browse/SOLR-469
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.3
Reporter: Noble Paul
Priority: Minor
 Fix For: 1.3


We need a RequestHandler which can import data from a DB or other data sources 
into the Solr index. Think of it as an advanced form of the SqlUpload plugin 
(SOLR-103).

The way it works is as follows.

* Provide a configuration file (XML) to the Handler which takes in the 
necessary SQL queries and mappings to a Solr schema
  - It also takes in a properties file for the data source 
configuration
* Given the configuration it can also generate the Solr schema.xml
* It is registered as a RequestHandler which can take two commands: 
do-full-import and do-delta-import (see the sketch after this list)
  - do-full-import - dumps all the data from the database into the 
index (based on the SQL query in the configuration)
  - do-delta-import - dumps all the data that has changed since the last 
import (we assume a modified-timestamp column in the tables)
* It provides an admin page
  - where we can schedule it to be run automatically at regular 
intervals
  - It shows the status of the Handler (idle, full-import, delta-import)
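
For illustration, a hypothetical sketch of the command dispatch described in 
the list above; the class and method names are assumptions based on this 
description, not the actual SOLR-469 code:

{code}
public class DbImportSketch {
  /** Dispatch on the command parameter named in the description. */
  public String handle(String command) {
    if ("do-full-import".equals(command)) {
      return runFullImport();   // dump all rows selected by the configured SQL
    } else if ("do-delta-import".equals(command)) {
      return runDeltaImport();  // only rows changed since the last import,
                                // using the modified-timestamp column
    }
    return "idle";              // mirrors the idle/full-import/delta-import states
  }

  private String runFullImport()  { return "full-import"; }
  private String runDeltaImport() { return "delta-import"; }
}
{code}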

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-281) Search Components (plugins)

2008-02-01 Thread Michael Dodsworth (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564739#action_12564739
 ] 

Michael Dodsworth commented on SOLR-281:


{quote} 
That would require instantiation with reflection, I think. 
{quote} 

Reflection is already being used to create the QParserPlugins (SolrCore:1027 
and AbstractPluginLoader:83) - I'm guessing the reason for the plugin is just 
to avoid creating instances through reflection on every parse (as you could 
keep hold of the QParser class and call newInstance). The second point is moot 
once you take away the need for createParser(...). 

It's really not that big a deal, in the scheme of things. 
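
For illustration, a small sketch of the two instantiation paths being compared; 
ParserPluginLike is a stand-in, not Solr's actual QParserPlugin API:

{code}
public class PluginVsReflection {
  // Stand-in for the plugin base class.
  abstract static class ParserPluginLike {
    abstract Object createParser(String qstr);
  }

  // Reflection path: a fresh instance is looked up and created on every parse.
  static Object viaReflection(Class<?> parserClass) throws Exception {
    return parserClass.newInstance();
  }

  // Plugin path: reflection happens once at load time; afterwards the cached
  // plugin instance hands out parsers through a plain virtual call.
  static Object viaPlugin(ParserPluginLike plugin, String qstr) {
    return plugin.createParser(qstr);
  }
}
{code}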

{quote} 
QParserPlugin is that interface, essentially (except that it's a class instead 
of an interface). For library maintainers an abstract class is preferred over 
an interface for things that a user will extend... that way signature changes 
can be made in a backward-compatible manner. 
{quote} 

As an aside, method signature changes are usually trivial to fix; personally, 
the pain of those fixes is preferable to extending an abstract class 
unnecessarily. 
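
For illustration, a minimal example of the compatibility argument, using 
hypothetical plugin types under pre-Java-8 rules (no default methods on 
interfaces):

{code}
abstract class PluginBase {
  abstract void init(String name);

  // A method added in a later release can ship a body, so subclasses
  // compiled against the old base class keep working unchanged.
  void init(String name, java.util.Map<String, String> args) {
    init(name);
  }
}

interface PluginInterface {
  void init(String name);
  // Adding a second method here would break every existing implementor.
}
{code}
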
Are there any architectural reworking projects on the roadmap? I'm sure 
backward compatibility is a massive concern; perhaps with the more modular 
plugin design route Solr is going down, those concerns can be addressed. If 
there's a chance of being accepted, I would love to contribute a move towards 
using Spring. 



 Search Components (plugins)
 ---

 Key: SOLR-281
 URL: https://issues.apache.org/jira/browse/SOLR-281
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: SOLR-281-ComponentInit.patch, 
 SOLR-281-ComponentInit.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 solr-281.patch, solr-281.patch, solr-281.patch


 A request handler with pluggable search components for things like:
   - standard
   - dismax
   - more-like-this
   - highlighting
   - field collapsing 
 For more discussion, see:
 http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-330) Use new Lucene Token APIs (reuse and char[] buff)

2008-02-01 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-330:
-

Attachment: SOLR-330.patch

First draft of a patch that updates the various TokenFilters, etc. in Solr to 
use the new Lucene reuse API.  Notes on implementation below:

Also, it cleans up some of the javadocs in various files.

Added a test for the Porter stemmer.

Cleaned up some string literals to be constants so that they can be safely 
referred to in the tests.

In the PatternTokenFilter, it would be cool if there were a way to just operate 
on the char array, but I don't see that the Pattern/Matcher API supports it.

Same goes for the PhoneticTokenFilter.

I'm not sure yet if the BufferedTokenStream can take advantage of reuse, so I 
have left them alone for now, other than some minor doc fixes.  I will think 
about this some more.

In RemoveDuplicatesTF, I only converted to using termBuffer, not Token reuse.   
I removed the IN and OUT loop labels, as I don't see what functionality 
they provide.

Added ArraysUtils class and test to provide a bit more functionality than 
Arrays.java offers in terms of comparing two char arrays.  This could be 
expanded at some point to cover other primitive comparisons.

My understanding of the new reusableTokenStream is that we can't use it in the 
SolrAnalyzer.

On the TrimFilter, it is not clear to me that there would ever be a token that 
is all whitespace. However, since the test handles it, I wonder why an 
all-whitespace Token, when update offsets are on, reports the offsets as the 
end and not the start. Just a minor nit, but it seems like the start/end 
offsets should be 0, not the end of the token.

I'm not totally sure on the WordDelimiterFilter, as there is a fair amount of 
new token creation. Also, I think the newTok() method doesn't set the 
position increment based on the original position increment, so I added that.

I'm also not completely sure how to handle FieldType's DefaultAnalyzer.next(). 
It seems like it could reuse the token.

I'm also not sure why there is duplicate code for the MultiValueTokenStream in 
HighlighterUtils and SolrHighlighter, so I left the highlighter TokenStreams 
alone.
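
For illustration, a minimal sketch of the reuse pattern these notes describe, 
written against the Lucene 2.3-era next(Token)/termBuffer API; this filter is 
illustrative, not a file from the patch:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class ReusingLowerCaseFilter extends TokenFilter {
  public ReusingLowerCaseFilter(TokenStream input) {
    super(input);
  }

  public Token next(Token result) throws IOException {
    Token t = input.next(result);  // let the upstream stream fill the reusable token
    if (t == null) return null;
    char[] buf = t.termBuffer();   // mutate the char[] in place, no String created
    for (int i = 0; i < t.termLength(); i++) {
      buf[i] = Character.toLowerCase(buf[i]);
    }
    return t;
  }
}
{code}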

 Use new Lucene Token APIs (reuse and char[] buff)
 -

 Key: SOLR-330
 URL: https://issues.apache.org/jira/browse/SOLR-330
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-330.patch


 Lucene is getting new Token APIs for better performance.
 - token reuse
 - char[] offset + len instead of String
 Requires a new version of Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-281) Search Components (plugins)

2008-02-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564853#action_12564853
 ] 

Yonik Seeley commented on SOLR-281:
---

Followed up on solr-dev to avoid stealing more of this JIRA issue:
http://www.nabble.com/Re%3A--jira--Commented%3A-Search-Components-%28plugins%29-to15227648.html#a15227648

 Search Components (plugins)
 ---

 Key: SOLR-281
 URL: https://issues.apache.org/jira/browse/SOLR-281
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: SOLR-281-ComponentInit.patch, 
 SOLR-281-ComponentInit.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 solr-281.patch, solr-281.patch, solr-281.patch


 A request handler with pluggable search components for things like:
   - standard
   - dismax
   - more-like-this
   - highlighting
   - field collapsing 
 For more discussion, see:
 http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2008-02-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564896#action_12564896
 ] 

Yonik Seeley commented on SOLR-139:
---

I'm having second thoughts about whether this is a good enough approach to 
really put in core Solr.
Requiring that all fields be stored is a really large drawback, especially for 
large indices with really large documents.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way Lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 For background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2008-02-01 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564910#action_12564910
 ] 

Ryan McKinley commented on SOLR-139:


That is part of why I thought having it in an update request processor makes 
sense -- it can easily be subclassed to pull the existing fields from wherever 
it needs. Even if it is directly in the UpdateHandler, there could be some 
interface like _loadExistingFields(id)_ or something similar.
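
For illustration, a minimal sketch of the subclassing hook described above; the 
names, including loadExistingFields, are hypothetical:

{code}
import java.util.HashMap;
import java.util.Map;

public abstract class ModifiableDocSketch {
  /** Override to pull the existing stored fields from wherever they live. */
  protected abstract Map<String, Object> loadExistingFields(String id);

  /** Merge the incoming field changes over the previously stored document. */
  public Map<String, Object> merge(String id, Map<String, Object> changes) {
    Map<String, Object> doc = new HashMap<String, Object>(loadExistingFields(id));
    doc.putAll(changes); // incoming values overwrite stored ones
    return doc;
  }
}
{code}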

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way Lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 For background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2008-02-01 Thread Charles Hornberger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564952#action_12564952
 ] 

Charles Hornberger commented on SOLR-236:
-

NegatedDocSet is throwing Unsupported Operation exceptions:

org.apache.solr.common.SolrException: Unsupported Operation
 at org.apache.solr.search.NegatedDocSet.iterator(NegatedDocSet.java:77)
 at org.apache.solr.search.DocSetBase.getBits(DocSet.java:183)
 at org.apache.solr.search.NegatedDocSet.getBits(NegatedDocSet.java:27)
 at org.apache.solr.search.DocSetBase.intersection(DocSet.java:199)
 at org.apache.solr.search.BitDocSet.intersection(BitDocSet.java:30)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1109)
 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:811)
 at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1258)
 at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:103)
 at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:155)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:275)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
 at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:595)

Not quite sure what search is triggering this path through the code, but it is 
not happening on every request, just some ... I am firing up the debugger now 
to see what I can learn, but thought I'd post this anyway to see if anyone has 
any tips.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2008-02-01 Thread Charles Hornberger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564966#action_12564966
 ] 

clh edited comment on SOLR-236 at 2/1/08 2:59 PM:
-

Ah ... got the beginnings of a diagnosis. The problem appears when the DocSet 
{{qDocSet}} returned by DocSetHitCollector.getDocSet() -- called at 
org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the 
field_collapsing patch applied, inside getDocListAndSetNC() -- is a BitDocSet, 
and not when it's a HashDocSet. As the stack trace above shows, calling 
intersection() on a BitDocSet object invokes the superclass's 
DocSetBase.intersection() method, which invokes a call chain that blows up when 
it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} 
parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{code}
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
{code}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for 
dealing with {{other}} parameters that are instances of HashDocSet; does it 
also need special-casing logic for dealing with {{other}} parameters that are 
NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or 
something else entirely?

  was (Author: clh):
Ah ... got the beginnings of a diagnosis. The problem appears when the 
DocSet {{qDocSet}} returned by DocSetHitCollector.getDocSet() -- called at 
org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the 
field_collapsing patch applied, inside getDocListAndSetNC() -- is a BitDocSet, 
and not when it's a HashDocSet. As the stack trace above shows, calling 
intersection() on a BitDocSet object invokes the superclass's 
DocSetBase.intersection() method, which invokes a call chain that blows up when 
it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} 
parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{{
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
}}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for 
dealing with {{other}} parameters that are instances of HashDocSet; does it 
also need special-casing logic for dealing with {{other}} parameters that are 
NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or 
something else entirely?
  
 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2008-02-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565019#action_12565019
 ] 

Yonik Seeley commented on SOLR-236:
---

I haven't been following this, so I don't know why there is a need for a 
NegatedDocSet (or if introducing it is the best solution), but it looks like 
you have two cases to handle: one negative set or two negative sets.
If you have a and -b, then return a.andNot(b).
If both a and b are negative (-a.intersection(-b)), then return 
NegatedDocSet(a.union(b))  // per De Morgan, -a & -b == -(a|b)

That's only for intersection(), of course.
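
For illustration, the two cases written out as a sketch; the DocSet and 
NegatedDocSet shapes below are stand-ins with only the operations the comment 
uses, not Solr's actual API:

{code}
interface DocSet {
  DocSet intersection(DocSet other);
  DocSet union(DocSet other);
  DocSet andNot(DocSet other);
}

class NegatedDocSet implements DocSet {
  final DocSet positive; // the set being negated

  NegatedDocSet(DocSet positive) { this.positive = positive; }

  static DocSet intersect(DocSet a, DocSet b) {
    boolean negA = a instanceof NegatedDocSet;
    boolean negB = b instanceof NegatedDocSet;
    if (negA && negB) { // De Morgan: -a & -b == -(a|b)
      return new NegatedDocSet(((NegatedDocSet) a).positive
          .union(((NegatedDocSet) b).positive));
    }
    if (negB) return a.andNot(((NegatedDocSet) b).positive); // a & -b
    if (negA) return b.andNot(((NegatedDocSet) a).positive); // -a & b
    return a.intersection(b);
  }

  public DocSet intersection(DocSet other) { return intersect(this, other); }
  public DocSet union(DocSet other) { throw new UnsupportedOperationException(); }
  public DocSet andNot(DocSet other) { throw new UnsupportedOperationException(); }
}
{code}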

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Attachments: field-collapsing-extended-592129.patch, 
 field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch


 This patch includes a new feature called Field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for the current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-01 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Updated to work against trunk.

As always, let me know if there is anything I can do to help get this committed.

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To fix this, we will need the trunk version of Lucene (or 
 wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.  
 We also need to think about how we want to incorporate the new merge 
 scheduling functionality (the new default in Lucene is to do merges in a 
 background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.