[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564690#action_12564690 ] Thomas Peuss commented on SOLR-127: ---

{quote}
* the test classes still need some work, both in terms of the current failure mentioned above, and to cover more permutations of options. When we're all said and done, we'll probably want at least 3 separate sets of test/configs:
1. default, no httpCaching section in config at all ... should generate Last-Mod and ETag headers and do validation; stopping/starting the port should make Last-Mod change but not ETag.
2. never304=false, lastModFrom=dirLastMod ... should generate Last-Mod and ETag headers and do validation; no headers should change if we stop/start the port.
3. never304=true ... no Last-Mod or ETag headers, no 304 even if we send a crazy old If-Modified-Since.
* there's also probably some refactoring that can still be done in the tests (I noticed some duplicate code that can be moved up into the Base class)
{quote}

I'll take care of the tests.

{quote}
* it occurred to me while adding the etagSeed that right now the etag caching is a singleton; we'll need to make this core-specific (using a WeakHashMap, I guess? I'm not fond of that approach, but these are really tiny pieces of info we are caching)
* calcLastModified and calcEtag currently assume they can get requestDispatcher/httpCaching config options from SolrConfig ... but this needs to be reconciled with SOLR-350, where there is a plan to move all requestDispatcher configs to multicore.xml (but I've pointed out in that issue that I'm not sure that is necessary or makes sense.)
{quote}

If I remember correctly, every core has its own classloader, so every core has its own set of static fields (which is why real singletons are not that easy to do in Java).

Make Solr more friendly to external HTTP caches --- Key: SOLR-127 URL: https://issues.apache.org/jira/browse/SOLR-127 Project: Solr Issue Type: Wish Reporter: Hoss Man Assignee: Hoss Man Fix For: 1.3 Attachments: CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch an offhand comment I saw recently reminded me of something that really bugged me about the search solution I used *before* Solr -- it didn't play nicely with HTTP caches that might be sitting in front of it. At the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (i.e., Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since. At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests. (For the record, I think the reason this hasn't occurred to me in the 2+ years I've been using Solr is because, with the internal caching, I've yet to need to put a proxy cache in front of Solr.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
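[Editorial sketch] A minimal illustration of the classloader point above: if two sibling classloaders each define the same class, each resulting Class object carries its own copy of the statics. The jar name and the com.example.Counter class (assumed to contain a public static int count field) are hypothetical.

{code}
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;

public class PerLoaderStatics {
    public static void main(String[] args) throws Exception {
        // Hypothetical jar containing com.example.Counter { public static int count; }
        URL jar = new URL("file:counter.jar");

        // Parent is null, so neither loader can share the class via delegation;
        // each defines its own Class object for com.example.Counter.
        URLClassLoader a = new URLClassLoader(new URL[] { jar }, null);
        URLClassLoader b = new URLClassLoader(new URL[] { jar }, null);

        Class<?> ca = a.loadClass("com.example.Counter");
        Class<?> cb = b.loadClass("com.example.Counter");
        System.out.println(ca == cb); // false: two distinct classes

        Field fa = ca.getField("count");
        Field fb = cb.getField("count");
        fa.setInt(null, 42);                 // mutate the static seen by loader a
        System.out.println(fb.getInt(null)); // 0: loader b's copy is unaffected
    }
}
{code}

If each core really does get its own classloader, this is also why a static ETag cache would not leak across cores.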
[jira] Issue Comment Edited: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564690#action_12564690 ] tpeuss edited comment on SOLR-127 at 2/1/08 1:09 AM: ---

{quote}
* the test classes still need some work, both in terms of the current failure mentioned above, and to cover more permutations of options. When we're all said and done, we'll probably want at least 3 separate sets of test/configs:
1. default, no httpCaching section in config at all ... should generate Last-Mod and ETag headers and do validation; stopping/starting the port should make Last-Mod change but not ETag.
2. never304=false, lastModFrom=dirLastMod ... should generate Last-Mod and ETag headers and do validation; no headers should change if we stop/start the port.
3. never304=true ... no Last-Mod or ETag headers, no 304 even if we send a crazy old If-Modified-Since.
* there's also probably some refactoring that can still be done in the tests (I noticed some duplicate code that can be moved up into the Base class)
{quote}

I'll take care of the tests.

{quote}
* it occurred to me while adding the etagSeed that right now the etag caching is a singleton; we'll need to make this core-specific (using a WeakHashMap, I guess? I'm not fond of that approach, but these are really tiny pieces of info we are caching)
* calcLastModified and calcEtag currently assume they can get requestDispatcher/httpCaching config options from SolrConfig ... but this needs to be reconciled with SOLR-350, where there is a plan to move all requestDispatcher configs to multicore.xml (but I've pointed out in that issue that I'm not sure that is necessary or makes sense.)
{quote}

If I remember correctly, every core has its own classloader, so every core has its own set of static fields. This is why real singletons are not that easy to do in Java.
Make Solr more friendly to external HTTP caches --- Key: SOLR-127 URL: https://issues.apache.org/jira/browse/SOLR-127 Project: Solr Issue Type: Wish Reporter: Hoss Man Assignee: Hoss Man Fix For: 1.3 Attachments: CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch an offhand comment I saw recently reminded me of something that really bugged me about the search solution I used *before* Solr -- it didn't play nicely with HTTP caches that might be sitting in front of it. At the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (i.e., Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since. At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests. (For the record, I think the reason this hasn't occurred to me in the 2+ years I've been using Solr is because, with the internal caching, I've yet to need to put a proxy cache in front of Solr.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-469) DB Import RequestHandler
DB Import RequestHandler Key: SOLR-469 URL: https://issues.apache.org/jira/browse/SOLR-469 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Noble Paul Priority: Minor Fix For: 1.3

We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103). The way it works is as follows.
* Provide a configuration file (XML) to the handler which takes in the necessary SQL queries and mappings to a Solr schema - it also takes in a properties file for the data source configuration
* Given the configuration, it can also generate the Solr schema.xml
* It is registered as a RequestHandler which can take two commands, do-full-import and do-delta-import
- do-full-import - dumps all the data from the database into the index (based on the SQL query in the configuration)
- do-delta-import - dumps all the data that has changed since the last import (we assume a modified-timestamp column in the tables)
* It provides an admin page where we can schedule it to be run automatically at regular intervals - it shows the status of the handler (idle, full-import, delta-import)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
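[Editorial sketch] A hedged illustration of the two commands described above, using plain JDBC. The items table, the last_modified column, and the connection URL are all hypothetical; the real handler would read these from its XML configuration and push mapped documents into Solr rather than print rows.

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class DbImportSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical data source; a MySQL JDBC driver is assumed on the classpath.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/db", "user", "pass");

        // do-full-import would run the unconditional query:
        String fullQuery = "SELECT id, title, body FROM items";

        // do-delta-import restricts to rows changed since the last run,
        // relying on a modified-timestamp column:
        String deltaQuery =
                "SELECT id, title, body FROM items WHERE last_modified > ?";

        Timestamp lastImport = Timestamp.valueOf("2008-01-01 00:00:00");
        PreparedStatement ps = conn.prepareStatement(deltaQuery);
        ps.setTimestamp(1, lastImport);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // here each column would be mapped to a Solr field per the config
            System.out.println(rs.getString("id") + " -> " + rs.getString("title"));
        }
        rs.close();
        ps.close();
        conn.close();
    }
}
{code}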
[jira] Commented: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564739#action_12564739 ] Michael Dodsworth commented on SOLR-281:

{quote} That would require instantiation with reflection I think. {quote}

Reflection is already being used to create the QParserPlugins (SolrCore:1027 and AbstractPluginLoader:83) - I'm guessing the reason for the plugin is just to avoid creating instances through reflection on every parse (as you could keep hold of the QParser class and call newInstance). The second point is moot once you take away the need for createParser(...). It's really not that big a deal, in the scheme of things.

{quote} QParserPlugin is that interface, essentially (except that it's a class instead of an interface). For library maintainers, an abstract class is preferred over an interface for things that a user will extend... that way signature changes can be made in a backward-compatible manner. {quote}

As an aside, method signature changes are usually trivial to fix; personally, the pain of those fixes is preferable to extending an abstract class unnecessarily. Are there any architectural reworking projects on the roadmap? I'm sure backward compatibility is a massive concern; perhaps with the more modular plugin design route Solr is going down, those concerns can be addressed. If there's a chance of it being accepted, I would love to contribute a move towards using Spring.

Search Components (plugins) --- Key: SOLR-281 URL: https://issues.apache.org/jira/browse/SOLR-281 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 Attachments: SOLR-281-ComponentInit.patch, SOLR-281-ComponentInit.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, solr-281.patch, solr-281.patch, solr-281.patch A request handler with pluggable search components for things like: - standard - dismax - more-like-this - highlighting - field collapsing For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
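[Editorial sketch] To make the factory point concrete: hypothetical names throughout (this is not Solr's actual QParserPlugin signature). Reflection runs once, when the plugin is loaded; per-parse creation is then a plain virtual call on the long-lived plugin instance.

{code}
interface Parser {
    Object parse();
}

abstract class ParserPlugin {
    // abstract class rather than interface: new methods can later be added
    // with default bodies without breaking existing subclasses
    public void init(Object args) {}
    public abstract Parser createParser(String qstr);
}

class SimpleParserPlugin extends ParserPlugin {
    @Override
    public Parser createParser(final String qstr) {
        return new Parser() {
            public Object parse() { return "parsed:" + qstr; }
        };
    }
}

public class PluginLoaderSketch {
    static ParserPlugin load(String className) throws Exception {
        // reflection happens exactly once, at plugin-load time
        return (ParserPlugin) Class.forName(className).newInstance();
    }

    public static void main(String[] args) throws Exception {
        ParserPlugin plugin = load("SimpleParserPlugin");
        Parser p = plugin.createParser("title:solr"); // no reflection per parse
        System.out.println(p.parse());
    }
}
{code}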
[jira] Updated: (SOLR-330) Use new Lucene Token APIs (reuse and char[] buff)
[ https://issues.apache.org/jira/browse/SOLR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-330: - Attachment: SOLR-330.patch

First draft of a patch that updates the various TokenFilters, etc. in Solr to use the new Lucene reuse API. Notes on the implementation below:
- Also cleans up some of the javadocs in various files.
- Added a test for the Porter stemmer. Cleaned up some string literals to be constants so that they can be safely referred to in the tests.
- In the PatternTokenFilter, it would be cool if there was a way to just operate on the char array, but I don't see that the Pattern/Matcher API supports it. Same goes for PhoneticTokenFilter.
- I'm not sure yet if the BufferedTokenStream can take advantage of reuse, so I have left them alone for now, other than some minor doc fixes. I will think about this some more.
- In RemoveDuplicatesTF, I only converted to using termBuffer, not Token reuse. I removed the IN and OUT loop labels, as I don't see what functionality they provide.
- Added an ArraysUtils class and test to provide a bit more functionality than Arrays.java offers in terms of comparing two char arrays. This could be expanded at some point to cover other primitive comparisons.
- My understanding of the new reusableTokenStream means we can't use it in SolrAnalyzer.
- On the TrimFilter, it is not clear to me that there would ever be a token that is all whitespace. However, since the test handles it, I wonder why such a Token, when updating offsets is on, reports the offsets as the end and not the start. Just a minor nit, but it seems like the start/end offsets should be 0, not the end of the token.
- I'm not totally sure on the WordDelimiterFilter, as there is a fair amount of new token creation. Also, I think the newTok() method doesn't set the position increment based on the original position increment, so I added that.
- I'm also not completely sure how to handle FieldType's DefaultAnalyzer.next(). It seems like it could reuse the token.
- Also not sure why there is duplicate code for the MultiValueTokenStream in HighlighterUtils and SolrHighlighter, so I left the highlighter TokenStreams alone.

Use new Lucene Token APIs (reuse and char[] buff) - Key: SOLR-330 URL: https://issues.apache.org/jira/browse/SOLR-330 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-330.patch Lucene is getting new Token APIs for better performance. - token reuse - char[] offset + len instead of String Requires a new version of Lucene. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
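[Editorial sketch] A small illustration of the reuse pattern this patch moves Solr onto, using a simplified stand-in for the Lucene 2.3-era Token (not the real class): filters rewrite the token's char[] buffer in place instead of allocating a new String (and often a new Token) per term.

{code}
class ReusableToken {
    char[] buf = new char[16];
    int len;

    void setTerm(String s) {
        if (s.length() > buf.length) buf = new char[s.length()];
        s.getChars(0, s.length(), buf, 0);
        len = s.length();
    }
}

public class TrimFilterSketch {
    // Trim leading/trailing whitespace in place, the way a reuse-aware
    // TrimFilter can: shift chars within the existing buffer, no new String.
    static void trim(ReusableToken t) {
        int start = 0, end = t.len;
        while (start < end && Character.isWhitespace(t.buf[start])) start++;
        while (end > start && Character.isWhitespace(t.buf[end - 1])) end--;
        System.arraycopy(t.buf, start, t.buf, 0, end - start);
        t.len = end - start;
    }

    public static void main(String[] args) {
        ReusableToken t = new ReusableToken();
        t.setTerm("  solr  ");
        trim(t);
        System.out.println(new String(t.buf, 0, t.len)); // "solr"
    }
}
{code}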
[jira] Commented: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564853#action_12564853 ] Yonik Seeley commented on SOLR-281: --- Followed up on solr-dev to avoid stealing more of this JIRA issue: http://www.nabble.com/Re%3A--jira--Commented%3A-Search-Components-%28plugins%29-to15227648.html#a15227648 Search Components (plugins) --- Key: SOLR-281 URL: https://issues.apache.org/jira/browse/SOLR-281 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 Attachments: SOLR-281-ComponentInit.patch, SOLR-281-ComponentInit.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, solr-281.patch, solr-281.patch, solr-281.patch A request handler with pluggable search components for things like: - standard - dismax - more-like-this - highlighting - field collapsing For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564896#action_12564896 ] Yonik Seeley commented on SOLR-139: --- I'm having second thoughts about whether this is a good enough approach to really put in core Solr. Requiring that all fields be stored is a really large drawback, especially for large indices with really large documents. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way Lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. For background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564910#action_12564910 ] Ryan McKinley commented on SOLR-139: That is part of why I thought having it in an update request processor makes sense -- it can easily be subclassed to pull the existing fields from wherever it needs to. Even if it is directly in the UpdateHandler, there could be some interface to _loadExistingFields( id )_ or something similar. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way Lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. For background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
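[Editorial sketch] A hedged illustration of the merge step under discussion, assuming all fields are stored: loadExistingFields is the hypothetical hook named in the comment above, the Map representation stands in for real documents, and numeric increment is the one non-overwrite mode mentioned in the issue.

{code}
import java.util.HashMap;
import java.util.Map;

public class ModifyDocSketch {
    static Map<String, Object> loadExistingFields(String id) {
        // stand-in: a real implementation would read the stored fields
        // for this id out of the index
        Map<String, Object> doc = new HashMap<String, Object>();
        doc.put("id", id);
        doc.put("title", "old title");
        doc.put("views", 10L);
        return doc;
    }

    static Map<String, Object> applyUpdate(String id,
                                           Map<String, Object> changes,
                                           boolean increment) {
        Map<String, Object> doc = loadExistingFields(id);
        for (Map.Entry<String, Object> e : changes.entrySet()) {
            Object old = doc.get(e.getKey());
            if (increment && old instanceof Long && e.getValue() instanceof Long) {
                doc.put(e.getKey(), (Long) old + (Long) e.getValue()); // numeric add
            } else {
                doc.put(e.getKey(), e.getValue()); // plain overwrite
            }
        }
        return doc; // the caller would reindex this merged document
    }

    public static void main(String[] args) {
        Map<String, Object> changes = new HashMap<String, Object>();
        changes.put("views", 5L);
        System.out.println(applyUpdate("doc1", changes, true)); // views=15
    }
}
{code}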
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564952#action_12564952 ] Charles Hornberger commented on SOLR-236: -

NegatedDocSet is throwing Unsupported Operation exceptions:

{code}
org.apache.solr.common.SolrException: Unsupported Operation
	at org.apache.solr.search.NegatedDocSet.iterator(NegatedDocSet.java:77)
	at org.apache.solr.search.DocSetBase.getBits(DocSet.java:183)
	at org.apache.solr.search.NegatedDocSet.getBits(NegatedDocSet.java:27)
	at org.apache.solr.search.DocSetBase.intersection(DocSet.java:199)
	at org.apache.solr.search.BitDocSet.intersection(BitDocSet.java:30)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1109)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:811)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1258)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:103)
	at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:155)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:902)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:275)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
	at java.lang.Thread.run(Thread.java:595)
{code}

Not quite sure what search is triggering this path through the code, but it is not happening on every request, just some ... am firing up the debugger now to see what I can learn, but thought I'd post this anyway to see if anyone has any tips.

Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field into a single entry in the result set.
Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
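[Editorial sketch] To make the collapse.type=adjacent and collapse.max semantics concrete, a small illustrative sketch: walk the ranked results and keep at most max consecutive hits sharing the collapse-field value. The String-list representation and field values are assumptions; the patch operates on real documents and DocLists.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AdjacentCollapse {
    static List<String> collapse(List<String> siteOfHit, int max) {
        List<String> kept = new ArrayList<String>();
        String prev = null;
        int run = 0;
        for (String site : siteOfHit) {
            run = site.equals(prev) ? run + 1 : 1; // length of current adjacent run
            if (run <= max) kept.add(site);        // beyond max: collapsed away
            prev = site;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a.com", "a.com", "a.com", "b.com", "a.com");
        // collapse.max=2: the third consecutive a.com is dropped; the last
        // a.com survives because it is not adjacent to the first run
        System.out.println(collapse(hits, 2)); // [a.com, a.com, b.com, a.com]
    }
}
{code}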
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564966#action_12564966 ] clh edited comment on SOLR-236 at 2/1/08 2:59 PM: -

Ah ... got the beginnings of a diagnosis. The problem appears when the DocSet {{qDocSet}} returned by DocSetHitCollector.getDocSet() -- called at org.apache.solr.search.SolrIndexSearcher:1101 in trunk, or 1108 with the field_collapsing patch applied, inside getDocListAndSetNC() -- is a BitDocSet, and not when it's a HashDocSet. As the stack trace above shows, calling intersection() on a BitDocSet object invokes the superclass' DocSetBase.intersection() method, which invokes a call chain that blows up when it hits the iterator() method of the NegatedDocSet passed in as the {{filter}} parameter to getDocListAndSetNC(); NegatedDocSet.iterator() blows up by design:

{code}
public DocIterator iterator() {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Unsupported Operation");
}
{code}

I see that DocSetBase.intersection(DocSet other) has special-casing logic for dealing with {{other}} parameters that are instances of HashDocSet; does it also need special-casing logic for dealing with {{other}} parameters that are NegatedDocSets? Or should NegatedDocSet *really* implement iterator()? Or something else entirely?

Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection.
http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565019#action_12565019 ] Yonik Seeley commented on SOLR-236: ---

I haven't been following this, so I don't know why there is a need for a NegatedDocSet (or if introducing it is the best solution), but it looks like you have two cases to handle: one negative set or two negative sets.
- If you have a and -b, then return a.andNot(b)
- If both a and b are negative (-a.intersection(-b)), then return NegatedDocSet(a.union(b)) // per De Morgan, -a AND -b == -(a|b)

That's only for intersection() of course.

Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
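[Editorial sketch] A compact illustration of those two intersection cases, using java.util.BitSet as a stand-in for Solr's DocSet classes (the real classes differ); negation is represented lazily as a flag, so the negated set never needs an iterator:

{code}
import java.util.BitSet;

public class NegatableDocSet {
    final BitSet bits;
    final boolean negated;

    NegatableDocSet(BitSet bits, boolean negated) {
        this.bits = bits;
        this.negated = negated;
    }

    NegatableDocSet intersection(NegatableDocSet other) {
        if (!negated && other.negated) {       // a AND -b  ->  a.andNot(b)
            BitSet r = (BitSet) bits.clone();
            r.andNot(other.bits);
            return new NegatableDocSet(r, false);
        }
        if (negated && other.negated) {        // -a AND -b ->  -(a OR b), per De Morgan
            BitSet r = (BitSet) bits.clone();
            r.or(other.bits);
            return new NegatableDocSet(r, true);
        }
        if (negated) {                         // -a AND b: symmetric to the first case
            return other.intersection(this);
        }
        BitSet r = (BitSet) bits.clone();      // a AND b: the plain case
        r.and(other.bits);
        return new NegatableDocSet(r, false);
    }
}
{code}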
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-342: - Attachment: SOLR-342.patch Updated to work against trunk. As always, let me know if there is anything I can do to help get this committed. Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse) --- Key: SOLR-342 URL: https://issues.apache.org/jira/browse/SOLR-342 Project: Solr Issue Type: Improvement Components: update Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or wait for the next official release of Lucene). A side effect of this is that Lucene's new, faster StandardTokenizer will also be incorporated. Also need to think about how we want to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
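[Editorial sketch] A sketch against the Lucene 2.3-era API this patch targets (method names may differ in other releases): flushing by RAM usage via setRAMBufferSizeMB, and background merges via ConcurrentMergeScheduler. The index path is an assumption.

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class TunedWriter {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/tmp/index"),   // hypothetical path
                new StandardAnalyzer(),
                true /* create */);
        writer.setRAMBufferSizeMB(64.0);                  // the LUCENE-843 knob:
                                                          // flush by RAM, not doc count
        writer.setMergeScheduler(new ConcurrentMergeScheduler()); // merges in
                                                          // background threads
        // ... add documents ...
        writer.close();
    }
}
{code}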