[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
[ https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797572#action_12797572 ] Noble Paul commented on SOLR-1602:
----------------------------------

bq. but there have also been some threads out there in the past pointing out that using FQNs can speed up core initialization

This was resolved by SOLR-921.

Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
----------------------------------------------------------------------------------------------------

Key: SOLR-1602
URL: https://issues.apache.org/jira/browse/SOLR-1602
Project: Solr
Issue Type: Improvement
Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
Fix For: 1.5
Attachments: SOLR-1602.Mattmann.112509.patch.txt, SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config

Currently all o.a.solr.request.QueryResponseWriter implementations are curiously located in the o.a.solr.request package. Not only is this package getting big (30+ classes), a lot of them are misplaced. There should be a first-class o.a.solr.response package, and the response-related classes should be given a home there. Patch forthcoming.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
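For readers following along, the two naming styles under discussion look like this in a hypothetical solrconfig.xml fragment (the writer name and class are illustrative, not taken from the patch):

{code:title=solrconfig.xml}
<!-- fully-qualified name: breaks when the class moves to o.a.solr.response -->
<queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter"/>

<!-- "solr." shorthand: resolved by Solr's resource loader across its known
     packages, so it survives this kind of package refactor -->
<queryResponseWriter name="xml" class="solr.XMLResponseWriter"/>
{code}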
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797601#action_12797601 ] Paul Taylor commented on SOLR-1653:
-----------------------------------

Hi, I'm using this outside Solr in an analyzer, and I think there may be a performance issue because you cannot pass in a compiled Pattern. In the reusableTokenStream() method you cannot reset a CharFilter like you can a Tokenizer, so it has to recompile the pattern every time, i.e.:

  public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      streams = new SavedStreams();
      setPreviousTokenStream(streams);
      streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT,
          new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
      streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
      streams.filteredTokenStream = new AccentFilter(streams.filteredTokenStream);
      streams.filteredTokenStream = new LowercaseFilter(streams.filteredTokenStream);
    } else {
      streams.tokenStream.reset(new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
    }
    return streams.filteredTokenStream;
  }

add PatternReplaceCharFilter
----------------------------

Key: SOLR-1653
URL: https://issues.apache.org/jira/browse/SOLR-1653
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1653.patch, SOLR-1653.patch

Add a new CharFilter that uses a regular expression for the target of the replace string in the char stream. Usage:

{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                groupedPattern="([nN][oO]\.)\s*(\d+)"
                replaceGroups="1,2" blockDelimiters=":;"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
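A possible shape for the improvement requested above would be an overload that accepts a precompiled java.util.regex.Pattern, so reusableTokenStream() does not pay the compilation cost on every call (a sketch only; this constructor does not exist in the attached patch):

{code}
// compiled once per analyzer instance instead of once per call
private static final Pattern NO_PATTERN = Pattern.compile("(no\\.) ([0-9]+)");

// hypothetical overload taking the compiled Pattern directly
streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT,
    new PatternReplaceCharFilter(NO_PATTERN, "$1$2", reader));
{code}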
[jira] Updated: (SOLR-1708) Allowing import / update of a specific document using the data import handler
[ https://issues.apache.org/jira/browse/SOLR-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Lachinger updated SOLR-1708:
----------------------------------

Attachment: 02-single-update.patch

Allowing import / update of a specific document using the data import handler
------------------------------------------------------------------------------

Key: SOLR-1708
URL: https://issues.apache.org/jira/browse/SOLR-1708
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Simon Lachinger
Attachments: 02-single-update.patch

There are cases where changed or new documents need to be added to the Solr index immediately. This could easily be done via the update handler; however, when using the DataImportHandler it shouldn't be necessary to specify the data extraction for the DataImportHandler and then also do it again by feeding the document into the update handler. It should be centralized. Having to run a delta query to identify changes, when the IDs of the updated documents are already known to the application, is a rather costly (in terms of database load) way to solve this. The attached patch allows specifying one or more query parameters for the delta-import command, named 'root-pk', which identify the document(s) to be updated or added.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
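For illustration, a delta-import request using the proposed 'root-pk' parameter might look like this (hypothetical URL; host and core path are placeholders, and the parameter may repeat to address several documents):

{code}
http://localhost:8983/solr/dataimport?command=delta-import&root-pk=1042&root-pk=1043
{code}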
Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer
While inserting a large pile of documents using StreamingUpdateSolrServer I've found a race condition where all Runner instances stopped while the blocking queue was full. The attached patch solves the problem; to minify it, all indentation has been removed.

Index: src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
===================================================================
--- src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java (revision 888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java (working copy)
@@ -82,6 +82,7 @@
log.info( "starting runner: {}", this );
PostMethod method = null;
try {
+do {
RequestEntity request = new RequestEntity() {
// we don't know the length
public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
msg.append( "request: " + method.getURI() );
handleError( new Exception( msg.toString() ) );
}
+} while( ! queue.isEmpty());
}
catch (Throwable e) {
handleError( e );
@@ -149,6 +151,7 @@
finally {
try {
// make sure to release the connection
+if(method != null)
method.releaseConnection();
}
catch( Exception ex ){}
@@ -195,11 +198,11 @@
queue.put( req );
+synchronized( runners ) {
if( runners.isEmpty() ||
(queue.remainingCapacity() < queue.size() && runners.size() < threadCount) )
{
-synchronized( runners ) {
Runner r = new Runner();
scheduler.execute( r );
runners.add( r );
===================================================================

This patch has been tested with millions of documents inserted into Solr; before that I was unable to inject all of our documents because the following scenario happened. We have a BlockingQueue of requests served by a pool of Runner threads. At one point the queue was emptied by the Runner threads; they all stopped processing new items but were still sending the collected items to Solr. Solr was busy, so that took a long time, and during it the client filled the queue again. As all worker threads were already instantiated, there was no way to create new Runners to handle the queue, so it grew to its upper limit. When the next item was about to be put into the queue, the put blocked and the race condition happened.

Patch 1, 2: Inside the Runner.run method I've added a do/while loop to prevent the Runner from quitting while there are new requests; this handles the problem of new requests being added while the Runner is sending the previous batch.

Patch 3: The validity check of the method variable is not strictly necessary, just a code cleanup.

Patch 4: The last part of the patch moves the synchronized block outside of the conditional, to avoid a situation where runners changes while the condition is being evaluated.

Your comments and critique are welcome!
Attila
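Stripped of the HTTP plumbing, the control flow the patch gives Runner.run() is roughly the following (a condensed sketch with hypothetical helper names, not the actual class):

{code}
public void run() {
  try {
    do {
      // stream everything currently queued to Solr in one POST;
      // this can block for a long time while Solr is busy
      sendQueuedRequests();
      // re-check the queue: new requests may have arrived while we
      // were blocked sending the previous batch
    } while (!queue.isEmpty());
  } catch (Throwable e) {
    handleError(e);
  } finally {
    releaseConnectionQuietly();
    // the runner removes itself from 'runners' after this; without the
    // do/while, requests queued during a long send could be stranded
    // with no live runner left to pick them up
  }
}
{code}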
[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors
[ https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797666#action_12797666 ] Grant Ingersoll commented on SOLR-1680:
---------------------------------------

Why not broaden this and allow people to pass in their own collectors? Also, can you explain a bit more the use case specifically for Field Collapse? Alternatively, given something like LUCENE-2127, we may want Solr to be able to make query-time decisions about what Collector to use.

Provide an API to specify custom Collectors
-------------------------------------------

Key: SOLR-1680
URL: https://issues.apache.org/jira/browse/SOLR-1680
Project: Solr
Issue Type: Sub-task
Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
Fix For: 1.5
Attachments: field-collapse-core.patch, SOLR-1680.patch

This issue is dedicated to incorporating fieldcollapse's changes into Solr's core code. We want to make it possible for components to specify custom Collectors in SolrIndexSearcher methods.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
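For context, here is a minimal custom Collector against the Lucene 2.9-era Collector API, the kind of object such a hook would accept (a sketch; Solr does not yet expose a way to pass this in, which is what this issue proposes):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Counts matching documents without scoring them. */
public class CountingCollector extends Collector {
  private int count;

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    // scores are not needed for counting
  }

  @Override
  public void collect(int doc) throws IOException {
    count++; // doc is relative to the current reader segment
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    // no per-segment state needed here
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true; // order does not matter when only counting
  }

  public int getCount() { return count; }
}
{code}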
[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
[ https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797677#action_12797677 ] Ryan McKinley commented on SOLR-1602:
-------------------------------------

bq. Besides which: even if it's just an example it would be pretty shitty to break that example in the very next release.

Agreed -- we will make sure old FQNs work (until the next major release), but moving forward we should remove FQNs from schema.xml so this is less of an issue in the future.

Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
----------------------------------------------------------------------------------------------------

[...]
[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
[ https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797678#action_12797678 ] Ryan McKinley commented on SOLR-1602:
-------------------------------------

Noble, this issue is assigned to you. Do you want to take care of it? If not, I can... Patches won't work well, since it will take a few steps in svn to make sure the history is maintained:

1. svn move the files to a new location, update references, etc.
2. commit
3. add stub files in the location where the old files were
4. commit

Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there
----------------------------------------------------------------------------------------------------

[...]
[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**
[ https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797687#action_12797687 ] Yonik Seeley commented on SOLR-1707:
------------------------------------

True immutability? What does that mean over Collections.unmodifiableMap()? And how do we know these are faster or more memory efficient?

Use google collections immutable collections instead of Collections.unmodifiable**
-----------------------------------------------------------------------------------

Key: SOLR-1707
URL: https://issues.apache.org/jira/browse/SOLR-1707
Project: Solr
Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Fix For: 1.5
Attachments: SOLR-1707.patch

google collections offer true immutability and more memory efficiency

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**
[ https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797713#action_12797713 ] Yonik Seeley commented on SOLR-1707:
------------------------------------

OK, I whipped up a quick test with String keys and many small maps (anywhere from 1 to 20 keys per map), on Java6 -server, 64-bit, Win7_x64.

Size:
- Collections.unmodifiableMap: 7.4% bigger than HashMap
- google immutable map: 22.4% bigger than HashMap

Speed:
- Collections.unmodifiableMap: 4.2% slower than HashMap
- google immutable map: 26.0% slower than HashMap

For best space and speed, it looks like we should stick with straight HashMap.

Use google collections immutable collections instead of Collections.unmodifiable**
-----------------------------------------------------------------------------------

[...]
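The numbers above came from a private micro-benchmark, but the described setup (many small maps, String keys, measuring size and lookup speed) is easy to approximate. A sketch of one way to do it, not Yonik's actual test code:

{code}
import java.util.*;
import com.google.common.collect.ImmutableMap; // google-collections jar

public class SmallMapBench {
  public static void main(String[] args) {
    Random rnd = new Random(42);
    List<Map<String, Integer>> maps = new ArrayList<Map<String, Integer>>();
    for (int i = 0; i < 100000; i++) {
      Map<String, Integer> m = new HashMap<String, Integer>();
      int keys = 1 + rnd.nextInt(20); // 1..20 keys per map, as in the comment
      for (int k = 0; k < keys; k++) {
        m.put("key" + k, k);
      }
      maps.add(m); // variant 1: plain HashMap
      // maps.add(Collections.unmodifiableMap(m)); // variant 2
      // maps.add(ImmutableMap.copyOf(m));         // variant 3
    }
    long start = System.nanoTime();
    long hits = 0;
    for (int round = 0; round < 100; round++) {
      for (Map<String, Integer> m : maps) {
        if (m.get("key3") != null) hits++;
      }
    }
    System.out.println("lookups: " + (System.nanoTime() - start) / 1000000
        + " ms, hits=" + hits);
  }
}
{code}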
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797716#action_12797716 ] Patrick Jungermann commented on SOLR-236:
-----------------------------------------

Hi all,

we are using Solr's trunk with the latest patch of {{2009-12-24 09:54 AM}}. The index holds ~3.5 million documents with string-based identifiers of a length of up to 50 chars. The result document of our prefix query, which was at position 1 without collapsing, was not even within the top 10 results with collapsing. We were using the option {{collapse.maxdocs=150}}; after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

Also, we noticed a huge memory leak when using collapsing. We configured the component with {{<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>}}. Without setting the option {{collapse.field}}, it works normally and there are no memory problems. If requests with collapsing enabled are received by the Solr server, the whole memory (oldgen could not be freed; eden space is heavily in use; ...) fills up after a few requests. Using a profiler, we noticed that the filterCache was extraordinarily large. We suspect that there could be a caching problem (the collapseCache was not enabled).

Additionally, it would be very useful if the parameter {{collapse=true|false}} worked again and could be used to enable/disable the collapsing functionality. Currently, the existence of a field chosen for collapsing enables this feature, and there is no possibility to configure the fields for collapsing within the request handlers. With that, we could configure it there and only enable/disable it per request, the way it is conveniently done with other components (highlighting, faceting, ...).

Patrick

Field collapsing
----------------

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: 1.5
Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

This patch includes a new feature called Field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set.
Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
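For illustration, a request using these three parameters might look like this (hypothetical URL; the field name is a placeholder):

{code}
http://localhost:8983/solr/select?q=ipod&collapse.field=site&collapse.type=adjacent&collapse.max=2
{code}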
[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors
[ https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797771#action_12797771 ] Shalin Shekhar Mangar commented on SOLR-1680:
---------------------------------------------

bq. Why not broaden this and allow people to pass in their own collectors?

Yes, that is the general idea, though it would be API driven rather than configuration driven. Any component should be able to pass a Collector to the various SolrIndexSearcher methods.

bq. Also, can you explain a bit more the use case specifically for Field Collapse?

Field Collapsing needs to use a custom collector. Right now the collector is hard-coded inside SolrIndexSearcher.

bq. Alternatively, given something like LUCENE-2127, we may want Solr to be able to make query time decisions about what Collector to use.

I guess that decision should be made by QueryComponent? If so, then the ability to pass a custom Collector to SolrIndexSearcher methods should be enough.

Provide an API to specify custom Collectors
-------------------------------------------

[...]
[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors
[ https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797774#action_12797774 ] patrick o'leary commented on SOLR-1680:
---------------------------------------

We've just done something like this recently and found the simplest way was to modify ResponseBuilder with setCustomCollector / getCustomCollector and update the QueryCommand to include the custom collector. It gets sticky in the SolrIndexSearcher with caching, and IIRC there are about 4 places that call the collector. The solution works, but is not in any way elegant. It would be good to see if we could refactor SolrIndexSearcher first to make it more streamlined.

Provide an API to specify custom Collectors
-------------------------------------------

[...]
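What Patrick describes presumably looks something like the following on ResponseBuilder (a reconstruction of his description, not code from any attached patch):

{code}
// in org.apache.solr.handler.component.ResponseBuilder
private Collector customCollector;

public void setCustomCollector(Collector collector) {
  this.customCollector = collector;
}

public Collector getCustomCollector() {
  return customCollector;
}

// SolrIndexSearcher.QueryCommand would carry the collector the same way, and
// the search paths would use it, when non-null, instead of the built-in
// TopDocs/DocSet collectors. As noted above, caching is where this gets
// sticky: a custom collector's results cannot simply be cached as a DocList.
{code}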
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794 ] Martijn van Groningen commented on SOLR-236:
--------------------------------------------

bq. The result document of our prefix query, which was at position 1 without collapsing, was not even within the top 10 results with collapsing. We were using the option collapse.maxdocs=150; after changing this option to the value 15000, the results seem to be as expected. Because of that, we concluded that there has to be a problem with the sorting of the uncollapsed docset.

collapse.maxdocs aborts collapsing after the threshold is met, but it does that based on the uncollapsed docset, which is not sorted in any way. The result is that documents that would normally appear on the first page don't appear at all in the search result. Eventually the collapse component uses the collapsed docset as the result set, not the uncollapsed docset.

bq. Also, we noticed a huge memory leak when using collapsing. We configured the component with <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>. [...] Using a profiler, we noticed that the filterCache was extraordinarily large. We suspect that there could be a caching problem (the collapseCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field collapse cache. This is something that has to be addressed, and it certainly will be in the new field-collapse implementation. In the patch you're using, too much is being cached (some data can even be left out of the cache entirely), and in some cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally, it would be very useful if the parameter collapse=true|false worked again and could be used to enable/disable the collapsing functionality. [...]

It actually makes sense to support the collapse.enable parameter again in the patch.
Martijn

Field collapsing
----------------

[...]
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797794#action_12797794 ] Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:
--------------------------------------------------------------------

[...]
Re: Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer
can you submit a patch to JIRA?

On Jan 7, 2010, at 10:23 AM, Attila Babo wrote:

[...]
[jira] Updated: (SOLR-1698) load balanced distributed search
[ https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-1698:
-------------------------------

Attachment: SOLR-1698.patch

Attaching a new patch, still limited to LBHttpSolrServer at this point.

- includes tests
- adds a new expert-level API: public Rsp request(Req req) throws SolrServerException, IOException. I chose objects (Rsp and Req) since I imagine we will need to continue to add new parameters and controls to both the request and the response (especially the request... things like timeout, max number of servers to query, etc). The Rsp also contains info about which server returned the response and will allow us to stick with the same server for all phases of a distributed request.
- adds the concept of standard servers (those provided by the constructor or addServer)... a server on the zombie list that isn't a standard server won't be added to the alive list if it wakes up, and will not be pinged forever.

load balanced distributed search
--------------------------------

Key: SOLR-1698
URL: https://issues.apache.org/jira/browse/SOLR-1698
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-1698.patch, SOLR-1698.patch

Provide syntax and implementation of load-balancing across shard replicas.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
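A usage sketch of the new expert-level API, as far as it can be inferred from the comment above (the Req constructor arguments and the Rsp accessor names are assumptions, not confirmed by the patch):

{code}
LBHttpSolrServer lb = new LBHttpSolrServer(
    "http://host1:8983/solr", "http://host2:8983/solr");

SolrRequest query = new QueryRequest(new SolrQuery("*:*"));
LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(
    query, Arrays.asList("http://host1:8983/solr", "http://host2:8983/solr"));

LBHttpSolrServer.Rsp rsp = lb.request(req);
String server = rsp.getServer();      // which replica answered; lets a distributed
                                      // request stick to one server for all phases
NamedList<Object> result = rsp.getResponse();
{code}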
[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge resolved SOLR-1672.
--------------------------------

Resolution: Fixed

Marking as resolved.

RFE: facet reverse sort count
-----------------------------

Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 0h
Remaining Estimate: 0h

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet<Long> in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per-field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.

This change affects 2 source files:

UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.

DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
+ final BoundedTreeSet<Long> queue = (sort.equals("count") || sort.equals("true")) ?
    (facetSortOrder.equals("dsc") ?
      new BoundedTreeSet<Long>(maxsize, new Comparator() {
        @Override
        public int compare(Object o1, Object o2) {
          if (o1 == null || o2 == null) return 0;
          int result = ((Long) o1).compareTo((Long) o2);
          return (result != 0 ? result < 0 ? -1 : 1 : 0); // lowest number first sort
        }
      }) : new BoundedTreeSet<Long>(maxsize)) : null;

SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' is used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify.
In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts, e.g. searching for zero product stock counts ('what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
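For illustration, the new parameter described above would be used like this (hypothetical requests; 'cat' is a placeholder field name):

{code}
# reverse (lowest count first) sort for all facet fields
...&facet=true&facet.field=cat&facet.sortorder=dsc

# per-field form
...&facet=true&facet.field=cat&f.cat.facet.sortorder=dsc
{code}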
[jira] Commented: (SOLR-1706) wrong tokens output from WordDelimiterFilter when english possessives are in the text
[ https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797829#action_12797829 ] Robert Muir commented on SOLR-1706:
-----------------------------------

It's not just the concatenation, but also the subword generation. In the case below, Autocoder should not be emitted, as only numeric subword generation is turned on.

{code}
public void test128() throws Exception {
  assertWdf("word 1234 Super-Duper-XL500-42-Autocoder x'sbd123 a4b3c-", 0,1,0,0,0,0,0,0,0, null,
      new String[] { "word", "1234", "42", "Autocoder", "a4b3c" },
      new int[] { 0, 5, 28, 31, 50 },
      new int[] { 4, 9, 30, 40, 55 },
      new int[] { 1, 1, 1, 1, 2 });
}
{code}

wrong tokens output from WordDelimiterFilter when english possessives are in the text
--------------------------------------------------------------------------------------

Key: SOLR-1706
URL: https://issues.apache.org/jira/browse/SOLR-1706
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

the WordDelimiterFilter english possessive stemming ('s removal, on by default) unfortunately causes strange behavior: below you can see that when I have requested to only output numeric concatenations (not words), these english possessive stems are still sometimes output, ignoring the options I have provided, and even then, in a very inconsistent way.

{code}
assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder" }, new int[] { 18, 21 }, new int[] { 20, 30 }, new int[] { 1, 1 });
assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder", "56" }, new int[] { 18, 21, 33 }, new int[] { 20, 30, 35 }, new int[] { 1, 1, 1 });
assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { }, new int[] { }, new int[] { }, new int[] { });
assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42" }, new int[] { 18 }, new int[] { 20 }, new int[] { 1 });
{code}

where assertWdf is

{code}
void assertWdf(String text, int generateWordParts, int generateNumberParts, int catenateWords,
    int catenateNumbers, int catenateAll, int splitOnCaseChange, int preserveOriginal,
    int splitOnNumerics, int stemEnglishPossessive, CharArraySet protWords, String expected[],
    int startOffsets[], int endOffsets[], String types[], int posIncs[]) throws IOException {
  TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
  WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts, generateNumberParts,
      catenateWords, catenateNumbers, catenateAll, splitOnCaseChange, preserveOriginal,
      splitOnNumerics, stemEnglishPossessive, protWords);
  assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types, posIncs);
}
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options
[ https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-1706:
------------------------------

Description:

below you can see that when I have requested to only output numeric concatenations (not words), some words are still sometimes output, ignoring the options I have provided, and even then, in a very inconsistent way.

{code}
assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder" }, new int[] { 18, 21 }, new int[] { 20, 30 }, new int[] { 1, 1 });
assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42", "AutoCoder", "56" }, new int[] { 18, 21, 33 }, new int[] { 20, 30, 35 }, new int[] { 1, 1, 1 });
assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
    new String[] { }, new int[] { }, new int[] { }, new int[] { });
assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
    new String[] { "42" }, new int[] { 18 }, new int[] { 20 }, new int[] { 1 });
{code}

where assertWdf is as in the comment above.

was: [...]

Summary: wrong tokens output from WordDelimiterFilter depending upon options (was: wrong tokens output from WordDelimiterFilter when english possessives are in the text)

wrong tokens output from WordDelimiterFilter depending upon options
-------------------------------------------------------------------

Key: SOLR-1706
URL: https://issues.apache.org/jira/browse/SOLR-1706
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

[...]
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797841#action_12797841 ] Koji Sekiguchi commented on SOLR-1696:
--------------------------------------

Noble, thank you for opening this and attaching the patch! Are you planning to commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can assign SOLR-1696 to myself.

Deprecate old highlighting syntax and move configuration to HighlightComponent
-------------------------------------------------------------------------------

Key: SOLR-1696
URL: https://issues.apache.org/jira/browse/SOLR-1696
Project: Solr
Issue Type: Improvement
Components: highlighter
Reporter: Noble Paul
Fix For: 1.5
Attachments: SOLR-1696.patch

There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
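For a sense of what the new style could look like once highlighting is configured as a regular SearchComponent, a hypothetical solrconfig.xml fragment (element and class names are illustrative, patterned on the existing highlighting section, and not taken from the attached patch):

{code:title=solrconfig.xml}
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <formatter name="html" default="true" class="solr.highlight.HtmlFormatter"/>
  </highlighting>
</searchComponent>
{code}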
[jira] Created: (SOLR-1709) Distributed Date Faceting
Distributed Date Faceting
-------------------------

Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

This patch adds support for date facets when using distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).

The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:

* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)

This could be dealt with if timezone and skew information was added and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent; ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.

Comments and suggestions welcome. As a favour to ask: if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
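Sketched in code, the merge strategy the description outlines is roughly the following (illustrative only, not the patch's actual code; method and variable names are made up):

{code}
// Merge one shard's facet_dates counts into the map seeded by the first shard.
void mergeDateFacetCounts(SimpleOrderedMap<Object> merged, NamedList<Object> shardFacetDates) {
  for (int i = 0; i < shardFacetDates.size(); i++) {
    String name = shardFacetDates.getName(i);
    // skip the range metadata entries; only date buckets carry counts
    if ("gap".equals(name) || "start".equals(name) || "end".equals(name)) continue;
    int idx = merged.indexOf(name, 0);
    if (idx >= 0) {
      merged.setVal(idx, (Integer) merged.getVal(idx) + (Integer) shardFacetDates.getVal(i));
    }
    // buckets the first shard did not report are dropped, which is exactly
    // the >1 'gap' skew limitation described above
  }
}
{code}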
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797898#action_12797898 ] Jason Rutherglen commented on SOLR-1709:
----------------------------------------

Tim, Thanks for the patch...

bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).

TortoiseSVN works well on Windows, even for creating patches. Have you tried it?

Distributed Date Faceting
-------------------------

[...]
Hudson build is back to normal: Solr-trunk #1024
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1024/changes