[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052362#comment-13052362 ] Simon Willnauer commented on LUCENE-3223: - bq. Simple patch fixing the problem. Do I need a CHANGES entry for trivial things like this? Looks good; I don't think we need a CHANGES entry for this. Go ahead and commit! SearchWithSortTask ignores sorting by Doc - Key: LUCENE-3223 URL: https://issues.apache.org/jira/browse/LUCENE-3223 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Reporter: Chris Male Assignee: Chris Male Priority: Minor Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch During my work in LUCENE-3912, I found the following code:
{code}
if (field.equals("doc")) {
  sortField0 = SortField.FIELD_DOC;
} if (field.equals("score")) {
  sortField0 = SortField.FIELD_SCORE;
} ...
{code}
This means the setting of SortField.FIELD_DOC is ignored. While I don't know much about this code, this seems like a valid setting and obviously just a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
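The snippet in LUCENE-3223 only loses FIELD_DOC if the elided `...` continues the second `if` with further `else` branches, presumably ending in one that builds an ordinary field sort and overwrites `sortField0`. A minimal sketch of that failure and of the single `if`/`else if` chain that avoids it, assuming a hypothetical `SortSpec` enum as a stand-in for Lucene's `SortField` constants and a generic `FIELD` fallback for the part of the task not shown in the issue:

```java
// Hypothetical stand-in for SortField.FIELD_DOC, SortField.FIELD_SCORE and a field sort.
enum SortSpec { DOC, SCORE, FIELD }

class SortSpecParser {
    // Buggy shape: "} if" starts a new chain, so its final else also runs for "doc".
    static SortSpec parseBuggy(String field) {
        SortSpec sortField0;
        if (field.equals("doc")) {
            sortField0 = SortSpec.DOC;
        }
        if (field.equals("score")) {
            sortField0 = SortSpec.SCORE;
        } else {
            sortField0 = SortSpec.FIELD; // overwrites DOC for "doc"
        }
        return sortField0;
    }

    // Fixed shape: one if / else-if chain, so exactly one branch runs.
    static SortSpec parseFixed(String field) {
        if (field.equals("doc")) {
            return SortSpec.DOC;
        } else if (field.equals("score")) {
            return SortSpec.SCORE;
        } else {
            return SortSpec.FIELD;
        }
    }
}
```

With this shape, `parseBuggy("doc")` yields `FIELD` while `parseFixed("doc")` yields `DOC`, which matches the symptom reported in the issue.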
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052368#comment-13052368 ] Simon Willnauer commented on LUCENE-3219: - Looks good to me. BTW, should we backport those changes? Change SortField types to an Enum - Key: LUCENE-3219 URL: https://issues.apache.org/jira/browse/LUCENE-3219 Project: Lucene - Java Issue Type: Improvement Components: core/search Reporter: Chris Male Assignee: Chris Male Priority: Minor Attachments: LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch, LUCENE-3219.patch When updating my SOLR-2533 patch, one issue was that the int value I had given my new type had been used by another change in the meantime. Since we don't use these fields in a bitset kind of way, we can convert them to an enum.
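The collision described in LUCENE-3219 happens because `int` constants are just numbers: two patches can pick the same value and the compiler has no way to notice. A minimal sketch of the difference, assuming hypothetical `LegacySortTypes` and `SortType` names (the actual constants in Lucene's `SortField` are not listed in this thread):

```java
// Int-constant style: nothing stops two patches from reusing the same value.
final class LegacySortTypes {
    static final int SCORE = 0;
    static final int DOC = 1;
    static final int STRING = 3;
    static final int MY_NEW_TYPE = 3; // collides with STRING, yet compiles fine
}

// Enum style: each constant is a distinct object, with no numbering to coordinate.
enum SortType { SCORE, DOC, STRING, MY_NEW_TYPE }

final class SortTypeDemo {
    static boolean legacyCollides() {
        return LegacySortTypes.STRING == LegacySortTypes.MY_NEW_TYPE;
    }

    static boolean enumCollides() {
        return SortType.STRING == SortType.MY_NEW_TYPE;
    }

    // Code switches on named constants instead of magic numbers.
    static String describe(SortType t) {
        switch (t) {
            case SCORE: return "relevance";
            case DOC: return "index order";
            case STRING: return "term order";
            default: return "custom";
        }
    }
}
```

Since the values are never combined as bit flags, as the description notes, nothing is lost by dropping the numeric identity.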
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052376#comment-13052376 ] Dawid Weiss commented on LUCENE-2341: - Thanks for the contribution, Michał. Robert: the dictionary is licensed under MPL or CC-SA (to be selected by the user depending on their needs). Do you know which one is preferable? Michał: there is also another (much larger) dictionary that has been released recently and comes from the Morfeusz project: http://sgjp.pl/morfeusz/dopobrania.html This dictionary is licensed under the BSD license, so there are no legal worries at all. Both dictionaries are nearly identical (they differ slightly in their conventions for morphosyntactic annotations), and Morfeusz's dictionary could be compiled into an automaton for use with Morfologik. Which way should we go? What do you think? explore morfologik integration -- Key: LUCENE-2341 URL: https://issues.apache.org/jira/browse/LUCENE-2341 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available: http://sourceforge.net/projects/morfologik/ This works differently from LUCENE-2298, and ideally would be another option for users.
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052377#comment-13052377 ] Chris Male commented on LUCENE-3219: You'll have to guide me on the backwards compat issue since this is a break due to the fields being public and some methods changing from returning int to returning SortField.Type.
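One shape a backwards layer could take is to keep the old public `int` codes alive as a deprecated mapping onto the new enum, so callers of methods that used to return `int` still have a way to get one. This is only a sketch: `SortKind`, `legacyCode()` and `fromLegacyCode(int)` are hypothetical names, not part of Lucene, and the numeric values are illustrative.

```java
enum SortKind {
    SCORE(0), DOC(1), STRING(3);

    private final int legacyCode; // the value the old public int constant had

    SortKind(int legacyCode) {
        this.legacyCode = legacyCode;
    }

    /** @deprecated bridge for callers that still expect the old int constant. */
    @Deprecated
    int legacyCode() {
        return legacyCode;
    }

    /** @deprecated maps an old int constant back to the enum. */
    @Deprecated
    static SortKind fromLegacyCode(int code) {
        for (SortKind k : values()) {
            if (k.legacyCode == code) {
                return k;
            }
        }
        throw new IllegalArgumentException("unknown legacy sort type: " + code);
    }
}
```

Every public field and every method signature that exposed the `int` would need a shim like this, which is the workload the thread weighs against backporting to 3.x.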
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052380#comment-13052380 ] Dawid Weiss commented on LUCENE-2341: - I'll take a look at the differences between Morfologik and Morfeusz right now, actually. I'll post the results once I have something.
[jira] [Resolved] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male resolved LUCENE-3223. Resolution: Fixed Committed revision 1137882.
[jira] [Updated] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-3223: --- Fix Version/s: 4.0 SearchWithSortTask ignores sorting by Doc - Key: LUCENE-3223 URL: https://issues.apache.org/jira/browse/LUCENE-3223 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Reporter: Chris Male Assignee: Chris Male Priority: Minor Fix For: 4.0 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch During my work in LUCENE-3912, I found the following code:
{code}
if (field.equals("doc")) {
  sortField0 = SortField.FIELD_DOC;
} if (field.equals("score")) {
  sortField0 = SortField.FIELD_SCORE;
} ...
{code}
This means the setting of SortField.FIELD_DOC is ignored. While I don't know much about this code, this seems like a valid setting and obviously just a bug.
[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052388#comment-13052388 ] Uwe Schindler commented on LUCENE-3223: --- Thanks, nice catch!
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052390#comment-13052390 ] Uwe Schindler commented on LUCENE-3219: --- At the end of the day, I am sure I will vote to leave it as it is in 3.x! SortField is heavily used in Lucene client code, and backwards breaks without very sophisticated backwards layers are horrible to handle. It can be done, but I don't think it's worth the work just for code beauty.
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052394#comment-13052394 ] Noble Paul commented on SOLR-2382: -- At least the BDB-based cache will have to go to a different issue. DIH Cache Improvements -- Key: SOLR-2382 URL: https://issues.apache.org/jira/browse/SOLR-2382 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: James Dyer Priority: Minor Attachments: SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch
Functionality:
1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application.
2. Provide a means to temporarily cache a child Entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor).
3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches.
4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel.
Use Cases:
1. We needed a flexible, scalable way to temporarily cache child-entity data prior to joining to parent entities.
 - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem.
 - CachedSqlEntityProcessor only supports an in-memory HashMap as a caching mechanism and does not scale.
 - There is no way to cache non-SQL inputs (e.g., flat files, XML).
2. We needed the ability to gather data from long-running entities by a process that runs separately from our main indexing process.
3. We wanted the ability to do a delta import of only the entities that changed.
 - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed.
 - Our data comes from 50+ complex SQL queries and/or flat files.
 - We do not want to incur the overhead of re-gathering all of this data if only one entity's data changed.
 - Persistent DIH caches solve this problem.
4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter).
5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards.
Implementation Details:
1. De-couple EntityProcessorBase from caching.
 - Created a new interface, DIHCache, with two implementations:
   - SortedMapBackedCache - An in-memory cache, used as the default with CachedSqlEntityProcessor (now deprecated).
   - BerkleyBackedCache - A disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar
     - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to generics usage.
     - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html
2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase and DIHCacheProperties).
3. Partially de-couple SolrWriter from DocBuilder.
 - Created a new interface, DIHWriter, with two implementations:
   - SolrWriter (refactored)
   - DIHCacheWriter (allows DIH to write ultimately to a Cache).
4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input.
5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data.
6. Change the semantics of entity.destroy().
 - Previously, it was being called on each iteration of DocBuilder.buildDocument().
 - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
 - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change.
General Notes: We are near completion in converting our search functionality from a legacy search engine to Solr. However, I found that DIH did not support caching to the level of our prior product's data import utility. In order to get our data into Solr, I created these caching enhancements. Because I believe this has broad application, and because we would like this feature to be supported by the Community, I have front-ported this, enhanced, to Trunk. I have also added unit tests and verified that all
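The n+1 select problem in use case 1 arises when every parent row triggers its own child query. Loading the child entity once into a cache keyed by the join column turns each parent lookup into an in-memory hit. A minimal sketch, assuming a hypothetical `ChildCache` interface rather than the patch's actual `DIHCache` methods (which are not shown in this thread), with a `TreeMap` playing the role of `SortedMapBackedCache` and a simple hash standing in for the partition parameter:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified cache contract; not the DIHCache API from the patch.
interface ChildCache {
    void add(String joinKey, Map<String, Object> row);
    List<Map<String, Object>> lookup(String joinKey);
    void destroy();
}

// In-memory implementation backed by a SortedMap, in the spirit of SortedMapBackedCache.
class SortedMapChildCache implements ChildCache {
    private final SortedMap<String, List<Map<String, Object>>> rows = new TreeMap<>();

    @Override
    public void add(String joinKey, Map<String, Object> row) {
        rows.computeIfAbsent(joinKey, k -> new ArrayList<>()).add(row);
    }

    @Override
    public List<Map<String, Object>> lookup(String joinKey) {
        // One map lookup per parent instead of one SQL query per parent.
        return rows.getOrDefault(joinKey, Collections.emptyList());
    }

    @Override
    public void destroy() {
        // One-time cleanup once the entity processor is done (cf. the new destroy() semantics).
        rows.clear();
    }

    // Hash partitioning: assigns a join key to one of numPartitions caches.
    static int partitionFor(String joinKey, int numPartitions) {
        return Math.floorMod(joinKey.hashCode(), numPartitions);
    }
}
```

A disk-backed implementation such as the BDB one would keep the same contract and change only where `rows` lives.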
[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title
[ https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052398#comment-13052398 ] Jan Høydahl commented on SOLR-2598: --- Planning for this to be my second commit to Lucene :) What do you think? exampledocs/books.json should use name instead of title --- Key: SOLR-2598 URL: https://issues.apache.org/jira/browse/SOLR-2598 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 Attachments: SOLR-2598.patch The file exampledocs/books.json currently contains two books, but they do not show up in the default solr/browse interface because they use title instead of name, and the Velocity template does not display title. Also, we should include a few more books.
[jira] [Commented] (SOLR-2489) Remove old lucene.apache.org/solr/who page
[ https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052399#comment-13052399 ] Jan Høydahl commented on SOLR-2489: --- I plan to delete this old defunct page and commit shortly. Agree? Remove old lucene.apache.org/solr/who page -- Key: SOLR-2489 URL: https://issues.apache.org/jira/browse/SOLR-2489 Project: Solr Issue Type: Bug Affects Versions: 3.1, 3.2 Reporter: Jan Høydahl Priority: Minor Fix For: 3.3 In the distribution, docs/who.html is outdated: it refers to the old Solr committers list at http://lucene.apache.org/solr/who The fix would be to simply delete the old page.
[jira] [Assigned] (SOLR-2489) Remove old lucene.apache.org/solr/who page
[ https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-2489: - Assignee: Jan Høydahl Remove old lucene.apache.org/solr/who page -- Key: SOLR-2489 URL: https://issues.apache.org/jira/browse/SOLR-2489 Project: Solr Issue Type: Bug Affects Versions: 3.1, 3.2 Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 In the distribution, docs/who.html is outdated: it refers to the old Solr committers list at http://lucene.apache.org/solr/who The fix would be to simply delete the old page.
[jira] [Assigned] (SOLR-2599) FieldCopy Update Processor
[ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-2599: - Assignee: Jan Høydahl FieldCopy Update Processor -- Key: SOLR-2599 URL: https://issues.apache.org/jira/browse/SOLR-2599 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Jan Høydahl Need an UpdateProcessor that can copy and move fields.
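SOLR-2599 only states the requirement, so the following is a sketch of the copy/move semantics alone. It assumes a plain `Map<String, List<Object>>` as a stand-in for a Solr input document; a real processor would live in Solr's update-processor chain, which is not modelled here, and `FieldCopier` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FieldCopier {
    /** Appends all values of source to dest; the source field is kept. */
    static void copy(Map<String, List<Object>> doc, String source, String dest) {
        List<Object> values = doc.get(source);
        if (values == null || source.equals(dest)) {
            return; // nothing to copy, or copying a field onto itself
        }
        doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
    }

    /** Like copy, but removes the source field afterwards. */
    static void move(Map<String, List<Object>> doc, String source, String dest) {
        copy(doc, source, dest);
        if (!source.equals(dest)) {
            doc.remove(source);
        }
    }
}
```

Appending rather than overwriting keeps any values the destination field already had, which is a design choice of this sketch; an overwrite mode would be an equally plausible option.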
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052400#comment-13052400 ] Jan Høydahl commented on SOLR-2487: --- Any objections to parameterizing the build as Hoss suggests? Do not include slf4j-jdk14 jar in WAR - Key: SOLR-2487 URL: https://issues.apache.org/jira/browse/SOLR-2487 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.2, 4.0 Reporter: Jan Høydahl Labels: logging, slf4j I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib so that the example and tutorial still work OOTB, while as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052402#comment-13052402 ] Chris Male commented on LUCENE-3219: For the reasons described above, I think it's best we don't backport this change. Uwe, is the work here compatible with what you had planned in LUCENE-3192? If so, I'll go ahead and commit this.
[jira] [Commented] (LUCENE-3219) Change SortField types to an Enum
[ https://issues.apache.org/jira/browse/LUCENE-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052404#comment-13052404 ] Uwe Schindler commented on LUCENE-3219: --- Just commit this; the other issue is quite unrelated, I just had the same idea.
[jira] [Assigned] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-2458: - Assignee: Jan Høydahl post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: post.jar Fix For: 3.3 Attachments: SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. The problem is that it does this by appending <commit/> to the stream, which does not work when using a non-XML request handler, such as CSV.
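Appending `<commit/>` to the body only makes sense for the XML handler. One alternative, sketched here as an assumption rather than what the attached SOLR-2458 patch does, is to send the commit as a separate request carrying the `commit=true` request parameter, which Solr's update handlers accept independently of the body's content type. `CommitUrls.commitUrl` is a hypothetical helper.

```java
final class CommitUrls {
    /**
     * Builds the URL for a separate commit request against the given update
     * endpoint, instead of appending an XML {@code <commit/>} to the posted stream.
     */
    static String commitUrl(String updateUrl) {
        // Respect any query string that is already present.
        String separator = updateUrl.contains("?") ? "&" : "?";
        return updateUrl + separator + "commit=true";
    }
}
```

The resulting URL can then be requested with an empty body (for example through `java.net.HttpURLConnection`), regardless of whether the documents were posted as XML or CSV.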
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052407#comment-13052407 ] Jan Høydahl commented on SOLR-2458: --- Has anyone got around to inspecting this patch? I'd like to get this into 3.3.
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052409#comment-13052409 ] Paul Elschot commented on LUCENE-2454: -- This overlaps with the BlockJoinQuery of LUCENE-3171; this issue might even be closed as a duplicate of that one. Which one is preferred? On using prev/nextSetBit in a safe range: this safe range starts with the parent and ends with the largest known child. A variant of prevSetBit could take this largest known child as an argument to limit its search; from the return value one then has either a new parent, or one is certain that the current parent is the right one. This would also limit the worst-case number of inspected bits for the group to the group size. With or without that variant, I think it would be good to add a remark in the javadocs about the possible inefficiency of using OpenBitSet for larger group sizes. When the typical group size gets a lot bigger than the number of bits in a long, another implementation might be faster. This remark in the javadocs would allow us to wait for someone to come along with bigger group sizes and a real performance problem here. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
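Paul's bounded prevSetBit variant can be sketched with `java.util.BitSet` standing in for `OpenBitSet`. The sketch assumes the layout he describes: each parent precedes its children in doc-id order, so the safe range runs from the parent up to the largest known child. `prevSetBitBounded` and `resolveParent` are hypothetical names. The loop is deliberately bit-by-bit, so the number of inspected bits is at most the distance back to the largest known child; a word-at-a-time scan would be the faster variant he alludes to for large groups.

```java
import java.util.BitSet;

final class BoundedParentLookup {
    /**
     * Highest set bit in (lowerExclusive, from], or -1 if none.
     * Unlike BitSet.previousSetBit, the scan stops at lowerExclusive.
     */
    static int prevSetBitBounded(BitSet bits, int from, int lowerExclusive) {
        for (int i = from; i > lowerExclusive; i--) {
            if (bits.get(i)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Parent of child, given the current parent and the largest child already
     * known to belong to it. A set bit found in between means a new group
     * has started; otherwise the current parent is still the right one.
     */
    static int resolveParent(BitSet parents, int child, int currentParent, int largestKnownChild) {
        int p = prevSetBitBounded(parents, child, largestKnownChild);
        return p >= 0 ? p : currentParent;
    }
}
```

With parents at doc ids 0 and 4, resolving child 2 (largest known child 1) stays with parent 0, while resolving child 6 (largest known child 3) finds the new parent 4.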
[jira] [Commented] (SOLR-2383) Velocity: Generalize range and date facet display
[ https://issues.apache.org/jira/browse/SOLR-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052410#comment-13052410 ] Jan Høydahl commented on SOLR-2383: --- 3.3 will support the [from TO to} syntax, right? Let's attempt to get this in for 3.3. Grant? Velocity: Generalize range and date facet display - Key: SOLR-2383 URL: https://issues.apache.org/jira/browse/SOLR-2383 Project: Solr Issue Type: Bug Components: Response Writers Reporter: Jan Høydahl Assignee: Grant Ingersoll Labels: facet, range, velocity Fix For: 3.3 Attachments: SOLR-2383-branch_32.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch, SOLR-2383.patch The Velocity (/browse) GUI has a hardcoded price range facet and a hardcoded manufacturedate_dt date facet. We need a general solution that works for any facet.range and facet.date.
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052421#comment-13052421 ] Dawid Weiss commented on LUCENE-2341: - I did some analyses on both dictionaries.
{noformat}
Number of lines (distinct surface forms):
3.662.366 morfologik.utf8
5.086.141 sgjp.utf8

Distinct words (not in both):
2.729.334 unique.utf8

- upper/lower case (morfologik has upper case forms, morfeusz only lower case surface forms)
acerze
Acerze

- very rare or jargon;
abszminka
abszytowałem
acetobakteria
acetarsolowi
niebombiasto
hakatystce
hakatystycznościach
warzże

- differences in spelling;
abelard
abélard

- acronyms and super-short stuff
aap
aar

Distinct normalized (lowercase):
2.564.366 lowered.utf8

Most of these are very infrequent words or inflection forms. There are minor differences or missing surface forms in both dictionaries, as here (mz - morfeusz, mk - morfologik):

mz hakersko
mz hakerskość
mz hakerskości
mz hakerskością
mz hakerskościach
mz hakerskościami
mz hakerskościom
mk hakerstw
mk hakerstwa
...
mk hakowałyśmy
mk hakowań
mk hakowaniach
mk hakowaniami
mk hakowaniom
mz hakowatość
mz hakowatości
mz hakowatością
mz hakowatościach
mz hakowatościami
mz hakowatościom
{noformat}
So... the conclusion is pretty consistent with Zipf's law: both dictionaries have fairly different coverage, even though they're quite large. We don't have a frequency dictionary for Polish, but I assume most of these surface forms are purely theoretical and occur very rarely in practice. That said, I think we should use BOTH dictionaries; after all, there's no harm done if we overdo the lemmatization process a little bit, is there? So... my proposal would be this: I'll integrate Morfeusz's dictionary into Morfologik (as an alternative dictionary one can load and use). Eventually it would probably be sensible to limit the automaton used in Lucene to surface forms and lemmas only (no POS tags) and merge both dictionaries into a single automaton... but this can be a future improvement.
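The statistics in the comparison above are set operations over surface forms, and the proposed merge is a union. A minimal sketch of how such counts could be computed, assuming each dictionary has already been reduced to a `Set<String>` of surface forms (the actual scripts and files behind the numbers are not shown); `DictionaryDiff` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class DictionaryDiff {
    /** Forms present in exactly one of the two dictionaries ("not in both"). */
    static Set<String> notInBoth(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        result.removeAll(common);
        return result;
    }

    /** Lower-cases forms with Polish locale rules, collapsing case-only variants. */
    static Set<String> lowercased(Set<String> forms) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        return forms.stream()
                .map(f -> f.toLowerCase(polish))
                .collect(Collectors.toSet());
    }

    /** Union of both dictionaries: the coverage a merged automaton would need. */
    static Set<String> merged(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }
}
```

For example, with mk = {Acerze, hakerstwa, hakowaniach} and mz = {acerze, hakersko, hakowaniach}, `notInBoth` returns 4 forms but only 2 after lowercasing both sides, because Acerze/acerze differ only in case; the merged set has 5 forms.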
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052423#comment-13052423 ] Dawid Weiss commented on LUCENE-2341: - One note w.r.t. the patch: I would use an explicit pointer over a list of returned WordData entries instead of adding them to a local list: private List<WordData> stemsAcc = new ArrayList<WordData>(); Right now you're shifting the internal array on each call unnecessarily (just increment an int pointer instead): + termAtt.setEmpty().append(stemsAcc.remove(0).getStem().toString()); getStem() should also be enough since it's a CharSequence, right? No need for an intermediate String.
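Dawid's suggestion amounts to keeping an `int` cursor into the buffered stems instead of calling `remove(0)`, which shifts the whole `ArrayList` backing array on every emitted token. A minimal sketch, assuming a hypothetical `StemBuffer` that holds plain `CharSequence` values in place of Morfologik's `WordData`, with a `StringBuilder` standing in for the term attribute so the stem can be appended without an intermediate `String`:

```java
import java.util.ArrayList;
import java.util.List;

final class StemBuffer {
    private final List<CharSequence> stems = new ArrayList<>();
    private int next = 0; // index of the next stem to emit

    /** Replaces the buffered stems with those of a new token and rewinds the cursor. */
    void reset(List<? extends CharSequence> newStems) {
        stems.clear();
        stems.addAll(newStems);
        next = 0;
    }

    boolean hasNext() {
        return next < stems.size();
    }

    /** Writes the next stem into target in O(1), without removing it from the list. */
    void appendNext(StringBuilder target) {
        target.setLength(0);
        target.append(stems.get(next++)); // CharSequence appended directly, no toString()
    }
}
```

The list is cleared once per input token, so its backing array is reused rather than shifted on every call.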
[jira] [Updated] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-1431: - Attachment: SOLR-1431.patch This time a factory is used to create the shardHandler:
{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- other params go here -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeOut">1000</int>
    <int name="connTimeOut">5000</int>
  </shardHandlerFactory>
</requestHandler>
{code}
CommComponent abstracted Key: SOLR-1431 URL: https://issues.apache.org/jira/browse/SOLR-1431 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Jason Rutherglen Assignee: Noble Paul Fix For: 4.0 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch We'll abstract CommComponent in this issue.
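With a factory in the config, each request can obtain a fresh shard handler configured from the plugin's init arguments. The following is a hypothetical sketch of that shape only: the ShardHandler and ShardHandlerFactory interfaces and the HttpShardHandlerFactorySketch class are assumptions, not Solr's actual classes or signatures; only the two parameter names (socketTimeOut, connTimeOut) come from the config in this comment.

```java
import java.util.Map;

/** Hypothetical per-request handler that would talk to the shards. */
interface ShardHandler {
    int socketTimeoutMs();
    int connTimeoutMs();
}

/** Hypothetical factory contract; Solr's real signatures may differ. */
interface ShardHandlerFactory {
    void init(Map<String, Integer> args);
    ShardHandler getShardHandler();
}

/** Reads the two timeouts from the plugin config, keeping defaults otherwise. */
class HttpShardHandlerFactorySketch implements ShardHandlerFactory {
    // Illustrative defaults only (0 = no timeout), not Solr's values.
    private int socketTimeout = 0;
    private int connTimeout = 0;

    @Override
    public void init(Map<String, Integer> args) {
        // Keys match the <int name="..."> entries of the shardHandlerFactory element.
        Integer so = args.get("socketTimeOut");
        Integer co = args.get("connTimeOut");
        if (so != null) socketTimeout = so;
        if (co != null) connTimeout = co;
    }

    @Override
    public ShardHandler getShardHandler() {
        // Each call returns an independent handler carrying the configured values.
        final int so = socketTimeout;
        final int co = connTimeout;
        return new ShardHandler() {
            public int socketTimeoutMs() { return so; }
            public int connTimeoutMs() { return co; }
        };
    }
}
```

Keeping configuration in the factory and state in the per-request handler is what lets the handler implementation be swapped through the class attribute alone.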
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052436#comment-13052436 ] Mark Harwood commented on LUCENE-2454: -- bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even be closed as duplicate of that one. Which one is preferred?
We need to look at the likely use cases. 2454 was created to serve a use case which I expect to be a very common pattern, and I'm not sure LUCENE-3171 satisfies this need. Apps commonly need to return a selection of both matching and non-matching children along with the best parents. Why? The rationale is very similar to the way highlighting returns a summary of text: it doesn't just return the matched words, it also returns surrounding text as useful context when displaying results to users. However, some texts can be very large, and there's a need to limit how much context is brought back. If we apply this logic to 2454, we can see that for the top parents it is common to also want some non-matching children (e.g. for a resume, return a person's employment history, not just the employments that matched the original search), but it is also necessary to summarize some parents' histories (e.g. the contractor who listed a gazillion positions in his employment history needs summarizing). A common pattern is for solutions to ask for the best 11 children for the best parents and display only 10. That way the app knows that for certain parents there is more data available (i.e. those with 11 matches) and can offer a "more" button to retrieve the extra children for parents of interest.
2454 satisfies this use case as follows:
# Use a NestedDocumentQuery to get the best parents, with the child criteria expressed as a MUST clause.
# Use a PerParentLimitedQuery to get a selection of children per top parent, where each child MUST belong to a top parent (tested using the primary key), and use the child criteria again, but this time as a SHOULD clause, to relevance-rank the selection of children returned.
It's worth considering this sort of use case carefully before making any code decisions. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
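The "ask for 11, display 10" pattern from Mark's comment works the same regardless of which patch produces the children. A minimal sketch, assuming the index has already returned up to displayLimit + 1 children for one parent, best first; ChildPage and its factory method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Children to display for one parent, plus whether a "more" button is needed. */
class ChildPage {
    final List<String> shown;
    final boolean hasMore;

    private ChildPage(List<String> shown, boolean hasMore) {
        this.shown = shown;
        this.hasMore = hasMore;
    }

    /**
     * rankedChildren: the best (displayLimit + 1) children of one parent,
     * best first. If the extra child came back, the parent has more data.
     */
    static ChildPage of(List<String> rankedChildren, int displayLimit) {
        boolean more = rankedChildren.size() > displayLimit;
        List<String> shown = more
                ? new ArrayList<String>(rankedChildren.subList(0, displayLimit))
                : new ArrayList<String>(rankedChildren);
        return new ChildPage(Collections.unmodifiableList(shown), more);
    }
}
```

Fetching one extra child is cheaper than running a separate count query per parent, and the app only needs a yes/no answer to decide whether to show the button.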
[jira] [Commented] (LUCENE-3223) SearchWithSortTask ignores sorting by Doc
[ https://issues.apache.org/jira/browse/LUCENE-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052446#comment-13052446 ] Michael McCandless commented on LUCENE-3223: Shouldn't this be backported to 3.x too? SearchWithSortTask ignores sorting by Doc - Key: LUCENE-3223 URL: https://issues.apache.org/jira/browse/LUCENE-3223 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Reporter: Chris Male Assignee: Chris Male Priority: Minor Fix For: 4.0 Attachments: LUCENE-3223-test.patch, LUCENE-3223.patch During my work in LUCENE-3912, I found the following code:
{code}
if (field.equals("doc")) {
  sortField0 = SortField.FIELD_DOC;
} if (field.equals("score")) {
  sortField0 = SortField.FIELD_SCORE;
} ...
{code}
This means the setting of SortField.FIELD_DOC is ignored. While I don't know much about this code, this seems like a valid setting, and this is obviously just a bug.
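Assuming the elided "..." continues the second if with an else branch (which is consistent with the report that FIELD_DOC is ignored), the bug is a missing else: for "doc", the second if/else still runs and overwrites sortField0. A plain-Java model with strings in place of the SortField constants; SortFieldParsing and the "field:" fallback branch are hypothetical.

```java
/** Plain-Java model of the quoted control flow; strings stand in for SortField values. */
class SortFieldParsing {
    /** Mirrors the quoted code: the second "if" starts a new, independent chain. */
    static String buggy(String field) {
        String sortField0;
        if (field.equals("doc")) {
            sortField0 = "FIELD_DOC";
        }
        if (field.equals("score")) {
            sortField0 = "FIELD_SCORE";
        } else {
            // Hypothetical fallback; for "doc" it overwrites FIELD_DOC set above.
            sortField0 = "field:" + field;
        }
        return sortField0;
    }

    /** The fix: one if / else-if / else chain, so exactly one branch assigns. */
    static String fixed(String field) {
        String sortField0;
        if (field.equals("doc")) {
            sortField0 = "FIELD_DOC";
        } else if (field.equals("score")) {
            sortField0 = "FIELD_SCORE";
        } else {
            sortField0 = "field:" + field;
        }
        return sortField0;
    }
}
```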
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052451#comment-13052451 ] Robert Muir commented on LUCENE-2341: - {quote} Eventually it would probably be sensible to limit the automaton used in Lucene to storing surface forms and lemmas only (no POS tags) and to merge both dictionaries into a single automaton, but this can be a future improvement. {quote} Or, alternatively, you could expose the POS tags for each stem to Lucene, right? The easiest way would be to put them into TypeAttribute (a string), but you could make your own strongly-typed attribute if that's a better fit. This could be useful for downstream processing. explore morfologik integration -- Key: LUCENE-2341 URL: https://issues.apache.org/jira/browse/LUCENE-2341 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available: http://sourceforge.net/projects/morfologik/ This works differently than LUCENE-2298, and ideally would be another option for users.
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052456#comment-13052456 ] Robert Muir commented on SOLR-2487: --- Without knowing anything about logging, I just want to say it's a bit scary to parameterize the build in any way:
* How are the different possibilities going to be tested?
* Are all possibilities supported, or is only the default/tested parameter the one we officially support?
Do not include slf4j-jdk14 jar in WAR - Key: SOLR-2487 URL: https://issues.apache.org/jira/browse/SOLR-2487 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.2, 4.0 Reporter: Jan Høydahl Labels: logging, slf4j I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB, but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052459#comment-13052459 ] Michael McCandless commented on LUCENE-2454:
{quote}
bq. It uses 2 passes if you also want to collect child docs per parent
I tend to work with distributed indexes, so it involves a 2-pass op anyway: one to understand the best parents across the multiple shards first, then the PerParentLimitedQuery to ensure we only pay the retrieval costs for those parents that make the final cut.
{quote}
The distributed case can still be done in a single pass using LUCENE-3171, i.e. each shard returns its top groups and then they are merged at the front end. This should be substantially faster than doing a 2nd pass out to all shards. Also, we now have TopDocs.merge/TopGroups.merge to support this use case.
bq. This overlaps with the BlockJoinQuery of LUCENE-3171, this issue might even be closed as duplicate of that one. Which one is preferred?
I think they are likely dups of one another, and I agree we need to make sure all important use cases are covered.
bq. Apps commonly need to return a selection of both matching and non-matching children along with the best parents.
LUCENE-3171 can do this as well, with the same approach as here, i.e. doing 2 passes with two different child queries. However, I think for both this issue and LUCENE-3171, this means each child doc must have the parent's PK indexed against it, right? I.e., for that 2nd query you need some way to return all child docs under any of the top parents, so the child query is "parentID MUST be in XX, YY, ZZ and childDoc SHOULD XYZ". In fact, we could make this a single-pass capability with LUCENE-3171, without requiring each child doc to index its parent PK, i.e. also pull and sort all other non-matching children under any top parent, because collection within each parent is done when you retrieve the TopGroups, but this can be a later enhancement. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
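The single-pass distributed approach relies on merging each shard's already-sorted top hits at the front end. Below is a plain-Java sketch of such a k-way merge with a priority queue; ShardHit and ShardMerge are hypothetical, and this is neither the signature nor the implementation of TopDocs.merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** One hit returned by one shard. */
class ShardHit {
    final int shard;
    final String id;
    final float score;

    ShardHit(int shard, String id, float score) {
        this.shard = shard;
        this.id = id;
        this.score = score;
    }
}

class ShardMerge {
    /**
     * Merges per-shard lists, each sorted by descending score, into the
     * global top n. Queue entries are {shardIndex, positionInThatList}.
     */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        // Max-heap on the score of the hit each entry points at.
        PriorityQueue<int[]> pq = new PriorityQueue<int[]>(
                Math.max(1, perShard.size()),
                (a, b) -> Float.compare(perShard.get(b[0]).get(b[1]).score,
                                        perShard.get(a[0]).get(a[1]).score));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                pq.add(new int[] {s, 0});
            }
        }
        List<ShardHit> out = new ArrayList<ShardHit>();
        while (out.size() < n && !pq.isEmpty()) {
            int[] top = pq.poll();
            List<ShardHit> list = perShard.get(top[0]);
            out.add(list.get(top[1]));
            // Advance within the shard that just supplied a hit.
            if (top[1] + 1 < list.size()) {
                pq.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

Only one list head per shard sits in the queue at a time, so producing the top n from s shards costs O(n log s) on the front end, with no second round trip to the shards.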
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218.patch Next iteration, seems close.
* Moved CFW to o.a.l.store and made it package-private.
* Added createCompoundOutput to Directory instead of passing OpenMode.
* Added write support to CompoundFileDirectory.
* Separately written files are appended during close if possible (no other file is currently being written directly to the CF). If a file is locked, the append happens once that file is closed.
* IW uses Directory methods only; addFile has been converted to Directory#copy.
One thing which still bugs me is the setAbortCheck on CFDirectory. I wonder if we can solve that differently; ideas? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch Currently CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file, only for DocValues, without changing any code once LUCENE-3216 is resolved.
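The core idea of the description, writing sub-files straight into one compound file instead of copying them in afterwards, can be illustrated with a toy in-memory container: each append records an (offset, length) entry, and reads slice the shared buffer. ToyCompoundFile is hypothetical and does not follow Lucene's on-disk CFS format; it only shows why appending avoids the extra copy.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy compound file: one shared byte stream plus an entry table. Not Lucene's format. */
class ToyCompoundFile {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // name -> {offset, length}, kept in insertion order
    private final Map<String, long[]> entries = new LinkedHashMap<String, long[]>();

    /** Appends a sub-file's bytes directly, without first writing a separate file. */
    void append(String name, byte[] bytes) {
        if (entries.containsKey(name)) {
            throw new IllegalArgumentException("duplicate entry: " + name);
        }
        entries.put(name, new long[] {data.size(), bytes.length});
        data.write(bytes, 0, bytes.length);
    }

    /** Reads a sub-file back by slicing the shared buffer at its recorded offset. */
    byte[] read(String name) {
        long[] e = entries.get(name);
        if (e == null) {
            throw new IllegalArgumentException("no such entry: " + name);
        }
        byte[] all = data.toByteArray();
        byte[] out = new byte[(int) e[1]];
        System.arraycopy(all, (int) e[0], out, 0, out.length);
        return out;
    }
}
```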
[jira] [Updated] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2610: Attachment: SOLR-2610.patch Patch adds a boolean deleteIndex parameter to the core UNLOAD action. There is a close hook interface in SolrCore, but it is called before the update handler and searcher(s) are closed, so it cannot be used to delete the index. Changes:
* Changed the CloseHook interface to an abstract class with a preClose(SolrCore) and a postClose(SolrCore) method
* Changed the usage of CloseHook in ReplicationHandler, SolrCoreTest
* CoreAdminHandler adds a CloseHook on receiving an unload action with deleteIndex=true
* Added tests for the new param
Since the CloseHook is used very sparingly, I think it is fine to change it to an abstract class, but if people feel strongly against it, we can find another way. Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610.patch Right now, one can unload a Solr Core, but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core.
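The reason for the new postClose callback is ordering: the index can only be deleted after the update handler and searchers are closed. A simplified model of that ordering follows; Core and DeleteIndexHook are hypothetical stand-ins (the real hooks receive a SolrCore), and nothing is actually deleted from disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Abstract class with the two callbacks described in the patch notes. */
abstract class CloseHook {
    abstract void preClose(Core core);
    abstract void postClose(Core core);
}

/** Hypothetical, heavily simplified core; not Solr's SolrCore. */
class Core {
    final List<CloseHook> hooks = new ArrayList<CloseHook>();
    final List<String> log = new ArrayList<String>();

    void close() {
        for (CloseHook h : hooks) h.preClose(this);
        // Stand-ins for closing the update handler and searcher(s);
        // only after this are the index files no longer in use.
        log.add("updateHandler closed");
        log.add("searchers closed");
        for (CloseHook h : hooks) h.postClose(this);
    }
}

/** What an UNLOAD with deleteIndex=true might register: act only after close. */
class DeleteIndexHook extends CloseHook {
    @Override
    void preClose(Core core) {
        // Too early to delete here: searchers may still hold the files open.
    }

    @Override
    void postClose(Core core) {
        // A real implementation would delete the index directory here.
        core.log.add("index deleted");
    }
}
```

An interface with a single close callback cannot express the "after everything else is closed" step, which is why the patch turns CloseHook into an abstract class with two methods.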
[jira] [Commented] (LUCENE-2341) explore morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052483#comment-13052483 ] Dawid Weiss commented on LUCENE-2341: - I've just published morfologik 1.5.2, Michał. This comes with two dictionaries (morfologik and morfeusz) that can be used as one (fallback for missing words) or separately, but I would stick to using morfologik as the default dictionary (possibly with an option of using morfeusz?). POS tags have a different notation in these two resources, so mixing both is probably not a good idea. Will you update the patch? Thanks. explore morfologik integration -- Key: LUCENE-2341 URL: https://issues.apache.org/jira/browse/LUCENE-2341 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Robert Muir Assignee: Dawid Weiss Attachments: LUCENE-2341.diff, morfologik-stemming-1.5.0.jar Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available: http://sourceforge.net/projects/morfologik/ This works differently than LUCENE-2298, and ideally would be another option for users.
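Using one dictionary as a fallback for words missing from the other, as Dawid describes, amounts to a two-step lookup. The sketch below assumes plain Maps from surface form to stems in place of the real morfologik and morfeusz dictionaries; FallbackStemmer is hypothetical and is not morfologik's lookup API. It returns stems only, since the two resources use different POS tag notations.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup: primary dictionary first, fallback only for unknown words. */
class FallbackStemmer {
    private final Map<String, List<String>> primary;
    private final Map<String, List<String>> fallback; // null means "primary only"

    FallbackStemmer(Map<String, List<String>> primary, Map<String, List<String>> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    List<String> stems(String word) {
        List<String> found = primary.get(word);
        if (found != null && !found.isEmpty()) {
            return found; // the default dictionary wins whenever it knows the word
        }
        if (fallback != null) {
            found = fallback.get(word);
            if (found != null) {
                return found;
            }
        }
        return Collections.emptyList(); // unknown to both: no stems
    }
}
```

Passing null as the fallback gives the "separately" mode, with the primary dictionary used alone.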
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052492#comment-13052492 ] Matteo Melli commented on SOLR-2564: Hi there, I'm testing this functionality in my project and found what I think is a bug. The revision I'm working on is 1137889. I could reproduce the bug with a really simple index (the column is of type solr.String):
|| Col1 ||
| 1 |
| 2 |
| 3 |
The bug appears when I run a grouping query that combines the start parameter (with a value greater than 0) and group.main=true:
http://localhost:8983/solr/test/select/?q=*:*&start=1&group=true&group.field=Col1&group.main=true
The error trace is:
Jun 21, 2011 1:32:10 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.solr.search.DocSlice$1.nextDoc(DocSlice.java:119)
    at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:247)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:153)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:111)
    at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:37)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:340)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:242)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
The problem does not appear without group.main=true, so this may be a bug related to that option. PS: I was not sure whether to open a bug, since the affected version is still in development. Sorry for any inconvenience. Integrating grouping module into Solr 4.0 - Key: SOLR-2564 URL: https://issues.apache.org/jira/browse/SOLR-2564 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Blocker Fix For: 4.0 Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch Since work on the grouping module is going well, I think it is time to wire it up in Solr. Besides the current grouping features Solr provides, Solr will then also support second pass caching and total count based on groups.
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO:
* {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
* {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
* _BM25_: the current mock implementation might be OK
* _LM_
* _DFR_
Done:
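Of the models listed, BM25 is the simplest to write down. The following is a hedged sketch of the textbook per-term BM25 score, using a common idf variant with "1 +" inside the logarithm so that idf never goes negative; it is not the code in the attached patches, and the class name and the k1 = 1.2, b = 0.75 values used below are conventional assumptions.

```java
/** Per-term Okapi BM25 score; a sketch, not the flexscoring branch's code. */
class Bm25 {
    final double k1; // term-frequency saturation
    final double b;  // strength of document-length normalization

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** docCount = number of docs N; docFreq = number of docs containing the term. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** tf = term frequency in this doc; docLen and avgDocLen are in tokens. */
    double score(double tf, double docLen, double avgDocLen, long docCount, long docFreq) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }
}
```

For a document of exactly average length with tf = 1, the tf factor is (k1 + 1) / (1 + k1) = 1, so the score reduces to the idf; longer documents and rarer repeats pull it down, extra occurrences push it up with diminishing returns.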
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: (was: LUCENE-3220.patch) Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO:
* {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
* {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
* _BM25_: the current mock implementation might be OK
* _LM_
* _DFR_
Done:
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO:
* {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
* {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
* _BM25_: the current mock implementation might be OK
* _LM_
* _DFR_
Done:
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Comment: was deleted (was: Done.) Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO:
* {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
* {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
* _BM25_: the current mock implementation might be OK
* _LM_
* _DFR_
Done:
[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052513#comment-13052513 ] Paul Elschot commented on LUCENE-3171: -- BlockJoinQuery still needs hashCode/equals, and a javadoc note (as I remarked earlier at 2454) about the possible inefficiency of using OpenBitSet for larger group sizes. When the typical group size gets a lot bigger than the number of bits in a long, another implementation might be faster. This remark in the javadocs would allow us to wait for someone to come along with bigger group sizes and a real performance problem here. I would prefer a single pass, and for now I only need the parent docs. That means I have no preference between 2454 and this one. BlockJoinQuery/Collector Key: LUCENE-3171 URL: https://issues.apache.org/jira/browse/LUCENE-3171 Project: Lucene - Java Issue Type: Improvement Components: modules/other Reporter: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3171.patch, LUCENE-3171.patch I created a single-pass Query + Collector to implement nested docs. The approach is similar to LUCENE-2454, in that the app must index documents in join order, as a block (IW.add/updateDocuments), with the parent doc at the end of the block, except that this impl is one pass. Once you join at indexing time, you can take any query that matches child docs and join it up to the parent docID space, using BlockJoinQuery. You then use BlockJoinCollector, which sorts parent docs by the provided Sort, to gather results, grouped by parent; this collector finds any BlockJoinQuerys (using Scorer.visitScorers) and retains the child docs corresponding to each collected parent doc. After searching is done, you retrieve the TopGroups from a provided BlockJoinQuery.
Like LUCENE-2454, this is less general than the arbitrary joins in Solr (SOLR-2272) or parent/child from ElasticSearch (https://github.com/elasticsearch/elasticsearch/issues/553), since you must do the join at indexing time as a doc block, but it should be able to handle nested joins as well as joins to multiple tables, though I don't yet have test cases for these. I put this in a new Join module (modules/join); I think as we refactor join impls we should put them here.
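The block layout in the description (children indexed first, parent doc last in its block) means a bitset of parent docIDs is enough to map between children and parents. A minimal sketch with java.util.BitSet standing in for OpenBitSet; BlockJoinSketch and its methods are hypothetical and do not reproduce BlockJoinQuery's internals.

```java
import java.util.BitSet;

/** Child/parent mapping over doc blocks where each parent is the last doc in its block. */
class BlockJoinSketch {
    /** Parent of childDoc: the next parent bit at or after it, or -1 if none follows. */
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    /**
     * First child of the block that ends at parentDoc: one past the previous
     * parent (previousSetBit(-1) returns -1, so the first block starts at 0).
     */
    static int firstChildOf(BitSet parents, int parentDoc) {
        return parents.previousSetBit(parentDoc - 1) + 1;
    }
}
```

Both lookups scan the words between a child and its parent, so their cost grows with the group size; that is presumably the kind of inefficiency Paul's suggested javadoc note would cover for groups much larger than the 64 bits of a long.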
[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title
[ https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052572#comment-13052572 ] Yonik Seeley commented on SOLR-2598: Yeah, looks fine. exampledocs/books.json should use name instead of title --- Key: SOLR-2598 URL: https://issues.apache.org/jira/browse/SOLR-2598 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 Attachments: SOLR-2598.patch The file exampledocs/books.json currently contains two books, but they do not show up in the default solr/browse interface because they use title instead of name, and the Velocity template does not show title. We should also include a few more books.
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218.patch Updated patch, NOW containing all files :) Sorry for the missing files in the last patch. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec Flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file, only for DocValues, without changing any code once LUCENE-3216 is resolved.
[jira] [Resolved] (SOLR-2598) exampledocs/books.json should use name instead of title
[ https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2598. --- Resolution: Fixed. Committed to trunk: r1138017, 3.x: r1138020 exampledocs/books.json should use name instead of title --- Key: SOLR-2598 URL: https://issues.apache.org/jira/browse/SOLR-2598 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 Attachments: SOLR-2598.patch The file exampledocs/books.json currently contains two books, but they do not show up in the default solr/browse interface because they use title instead of name, and the Velocity template does not show title. We should also include a few more books.
[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title
[ https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052595#comment-13052595 ] Yonik Seeley commented on SOLR-2598: Note that if you click the All tab on JIRA, it will show your two commits (so you don't need to bother listing the revisions if you don't want to). exampledocs/books.json should use name instead of title --- Key: SOLR-2598 URL: https://issues.apache.org/jira/browse/SOLR-2598 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 Attachments: SOLR-2598.patch The file exampledocs/books.json currently contains two books, but they do not show up in the default solr/browse interface because they use title instead of name, and the Velocity template does not show title. We should also include a few more books.
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052597#comment-13052597 ] Michael McCandless commented on LUCENE-3218: Patch looks great! Can we name it createCompoundOutput? That emphasizes that we are write-once (this file shouldn't already exist), and it matches createOutput. On checkAbort... could we avoid passing that to the CFW and instead call checkAbort in the outer loops (i.e., where we .copy the files in)? The existing CFW already only checks once per file anyway... Maybe instead of asserts for misuse of the CFD API (e.g. no entries, something is still open), we should make these real exceptions (i.e., thrown even when assertions are off)? This comment (in CFW.java) looks stale?: {noformat} // Close the output stream. Set the os to null before trying to // close so that if an exception occurs during the close, the // finally clause below will not attempt to close the stream // the second time. {noformat} openCompoundOutput needs javadoc. CFD.createOutput's jdoc says "Not Implememented", but it is implemented. The new test cases in TestCompoundFile name their file d.csf ;) Column stride fields live on!! Too many TLAs... Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently the CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec flush one of the written files can be appended directly.
This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for DocValues without changing any code once LUCENE-3216 is resolved.
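To make two of the review points concrete (sub-files appended straight into a write-once compound file, and misuse reported with real exceptions instead of asserts), here is a minimal in-memory sketch. CompoundSketch, beginEntry, endEntry and the {offset, length} entry table are hypothetical names and a simplified layout chosen for illustration. This is not Lucene's CompoundFileWriter API or file format.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, in-memory sketch of an appendable compound file: sub-files are
 * appended one after another and an entry table records {offset, length}.
 * Not Lucene's CompoundFileWriter API or on-disk format.
 */
final class CompoundSketch {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  private final Map<String, long[]> entries = new LinkedHashMap<>();
  private String openEntry; // entry currently being written, or null
  private long openStart;   // offset where the open entry began
  private boolean closed;

  /** Write-once: starting an entry whose name already exists is an error. */
  void beginEntry(String name) {
    if (closed) throw new IllegalStateException("compound file already closed");
    if (openEntry != null) throw new IllegalStateException("entry still open: " + openEntry);
    if (entries.containsKey(name)) throw new IllegalArgumentException("entry already exists: " + name);
    openEntry = name;
    openStart = data.size();
  }

  void write(byte[] bytes) {
    if (openEntry == null) throw new IllegalStateException("no open entry");
    data.write(bytes, 0, bytes.length); // appended directly, no later copy step
  }

  void endEntry() {
    if (openEntry == null) throw new IllegalStateException("no open entry");
    entries.put(openEntry, new long[] {openStart, data.size() - openStart});
    openEntry = null;
  }

  /** Misuse is reported with real exceptions, so it is caught even when assertions are disabled. */
  Map<String, long[]> close() {
    if (openEntry != null) throw new IllegalStateException("entry still open: " + openEntry);
    if (entries.isEmpty()) throw new IllegalStateException("no entries written");
    closed = true;
    return entries;
  }

  byte[] bytes() {
    return data.toByteArray();
  }
}
```

Because the checks throw IllegalStateException or IllegalArgumentException instead of using assert, forgetting endEntry() or closing an empty compound file fails the same way whether or not the JVM runs with -ea.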
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052601#comment-13052601 ] Peter Wolanin commented on SOLR-2462: - I generated a patch for 3.2 by looking at the commit on branch_3x. It looks somewhat different from the last patch by James. I also just compared the trunk commit to the last patch, and it doesn't match https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch Did the wrong patch get committed, or did the final patch just never get posted to this issue before the commit? Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.3, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if several corrections are returned per term and several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered any time spellcheck.collate is used; it is not necessary to use any of the features that were added with SOLR-2010. We were in production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
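Here is some arithmetic on the numbers reported above. Assume each of the ~12 misspelled words receives up to 15 suggestions (spellcheck.count=15). That gives 15^12 = 129,746,337,890,625 correction combinations, so a ranked list of *every* combination cannot fit in memory. The sketch below computes that count. It also shows one hypothetical way to bound memory: walk the combinations lazily and keep only the top N in a min-heap. It illustrates the problem under those assumptions and is not the change committed for SOLR-2462. It bounds memory but not running time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class CollationSketch {
  /** Total correction combinations = product of the candidate counts per misspelled term. */
  static long combinationCount(int[] candidatesPerTerm) {
    long total = 1;
    for (int n : candidatesPerTerm) {
      total = Math.multiplyExact(total, (long) n); // throws ArithmeticException on overflow
    }
    return total;
  }

  /**
   * Hypothetical bounded alternative (not the committed SOLR-2462 fix): walk the
   * combinations lazily, odometer style, and keep only the best maxCollations in a
   * min-heap. Memory is O(terms + maxCollations); time is still proportional to the
   * number of combinations. scores[t][c] is the score of candidate c for term t
   * (higher is better); every term is assumed to have at least one candidate.
   */
  static List<int[]> topCollations(double[][] scores, int maxCollations) {
    PriorityQueue<Scored> heap = new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
    int[] idx = new int[scores.length]; // current candidate index per term
    while (true) {
      double s = 0;
      for (int t = 0; t < idx.length; t++) s += scores[t][idx[t]];
      if (heap.size() < maxCollations) {
        heap.add(new Scored(idx.clone(), s));
      } else if (!heap.isEmpty() && s > heap.peek().score) {
        heap.poll(); // evict the current worst collation
        heap.add(new Scored(idx.clone(), s));
      }
      // Advance the odometer; stop after the last combination.
      int t = idx.length - 1;
      while (t >= 0 && ++idx[t] == scores[t].length) {
        idx[t] = 0;
        t--;
      }
      if (t < 0) break;
    }
    List<Scored> best = new ArrayList<>(heap);
    best.sort((a, b) -> Double.compare(b.score, a.score)); // best first
    List<int[]> result = new ArrayList<>();
    for (Scored sc : best) result.add(sc.choice);
    return result;
  }

  private static final class Scored {
    final int[] choice;
    final double score;

    Scored(int[] choice, double score) {
      this.choice = choice;
      this.score = score;
    }
  }
}
```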
[jira] [Commented] (SOLR-2598) exampledocs/books.json should use name instead of title
[ https://issues.apache.org/jira/browse/SOLR-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052600#comment-13052600 ] Jan Høydahl commented on SOLR-2598: --- OK, thanks. exampledocs/books.json should use name instead of title --- Key: SOLR-2598 URL: https://issues.apache.org/jira/browse/SOLR-2598 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 Attachments: SOLR-2598.patch The file exampledocs/books.json currently contains two books, but they do not show up in the default solr/browse interface because they use "title" instead of "name", and the Velocity template does not show "title". Also, we should include a few more books.
[jira] [Commented] (SOLR-1750) SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp
[ https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052606#comment-13052606 ] Jan Høydahl commented on SOLR-1750: --- The /admin/stats handler is not registered by default, nor is it included in the example config. I had to add <requestHandler name="/admin/stats" class="org.apache.solr.handler.admin.SolrInfoMBeanHandler" /> to my solrconfig.xml to get it working. SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp - Key: SOLR-1750 URL: https://issues.apache.org/jira/browse/SOLR-1750 Project: Solr Issue Type: Improvement Components: web gui Reporter: Erik Hatcher Assignee: Erik Hatcher Priority: Trivial Fix For: 1.5, 3.1, 4.0 Attachments: SOLR-1750-followup.patch, SystemStatsRequestHandler.java, SystemStatsRequestHandler.java, SystemStatsRequestHandler.java stats.jsp is cool and all, but it suffers from escaping issues and is also not accessible from SolrJ or other standard Solr APIs. Here's a request handler that emits everything stats.jsp does. For now, it needs to be registered in solrconfig.xml like this: {code} <requestHandler name="/admin/stats" class="solr.SystemStatsRequestHandler" /> {code} But I will register this in AdminHandlers automatically before committing.
[jira] [Updated] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3218: Attachment: LUCENE-3218.patch Final patch:
* fixed javadocs + several javadoc warnings
* renamed openCompoundOutput to createCompoundOutput
* fixed file extensions in test (CSF, LOL!!)
* copyFileEntry now deletes separately written files once they are copied into the CFS
* converted asserts to exceptions in CFW
I plan to commit this today if nobody objects. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently the CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for DocValues without changing any code once LUCENE-3216 is resolved.
[jira] [Resolved] (SOLR-2489) Remove old lucene.apache.org/solr/who page
[ https://issues.apache.org/jira/browse/SOLR-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2489. --- Resolution: Fixed Remove old lucene.apache.org/solr/who page -- Key: SOLR-2489 URL: https://issues.apache.org/jira/browse/SOLR-2489 Project: Solr Issue Type: Bug Affects Versions: 3.1, 3.2 Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 3.3 In the distribution, docs/who.html is outdated: it refers to the old Solr committers list at http://lucene.apache.org/solr/who, so the fix would be to simply delete the old page.
[jira] [Commented] (SOLR-2610) Add an option to delete index through CoreAdmin UNLOAD action
[ https://issues.apache.org/jira/browse/SOLR-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052624#comment-13052624 ] Jason Rutherglen commented on SOLR-2610: This is good! I had to write the same functionality into a custom Solr build on a project. Add an option to delete index through CoreAdmin UNLOAD action - Key: SOLR-2610 URL: https://issues.apache.org/jira/browse/SOLR-2610 Project: Solr Issue Type: Improvement Components: multicore Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 3.3, 4.0 Attachments: SOLR-2610.patch Right now, one can unload a Solr Core but the index files are left behind and consume disk space. We should have an option to delete the index when unloading a core.
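For the filesystem side of this option, here is a minimal sketch of removing a core's index directory after the core has been unloaded. IndexDirCleanup and deleteRecursively are hypothetical names, not part of Solr. The sketch assumes nothing still holds files open in that directory; on some platforms, notably Windows, deleting a file that is still open fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IndexDirCleanup {
  /** Hypothetical helper: deletes dir and everything under it, children before parents. */
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) return; // nothing to delete
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse lexicographic order places every entry before its parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p); // directories are already empty when reached
    }
  }
}
```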
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052623#comment-13052623 ] James Dyer commented on SOLR-2462: -- Peter, I reviewed Robert's commits (r1132730 to branch_3x; r1132729 to trunk), and they appear to match the 06/Jun/11 15:10 version of the patch. I looked mostly at the change in TestSpellCheckResponse.java, which is the last tweak that was made. Keep in mind there are a few things that were committed that aren't in the patch (CHANGES.txt, etc.). Did you have other specific discrepancies in mind? Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.3, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if several corrections are returned per term and several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered any time spellcheck.collate is used; it is not necessary to use any of the features that were added with SOLR-2010. We were in production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
[jira] [Resolved] (LUCENE-2548) Remove all interning of field names from flex API
[ https://issues.apache.org/jira/browse/LUCENE-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2548. Resolution: Fixed Committed! Uwe, I think I fixed all the places where we were making a placeholder term just to hold a field... Remove all interning of field names from flex API - Key: LUCENE-2548 URL: https://issues.apache.org/jira/browse/LUCENE-2548 Project: Lucene - Java Issue Type: Improvement Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2548.patch, LUCENE-2548.patch In previous versions of Lucene, interning of field names was important to minimize string comparison cost when iterating TermEnums, to detect changes in field name. As we separated field names from terms in flex, no query compares field names anymore, so the whole performance-problematic interning can be removed. I will start on this, but we need to carefully review some places, e.g. in the preflex codec. Maybe before this issue we should remove the Term class completely. :-) Robert?
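Here is a minimal illustration of the trade-off described above. It uses only java.lang.String rather than Lucene code. When both field names are interned, a field change while iterating terms can be detected with a reference comparison (==). Without interning, two equal names held in different String objects must be compared with equals(), which may have to compare the characters. InternSketch is a hypothetical name.

```java
final class InternSketch {
  /** Pre-flex style check: identity comparison, correct only if both names are interned. */
  static boolean sameFieldByIdentity(String a, String b) {
    return a == b;
  }

  /** Without interning: equals() compares characters when the references differ. */
  static boolean sameFieldByEquals(String a, String b) {
    return a.equals(b);
  }
}
```

Once flex stopped comparing field names per term, the cheap == check no longer paid for the cost of interning every field name.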
[jira] [Updated] (LUCENE-3222) Buffered deletes under count RAM
[ https://issues.apache.org/jira/browse/LUCENE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3222: --- Attachment: LUCENE-3222.patch Simple patch; I'll commit shortly and backport. Buffered deletes under count RAM Key: LUCENE-3222 URL: https://issues.apache.org/jira/browse/LUCENE-3222 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3222.patch I found this while working on LUCENE-2548: when we freeze the deletes (creating FrozenBufferedDeletes) and set bytesUsed, we fail to account for the RAM required for the term bytes (and now the term's field).
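Here is a simplified sketch of the accounting bug described above. An estimate that charges only a fixed per-entry overhead undercounts. One that also adds the term bytes and the field name tracks the real cost more closely. DeleteRamSketch and the 48-byte PER_TERM_OVERHEAD constant are hypothetical and are not the values or methods Lucene uses. The 2 bytes per field-name char assumes UTF-16 string storage.

```java
final class DeleteRamSketch {
  // Hypothetical fixed cost per buffered delete term (object headers, references,
  // array headers); real JVM sizes vary and this is NOT Lucene's constant.
  static final int PER_TERM_OVERHEAD = 48;

  /** Undercounts: charges only the fixed overhead per term. */
  static long bytesUsedUndercounted(String[] fields, byte[][] termBytes) {
    return (long) PER_TERM_OVERHEAD * termBytes.length;
  }

  /** Also charges each term's bytes and its field name (assumed 2 bytes per char). */
  static long bytesUsed(String[] fields, byte[][] termBytes) {
    long total = 0;
    for (int i = 0; i < termBytes.length; i++) {
      total += PER_TERM_OVERHEAD + termBytes[i].length + 2L * fields[i].length();
    }
    return total;
  }
}
```

For two deletes on field "id" with terms "doc-1" and "doc-22", the undercounted estimate is 96 bytes, while the fuller one is 115 bytes under these assumed sizes.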
[jira] [Resolved] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3201. - Resolution: Fixed Assignee: Simon Willnauer Incorporated in LUCENE-3218; I will track backporting there. improved compound file handling --- Key: LUCENE-3201 URL: https://issues.apache.org/jira/browse/LUCENE-3201 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Assignee: Simon Willnauer Fix For: 3.3, 4.0 Attachments: LUCENE-3201.patch, LUCENE-3201.patch Currently CompoundFileReader could use some improvements; I see the following problems:
* its CSIndexInput extends BufferedIndexInput, which is stupid for directories like mmap.
* it seeks on every readInternal.
* it's not possible for a directory to override or improve the handling of compound files.
For example: it seems that if you were implementing this thing from scratch, you would just wrap the II directly (not extend BufferedIndexInput), add the compound file offset X to seek() calls, and override length(). But of course, then you couldn't always throw "read past EOF" when you should, as a user could read into the next file and be left unaware. However, some directories could handle this better. For example, MMapDirectory could return an IndexInput that simply mmaps the 'slice' of the CFS file. Its underlying ByteBuffer naturally does bounds checks already, so it wouldn't need to be buffered, and wouldn't even need to add any offsets to seek(), as its position would just work. So I think we should try to refactor this so that a Directory can customize how compound files are handled. The simplest case, with the least code change, would be to add this to Directory.java:
{code}
public Directory openCompoundInput(String filename) throws IOException {
  return new CompoundFileReader(this, filename);
}
{code}
That would work because most code depends upon the fact that compound files are implemented as a Directory and are transparent. At least then a subclass could override... but the 'recursion' is a little ugly... we could still label it expert+internal+experimental or whatever.
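The 'slice' idea from the description can be sketched with java.nio.ByteBuffer. A duplicated buffer restricted to [offset, offset + length) and then sliced has position 0 at the start of the sub-file. As a result, seek() needs no offset arithmetic, and reading past the end fails with BufferUnderflowException. SliceInput is a hypothetical class, not Lucene's IndexInput. It uses a heap buffer instead of an mmap'ed file to stay self-contained.

```java
import java.nio.ByteBuffer;

/** Hypothetical read-only view of one sub-file inside a compound-file buffer. */
final class SliceInput {
  private final ByteBuffer slice;

  SliceInput(ByteBuffer compound, int offset, int length) {
    ByteBuffer dup = compound.duplicate(); // shares content, independent position/limit
    dup.position(offset);
    dup.limit(offset + length);
    this.slice = dup.slice();              // position 0 == offset in the compound file
  }

  long length() {
    return slice.capacity();
  }

  void seek(long pos) {
    slice.position((int) pos); // no compound-file offset to add
  }

  byte readByte() {
    return slice.get(); // BufferUnderflowException when reading past the sub-file end
  }
}
```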
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052633#comment-13052633 ] Simon Willnauer commented on LUCENE-3218: - Committed in revision 1138063. I will try to backport this to 3.x if possible. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently the CFS is created once all files are written during a flush / merge. Once on disk, the files are copied into the CFS format, which is basically unnecessary for some of the files. We can at any time write at least one file directly into the CFS, which can save a reasonable amount of IO. For instance, stored fields could be written directly during indexing, and during a Codec flush one of the written files can be appended directly. This optimization is a nice side effect for Lucene indexing itself, but more importantly, for DocValues and LUCENE-3216 we could transparently pack per-field files into a single file only for DocValues without changing any code once LUCENE-3216 is resolved.
Re: Lucene 3.3 release soon?
Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks, actually. We have a lot of good stuff in there today already; however, I wanted to give the grouping stuff a few weeks to run on Hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect an official Lucene 3.3 release? Best regards, Lukas
Re: Lucene 3.3 release soon?
Again, I don't think any future uncommitted features should block a release, nor should there be a "shoving period" where features are shoved in. I'll now be looking at producing an RC as quickly as possible, before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks, actually. We have a lot of good stuff in there today already; however, I wanted to give the grouping stuff a few weeks to run on Hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect an official Lucene 3.3 release? Best regards, Lukas
[jira] [Resolved] (LUCENE-3222) Buffered deletes under count RAM
[ https://issues.apache.org/jira/browse/LUCENE-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3222. Resolution: Fixed Buffered deletes under count RAM Key: LUCENE-3222 URL: https://issues.apache.org/jira/browse/LUCENE-3222 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3222.patch I found this while working on LUCENE-2548: when we freeze the deletes (creating FrozenBufferedDeletes) and set bytesUsed, we fail to account for the RAM required for the term bytes (and now the term's field).
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052643#comment-13052643 ] Hoss Man commented on SOLR-2487: "Supported" is always a vague term, but as with every other ant property in our build file, the default is the supported one that we test, and if you override a property when building from source, that's a customization and we won't promise that it will always work. It's no different than if they override the javac.source property, or build.encoding, etc... Do not include slf4j-jdk14 jar in WAR - Key: SOLR-2487 URL: https://issues.apache.org/jira/browse/SOLR-2487 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.2, 4.0 Reporter: Jan Høydahl Labels: logging, slf4j I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the WAR to help newbies get up and running. But I find myself re-packaging the WAR for every customer when adapting to their choice of logging framework, which is counter-productive. It would be sufficient to have the JDK logging binding in example/lib so that the example and tutorial still work OOTB, but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.
[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052642#comment-13052642 ] Michael McCandless commented on LUCENE-3171: bq. BlockJoinQuery still needs hashCode/equals Whoops, thanks, I'll add! {quote} and a javadoc note (as I remarked earlier at 2454) about the possible inefficiency of the use of OpenBitSet for larger group sizes. When the typical group size gets a lot bigger than the number of bits in a long, another implementation might be faster. This remark in the javadocs would allow us to wait for someone to come along with bigger group sizes and a real performance problem here. {quote} Hmm: do you have an improvement in mind for OpenBitSet.prevSetBit to better handle large groups? Or, where is this possible inefficiency (is it something specific)? bq. I would prefer to use single pass and for now I only need the parent docs. That means that I have no preference for 2454 or this one. I wonder how often apps typically need just the parent docs vs the groups (w/ child docs)... But still, this patch only calls .nextSetBit() once per group, so that ought to be faster than LUCENE-2454, I think... hmm, unless you typically only have 1 child match per parent. BlockJoinQuery/Collector Key: LUCENE-3171 URL: https://issues.apache.org/jira/browse/LUCENE-3171 Project: Lucene - Java Issue Type: Improvement Components: modules/other Reporter: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3171.patch, LUCENE-3171.patch I created a single-pass Query + Collector to implement nested docs. The approach is similar to LUCENE-2454, in that the app must index documents in join order, as a block (IW.add/updateDocuments), with the parent doc at the end of the block, except that this impl is one pass. Once you join at indexing time, you can take any query that matches child docs and join it up to the parent docID space, using BlockJoinQuery.
You then use BlockJoinCollector, which sorts parent docs by the provided Sort, to gather results, grouped by parent; this collector finds any BlockJoinQuerys (using Scorer.visitScorers) and retains the child docs corresponding to each collected parent doc. After searching is done, you retrieve the TopGroups from a provided BlockJoinQuery. Like LUCENE-2454, this is less general than the arbitrary joins in Solr (SOLR-2272) or parent/child from ElasticSearch (https://github.com/elasticsearch/elasticsearch/issues/553), since you must do the join at indexing time as a doc block, but it should be able to handle nested joins as well as joins to multiple tables, though I don't yet have test cases for these. I put this in a new Join module (modules/join); I think as we refactor join impls we should put them here.
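The block layout described above puts the children first and the parent last in each block. With that layout, the parent of any child can be found with a single nextSetBit on a bitset that marks parent docIDs. The start of a parent's block takes one previous-set-bit lookup. This sketch uses java.util.BitSet in place of Lucene's OpenBitSet, and BlockJoinSketch is a hypothetical name. It illustrates the docID arithmetic only, not BlockJoinQuery itself.

```java
import java.util.BitSet;

final class BlockJoinSketch {
  /** The parent is the last doc of its block, i.e. the next parent bit at or after the child. */
  static int parentOf(BitSet parentBits, int childDoc) {
    return parentBits.nextSetBit(childDoc);
  }

  /** A block starts right after the previous parent (or at doc 0 for the first block). */
  static int firstChildOf(BitSet parentBits, int parentDoc) {
    // previousSetBit(-1) returns -1, so the first block starts at 0.
    return parentBits.previousSetBit(parentDoc - 1) + 1;
  }

  /** Number of child docs indexed in front of this parent. */
  static int childCount(BitSet parentBits, int parentDoc) {
    return parentDoc - firstChildOf(parentBits, parentDoc);
  }
}
```

For example, with parents at docIDs 3, 5 and 6, docs 0 to 2 belong to parent 3, doc 4 belongs to parent 5, and parent 6 has no children.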
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052644#comment-13052644 ] Michael McCandless commented on LUCENE-2454: bq. A variant of prevSetBit could take this largest known child as an argument to limit its search, I think we should not require the app to know the max number of children per parent? (I.e., we should just grow buffers, etc., on demand as we collect.) I mean, if this information is easily available we could optimize for that case, but for some apps it's a good amount of work to record and update this, so I don't think it should be a required arg when creating the query/collectors, even though it's tempting ;) Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
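Growing on demand, as suggested above, avoids asking the app for a maximum child count up front. Here is a minimal sketch of a per-parent child buffer that doubles its capacity when full. ChildBuffer is a hypothetical name, not a Lucene class.

```java
import java.util.Arrays;

/** Hypothetical per-parent buffer of collected child docIDs that grows as needed. */
final class ChildBuffer {
  private int[] docs = new int[4]; // small initial capacity; no max child count required
  private int size;

  void add(int doc) {
    if (size == docs.length) {
      docs = Arrays.copyOf(docs, docs.length * 2); // double when full: amortized O(1) appends
    }
    docs[size++] = doc;
  }

  int size() {
    return size;
  }

  int get(int i) {
    if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i + ", size " + size);
    return docs[i];
  }
}
```

The buffer only ever grows to the size that the largest collected group actually needs.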
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052648#comment-13052648 ] Michael McCandless commented on LUCENE-2454: bq. A common pattern is for solutions to ask for the best 11 children for the best parents and display only 10 - that way the app knows that for certain parents there is more data available (i.e. those with 11 matches) and can offer a "more" button to retrieve the extra children for parents of interest. With LUCENE-3171, you should be able to just ask for 10 here, and then check whether TopDocs.totalHits is greater than 10 to decide whether to offer the "more" button. Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
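Here are the two ways of deciding whether to show a "more" button, side by side. Both helpers are hypothetical. With LUCENE-3171 the total would come from TopDocs.totalHits for the parent's group, as described above, and the number of children fetched would be the limit itself rather than limit + 1.

```java
import java.util.List;

final class MoreButtonSketch {
  /** Quoted pattern: fetch limit + 1 children and display at most limit of them. */
  static boolean moreByOverfetch(List<Integer> fetchedChildren, int limit) {
    return fetchedChildren.size() > limit;
  }

  /** LUCENE-3171 style: fetch limit children and compare against the total hit count. */
  static boolean moreByTotalHits(int totalHits, int limit) {
    return totalHits > limit;
  }
}
```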
[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR
[ https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052649#comment-13052649 ] Robert Muir commented on SOLR-2487: --- Hoss, ok, I just was trying to figure out the expectations for testing. Testing with a different classpath or whatever is more difficult than the other 'non-default' or 'conditional default' parameters that we randomize in Lucene to solve this issue (e.g. codecs, directories, locales, mergepolicies, ...), that's why I mentioned it. Do not include slf4j-jdk14 jar in WAR - Key: SOLR-2487 URL: https://issues.apache.org/jira/browse/SOLR-2487 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.2, 4.0 Reporter: Jan Høydahl Labels: logging, slf4j I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help newbies get up and running. But I find myself re-packaging the war for every customer when adapting to their choice of logger framework, which is counter-productive. It would be sufficient to have the jdk-logging binding in example/lib to let the example and tutorial still work OOTB, but as soon as you deploy solr.war to production you're forced to explicitly decide what logging to use.
[jira] [Commented] (SOLR-1750) SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp
[ https://issues.apache.org/jira/browse/SOLR-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052656#comment-13052656 ] Hoss Man commented on SOLR-1750: Jan: as stated above the registration i picked was /admin/mbeans - stats is too specific since the component can be used for other purposes than getting stats. it's also not a default handler -- it's registered if you register the AdminHandler. Jonathan: i overlooked your comment until now. the existing SystemInfoHandler isn't deprecated -- it's still very useful and provides information about the entire system solr is running in (the jvm, the os, etc...) SolrInfoMBeanHandler - replacement for stats.jsp and registry.jsp - Key: SOLR-1750 URL: https://issues.apache.org/jira/browse/SOLR-1750 Project: Solr Issue Type: Improvement Components: web gui Reporter: Erik Hatcher Assignee: Erik Hatcher Priority: Trivial Fix For: 1.5, 3.1, 4.0 Attachments: SOLR-1750-followup.patch, SystemStatsRequestHandler.java, SystemStatsRequestHandler.java, SystemStatsRequestHandler.java stats.jsp is cool and all, but suffers from escaping issues, and also is not accessible from SolrJ or other standard Solr APIs. Here's a request handler that emits everything stats.jsp does. For now, it needs to be registered in solrconfig.xml like this: {code} <requestHandler name="/admin/stats" class="solr.SystemStatsRequestHandler" /> {code} But will register this in AdminHandlers automatically before committing.
Re: Lucene 3.3 release soon?
On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early & often!!! simon I'll be now looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05.53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: i was planning on doing an RC in a few weeks actually. we have a lot of good stuff in there today already, however i wanted to give a few weeks for the grouping stuff to run on hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect official Lucene 3.3 release? 
Best regards, Lukas
[jira] [Created] (LUCENE-3224) bugs in ByteArrayDataInput
bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
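A simplified, self-contained sketch of the three fixes proposed above: the array constructor delegates to reset(), an empty array replaces the null placeholder, and reset(bytes, offset, len) sets limit = offset + len. SliceInput is a hypothetical class, not Lucene's ByteArrayDataInput; EMPTY_BYTES stands in for BytesRef.EMPTY_BYTES, and readVInt follows the usual VInt encoding (7 data bits per byte, low-order group first, high bit set when another byte follows).

```java
/** Hypothetical, simplified stand-in for ByteArrayDataInput showing the proposed fixes. */
final class SliceInput {
  // Stands in for BytesRef.EMPTY_BYTES so callers never have to pass null.
  private static final byte[] EMPTY_BYTES = new byte[0];

  private byte[] bytes;
  private int pos;
  private int limit;

  SliceInput(byte[] bytes) {
    reset(bytes); // fix 1: the ctor initializes pos and limit instead of leaving limit unset
  }

  SliceInput() {
    this(EMPTY_BYTES); // fix 2: start from an empty array rather than null
  }

  void reset(byte[] bytes) {
    reset(bytes, 0, bytes.length);
  }

  void reset(byte[] bytes, int offset, int len) {
    this.bytes = bytes;
    this.pos = offset;
    this.limit = offset + len; // fix 3: the slice ends at offset + len, not at len
  }

  boolean eof() {
    return pos == limit;
  }

  byte readByte() {
    if (pos >= limit) {
      throw new IllegalStateException("read past EOF");
    }
    return bytes[pos++];
  }

  /** VInt: 7 data bits per byte, low-order group first; a set high bit means another byte follows. */
  int readVInt() {
    byte b = readByte();
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    byte[] data = {99, (byte) 0x96, 0x01, 42, 7};
    SliceInput in = new SliceInput();
    in.reset(data, 1, 3); // slice = {0x96, 0x01, 42}
    System.out.println(in.readVInt()); // 150 (0x96 0x01)
    System.out.println(in.readByte()); // 42
    System.out.println(in.eof());      // true
  }
}
```

With the buggy limit = len, the same slice would end at index 3, so in this sketch the readByte() after the VInt would already throw "read past EOF".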
[jira] [Commented] (LUCENE-3224) bugs in ByteArrayDataInput
[ https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052660#comment-13052660 ] Robert Muir commented on LUCENE-3224: - Also, I think we want to assert all bounds checks in here, maybe have a checkBounds(int limit) called only from assert that throws "read past EOF". This way we don't rely upon AIOOBE; we could be reading from slices and miss bugs in tests. bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
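The assert-only bounds check suggested above is a common Java idiom: a helper that returns true (or throws) so it can sit inside an assert statement and costs nothing when assertions are disabled. A minimal sketch follows; AssertedSliceInput and its fields are hypothetical, and it throws an unchecked IllegalStateException where Lucene code would more likely throw an IOException.

```java
/** Hypothetical slice reader whose bounds checks run only under -ea. */
final class AssertedSliceInput {
  private final byte[] bytes;
  private final int limit;
  private int pos;

  AssertedSliceInput(byte[] bytes, int offset, int len) {
    this.bytes = bytes;
    this.pos = offset;
    this.limit = offset + len;
  }

  /** Returns true so it can be used as "assert checkBounds(n)"; throws with a specific message otherwise. */
  boolean checkBounds(int numBytes) {
    if (pos + numBytes > limit) {
      throw new IllegalStateException(
          "read past EOF: pos=" + pos + " numBytes=" + numBytes + " limit=" + limit);
    }
    return true;
  }

  byte readByte() {
    assert checkBounds(1); // skipped entirely unless the JVM runs with -ea
    return bytes[pos++];
  }

  int readInt() {
    assert checkBounds(4);
    // Big-endian; Java evaluates operands left to right, so pos advances in order.
    return ((bytes[pos++] & 0xFF) << 24) | ((bytes[pos++] & 0xFF) << 16)
        | ((bytes[pos++] & 0xFF) << 8) | (bytes[pos++] & 0xFF);
  }

  public static void main(String[] args) {
    byte[] shared = {0, 0, 0, 5, 9, 9, 9, 9};
    // The slice covers only the first 4 bytes; the rest belongs to someone else.
    AssertedSliceInput in = new AssertedSliceInput(shared, 0, 4);
    System.out.println(in.readInt()); // 5
    // Without -ea, in.readByte() here would silently return 9 from the neighboring bytes
    // (no ArrayIndexOutOfBoundsException), which is the kind of bug the asserts are meant to catch.
    try {
      in.checkBounds(1);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // read past EOF: pos=4 numBytes=1 limit=4
    }
  }
}
```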
Re: managing CHANGES.txt?
On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter hossman_luc...@fucit.org wrote: But there is no way for someone looking at the CHANGES for 4.0 to know for certain that the bits that make up that bug fix are in the 4.0 release -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because 4.0 comes from a completely different line of development. it's in the 4.0 CHANGES.txt, under the 3.2 section.
Re: Lucene 3.3 release soon?
-1 on release early & often. Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter? -JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon? On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early & often!!! simon I'll be now looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05.53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: i was planning on doing an RC in a few weeks actually. we have a lot of good stuff in there today already, however i wanted to give a few weeks for the grouping stuff to run on hudson. 
On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect official Lucene 3.3 release? Best regards, Lukas
[jira] [Assigned] (LUCENE-3224) bugs in ByteArrayDataInput
[ https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3224: -- Assignee: Michael McCandless bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Michael McCandless Attachments: LUCENE-3224.patch ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
[jira] [Updated] (LUCENE-3224) bugs in ByteArrayDataInput
[ https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3224: --- Attachment: LUCENE-3224.patch Patch. bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Michael McCandless Attachments: LUCENE-3224.patch ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
RE: managing CHANGES.txt?
Robert, Is the CHANGES.txt policy you advocate (and police) written up in one place? I'm sure you'd like to not have to fix up everybody's entries. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Tuesday, June 21, 2011 1:14 PM To: dev@lucene.apache.org Subject: Re: managing CHANGES.txt? On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter hossman_luc...@fucit.org wrote: But there is no way for someone looking at the CHANGES for 4.0 to know for certain that the bits that make up that bug fix are in the 4.0 release -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because 4.0 comes from a completely different line of development. it's in the 4.0 CHANGES.txt, under the 3.2 section.
[jira] [Commented] (LUCENE-3224) bugs in ByteArrayDataInput
[ https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052679#comment-13052679 ] Robert Muir commented on LUCENE-3224: - +1 bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Michael McCandless Attachments: LUCENE-3224.patch ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
Re: managing CHANGES.txt?
It wasn't anything I advocate, I'm just describing what it seems like we do 99% of the time? (in my example, Uwe committed it, and I didn't fix anything) On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote: Robert, Is the CHANGES.txt policy you advocate (and police) written up in one place? I'm sure you'd like to not have to fix up everybody's entries. Steve -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Tuesday, June 21, 2011 1:14 PM To: dev@lucene.apache.org Subject: Re: managing CHANGES.txt? On Tue, Jun 21, 2011 at 1:09 PM, Chris Hostetter hossman_luc...@fucit.org wrote: But there is no way for someone looking at the CHANGES for 4.0 to know for certain that the bits that make up that bug fix are in the 4.0 release -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because 4.0 comes from a completely different line of development. it's in the 4.0 CHANGES.txt, under the 3.2 section.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8966 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8966/ 10 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-72: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-72: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/test6473964755tmp/_e_1.prx (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testImmediateDiskFullWithThreads Error Message: hit unexpected Throwable Stack Trace: junit.framework.AssertionFailedError: hit unexpected Throwable at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) at org.apache.lucene.index.TestIndexWriterWithThreads.testImmediateDiskFullWithThreads(TestIndexWriterWithThreads.java:140) REGRESSION: org.apache.lucene.index.TestStressIndexing2.testRandomIWReader Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! 
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:605) FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestFieldCacheRangeFilter Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test3857338582tmp/_1_1.doc (Too many open files in system) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test3857338582tmp/_1_1.doc (Too many open files in system) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:110) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:133) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:326) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:415) at org.apache.lucene.store.Directory.openInput(Directory.java:118) at org.apache.lucene.index.codecs.mocksep.MockSingleIntIndexInput.<init>(MockSingleIntIndexInput.java:40) at org.apache.lucene.index.codecs.mocksep.MockSingleIntFactory.openInput(MockSingleIntFactory.java:31) at org.apache.lucene.index.codecs.sep.IntStreamFactory.openInput(IntStreamFactory.java:28) at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl.<init>(SepPostingsReaderImpl.java:66) at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsProducer(MockSepCodec.java:95) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:113) at 
org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:189) at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:88) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:640) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3450) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3119) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1879) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1874) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1870) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1236)
[jira] [Updated] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2452: -- Attachment: SOLR-2452.dir.reshuffle.sh SOLR-2452-post-reshuffling.patch This version of the shell script patch removes Solrj's dependence on Solr core tests, by moving SolrJettyTestBase and ExternalPaths from Solr core to Solr's test-framework -- it turns out that these were the only two Solr core test classes that Solrj depended on. rewrite solr build system - Key: SOLR-2452 URL: https://issues.apache.org/jira/browse/SOLR-2452 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Steven Rowe Fix For: 3.3, 4.0 Attachments: SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, SOLR-2452.dir.reshuffle.sh, SOLR-2452.dir.reshuffle.sh As discussed some in SOLR-2002 (but that issue is long and hard to follow), I think we should rewrite the solr build system. It's slow, cumbersome, and messy, and makes it hard for us to improve things.
[jira] [Resolved] (LUCENE-3224) bugs in ByteArrayDataInput
[ https://issues.apache.org/jira/browse/LUCENE-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3224. Resolution: Fixed Fix Version/s: 4.0 bugs in ByteArrayDataInput -- Key: LUCENE-3224 URL: https://issues.apache.org/jira/browse/LUCENE-3224 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3224.patch ByteArrayDataInput has a byte[] ctor, but it doesn't actually work (some things like readVInt will work, others will fail due to asserts). The problem is it doesn't set things like limit in the ctor... I think the ctor should call reset(). Most code using this passes null to the ctor to initialize it, then uses reset(); instead they could just call ByteArrayDataInput(BytesRef.EMPTY_BYTES) if they want to do that. Finally, reset()'s limit looks like it should be offset + len.
Re: Lucene 3.3 release soon?
I think we might target fewer than 6-8 a month. That would be scary! I would guess it will be once a month at worst, and often less. Time will tell. You must already give version info with questions if you want decent help - nothing is going to change that. - Mark On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote: -1 on release early & often. Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter? -JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon? On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early & often!!! simon I'll be now looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05.53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. 
Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: i was planning on doing an RC in a few weeks actually. we have a lot of good stuff in there today already, however i wanted to give a few weeks for the grouping stuff to run on hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect official Lucene 3.3 release? Best regards, Lukas - Mark Miller lucidimagination.com
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052696#comment-13052696 ] Michael McCandless commented on LUCENE-2454: bq. I think the only thing 3171 may be missing from my original use cases then is that I can use multiple PerParentLimitedQueries in one query to get a limit of children of different types e.g. for each parent resume, max 10 results from employment detail children and max 10 results from education background children. I think LUCENE-3171 can handle this, or something very similar: the collector tracks all of the BlockJoinQuerys involved in the top query. So, you'd have 1 BJQ matching employment detail child docs and another matching education bg child docs. The BJC (BlockJoinCollector) collects the top parent docs, then you can retrieve separate TopGroups for each BJQ. In the end you have a TopGroups for the employment detail child docs and another TopGroups for the education bg child docs. Could that work for your use case? Nested Document query support - Key: LUCENE-2454 URL: https://issues.apache.org/jira/browse/LUCENE-2454 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: 3.0.2 Reporter: Mark Harwood Assignee: Mark Harwood Priority: Minor Attachments: LUCENE-2454.patch, LUCENE-2454.patch, LuceneNestedDocumentSupport.zip A facility for querying nested documents in a Lucene index as outlined in http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
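To make the requested outcome concrete, here is a plain-Java model of it, not the LUCENE-3171 API: for one parent, keep the best N children separately for each child type, which is roughly what one TopGroups per BlockJoinQuery would give. ChildHit, topChildrenByType and the scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model: per parent, keep the top N children separately for each child type. */
final class PerTypeChildLimits {
  /** Hypothetical child hit: doc ID, child type (e.g. "employment"), and score. */
  static final class ChildHit {
    final int doc;
    final String type;
    final float score;

    ChildHit(int doc, String type, float score) {
      this.doc = doc;
      this.type = type;
      this.score = score;
    }
  }

  /** Groups one parent's matching children by type and keeps the best maxPerType of each. */
  static Map<String, List<ChildHit>> topChildrenByType(List<ChildHit> children, int maxPerType) {
    Map<String, List<ChildHit>> byType = new LinkedHashMap<>();
    for (ChildHit hit : children) {
      byType.computeIfAbsent(hit.type, t -> new ArrayList<>()).add(hit);
    }
    for (List<ChildHit> hits : byType.values()) {
      hits.sort(Comparator.comparingDouble((ChildHit h) -> h.score).reversed());
      if (hits.size() > maxPerType) {
        hits.subList(maxPerType, hits.size()).clear(); // drop everything past the per-type limit
      }
    }
    return byType;
  }

  public static void main(String[] args) {
    List<ChildHit> resumeChildren = new ArrayList<>();
    resumeChildren.add(new ChildHit(0, "employment", 0.9f));
    resumeChildren.add(new ChildHit(1, "employment", 0.4f));
    resumeChildren.add(new ChildHit(2, "employment", 0.7f));
    resumeChildren.add(new ChildHit(3, "education", 0.5f));
    for (Map.Entry<String, List<ChildHit>> e : topChildrenByType(resumeChildren, 2).entrySet()) {
      StringBuilder line = new StringBuilder(e.getKey()).append(':');
      for (ChildHit h : e.getValue()) {
        line.append(' ').append(h.doc);
      }
      System.out.println(line);
    }
    // employment: 0 2
    // education: 3
  }
}
```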
RE: managing CHANGES.txt?
On 6/21/2011 at 1:26 PM, Robert Muir wrote: On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote: Is the CHANGES.txt policy you advocate (and police) written up in one place? I'm sure you'd like to not have to fix up everybody's entries It wasn't anything I advocate, I'm just describing what it seems like we do 99% of the time? (in my example, Uwe committed it, and I didn't fix anything) I'm confused - seems like you're disavowing the role you've been playing as CHANGES policeman - yet I've seen at least 10 CHANGES-policing commits within the last six weeks?: http://svn.apache.org/viewvc?rev=1137361&view=rev http://svn.apache.org/viewvc?rev=1137359&view=rev http://svn.apache.org/viewvc?rev=1130564&view=rev http://svn.apache.org/viewvc?rev=1128248&view=rev http://svn.apache.org/viewvc?rev=1128247&view=rev http://svn.apache.org/viewvc?rev=1125127&view=rev http://svn.apache.org/viewvc?rev=1125128&view=rev http://svn.apache.org/viewvc?rev=1125134&view=rev http://svn.apache.org/viewvc?rev=1125135&view=rev http://svn.apache.org/viewvc?rev=1102119&view=rev Again, you obviously have a concrete idea of what should be done - can you point to a writeup? Thanks, Steve
Re: managing CHANGES.txt?
On Tue, Jun 21, 2011 at 1:47 PM, Steven A Rowe sar...@syr.edu wrote: On 6/21/2011 at 1:26 PM, Robert Muir wrote: On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote: Is the CHANGES.txt policy you advocate (and police) written up in one place? I'm sure you'd like to not have to fix up everybody's entries It wasn't anything I advocate, I'm just describing what it seems like we do 99% of the time? (in my example, Uwe committed it, and I didn't fix anything) I'm confused - seems like you're disavowing the role you've been playing as CHANGES policeman - yet I've seen at least 10 CHANGES-policing commits within the last six weeks?: I do disavow this role: when CHANGES.txt is jacked up, I fix it, I don't complain to anyone about it. I don't understand how this makes me a policeman?
Re: managing CHANGES.txt?
On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote: Again, you obviously have a concrete idea of what should be done - can you point to a writeup? Thanks, Steve Thank you Robert for keeping Changes pretty. -1 to more formalization, or writeups. I've seen the opinions in the emails on the topic now and before. Writeups turn into more than they should be over time, half the time. They end up stale or over-followed. - Mark Miller lucidimagination.com
Re: Lucene 3.3 release soon?
My bad, I meant to say “6-8 releases a year” .. grrr!! So let me try this again. I don't like the current plan of release early and often because: 1) It will spread testing thin of any release because fewer real users will be using a release when you have too many a year. 2) release early and often is not a well defined production release. It will lead to undefined gaps between releases (why X.Y took N weeks, but X.Z took M months?). This is why I suggested a quarterly release plan (it's what FF is now doing) 3) Do companies jump on a Lucene release as soon as one is made? No, they have a process. With too many releases, they will now be more confused which releases to use; they want a release that proved itself. --MJ -Original Message- From: Mark Miller markrmil...@gmail.com To: dev@lucene.apache.org Cc: simon.willna...@gmail.com Sent: Tue, Jun 21, 2011 1:32 pm Subject: Re: Lucene 3.3 release soon? I think we might target fewer than 6-8 a month. That would be scary! I would guess it will be once a month at worst, and often less. Time will tell. You must already give version info with questions if you want decent help - nothing is going to change that. - Mark On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote: -1 on release early and often. Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter?
-JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon? On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early and often!!! simon I'll now be looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks actually. We have a lot of good stuff in there today already, however I wanted to give a few weeks for the grouping stuff to run on Hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect official Lucene 3.3 release?
Best regards, Lukas
Re: Lucene 3.3 release soon?
On Tue, Jun 21, 2011 at 7:15 PM, johnmu...@aol.com wrote: -1 on release early and often. John, don't worry, we won't do 6 or 8 a month. I think we'd rather balance it with the features / bugfixes we can deliver. I think one every two months is a good rough estimate. simon Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter? -JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon? On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early and often!!! simon I'll now be looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2.
Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks actually. We have a lot of good stuff in there today already, however I wanted to give a few weeks for the grouping stuff to run on Hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts? On Mon, Jun 20, 2011 at 10:56 AM, Lukáš Vlček lukas.vl...@gmail.com wrote: Hi, How soon can we expect official Lucene 3.3 release? Best regards, Lukas
RE: managing CHANGES.txt?
Mark, Staleness is way better than digging through mail archives, guessing and getting it wrong, or re-invention. Word of mouth doesn't scale. The Lucene/Solr dev community is growing. Where I see an opportunity to document current practice, where it is less than obvious what to do, I will, modulo free time of course. Feel free to ignore my idiocy. Steve -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 21, 2011 1:54 PM To: dev@lucene.apache.org Subject: Re: managing CHANGES.txt? On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote: Again, you obviously have a concrete idea of what should be done - can you point to a writeup? Thanks, Steve Thank you Robert for keeping Changes pretty. -1 to more formalization, or writeups. I've seen the opinions in the emails on the topic now and before. Writeups turn into more than they should be over time, half the time. They end up stale or over-followed. - Mark Miller lucidimagination.com
RE: managing CHANGES.txt?
On 6/21/2011 at 1:52 PM, Robert Muir wrote: On Tue, Jun 21, 2011 at 1:47 PM, Steven A Rowe sar...@syr.edu wrote: On 6/21/2011 at 1:26 PM, Robert Muir wrote: On Tue, Jun 21, 2011 at 1:23 PM, Steven A Rowe sar...@syr.edu wrote: Is the CHANGES.txt policy you advocate (and police) written up in one place? I'm sure you'd like to not have to fix up everybody's entries. It wasn't anything I advocate, I'm just describing what it seems like we do 99% of the time? (in my example, Uwe committed it, and I didn't fix anything) I'm confused - seems like you're disavowing the role you've been playing as CHANGES policeman - yet I've seen at least 10 CHANGES-policing commits within the last six weeks?: I do disavow this role: when CHANGES.txt is jacked up, I fix it, I don't complain to anyone about it. I don't understand how this makes me a policeman? CHANGES janitor??? Echoing Mark M., thanks for scrubbing. I was looking to make it possible for others to share the load, by publicizing the target. Steve
Re: managing CHANGES.txt?
You're more prickly than me today, Steve :) You are of course free to document anything you see fit. And I'm free to weigh in on my opinion about documenting :) That's how it works indeed, and it's a beautiful system. - Mark On Jun 21, 2011, at 2:08 PM, Steven A Rowe wrote: Mark, Staleness is way better than digging through mail archives, guessing and getting it wrong, or re-invention. Word of mouth doesn't scale. The Lucene/Solr dev community is growing. Where I see an opportunity to document current practice, where it is less than obvious what to do, I will, modulo free time of course. Feel free to ignore my idiocy. Steve -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, June 21, 2011 1:54 PM To: dev@lucene.apache.org Subject: Re: managing CHANGES.txt? On Jun 21, 2011, at 1:47 PM, Steven A Rowe wrote: Again, you obviously have a concrete idea of what should be done - can you point to a writeup? Thanks, Steve Thank you Robert for keeping Changes pretty. -1 to more formalization, or writeups. I've seen the opinions in the emails on the topic now and before. Writeups turn into more than they should be over time, half the time. They end up stale or over-followed. - Mark Miller lucidimagination.com - Mark Miller lucidimagination.com
Re: Lucene 3.3 release soon?
And here are the reasons why I think we should release often: 1) As for corporations worried about stability, if they are really that worried, they should take a look at our stable branch and how development is done around here, and these concerned corporations should also take a look at how testing is done on this project. But in any case, I couldn't care less what corporations think. 2) The way I see it, we started releasing more often about a month ago, and we also got a bunch of new committers (5, 6, 7? what is it exactly?) in the last month too. We have a shitload of guys committing a shitload of good stuff, and we want even more committers to get more momentum. Releasing is an important part of encouraging contributors so that they see what they do actually getting out there. 3) When I look at http://wiki.apache.org/lucene-java/ReleaseNote33 and http://wiki.apache.org/solr/ReleaseNote33, which only list major features, not bugfixes or anything (see CHANGES.txt for that!), it looks solid to me. These are major search features that users want, some of them (e.g. autocomplete and grouping stuff) have been baking in trunk for quite some time. 4) Finally, we won't make all users or even committers happy with any given release. That's why releases only need 3 +1 votes. That being said, I'm talking about spinning up an RC soon, right before I go on vacation. Sure, we slipped the last one past hossman, but for this one, it's entirely possible he comes back with 87 problems in the release. Big deal, worst case the RC fails, and if I'm stuck sitting by the beach fixing everything he finds and making Lucene/Solr better - well, life could be a lot worse. On Tue, Jun 21, 2011 at 2:01 PM, johnmu...@aol.com wrote: My bad, I meant to say “6-8 releases a year” .. grrr!! So let me try this again.
I don't like the current plan of release early and often because: 1) It will spread testing thin of any release because fewer real users will be using a release when you have too many a year. 2) release early and often is not a well defined production release. It will lead to undefined gaps between releases (why X.Y took N weeks, but X.Z took M months?). This is why I suggested a quarterly release plan (it's what FF is now doing) 3) Do companies jump on a Lucene release as soon as one is made? No, they have a process. With too many releases, they will now be more confused which releases to use; they want a release that proved itself. --MJ -Original Message- From: Mark Miller markrmil...@gmail.com To: dev@lucene.apache.org Cc: simon.willna...@gmail.com Sent: Tue, Jun 21, 2011 1:32 pm Subject: Re: Lucene 3.3 release soon? I think we might target fewer than 6-8 a month. That would be scary! I would guess it will be once a month at worst, and often less. Time will tell. You must already give version info with questions if you want decent help - nothing is going to change that. - Mark On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote: -1 on release early and often. Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter? -JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon?
On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early and often!!! simon I'll now be looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks actually. We have a lot of good stuff in there today already,
Concerning LUCENE-3079: Faceting module
Hello,
I can donate our faceting module to the Lucene project. The implementation relies on the field cache only, no index scheme, no cached filters, etc. It is small (about 600 lines of code in 10 classes). I didn't measure performance, but it handles 1 million documents (30GB) without problems. I suppose it might fit the requirements described in LUCENE-3079.
The module supports
- single valued facets
- multi valued facets
- facet filters
- evaluation of facet values that would be dismissed due to other facet filters.
Let me explain the last point: for the user, a facet query (color==green) AND (shape==circle OR shape==square) may look like
Facet color
[ ] (3) red
[x] (5) green
[ ] (7) blue
Facet shape
[x] (9) circle
[ ] (4) line
[x] (2) square
The red/blue/line facet values will display even though the corresponding documents are not in the result set. Also there is support for filtered facet values with zero results, so users understand why they do not get results.
So how to start? Preparing a patch against trunk (currently it is 3.1)?
Stefan Trcek
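The last feature in the list (what Solr calls multi-select faceting) comes down to one rule: when counting the values of a dimension, apply every selected filter except that dimension's own. A minimal sketch in plain Java, assuming an in-memory list of documents where each document maps a dimension name to its set of values; the class and method names are hypothetical, and this is neither Stefan's module nor a Lucene API (a real implementation would read values from the field cache).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical multi-select facet counter over in-memory documents. */
final class MultiSelectFacets {

    private MultiSelectFacets() {}

    /**
     * True if the doc passes every selected filter except the one on skipDim.
     * Values selected within one dimension are OR'ed; dimensions are AND'ed,
     * e.g. (color==green) AND (shape==circle OR shape==square).
     */
    static boolean matchesOtherFilters(Map<String, Set<String>> doc,
                                       Map<String, Set<String>> selected,
                                       String skipDim) {
        for (Map.Entry<String, Set<String>> filter : selected.entrySet()) {
            if (filter.getKey().equals(skipDim)) {
                continue; // a dimension's own selection must not hide its sibling values
            }
            Set<String> docValues = doc.getOrDefault(filter.getKey(), Set.of());
            boolean anyMatch = false;
            for (String wanted : filter.getValue()) {
                if (docValues.contains(wanted)) {
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                return false;
            }
        }
        return true;
    }

    /** Counts every value of dim; a multi-valued doc counts once per value. */
    static Map<String, Integer> count(List<Map<String, Set<String>>> docs,
                                      Map<String, Set<String>> selected,
                                      String dim) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Set<String>> doc : docs) {
            if (!matchesOtherFilters(doc, selected, dim)) {
                continue;
            }
            for (String value : doc.getOrDefault(dim, Set.of())) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With the selection from the example (color = green, shape = circle or square), the color counts are taken over circle/square documents only, so red and blue still get counts even though their documents are not in the result set.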
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8967 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8967/
1 tests failed.
FAILED: org.apache.solr.cloud.ZkControllerTest.testUploadToCloud
Error Message:
Could not connect to ZooKeeper 127.0.0.1:55410/solr within 1000 ms
Stack Trace:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:55410/solr within 1000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:69)
    at org.apache.solr.cloud.ZkController.<init>(ZkController.java:104)
    at org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:188)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1425)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1343)
Build Log (for compile errors):
[...truncated 8538 lines...]
[jira] [Updated] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3171: --- Attachment: LUCENE-3171.patch Patch, adding equals and hashCode and clone to BlockJoinQuery. Also, I now throw UOE from get/setBoost, stating that you should do so against the child query instead. BlockJoinQuery/Collector Key: LUCENE-3171 URL: https://issues.apache.org/jira/browse/LUCENE-3171 Project: Lucene - Java Issue Type: Improvement Components: modules/other Reporter: Michael McCandless Fix For: 3.3, 4.0 Attachments: LUCENE-3171.patch, LUCENE-3171.patch, LUCENE-3171.patch I created a single-pass Query + Collector to implement nested docs. The approach is similar to LUCENE-2454, in that the app must index documents in join order, as a block (IW.add/updateDocuments), with the parent doc at the end of the block, except that this impl is one pass. Once you join at indexing time, you can take any query that matches child docs and join it up to the parent docID space, using BlockJoinQuery. You then use BlockJoinCollector, which sorts parent docs by provided Sort, to gather results, grouped by parent; this collector finds any BlockJoinQuerys (using Scorer.visitScorers) and retains the child docs corresponding to each collected parent doc. After searching is done, you retrieve the TopGroups from a provided BlockJoinQuery. Like LUCENE-2454, this is less general than the arbitrary joins in Solr (SOLR-2272) or parent/child from ElasticSearch (https://github.com/elasticsearch/elasticsearch/issues/553), since you must do the join at indexing time as a doc block, but it should be able to handle nested joins as well as joins to multiple tables, though I don't yet have test cases for these. I put this in a new Join module (modules/join); I think as we refactor join impls we should put them here.
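Because each block is added with IW.addDocuments and the parent is the last document of its block, a parent filter (one bit per parent docID) is enough to map any child hit to its parent: the parent is the next set bit at or after the child, and the block starts right after the previous parent. A minimal sketch of that docID arithmetic, using java.util.BitSet as a stand-in for the parent filter; the class and method names are hypothetical and this is not the BlockJoinQuery/BlockJoinCollector implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers showing the docID arithmetic behind block joins. */
final class BlockJoinSketch {

    private BlockJoinSketch() {}

    /** The parent is the first parent bit at or after the child (parent is last in its block). */
    static int parentOf(BitSet parents, int childDoc) {
        int parent = parents.nextSetBit(childDoc);
        if (parent < 0) {
            throw new IllegalArgumentException("doc " + childDoc + " is not followed by a parent");
        }
        return parent;
    }

    /** The block starts right after the previous parent, or at docID 0. */
    static int firstChildOf(BitSet parents, int parentDoc) {
        // previousSetBit(-1) returns -1, so the first block starts at 0
        return parents.previousSetBit(parentDoc - 1) + 1;
    }

    /** Groups child hits by parent docID, keeping parents in docID order. */
    static Map<Integer, List<Integer>> groupByParent(BitSet parents, int[] childHits) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int child : childHits) {
            if (parents.get(child)) {
                continue; // a parent doc is not a child hit
            }
            groups.computeIfAbsent(parentOf(parents, child), p -> new ArrayList<>()).add(child);
        }
        return groups;
    }
}
```

For the layout {0, 1, 2=parent}, {3=parent}, {4, 5, 6, 7=parent}, child hits 1, 5 and 6 group under parents 2 and 7; a parent whose firstChildOf equals its own docID has no children.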
Re: Lucene 3.3 release soon?
On 06/21/2011 02:01 PM, johnmu...@aol.com wrote: My bad, I meant to say “6-8 releases a year” .. grrr!! So let me try this again. I don't like the current plan of release early and often because: 1) It will spread testing thin of any release because fewer real users will be using a release when you have too many a year. I don't follow. With a release early and often rationale, there will be fewer changes in each release. Less to test. The testing of Lucene is phenomenal and improving with each release. 2) release early and often is not a well defined production release. It will lead to undefined gaps between releases (why X.Y took N weeks, but X.Z took M months?). This is why I suggested a quarterly release plan (it's what FF is now doing) I think that the pendulum needs to swing and find its natural balance. If there is a cost to frequent releases that is unacceptable, it will all balance out in the end. 3) Do companies jump on a Lucene release as soon as one is made? No, they have a process. With too many releases, they will now be more confused which releases to use; they want a release that proved itself. I can't comment on how all companies do upgrades, but in my experience the companies I've been with don't upgrade without a business reason. Basically, if the current release works, then don't upgrade. If the new one provides a necessary feature for a specific requirement, then determine the risk/cost/benefit and decide on whether to upgrade. But at the point of upgrade go with the current best. I don't see how there would be confusion until 4.0 is released. In my specific application, upgrades to Lucene happen when my application has a feature release and/or a bug release in its use of Lucene. It just doesn't make sense to have an app release that does not give specific, visible benefit to end users.
--MJ -Original Message- From: Mark Miller markrmil...@gmail.com To: dev@lucene.apache.org Cc: simon.willna...@gmail.com Sent: Tue, Jun 21, 2011 1:32 pm Subject: Re: Lucene 3.3 release soon? I think we might target fewer than 6-8 a month. That would be scary! I would guess it will be once a month at worst, and often less. Time will tell. You must already give version info with questions if you want decent help - nothing is going to change that. - Mark On Jun 21, 2011, at 1:15 PM, johnmu...@aol.com wrote: -1 on release early and often. Let us say you average 6-8 releases a month, this means there will be that many versions used by users. Which means the amount of testing done on a release (by real users, in real environment) will be spread thin thus a release will not get the same amount of testing it otherwise would. Not only that, more releases means more release specific questions. Expect to see questions / issues reported and you must ask "what version are you using?" before you can answer. May I suggest a scheduled release, once a quarter, near the end of a quarter? -JM -Original Message- From: Simon Willnauer simon.willna...@googlemail.com To: dev@lucene.apache.org Sent: Tue, Jun 21, 2011 12:53 pm Subject: Re: Lucene 3.3 release soon? On Tue, Jun 21, 2011 at 6:09 PM, Robert Muir rcm...@gmail.com wrote: Again, I don't think any future uncommitted features should block a release, nor should there be a shoving period where features are shoved in. +1 - release early and often!!! simon I'll now be looking at producing an RC as quickly as possible before this can happen! On Tue, Jun 21, 2011 at 4:13 AM, Jan Høydahl j...@hoydahl.no wrote: Grouping is really worth a release! But if group count in facet is within reach, wait for that!
-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 21 June 2011, at 05:53, Bill Bell wrote: +1 wait for grouping post facet counts... Go Martijn v Groningen !! On 6/20/11 12:03 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 to releasing 3.3 in a few weeks... there's a lot of new stuff after 3.2. Mike McCandless http://blog.mikemccandless.com On Mon, Jun 20, 2011 at 7:36 AM, Robert Muir rcm...@gmail.com wrote: I was planning on doing an RC in a few weeks actually. We have a lot of good stuff in there today already, however I wanted to give a few weeks for the grouping stuff to run on Hudson. On Mon, Jun 20, 2011 at 4:59 AM, Simon Willnauer simon.willna...@googlemail.com wrote: I would say within the next 3 months. Thoughts?
Re: Concerning LUCENE-3079: Faceting module
On Tue, Jun 21, 2011 at 2:17 PM, Stefan Trcek wzzelfz...@abas.de wrote: Hello, I can donate our faceting module to the Lucene project. Sounds interesting, Stefan! The implementation relies on field cache only, no index scheme, no cached filters etc. It is small (about 600 lines of code in 10 classes). I didn't measure performance, but it handles 1 million documents (30GB) without problems. I suppose it might fit the requirements described in LUCENE-3079. The module supports - single valued facets - multi valued facets - facet filters - evaluation of facet values that would be dismissed due to other facet filters. Let me explain the last point: For the user a facet query (color==green) AND (shape==circle OR shape==square) may look like Facet color [ ] (3) red [x] (5) green [ ] (7) blue Facet shape [x] (9) circle [ ] (4) line [x] (2) square The red/blue/line facet values will display even though the corresponding documents are not in the result set. Solr calls this multi-select faceting. Also there is support for filtered facet values with zero results, so users understand why they do not get results. So how to start? Preparing a patch against trunk (currently it is 3.1)? Yes, against trunk, which is 4.0-dev -Yonik http://www.lucidimagination.com
Re: managing CHANGES.txt?
: But there is no way for someone looking at the CHANGES for 4.0 to know
: for certain that the bits that make up that bug fix are in the 4.0 release
: -- the fact that it's listed in 3.2's CHANGES isn't an assurance, because
: 4.0 comes from a completely different line of development.
...
: it's in the 4.0 CHANGES.txt, under the 3.2 section.
(sigh ... I tried to let this go, I swear I did...)
You're missing my point entirely. Yes, it's in the 3.2 section, but all that tells the user is that it was fixed on the 3x branch just prior to the 3.2 release -- that doesn't give users *any* info about whether that bug ever affected (or was fixed) on the completely and radically different 4x branch. There were multiple commits -- the bits are not the same.
-Hoss
[jira] [Commented] (SOLR-2458) post.jar fails on non-XML updateHandlers
[ https://issues.apache.org/jira/browse/SOLR-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052766#comment-13052766 ] Hoss Man commented on SOLR-2458: Jan: +1 post.jar fails on non-XML updateHandlers Key: SOLR-2458 URL: https://issues.apache.org/jira/browse/SOLR-2458 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 3.1 Reporter: Jan Høydahl Assignee: Jan Høydahl Labels: post.jar Fix For: 3.3 Attachments: SOLR-2458.patch, SOLR-2458.patch SimplePostTool.java by default tries to issue a commit after posting. The problem is that it does this by appending <commit/> to the stream. This does not work when using a non-XML request handler, such as CSV.
[jira] [Commented] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052770#comment-13052770 ] Paul Elschot commented on LUCENE-3171: -- The possible inefficiency is the same as the one for any sparsely filled OpenBitSet. Another implementation (should be another issue, but since you asked...) could be a set of increasing integers, based on a balanced tree structure with a moderate fanout (e.g. 32), and all integer values relative to the minimum determined by the data for the pointer from the parent. The whole thing could be stored in one int[], the pointers would be (forward) indexes into this one array, and each internal node would consist of two rows of integers (one data, one pointers), and each row would be compressed as a frame of reference into the array. This thing can implement {code}int next(int x){code} and {code}int previous(int x){code} easily, and an iterator over this can implement {code}advance(target){code} for a DocIdSetIterator, and because of the symmetry it can also do that in the reverse direction as needed here. Compression at higher levels might not be necessary. For now, there is code for this, except for the frame of reference. Occasionally the need for a more space efficient filter shows up on the mailing lists, so if anyone wants to give this a try...
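The next/previous contract Paul describes can be shown without the tree. The sketch below is a simplified stand-in, not his proposed structure: it keeps the members in a plain sorted int[] and answers next(x) and previous(x) by binary search, deliberately omitting the fanout-32 tree and the frame-of-reference compression, so it saves no space. The class name and the NO_MORE sentinel (chosen to mirror DocIdSetIterator.NO_MORE_DOCS, which is Integer.MAX_VALUE) are assumptions for illustration.

```java
import java.util.Arrays;

/** Hypothetical sorted int set with next/previous lookups (no compression). */
final class SortedIntSet {

    /** Returned by next() when no member is >= x; mirrors DocIdSetIterator.NO_MORE_DOCS. */
    static final int NO_MORE = Integer.MAX_VALUE;

    private final int[] values;

    SortedIntSet(int... members) {
        values = members.clone();
        Arrays.sort(values);
    }

    /** Smallest member >= x, or NO_MORE. An iterator's advance(target) maps to next(target). */
    int next(int x) {
        int i = Arrays.binarySearch(values, x);
        if (i < 0) {
            i = -i - 1; // insertion point: index of the first member > x
        }
        return i < values.length ? values[i] : NO_MORE;
    }

    /** Largest member <= x, or -1: the same search run in the reverse direction. */
    int previous(int x) {
        int i = Arrays.binarySearch(values, x);
        if (i < 0) {
            i = -i - 2; // one before the insertion point: last member < x
        }
        return i >= 0 ? values[i] : -1;
    }
}
```

Swapping the plain array for compressed node rows would change the storage and the constant factors, not this contract.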
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052771#comment-13052771 ] Hoss Man commented on LUCENE-3130: -- bq. A QP can already solve this issue today, simply by boosting down terms with positionIncrement = 0. That assumes: a) that every TokenFilter which might inject terms like this will always put the most important one first; b) that the amount of boost should be fixed. What I'm suggesting is that we make this more flexible so that people wiring together their apps and analyzers have an easy way to guide the QueryParser's behavior. If we allow a well-defined attribute for this, people can have custom analysis that specifies arbitrary boosts in cases we may not be able to specifically anticipate (synonyms, entity recognition, common word demoting, etc.). bq. But I really think the implementation details of QP should remain in QP, the analysis chain should instead be general and describe up the text. Why don't you consider an attribute that denotes "this term is worth less than a typical term" a general description of the text? bq. Otherwise, things get really confusing, e.g. what should a ShingleFilter do when it combines two tokens that have different BoostAttributes? It does whatever it already does when it encounters two tokens that may have attributes it doesn't know about (ignores them when creating the new token, if I remember correctly). Unrecognized attributes aren't a new problem. bq. If you do what you describe, what if you then want to tweak the ranking for synonyms? You must reindex. How is that any different from any other aspect of index-time synonyms? If you use them, you *always* have to reindex when you change your synonyms. I'm not arguing that index-time synonyms are a good idea in general, and I'm not arguing that this "we look for BoostAttributes on tokens" feature of the QP would be useful (or even a good idea) for everyone. 
I'm arguing that having such a feature would provide an easy way for people who are already customizing their analysis to modify/influence the behavior of the query parser (w/o subclassing), in a way that could still work in conjunction with other techniques. Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts --- Key: LUCENE-3130 URL: https://issues.apache.org/jira/browse/LUCENE-3130 Project: Lucene - Java Issue Type: Improvement Reporter: Hoss Man A recent thread asked if there was any way to use query-time synonyms such that matches on the original term specified by the user would score higher than matches on the synonym. It occurred to me later that a float Attribute could be set by the SynonymFilter in such situations, and QueryParser could use that float as a boost in the resulting Query. (This would be fairly straightforward for the simple synonyms = BooleanQuery case, but we'd have to decide how to handle the case of synonyms with multiple terms that produce MTPQ; possibly just punt for now.) Likewise, there may be other TokenFilters that inject artificial tokens at query time where it also might make sense to have a reduced boost factor... * SynonymFilter * CommonGramsFilter * WordDelimiterFilter * etc... In all of these cases, the amount of the boost could be configured, and for back compat could default to 1.0 (or null to not set a boost at all). Furthermore, if we add a new BoostAttrToPayloadAttrFilter that just copies the boost attribute into the payload attribute, these same filters could give penalizing payloads to terms when used at index time.
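A rough, self-contained model of the proposal: each token carries a position increment and a boost (standing in for the proposed BoostAttribute), and a query-parser step groups tokens at the same position into one parenthesized group and appends each non-default boost using the classic query syntax's "^" operator. Everything here, including the Token record, groupByPosition and toQueryString, is hypothetical and is not Lucene's analysis or QueryParser API; it only illustrates how a filter-supplied boost could reach the query, instead of a fixed penalty for positionIncrement = 0.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene API): tokens carry a position increment and a
 * boost standing in for a BoostAttribute; the "parser" step turns each position
 * into a group of alternatives and applies the per-token boost.
 */
class BoostAttributeSketch {
    record Token(String term, int positionIncrement, float boost) {}

    /** Groups tokens by position; a positionIncrement of 0 stacks a token on the previous position. */
    static List<List<Token>> groupByPosition(List<Token> tokens) {
        List<List<Token>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (t.positionIncrement() == 0 && !positions.isEmpty()) {
                positions.get(positions.size() - 1).add(t);
            } else {
                positions.add(new ArrayList<>(List.of(t)));
            }
        }
        return positions;
    }

    /** Renders a query string, appending "^boost" only when a filter set a non-default boost. */
    static String toQueryString(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (List<Token> group : groupByPosition(tokens)) {
            if (sb.length() > 0) sb.append(' ');
            if (group.size() > 1) sb.append('(');
            for (int i = 0; i < group.size(); i++) {
                Token t = group.get(i);
                if (i > 0) sb.append(' ');
                sb.append(t.term());
                if (t.boost() != 1.0f) sb.append('^').append(t.boost());
            }
            if (group.size() > 1) sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A hypothetical synonym filter injected "quick" at the same position as "fast"
        // and marked it with a lower boost, so the user's original term scores higher.
        List<Token> tokens = List.of(
            new Token("fast", 1, 1.0f),
            new Token("quick", 0, 0.5f),
            new Token("car", 1, 1.0f));
        System.out.println(toQueryString(tokens)); // (fast quick^0.5) car
    }
}
```

Because the boost travels with each token, a filter can demote any injected token it chooses, regardless of whether the original term comes first at that position.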
[jira] [Issue Comment Edited] (LUCENE-3171) BlockJoinQuery/Collector
[ https://issues.apache.org/jira/browse/LUCENE-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052770#comment-13052770 ] Paul Elschot edited comment on LUCENE-3171 at 6/21/11 7:20 PM: --- The possible inefficiency is the same as the one for any sparsely filled OpenBitSet. Another implementation (should be another issue, but since you asked...) could be a set of increasing integers, based on a balanced tree structure with a moderate fanout (e.g. 32), and all integer values stored relative to the minimum determined by the data for the pointer from the parent. The whole thing could be stored in one int[]; the pointers would be (forward) indexes into this one array, and each internal node would consist of two rows of integers (one data, one pointers), and each row would be compressed as a frame of reference into the array. This thing can implement {code}int next(int x){code} and {code}int previous(int x){code} easily, and an iterator over this can implement {code}advance(target){code} for a DocIdSetIterator, and because of the symmetry it can also do that in the reverse direction as needed here. Compression at higher levels might not be necessary. For now, there is no code for this, except for the frame of reference. Occasionally the need for a more space-efficient filter shows up on the mailing lists, so if anyone wants to give this a try... was (Author: paul.elsc...@xs4all.nl): The possible inefficiency is the same as the one for a any sparsely filled OpenBitSet. Another implementation (should be another issue, but since you asked...) could be a set of increasing integers, based on a balanced tree structure with a moderate fanout (e.g. 32), and all integer values relative to the minimum determined by the data for the pointer from the parent. 
The whole thing could be stored in one int[], the pointers would be (forward) indexes into this one array, and each internal node would consist of two rows of integers (one data, one pointers), and each row would be compressed as a frame of reference into the array. This thing can implement {code}int next(int x){code} and {code}int previous(int x){code} easily, and an iterator over this can implement {code}advance(target){code} for a DocIdSetIterator, and because of the symmetry it can also do that in the reverse direction as needed here. Compression at higher levels might not be necessary. For now, there is code for this, except for the frame of reference. Occasionaly the need for a more space efficient filter shows up on the mailing lists, so if anyone want to give this a try...
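A simplified sketch of the sorted-integer set described in Paul Elschot's comment, under several assumptions: only two levels (an upper array of block minima over leaf blocks of 32 values), leaf values stored as offsets from their block's minimum (the frame of reference) but not bit-packed, and separate arrays rather than a single int[]. Here next(x) returns the smallest value >= x and previous(x) the largest value <= x, with -1 as this sketch's "not found" value, so input values must be strictly increasing and non-negative (as doc IDs are). The class name SortedIntSetSketch is hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical two-level sketch of a sorted int set with fanout 32.
 * Leaf values are kept as offsets from their block minimum (frame of reference);
 * a real implementation would bit-pack those offsets.
 */
class SortedIntSetSketch {
    private static final int FANOUT = 32;
    private final int[] blockMin; // upper level: minimum value of each leaf block
    private final int[] offsets;  // value - blockMin[block], block after block
    private final int size;

    /** Values must be strictly increasing and non-negative. */
    SortedIntSetSketch(int[] sortedValues) {
        size = sortedValues.length;
        blockMin = new int[(size + FANOUT - 1) / FANOUT];
        offsets = new int[size];
        for (int i = 0; i < size; i++) {
            int b = i / FANOUT;
            if (i % FANOUT == 0) blockMin[b] = sortedValues[i];
            offsets[i] = sortedValues[i] - blockMin[b];
        }
    }

    private int valueAt(int i) {
        return blockMin[i / FANOUT] + offsets[i];
    }

    /** Index of the last block whose minimum is <= x, or -1 if every value is > x. */
    private int blockFor(int x) {
        int b = Arrays.binarySearch(blockMin, x);
        return b >= 0 ? b : -b - 2;
    }

    /** Smallest value >= x, or -1; this is the step DocIdSetIterator.advance(target) needs. */
    int next(int x) {
        int b = Math.max(0, blockFor(x));
        // The answer is in block b or is the first value of block b + 1.
        for (int i = b * FANOUT; i < size; i++) {
            int v = valueAt(i);
            if (v >= x) return v;
        }
        return -1;
    }

    /** Largest value <= x, or -1; the symmetric, reverse-direction lookup. */
    int previous(int x) {
        int b = blockFor(x);
        if (b < 0) return -1;
        // Block b starts with a value <= x, so the scan always ends inside this block.
        for (int i = Math.min(size, (b + 1) * FANOUT) - 1; i >= b * FANOUT; i--) {
            int v = valueAt(i);
            if (v <= x) return v;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) values[i] = 3 * i + 7; // 7, 10, ..., 304
        SortedIntSetSketch set = new SortedIntSetSketch(values);
        System.out.println(set.next(50));     // 52
        System.out.println(set.previous(50)); // 49
        System.out.println(set.next(305));    // -1
    }
}
```

The binary search over block minima plays the role of the upper tree level; with more data, the same idea would be repeated for more levels, each node holding its minima and pointers as described in the comment.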
[jira] [Created] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files
Add testpackage and testpackageroot conditions to clustering and analysis-extras build files Key: SOLR-2612 URL: https://issues.apache.org/jira/browse/SOLR-2612 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 3.3, 4.0 Clustering and analysis-extras are the only two build files which do not have testpackage and testpackageroot exclusions wired into the build file.
[jira] [Updated] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files
[ https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2612: Attachment: SOLR-2612.patch Patch to add testpackage and testpackageroot to clustering and analysis-extras build files. Add testpackage and testpackageroot conditions to clustering and analysis-extras build files Key: SOLR-2612 URL: https://issues.apache.org/jira/browse/SOLR-2612 Project: Solr Issue Type: Task Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2612.patch Clustering and analysis-extras are the only two build files which do not have testpackage and testpackageroot exclusions wired into the build file.
[jira] [Updated] (SOLR-2612) Add testpackage and testpackageroot conditions to clustering and analysis-extras build files
[ https://issues.apache.org/jira/browse/SOLR-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2612: Component/s: Build Add testpackage and testpackageroot conditions to clustering and analysis-extras build files Key: SOLR-2612 URL: https://issues.apache.org/jira/browse/SOLR-2612 Project: Solr Issue Type: Task Components: Build Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2612.patch Clustering and analysis-extras are the only two build files which do not have testpackage and testpackageroot exclusions wired into the build file.