[jira] [Comment Edited] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937331#comment-13937331 ]

Chris Male edited comment on LUCENE-5376 at 3/16/14 8:59 PM:

What's motivated the new branch?

was (Author: cmale):
Why's motivated the new branch?

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz

I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting.

The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard.

I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change.

As a starting point, I'll post what I built for the "eating your own dog food" search app for Lucene's and Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing and searching APIs via JSON, but it's very rough (lots of nocommits).

--
This message was sent by Atlassian JIRA (v6.2#6252)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
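For readers unfamiliar with why a server would build on SearcherManager: the class hands out searchers through an acquire/release pair, so a request can keep using the searcher it started with while a refresh swaps in a newer one. The following is only a rough, Lucene-free sketch of that reference-counting idea. The class names CountedSearcherSketch and SearcherHolderSketch and the swap method are hypothetical and are not part of Lucene or of the attached demo server.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a searcher: just a name plus a reference count. */
final class CountedSearcherSketch {
    final String name;
    // Starts at 1: the holder itself owns one reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    CountedSearcherSketch(String name) {
        this.name = name;
    }

    /** Takes a reference unless the count has already dropped to zero. */
    boolean tryIncRef() {
        while (true) {
            int count = refs.get();
            if (count <= 0) {
                return false;
            }
            if (refs.compareAndSet(count, count + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        refs.decrementAndGet();
    }

    boolean isClosed() {
        return refs.get() <= 0;
    }
}

/** Hypothetical holder illustrating the acquire/release/refresh life cycle. */
final class SearcherHolderSketch {
    private volatile CountedSearcherSketch current;

    SearcherHolderSketch(CountedSearcherSketch initial) {
        this.current = initial;
    }

    /** Each request acquires a searcher and must release it when done. */
    CountedSearcherSketch acquire() {
        while (true) {
            CountedSearcherSketch s = current;
            if (s.tryIncRef()) {
                return s;
            }
            // Lost a race with swap(); retry against the newer instance.
        }
    }

    void release(CountedSearcherSketch s) {
        s.decRef();
    }

    /** Publishes a newer searcher and drops the holder's reference to the old one. */
    synchronized void swap(CountedSearcherSketch next) {
        CountedSearcherSketch old = current;
        current = next;
        old.decRef(); // the old searcher closes once in-flight requests release it
    }
}
```

In this sketch a request that acquired the old searcher before a swap keeps a usable instance until it calls release, which is the property a search server needs when reopening while queries are running.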
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937331#comment-13937331 ]

Chris Male commented on LUCENE-5376:

Why's motivated the new branch?

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937343#comment-13937343 ]

Chris Male commented on LUCENE-5376:

Sweet!

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz
[jira] [Commented] (LUCENE-5483) hunspell inaccuracies
[ https://issues.apache.org/jira/browse/LUCENE-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917260#comment-13917260 ]

Chris Male commented on LUCENE-5483:

+1

hunspell inaccuracies

Key: LUCENE-5483
URL: https://issues.apache.org/jira/browse/LUCENE-5483
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5483.patch, LUCENE-5483.patch

I added a lot of tests and greatly refined this algorithm to match correct hunspell behavior. There were many bugs:
* recursionLimit was a hack: this is actually specified by the dictionary to be twofold suffix + one prefix, or if COMPLEXPREFIXES is specified, twofold prefix + one suffix. This patch removes the recursion limit.
* recursion didn't work correctly: it didn't validate multi-level continuation classes correctly.
* add COMPLEXPREFIXES support.
* probably other minor bugs fixed in the process.

I validated all testing against hunspell.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914954#comment-13914954 ]

Chris Male commented on LUCENE-5468:

Those are some pretty amazing reductions, well done!

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Attachments: patch.txt

Hunspell stemmer requires gigantic (for the task) amounts of memory to load dictionary/rules files. For example, loading a 4.5 MB Polish dictionary (with an empty index!) will cause the whole core to crash with various out of memory errors unless you set max heap size close to 2GB or more. By comparison, Stempel using the same dictionary file works just fine with 1/8 of that (and possibly lower values as well).

Sample error log entries:
http://pastebin.com/fSrdd5W1
http://pastebin.com/Lmi0re7Z
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915045#comment-13915045 ]

Chris Male commented on LUCENE-5468:

Is the longestOnly option a standard Hunspell thing? (more a question of general interest)

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Attachments: LUCENE-5468.patch, patch.txt
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915047#comment-13915047 ]

Chris Male commented on LUCENE-5376:

Hey Mike,

What's the endzone here? Any thoughts on it coming back into trunk?

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915053#comment-13915053 ]

Chris Male commented on LUCENE-5468:

Awesome, sounds like a great addition then.

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Attachments: LUCENE-5468.patch, patch.txt
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915231#comment-13915231 ]

Chris Male commented on LUCENE-5468:

I don't think we should make the recursionCap any more complex. I put it in there simply to prevent languages from getting into infinite loops.

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Fix For: 4.8, 5.0
Attachments: LUCENE-5468.patch, patch.txt
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915240#comment-13915240 ]

Chris Male commented on LUCENE-5468:

Yeah I guess. We can go over that in a new issue.

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Fix For: 4.8, 5.0
Attachments: LUCENE-5468.patch, patch.txt
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909943#comment-13909943 ]

Chris Male commented on LUCENE-5468:

Multiple dictionaries was never in the original design either. Having an efficient and usable design seems to be of higher priority so +1 to not forking and doing this in place.

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Attachments: patch.txt
[jira] [Commented] (LUCENE-5468) Hunspell very high memory use when loading dictionary
[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909952#comment-13909952 ]

Chris Male commented on LUCENE-5468:

Sounds good

Hunspell very high memory use when loading dictionary

Key: LUCENE-5468
URL: https://issues.apache.org/jira/browse/LUCENE-5468
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 3.5
Reporter: Maciej Lisiewski
Priority: Minor
Attachments: patch.txt
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867291#comment-13867291 ]

Chris Male commented on SOLR-5621:

+1 for trunk

Let Solr use Lucene's SearcherManager

Key: SOLR-5621
URL: https://issues.apache.org/jira/browse/SOLR-5621
Project: Solr
Issue Type: Improvement
Affects Versions: 5.0
Reporter: Tomás Fernández Löbbe
Fix For: 5.0
Attachments: SOLR-5621.patch

It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even though I haven't finished with the changes (there are some use cases that are still not working exactly the same), it looks like it is possible to do. Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like Searchers on deck or IndexReaderFactory.

I'm attaching an initial patch (many TODOs yet).
[jira] [Commented] (LUCENE-5057) Hunspell stemmer generates multiple tokens
[ https://issues.apache.org/jira/browse/LUCENE-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759514#comment-13759514 ]

Chris Male commented on LUCENE-5057:

The example you describe is sort of at the heart of the Hunspell algorithm and outputting those three different tokens is one of its major advantages. When we're doing analysis we don't know which of those different meanings the user intended, so we're providing them all as options. I don't see that as something negative about Hunspell, in fact quite the opposite.

Hunspell stemmer generates multiple tokens

Key: LUCENE-5057
URL: https://issues.apache.org/jira/browse/LUCENE-5057
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.3
Reporter: Luca Cavanna
Assignee: Adrien Grand

The hunspell stemmer seems to be generating multiple tokens: the original token plus the available stems. It might be a good thing in some cases but it seems to be a different behaviour compared to the other stemmers and causes problems as well. I would rather have an option to decide whether it should output only the available stems, or the stems plus the original token. I'm not sure though if it's possible to have only a single stem indexed, which would be even better in my opinion. When I look at how snowball works, only one token is indexed, the stem, and that works great. Probably there's something I'm missing in how hunspell works.

Here is my issue: I have a query composed of multiple terms, which is analyzed using stemming, and a boolean query is generated out of it. All is fine when adding all clauses as should (OR operator), but if I add all clauses as must (AND operator), then I can get back only the documents that contain the stem originating from exactly the same original word.

Example for the Dutch language I'm working with: fiets (means bicycle in Dutch), its plural is fietsen. If I index fietsen I get both fietsen and fiets indexed, but if I index fiets I get only fiets indexed. When I query for "fietsen whatever" I get the following boolean query: field:fiets field:fietsen field:whatever. If I apply the AND operator and use must clauses for each subquery, then I can only find the documents that originally contained fietsen, not the ones that originally contained fiets, which is not really what stemming is about.

Any thoughts on this? I also wonder if it can be a dictionary issue, since I see that different words that have the word fiets as root don't get the same stems, and using the AND operator at query time is a big issue.

I would love to contribute on this and am looking forward to your feedback.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5057) Hunspell stemmer generates multiple tokens
[ https://issues.apache.org/jira/browse/LUCENE-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759863#comment-13759863 ]

Chris Male commented on LUCENE-5057:

I don't think the problem is related to Hunspell. Any analysis could produce multiple tokens (synonyms for example) and whatever query parser is used needs to reflect that correctly in how it creates BooleanQuerys. Consequently I don't think there is an issue that needs to be re/opened?

Hunspell stemmer generates multiple tokens

Key: LUCENE-5057
URL: https://issues.apache.org/jira/browse/LUCENE-5057
Project: Lucene - Core
Issue Type: Improvement
Affects Versions: 4.3
Reporter: Luca Cavanna
Assignee: Adrien Grand
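To make the query-parser point concrete, here is a minimal, Lucene-free sketch using the fiets/fietsen case from the issue description. It assumes the analyzer stacks the stem on the same position as the original token, and it compares two ways of turning the analyzed query into boolean logic: requiring every emitted term (the behaviour the reporter ran into) versus requiring one match per position (OR within a position, AND across positions). The class and method names are hypothetical and the tiny "analyzer" is hard-coded for this one word. In Lucene terms the second form would roughly correspond to a nested BooleanQuery of SHOULD clauses added as a MUST clause, but that mapping is my reading, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not Lucene code: all names here are illustrative. */
final class StackedTokenQuerySketch {

    /** Stand-in for an analyzer: returns every term emitted at one position. */
    static Set<String> analyze(String word) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(word);
        if (word.equals("fietsen")) {
            terms.add("fiets"); // stem stacked on the same position as the original
        }
        return terms;
    }

    /** Index side: collect every term the analyzer emits for the text. */
    static Set<String> indexDocument(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String word : text.split("\\s+")) {
            terms.addAll(analyze(word));
        }
        return terms;
    }

    /** Query side: one set of alternatives per query position. */
    static List<Set<String>> analyzeQuery(String query) {
        List<Set<String>> positions = new ArrayList<>();
        for (String word : query.split("\\s+")) {
            positions.add(analyze(word));
        }
        return positions;
    }

    /** Problematic form: every emitted term becomes its own required clause. */
    static boolean matchesFlatAnd(Set<String> indexed, List<Set<String>> positions) {
        for (Set<String> alternatives : positions) {
            if (!indexed.containsAll(alternatives)) {
                return false;
            }
        }
        return true;
    }

    /** Position-aware form: OR within a position, AND across positions. */
    static boolean matchesPerPosition(Set<String> indexed, List<Set<String>> positions) {
        for (Set<String> alternatives : positions) {
            boolean anyMatch = false;
            for (String term : alternatives) {
                if (indexed.contains(term)) {
                    anyMatch = true;
                    break;
                }
            }
            if (!anyMatch) {
                return false;
            }
        }
        return true;
    }
}
```

With a document containing "fiets whatever" and the query "fietsen whatever", the flat form rejects the document because the term fietsen was never indexed for it, while the per-position form accepts it through the shared stem fiets.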
[jira] [Commented] (LUCENE-4616) Clarify what the score means in SpatialStrategy#makeQuery()
[ https://issues.apache.org/jira/browse/LUCENE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529343#comment-13529343 ]

Chris Male commented on LUCENE-4616:

I agree with Ryan, we shouldn't try to over-define this. Returning Query gives the Strategies freedom to have a meaningful score if they support it. But we should just add a simple comment stating that the score from the Query may or may not be meaningful and the Strategy used should be checked for further details.

Clarify what the score means in SpatialStrategy#makeQuery()

Key: LUCENE-4616
URL: https://issues.apache.org/jira/browse/LUCENE-4616
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Trivial

SpatialStrategy#makeQuery() returns a Query, but the docs don't make it clear what the score value should be.
[jira] [Commented] (LUCENE-4616) Clarify what the score means in SpatialStrategy#makeQuery()
[ https://issues.apache.org/jira/browse/LUCENE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529349#comment-13529349 ]

Chris Male commented on LUCENE-4616:

Another option, more big picture I guess, is to take this opportunity and remove the Strategy abstraction. We've touched upon this in other issues, but the fact is that each Strategy (including those not contributed) behaves differently and the notion of score is a big example of this. There is some consistency in the Prefix Strategies so having an abstraction there probably helps, but otherwise I think we should just dump Strategy and let some Strategies return a Query with a meaningful score and some return a CSQ, showing that their score is meaningless.

Clarify what the score means in SpatialStrategy#makeQuery()

Key: LUCENE-4616
URL: https://issues.apache.org/jira/browse/LUCENE-4616
Project: Lucene - Core
Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Trivial
[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain
[ https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505294#comment-13505294 ]

Chris Male commented on LUCENE-4569:

John,

I don't really know much about the API you're wanting to change, but to help me understand, are you able to explain more about what you're trying to do in your custom indexing format / code? I think one of the major motivations for Codecs is to allow this sort of customization through their API (there are already Codecs for holding this in memory).

Allow customization of column stride field and norms via indexing chain

Key: LUCENE-4569
URL: https://issues.apache.org/jira/browse/LUCENE-4569
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: John Wang
Attachments: patch.diff

We are building an in-memory indexing format and managing our own segments. We are doing this by implementing a custom IndexingChain. We would like to support column-stride-fields and norms without having to wire in a codec (since we are managing our postings differently).

The suggested change is consistent with the API support for passing in a custom InvertedDocConsumer.
[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499565#comment-13499565 ]

Chris Male commented on LUCENE-4271:

{quote}
I think it's odd to add syntax to Lucene's query parser that does ... nothing? And it's strange to make Lucene's QP aware of Solr QP's syntax if it cannot do anything with it. It seems like Solr's QP should have this logic instead ...
{quote}

+1

{quote}
Indeed - but it requires changes to the parser grammar, so subclassing doesn't cut it. I suppose the next best thing would be to make a QP specific to Solr.
{quote}

I don't think we should consider that a bad thing. Solr has different needs and the classic QP is sort of the lowest common denominator of parsers.

bq. I don't mean to suggest that the Lucene Query Parser should know directly about the Solr-level structures such as the Solr schema, Solr params, and Solr Q Parser plugins, but I am suggesting that Lucene could declare and support abstractions for those sorts of interfaces

I don't think we can practically extend the classic QP in every way just to meet Solr's needs.

bq. There are lots of useful features which are available in the Solr query parsers which are unavailable directly to Lucene apps without a lot of effort, and for no good reason.

.. then the Lucene apps should use the Solr QPs or a version thereof. The Classic QP was moved out of Lucene core for many reasons, but one was to combat this perspective that it's 'the' QP when it is in fact just one particular implementation (an implementation which has lots of limitations). Users should be encouraged to use whatever QP meets their needs and we shouldn't make the classic QP a kitchen sink.

bq. The current estrangement between the Lucene and Solr query parsers is quite a black eye for Lucene/Solr that can easily be remedied, at least from a technical perspective.

I think we should go further and fully divorce them. Solr has its needs, and the handling of LocalParams clearly seems to be confusing users, but it isn't something the classic QP should have to resolve. Equally, Solr development shouldn't be saddled with having to compromise its query features just so they fit into the classic QP. As I say, the classic QP is the lowest common denominator of query syntax and parsing, and I would recommend to any user (Solr or not) that when they need to make a large syntactical change, they roll their own parser.

Solr LocalParams for Lucene Query Parser

Key: LUCENE-4271
URL: https://issues.apache.org/jira/browse/LUCENE-4271
Project: Lucene - Core
Issue Type: New Feature
Reporter: Yonik Seeley
Attachments: LUCENE-4271.patch

The Lucene QueryParser should implement Solr's LocalParams syntax directly so that instead of
{code}
_query_:{!geodist d=10 p=20.5,30.2}
{code}
one could directly use
{code}
{!geodist d=10 p=20.5,30.2}
{code}

references: http://wiki.apache.org/solr/LocalParams
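As an illustration of the "roll their own parser" suggestion, the sketch below shows one way an application could peel a LocalParams-style prefix off a query string before handing the rest to whatever parser it uses, without touching the classic QP grammar. It is not Solr's or Lucene's implementation. The class name LocalParamsPrefixSketch is hypothetical, only unquoted space-separated key=value pairs are handled, and treating the first bare word as the parser type is my reading of the {!geodist ...} example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not Solr's or Lucene's parser. */
final class LocalParamsPrefixSketch {
    final String type;                // e.g. "geodist", or null when there is no prefix
    final Map<String, String> params; // key=value pairs found inside {! ... }
    final String remainder;           // text left over for the underlying query parser

    private LocalParamsPrefixSketch(String type, Map<String, String> params, String remainder) {
        this.type = type;
        this.params = params;
        this.remainder = remainder;
    }

    /** Splits an optional leading {!type k=v ...} block from the rest of the query. */
    static LocalParamsPrefixSketch split(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (!query.startsWith("{!")) {
            return new LocalParamsPrefixSketch(null, params, query);
        }
        int end = query.indexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("unterminated local params: " + query);
        }
        String type = null;
        for (String part : query.substring(2, end).trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            int eq = part.indexOf('=');
            if (eq < 0) {
                type = part; // a bare word is treated as the parser type (assumption)
            } else {
                params.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return new LocalParamsPrefixSketch(type, params, query.substring(end + 1).trim());
    }
}
```

Keeping this step outside the grammar is the design choice argued for in the comment: the application decides what a prefix means, and the underlying parser only ever sees the remainder.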
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492892#comment-13492892 ]
Chris Male commented on LUCENE-4542:
Rafał, Thanks for creating the patches, they are looking great. Couple of very small improvements:
- Can we mark recursionCap as final?
- Can we improve the javadoc for the recursionCap parameter so it's clear what purpose it serves?
- Maybe also drop in a comment at the field about how the recursion cap of 2 is the default value based on documentation about Hunspell (as opposed to something we arbitrarily chose).
Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Chris Male Attachments: LUCENE-4542.patch, LUCENE-4542-with-solr.patch
Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I have ever filed, so please forgive any mistakes.)
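The review points above (a final field, a documented parameter, a default that preserves today's behaviour) can be sketched roughly as follows. This is a hedged illustration, not the attached patch: CappedStemmerSketch, DEFAULT_RECURSION_CAP and the toy strip method are hypothetical names, and the real HunspellStemmer does affix stripping against a dictionary rather than trimming characters.

```java
/**
 * Hypothetical stand-in for a stemmer whose recursion depth is configurable.
 * This is not Lucene's HunspellStemmer; only the constructor pattern matters here.
 */
final class CappedStemmerSketch {
  /** Default of 2, the value hard-coded as RECURSION_CAP according to the issue. */
  static final int DEFAULT_RECURSION_CAP = 2;

  /** Deepest recursion level allowed; final so it cannot change after construction. */
  private final int recursionCap;

  /** Backwards-compatible constructor: behaves like the hard-coded cap. */
  CappedStemmerSketch() {
    this(DEFAULT_RECURSION_CAP);
  }

  /**
   * @param recursionCap deepest recursion level allowed while stripping; lower
   *        values do less work per word, higher values try more nested strips
   */
  CappedStemmerSketch(int recursionCap) {
    if (recursionCap < 0) {
      throw new IllegalArgumentException("recursionCap must be >= 0, got " + recursionCap);
    }
    this.recursionCap = recursionCap;
  }

  int getRecursionCap() {
    return recursionCap;
  }

  /** Toy stand-in for affix stripping: removes one trailing 's' per recursion level. */
  String strip(String word) {
    return strip(word, 0);
  }

  private String strip(String word, int depth) {
    if (depth > recursionCap || !word.endsWith("s")) {
      return word; // cap exceeded or nothing left to strip
    }
    return strip(word.substring(0, word.length() - 1), depth + 1);
  }
}
```

Callers that never pass a value keep the cap of 2, while a factory could forward a user-supplied value to the second constructor.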
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491845#comment-13491845 ]
Chris Male commented on LUCENE-4542:
+1 I absolutely agree we need to make this change. There is another issue (I can't remember which just yet and I'm using a bad connection) where the recursion cap was causing analysis loops. Do you want to create a patch? We need to maintain backwards compatibility so the default experience should be using RECURSION_CAP as it is today. However users should be able to pass in a value as well (that includes the HunspellStemFilterFactory).
Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr
Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I have ever filed, so please forgive any mistakes.)
[jira] [Assigned] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Male reassigned LUCENE-4542: -- Assignee: Chris Male
Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Chris Male
Currently there is private static final int RECURSION_CAP = 2; in the code of the class HunspellStemmer. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (This is the first issue I have ever filed, so please forgive any mistakes.)
[jira] [Commented] (LUCENE-4511) TermsFilter might return wrong results if a field is not indexed or not present in the index
[ https://issues.apache.org/jira/browse/LUCENE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487417#comment-13487417 ]
Chris Male commented on LUCENE-4511:
+1 to these improvements. Another typo: "to optimize for this case and to be fitler-cache friendly we" - should be "filter-cache".
TermsFilter might return wrong results if a field is not indexed or not present in the index Key: LUCENE-4511 URL: https://issues.apache.org/jira/browse/LUCENE-4511 Project: Lucene - Core Issue Type: Bug Components: modules/other Affects Versions: 4.0, 4.1, 5.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.1, 5.0 Attachments: LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch, LUCENE-4511.patch
TermsFilter returns if a term returns null from AIR#terms(term) while it should just continue. I will upload a test fix shortly
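The bug described in the issue (returning early when a field has no terms, instead of moving on to the next term) can be illustrated without any Lucene classes. The sketch below is an assumption-laden toy: TermsFilterSketch and matchingDocs are hypothetical, and a nested map stands in for the per-field terms lookup, with a missing field mapping to null.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration; the real TermsFilter works on index readers, not maps. */
final class TermsFilterSketch {
  /**
   * @param index field -> term -> matching doc ids; a field that is not indexed is absent
   * @param fieldTermPairs pairs of {field, term} to look up
   */
  static Set<Integer> matchingDocs(
      Map<String, Map<String, List<Integer>>> index, List<String[]> fieldTermPairs) {
    Set<Integer> result = new TreeSet<>();
    for (String[] pair : fieldTermPairs) {
      Map<String, List<Integer>> terms = index.get(pair[0]);
      if (terms == null) {
        // The reported bug corresponds to "return result;" at this point,
        // which would silently drop every later term. Skipping is the fix.
        continue;
      }
      List<Integer> docs = terms.get(pair[1]);
      if (docs != null) {
        result.addAll(docs);
      }
    }
    return result;
  }
}
```

With a missing field listed first, the early-return variant would yield an empty result even though a later term matches documents.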
Re: Welcome Alan Woodward as Lucene/Solr committer
Welcome Alan!
On Wed, Oct 17, 2012 at 6:36 PM, Robert Muir rcm...@gmail.com wrote: I'm pleased to announce that the Lucene PMC has voted Alan as a Lucene/Solr committer. Alan has been contributing patches on various tricky stuff: positions iterators, span queries, highlighters, codecs, and so on. Alan: it's tradition that you introduce yourself with your background. I think your account is fully working and you should be able to add yourself to the who we are page on the website as well. Congratulations!
-- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
Re: [ANNOUNCE] Apache Lucene 4.0 released.
releases (see http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html).
* A new spell checker, DirectSpellChecker, finds possible corrections directly against the main search index without requiring a separate index.
* Various in-memory data structures such as the term dictionary and FieldCache are represented more efficiently with less object overhead (see http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html).
* All search logic is now required to work per segment, IndexReader was therefore refactored to differentiate between atomic and composite readers (see http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
* Lucene 4.0 provides a modular API, consolidating components such as Analyzers and Queries that were previously scattered across Lucene core, contrib, and Solr. These modules also include additional functionality such as UIMA analyzer integration and a completely reworked spatial search implementation.
Noteworthy changes since 4.0-BETA:
* A new Block PostingsFormat offering improved search performance and index compression. This will likely become the default format in a future release. (see http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html).
* All non-default codec implementations were moved to a separated codecs module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
* Payloads can be optionally stored on the term vectors.
* Many bugfixes and optimizations.
Please read CHANGES.txt and MIGRATE.txt for a full list of new features and notes on upgrading. Particularly, the new apis are not compatible with previous versions of Lucene, however, file format backwards compatibility is provided for indexes from the 3.0 series and the 4.0-alpha and -beta releases.
Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html)
Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
Happy searching,
Apache Lucene/Solr Developers
-- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
Re: VOTE: release 4.0 (take two)
+1
On Fri, Sep 28, 2012 at 7:15 AM, Robert Muir rcm...@gmail.com wrote: artifacts are here: http://s.apache.org/lusolr40rc1 By the way, thanks for all the help improving smoketesting and packaging and so on. This will pay off in the future!
-- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
[jira] [Commented] (LUCENE-4427) remove webapp from lucene/demo
[ https://issues.apache.org/jira/browse/LUCENE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463384#comment-13463384 ]
Chris Male commented on LUCENE-4427:
I've no great attachment to this code but it's trying to demonstrate the XML QueryParser, that's its point. If we think that has no value then sure, let's remove it. But if we just want to make it lighter weight and a little easier to maintain, then we could convert it to a simple console app and fix the problems.
remove webapp from lucene/demo -- Key: LUCENE-4427 URL: https://issues.apache.org/jira/browse/LUCENE-4427 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless
Spinoff of SOLR-3879: I think the webapp in lucene/demo is a poor demo ... we should remove it. EG it does not close its IndexReader, it uses the [very expert] XML QueryParser, it passes Version.LUCENE_CURRENT when creating the StandardAnalyzer ...
[jira] [Commented] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data
[ https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461548#comment-13461548 ]
Chris Male commented on LUCENE-4419:
I really don't see the benefit of randomly generating Shapes. There isn't much to be revealed with a rectangle that, say, covers one small part of the Pacific Ocean and another rectangle which covers another small part. The number of possible Shapes is just too massive to ever reveal anything. What I feel would be better is if we defined Shapes that test particularly troublesome areas. Datelines, equators, poles. We can also include massive Shapes and tiny Shapes, circles, points, and whatever else we end up supporting. Having this standardized Shape suite would be a big benefit to testing all the Strategys. I don't think it would be particularly difficult to create and once created, it wouldn't require much maintenance at all.
Test RecursivePrefixTree indexing non-point data Key: LUCENE-4419 URL: https://issues.apache.org/jira/browse/LUCENE-4419 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley
RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial filtering of non-point indexed shapes. It seems to work when playing with the capability but it isn't tested. It really needs to be as this is a major feature. I imagine an approach in which some randomly generated rectangles are indexed and then a randomly generated rectangle is queried. The right answer can be calculated brute-force and then compared with the filter. In order to deal with shape imprecision, the randomly generated shapes could be generated to fit a coarse grid (e.g. round everything to a 1 degree interval).
[jira] [Commented] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data
[ https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461564#comment-13461564 ]
Chris Male commented on LUCENE-4419:
bq. I'm all for what you suggest – a test that could be used by multiple strategies
I didn't suggest that. I suggested a common suite of Shapes. I don't like the idea of having a single test for all Strategys since they work in different ways and support different things.
bq. I like randomized tests because it can catch errors that a static test simply didn't test for
There's a difference between randomized tests and randomized Shape generation (again I didn't suggest we stop randomized testing). The world is massive, much of it isn't remotely interesting or challenging to our spatial implementations. Just generating arbitrary Shapes somewhere on the globe seems a total waste of time. If we have a standard set of Shapes then we can use randomized testing to handle the permutations between them, but we shouldn't waste days waiting for tests to hit an interesting Shape.
Test RecursivePrefixTree indexing non-point data Key: LUCENE-4419 URL: https://issues.apache.org/jira/browse/LUCENE-4419 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley
RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial filtering of non-point indexed shapes. It seems to work when playing with the capability but it isn't tested. It really needs to be as this is a major feature. I imagine an approach in which some randomly generated rectangles are indexed and then a randomly generated rectangle is queried. The right answer can be calculated brute-force and then compared with the filter. In order to deal with shape imprecision, the randomly generated shapes could be generated to fit a coarse grid (e.g. round everything to a 1 degree interval).
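The two ideas in this thread (a brute-force oracle over rectangles snapped to a 1 degree grid, and a fixed suite of shapes near troublesome areas with randomization choosing among them) could be combined along these lines. This is only a sketch under stated assumptions: GridRect and ShapeSuiteSketch are hypothetical names rather than spatial module types, rectangles that wrap across the dateline are deliberately not modelled, and the filter under test is left out, so only the expected answer is computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical axis-aligned rectangle in whole degrees; not a spatial module type. */
final class GridRect {
  final int minX, minY, maxX, maxY;

  GridRect(int minX, int minY, int maxX, int maxY) {
    this.minX = minX;
    this.minY = minY;
    this.maxX = maxX;
    this.maxY = maxY;
  }

  /** Closed-interval intersection; wrap-around at the dateline is not modelled. */
  boolean intersects(GridRect o) {
    return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
  }
}

final class ShapeSuiteSketch {
  /** Fixed suite of rectangles near the areas the comments call troublesome. */
  static List<GridRect> suite() {
    List<GridRect> s = new ArrayList<>();
    s.add(new GridRect(170, -10, 180, 10));   // ends at the dateline, eastern hemisphere
    s.add(new GridRect(-180, -10, -170, 10)); // starts at the dateline, western hemisphere
    s.add(new GridRect(-20, -1, 20, 1));      // straddles the equator
    s.add(new GridRect(-180, 80, 180, 90));   // north polar cap
    s.add(new GridRect(-180, -90, 180, -80)); // south polar cap
    s.add(new GridRect(-180, -90, 180, 90));  // massive shape: the whole world
    s.add(new GridRect(5, 5, 5, 5));          // tiny shape: a single grid point
    return s;
  }

  /** Brute-force oracle: positions of the indexed shapes that intersect the query. */
  static List<Integer> expectedMatches(List<GridRect> indexed, GridRect query) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < indexed.size(); i++) {
      if (indexed.get(i).intersects(query)) {
        hits.add(i);
      }
    }
    return hits;
  }

  /** Randomization picks a suite member as the query instead of an arbitrary shape. */
  static GridRect randomQuery(Random random) {
    List<GridRect> s = suite();
    return s.get(random.nextInt(s.size()));
  }
}
```

A real test would index the suite through a Strategy, run the filter for the chosen query, and compare its hits against expectedMatches.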
[jira] [Comment Edited] (LUCENE-4412) Reconsider FunctionValues / ValueSource API
[ https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461320#comment-13461320 ]
Chris Male edited comment on LUCENE-4412 at 9/23/12 4:03 PM:
One of the big challenges for this API is the issue of multiple-values. Applying a function to two lots of multiple-values is difficult as you begin to run into order problems and issue of what to do when the cardinalities are different.
was (Author: cmale): One of the big challenges for this API is the issue of multiple-values. Applying a function to two lots of multiple-values is different as you begin to run into order problems and issue of what to do when the cardinalities are different.
Reconsider FunctionValues / ValueSource API --- Key: LUCENE-4412 URL: https://issues.apache.org/jira/browse/LUCENE-4412 Project: Lucene - Core Issue Type: Improvement Components: modules/other Reporter: Chris Male Fix For: 5.0
When documenting a lot of these classes today I found myself confused and it isn't the first time with this API. I think we need to step back and reassess what we want from this API, what use cases it's designed to meet, and redesign it from the ground up.
[jira] [Commented] (LUCENE-4412) Reconsider FunctionValues / ValueSource API
[ https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461320#comment-13461320 ]
Chris Male commented on LUCENE-4412:
One of the big challenges for this API is the issue of multiple-values. Applying a function to two lots of multiple-values is different as you begin to run into order problems and issue of what to do when the cardinalities are different.
Reconsider FunctionValues / ValueSource API --- Key: LUCENE-4412 URL: https://issues.apache.org/jira/browse/LUCENE-4412 Project: Lucene - Core Issue Type: Improvement Components: modules/other Reporter: Chris Male Fix For: 5.0
When documenting a lot of these classes today I found myself confused and it isn't the first time with this API. I think we need to step back and reassess what we want from this API, what use cases it's designed to meet, and redesign it from the ground up.
[jira] [Commented] (LUCENE-4412) Reconsider FunctionValues / ValueSource API
[ https://issues.apache.org/jira/browse/LUCENE-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460439#comment-13460439 ]
Chris Male commented on LUCENE-4412:
Thanks for raising those concerns David. They're exactly what I'm referring to and what concerns me greatly. If you have any thoughts on how we can better design this API (and let's not be bound by what the current API looks like) please put them in this issue.
Reconsider FunctionValues / ValueSource API --- Key: LUCENE-4412 URL: https://issues.apache.org/jira/browse/LUCENE-4412 Project: Lucene - Core Issue Type: Improvement Components: modules/other Reporter: Chris Male Fix For: 5.0
When documenting a lot of these classes today I found myself confused and it isn't the first time with this API. I think we need to step back and reassess what we want from this API, what use cases it's designed to meet, and redesign it from the ground up.
[jira] [Created] (LUCENE-4412) Reconsider FunctionValues / ValueSource API
Chris Male created LUCENE-4412: -- Summary: Reconsider FunctionValues / ValueSource API Key: LUCENE-4412 URL: https://issues.apache.org/jira/browse/LUCENE-4412 Project: Lucene - Core Issue Type: Improvement Components: modules/other Reporter: Chris Male Fix For: 5.0
When documenting a lot of these classes today I found myself confused and it isn't the first time with this API. I think we need to step back and reassess what we want from this API, what use cases it's designed to meet, and redesign it from the ground up.
[jira] [Commented] (LUCENE-4409) implement javadocs linting with eclipse ecj compiler
[ https://issues.apache.org/jira/browse/LUCENE-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459295#comment-13459295 ]
Chris Male commented on LUCENE-4409:
+1 That's pretty damn cool
implement javadocs linting with eclipse ecj compiler Key: LUCENE-4409 URL: https://issues.apache.org/jira/browse/LUCENE-4409 Project: Lucene - Core Issue Type: Task Components: general/build Reporter: Robert Muir
Today we have a lot of custom python scripts checking javadocs (checking for missing stuff too). Most of this is implemented by parsing html etc (some of this should stay this way, like broken-link detection). But actually the eclipse compiler can do most of this type of linting, and has a lot of options for it. We can pull it via ivy and run it from the command-line. I tested this manually by adding a bogus throws clause to Codec.java, downloading the ecj.jar from maven and running it manually:
{noformat}
rmuir@beast:~/workspace/lucene-trunk/lucene/core/src/java$ java -cp ~/Downloads/ecj-3.7.2.jar org.eclipse.jdt.internal.compiler.batch.Main -source 1.6 -d none -enableJavadoc -properties ~/workspace/lucene-trunk/dev-tools/eclipse/.settings/org.eclipse.jdt.core.prefs .
...
--
120. ERROR in /home/rmuir/workspace/lucene-trunk/lucene/core/src/java/./org/apache/lucene/codecs/Codec.java (at line 59)
* @throws IOException */
^^^
Javadoc: Exception IOException is not declared
--
{noformat}
Here I specified -d none (don't generate class files), and essentially told it to read the compiler warnings/errors options set in the dev-tools config. For javadocs-lint we would want our own separate properties file that disables the ordinary java warnings (because eclipse can warn/error/ignore on lots of things, not just javadocs, and does by default). Separately we could also use this to check/fail/warn on other things besides javadoc...
[jira] [Commented] (LUCENE-4175) Include BBox Spatial Strategy
[ https://issues.apache.org/jira/browse/LUCENE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459341#comment-13459341 ]
Chris Male commented on LUCENE-4175:
With the very near release of 4.0, I don't think we should backport anything untested. I also don't think we're in any immediate hurry for this since we've got other options in 4.0. But we should definitely work on the testing and push it for 4.1.
Include BBox Spatial Strategy - Key: LUCENE-4175 URL: https://issues.apache.org/jira/browse/LUCENE-4175 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan McKinley Assignee: Ryan McKinley Attachments: LUCENE-4175-bbox-strategy.patch
This is an approach to indexing bounding boxes using 4 numeric fields (xmin,ymin,xmax,ymax) and a flag to say if it crosses the dateline. This is a modification from the Apache 2.0 code from the ESRI Geoportal: http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialClauseAdapter.java
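For readers unfamiliar with the representation in the issue description, the following sketch shows one way four numeric values plus a dateline flag can answer an intersection check in memory. It is not the attached patch, which would build queries over indexed fields: IndexedBBoxSketch is a hypothetical class, and treating xmin > xmax as the sign of a dateline crossing is an assumption of this example.

```java
/** Hypothetical in-memory mirror of the five values described in the issue. */
final class IndexedBBoxSketch {
  final double xmin, ymin, xmax, ymax;
  /** True when the box runs east from xmin, across 180, to xmax. */
  final boolean crossesDateline;

  IndexedBBoxSketch(double xmin, double ymin, double xmax, double ymax) {
    this.xmin = xmin;
    this.ymin = ymin;
    this.xmax = xmax;
    this.ymax = ymax;
    // Assumption of this sketch: a west edge greater than the east edge means a crossing.
    this.crossesDateline = xmin > xmax;
  }

  boolean intersects(IndexedBBoxSketch other) {
    if (ymin > other.ymax || other.ymin > ymax) {
      return false; // latitude ranges are disjoint
    }
    // Compare every longitude piece of this box with every piece of the other.
    for (double[] a : xRanges()) {
      for (double[] b : other.xRanges()) {
        if (a[0] <= b[1] && b[0] <= a[1]) {
          return true;
        }
      }
    }
    return false;
  }

  /** One range normally; [xmin, 180] and [-180, xmax] when the box crosses the dateline. */
  private double[][] xRanges() {
    return crossesDateline
        ? new double[][] {{xmin, 180}, {-180, xmax}}
        : new double[][] {{xmin, xmax}};
  }
}
```

Note one simplification: a box ending at exactly 180 and another starting at exactly -180 are treated as disjoint here unless one of them carries the flag.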
[jira] [Commented] (LUCENE-3997) join module should not depend on grouping module
[ https://issues.apache.org/jira/browse/LUCENE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456892#comment-13456892 ]
Chris Male commented on LUCENE-3997:
bq. I propose, instead of using lucene-core as the location for code used by multiple modules, that we create a (single) new module that serves this purpose, something like lucene-shared or lucene-common (though common analyzers already use this name...)
I actually created lucene-common when I first refactored out the FunctionQuery codebase. After some time it was decided (in an issue I can't remember) that the code would go into lucene-core. I agree with your assessment that we shouldn't use lucene-core as a dumping ground, but we should get a discussion about this going.
join module should not depend on grouping module Key: LUCENE-3997 URL: https://issues.apache.org/jira/browse/LUCENE-3997 Project: Lucene - Core Issue Type: Task Affects Versions: 4.0-ALPHA Reporter: Robert Muir Fix For: 4.1 Attachments: LUCENE-3997.patch, LUCENE-3997.patch
I think TopGroups/GroupDocs should simply be in core? Both grouping and join modules use these trivial classes, but join depends on grouping just for them. I think it's better that we try to minimize these inter-module dependencies. Of course, another option is to combine grouping and join into one module, but last time I brought that up nobody could agree on a name. Anyway I think the change is pretty clean: it's similar to having basic stuff like Analyzer.java in core, so other things can work with Analyzer without depending on any specific implementing modules.
Re: being a good citizen is hard when you can't successfully run tests....
On Tue, Sep 18, 2012 at 12:45 AM, Dawid Weiss dawid.we...@gmail.com wrote: I think we can even integrate hossman's suggestion and generate a stability report like weekly or something. I will take a look at this this week but it is definitely something that will require everyone's consensus.
What would they add in addition to the test histories you can see on jenkins?
Dawid Sent from mobile phone.
On Sep 17, 2012 2:42 PM, Michael McCandless luc...@mikemccandless.com wrote: I agree that a test that frequently fails, and does not get fixed, is nearly pointless: everybody ignores it so it's as if the test didn't exist. And so it should be disabled. I say *nearly* because the failures are in fact useful to devs who do have the itch/time to debug/fix them. So I think we need some middle ground here, where the tests keep failing but only those that are interested in the failures see the notifications. We need to switch from a push model (any failure is broadcast to everybody) to a pull model (those devs that want to debug the failures go and check the logs), for such tests. When someone wants to make sure their change didn't break something (Erick's original use case) then these tests should not run. I like Dawid's idea (a separate test plan that Jenkins runs with these difficult tests, and it wouldn't email dev on failure). Mike McCandless http://blog.mikemccandless.com
On Mon, Sep 17, 2012 at 7:58 AM, Robert Muir rcm...@gmail.com wrote: On Sun, Sep 16, 2012 at 11:10 PM, Mark Miller markrmil...@gmail.com wrote: I get value from this test - if it was disabled, I'd probably re-enable it. would be great if it didn't fail so much, but the type of fail tells me something. That means the assert in question isn't important at all. I'll remove it. Again my problem is the idea that having a failing build is ok because certain types of failures don't matter. If they don't matter they should be removed. It causes a ton of noise when people are lazy about tests in this way, and it wastes a ton of people's time. R Remember every time one of these tests fails it sends an email, that I must read (we don't yet have a way to put in the subject header it's a SOLR test fail versus a LUCENE one, or I'd filter the solr ones and not be complaining as much). -- lucidworks.com
-- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
Re: being a good citizen is hard when you can't successfully run tests....
On Tue, Sep 18, 2012 at 1:11 AM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: What would they add in addition to the test histories you can see on jenkins? Is there a per-test history on jenkins too? I'm more familiar with Atlassian Bamboo. Obviously if it already is in Jenkins there's no need to do anything other than just run tests with
Yeah there is. It's a little messy and hard to navigate, but an example: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/661/testReport/junit/org.apache.solr.cloud/SyncSliceTest/testDistribSearch/history/ (wait for it to load)
-Dtests.haltonfailure=false I'm wondering if jenkins also considers a build failed if tests fail but ant returns with success (i.e. does it parse log XMLs and derive this information from there)?
No idea sorry.
-- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
[jira] [Commented] (LUCENE-4388) ShapeMatcher and ShapeValues
[ https://issues.apache.org/jira/browse/LUCENE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456561#comment-13456561 ]
Chris Male commented on LUCENE-4388:
Interesting idea. I like the idea of Strategys exposing ShapeValues and then having a standard DistanceValueSource which accepted a Shape, ShapeValues and a DistanceCalculator. I like that it would also make it easier to retrieve the Shape if it was needed in other places. I am a little worried that this could encourage consumers, whether they be other Strategy impls or something else, to use un-inverted index structures instead of inverted and subsequently suffer in performance and in memory consumption.
bq. And a strategy could support any query shape simply by implementing makeShapeValues().
I don't understand this. Can you elaborate?
bq. I've been thinking about how the API handles strategies supporting indexing multiple shapes and I wonder if that could happen simply via a new MultiShapeShape
One of the challenges with this API is that whether multiple values are supported is a per Strategy decision, yet whether there are multiple values is a per Document decision. Document 1 might have only a single Shape, Document 2 might have multiple Shapes. I just wonder whether we want to force Strategys which support multiple values to always use MultiShape, or whether it should change per Document and then force the consumer to check.
ShapeMatcher and ShapeValues Key: LUCENE-4388 URL: https://issues.apache.org/jira/browse/LUCENE-4388 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Attachments: LUCENE-4388_ShapeValues_and_ShapeMatcher.patch
This patch provides two key interfaces: ShapeMatcher and ShapeValues. The ShapeMatcher concept is borrowed from [~ryantxu]'s JtsGeoStrategy which has a similar GeometryTester. ShapeValues is basically a ValueSource/FunctionValues for shapes. This isn't working; I didn't modify any existing classes. I haven't completely thought this through but a SpatialStrategy might expose a makeShapeValues(IndexReader) and/or makeCenterShapeValues(IndexReader) (the latter is the center points of indexed data). A generic Distance ValueSource could easily be implemented in terms of makeCenterShapeValues(). And a strategy could support any query shape simply by implementing makeShapeValues(). I've been thinking about how the API handles strategies supporting indexing multiple shapes and I wonder if that could happen simply via a new MultiShapeShape.
Re: being a good citizen is hard when you can't successfully run tests....
On Mon, Sep 17, 2012 at 1:30 PM, Yonik Seeley yo...@lucidworks.com wrote: On Sun, Sep 16, 2012 at 2:51 PM, Robert Muir rcm...@gmail.com wrote: not so much energy spent fixing these few shitty solr tests, some of which (Like TestReplicationHandler) are totally useless and have been failing sporadically for like, years. Can you explain why it's useless (without the derogatory adjectives please)? I'm not wanting to get into issues of usefulness of tests or not, but I did just look at the build failure messages over the last few months and I've received a build failure message for this test almost every single day. I appreciate that this doesn't happen locally and makes it hard to fix, but it's hard to work with continuous integration that so commonly fails on one test. I didn't write the test to begin with, so I don't know off the top of my head all of the functionality it covers. I'd be surprised if it was all redundant and covered by other test suites of course. Notes: - I remember it passing for *long* periods of time - I just ran it in a loop 30 times on my linux box and it passed 100% of the time, and in a timely manner - It *has* found many bugs when it started failing (i.e. useful, not useless) - Many of us (including you) *have* worked to improve the situation over time when it does deteriorate - check the logs. It's not clear what you are suggesting (unless you are volunteering to look into this issue with OS-X apparently, or volunteering to write a new replication test from scratch or something). -Yonik http://lucidworks.com -- Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com
[jira] [Commented] (LUCENE-4388) ShapeMatcher and ShapeValues
[ https://issues.apache.org/jira/browse/LUCENE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456757#comment-13456757 ] Chris Male commented on LUCENE-4388: bq. The reasoning is similar to how a standard DistanceValueSource could then exist. For a makeFilter / makeQuery, there could be a standard ShapeFilter that consults makeShapeValues to intersect with the query shape. Of course, it should be preceded by a bbox filter or something similar. That's going to be so slow. Iterating over every Shape of every Document to see if it intersects? That harks back to WildcardQuery performance of old. Even with a BBox, you could have 100,000 points within a city. I don't think we should ever support this. If a user wants to create it themselves then fine, but we should be striving for performance. bq. I'm not sure what you mean. But a problem with the other approach (forcing MultiShape for createFields) is that it would make Solr support difficult, perhaps requiring an UpdateRequestProcessor to join separate field values into one. But even putting that aside, I don't think use of a MultiShape needs to be forced, but it should be supported by the Strategy if it declares that it handles multi-valued shapes. Given this issue is about ShapeValues, I'm talking about retrieving Shapes through ShapeValues, not about indexing. What I was saying is that, given the ShapeValues interface: {code} S shape(int docId, IndexReader reader); {code} we need to decide what S is going to be. If S is always Shape then the consumer would need to check whether the actual value returned was a MultiShape or not, in order to retrieve the multiple Shapes. If S was always MultiShape, then the ShapeValues impl would need to return a MultiShape even when there might only be one Shape associated with the given docId. This isn't a blocking problem; I was merely suggesting that we need to think through the use cases we want to support and how MultiShape fits in. 
ShapeMatcher and ShapeValues Key: LUCENE-4388 URL: https://issues.apache.org/jira/browse/LUCENE-4388 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Attachments: LUCENE-4388_ShapeValues_and_ShapeMatcher.patch This patch provides two key interfaces: ShapeMatcher and ShapeValues. The ShapeMatcher concept is borrowed from [~ryantxu]'s JtsGeoStrategy which has a similar GeometryTester. ShapeValues is basically a ValueSource/FunctionValues for shapes. This isn't working; I didn't modify any existing classes. I haven't completely thought this through but a SpatialStrategy might expose a makeShapeValues(IndexReader) and/or makeCenterShapeValues(IndexReader) (the latter is the center points of indexed data). A generic Distance ValueSource could easily be implemented in terms of makeCenterShapeValues(). And a strategy could support any query shape simply by implementing makeShapeValues(). I've been thinking about how the API handles strategies supporting indexing multiple shapes and I wonder if that could happen simply via a new MultiShapeShape. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
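To make the two options for S in the comment above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: Shape, Point, MultiShape and ShapeValues are hypothetical stand-ins, not the classes from the attached patch, and the IndexReader parameter is left out so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real Lucene spatial types.
interface Shape {}

final class Point implements Shape {
    final double x;
    final double y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// A shape that is simply a collection of other shapes.
final class MultiShape implements Shape {
    final List<Shape> shapes;
    MultiShape(List<Shape> shapes) { this.shapes = shapes; }
}

// S is the type handed back per document (IndexReader omitted in this sketch).
interface ShapeValues<S extends Shape> {
    S shape(int docId);
}

final class ShapeValuesDemo {
    // Option 1: S is always Shape, so the consumer has to check for MultiShape.
    static List<Shape> unwrap(Shape s) {
        if (s instanceof MultiShape) {
            return ((MultiShape) s).shapes;
        }
        return Collections.singletonList(s);
    }

    // Option 2: S is always MultiShape, so a lone shape gets wrapped.
    static MultiShape wrap(Shape s) {
        if (s instanceof MultiShape) {
            return (MultiShape) s;
        }
        return new MultiShape(Collections.singletonList(s));
    }

    public static void main(String[] args) {
        final Shape single = new Point(1, 2);
        final Shape multi =
            new MultiShape(Arrays.<Shape>asList(new Point(0, 0), new Point(3, 4)));
        // Document 0 has one shape, document 1 has several.
        ShapeValues<Shape> values = docId -> docId == 0 ? single : multi;
        for (int doc = 0; doc < 2; doc++) {
            System.out.println("doc " + doc + ": "
                + unwrap(values.shape(doc)).size() + " shape(s)");
        }
    }
}
```

Either way one side pays: with option 1 every consumer carries the instanceof check, with option 2 every single-valued document pays for a wrapper object.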
[jira] [Commented] (LUCENE-4389) Fix TwoDoubles dateline support
[ https://issues.apache.org/jira/browse/LUCENE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456405#comment-13456405 ] Chris Male commented on LUCENE-4389: I have faith in your knowledge on this and there seems to be adequate testing, so lets go ahead and commit that. Fix TwoDoubles dateline support --- Key: LUCENE-4389 URL: https://issues.apache.org/jira/browse/LUCENE-4389 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4389_Support_dateline_and_circles_for_TwoDoubles.patch, LUCENE-4389 Support dateline for TwoDoubles.patch The dateline support can easily be fixed. After this, the TwoDoublesStrategy might not be particularly useful but at least it won't be buggy if you stay with Rectangle query shapes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455768#comment-13455768 ] Chris Male commented on LUCENE-4208: Things are looking pretty good, we're almost there. - Where are we on multi-valued fields? In the documentation on makeDistanceValueSource it doesn't say what happens when multiple values are indexed. Do we support that in the ValueSource implementations? Is the behaviour undefined? If it is supposed to be defined, can we document it? - "Returns a ValueSource useful as a score": Can we drop this claim? Part of the reason we've moved to having ConstantScoreQuerys is that it isn't clear what the score for the queries should be. This value isn't useful for every spatial operation or implementation. Once these are addressed, I'm +1 for committing. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 Attachments: LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch, LUCENE-4208_makeQuery_return_ConstantScoreQuery,_standardize_makeDistanceValueSource_behav.patch, LUCENE-4208_makeQuery_return_ConstantScoreQuery,_standardize_makeDistanceValueSource_behav.patch The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea.
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453746#comment-13453746 ] Chris Male commented on LUCENE-4208: I disagree that makeQuery shouldn't exist. There are optimizations to be had in Query code, such as using BooleanQuery and its associated highly optimized scorer algs. I think it should continue to exist but should have a default implementation that creates a CSQ by calling makeFilter. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
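The suggestion above (keep makeQuery, but give it a default implementation that builds a constant-score query from makeFilter) could look roughly like the sketch below. All types here are hypothetical toy stand-ins invented for the example, not Lucene's Filter, Query, ConstantScoreQuery or SpatialStrategy; the one-dimensional "strategy" exists only so the code runs on its own.

```java
// Hypothetical toy model; none of these are real Lucene classes.
interface DocFilter {
    boolean accept(int docId);
}

interface ToyQuery {
    // In this toy model a score of 0 means "no match".
    float score(int docId);
}

// Stand-in for a constant-score query: every accepted document scores 1.
final class ToyConstantScoreQuery implements ToyQuery {
    private final DocFilter filter;
    ToyConstantScoreQuery(DocFilter filter) { this.filter = filter; }
    @Override
    public float score(int docId) {
        return filter.accept(docId) ? 1.0f : 0.0f;
    }
}

// Stand-in for the query arguments: a closed range on one axis.
final class RangeArgs {
    final double min;
    final double max;
    RangeArgs(double min, double max) { this.min = min; this.max = max; }
}

abstract class StrategySketch {
    abstract DocFilter makeFilter(RangeArgs args);

    // Default: wrap the filter. A strategy with a faster, specialised
    // query implementation can still override this method.
    ToyQuery makeQuery(RangeArgs args) {
        return new ToyConstantScoreQuery(makeFilter(args));
    }
}

// A toy strategy over one coordinate per document.
final class OneDimStrategy extends StrategySketch {
    private final double[] xByDoc;
    OneDimStrategy(double[] xByDoc) { this.xByDoc = xByDoc; }
    @Override
    DocFilter makeFilter(final RangeArgs args) {
        return docId -> xByDoc[docId] >= args.min && xByDoc[docId] <= args.max;
    }
}
```

The point of the design is that strategies only have to supply makeFilter, while the override hook keeps room for the optimized scorer paths mentioned in the comment.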
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454581#comment-13454581 ] Chris Male commented on LUCENE-4208: bq. TwoDoubles is getting overhauled to support the dateline and any query shape--should probably go into another issue. Yes please! Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 Attachments: LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452884#comment-13452884 ] Chris Male commented on LUCENE-4369: As I say, I totally support renaming this field to something. I think calling it anything else will help with distinguishing it from TextField so I'm +1 for MatchOnly. Perhaps that'll encourage people to read the docs about it not being analyzed. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452924#comment-13452924 ] Chris Male commented on LUCENE-4369: I like ExactMatchField, good suggestion. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4375) Spatial BBoxIntersects and BBoxWithin are used incorrectly
[ https://issues.apache.org/jira/browse/LUCENE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453674#comment-13453674 ] Chris Male commented on LUCENE-4375: In the future, once we remove much of the restrictions on the SpatialStrategy interface, we could have implementations of PTS that was limited to Points and supported isWithin. Till then, I don't think we should include hacks just to support isWithin for Points. Lets leave the API nice and we'll make improvements when we can. Spatial BBoxIntersects and BBoxWithin are used incorrectly -- Key: LUCENE-4375 URL: https://issues.apache.org/jira/browse/LUCENE-4375 Project: Lucene - Core Issue Type: Bug Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4375_Fix_use_of_BBoxWithin_BBoxIntersects_and_IsWithin.patch SpatialOperation has two special BBoxIntersects and BBoxWithin choices. I assumed these where the bounding boxes of the query shape but [~ryantxu] informed me these are supposed to be for the *indexed shape*. There is no strategy in Lucene spatial that could use this but there is one externally -- JtsGeoStrategy. Javadocs should be added to clarify, and various places like SpatialArgs.getShape() should be fixed to not use it incorrectly. This does remove a feature from the Solr adapters side; the test there will need to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453708#comment-13453708 ] Chris Male commented on LUCENE-4208: I don't think there is a clear solution here. But I feel ValueSource provides maximum flexibility going forward. If we continue to support makeValueSource then people can sort, or include it in their query if they want, or just retrieve the value at some later stage. makeQuery() should just return a ConstantScoreQuery. We can consider in future versions what, if anything, we want to do around its score. WRT TwoDoubles: this Strategy was a nice start to this work a while back and was designed to replicate existing point-distance functionality. But it has huge limitations and it constantly feels like we're being held back by it. Every Strategy has its limitations, and I don't feel we should hold back changes just because they impact TwoDoubles. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea.
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451508#comment-13451508 ] Chris Male commented on LUCENE-4208: I actually totally agree with David here. Using ValueSource (instead of my SpatialSimilarity idea) is an excellent solution which leverages existing Lucene code. Having it this way means that even if a Strategy has a custom Query implementation (maybe for performance reasons) it would still be possible to make use of the ValueSource in scoring. I definitely think we should expose this on a per Strategy basis rather than all Strategys as some Strategys may not be able to compute distance and we shouldn't force them to. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
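For reference, the two scoring ideas from the issue description quoted above can be written down as plain functions. This is only a sketch: the class and method names are made up, and the `+ c` in the denominator of the reciprocal form is an assumption added here so that a distance of zero does not divide by zero (the issue itself only says "1/distance" and "maybe use a different constant than 1").

```java
final class DistanceScoreSketch {
    private DistanceScoreSketch() {}

    // Reciprocal form: c / (distance + c).
    // Equals 1 at distance 0 and falls toward 0 as distance grows.
    // ASSUMPTION: the "+ c" guard is not from the issue text.
    static double reciprocalScore(double distance, double c) {
        if (distance < 0 || c <= 0) {
            throw new IllegalArgumentException("distance must be >= 0 and c > 0");
        }
        return c / (distance + c);
    }

    // Linear alternative from the issue: earthCircumference/2 - distance.
    // Units are whatever the caller uses for both arguments.
    static double linearScore(double distance, double earthCircumference) {
        return earthCircumference / 2 - distance;
    }

    public static void main(String[] args) {
        // Closer documents score higher under both forms.
        System.out.println(reciprocalScore(1.0, 1.0) > reciprocalScore(10.0, 1.0));
        System.out.println(linearScore(100.0, 40000.0) > linearScore(500.0, 40000.0));
    }
}
```

Both forms order documents the same way (nearer first); they differ in how quickly the score drops off, which matters once the value is combined with other score components.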
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451579#comment-13451579 ] Chris Male commented on LUCENE-3312: David, just at a guess I imagine the branch used in this issue was created before we changed createIndexableFields to not handle storing. To satisfy the conditions at the time (indexing and storing) Nikola changed it to return Field. Lets just fix it and we'll be fine. Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: 5.0 Attachments: LUCENE-3312-DocumentIterators-uwe.patch, lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-12.patch, lucene-3312-patch-13.patch, lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch, LUCENE-3312-reintegration.patch In the field type branch we have strongly decoupled Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up use their own documents instead of the user-space impls we provide in oal.document. Similarly, with LUCENE-3309, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor. But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables -- one for the IndexableFields and one for the StorableFields. Either can be null. 
One downside is a possible perf hit for fields that are both indexed and stored (i.e., we visit them twice, look up their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in the API.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451730#comment-13451730 ] Chris Male commented on LUCENE-4369: I'm +1 for renaming this field (and even considering its long term future) I'm just not sure how MatchOnlyField conveys the fact it bypasses analysis? StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3686) fix solr/core and solr/solrj not to share a lib/ directory
[ https://issues.apache.org/jira/browse/SOLR-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451507#comment-13451507 ] Chris Male commented on SOLR-3686: -- bq. If I'm the only intellij user left, I guess I should go back to just maintaining my own simple config again since it seems like I'm getting blown out of the water every other week or so. I'm also still using IntelliJ. If you have any tweaks or fixes, please contribute them! fix solr/core and solr/solrj not to share a lib/ directory -- Key: SOLR-3686 URL: https://issues.apache.org/jira/browse/SOLR-3686 Project: Solr Issue Type: Bug Reporter: Robert Muir Fix For: 4.0, 5.0 Attachments: SOLR-3686.patch, SOLR-3686.patch This makes the build system hairy. it also prevents us from using ivy's sync=true (LUCENE-4262) which totally prevents the issue of outdated jars. We should fix this so each has its own lib/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.
[ https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448758#comment-13448758 ] Chris Male commented on LUCENE-4186: bq. SpatialArgs.toString()'s logic was moved to SpatialArgsParser as writeSpatialArgs(args) since it looks so close to the parsed format and I'd like to see it parsed and written in the same class. +1 Makes sense. bq. SpatialArgs.toString() fixes the bug in displaying the error percent that Itamar noticed. +1 bq. Standardizes distErrPct terminology in variables and method names. Despite the pct it's actually a fraction [0 to 0.5]. +1 Do we validate somewhere that the values are between 0 and 0.5? bq. Instead of SpatialArgs.distErrPct defaulting to 0.025 it defaults to null. Now the Strategy's own distErrPct (which defaults to 0.025) is supplied to args.resolveDistErr(...) so it can see if the args overrides the one in strategy or not. If I understand correctly, your motivation for doing this is so that in the default scenario (when no pct is defined) you have the same value at both index time and query time, correct? I'm starting to wonder whether it makes sense to allow the value to be set per request. Having the same value at both index and query time seems ideal, so perhaps we should force the value, whether it be the pct or the absolute value, to be provided at construction of the Strategy. bq. SpatialArgs gains a distErr field, parsed from SpatialArgsParser. This is an alternative means that a search request can specify the distance in a more direct way. So can the user now provide either the distErr or the distErrPct, and if they provide the latter, it is converted to the former seamlessly? Or must the user do the conversion themselves? I'm +1 for the first option. bq. One thing I didn't do, is move the distErrPct getter and setter up from PrefixTreeStrategy to the base SpatialStrategy. Why would we want to move it to SpatialStrategy? It seems unrelated to the other Strategies. 
Lucene spatial's distErrPct is treated as a fraction, not a percent. -- Key: LUCENE-4186 URL: https://issues.apache.org/jira/browse/LUCENE-4186 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Priority: Critical Fix For: 4.0 Attachments: LUCENE-4186_distErrPct_upgrade.patch The distance-error-percent of a query shape in Lucene spatial is, in a nutshell, the percent of the shape's area that is an error epsilon when considering search detail at its edges. The default is 2.5%, for reference. However, as configured, it is read in as a fraction: {code:xml} <fieldType name="location_2d_trie" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDetailDist="0.001" /> {code}
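The questions in the comment above (is the 0 to 0.5 range validated, and is distErrPct converted to distErr for the user) can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: the class, the method signatures and the `shapeExtent` parameter (some measure of the query shape's size) are hypothetical, and letting an explicit distErr win over distErrPct is a choice made for this sketch only.

```java
final class DistErrSketch {
    // The strategy-level default mentioned in the thread.
    static final double DEFAULT_DIST_ERR_PCT = 0.025;

    private DistErrSketch() {}

    // Despite the "pct" name the value is a fraction in [0, 0.5].
    static double validatePct(double distErrPct) {
        if (distErrPct < 0.0 || distErrPct > 0.5) {
            throw new IllegalArgumentException(
                "distErrPct must be in [0, 0.5] but was " + distErrPct);
        }
        return distErrPct;
    }

    // argsDistErr and argsDistErrPct are null when the request did not set them.
    // ASSUMPTION: shapeExtent is a hypothetical size measure of the query shape.
    static double resolveDistErr(Double argsDistErr, Double argsDistErrPct,
                                 double strategyDistErrPct, double shapeExtent) {
        if (argsDistErr != null) {
            return argsDistErr; // absolute value given directly on the request
        }
        // Fall back from the request's fraction to the strategy's own fraction.
        double pct = validatePct(
            argsDistErrPct != null ? argsDistErrPct : strategyDistErrPct);
        return pct * shapeExtent;
    }
}
```

With nothing set on the request, the strategy's fraction is used, which is what gives the same value at index time and query time in the default scenario discussed above.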
[jira] [Commented] (LUCENE-4186) Lucene spatial's distErrPct is treated as a fraction, not a percent.
[ https://issues.apache.org/jira/browse/LUCENE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448801#comment-13448801 ] Chris Male commented on LUCENE-4186: bq. It'd be nice if this could be done at index time too but I'm not sure how it would fit into the API. Maybe an overloaded createIndexableFields(shape,distErr) I've always thought it was a little unusual createIndexableFields didn't also accept SpatialArgs, so why don't we change it so it does? Lucene spatial's distErrPct is treated as a fraction, not a percent. -- Key: LUCENE-4186 URL: https://issues.apache.org/jira/browse/LUCENE-4186 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Priority: Critical Fix For: 4.0 Attachments: LUCENE-4186_distErrPct_upgrade.patch The distance-error-percent of a query shape in Lucene spatial is, in a nutshell, the percent of the shape's area that is an error epsilon when considering search detail at its edges. The default is 2.5%, for reference. However, as configured, it is read in as a fraction: {code:xml} <fieldType name="location_2d_trie" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDetailDist="0.001" /> {code}
[jira] [Commented] (LUCENE-4365) The Maven build can't directly handle complex inter-module dependencies involving the test-framework modules
[ https://issues.apache.org/jira/browse/LUCENE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449400#comment-13449400 ] Chris Male commented on LUCENE-4365: Great work improving this Steven, what a mess! The Maven build can't directly handle complex inter-module dependencies involving the test-framework modules Key: LUCENE-4365 URL: https://issues.apache.org/jira/browse/LUCENE-4365 Project: Lucene - Core Issue Type: Improvement Components: general/build Reporter: Steven Rowe Assignee: Steven Rowe Priority: Minor Attachments: LUCENE-4365.patch, lucene.solr.cyclic.dependencies.removed.png, lucene.solr.dependency.cycles.png.jpg The Maven dependency model disallows cyclic dependencies, of which there are now several in the Ant build (considering test and compile dependencies together, as Maven does). All of these cycles involve either the Lucene test-framework or the Solr test-framework. The current Maven build works around this problem by incorporating dependencies' sources into dependent modules' test sources, rather than literally declaring the problematic dependencies as such. (See SOLR-3780 for a recent example of putting this workaround in place for the Solrj module.) But with the factoring out of the Lucene Codecs module, upon which Lucene test-framework has a compile-time dependency, the complexity of the workarounds required to make it all hang together is great enough that I want to attempt a (Maven-build-only) module refactoring. It should require fewer contortions and be more maintainable. The Maven build is currently broken, as of the addition of the Codecs module (LUCENE-4340). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies
[ https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447617#comment-13447617 ] Chris Male commented on LUCENE-4354: hamcrest is a transitive dependency of junit, we'll need to exclude that specifically in our poms. add validate-maven task to check maven dependencies --- Key: LUCENE-4354 URL: https://issues.apache.org/jira/browse/LUCENE-4354 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4354.patch We had a situation where the maven artifacts depended on the wrong version of tika: we should test that the maven dependencies are correct. An easy way to do this is to force it to download all of its dependencies, and then run our existing license checks over that. This currently fails: maven is bringing in some extra 3rd party libraries.
[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies
[ https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447634#comment-13447634 ] Chris Male commented on LUCENE-4354: Ignoring the scope issue, the validation has revealed valid issues. For example the jdom, rome and servlet dependencies all have different versions to our license files. add validate-maven task to check maven dependencies --- Key: LUCENE-4354 URL: https://issues.apache.org/jira/browse/LUCENE-4354 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4354.patch
[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies
[ https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447638#comment-13447638 ] Chris Male commented on LUCENE-4354: It's not just an issue of what ends up in the war since we also publish individual artifacts / poms. add validate-maven task to check maven dependencies --- Key: LUCENE-4354 URL: https://issues.apache.org/jira/browse/LUCENE-4354 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4354.patch
[jira] [Commented] (LUCENE-4354) add validate-maven task to check maven dependencies
[ https://issues.apache.org/jira/browse/LUCENE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13447714#comment-13447714 ] Chris Male commented on LUCENE-4354: Yeah I think tests should be catching them. Do you have any examples? add validate-maven task to check maven dependencies --- Key: LUCENE-4354 URL: https://issues.apache.org/jira/browse/LUCENE-4354 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4354-dep-fix.patch, LUCENE-4354_hacked_lucene_only.patch, LUCENE-4354.patch, LUCENE-4354.patch, LUCENE-4354.patch
[jira] [Commented] (LUCENE-4362) ban tab-indented source
[ https://issues.apache.org/jira/browse/LUCENE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448478#comment-13448478 ] Chris Male commented on LUCENE-4362: Well damn. ban tab-indented source --- Key: LUCENE-4362 URL: https://issues.apache.org/jira/browse/LUCENE-4362 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4362_core.patch This makes code really difficult to read and work with. It's easy enough to prevent.
{noformat}
Index: build.xml
===
--- build.xml (revision 1380979)
+++ build.xml (working copy)
@@ -77,11 +77,12 @@
       <or>
         <containsregexp expression="@author\b" casesensitive="yes"/>
         <containsregexp expression="\bno(n|)commit\b" casesensitive="no"/>
+        <containsregexp expression="\t" casesensitive="no"/>
       </or>
     </fileset>
     <map from="${validate.currDir}${file.separator}" to="*" />
   </pathconvert>
-  <fail if="validate.patternsFound">The following files contain @author tags or nocommits:${line.separator}${validate.patternsFound}</fail>
+  <fail if="validate.patternsFound">The following files contain @author tags, tabs or nocommits:${line.separator}${validate.patternsFound}</fail>
 </target>
{noformat}
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13446962#comment-13446962 ] Chris Male commented on LUCENE-3312: Thanks Uwe and Nikola! Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: 5.0 Attachments: LUCENE-3312-DocumentIterators-uwe.patch, lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-12.patch, lucene-3312-patch-13.patch, lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch, LUCENE-3312-reintegration.patch In the field type branch we have strongly decoupled Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up to use their own documents instead of the user-space impls we provide in oal.document. Similarly, with LUCENE-3309, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor. But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables -- one for the IndexableFields and one for the StorableFields. Either can be null. One downside is a possible perf hit for fields that are both indexed and stored (i.e., we visit them twice, look up their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in the API.
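The two-Iterable proposal in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions: `TwoIterableWriter` and its `addDocument` method are hypothetical names, plain strings stand in for fields, and none of it is Lucene's actual API. The counters make the double-visit cost mentioned as a downside visible.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical writer; a stand-in for illustration, not Lucene's IndexWriter. */
class TwoIterableWriter {
    int indexedVisits = 0;
    int storedVisits = 0;

    /** Either argument may be null, as the issue description proposes. */
    void addDocument(Iterable<String> indexable, Iterable<String> storable) {
        if (indexable != null) {
            for (String ignored : indexable) {
                indexedVisits++; // a real indexer would invert the field here
            }
        }
        if (storable != null) {
            for (String ignored : storable) {
                storedVisits++; // a real indexer would write the stored value here
            }
        }
    }
}

class TwoIterableDemo {
    public static void main(String[] args) {
        List<String> indexable = Arrays.asList("title", "body");
        List<String> storable = Arrays.asList("title"); // "title" is indexed and stored
        TwoIterableWriter writer = new TwoIterableWriter();
        writer.addDocument(indexable, storable);
        // "title" is visited once per Iterable, i.e. twice in total.
        System.out.println(writer.indexedVisits + writer.storedVisits);
    }
}
```

In this sketch a field that is both indexed and stored appears in both Iterables, which is where the extra visit and name lookup would come from.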
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445842#comment-13445842 ] Chris Male commented on LUCENE-3312: I've thought about this a little bit.
{quote}
To me storing needs no 'type' information at all: But I guess the problem with that is that we need DocValues types since DocValues are stored fields here.
{quote}
We've gone back and forwards about this a lot since the Fields cleanup began but it would be nice to actually have the DocValues Types on the StorableField itself rather than on StorableFieldType. In the end the type is related to the type of the value itself, not disconnected metadata. Having it this way would also alleviate the need for StorableFieldType and make storing values as simple as possible.
{quote}
This basically is the same problem all over again.
* You make a Document with N StorableFields
* You call IR.document and get a StorableDocument back, with N-3 StorableFields.
* You wonder: what happened to the other 3 fields? They were DocValues.
{quote}
What if they were returned? Because you're absolutely right, it seems odd for DocValues Fields to be StorableFields and then not accessible like all other StorableFields. So what if we changed how IR.document worked so you could pull DocValues Fields too. Is that something users might want?
Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: LUCENE-3312-DocumentIterators-uwe.patch, lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-12.patch, lucene-3312-patch-13.patch, lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch
[jira] [Commented] (SOLR-3700) Create a Classification component
[ https://issues.apache.org/jira/browse/SOLR-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444841#comment-13444841 ] Chris Male commented on SOLR-3700: -- bq. Is there any reason not to develop it as a Lucene module? I haven't looked at the patch, but if it's not Solr-specific, or depends on Solr API, perhaps we can make this issue a LUCENE- one? +1 Create a Classification component - Key: SOLR-3700 URL: https://issues.apache.org/jira/browse/SOLR-3700 Project: Solr Issue Type: New Feature Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Attachments: SOLR-3700_2.patch, SOLR-3700.patch Lucene/Solr can host huge sets of documents containing lots of information in fields so that these can be used as training examples (w/ features) in order to very quickly create classifier algorithms to use on new documents and / or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already seen data (the indexed documents / fields) to classify new documents / text fragments. The first version will contain a (simplistic) Lucene based Naive Bayes classifier but more implementations should be added in the future.
[jira] [Commented] (LUCENE-4343) Clear up more Tokenizer.setReader/TokenStream.reset issues
[ https://issues.apache.org/jira/browse/LUCENE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444908#comment-13444908 ] Chris Male commented on LUCENE-4343: +1 to the improvements and pursuing making it final. Clear up more Tokenizer.setReader/TokenStream.reset issues -- Key: LUCENE-4343 URL: https://issues.apache.org/jira/browse/LUCENE-4343 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Robert Muir Attachments: LUCENE-4343.patch Spinoff from user-list thread. I think the rename helps, but the javadocs still have problems: they seem to only describe a totally wacky case (CachingTokenFilter) and not the normal case. Ideally setReader would be final I think, but there are a few crazy tokenstreams to fix before I could make that work. Would also need something hackish so MockTokenizer's state machine is still functional. But I worked on fixing up the mess in our various tokenstreams, which is easy for the most part. As part of this I found it was really useful in flushing out test bugs (ones that don't use MockTokenizer, which they really should), if we can do some best-effort exceptions when the consumer is broken and it costs nothing. For example:
{noformat}
- private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
+ // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
+ private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;
{noformat}
I think this is worth exploring more... this was really effective at finding broken tests etc. We should see if we can be more thorough/ideally throw better exceptions when consumers are broken and it's free.
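The best-effort trick in the LUCENE-4343 description (start the buffer index at -1 so consumers that skip reset() fail fast) can be illustrated with a minimal sketch. `BufferedSource` below is a hypothetical class written for this example, not Lucene's Tokenizer, and the reset/next contract is a simplifying assumption.

```java
/** Hypothetical reader-like class for illustration; not Lucene's Tokenizer. */
class BufferedSource {
    private final char[] buffer;
    // -1 until reset() is called, so a consumer that skips reset() fails fast
    private int bufferIndex = -1;

    BufferedSource(String text) {
        buffer = text.toCharArray();
    }

    void reset() {
        bufferIndex = 0;
    }

    /** Returns the next char, or -1 at end of input. */
    int next() {
        if (bufferIndex >= buffer.length) {
            return -1;
        }
        // With bufferIndex == -1 this array access throws ArrayIndexOutOfBoundsException.
        return buffer[bufferIndex++];
    }
}

class ResetContractDemo {
    /** True if reading without reset() triggers the best-effort exception. */
    static boolean failsWithoutReset() {
        try {
            new BufferedSource("ab").next();
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Reads the first char after a proper reset(). */
    static int firstCharAfterReset() {
        BufferedSource source = new BufferedSource("ab");
        source.reset();
        return source.next();
    }

    public static void main(String[] args) {
        System.out.println(failsWithoutReset());
        System.out.println((char) firstCharAfterReset());
    }
}
```

The check costs nothing on the normal path because no extra branch is added; the array bounds check that already exists does the work, which matches the "it costs nothing" condition in the description.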
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
Chris Male commented on LUCENE-3312 Break out StorableField from IndexableField Apply only to trunk (5.0) - so it has more time to bake? I think this change would be too big for Lucene 4.0 - and too late?? +1 to 5.0 only. It's another big change to the Document/Field API that we may want to evolve more as it bakes and earlier adopters begin to use it. Are there any other things to change? One open point is StorableFieldType. StorableFieldType seems like the only thing at this stage that needs to be addressed.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
Chris Male commented on LUCENE-3312 Break out StorableField from IndexableField +1
[jira] [Commented] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4
[ https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437458#comment-13437458 ] Chris Male commented on LUCENE-4197: +1 Small improvements to Lucene Spatial Module for v4 -- Key: LUCENE-4197 URL: https://issues.apache.org/jira/browse/LUCENE-4197 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Fix For: 4.0 Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch, SpatialArgs-_remove_unused_min_and_max_params.patch This issue is to capture small changes to the Lucene spatial module that don't deserve their own issue.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13436755#comment-13436755 ] Chris Male commented on LUCENE-3312: We definitely need to clean up the StorableFieldType situation, but I think we can tackle that afterwards. I think it's best to ensure what we have now works and we're comfortable with the API. Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-12.patch, lucene-3312-patch-13.patch
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434970#comment-13434970 ] Chris Male commented on LUCENE-3312: Is it going to be possible to get the IndexableFieldType vs StorableFieldType situation resolved before this lands? I can assist if that would help. Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-12.patch
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431719#comment-13431719 ] Chris Male commented on LUCENE-3312: Hey Nikola, bq. except for mentioned TestQualityRun.testTrecQuality. I'm happy to help work out what is going wrong here, have you done any debugging of the test yourself? What have you worked out so far? Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431828#comment-13431828 ] Chris Male commented on LUCENE-3312: Wow, I have replicated the same behaviour. On the branch the number of fields per doc is... wow. Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13431833#comment-13431833 ] Chris Male commented on LUCENE-3312: Ah I think I found the problem, it's in Document, I'll verify in a few seconds. Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Nikola Tankovic Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431839#comment-13431839 ]

Chris Male commented on LUCENE-3312:

Yup found it. The problem is that, in the branch, {{Document#getFields()}} creates a new List, and {{DocMaker}} in the benchmark module pulls the Fields and clears them (using {{clear()}}). Since a new List is created each time, it is the new List that gets cleared rather than the actual fields. Hence each iteration just adds more fields without the previous ones being cleared.

Break out StorableField from IndexableField
---
Key: LUCENE-3312
URL: https://issues.apache.org/jira/browse/LUCENE-3312
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Nikola Tankovic
Labels: gsoc2012, lucene-gsoc-12
Fix For: Field Type branch
Attachments: lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch

In the field type branch we have strongly decoupled Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up to use their own documents instead of the user-space impls we provide in oal.document.

Similarly, with LUCENE-3309, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor.

But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables -- one for the IndexableFields and one for the StorableFields. Either can be null.

One downside is possible perf hit for fields that are both indexed and stored (ie, we visit them twice, lookup their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
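A minimal sketch of the failure mode described in the comment above, assuming a document whose getter hands back a fresh copy on every call. `CopyingDoc` and `fieldsAfter` are hypothetical names used only for illustration; this is not the branch's actual `Document` or `DocMaker` code.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyOnGetDemo {
    /** Hypothetical stand-in for a document whose getter returns a new List each call. */
    public static final class CopyingDoc {
        private final List<String> fields = new ArrayList<>();

        public void add(String field) { fields.add(field); }

        /** Returns a copy, so callers never see or touch the live list. */
        public List<String> getFields() { return new ArrayList<>(fields); }

        public int size() { return fields.size(); }
    }

    /** Simulates a loop that reuses one document and tries to reset it via the getter. */
    public static int fieldsAfter(int iterations) {
        CopyingDoc doc = new CopyingDoc();
        for (int i = 0; i < iterations; i++) {
            doc.getFields().clear(); // clears only the temporary copy
            doc.add("title");
            doc.add("body");
        }
        return doc.size();
    }

    public static void main(String[] args) {
        // The count grows with every iteration instead of staying at 2.
        System.out.println(fieldsAfter(3));
    }
}
```

Because the reset acts on a throwaway copy, the field count grows by two per iteration rather than staying constant.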
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431841#comment-13431841 ]

Chris Male commented on LUCENE-3312:

Nikola, we should probably move all of Document's methods over to just working with Field (and not IndexableField). I don't mind if we want to make getFields() return an immutable list but we then need to provide a clear() method so people can reuse Document instances.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431849#comment-13431849 ]

Chris Male commented on LUCENE-3312:

Yeah we definitely shouldn't return a new list. I think the immutable list and Document.clear() combo will suffice.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431851#comment-13431851 ]

Chris Male commented on LUCENE-3312:

Oh we should also include a unit test that verifies this behaviour.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431867#comment-13431867 ]

Chris Male commented on LUCENE-3312:

Nikola,

On a note totally unrelated to the bug, I noticed that StorableField still returns an IndexableFieldType for type(). This led me to GeneralField. I don't think we need this. IndexableField should only need name(), tokenStream() and type(). StorableField needs name(), type() and the various xyzValue() accessors. Its type() should be a StorableFieldType and some of the functionality from IndexableFieldType should go there.
[jira] [Comment Edited] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431867#comment-13431867 ]

Chris Male edited comment on LUCENE-3312 at 8/9/12 2:51 PM:

Nikola,

On a note totally unrelated to the bug, I noticed that StorableField still returns an IndexableFieldType for type(). This led me to GeneralField. I don't think we need this. IndexableField should only need name(), tokenStream() and type(). StorableField needs name(), type() and the various xyzValue() accessors. Its type() should be a StorableFieldType and some of the functionality from IndexableFieldType should go there.

was (Author: cmale):

Nikola,

On a totally note totally unrelated to the bug, I noticed that StorableField still returns an IndexableFieldType for type(). This lead me to GeneralField. I don't think we need this. IndexableField should only need name(), tokenStream() and type(). StorableField needs name(), type() and the various xyzValue() accessors. Its type() should be a StorableFieldType and some of the functionality from IndexableFieldType should go there.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431880#comment-13431880 ]

Chris Male commented on LUCENE-3312:

{code}
public final List<Field> getFields() {
  return Collections.unmodifiableList(fields);
}
{code}
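The "immutable list plus clear()" combination discussed in this thread could look roughly like the sketch below. It assumes a simplified document holding plain strings; `Doc` and `viewRejectsClear` are hypothetical names, not Lucene's `Document` API. The last method is the kind of check a unit test for this behaviour might make.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReusableDocSketch {
    /** Hypothetical minimal document; fields are plain strings here. */
    public static final class Doc {
        private final List<String> fields = new ArrayList<>();

        public void add(String field) { fields.add(field); }

        /** Read-only view backed by the live list, not a copy. */
        public List<String> getFields() { return Collections.unmodifiableList(fields); }

        /** Explicit reset so callers can reuse the instance. */
        public void clear() { fields.clear(); }
    }

    /** Returns true if mutating the document through the view is rejected. */
    public static boolean viewRejectsClear(Doc doc) {
        try {
            doc.getFields().clear();
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add("title");
        System.out.println(viewRejectsClear(doc));   // the view cannot be cleared
        doc.clear();                                 // the document itself can
        System.out.println(doc.getFields().isEmpty());
    }
}
```

With this shape, a caller that tries to reset the document through the view fails loudly instead of silently clearing a copy.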
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428583#comment-13428583 ]

Chris Male commented on LUCENE-3616:

With all the various typed XYZField implementations we have now, what do we see as the role of Field? Is it just serving as a parent class to the implementations or do we expect users will be using it too?

Illegal Field Configurations should throw exceptions
---
Key: LUCENE-3616
URL: https://issues.apache.org/jira/browse/LUCENE-3616
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
Attachments: LUCENE-3616.patch

When working on LUCENE-3615, I came across:

{quote}
java.lang.IllegalArgumentException: field field is stored but does not have binaryValue, stringValue nor numericValue
  at org.apache.lucene.index.codecs.DefaultStoredFieldsWriter.writeField(DefaultStoredFieldsWriter.java:177)
  at org.apache.lucene.index.StoredFieldsConsumer.finishDocument(StoredFieldsConsumer.java:119)
  at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:295)
  at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:255)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1480)
  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1242)
  at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1223)
  at org.apache.lucene.index.Test2BTerms.test2BTerms(Test2BTerms.java:194)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
  at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
  at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:525)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
  at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:168)
  at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:47)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
  at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:71)
  at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:199)
  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:62)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
{quote}

which is due to using TextField.TYPE_STORED when using a TokenStream. Since this is an illegal combination, we should throw an exception upon construction of the Field, not later when actually trying to do the indexing.
[jira] [Commented] (LUCENE-3616) Illegal Field Configurations should throw exceptions
[ https://issues.apache.org/jira/browse/LUCENE-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428594#comment-13428594 ]

Chris Male commented on LUCENE-3616:

bq. In my opinion if i have a ShortDocValuesField, it shouldnt have a setReader method

Agreed. The setABC() methods are extremely confusing and add another level of validation (using your example, we have to validate that you're not setting a Reader on a NumericField).

Perhaps we can re-arrange this a little. If we genuinely feel that there are use cases out there that we haven't covered with the typed impls and that we don't want to cover, then why not make a GenericField or something, which is abstract and accepts just name, FieldType and maybe an Object value. We can then emphasise in documentation that it is expert only, should only be subclassed in the extremely rare situations where our typed impls are insufficient, and won't be validated, so buyer-beware kind of thing.

We can then gut Field down to a very simple abstract class / interface, and promote our typed impls to being 1st class and the recommended entry points for users.

Of course if we feel we have provided adequate support through the typed impls, then we can skip straight to the gutting.
[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow
[ https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-4256:
---
Attachment: LUCENE-4256-version.patch

Going to do this in smaller steps so they are easier to review and be sure about. This patch moves the Version back into the args Map. Once this is committed I'll tackle the constructor stuff.

Improve Analysis Factory configuration workflow
---
Key: LUCENE-4256
URL: https://issues.apache.org/jira/browse/LUCENE-4256
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: Chris Male
Attachments: LUCENE-4256-further.patch, LUCENE-4256-version.patch, LUCENE-4256_incomplete.patch

With the Factories now available for more general use, I'd like to look at ways to improve the configuration workflow. Currently it's a little disjointed and confusing, especially around using {{inform(ResourceLoader)}}. What I think we should do is:

- Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader loader)}}
- Consider moving away from the generic args Map and using setters. This gives us better typing and could mitigate bugs due to using the wrong configuration key. However it does force the consumer to invoke each setter.
- If we're going to stick with using the args Map, then move the Version parameter into {{init}} as well, rather than being a setter as I currently made it.
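As a rough illustration of the single-step {{init(Map<String, String> args, ResourceLoader loader)}} shape proposed above, the sketch below keeps the args Map (with the version carried inside it) and rejects leftover keys to catch the "wrong configuration key" class of bug. Everything here is hypothetical: `Loader`, `StopFactory` and the key names are stand-ins chosen for the example, not the real Lucene factory or `ResourceLoader` API.

```java
import java.util.HashMap;
import java.util.Map;

public class FactoryInitSketch {
    /** Hypothetical stand-in for a resource loader; not Lucene's ResourceLoader. */
    public interface Loader {
        String load(String name);
    }

    /** Hypothetical factory configured in one step via init(args, loader). */
    public static final class StopFactory {
        public String version;
        public boolean ignoreCase;
        public String words = "";

        public void init(Map<String, String> args, Loader loader) {
            // Work on a copy so every consumed key can be removed.
            Map<String, String> remaining = new HashMap<>(args);
            version = require(remaining, "luceneMatchVersion");
            ignoreCase = Boolean.parseBoolean(remaining.remove("ignoreCase"));
            String file = remaining.remove("words");
            if (file != null) {
                words = loader.load(file); // resources resolved during init, no separate inform()
            }
            // Anything left over is a key this factory does not understand.
            if (!remaining.isEmpty()) {
                throw new IllegalArgumentException("Unknown parameters: " + remaining.keySet());
            }
        }

        private static String require(Map<String, String> args, String key) {
            String value = args.remove(key);
            if (value == null) {
                throw new IllegalArgumentException("Missing required parameter: " + key);
            }
            return value;
        }
    }
}
```

The setter-based alternative would trade the leftover-key check for compile-time typing, at the cost of the consumer having to call each setter.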
Re: Output class folders (Eclipse).
Couldn't we classpath scan ourselves and detect services rather than generating the file?

On Wed, Aug 1, 2012 at 10:13 PM, Uwe Schindler u...@thetaphi.de wrote:

That's Kohsuke's annotation processor: http://weblogs.java.net/blog/kohsuke/archive/2009/03/my_project_of_t.html

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, August 01, 2012 12:07 PM
To: dev@lucene.apache.org
Subject: RE: Output class folders (Eclipse).

That is unfortunately needed because of SPI. The problem is: if Eclipse copies all files to one folder, then all META-INF/services/ files that are in more than one module will overwrite each other. This is only solvable maybe at a later stage, when we will create META-INF files using a javac annotation processor (in that case, Eclipse would create one merged META-INF file for all modules).

I have not yet opened an issue, but the SPI file creation for analyzers is error prone (it's easy to miss a new factory), so the idea is to automate this. Either:

- Each codec or analyzer factory gets a compile-only annotation and a javac annotation processor will put the class name into META-INF (that would solve the Eclipse issue). There are 2 packages available that can do this (itself loaded by SPI into javac/eclipse, haha): one from Jenkins' Kohsuke and another one. The downside is: you have to mark each Codec/PostingsFormat/Analysis Factory with an annotation: @Provider or similar.
- The 2nd option: use ASM to find all classes in the output folder that extend a base class (e.g., Codec) or interface and generate META-INF/services/oal.Codec files. Downside: would not work with Eclipse at all, as it must be run by ANT.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent: Wednesday, August 01, 2012 11:56 AM
To: dev@lucene.apache.org
Subject: Output class folders (Eclipse).

The template file separates output folders for class files into many bin.* folders at the root level:

output=bin.analysis-kuromoji

Is this intentional? It's annoying, I'd rather move it under one folder or even put it into a single folder (since at Eclipse level there's no distinction of modules anyway).

Dawid

--
Chris Male | Open Source Search Developer | elasticsearch | www.elasticsearch.com <http://www.dutchworks.nl>
[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425627#comment-13425627 ]

Chris Male commented on LUCENE-4271:

Is the intention to mandate the !var syntax too? That seems like a pretty Solr specific thing (being able to delegate the parsing to a QParser) but I can imagine someone just wanting a map of values, e.g.

{code}
{arg_1=val_1 arg_2=val_2} word
{code}

Solr LocalParams for Lucene Query Parser
---
Key: LUCENE-4271
URL: https://issues.apache.org/jira/browse/LUCENE-4271
Project: Lucene - Core
Issue Type: New Feature
Reporter: Yonik Seeley

The Lucene QueryParser should implement Solr's LocalParams syntax directly so that instead of

{code}
_query_:{!geodist d=10 p=20.5,30.2}
{code}

one could directly use

{code}
{!geodist d=10 p=20.5,30.2}
{code}

references: http://wiki.apache.org/solr/LocalParams
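To make the "just a map of values" reading concrete, here is a minimal sketch that splits a leading `{key=value ...}` block from the rest of the query text. It is an assumption-laden toy: it handles only whitespace-separated, unquoted pairs, ignores the `!type` form, and is neither Solr's LocalParams parser nor anything in the Lucene QueryParser. `LocalParamsSketch`, `Parsed` and `parse` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalParamsSketch {
    /** Result holder: the parameter map plus the remaining query text. */
    public static final class Parsed {
        public final Map<String, String> params;
        public final String query;

        Parsed(Map<String, String> params, String query) {
            this.params = params;
            this.query = query;
        }
    }

    /** Parses an optional leading "{k=v k=v}" block; no quoting or escaping is supported. */
    public static Parsed parse(String input) {
        Map<String, String> params = new LinkedHashMap<>();
        String s = input.trim();
        if (!s.startsWith("{")) {
            return new Parsed(params, s); // no params block at all
        }
        int close = s.indexOf('}');
        if (close < 0) {
            throw new IllegalArgumentException("Unclosed '{' in: " + input);
        }
        String inner = s.substring(1, close).trim();
        if (!inner.isEmpty()) {
            for (String pair : inner.split("\\s+")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Expected key=value but got: " + pair);
                }
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new Parsed(params, s.substring(close + 1).trim());
    }
}
```

For the example in the comment, `parse("{arg_1=val_1 arg_2=val_2} word")` yields a two-entry map and the query text `word`.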
[jira] [Commented] (LUCENE-4271) Solr LocalParams for Lucene Query Parser
[ https://issues.apache.org/jira/browse/LUCENE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424903#comment-13424903 ]

Chris Male commented on LUCENE-4271:

Will a query have a single set of params, or will each clause potentially have its own?
[jira] [Commented] (LUCENE-4268) Rename ResourceAsStreamReasourceLoader to ClasspathResourceLoader, supply simple FilesystemResourceLoader
[ https://issues.apache.org/jira/browse/LUCENE-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424322#comment-13424322 ]

Chris Male commented on LUCENE-4268:

+1

Rename ResourceAsStreamReasourceLoader to ClasspathResourceLoader, supply simple FilesystemResourceLoader
---
Key: LUCENE-4268
URL: https://issues.apache.org/jira/browse/LUCENE-4268
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 4.0, 5.0
Attachments: LUCENE-4268.patch, LUCENE-4268.patch

We should rename the class and also fix some bugs:

- Class/ClassLoader.getResourceAsStream() returns null when the resource is not found (which is a Java bug in my opinion) and does not throw IOException. SolrResourceLoader throws IOException, the Lucene example one should do the same. This prevents NPEs everywhere.

Improvements:

- Add a no-arg CTOR that uses the context class loader instead of a given class. This is more what users want. Resource names must then include the package name, of course.

We should also provide a second implementation that allows resource names to be full filesystem paths. I think for loading resources like a custom word list, this is the most wanted implementation. Loading of classes would be delegated to ClassLoader (of course).

I don't like ResourceLoader also supplying newInstance(), can we remove this for analysis?
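The null-versus-IOException point in the description can be sketched as follows. `ClassLoader.getResourceAsStream` is the real JDK call and does return null for a missing resource; the wrapper class, its name and its `openResource` method are assumptions made for this example, not the class from the attached patch.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClasspathLoaderSketch {
    private final ClassLoader loader;

    /** No-arg constructor: use the context class loader, falling back to the system one. */
    public ClasspathLoaderSketch() {
        this(Thread.currentThread().getContextClassLoader());
    }

    public ClasspathLoaderSketch(ClassLoader loader) {
        this.loader = (loader != null) ? loader : ClassLoader.getSystemClassLoader();
    }

    /** Opens a classpath resource, converting the JDK's null result into an IOException. */
    public InputStream openResource(String resource) throws IOException {
        InputStream in = loader.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found: " + resource);
        }
        return in;
    }
}
```

Callers then handle one checked exception at the point of loading rather than a NullPointerException somewhere downstream.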
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13424358#comment-13424358 ] Chris Male commented on LUCENE-4208: Having thought about this more, I think the best way forward is to just emulate free-text queries and have a {{SpatialSimilarity}} abstraction. I'm not sure of the exact nature of the API for this, but I think there are times when using 1/x is sufficient and there are probably times when a more convoluted algorithm fits. We should allow the consumer to control what they choose. I think the Similarity should be given the Query Shape, the matched docID and the current SpatialOperation as a minimum. I'd like to somehow see a way to also pass in a pre-computed distance (for Queries that compute it as part of their matching) and possibly the matched grid hash for anything using the PrefixTrees. We might have to have subclasses for those, or maybe a Command or something, I'm not sure. Other benefits: - We immediately open up the ability to have more complex similarity scores based on overlap percentage or anything really. - It is plausible that a SpatialSimilarity might use a cache of indexed Shapes to facilitate more complex algorithms. By having this abstraction we offload the caching from the main API. - It is also plausible that a SpatialSimilarity instance could be misused to cache calculated distances if the consumer so wanted. I think we should consider whether we want SpatialSimilarities to also be given the current IndexReader (and so be able to use it in any caches or other lookups) or whether we want them to be IR independent. We will also need some custom Queries to actually make use of the SpatialSimilarity. Need to think this one through a little. 
Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
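The two scoring alternatives named in the issue description can be written out as a small sketch. The class DistanceScoreSketch and its method names are hypothetical and do not come from the spatial module; the constant c stands in for the "different constant than 1" that the description mentions, and units are left to the caller.

```java
/** Hypothetical helper; none of these names come from the Lucene spatial module. */
final class DistanceScoreSketch {
    private DistanceScoreSketch() {}

    /** score = c / distance, so closer documents score higher. A distance of 0 yields Infinity. */
    static double reciprocal(double c, double distance) {
        return c / distance;
    }

    /** Inverts reciprocal(): recovering the distance requires knowing the same constant c. */
    static double distanceFromReciprocal(double c, double score) {
        return c / score;
    }

    /** score = maxDistance - distance, for example with maxDistance = earthCircumference / 2. */
    static double linear(double maxDistance, double distance) {
        return maxDistance - distance;
    }
}
```

With either form the raw distance is no longer the score itself, so a consumer who wants the distance back has to apply the inverse with the same constant.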
[jira] [Commented] (LUCENE-4260) Factor subPackages out of resourceloader interface
[ https://issues.apache.org/jira/browse/LUCENE-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423811#comment-13423811 ] Chris Male commented on LUCENE-4260: +1 Factor subPackages out of resourceloader interface Key: LUCENE-4260 URL: https://issues.apache.org/jira/browse/LUCENE-4260 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4260.patch From Uwe on LUCENE-4257: The comment about the subpackages: This should in reality not be in ResourceLoader, it's too Solr-specific. It is used internally by Solr, to resolve those solr. fake packages depending on the context. We should remove that from the general interface and only handle it internally in SolrResourceLoader.
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423851#comment-13423851 ] Chris Male commented on LUCENE-4208: I'm considered this obfuscates the actual distance too much, making it difficult to retrieve x again. It's not impossible but suddenly anybody wanting to retrieve the actual distance must calculate c again. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423851#comment-13423851 ] Chris Male edited comment on LUCENE-4208 at 7/27/12 1:02 PM: - I'm concerned this obfuscates the actual distance too much, making it difficult to retrieve x again. It's not impossible but suddenly anybody wanting to retrieve the actual distance must calculate c again. was (Author: cmale): I'm considered this obfuscates the actual distance too much, making it difficult to retrieve x again. It's not impossible but suddenly anybody wanting to retrieve the actual distance must calculate c again. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall might not do anything which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4257) factor the getLines in ResourceLoader to WordListLoader
[ https://issues.apache.org/jira/browse/LUCENE-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422869#comment-13422869 ] Chris Male commented on LUCENE-4257: Thanks for getting to this Robert, it's a good improvement. factor the getLines in ResourceLoader to WordListLoader Key: LUCENE-4257 URL: https://issues.apache.org/jira/browse/LUCENE-4257 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-4257.patch This is costly to have as a mandatory method on an interface, it's unrelated to resource loading, and only the factories use it.
[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422878#comment-13422878 ] Chris Male commented on LUCENE-4173: I thought you were anti-degradation at indexing and querying? Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4256) Improve Analysis Factory configuration workflow
[ https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Male updated LUCENE-4256: Attachment: LUCENE-4256-further.patch I took Robert's patch and extended it further, fixing the tests and adding some preliminary support for adding params through ResourceLoader.newInstance. Improve Analysis Factory configuration workflow Key: LUCENE-4256 URL: https://issues.apache.org/jira/browse/LUCENE-4256 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Chris Male Attachments: LUCENE-4256-further.patch, LUCENE-4256_incomplete.patch With the Factorys now available for more general use, I'd like to look at ways to improve the configuration workflow. Currently it's a little disjoint and confusing, especially around using {{inform(ResourceLoader)}}. What I think we should do is: - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader loader)}} - Consider moving away from the generic args Map and using setters. This gives us better typing and could mitigate bugs due to using the wrong configure key. However it does force the consumer to invoke each setter. - If we're going to stick with using the args Map, then move the Version parameter into {{init}} as well, rather than being a setter as I currently made it.
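The first proposal in the issue, a single init call that receives both the args Map and the ResourceLoader, could look roughly like the sketch below. Everything here is hypothetical: ResourceLoaderSketch and StopWordsFactorySketch are stand-ins invented for illustration, not the Lucene ResourceLoader or any real factory, and the argument keys "words" and "ignoreCase" are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a resource loader; not the Lucene interface. */
interface ResourceLoaderSketch {
    InputStream openResource(String name) throws IOException;
}

/** Hypothetical factory configured in one step: init(Map<String, String>, loader). */
final class StopWordsFactorySketch {
    private final Set<String> words = new HashSet<>();
    private boolean ignoreCase;

    /** Reads the args and loads the word list immediately; no separate inform() step. */
    void init(Map<String, String> args, ResourceLoaderSketch loader) throws IOException {
        ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));
        String file = args.get("words");
        if (file == null) {
            throw new IllegalArgumentException("Missing required argument: words");
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(loader.openResource(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
                }
            }
        }
    }

    boolean isStopWord(String token) {
        return words.contains(ignoreCase ? token.toLowerCase(Locale.ROOT) : token);
    }
}
```

The string-keyed Map keeps the factory easy to drive from configuration files, at the cost the issue points out: a misspelled key such as "ignorecase" would be silently ignored, which typed setters would catch at compile time.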
[jira] [Commented] (LUCENE-4256) Improve Analysis Factory configuration workflow
[ https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422941#comment-13422941 ] Chris Male commented on LUCENE-4256: Thanks Robert, I'll take your prototype and roll with it. Improve Analysis Factory configuration workflow Key: LUCENE-4256 URL: https://issues.apache.org/jira/browse/LUCENE-4256 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Chris Male Attachments: LUCENE-4256_incomplete.patch With the Factorys now available for more general use, I'd like to look at ways to improve the configuration workflow. Currently it's a little disjoint and confusing, especially around using {{inform(ResourceLoader)}}. What I think we should do is: - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader loader)}} - Consider moving away from the generic args Map and using setters. This gives us better typing and could mitigate bugs due to using the wrong configure key. However it does force the consumer to invoke each setter. - If we're going to stick with using the args Map, then move the Version parameter into {{init}} as well, rather than being a setter as I currently made it.
[jira] [Commented] (LUCENE-4256) Improve Analysis Factory configuration workflow
[ https://issues.apache.org/jira/browse/LUCENE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422876#comment-13422876 ] Chris Male commented on LUCENE-4256: {quote} I think of it as just a way of taking String/String args mostly though. The other stuff is actually already supported by Analyzer easily: its just that you have to write code to make use of it since its strongly typed. {quote} Yeah, good point. I guess I was overthinking the purpose of the Factorys. bq. Maybe we could start with this? It should be a relatively rote change. Do you think I should put the Version back as a String in the args map, or leave it typed? Improve Analysis Factory configuration workflow Key: LUCENE-4256 URL: https://issues.apache.org/jira/browse/LUCENE-4256 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Reporter: Chris Male With the Factorys now available for more general use, I'd like to look at ways to improve the configuration workflow. Currently it's a little disjoint and confusing, especially around using {{inform(ResourceLoader)}}. What I think we should do is: - Remove the need for {{ResourceLoaderAware}} and pass in the ResourceLoader in {{init}}, so it'd become {{init(Map<String, String> args, ResourceLoader loader)}} - Consider moving away from the generic args Map and using setters. This gives us better typing and could mitigate bugs due to using the wrong configure key. However it does force the consumer to invoke each setter. - If we're going to stick with using the args Map, then move the Version parameter into {{init}} as well, rather than being a setter as I currently made it.
[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423073#comment-13423073 ] Chris Male commented on LUCENE-4173: I do like it yeah. I think it improves 'simple' Strategies like TwoDoubles. I'm not sure we need to define it per query and actually I don't think it needs to be on the Strategy interface. Instead I think we should have it in the constructors of the appropriate Strategys. That way the consumer is forced to decide how they want to proceed at instantiation. Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423083#comment-13423083 ] Chris Male commented on LUCENE-4173: Why don't we use an enum instead of a boolean? InvalidShape.COALESCE and InvalidShape.FAIL. Remove IgnoreIncompatibleGeometry for SpatialStrategys Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: David Smiley Attachments: LUCENE-4173.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed.
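The enum suggestion, combined with the earlier idea of fixing the choice in the Strategy constructor, might be sketched as below. Only the names InvalidShape.COALESCE and InvalidShape.FAIL come from the thread. The class PointStrategySketch, the bounding-box parameters, and the reading of COALESCE as "reduce an unsupported shape to its center point" are assumptions made purely for illustration.

```java
import java.util.Objects;

/** Enum named in the comment above; the meaning given to COALESCE here is an assumption. */
enum InvalidShape { COALESCE, FAIL }

/** Hypothetical point-only strategy; not a class from the spatial module. */
final class PointStrategySketch {
    private final InvalidShape onInvalid;

    /** The consumer must decide at instantiation how unsupported shapes are handled. */
    PointStrategySketch(InvalidShape onInvalid) {
        this.onInvalid = Objects.requireNonNull(onInvalid, "onInvalid");
    }

    /** Returns the {x, y} pair to index for a shape given as a bounding box. */
    double[] pointToIndex(double minX, double minY, double maxX, double maxY) {
        boolean isPoint = minX == maxX && minY == maxY;
        if (isPoint) {
            return new double[] {minX, minY};
        }
        switch (onInvalid) {
            case COALESCE:
                // Assumed behaviour: degrade the shape to its center point.
                return new double[] {(minX + maxX) / 2.0, (minY + maxY) / 2.0};
            case FAIL:
            default:
                // Never silently skip the shape; surface the problem to the caller.
                throw new UnsupportedOperationException("Only points can be indexed by this strategy");
        }
    }
}
```

Compared with a boolean flag, the enum makes call sites self-describing (new PointStrategySketch(InvalidShape.FAIL)) and leaves room for further policies later.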
[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422049#comment-13422049 ] Chris Male commented on LUCENE-4044: Shall we merge and then address automation? Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory Key: LUCENE-4044 URL: https://issues.apache.org/jira/browse/LUCENE-4044 Project: Lucene - Core Issue Type: Sub-task Components: modules/analysis Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-4044.patch In LUCENE-2510 I want to move all the analysis factories out of Solr and into the directories with what they create. This is going to hamper Solr's existing strategy for supporting {{solr.*}} package names, where it replaces {{solr}} with various pre-defined package names. One way to tackle this is to use NamedSPILoader so we simply look up {{StandardTokenizerFactory}} for example, and find it wherever it is, as long as it is defined as a service. This is similar to how we support Codecs currently. As noted by Robert in LUCENE-2510, this would also have the benefit of meaning configurations could be less verbose, would aid in fully decoupling the analysis module from Solr, and make the analysis factories easier to interact with. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422121#comment-13422121 ] Chris Male commented on LUCENE-4044: Thanks Uwe, patches look good, +1 to committing Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory Key: LUCENE-4044 URL: https://issues.apache.org/jira/browse/LUCENE-4044 Project: Lucene - Core Issue Type: Sub-task Components: modules/analysis Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-4044-stripped.patch, LUCENE-4044.patch, LUCENE-4044.patch In LUCENE-2510 I want to move all the analysis factories out of Solr and into the directories with what they create. This is going to hamper Solr's existing strategy for supporting {{solr.*}} package names, where it replaces {{solr}} with various pre-defined package names. One way to tackle this is to use NamedSPILoader so we simply look up {{StandardTokenizerFactory}} for example, and find it wherever it is, as long as it is defined as a service. This is similar to how we support Codecs currently. As noted by Robert in LUCENE-2510, this would also have the benefit of meaning configurations could be less verbose, would aid in fully decoupling the analysis module from Solr, and make the analysis factories easier to interact with. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
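The name-based lookup described in the issue can be approximated with the JDK's java.util.ServiceLoader. This is a sketch only: NamedFactorySketch and NamedLookupSketch are hypothetical names and this is not the NamedSPILoader implementation. It assumes each provider reports its own lookup name and is registered as a service in the usual META-INF/services way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical contract: every provider knows the name it is looked up by. */
interface NamedFactorySketch {
    String name();
}

/** Hypothetical lookup table built from any Iterable of providers. */
final class NamedLookupSketch<T extends NamedFactorySketch> {
    private final Map<String, T> byName = new LinkedHashMap<>();

    NamedLookupSketch(Iterable<T> providers) {
        for (T provider : providers) {
            // The first provider registered under a name wins.
            byName.putIfAbsent(provider.name(), provider);
        }
    }

    /** Discovers providers registered as services on the classpath. */
    static <T extends NamedFactorySketch> NamedLookupSketch<T> fromServices(Class<T> type) {
        return new NamedLookupSketch<>(ServiceLoader.load(type));
    }

    /** Finds a provider by name, wherever its class happens to live. */
    T lookup(String name) {
        T provider = byName.get(name);
        if (provider == null) {
            throw new IllegalArgumentException(
                "No provider named '" + name + "'; available: " + byName.keySet());
        }
        return provider;
    }
}
```

Since the lookup key is a plain name rather than a package-qualified class name, configurations no longer depend on where a factory lives, which is what makes moving the factories out of Solr feasible without the {{solr.*}} package rewriting.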