[jira] [Updated] (SOLR-3376) SolrCloud: Specifying shardId not working correctly, although the failures are inconsistent.
[ https://issues.apache.org/jira/browse/SOLR-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-3376:
Description:

I'm seeing some odd results when specifying the shardId parameter. I'm trying the 4-node, 2-shard example from the Wiki and specifying shardIds like this:

{{{
dir       shardId  start order  running ZK  port
example   1        1            y           8983
example2  2        2            y           7574
example3  1        3            y           8900
example4  2        4            y           7500
}}}

I'm waiting a bit between starting the various examples to let ZK settle down. Once all of them were started, I looked at http://localhost:8983/solr/#/~cloud?view=graph to check out what that looks like (pretty cool IMO, especially since I didn't have to do it).

The problem was that shard 2 only reported a single instance, while shard1 showed the two instances I was expecting. I'm running with 3 embedded ZK instances, just for yucks. Interestingly, the node that didn't show up was the only node that was NOT running ZK.

When I removed all the shardId parameters, nuked zoo_data from all directories and just started them up (with numShards=2 on the bootstrap ZK node), all 4 nodes showed up just fine.

When starting with shardId specified and trying to go straight to the admin interface on the node that wasn't showing up, I'd get odd errors like "This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml:". I also couldn't search directly on that machine; http://localhost:7574/solr/select?q=*:* returns a 404 error.

Command starting the server that's giving me trouble:
java -Xmx1G -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DshardId=2 -jar start.jar

Command for one that works fine:
java -Xmx1G -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DshardId=1 -jar start.jar

Sami Siren reports similar issues via e-mail conversation. Sami says that ZK 3.3.5 apparently (without exhaustive tests) fixed the problem for him, but when I tried ZK 3.3.5 I saw the same issue. Of course, with all the recent stuff with Ivy, I may have screwed up when/where the JARs were.

So then I went back to ZK 3.3.4 and couldn't reproduce the problem, which seems highly suspicious to me. It was failing every time before with 3.3.4, so it sounds like gremlins. And then I tried ZK 3.3.5 again (changed the ivy.xml in solrj, blew away the ZK 3.3.4, rebuilt, removed zoo_data, recopied example to three other directories) and it works fine there too now. Sh.

Mostly this is a placeholder to ensure we try this. I guarantee that sys admins will want to assign specific machines to specific shards, so this'll get used.
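For anyone trying to reproduce this, the setup can be written down as a small script. The following is only a sketch in Python: the helper names (`start_command`, `expected_layout`) are made up for illustration, and the rows and flags are copied from the table and the two command lines quoted in the report. It rebuilds the start commands and shows which ports the cloud graph would be expected to list under each shard.

```python
# ZooKeeper ensemble string, copied from the commands in the report.
ZK_HOSTS = "localhost:9983,localhost:8574,localhost:9900"

# (directory, shardId, Solr port) rows as given in the report's table.
NODES = [
    ("example", 1, 8983),
    ("example2", 2, 7574),
    ("example3", 1, 8900),
    ("example4", 2, 7500),
]


def start_command(port, shard_id, run_zk):
    """Build a start command in the same shape as the ones quoted in the report."""
    parts = ["java", "-Xmx1G", f"-Djetty.port={port}"]
    if run_zk:
        # Only nodes hosting an embedded ZooKeeper pass -DzkRun.
        parts.append("-DzkRun")
    parts.append(f"-DzkHost={ZK_HOSTS}")
    parts.append(f"-DshardId={shard_id}")
    parts += ["-jar", "start.jar"]
    return " ".join(parts)


def expected_layout(nodes):
    """Group Solr ports by shardId: what the cloud graph should show."""
    layout = {}
    for _directory, shard_id, port in nodes:
        layout.setdefault(shard_id, []).append(port)
    return layout


if __name__ == "__main__":
    print(start_command(7500, 2, run_zk=False))  # the node giving trouble
    print(start_command(8900, 1, run_zk=True))   # a node that works
    print(expected_layout(NODES))
```

With the table above, each shard should list two ports; the symptom in the report is that shard 2 listed only one.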
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2242:
Attachment: SOLR-2242-3x.patch

This patch applies against the 3.x code line. Bill, you might want to check it; I had to do some merging by hand.

Get distinct count of names for a facet field
Key: SOLR-2242
URL: https://issues.apache.org/jira/browse/SOLR-2242
Project: Solr
Issue Type: New Feature
Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch

When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example:

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price

This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int>
    <int name="11.5">1</int>
    <int name="19.95">1</int>
    <int name="74.99">1</int>
    <int name="92.0">1</int>
    <int name="179.99">1</int>
    <int name="185.0">1</int>
    <int name="279.95">1</int>
    <int name="329.95">1</int>
    <int name="350.0">1</int>
    <int name="399.0">1</int>
    <int name="479.95">1</int>
    <int name="649.99">1</int>
    <int name="2199.0">1</int>
  </lst>
</lst>
{code}

Several people use this to get the group.field count (the # of groups).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
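To see what numFacetTerms reports, the count can be checked on the client side. Below is a minimal Python sketch, assuming the facet_fields fragment has the shape shown in the {code} block of this issue; the function names are hypothetical and nothing here is part of the patch itself. It counts the distinct terms and compares them with the reported numFacetTerms entry.

```python
import xml.etree.ElementTree as ET

# Facet fragment copied from the example response in the issue description.
FACET_XML = """
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int>
    <int name="11.5">1</int>
    <int name="19.95">1</int>
    <int name="74.99">1</int>
    <int name="92.0">1</int>
    <int name="179.99">1</int>
    <int name="185.0">1</int>
    <int name="279.95">1</int>
    <int name="329.95">1</int>
    <int name="350.0">1</int>
    <int name="399.0">1</int>
    <int name="479.95">1</int>
    <int name="649.99">1</int>
    <int name="2199.0">1</int>
  </lst>
</lst>
"""


def _field_element(facet_xml, field):
    """Find the <lst> element for one facet field (hypothetical helper)."""
    root = ET.fromstring(facet_xml)
    for lst in root.iter("lst"):
        if lst.get("name") == field:
            return lst
    raise KeyError(field)


def reported_term_count(facet_xml, field):
    """Value of the numFacetTerms entry as returned in the response."""
    for entry in _field_element(facet_xml, field).findall("int"):
        if entry.get("name") == "numFacetTerms":
            return int(entry.text)
    raise KeyError("numFacetTerms")


def counted_terms(facet_xml, field):
    """Number of distinct facet values, counted from the listed terms."""
    entries = _field_element(facet_xml, field).findall("int")
    return sum(1 for e in entries if e.get("name") != "numFacetTerms")


if __name__ == "__main__":
    print(reported_term_count(FACET_XML, "price"), counted_terms(FACET_XML, "price"))
```

Counting the listed terms only agrees with the reported value when the full list is returned, which is why the description recommends limit=-1 and mincount=1.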
[jira] [Updated] (SOLR-3277) Dynamic fields do not respect concrete fields that happen to match a pattern.
[ https://issues.apache.org/jira/browse/SOLR-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-3277:
Description:

Here's a fragment of a schema file:

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title_text" type="text_general" indexed="true" stored="true" multiValued="false" />
    <field name="title_phonetic" type="phonetic" indexed="true" stored="true" multiValued="false" />
    <dynamicField name="*_text" type="text_general" indexed="true" stored="false" />
    <dynamicField name="*_phonetic" type="phonetic" indexed="true" stored="false" />
  </fields>
  <copyField source="*_text" dest="*_phonetic" />

Here's an input doc:

  <add>
    <doc>
      <field name="id">ID1</field>
      <field name="title_text">1st Document</field>
      <field name="description_text">Another field</field>
    </doc>
  </add>

OK, add the doc with the above schema, and do a q=*:*&fl=*. The response does NOT contain title_phonetic. It looks like IndexSchema.registerCopyField won't notice that title_phonetic is a non-dynamic field and make a title_text -> title_phonetic mapping.

Dynamic fields do not respect concrete fields that happen to match a pattern.
Key: SOLR-3277
URL: https://issues.apache.org/jira/browse/SOLR-3277
Project: Solr
Issue Type: Bug
Affects Versions: 3.6, 4.0
Reporter: Erick Erickson
Priority: Minor
Fix For: 4.0
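The behavior the report expects can be sketched outside Solr. The Python below is an illustration only, not the IndexSchema code: the function names and the dictionaries are made up, and the field definitions are reduced to the stored flag from the schema fragment in this issue. It resolves a wildcard copyField to a concrete destination name and then looks that name up with explicit fields taking precedence over dynamic patterns.

```python
# Explicit and dynamic field declarations, reduced to the "stored" attribute
# from the schema fragment in this issue.
EXPLICIT_FIELDS = {"id": True, "title_text": True, "title_phonetic": True}
DYNAMIC_FIELDS = {"*_text": False, "*_phonetic": False}
COPY_FIELDS = [("*_text", "*_phonetic")]


def copy_targets(field_name, copy_fields):
    """Destination names for one concrete source field.

    Handles only exact names and a single leading '*', which is all the
    schema fragment uses.
    """
    targets = []
    for source, dest in copy_fields:
        if source == field_name:
            targets.append(dest)
        elif source.startswith("*") and field_name.endswith(source[1:]):
            suffix = source[1:]
            stem = field_name[: len(field_name) - len(suffix)]
            targets.append(dest.replace("*", stem, 1))
    return targets


def is_stored(field_name, explicit, dynamic):
    """Look up 'stored', letting an explicit declaration win over a pattern."""
    if field_name in explicit:
        return explicit[field_name]
    for pattern, stored in dynamic.items():
        if pattern.startswith("*") and field_name.endswith(pattern[1:]):
            return stored
    raise KeyError(field_name)


if __name__ == "__main__":
    for source in ("title_text", "description_text"):
        for target in copy_targets(source, COPY_FIELDS):
            print(source, "->", target, "stored:",
                  is_stored(target, EXPLICIT_FIELDS, DYNAMIC_FIELDS))
```

Under this reading, title_phonetic resolves to the explicit stored field and so should come back with fl=*, while description_phonetic falls through to the unstored dynamic field.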
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2921:
Attachment: SOLR-2921-trunk.patch
            SOLR-2921-3x.patch

3x r: 1303937
Trunk r: 1303939

Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
Key: SOLR-2921
URL: https://issues.apache.org/jira/browse/SOLR-2921
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.6, 4.0
Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-3x.patch, SOLR-2921-trunk.patch

SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc.

Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent:
* ASCIIFoldingFilterFactory
* LowerCaseFilterFactory
* LowerCaseTokenizerFactory
* MappingCharFilterFactory
* PersianCharFilterFactory

When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, "why didn't my wildcard query automatically lower-case (or accent fold or) my terms?" will be gone. Die question die!

But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers.

Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case.

Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know.

ArabicNormalizationFilterFactory
GreekLowerCaseFilterFactory
HindiNormalizationFilterFactory
ICUFoldingFilterFactory
ICUNormalizer2FilterFactory
ICUTransformFilterFactory
IndicNormalizationFilterFactory
ISOLatin1AccentFilterFactory
PersianNormalizationFilterFactory
RussianLowerCaseFilterFactory
TurkishLowerCaseFilterFactory
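The wildcard question mentioned in the description can be shown with a toy example. This is a Python sketch under loud assumptions: `index_terms` and `wildcard_search` are invented stand-ins, whitespace splitting plus lower-casing stands in for an index-time analyzer, and `fnmatch` stands in for wildcard matching. None of it is Solr's analysis chain; it only illustrates why the query term needs the same normalization as the indexed terms.

```python
import fnmatch


def index_terms(text):
    """Toy index-time analysis: split on whitespace, then lower-case."""
    return [token.lower() for token in text.split()]


def wildcard_search(pattern, terms, normalize=None):
    """Match a wildcard pattern against indexed terms, case-sensitively.

    `normalize` plays the role of the multi-term analyzer: if given, it is
    applied to the raw query pattern before matching.
    """
    if normalize is not None:
        pattern = normalize(pattern)
    return [term for term in terms if fnmatch.fnmatchcase(term, pattern)]


if __name__ == "__main__":
    terms = index_terms("Hello World")
    # Without query-time normalization the capitalized pattern misses.
    print(wildcard_search("Hel*", terms))
    # With lower-casing applied to the query term, it matches.
    print(wildcard_search("Hel*", terms, normalize=str.lower))
```

A filter that changes terms at index time but is skipped for wildcard terms produces exactly the first, empty result.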
[jira] [Updated] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
[ https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-3265:
Affects Version/s: (was: 4.0)

TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
Key: SOLR-3265
URL: https://issues.apache.org/jira/browse/SOLR-3265
Project: Solr
Issue Type: Test
Affects Versions: 3.6
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.6
Attachments: SOLR-3265.patch

When running ant test from the command line in 3.x, if you have a Solr server running then TestSolrEntityProcessorEndToEnd fails since it uses the default port (stack trace with "address already in use"). This should use some other port, especially as 3.x ant test is taking 50+ minutes and I often open up a server to look at something else.

In 4.0, some of the cloud tests also use 8983 as a port. Should these be changed too? And just to make my life *especially* interesting, at least one test puts the string 8983 in a document, which doesn't have to be changed <G>...

Of course you can start your local server on a different port, but this seems trappy.
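One common way to avoid hard-coding 8983 in a test is to let the operating system pick an unused port. The sketch below shows the general technique in Python; it is not what the attached patch does (the test itself is Java), and `find_free_port` is a hypothetical helper name.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # The kernel has assigned a concrete port; read it back.
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

There is a small window between closing the probe socket and the test server binding the port, so another process could grab it first. Having the server itself bind to port 0 and report the port it got avoids that window.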
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2921:
Attachment: SOLR-2921-3x.patch

Here's a first cut at these. The tests in TestFoldingMultitermExtrasQuery are especially weak; any help here would be extremely welcome. Basically, I stole the patterns from the associated filters and removed the ones that failed for reasons I didn't understand. And I haven't checked the remaining ones all that carefully; I have some stuff coming up for most of the rest of today and wanted to get the first cut out in front of people.

The attached patch applies against 3x. I'll need to tweak it for trunk but won't bother until after we finalize this. I also haven't run the full test suite, so this patch should NOT be committed yet.

I'm not even going to try the following; I don't even know what to expect as proper results. If nobody steps up I'll split these out into another JIRA and hopefully someone with the appropriate knowledge (and keyboard) can volunteer:

ArabicNormalizationFilterFactory
HindiNormalizationFilterFactory
IndicNormalizationFilterFactory
PersianNormalizationFilterFactory
ICUTransformFilterFactory

Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
Key: SOLR-2921
URL: https://issues.apache.org/jira/browse/SOLR-2921
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-2921-3x.patch
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2921:
Attachment: SOLR-2921-3x.patch

Fixes test cases in analysis-extras so they run from the command line, not only in IntelliJ.

Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
Key: SOLR-2921
URL: https://issues.apache.org/jira/browse/SOLR-2921
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch
[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-445:
Fix Version/s: (was: 3.6)

Update Handlers abort with bad documents
Key: SOLR-445
URL: https://issues.apache.org/jira/browse/SOLR-445
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
Fix For: 4.0
Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml

Has anyone run into the problem of handling bad documents / failures mid batch? I.e.:

  <add>
    <doc>
      <field name="id">1</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="myDateField">I_AM_A_BAD_DATE</field>
    </doc>
    <doc>
      <field name="id">3</field>
    </doc>
  </add>

Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API.

I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
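The two options in the description can be compared with a toy model. The Python below is only a sketch: the in-memory list stands in for the index, `validate` is a made-up check that assumes an ISO-style date format for myDateField, and none of the function names come from Solr or the attached patches.

```python
from datetime import datetime

# The batch from the issue description, as plain dictionaries.
BATCH = [
    {"id": "1"},
    {"id": "2", "myDateField": "I_AM_A_BAD_DATE"},
    {"id": "3"},
]


def validate(doc):
    """Raise ValueError if the (assumed) date field does not parse."""
    if "myDateField" in doc:
        datetime.strptime(doc["myDateField"], "%Y-%m-%dT%H:%M:%SZ")


def add_all_or_nothing(docs, index):
    """Option 1: check the whole batch first, so a bad doc adds nothing."""
    for doc in docs:
        validate(doc)  # raises before anything is written
    index.extend(docs)


def add_and_report(docs, index):
    """Option 2: add what is valid and return (id, error) for the rest."""
    errors = []
    for doc in docs:
        try:
            validate(doc)
        except ValueError as exc:
            errors.append((doc["id"], str(exc)))
        else:
            index.append(doc)
    return errors


if __name__ == "__main__":
    index = []
    errors = add_and_report(BATCH, index)
    print([doc["id"] for doc in index], [doc_id for doc_id, _ in errors])
```

The sketch reflects the trade-off named in the description: the first option has to hold and check the whole batch before writing anything, while the second needs a way to return per-document errors to the caller.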
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2921: - Affects Version/s: (was: 3.6) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. 
If all you can do is provide test cases, I could probably do the code part, just let me know. ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
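To make the motivation concrete: a wildcard term such as Caf\u00e9* never passes through the full query analyzer, so it has to be lower-cased and accent-folded separately if it is to match indexed terms. The sketch below shows that per-term transformation in plain Java using java.text.Normalizer. It is only an illustration of the idea; it does not use MultiTermAwareComponent or any Solr class, and MultiTermNormalizer is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of the per-term normalization a multi-term analyzer
// would apply to wildcard, prefix, or range terms at query time.
final class MultiTermNormalizer {

    /** Folds accents and lower-cases a single term, leaving * and ? untouched. */
    static String normalize(String term) {
        // Decompose accented characters into base character + combining mark...
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // ...then drop the combining marks, which folds the accent away.
        String folded = decomposed.replaceAll("\\p{M}+", "");
        // Lower-casing does not affect the wildcard characters.
        return folded.toLowerCase(Locale.ROOT);
    }
}
```

Whether a given filter in the list above should behave this way for multi-term queries is exactly the open question the issue raises; the sketch only shows the two transformations the issue names as examples.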
[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-445:
Assignee: (was: Erick Erickson)
Issue Type: Improvement (was: Bug)

Update Handlers abort with bad documents
Key: SOLR-445
URL: https://issues.apache.org/jira/browse/SOLR-445
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Fix For: 4.0
Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml

Has anyone run into the problem of handling bad documents / failures mid batch? I.e.:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
[jira] [Updated] (SOLR-3196) partialResults response header not propagated in distributed search
[ https://issues.apache.org/jira/browse/SOLR-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3196:
Attachment: SOLR-3196-3x.patch

Patch didn't apply to 3x, apparently a few things moved around. Russell: Could you take a quick look and see if this looks OK for 3x? Any back-compat issues with changing what comes back in the responseHeader?

partialResults response header not propagated in distributed search
Key: SOLR-3196
URL: https://issues.apache.org/jira/browse/SOLR-3196
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.5, 4.0
Reporter: Russell Black
Labels: patch
Attachments: SOLR-3196-3x.patch, SOLR-3196-partialResults-header.patch

For {{timeAllowed=true}} requests, the response contains a {{partialResults}} header that indicates when a search was terminated early due to running out of time. This header is being discarded by the collator. Patch to follow.
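The behavior the issue asks for can be pictured as: if any shard reports partialResults, the merged response header should report it too. The following is a minimal sketch of that merge rule, assuming each shard's response header is available as a Map. HeaderMerger and its merge method are hypothetical names and are not taken from the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: propagate the partialResults flag from shard response
// headers into the merged header instead of discarding it.
final class HeaderMerger {

    /**
     * Returns a merged header that carries partialResults=true when at least
     * one shard header carries it. Other header keys are ignored here.
     */
    static Map<String, Object> merge(List<Map<String, Object>> shardHeaders) {
        Map<String, Object> merged = new LinkedHashMap<>();
        boolean partial = false;
        for (Map<String, Object> header : shardHeaders) {
            if (Boolean.TRUE.equals(header.get("partialResults"))) {
                partial = true;
            }
        }
        if (partial) {
            merged.put("partialResults", Boolean.TRUE);
        }
        return merged;
    }
}
```

Leaving the key out entirely when no shard timed out keeps the merged header unchanged for the common case, which bears on the back-compat question raised in the comment.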
[jira] [Updated] (SOLR-3181) New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.
[ https://issues.apache.org/jira/browse/SOLR-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3181: - Attachment: SOLR-3181.patch Should fix the problem with multiple escapes by using BytesRef.utf8ToString. New Admin UI, allow user to somehow cut/paste all the old Zookeeper info. --- Key: SOLR-3181 URL: https://issues.apache.org/jira/browse/SOLR-3181 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0 Environment: n/a Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Attachments: SOLR-3181.patch, SOLR-3181.patch When tracking down issues with ZK, the devs ask about various bits of data from the cloud pages. It would be convenient to be able to just capture all the data from the old /solr/admin/zookeeper.jsp page in the admin interface to be able to send it to anyone debugging the info. Perhaps just a get debug info for Apache. Or even more cool copy debug info to clipboard if that's possible. Is this just the raw data that the cloud view is manipulating? It doesn't have to be pretty although indentation would be nice. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3079) Backport of Solr-1431 (CommComponent abstracted)
[ https://issues.apache.org/jira/browse/SOLR-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3079:
Attachment: SOLR-3079.patch

The patch isn't in SVN format; it looks like you made it with Git? The Git repo is a shadow repository, not used for released code as far as I know. Through the magic of IntelliJ, I managed to apply the patch and I'm uploading that version. Can you take a look and see if it made it through the transformations OK?

And any Git people out there: is there magic to make Git produce an SVN-compatible patch? Seems like a good addition to the "How to contribute" page, lots of people seem to be using Git...

Beyond that, I'll run the tests with it and report back if there's a problem. I'd really like someone who knows what this is all about to take a look before committing. Meanwhile, keep prompting <G>

Backport of Solr-1431 (CommComponent abstracted)
Key: SOLR-3079
URL: https://issues.apache.org/jira/browse/SOLR-3079
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 3.5
Reporter: Greg Bowyer
Attachments: 0001-Initial-backport-of-solr-cloud-ShardHandler.patch, SOLR-3079.patch

Initial attempt at backporting the work done for Solr-1431 into the 3.x series
[jira] [Updated] (SOLR-3132) Reorganize LukeRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3132:
Attachment: SOLR-3132.patch

Clears up SOLR-3121 crossed wires.

Reorganize LukeRequestHandler
Key: SOLR-3132
URL: https://issues.apache.org/jira/browse/SOLR-3132
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-3132.patch

The LukeRequestHandler could be made much easier to follow, and the overloading of numTerms is confusing. This was made possible by the work on SOLR-3121, and that patch should be applied first.
[jira] [Updated] (SOLR-3121) Make new admin UI work better with big indexes
[ https://issues.apache.org/jira/browse/SOLR-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3121: - Attachment: SOLR-3121.patch Ryan: This looks great, it does what I'd hoped. I've never been all that happy with how the LukeRequestHandler was organized, so I've attached a patch that builds on yours and refactors LukeRequestHandler a bit. The old structure would go out and do the detailed information-gathering and then use it later, overloading numTerms all over the place. The patch just tries to get the detailed info when it should. It does require the fl field to get detailed info at any time though. Your patch changed the way we request fields, which made it possible to untangle the handler itself. Take a look and let me know. This re-structuring probably does NOT play nice with the old admin UI though, we really need to decide whether to stop worrying about the old UI and just cut over to this one. I know the new UI doesn't deal with cloud leaf-node expansion yet, see SOLR-3116. And it seems like this handles SOLR-3094 too. Make new admin UI work better with big indexes -- Key: SOLR-3121 URL: https://issues.apache.org/jira/browse/SOLR-3121 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Ryan McKinley Fix For: 4.0 Attachments: SOLR-3121-luke-admin-ui.patch, SOLR-3121-luke-admin-ui.patch, SOLR-3121.patch As reported in SOLR-2667, the admin UI gets pretty bad with big indexes. Mostly this seems the fault of excessive calls to luke and not limiting the number of terms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3121) Make new admin UI work better with big indexes
[ https://issues.apache.org/jira/browse/SOLR-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3121: - Attachment: SOLR-3121.patch Small change that restores the old Admin UI behavior. NOTE: The old UI behavior is going to be slow for large indexes since it does the enumeration of all the fields when you click schema browser. The right fix is to incorporate the new parameters in the right place in the old admin UI, but at least this doesn't change the old behavior, it just doesn't make it as nice as the new. Make new admin UI work better with big indexes -- Key: SOLR-3121 URL: https://issues.apache.org/jira/browse/SOLR-3121 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Ryan McKinley Fix For: 4.0 Attachments: SOLR-3121-luke-admin-ui.patch, SOLR-3121-luke-admin-ui.patch, SOLR-3121.patch, SOLR-3121.patch As reported in SOLR-2667, the admin UI gets pretty bad with big indexes. Mostly this seems the fault of excessive calls to luke and not limiting the number of terms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3111) LukeRequestHandler does not properly handle multi-field fl params. Wildcard should also be honored
[ https://issues.apache.org/jira/browse/SOLR-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3111:
Attachment: SOLR-3111-3x.patch SOLR-3111.patch

NOTE: this needs to be applied after SOLR-1931

LukeRequestHandler does not properly handle multi-field fl params. Wildcard should also be honored
Key: SOLR-3111
URL: https://issues.apache.org/jira/browse/SOLR-3111
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.6, 4.0
Attachments: SOLR-3111-3x.patch, SOLR-3111.patch

Specifying fl=field1 field2 for the LukeRequestHandler results in trying to find a single field named, you guessed it, "field1 field2". Additionally, it makes sense, for some future enhancements, to support fl=*.
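The intended handling of the fl parameter can be sketched as: split the value into individual field names, and treat * as "every known field". The code below is a plain-Java illustration of that idea under stated assumptions: FieldListParser is a hypothetical name, splitting on both commas and whitespace is an assumption (the issue only shows a space-separated example), and the set of all field names is passed in rather than read from a schema.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of multi-field fl parsing with wildcard support.
final class FieldListParser {

    /**
     * Splits fl on commas and whitespace (an assumption) and expands "*"
     * to allFields. Order of first appearance is preserved.
     */
    static Set<String> parse(String fl, Set<String> allFields) {
        Set<String> result = new LinkedHashSet<>();
        if (fl == null) {
            return result;
        }
        for (String name : fl.trim().split("[,\\s]+")) {
            if (name.isEmpty()) {
                continue; // blank input or a leading separator
            }
            if ("*".equals(name)) {
                result.addAll(allFields);
            } else {
                result.add(name);
            }
        }
        return result;
    }
}
```

With this, fl=field1 field2 yields two field names instead of one name containing a space.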
[jira] [Updated] (SOLR-3102) Document WordDelimiterFilterFactory types parameter.
[ https://issues.apache.org/jira/browse/SOLR-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3102:
Attachment: SOLR-3102.patch

Trivial patch updating javadocs to include types parameter

Document WordDelimiterFilterFactory types parameter.
Key: SOLR-3102
URL: https://issues.apache.org/jira/browse/SOLR-3102
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
Labels: Javadocs
Attachments: SOLR-3102.patch
Original Estimate: 1h
Remaining Estimate: 1h

SOLR-2059 added the ability to customize the mapping of specific characters to types (e.g. # could be considered an ALPHA character if desired). But there's no documentation showing that this is an option. The Javadoc for the factory and the Wiki should have this added.
[jira] [Updated] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified
[ https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3017: - Attachment: SOLR-3017.patch new version that: 1 removes the new schema file and just modifies schema12 instead. All tests pass with this change. 2 Adds null check to setStopwordFilterFactoryClass rather than where it's called. I guess theoretically someone could override this class, override setStopwordFilterFactoryClass, call it with null and set the member var to null then encounter an NPE in noStopwordFilterAnalyzer which they couldn't fix due to scope issues. But that doesn't sound like something we need to guard against at this point. If nobody objects, I'll commit this over the weekend or early next week. Allow edismax stopword filter factory implementation to be specified Key: SOLR-3017 URL: https://issues.apache.org/jira/browse/SOLR-3017 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Michael Dodsworth Priority: Minor Fix For: 4.0 Attachments: SOLR-3017-without-guava-alternative.patch, SOLR-3017.patch, SOLR-3017.patch, edismax_stop_filter_factory.patch Currently, the edismax query parser assumes that stopword filtering is being done by StopFilter: the removal of the stop filter is performed by looking for an instance of 'StopFilterFactory' (hard-coded) within the associated field's analysis chain. We'd like to be able to use our own stop filters whilst keeping the edismax stopword removal goodness. The supplied patch allows the stopword filter factory class to be supplied as a param, stopwordFilterClassName. If no value is given, the default (StopFilterFactory) is used. Another option I looked into was to extend StopFilterFactory to create our own filter. Unfortunately, StopFilterFactory's 'create' method returns StopFilter, not TokenStream. StopFilter is also final. -- This message is automatically generated by JIRA. 
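The two points in the comment above (a configurable factory class with a default, and a null check inside the setter rather than at the call site) can be sketched like this. It is an illustration only: StopwordFilterConfig and isStopwordFilter are hypothetical names, class names are handled as plain strings instead of Class objects, and nothing here is taken from the attached patch beyond the setter name and the default mentioned in the description.

```java
// Hypothetical sketch: configurable stopword filter factory name with a
// default, where the setter itself guards against null.
final class StopwordFilterConfig {

    // Default named in the issue description.
    static final String DEFAULT_FACTORY = "StopFilterFactory";

    private String factoryClassName = DEFAULT_FACTORY;

    /** Ignores null or blank values so the default stays in effect. */
    void setStopwordFilterFactoryClass(String className) {
        if (className == null || className.trim().isEmpty()) {
            return;
        }
        this.factoryClassName = className.trim();
    }

    /** True when the given analysis-chain factory is the configured stopword filter. */
    boolean isStopwordFilter(String analysisFactoryName) {
        return factoryClassName.equals(analysisFactoryName);
    }
}
```

Putting the check in the setter means the member can never become null through this path, so code that later compares against it does not need its own guard.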
[jira] [Updated] (SOLR-3094) The statistics entry on the new admin UI is very slow
[ https://issues.apache.org/jira/browse/SOLR-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3094:
Attachment: SOLR-3094.patch

OK, anyone with good javascript skills, this would be a good time to chime in...

This is a variant of SOLR-1931. The new UI calls Luke at the top level in such a way that it enumerates all the terms in all the fields to gather the histogram data, which takes a long time. Note, this is what the old admin UI/Luke handler did when you clicked the schema browser link. Once that data is accumulated, clicking on the individual fields and showing that data is very fast since the data is local. But this data is accumulated *before* any field is selected from the schema browser drop-down and stored away.

I think this design is too costly, especially the "get all the data for all the fields up-front" bit. The users pay a penalty (many minutes demonstrated) even when they may only care about one field. So here's what I propose.

1 Tweak the LukeRequestHandler so it *requires* the fieldName parameter to gather the histogram data. That fixes the initial display of the stats issue that sparked this JIRA. I can do that in a few minutes, patch attached (do not commit yet, though). Problem is there is then no way at all to get the stats data.

2 Tweak the javascript to call the luke request handler to collect the data for individual fields only when the user selects them from the drop-down, stowing them away at that point so they can be revisited if desired. Here's where I could use some help, my javascript skills are rudimentary at best. If anyone could work the javascript I'd be happy to field test. Or even just put some comments in the code pointing me to them. Any trunk code from after 6-Jan will have the right Luke handler in it (see SOLR-1931).

There's also something wrong with the display of the histogram: the bucket and count in each bucket are mashed together on the bottom. With non-trivial indexes, this is largely unreadable since they're side-by-side...

Anyway, the attached patch makes it so you can get into the admin page without paying the above penalties, but you *never* get histogram data when you go into schema browser. If someone applies this to work on the admin UI bit, attaching fl=field1 field2 to the luke URL will cause the histogram data to be returned for the field(s) specified. If anyone has some spare cycles to help out here it would be outstanding.

I think something similar could be done for the old admin UI as well in terms of only getting the fields when requested, otherwise the histogram data won't be returned either...

The statistics entry on the new admin UI is very slow
Key: SOLR-3094
URL: https://issues.apache.org/jira/browse/SOLR-3094
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 4.0
Environment: trunk only, all environments
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-3094.patch

Prompted by Robert Reynolds (SOLR-2667), the entry point in the new Admin UI core drill down (e.g. clicking singlecore) takes a long time: 28-46 *minutes* on a 13M-23M doc set. On an example Wikipedia index (11M docs), I see 21 seconds, compared to less than 2 seconds in the old admin UI (I'm using the old admin UI linked to from the new UI page on trunk). I have a very simple index layout compared to a commercial site. Clearly something is not right. I suspect that all the terms are being walked. This is particularly an issue because this behavior happens when I click singlecore, so getting to the really neat parts of the new UI is hard.

Robert reports on a separate thread that the same behavior happens just hitting admin/luke in the URL, which is also slow in the 3.x world, which hints at where the problem lies. I'm going to guess that the terms are being walked and we can use the tricks used in SOLR-1931 to deal with the fact that admin/luke takes a long time, and just change the call to the entry (singlecore) for this issue.

Robert: Thanks for pointing this out!
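Step 2 of the proposal above (fetch histogram data for a field only when the user selects it, then keep it around for revisits) is a lazy-load-and-cache pattern. The comment asks for it in the admin UI's javascript; the sketch below shows the same pattern in Java purely as an illustration. LazyFieldStats is a hypothetical name, and the loader function stands in for the per-field call to the Luke request handler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: load per-field statistics on first request only,
// then serve later requests for the same field from a local cache.
final class LazyFieldStats<V> {

    private final Function<String, V> loader; // stand-in for the per-field Luke call
    private final Map<String, V> cache = new HashMap<>();
    private int loads = 0;

    LazyFieldStats(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the stats for one field, invoking the loader at most once per field. */
    V get(String field) {
        if (!cache.containsKey(field)) {
            cache.put(field, loader.apply(field));
            loads++;
        }
        return cache.get(field);
    }

    /** Number of times the (expensive) loader has actually been invoked. */
    int loadCount() {
        return loads;
    }
}
```

The cost is then paid per selected field rather than for every field up front, which is the penalty the comment objects to.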
[jira] [Updated] (SOLR-3032) Deprecate logOnce from SolrException
[ https://issues.apache.org/jira/browse/SOLR-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3032:
Attachment: SOLR-3032-3x.patch

Just deprecates the various c'tors etc. that are removed in the trunk patch.

Deprecate logOnce from SolrException
Key: SOLR-3032
URL: https://issues.apache.org/jira/browse/SOLR-3032
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Labels: exceptions, logging
Fix For: 4.0
Attachments: SOLR-3032-3x.patch, SOLR-3032.patch

There seems to be a growing consensus (well, Muir and Hoss agree at least) that having this logOnce concept in SolrException is more trouble than it's worth. Case in point is that trunk (4x) fails to report anything useful in the log file when you define a custom component and don't have any lib statements going to the right place.

So the proposal is to remove the whole logOnce process, supporting variables etc. The first step here will be deprecating the various bits of code in SolrException and starting to remove their usages.

I'm opening this up for discussion; error reporting seems to be one of those things that generates endless discussion and I'd like them aired before putting too much work into this. My goal will be to have this in the code base by next Tuesday, so speak up.
[jira] [Updated] (SOLR-3022) AbstractPluginLoader does not log caught exceptions
[ https://issues.apache.org/jira/browse/SOLR-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3022: - Attachment: SOLR-3022.patch Final version of patch. AbstractPluginLoader does not log caught exceptions --- Key: SOLR-3022 URL: https://issues.apache.org/jira/browse/SOLR-3022 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: James Dyer Assignee: Erick Erickson Priority: Trivial Fix For: 4.0 Attachments: SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch I was setting up a new 4.x environment but forgot to put a custom Analyzer in the classpath. Unfortunately AbstractPluginLoader didn't log the exception and it took a long time for me to figure out why No cores were created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
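The failure mode described above (a plugin class missing from the classpath, with the resulting exception caught but never logged) can be illustrated with a small loader that always logs the exception, stack trace included, before moving on. This is a sketch using java.util.logging and reflection only; LoggingPluginLoader is a hypothetical name and is not how AbstractPluginLoader or the attached patch is written.

```java
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a caught plugin-loading exception is logged with its
// cause instead of being silently swallowed.
final class LoggingPluginLoader {

    private static final Logger LOG =
            Logger.getLogger(LoggingPluginLoader.class.getName());

    /**
     * Instantiates the named class via its no-arg constructor. On failure the
     * exception is logged and an empty Optional is returned.
     */
    static Optional<Object> load(String className) {
        try {
            return Optional.of(
                    Class.forName(className).getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            LOG.log(Level.SEVERE, "Could not load plugin class " + className, e);
            return Optional.empty();
        }
    }
}
```

With the log line in place, a missing custom Analyzer shows up as a ClassNotFoundException in the log rather than as an unexplained absence of cores.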
[jira] [Updated] (SOLR-3022) AbstractPluginLoader does not log caught exceptions
[ https://issues.apache.org/jira/browse/SOLR-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-3022: - Affects Version/s: 3.6 AbstractPluginLoader does not log caught exceptions --- Key: SOLR-3022 URL: https://issues.apache.org/jira/browse/SOLR-3022 Project: Solr Issue Type: Bug Affects Versions: 3.6, 4.0 Reporter: James Dyer Assignee: Erick Erickson Priority: Trivial Fix For: 4.0 Attachments: SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch I was setting up a new 4.x environment but forgot to put a custom Analyzer in the classpath. Unfortunately AbstractPluginLoader didn't log the exception and it took a long time for me to figure out why No cores were created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3032) Deprecate logOnce from SolrException
[ https://issues.apache.org/jira/browse/SOLR-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-3032:
Attachment: SOLR-3032.patch

OK, here's a first cut. The rule I tried to follow (and I need to go over it again with fresh eyes) was that if an exception was re-thrown, logging was unnecessary, so I took it out. As a bonus, SolrConfig.severeErrors is gone, as is all the stuff around CoreContainer.abortOnConfigurationError.

Most of this is unutterably boring, but take a look at SolrDispatchFilter, the real changes are there.

I'll add deprecation notices to the 3x code, but won't change anything else there.

I'm putting this out for comments. All tests pass, but I'm not sure tests do much to deal with logging, so that probably only proves that things compile. I'll look this over again tomorrow, then I expect I'll commit on Sunday/Monday unless there are howls of protest.

And I just want to add that modern IDEs make this far too easy. Back in MY day, *real* programmers used *real* editors. See: http://xkcd.com/378/

Deprecate logOnce from SolrException
Key: SOLR-3032
URL: https://issues.apache.org/jira/browse/SOLR-3032
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Labels: exceptions, logging
Fix For: 4.0
Attachments: SOLR-3032.patch

There seems to be a growing consensus (well, Muir and Hoss agree at least) that having this logOnce concept in SolrException is more trouble than it's worth. Case in point is that trunk (4x) fails to report anything useful in the log file when you define a custom component and don't have any lib statements going to the right place.

So the proposal is to remove the whole logOnce process, supporting variables etc. The first step here will be deprecating the various bits of code in SolrException and starting to remove their usages.

I'm opening this up for discussion; error reporting seems to be one of those things that generates endless discussion and I'd like them aired before putting too much work into this. My goal will be to have this in the code base by next Tuesday, so speak up.
[jira] [Updated] (SOLR-2987) ExternalFileField With Invalid TrieField Key
[ https://issues.apache.org/jira/browse/SOLR-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson updated SOLR-2987:
Attachment: SOLR-2987-3x.patch SOLR-2987.patch

Latest patch

ExternalFileField With Invalid TrieField Key
Key: SOLR-2987
URL: https://issues.apache.org/jira/browse/SOLR-2987
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.5, 3.6, 4.0
Reporter: Michael Garski
Priority: Minor
Attachments: SOLR-2987-3x.patch, SOLR-2987.patch, eff_key_error.patch

The current error handling in reading an external file field only catches an error when parsing the float value on a line, which then skips that line. If the key field is a trie field, such as a TrieIntField, and the key value in the file cannot be parsed to an int, loading of the entire file fails. Shouldn't the call to get the indexed value of the key be in the same try/catch as the float parsing for the line?
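The fix suggested in the description amounts to parsing the key and the value inside the same try block, so that a bad key skips one line just as a bad float already does. A minimal sketch of that structure follows. It assumes one key=value entry per line and an int key; ExternalFileParser is a hypothetical name, and Integer.parseInt stands in for the call that obtains the indexed value of the key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: key parsing and float parsing share one try/catch,
// so an unparsable key skips the line instead of failing the whole file.
final class ExternalFileParser {

    /** Parses lines of the assumed form key=value, skipping malformed lines. */
    static Map<Integer, Float> parse(List<String> lines) {
        Map<Integer, Float> values = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // no separator, nothing to parse
            }
            try {
                // Both conversions can throw NumberFormatException.
                int key = Integer.parseInt(line.substring(0, eq).trim());
                float value = Float.parseFloat(line.substring(eq + 1).trim());
                values.put(key, value);
            } catch (NumberFormatException e) {
                // Bad key or bad value: skip this line only.
            }
        }
        return values;
    }
}
```

A real implementation would presumably also log each skipped line so that bad entries do not go unnoticed.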
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1931: - Attachment: SOLR-1931-trunk.patch SOLR-1931-3x.patch Final patches attached. All honor unto whoever wrote the tests for the binary writers, I discovered that a TreeMap is unacceptable. In other words, all the tests pass now. Unless there are objections, I intend to commit these tomorrow or Friday. Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this make the UI useless. On an index with 64m documents 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-1931:
    Attachment: SOLR-1931-3x.patch
                SOLR-1931-trunk.patch

Thanks Robert and Yonik for pointing me at the new 4x capabilities, they make a huge difference. But you knew that.

The killer for 3.x was getting the document counts via a range query. I don't think there's a good way to get the counts and not pay the penalty, so there's a new parameter, reportDocCount.

Here's my latest and close-to-last cut at this, both for 3x and 4x. The data set is 89M documents, times in seconds.

3.5
    637  getting doc counts
3x with this patch
    552  getting doc counts
     53  stats without doc counts, but histogram etc. No option to do this before.
4x, original
    450  or so as I remember, getting doc counts, histograms, etc.
4x with patch, histograms still work
    158  getting the doc counts the old way (span queries). I mean, you guys *said* ranges were going to be faster.
     39  getting the doc counts with terms.getDocCount() (including histograms)

Here's my proposal, I'll probably commit this next weekend at the latest unless there are objections:
1. I'll let these stew for a couple of days, and look them over again. Anyone who wants to look too, please feel free.
2. Live with getting the doc counts in 4x including the deleted docs and remove the reportDocCount parameter (it'll live in 3.6 and other 3x versions). I think the performance is fine without carrying that kind of kludgy option forward. I could be persuaded otherwise, but an optimized index will take care of the counting-of-deleted-documents problem if anyone really cares.
Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this make the UI useless. On an index with 64m documents 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1931: - Attachment: SOLR-1931-trunk.patch Trunk that, you know, actually compiles or something, mea culpa. Also reduces the 4x time down to 15 seconds after fixing a stupid oversight. Really gotta let this stew for a while and look at it with less-tired eyes. Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this make the UI useless. On an index with 64m documents 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-1931: - Affects Version/s: (was: 1.4) 4.0 3.6 Schema Browser does not scale with large indexes Key: SOLR-1931 URL: https://issues.apache.org/jira/browse/SOLR-1931 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 3.6, 4.0 Reporter: Lance Norskog Assignee: Erick Erickson Priority: Minor Attachments: SOLR-1931-3x.patch, SOLR-1931-trunk.patch The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this make the UI useless. On an index with 64m documents 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes
[ https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-1931:
    Attachment: SOLR-1931-trunk.patch
                SOLR-1931-3x.patch

Well, there are a couple of issues here. I've attached patches for trunk and 3x for consideration.

I fixed a structural flaw that traversed all the terms in all the fields twice, once to get the total number of terms across all the fields and once to get the individual counts.

But that's not where the bulk of the time gets spent. It turns out that getting the count of documents in which each term appears is the culprit. These two lines are executed for each field

    Query q = new TermRangeQuery(fieldName, null, null, false, false);
    TopDocs top = searcher.search(q, 1);

and top.totalHits is reported. I have an index with 99M documents, mostly integer data, that takes 360 seconds to return data when the above is executed and 150 without. Both versions traverse all the terms once, so these times would be greater without the patch due to the second traversal.

So the attached patches default to NOT doing the above, and there's a new parameter reportDocCount that can be set to true to collect that information.

What do people think? And is there a better way to get the count of documents in which the term appears? And do any alternate methods respect deleted docs like this one does? I tried spinning through using TermDocs (3.6) but soon realized that the people who wrote TermRangeQuery probably got there first.

So I guess my real question is whether people object to the change in behavior, that users must explicitly request doc counts. Which also means that the admin/schema browser doesn't report this by default and I haven't made it optional from that interface. I'm not inclined to since that interface is going away, but if people feel strongly I might be persuaded.
That info is available via admin/luke?fl=myfield&reportDocCount=true in a less painful fashion for a particular field anyway.

Along the way I alphabetized the fields without my other kludge of putting comparators in other classes. I'll kill that JIRA if this one goes forward.

Note that this still doesn't scale all that well; on my test index it's still a 5 minute wait. But then I guess that this kind of data gathering will take time by its nature.

If nobody objects, I'll commit this early next week after I've had a chance to put it down for a while, look at it with fresh eyes, and do some more testing. I think there are some inefficiencies in the single pass that I can wring out (about 30 seconds is spent just gathering the data in the single term enumeration loop).

Schema Browser does not scale with large indexes
    Key: SOLR-1931
    URL: https://issues.apache.org/jira/browse/SOLR-1931
    Project: Solr
    Issue Type: Improvement
    Components: web gui
    Affects Versions: 3.6, 4.0
    Reporter: Lance Norskog
    Priority: Minor
    Attachments: SOLR-1931-3x.patch, SOLR-1931-trunk.patch

The Schema Browser JSP by default causes the Luke handler to scan the world. In large indexes this makes the UI useless. On an index with 64m documents and 8gb of disk space, the Schema Browser took 6 minutes to open and hogged all disk I/O, making Solr useless.
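The structural fix mentioned in this comment (one traversal of the terms instead of two) can be sketched in plain Java. Everything here is hypothetical and only illustrates the shape of the change: SinglePassTermStats is not a Solr class, and a list of {field, term} pairs stands in for the real term enumeration. Both the grand total and the per-field counts are accumulated in the same loop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; a List of {field, term} pairs stands in for a term enumeration.
final class SinglePassTermStats {

    /** Per-field term counts (in first-seen order) plus the total across all fields. */
    static final class Stats {
        final Map<String, Integer> termsPerField = new LinkedHashMap<>();
        long totalTerms = 0;
    }

    /** Walks the pairs exactly once, updating the total and the per-field count together. */
    static Stats collect(List<String[]> fieldTermPairs) {
        Stats stats = new Stats();
        for (String[] pair : fieldTermPairs) {
            String field = pair[0];
            stats.termsPerField.merge(field, 1, Integer::sum); // per-field count
            stats.totalTerms++;                                // total, same pass
        }
        return stats;
    }
}
```

This says nothing about the expensive part identified above (the per-field document counts); it only shows why the second traversal is avoidable.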
[jira] [Updated] (SOLR-2994) Solr no longer compiles in IntelliJ
[ https://issues.apache.org/jira/browse/SOLR-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2994:
    Attachment: SOLR-2994.patch

Fixes the problem on my machine in 3x. I'll probably see about 4x later.

Solr no longer compiles in IntelliJ
    Key: SOLR-2994
    URL: https://issues.apache.org/jira/browse/SOLR-2994
    Project: Solr
    Issue Type: Bug
    Components: Build
    Affects Versions: 3.6, 4.0
    Environment: IntelliJ
    Reporter: Erick Erickson
    Assignee: Erick Erickson
    Priority: Minor
    Labels: build
    Attachments: SOLR-2994.patch

Running the ant idea target no longer creates a consistent IntelliJ environment; I'm getting "package org.apache.lucene.analysis.phonetic does not exist". It looks like the phonetic package moved from lucene to contrib? Note that the command-line ant task continues to work just fine.

I'll attach a patch that fixes it for me, but I'd really like someone who understands the Idea system (Steve, are you listening?) to take a look to see if it's OK. It's a magnificent single line in solr.iml.

I'm assuming this is also a problem for 4.x; I'll probably be in that environment later today and see. I have no idea whether Eclipse suffers from the same problem.

I've assigned it to myself just for tracking. Anyone who can glance at it and say "yeah, that's right", please feel free to just check it in for 3x and 4x if applicable.
[jira] [Updated] (SOLR-2989) Solr admin (Luke request handler) doesn't order the fields alphabetically
[ https://issues.apache.org/jira/browse/SOLR-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2989:
    Attachment: SOLR-2989.patch

First cut at a patch. This is for 3x because that's where I happened to be working, but if we carry this forward, I can put it on trunk too, I assume.

Solr admin (Luke request handler) doesn't order the fields alphabetically
    Key: SOLR-2989
    URL: https://issues.apache.org/jira/browse/SOLR-2989
    Project: Solr
    Issue Type: Improvement
    Components: SearchComponents - other
    Affects Versions: 3.6, 4.0
    Environment: all
    Reporter: Erick Erickson
    Priority: Minor
    Attachments: SOLR-2989.patch

It's always bugged me that the fields list for admin/schema browser hasn't been alphabetical. We have users who have 100s of fields and it's hard to orient in an unordered list.

I'll attach a patch momentarily that starts to move toward this. The thing I need someone to render judgement on is whether implementing the Comparable interface on SchemaField and FieldType is in any way dangerous. Note that they only compare on name; secondary and tertiary sources are unnecessary, I think.

The other interesting bit is that the list of fields is actually (apparently) fetched in two stages. The first stage gets the ones in the schema and the second one gets dynamic fields that have been realized. So the fields section actually has two separate ordered sections. Which is kind of ugly, but given the new admin interface coming in 4.x I don't feel the urge to fix this.
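As a rough illustration of the "compare on name only" idea from the SOLR-2989 description, here is a self-contained sketch. NamedField is a hypothetical stand-in, not SchemaField or FieldType, and the patch itself may well look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a schema field; compares on name only.
final class NamedField implements Comparable<NamedField> {
    final String name;
    final String type;

    NamedField(String name, String type) {
        this.name = name;
        this.type = type;
    }

    @Override
    public int compareTo(NamedField other) {
        return name.compareTo(other.name); // the type is deliberately ignored
    }

    /** Returns the field names in alphabetical order without modifying the input list. */
    static List<String> sortedNames(List<NamedField> fields) {
        List<NamedField> copy = new ArrayList<>(fields);
        Collections.sort(copy); // uses compareTo above
        List<String> names = new ArrayList<>();
        for (NamedField f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```

One caveat that is general Java behavior rather than anything Solr-specific: a compareTo that looks only at the name is not consistent with equals unless equals is also name-based, so a TreeSet or TreeMap would treat two distinct objects with the same name as one entry. Whether that matters depends on where the objects end up being stored.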
[jira] [Updated] (SOLR-2906) Implement LFU Cache
[ https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2906:
    Attachment: SOLR-2906.patch

This should be the final patch. Added the stuff to actually get the timeDecay parameter from solrconfig, which ages out the cache entries as we've discussed. Added tests to ensure that it gets through from the config file.

Shawn: If you'd add some data to the Wiki about this new parameter, that would be a good thing.

If nobody objects, I'll probably check this in in the next couple of days. Since they're all new files, the patch will apply to both trunk and 3x cleanly.

Implement LFU Cache
    Key: SOLR-2906
    URL: https://issues.apache.org/jira/browse/SOLR-2906
    Project: Solr
    Issue Type: Sub-task
    Components: search
    Affects Versions: 3.4
    Reporter: Shawn Heisey
    Assignee: Erick Erickson
    Priority: Minor
    Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java

Implement an LFU (Least Frequently Used) cache as the first step towards a full ARC cache
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2242:
    Attachment: SOLR-2242.patch

First step in resurrecting this. This patch should apply cleanly to trunk. It incorporates the SOLR-2242.patch from 28-June and the NumFacetTermsFacetsTest from 9-July. It accounts for the fact that things seem to have been moved around a bit.

Get distinct count of names for a facet field
    Key: SOLR-2242
    URL: https://issues.apache.org/jira/browse/SOLR-2242
    Project: Solr
    Issue Type: New Feature
    Components: Response Writers
    Affects Versions: 4.0
    Reporter: Bill Bell
    Assignee: Erick Erickson
    Priority: Minor
    Fix For: 4.0
    Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch

When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1.

The feature is called namedistinct. Here is an example:

http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price

This currently only works on facet.field.
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int>
    <int name="11.5">1</int>
    <int name="19.95">1</int>
    <int name="74.99">1</int>
    <int name="92.0">1</int>
    <int name="179.99">1</int>
    <int name="185.0">1</int>
    <int name="279.95">1</int>
    <int name="329.95">1</int>
    <int name="350.0">1</int>
    <int name="399.0">1</int>
    <int name="479.95">1</int>
    <int name="649.99">1</int>
    <int name="2199.0">1</int>
  </lst>
</lst>
{code}

Several people use this to get the group.field count (the # of groups).
[jira] [Updated] (SOLR-2906) Implement LFU Cache
[ https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2906: - Attachment: SOLR-2906.patch Here's what I had in mind, at least I *think* this will do but all I've done is insured that the code compiles and the current LFU test suite runs. Look in the diff for timeDecay. This still needs some proof that the new parameter comes through from a schema file. Let me know if that presents a problem or if you can't get 'round to it, I might have some time over Christmas. I think maybe you were under the impression that this had already been done and were looking for it to be in the code already? Implement LFU Cache --- Key: SOLR-2906 URL: https://issues.apache.org/jira/browse/SOLR-2906 Project: Solr Issue Type: Sub-task Components: search Affects Versions: 3.4 Reporter: Shawn Heisey Assignee: Erick Erickson Priority: Minor Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java Implement an LFU (Least Frequently Used) cache as the first step towards a full ARC cache -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2906) Implement LFU Cache
[ https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2906: - Attachment: SOLR-2906.patch Updated patch that divides by 2 and adds a unit test for aging out. Shawn: Could you add in the optional time decay as Yonik suggests? I agree that it seems like the right thing is to have this on by default. At that point, I think it'll be ready to check in. We can add documentation as we can. We could also check it in as is and raise another JIRA. Implement LFU Cache --- Key: SOLR-2906 URL: https://issues.apache.org/jira/browse/SOLR-2906 Project: Solr Issue Type: Sub-task Components: search Affects Versions: 3.4 Reporter: Shawn Heisey Assignee: Erick Erickson Priority: Minor Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java Implement an LFU (Least Frequently Used) cache as the first step towards a full ARC cache -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2906) Implement LFU Cache
[ https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2906: - Attachment: SOLR-2906.patch Mostly cosmetic changes: Changed acceptableLimit to acceptableSize to keep it named consistently Formatted all the files Implemented Yonik's aging suggestion (but no tests, there doesn't seem to be a clean way to implement a test here without creating debug-only code). I'm not wholly convinced that dividing by 4 is the right thing to do here; it'll tend to flatten all the entries making removal somewhat arbitrary as after a few passes anything with hits in the low range will collapse to zero. That said, though, since the little adventure with lastAccessed, all entries with the same number of hits will be treated as LRU so I guess it works. Marked code as experimental Commented out some debugging code Implement LFU Cache --- Key: SOLR-2906 URL: https://issues.apache.org/jira/browse/SOLR-2906 Project: Solr Issue Type: Sub-task Components: search Affects Versions: 3.4 Reporter: Shawn Heisey Assignee: Erick Erickson Priority: Minor Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java Implement an LFU (Least Frequently Used) cache as the first step towards a full ARC cache -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
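The concern in this comment, that integer division flattens low hit counts to zero and leaves eviction among them to last-access order, can be seen in a small worked sketch. This is only an illustration under stated assumptions: LfuAgingSketch and its Entry class are hypothetical, not the ConcurrentLFUCache code, and the divisor is passed in rather than fixed at 4 or 2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of hit-count aging; not the ConcurrentLFUCache implementation.
final class LfuAgingSketch {

    static final class Entry {
        final String key;
        long hits;
        final long lastAccessed; // larger means more recently used

        Entry(String key, long hits, long lastAccessed) {
            this.key = key;
            this.hits = hits;
            this.lastAccessed = lastAccessed;
        }
    }

    /** Ages every entry by integer-dividing its hit count; small counts become 0. */
    static void age(List<Entry> entries, int divisor) {
        for (Entry e : entries) {
            e.hits /= divisor;
        }
    }

    /** Keys in eviction order: fewest hits first, ties broken by least recent access. */
    static List<String> evictionOrder(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparingLong((Entry e) -> e.hits)
                .thenComparingLong(e -> e.lastAccessed));
        List<String> keys = new ArrayList<>();
        for (Entry e : copy) {
            keys.add(e.key);
        }
        return keys;
    }
}
```

Take entries a (3 hits, accessed at 10), b (1 hit, accessed at 5), c (2 hits, accessed at 1) and d (9 hits, accessed at 7). Before aging the eviction order is b, c, a, d. After one pass with a divisor of 4 the counts are 0, 0, 0 and 2, so a, b and c are indistinguishable by frequency and fall back to access order: c, b, a, then d.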
[jira] [Updated] (SOLR-2975) Solr test failure when running under Java 1.5
[ https://issues.apache.org/jira/browse/SOLR-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2975: - Attachment: SOLR-2975.patch Running full test now, will check in shortly unless someone objects. Applies to both trunk and 3x Solr test failure when running under Java 1.5 - Key: SOLR-2975 URL: https://issues.apache.org/jira/browse/SOLR-2975 Project: Solr Issue Type: Test Components: contrib - DataImportHandler Affects Versions: 3.5, 3.6 Environment: Java 1.5 only. OS X Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2975.patch ant test -Dtestcase=TestSolrEntityProcessorUnit fails when running under Java 1.5 because of faulty assumptions in the test. From e-mail thread (Hossman): ...those lines are assuming that row.entrySet will return something that has a predictible iteration order, but row is a Map of unknown creation (returned by the entityProcessor) ... so unless the entityProcessor is explicitly defined as returning something like SortedMap (which isn't suggested anywhere in this test) the test is making a really bad assumption. From e-mail. (Steven Rowe) FYI, I see this same failure when I run the branch_3x tests with Java 1.5, but not 1.6. and Oh, and the reason Jenkins isn't seeing this failure is that it runs branch_3x tests using Java 1.6, after first *compiling* with Java 1.5 Even though we won't run Solr 4 under java 1.5, I'll change it there anyway since this is a bad assumption in the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
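The bad assumption described in SOLR-2975 is relying on the iteration order of a Map whose implementation is unknown. A minimal sketch of an order-independent check follows; RowAssertionSketch is a hypothetical helper, not part of TestSolrEntityProcessorUnit, and the actual patch may fix the test differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical test helper; shows one way to avoid depending on Map iteration order.
final class RowAssertionSketch {

    /** Renders a row as "key=value" strings in key order, whatever Map type was passed in. */
    static List<String> sortedEntries(Map<String, Object> row) {
        List<String> out = new ArrayList<>();
        // Copying into a TreeMap imposes a defined (sorted) order before iterating.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(row).entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```

Comparing whole maps with Map.equals is another order-independent option, since equality between maps is defined on their entries and not on the order in which a particular implementation happens to iterate them.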
[jira] [Updated] (SOLR-2971) ExternalFileFields fail if valType='float', and valType should be optional
[ https://issues.apache.org/jira/browse/SOLR-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2971:
    Attachment: SOLR-2971.patch
                SOLR-2971-3x.patch

I think these patches may be ready to apply. The only thing that makes me at all nervous is the magic of calling deleteCore in the tests. The 3x tests consistently failed without it, but trunk worked just fine. So I put the call in both.

Sorry, there's a bit of gratuitous formatting in there, but it's pretty much whitespace only.

Of course the 3x tests were different enough from the 4x ones that it needed a different patch. Siiigggh. The actual core code changes are identical though.

For an issue this small, is there any reason to add anything to CHANGES.txt?

ExternalFileFields fail if valType='float', and valType should be optional
    Key: SOLR-2971
    URL: https://issues.apache.org/jira/browse/SOLR-2971
    Project: Solr
    Issue Type: Improvement
    Components: Schema and Analysis
    Affects Versions: 3.5, 4.0
    Environment: all
    Reporter: Erick Erickson
    Assignee: Erick Erickson
    Priority: Minor
    Fix For: 3.6, 4.0
    Attachments: SOLR-2971-3x.patch, SOLR-2971.patch, SOLR-2971.patch

valType has never done anything except throw an error; the underlying ValueSource has always been a FileFloatSource. To add to the confusion, the documentation says to use float, which has thrown an exception on Solr startup ever since float was re-defined as a TrieFloatField. pfloat works currently though.

Since valType is never used for anything, we should make it optional until such a time as it is. Additionally, TrieFloatField (valType=float|tfloat) types should be OK as a field type along with FloatField (valType=pfloat).
[jira] [Updated] (SOLR-2971) ExternalFileFields fail if valType='float', and valType should be optional
[ https://issues.apache.org/jira/browse/SOLR-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2971: - Attachment: SOLR-2971.patch Patch for trunk. I haven't run full regression tests against it yet, but I think it's pretty solid. I'll commit in a day or two unless there are objections... ExternalFileFields fail if valType='float', and valType should be optional -- Key: SOLR-2971 URL: https://issues.apache.org/jira/browse/SOLR-2971 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.5, 4.0 Environment: all Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2971.patch valType has never done anything except throw an error, the underlying ValueSource has always been a FileFloatSource. To add to the confusion, the documents say use float, which throws an exception on Solr startup every since float was re-defined as a TrieFloatField. pfloat works currently though. Since valType is never used for anything, we should make it optional until such a time as it is. Additionally, TrieFloatField (valtype=float|tfloat) types should be OK as a field type along with FloatField(valType=pfloat) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
[ https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2509: - Attachment: SOLR-2509.patch Here's the updated patch. The only difference between this and the original is that I changed the failing test to expect pixmaa rather than pixma-a-b-c-d-e-f-g. If nobody objects, I'll commit this tomorrow (7-Dec) on both trunk and 3x. spellcheck: StringIndexOutOfBoundsException: String index out of range: -1 -- Key: SOLR-2509 URL: https://issues.apache.org/jira/browse/SOLR-2509 Project: Solr Issue Type: Bug Affects Versions: 3.1 Environment: Debian Lenny JAVA Version 1.6.0_20 Reporter: Thomas Gambier Assignee: Erick Erickson Priority: Blocker Attachments: SOLR-2509.patch, SOLR-2509.patch, SOLR-2509.patch, document.xml, schema.xml, solrconfig.xml Hi, I'm a french user of SOLR and i've encountered a problem since i've installed SOLR 3.1. I've got an error with this query : cle_frbr:LYSROUGE1149-73190 *SEE COMMENTS BELOW* I've tested to escape the minus char and the query worked : cle_frbr:LYSROUGE1149(BACKSLASH)-73190 But, strange fact, if i change one letter in my query it works : cle_frbr:LASROUGE1149-73190 I've tested the same query on SOLR 1.4 and it works ! Can someone test the query on next line on a 3.1 SOLR version and tell me if he have the same problem ? yourfield:LYSROUGE1149-73190 Where do the problem come from ? Thank you by advance for your help. Tom -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438-3x.patch backport MultiTermAware version of this patch to 3.6. Again, before applying this patch you probably need to apply the 3x patch from 25-Nov. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Fix For: 3.6, 4.0 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438_3x.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2921: - Description: SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know. 
ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory was: SOLR-2918, which drastically improves the approach of SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time and the perennial question users have, why didn't my wildcard query automatically lower-case (or accent fold or) my terms? will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial, see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. If all you can do is provide test cases, I could probably do the code part, just let me know. 
ArabicNormalizationFilterFactory GreekLowerCaseFilterFactory HindiNormalizationFilterFactory ICUFoldingFilterFactory ICUNormalizer2FilterFactory ICUTransformFilterFactory IndicNormalizationFilterFactory ISOLatin1AccentFilterFactory PersianNormalizationFilterFactory RussianLowerCaseFilterFactory TurkishLowerCaseFilterFactory Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch SOLR-2438_3x.patch Patches as of the commit Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Fix For: 3.6, 4.0 Attachments: SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438_3x.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438-3x.patch Here's what the 3x version would look like if anyone's interested. There's some refactoring that was done between 3.x and 4.0 that made reconciling these a bit of a pain. Still need to modify the CHANGES files. I'll commit these tomorrow sometime if nobody objects. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch OK, this patch does a better job with the matchVersion as per Muir. If nobody objects I'll commit it this week, probably not before Wednesday though. Then I should be able to do the backport to 3.6 shortly thereafter. I still have to run all the tests yet again, but I don't really expect much of a problem. Should SOLR 218, 219 and 757 all be closed as part of 2438? Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch I think this patch is ready for scrutiny. Tests run successfully. I have yet to do several things: 1 update README 2 add an example to example/schema.xml 3 this is going to take a writeup on the Wiki I think, explaining that there's another (optional) section to a fieldType. Any suggestions where that should go? Originally, I'd hoped to back-port it to 3.5, but the more I look at it the more I'd like it to bake a while before being officially released and target 3.6 instead for the back-port. Can one back-port something like this after the first RC is cut or is it better to wait until after the release? I can always commit this to trunk and open another JIRA to backport after 3.5 is released. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch OK, this isn't nearly finished yet, but I wanted to run it by folks to see if the approach is what, particularly, Robert and Yonik have in mind. I'm assuming that the flex stuff is out of scope for this JIRA, right? Don't waste your time on details just yet, only the general approach. I'm thinking of allowing a flag to the fields to disable this functionality but make this the default, thoughts? Haven't even thought about back-porting to 3x, but it looks do-able on a quick glance. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch Here's a rough cut at what I *think* Yonik might have been talking about, comments? I haven't done a thing about efficiency here, just seeing if the new method in the FilterFactories (processQueryTerm) makes sense to y'all. One thing I'm not clear on: Would it make more sense to just instantiate a new instance of the filter and run each term through it rather than steal bits from the underlying Filters (see ASCIIFoldingFilterFactory and LowercaseFilterFactory for example). I just hate duplicated code but I'm not sure how efficient creating a new filter and running the token through would be for each and every token. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2438: - Attachment: SOLR-2438.patch This is not at all ready for prime-time but I'm inviting comments on the approach. It turns out that all the hard work has already been done, see QueryParserBase. The attached patch is almost all tests... But I greatly fear that I'm grossly misusing QueryParserBase.lowercaseExpandedTerms, which looks like it's for parameters on the query line? Where did *that* come from anyway? Or what the heck is it supposed to be used for, anyone know? A couple of thing make me nervous about this approach. It depends in a pretty hard-coded way on detecting LowerCaseFilterFactory and LowerCaseTokenizerFactory, if anyone adds anything else in there it'll have to be re-coded. Is there a better way? It almost seems like a flag on the field definition as Peter suggested is a more robust way of going about things. Anyway, I'm getting way past the point of diminishing returns tonight, so I thought I'd at least throw this out for comment. Ignore everything with the ASCIIFoldingFilterFactory, I detect it but don't do anything with it yet. And I can't seem to make the reversed test work, even without the casing switch. Which means I should put it down for the evening, I'm obviously fried. Anybody feeling kind can uncomment the line that starts: // make me work and get the test class to work. It's probably trivial but I'm not seeing it. Case Insensitive Search for Wildcard Queries Key: SOLR-2438 URL: https://issues.apache.org/jira/browse/SOLR-2438 Project: Solr Issue Type: Improvement Reporter: Peter Sturge Assignee: Erick Erickson Attachments: SOLR-2438.patch, SOLR-2438.patch This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done Yonik and Michael in SOLR-219. 
The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Updated] (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
[ https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2134:
Attachment: SOLR-2134-tests.patch

Added some tests for dates.

Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types
Key: SOLR-2134
URL: https://issues.apache.org/jira/browse/SOLR-2134
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Erick Erickson
Fix For: 4.0
Attachments: SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, SOLR-2134-tests.patch

With the changes in LUCENE-2649, the FieldCache also returns if the bit is valid or not. This is enough to support sortMissingLast=true with Trie* fields. Then we can get rid of the Sortable* fields.
[jira] [Updated] (SOLR-2881) Trie fields should support sortMissingLast=true
[ https://issues.apache.org/jira/browse/SOLR-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2881:
Attachment: SOLR-2881.patch

SOLR-2134 fixes this issue for 4.x; this patch applies only to the 3x branch.

Trie fields should support sortMissingLast=true
Key: SOLR-2881
URL: https://issues.apache.org/jira/browse/SOLR-2881
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 3.5, 4.0
Attachments: SOLR-2881-3x.patch, SOLR-2881.patch

Spinoff from SOLR-2134. The consensus is that the way sortMissingFirst is done in 3x is superior to 4x, and when that is done (see LUCENE-3443) the sortMissingFirst code should be incorporated into both. As of now, however, the Trie fields support sortMissingFirst in 4.0 but not yet in 3.x.
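As a reminder of the behavior being asked for, here is a minimal standalone sketch of "missing values sort last regardless of direction". It is not how Solr or the FieldCache implements this; `SortMissingLastSketch` is a hypothetical name and `null` merely stands in for "the document has no value in this field".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration, not Solr code: null stands in for a missing field value. */
final class SortMissingLastSketch {

    /** Orders present values ascending or descending, but always puts nulls last. */
    static Comparator<Long> missingLast(boolean descending) {
        Comparator<Long> byValue = descending
                ? Comparator.<Long>reverseOrder()
                : Comparator.<Long>naturalOrder();
        // nullsLast wraps the value ordering, so reversing the values does not move the nulls.
        return Comparator.nullsLast(byValue);
    }

    /** Returns a sorted copy so the caller's list is left untouched. */
    static List<Long> sorted(List<Long> values, boolean descending) {
        List<Long> copy = new ArrayList<>(values);
        copy.sort(missingLast(descending));
        return copy;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, null, 1L, 2L);
        System.out.println(sorted(values, false)); // [1, 2, 3, null]
        System.out.println(sorted(values, true));  // [3, 2, 1, null]
    }
}
```

Note that simply reversing an ascending "missing last" sort would put the missing values first, which is why the direction has to be applied to the values only.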
[jira] [Updated] (SOLR-2881) Trie fields should support sortMissingLast=true
[ https://issues.apache.org/jira/browse/SOLR-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2881: - Attachment: SOLR-2881-3x.patch I think this is ready to commit if we clear one thing up. Look at the tests and you'll see that default sorting for dates is a special case. The sorting behavior for dates is, indeed, different from longs when sortMissingFirst/Last are not specified. The behavior is consistent with 3.3 (it was handy to test 3.3 rather than 3.4) however, so neither LUCENE-3443 nor this patch change sorting in this case. I'd like to commit this tomorrow (Sunday). Since the reconciliation process is a bit interesting between Mike's and my changes, I think that a patch for each is preferable, but we know I'm merge challenged. Note also that Mike, as part of 3441, made the parallel set of changes for 4.x already. That said, I'm going to create a small 4.x patch that changes the example schema.xml and incorporates the date test from this patch. I'll attach that file to SOLR-2134 Trie fields should support sortMissingLast=true - Key: SOLR-2881 URL: https://issues.apache.org/jira/browse/SOLR-2881 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.5, 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 3.5, 4.0 Attachments: SOLR-2881-3x.patch Spinoff from SOLR-2134. The consensus is that the way sortMissingFirst is done in 3x is superior to 4x and when that is done (see LUCENE-3443) then the sortMissingFirst code should be incorporated into both. As of now, however, the Trie fields in 4.0 support sortMissingFirst but not yet in 3.x -- This message is automatically generated by JIRA. 
[jira] [Updated] (SOLR-2876) Precedence operator in conditionals with ternary operator needs to be examined.
[ https://issues.apache.org/jira/browse/SOLR-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2876:
Attachment: SOLR-2876.patch

I went ahead and looked at uses of the ternary operator (those I could find with grep), and here are the results. Not sure it's worth doing; anyone want to chime in? Although this construct is exciting...

luceneSort || sortMissingFirst && !reverse || sortMissingLast && reverse ? "" : "zzz";

Precedence operator in conditionals with ternary operator needs to be examined.
Key: SOLR-2876
URL: https://issues.apache.org/jira/browse/SOLR-2876
Project: Solr
Issue Type: Bug
Affects Versions: 3.5, 4.0
Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
Labels: operator, precedence, ternary
Attachments: SOLR-2876.patch

This is an offshoot of 2829, where the root of the bug was that precedence in the ternary operator, along with && without appropriate parentheses, was a problem:

this.parser == null ? other.parser == null : this.parser.getClass() == other.parser.getClass()

(from ShortFieldSource.java). So that got me curious whether this pattern was repeated. A quick grep with the following REs produced one hit that wasn't related to 2829 with && and more with || (3x code base). I'll try to get to it over the weekend. Please don't grab it just yet, I'm fixing this partially for 2829, but if anyone wants to try the grep and see if I'm hallucinating, I'd appreciate it. I'd *really* appreciate any tests for things people see...

Some of the returns are false hits, but not others. See SolrIndexSearcher.getDocListAndSetNC(); the last line is:

return pf.filter==null && pf.postFilter==null ? qr.getDocSet() : null;

REs (using them in IntelliJ):

\|\|[\sa-z\.0-9A-Z]+==.*\?
&&[\sa-z\.0-9A-Z]+==.*\?

I got some hits with the above and didn't pursue it any further, but if anyone wants to suggest more comprehensive REs, please attach.
I'm trying for && or || followed by anything without an open parenthesis, followed by ==, followed by anything, followed by ?. I'd rather get a manageable number of false positives than miss things.
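For anyone who wants to see the precedence problem in isolation, here is a small sketch. It is not the Solr code; `TernaryPrecedenceDemo`, `sameName`, `a` and `b` are made-up names. In Java, && binds tighter than ?:, so a leading condition joined with && becomes part of the ternary's test instead of guarding the whole comparison.

```java
/** Hypothetical demo, not Solr code: shows that && binds tighter than ?: in Java. */
final class TernaryPrecedenceDemo {

    /** Without parentheses this parses as (sameName && a == null) ? (b == null) : a.equals(b). */
    static boolean unparenthesized(boolean sameName, Object a, Object b) {
        return sameName && a == null ? b == null : a.equals(b);
    }

    /** Presumably what was intended: the first check guards the whole null-safe comparison. */
    static boolean parenthesized(boolean sameName, Object a, Object b) {
        return sameName && (a == null ? b == null : a.equals(b));
    }

    public static void main(String[] args) {
        // sameName is false, yet the unparenthesized form still reports equality.
        System.out.println(unparenthesized(false, "x", "x")); // true
        System.out.println(parenthesized(false, "x", "x"));   // false
    }
}
```

The unparenthesized form has a second hazard: when `sameName` is false and `a` is null, control falls into the else branch and `a.equals(b)` throws a NullPointerException.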
[jira] [Updated] (SOLR-2876) Precedence operator in conditionals with ternary operator needs to be examined.
[ https://issues.apache.org/jira/browse/SOLR-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2876:
Priority: Trivial (was: Major)
Issue Type: Improvement (was: Bug)

I don't see anything else that looks wrong, so what do people think about doing this?

Precedence operator in conditionals with ternary operator needs to be examined.
Key: SOLR-2876
URL: https://issues.apache.org/jira/browse/SOLR-2876
Project: Solr
Issue Type: Improvement
Affects Versions: 3.5, 4.0
Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
Labels: operator, precedence, ternary
Attachments: SOLR-2876.patch

This is an offshoot of 2829, where the root of the bug was that precedence in the ternary operator, along with && without appropriate parentheses, was a problem:

this.parser == null ? other.parser == null : this.parser.getClass() == other.parser.getClass()

(from ShortFieldSource.java). So that got me curious whether this pattern was repeated. A quick grep with the following REs produced one hit that wasn't related to 2829 with && and more with || (3x code base). I'll try to get to it over the weekend. Please don't grab it just yet, I'm fixing this partially for 2829, but if anyone wants to try the grep and see if I'm hallucinating, I'd appreciate it. I'd *really* appreciate any tests for things people see...

Some of the returns are false hits, but not others. See SolrIndexSearcher.getDocListAndSetNC(); the last line is:

return pf.filter==null && pf.postFilter==null ? qr.getDocSet() : null;

REs (using them in IntelliJ):

\|\|[\sa-z\.0-9A-Z]+==.*\?
&&[\sa-z\.0-9A-Z]+==.*\?

I got some hits with the above and didn't pursue it any further, but if anyone wants to suggest more comprehensive REs, please attach. I'm trying for && or || followed by anything without an open parenthesis, followed by ==, followed by anything, followed by ?. I'd rather get a manageable number of false positives than miss things.
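If anyone wants to try the grep outside IntelliJ, the REs can be run with `java.util.regex`, roughly as sketched below. The || pattern is the one quoted in the issue; the && pattern assumes the mail archive dropped a leading "&&" from the second RE. `TernaryGrepSketch` and `isSuspicious` are hypothetical names, and the sample lines are invented.

```java
import java.util.regex.Pattern;

/** Hypothetical scanner, not from the patch: flags lines where && or || is followed by an
 *  unparenthesized == comparison and, later on the line, a '?'. */
final class TernaryGrepSketch {

    // Quoted from the issue.
    static final Pattern OR_PATTERN = Pattern.compile("\\|\\|[\\sa-z\\.0-9A-Z]+==.*\\?");
    // Assumption: the archive stripped "&&" from the front of the second RE.
    static final Pattern AND_PATTERN = Pattern.compile("&&[\\sa-z\\.0-9A-Z]+==.*\\?");

    /** True if either pattern matches anywhere in the line. */
    static boolean isSuspicious(String line) {
        return OR_PATTERN.matcher(line).find() || AND_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isSuspicious("return a == null && b == null ? x : y;"));   // true
        System.out.println(isSuspicious("return a == null && (b == null) ? x : y;")); // false
    }
}
```

Because the character class excludes "(", an operand that opens with a parenthesis is skipped, which matches the stated goal of tolerating some false positives rather than missing real hits.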
[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields
[ https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2829: - Attachment: SOLR-2829.patch Final patch. Renamed variable as per Hoss. I hate it when he's right. Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields -- Key: SOLR-2829 URL: https://issues.apache.org/jira/browse/SOLR-2829 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.4, 4.0 Environment: N/A Reporter: Erick Erickson Assignee: Erick Erickson Priority: Blocker Fix For: 3.5 Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch I don't know how generic this is, whether it's just a problem with fqs when combined with spatial or whether it has wider applicability , but here's what I know so far. Marc Tinnemeyer in a post titled: Regarding geodist and multiple location fields outlines this. I checked this on 3.4 and trunk and it's weird in both cases. HOLD THE PRESSES: After looking at this a bit more, it looks like a caching issue, NOT a geodist issue. When I bounce Solr between changing the sfield from home to work, it seems to work as expected. H, very strange. If I comment out BOTH the filterCache and queryResultCache then it works fine. Switching from home to work in the query finds/fails to find the document. But commenting out only one of those caches doesn't fix the problem. on trunk I used this query; just flipping home to work and back: http://localhost:8983/solr/select?q=id:1fq={!geofilt sfield=home pt=52.67,7.30 d=5} The info below is what I used to test. From Marc's posts: field name=home type=location indexed=true stored=true/ field name=work type=location indexed=true stored=true/ field name=elsewhere type=location indexed=true stored=true/ At first I thought so too. Here is a simple document. 
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">first</field>
    <field name="work">48.60,11.61</field>
    <field name="home">52.67,7.30</field>
  </doc>
</add>

and here is the result that shouldn't be:

<response>
...
<str name="q">*:*</str>
<str name="fq">{!geofilt sfield=work pt=52.67,7.30 d=5}</str>
...
</lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="home">52.67,7.30</str>
    <str name="id">1</str>
    <str name="name">first</str>
    <str name="work">48.60,11.61</str>
  </doc>
</result>
</response>

Yonik's comment: It's going to be a bug in an equals() implementation somewhere in the query. The top level equals will be SpatialDistanceQuery.equals() (from LatLonField.java). On trunk, I already see a bug introduced when the new lucene field cache stuff was done. DoubleValueSource now just inherits its equals method from NumericFieldCacheSource... and that equals() method only tests if the CachedArrayCreator.getClass() is the same! That's definitely wrong. I don't know why 3x would also have this behavior (unless there's more than one bug!). Anyway, first step is to modify the spatial tests to catch the bug... from there it should be pretty easy to debug.
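To make the failure mode concrete, here is a standalone sketch of how an equals() that only compares classes lets a cache return the wrong entry. None of these classes are Solr's; `EqualsCacheSketch`, `BuggyKey` and `FixedKey` are hypothetical stand-ins, and a plain `HashMap` plays the role of the filterCache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-ins, not Solr classes: cache keys with and without a field-aware equals(). */
final class EqualsCacheSketch {

    /** Buggy key: equality only checks the class, so keys for "home" and "work" collide. */
    static final class BuggyKey {
        final String field;
        BuggyKey(String field) { this.field = field; }
        @Override public boolean equals(Object o) {
            return o != null && o.getClass() == getClass();
        }
        @Override public int hashCode() { return getClass().hashCode(); }
    }

    /** Fixed key: equality also compares the field name. */
    static final class FixedKey {
        final String field;
        FixedKey(String field) { this.field = field; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && Objects.equals(field, ((FixedKey) o).field);
        }
        @Override public int hashCode() { return Objects.hashCode(field); }
    }

    /** Caches a result under firstKey, then looks up secondKey; returns the hit or null. */
    static String lookupAfterCaching(Object firstKey, Object secondKey) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(firstKey, "result for first query");
        return cache.get(secondKey);
    }

    public static void main(String[] args) {
        // Stale hit: the "work" lookup is answered from the "home" entry.
        System.out.println(lookupAfterCaching(new BuggyKey("home"), new BuggyKey("work")));
        // Miss, as it should be: prints null.
        System.out.println(lookupAfterCaching(new FixedKey("home"), new FixedKey("work")));
    }
}
```

This mirrors the symptom in the report: flipping sfield between home and work returns the previously cached answer until the caches are disabled or Solr is restarted.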
[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields
[ https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2829:

    Fix Version/s: 4.0

Added fix version of 4.0.

Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

    Key: SOLR-2829
    URL: https://issues.apache.org/jira/browse/SOLR-2829
    Project: Solr
    Issue Type: Bug
    Components: search
    Affects Versions: 3.4, 4.0
    Environment: N/A
    Reporter: Erick Erickson
    Assignee: Erick Erickson
    Priority: Blocker
    Fix For: 3.5, 4.0
    Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch

I don't know how generic this is, whether it's just a problem with fqs when combined with spatial or whether it has wider applicability, but here's what I know so far. Marc Tinnemeyer outlines this in a post titled "Regarding geodist and multiple location fields". I checked this on 3.4 and trunk and it's weird in both cases.

HOLD THE PRESSES: After looking at this a bit more, it looks like a caching issue, NOT a geodist issue. When I bounce Solr between changing the sfield from home to work, it seems to work as expected.

Hmm, very strange. If I comment out BOTH the filterCache and queryResultCache then it works fine. With both disabled, switching from home to work in the query correctly finds or fails to find the document. But commenting out only one of those caches doesn't fix the problem.

On trunk I used this query, just flipping home to work and back:

    http://localhost:8983/solr/select?q=id:1&fq={!geofilt sfield=home pt=52.67,7.30 d=5}

The info below is what I used to test. From Marc's posts:

    <field name="home" type="location" indexed="true" stored="true"/>
    <field name="work" type="location" indexed="true" stored="true"/>
    <field name="elsewhere" type="location" indexed="true" stored="true"/>

At first I thought so too. Here is a simple document.

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">first</field>
        <field name="work">48.60,11.61</field>
        <field name="home">52.67,7.30</field>
      </doc>
    </add>

and here is the result that shouldn't be:

    <response>
    ...
    <str name="q">*:*</str>
    <str name="fq">{!geofilt sfield=work pt=52.67,7.30 d=5}</str>
    ...
    </lst>
    </lst>
    <result name="response" numFound="1" start="0">
      <doc>
        <str name="home">52.67,7.30</str>
        <str name="id">1</str>
        <str name="name">first</str>
        <str name="work">48.60,11.61</str>
      </doc>
    </result>
    </response>

Yonik's comment:

It's going to be a bug in an equals() implementation somewhere in the query. The top level equals will be SpatialDistanceQuery.equals() (from LatLonField.java).

On trunk, I already see a bug introduced when the new lucene field cache stuff was done. DoubleValueSource now just inherits its equals method from NumericFieldCacheSource... and that equals() method only tests if the CachedArrayCreator.getClass() is the same! That's definitely wrong.

I don't know why 3x would also have this behavior (unless there's more than one bug!)

Anyway, first step is to modify the spatial tests to catch the bug... from there it should be pretty easy to debug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
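Yonik's diagnosis can be illustrated with a small, self-contained Java sketch. It is not Solr code: ClassOnlyKey, FieldAwareKey and cachedResultFor are hypothetical names, and a plain HashMap stands in for the filterCache. The point is only that a cache key whose equals() ignores the field name makes a lookup for "work" return the entry stored for "home".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class FilterCacheEqualsSketch {

    /** Hypothetical cache key whose equals() compares only the class. */
    static class ClassOnlyKey {
        final String field;
        ClassOnlyKey(String field) { this.field = field; }

        // The field name is never consulted, so keys for different
        // fields are treated as the same key.
        @Override public boolean equals(Object o) {
            return o != null && o.getClass() == getClass();
        }
        @Override public int hashCode() { return getClass().hashCode(); }
    }

    /** Hypothetical cache key that also compares the field name. */
    static class FieldAwareKey {
        final String field;
        FieldAwareKey(String field) { this.field = field; }

        @Override public boolean equals(Object o) {
            return o != null && o.getClass() == getClass()
                && field.equals(((FieldAwareKey) o).field);
        }
        @Override public int hashCode() { return Objects.hash(getClass(), field); }
    }

    /** Stores a result under storedKey, then looks it up with lookupKey. */
    static String cachedResultFor(Object storedKey, Object lookupKey) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(storedKey, "doc 1 matches");
        return cache.get(lookupKey);
    }

    public static void main(String[] args) {
        // Stale hit: the "work" lookup finds the entry cached for "home".
        System.out.println(cachedResultFor(new ClassOnlyKey("home"), new ClassOnlyKey("work")));
        // Miss: the field name takes part in equals(), so nothing is found.
        System.out.println(cachedResultFor(new FieldAwareKey("home"), new FieldAwareKey("work")));
    }
}
```

This matches the observed symptom in outline: with the caches disabled, no key comparison takes place and the filter is evaluated afresh each time.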
[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields
[ https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2829:

    Attachment: SOLR-2829-3x.patch

Attached the 3x patch, reconciling these is kinda unpleasant.

Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

    Key: SOLR-2829
    URL: https://issues.apache.org/jira/browse/SOLR-2829
    Project: Solr
    Issue Type: Bug
    Components: search
    Affects Versions: 3.4, 4.0
    Environment: N/A
    Reporter: Erick Erickson
    Assignee: Erick Erickson
    Priority: Blocker
    Fix For: 3.5, 4.0
    Attachments: SOLR-2829-3x.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch
[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields
[ https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-2829:

    Attachment: SOLR-2829.patch

Patch for the 3x code line, if I don't get any objections, I'll merge it with trunk and commit over the weekend. All tests pass.

The code changes aren't as interesting as the tests, anyone want to recommend improvements? I verified that the tests catch short, float, long, byte and double if the parens aren't added. Had to add a few types to the default schema.xml.

I realize that the tests specific to LatLon are redundant, they're caught by the double test. But I don't see any harm leaving them in.

Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

    Key: SOLR-2829
    URL: https://issues.apache.org/jira/browse/SOLR-2829
    Project: Solr
    Issue Type: Bug
    Components: search
    Affects Versions: 3.4, 4.0
    Environment: N/A
    Reporter: Erick Erickson
    Assignee: Erick Erickson
    Priority: Blocker
    Fix For: 3.5
    Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch
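The message does not show the code change behind "if the parens aren't added". One plausible reading, and it is only an assumption, is a conditional expression inside an equals() method that lacked parentheses. The sketch below uses a hypothetical Source class (not the Solr or Lucene source) to show how Java's precedence rules make `a && b ? c : d` parse as `(a && b) ? c : d`, which lets a field mismatch fall through to the second branch.

```java
class TernaryPrecedenceSketch {

    /** Hypothetical field-backed source; "parser" is an optional helper object. */
    static final class Source {
        final String field;
        final Object parser;

        Source(String field, Object parser) {
            this.field = field;
            this.parser = parser;
        }

        // Parsed as (sameField && parser == null) ? ... : ...
        // so when the fields differ, the result depends only on the parsers.
        boolean equalsWithoutParens(Source o) {
            return field.equals(o.field)
                && parser == null
                    ? o.parser == null
                    : o.parser != null && parser.getClass() == o.parser.getClass();
        }

        // The parentheses keep the field check in force for every case.
        boolean equalsWithParens(Source o) {
            return field.equals(o.field)
                && (parser == null
                    ? o.parser == null
                    : o.parser != null && parser.getClass() == o.parser.getClass());
        }
    }

    public static void main(String[] args) {
        Object parser = new Object();
        Source home = new Source("home", parser);
        Source work = new Source("work", parser);

        System.out.println(home.equalsWithoutParens(work)); // true: false positive
        System.out.println(home.equalsWithParens(work));    // false
    }
}
```

A regression test in this spirit would build two sources that differ only in the field name, once per numeric type, and require that they compare unequal.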