[jira] [Updated] (SOLR-3376) SolrCloud: Specifying shardId not working correctly, although the failures are inconsistent.

2012-04-18 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3376:
-

Description:
I'm seeing some odd results when specifying the shardId parameter. I'm trying the
4-node, 2-shard example from the Wiki and specifying shardIds like this:
{{{
dir       shardId  start order  running ZK  port
example   1        1            y           8983
example2  2        2            y           7574
example3  1        3            y           8900
example4  2        4            y           7500
}}}
And I'm waiting a bit between starting various examples to let ZK settle down.

Once all of them are started, I was looking at
http://localhost:8983/solr/#/~cloud?view=graph to check out what that looks
like (pretty cool IMO, especially since I didn't have to do it). The problem
was that shard 2 only reported a single instance, while shard1 showed the two
instances I was expecting. I'm running with 3 embedded ZK instances, just for
yucks. Interestingly the node that didn't show up was the only node that was
NOT running ZK.
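
As a cross-check on what the cloud graph should show, the table above can be turned into an expected shard-to-node mapping. This is only a sketch in Python. It assumes the table rows read as example/1/8983, example2/2/7574, example3/1/8900 and example4/2/7500, and the names NODES and expected_shards are made up for illustration.

```python
from collections import defaultdict

# (directory, shardId, jetty port), as read from the table in the description.
# Assumption: the flattened rows parse this way.
NODES = [
    ("example", 1, 8983),
    ("example2", 2, 7574),
    ("example3", 1, 8900),
    ("example4", 2, 7500),
]


def expected_shards(nodes):
    """Group jetty ports by the shardId each node was started with."""
    shards = defaultdict(list)
    for _directory, shard_id, port in nodes:
        shards[shard_id].append(port)
    return dict(shards)


if __name__ == "__main__":
    for shard_id, ports in sorted(expected_shards(NODES).items()):
        print(f"shard{shard_id}: {ports}")
```

With this reading, each shard should list two nodes, so a graph that shows only one node under shard 2 is missing one of 7574 or 7500.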

When I removed all the shardId parameters, nuked zoo_data from all
directories and just started them up (with numShards=2 on the bootstrap ZK
node), all 4 nodes showed up just fine.

When starting with shardId specified and trying to go straight to the admin
interface on the node that wasn't showing up, I'd get odd errors like "This
interface requires that you activate the admin request handlers, add the
following configuration to your solrconfig.xml:". I also couldn't search
directly on that machine; http://localhost:7574/solr/select?q=*:* returns a
404 error.

Command starting server that's giving me trouble:
java -Xmx1G -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DshardId=2 -jar start.jar

Command for one that works fine:
java -Xmx1G -Djetty.port=8900 -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DshardId=1 -jar start.jar
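
The two commands differ only in the port, the shardId and whether -DzkRun is present, which is easier to see when they are generated from data. The following Python sketch does that. The Node class and start_command helper are hypothetical names, and only the two nodes whose commands are quoted above are included.

```python
from dataclasses import dataclass

# ZooKeeper ensemble string copied from the commands in the description.
ZK_HOSTS = "localhost:9983,localhost:8574,localhost:9900"


@dataclass
class Node:
    port: int       # value for -Djetty.port
    shard_id: int   # value for -DshardId
    runs_zk: bool   # whether -DzkRun is passed (embedded ZooKeeper)


def start_command(node):
    """Build the java command line for one example node."""
    args = ["java", "-Xmx1G", f"-Djetty.port={node.port}"]
    if node.runs_zk:
        args.append("-DzkRun")
    args.append(f"-DzkHost={ZK_HOSTS}")
    args.append(f"-DshardId={node.shard_id}")
    args += ["-jar", "start.jar"]
    return " ".join(args)


TROUBLE_NODE = Node(port=7500, shard_id=2, runs_zk=False)
WORKING_NODE = Node(port=8900, shard_id=1, runs_zk=True)

if __name__ == "__main__":
    print(start_command(TROUBLE_NODE))
    print(start_command(WORKING_NODE))
```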

Sami Siren reports similar issues via e-mail conversation. Sami says
that ZK 3.3.5 apparently (without exhaustive tests) fixed the problem for him,
but when I tried ZK 3.3.5 I saw the same issue. Of course with all the recent
stuff with Ivy, I may have screwed up when/where the JARs were.

So then I went back to ZK 3.3.4 and couldn't reproduce the problem. Which seems
highly suspicious to me. It was failing every time before with 3.3.4, so it
sounds like gremlins.

And then I tried ZK 3.3.5 again (changed the ivy.xml in solrj, blew away the ZK
3.3.4, rebuilt, removed zoo_data, recopied example to three other directories)
and it works fine there too now. Sh. Mostly this is a placeholder to
ensure we try this; I guarantee that sys admins will want to assign specific
machines to specific shards, so this'll get used.


[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field

2012-03-27 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2242:
-

Attachment: SOLR-2242-3x.patch

This patch applies against the 3.x code line. Bill, you might want to check it;
I had to do some merging by hand.

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2242-3x.patch, SOLR-2242-solr40-3.patch, 
 SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR-2242.solr35.patch, SOLR.2242.solr3.1.patch


 When returning facet.field=<name of field> you will get a list of matches for 
 distinct values. This is normal behavior. This patch tells you how many 
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int><int name="11.5">1</int>
     <int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int>
     <int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int>
     <int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int>
     <int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
   </lst>
 </lst>
 {code} 
 Several people use this to get the group.field count (the # of groups).
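
 For readers who want to check the count by hand, here is a minimal Python sketch that parses a facet response like the one above and compares the number of distinct terms with the reported numFacetTerms. It assumes the response fragment reads as shown once its markup is restored (in particular that the first term is 0.0 with a count of 3). The function name count_facet_terms is made up, and facet.numFacetTerms is the parameter proposed by this patch.

```python
import xml.etree.ElementTree as ET

# Facet fragment from the description, with the XML markup restored (assumed).
RESPONSE = """
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int>
    <int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int>
    <int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int>
    <int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int>
    <int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
"""


def count_facet_terms(xml_text, field):
    """Return (reported, counted) distinct-term totals for one facet field."""
    root = ET.fromstring(xml_text)
    field_node = root.find(f"lst[@name='{field}']")
    if field_node is None:
        raise KeyError(field)
    reported = None
    counted = 0
    for entry in field_node.findall("int"):
        if entry.get("name") == "numFacetTerms":
            reported = int(entry.text)
        else:
            counted += 1  # every other <int> is one distinct facet value
    return reported, counted


if __name__ == "__main__":
    print(count_facet_terms(RESPONSE, "price"))
```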

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-3277) Dynamic fields do not respect concrete fields that happen to match a pattern.

2012-03-26 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3277:
-

Description: 
Here's a fragment of a schema file:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title_text" type="text_general" indexed="true" stored="true" multiValued="false" />
  <field name="title_phonetic" type="phonetic" indexed="true" stored="true" multiValued="false" />

  <dynamicField name="*_text" type="text_general" indexed="true" stored="false" />
  <dynamicField name="*_phonetic" type="phonetic" indexed="true" stored="false" />
</fields>
<copyField source="*_text" dest="*_phonetic" />

Here's an input doc:
<add>
  <doc>
    <field name="id">ID1</field>
    <field name="title_text">1st Document</field>
    <field name="description_text">Another field</field>
  </doc>
</add>

OK, add the doc with the above schema, and do a q=*:*&fl=*

The response does NOT contain title_phonetic.

It looks like IndexSchema.registerCopyField won't notice that
title_phonetic is a non-dynamic field and make a title_text ->
title_phonetic mapping.
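
The behaviour I'd expect can be sketched in a few lines of Python. This is an illustration of the expected mapping only, not how IndexSchema.registerCopyField is written. The helper names are hypothetical, and the sketch only handles patterns of the form *_suffix.

```python
# Concrete (non-dynamic) fields and dynamic suffixes from the schema fragment.
CONCRETE_FIELDS = {"id", "title_text", "title_phonetic"}
DYNAMIC_SUFFIXES = ("_text", "_phonetic")


def copy_destination(source_pattern, dest_pattern, field_name):
    """Return the copyField destination for field_name, or None if the
    source pattern does not match. Only leading-* patterns are handled."""
    if not (source_pattern.startswith("*") and dest_pattern.startswith("*")):
        raise ValueError("this sketch only handles patterns like *_suffix")
    suffix = source_pattern[1:]
    if not field_name.endswith(suffix):
        return None
    stem = field_name[: len(field_name) - len(suffix)]
    return stem + dest_pattern[1:]


def is_known_field(name):
    """A destination is valid if it is concrete or matches a dynamic suffix."""
    return name in CONCRETE_FIELDS or name.endswith(DYNAMIC_SUFFIXES)


if __name__ == "__main__":
    for source in ("title_text", "description_text", "id"):
        dest = copy_destination("*_text", "*_phonetic", source)
        print(source, "->", dest)
```

Under this reading title_text should be copied to the concrete field title_phonetic, just as description_text is copied to the dynamic description_phonetic.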




 Dynamic fields do not respect concrete fields that happen to match a pattern.
 -

 Key: SOLR-3277
 URL: https://issues.apache.org/jira/browse/SOLR-3277
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6, 4.0
Reporter: Erick Erickson
Priority: Minor
 Fix For: 4.0




[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2012-03-22 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2921:
-

Attachment: SOLR-2921-trunk.patch
SOLR-2921-3x.patch

3x r:1303937
Trunk r: 1303939

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch, 
 SOLR-2921-3x.patch, SOLR-2921-trunk.patch


 SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr 
 to automatically assemble a multiterm analyzer that does the right thing 
 vis-a-vis transforming the individual terms of a multi-term query at query 
 time. Examples are: lower casing, folding accents, etc. Currently 
 (27-Nov-2011), the following classes implement MultiTermAwareComponent:
  * ASCIIFoldingFilterFactory
  * LowerCaseFilterFactory
  * LowerCaseTokenizerFactory
  * MappingCharFilterFactory
  * PersianCharFilterFactory
 When users put any of the above in their query analyzer, Solr will do the 
 right thing at query time and the perennial question users have, "why didn't 
 my wildcard query automatically lower-case (or accent fold or) my terms?", 
 will be gone. Die question die!
 But taking a quick look, for instance, at the various FilterFactories that 
 exist, there are a number of possibilities that *might* be good candidates 
 for implementing MultiTermAwareComponent. But I really don't understand the 
 correct behavior here well enough to know whether these should implement the 
 interface or not. And this doesn't include other CharFilters or Tokenizers.
 Actually implementing the interface is often trivial, see the classes above 
 for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which 
 is the right thing in this case.
 Here is a quick cull of the Filters that, just from their names, might be 
 candidates. If anyone wants to take any of them on, that would be great. If 
 all you can do is provide test cases, I could probably do the code part, just 
 let me know.
 ArabicNormalizationFilterFactory
 GreekLowerCaseFilterFactory
 HindiNormalizationFilterFactory
 ICUFoldingFilterFactory
 ICUNormalizer2FilterFactory
 ICUTransformFilterFactory
 IndicNormalizationFilterFactory
 ISOLatin1AccentFilterFactory
 PersianNormalizationFilterFactory
 RussianLowerCaseFilterFactory
 TurkishLowerCaseFilterFactory
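
 To make the wildcard problem concrete, here is a small Python sketch (not Solr code) of why a multi-term query needs the same lower-casing and accent folding that was applied at index time. The fold function is a stand-in for a lower-case plus ASCII-folding filter chain, and the term list is invented for illustration.

```python
import fnmatch
import unicodedata


def fold(term):
    """Lower-case and strip combining accents, roughly what a lower-case
    filter followed by an ASCII-folding filter would do to a term."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Terms as they would sit in the index after index-time analysis.
INDEXED_TERMS = sorted({fold(word) for word in ["Solr", "Résumé", "search"]})


def wildcard_matches(pattern, analyze):
    """Match a wildcard pattern against the indexed terms, optionally running
    the pattern through the same folding first (the multiterm analyzer)."""
    query = fold(pattern) if analyze else pattern
    return [term for term in INDEXED_TERMS if fnmatch.fnmatchcase(term, query)]


if __name__ == "__main__":
    print(wildcard_matches("Sol*", analyze=False))
    print(wildcard_matches("Sol*", analyze=True))
    print(wildcard_matches("Rés*", analyze=True))
```

 Without the query-side folding, "Sol*" finds nothing because the index only holds "solr"; with it, the pattern becomes "sol*" and matches.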


[jira] [Updated] (SOLR-3265) TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance

2012-03-22 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3265:
-

Affects Version/s: (was: 4.0)

 TestSolrEntityProcessorEndToEnd fails if you have a running Solr instance
 -

 Key: SOLR-3265
 URL: https://issues.apache.org/jira/browse/SOLR-3265
 Project: Solr
  Issue Type: Test
Affects Versions: 3.6
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 3.6

 Attachments: SOLR-3265.patch


 When running "ant test" from the command line in 3.x, if you have a Solr 
 server running then TestSolrEntityProcessorEndToEnd fails since it uses the 
 default port (stack trace with "address already in use"). This should use 
 some other port, especially as 3.x "ant test" is taking 50+ minutes and I 
 often open up a server to look at something else.
 In 4.0, some of the cloud tests also use 8983 as a port. Should these be 
 changed too?
 And just to make my life *especially* interesting, at least one test puts the 
 string 8983 in a document, which doesn't have to be changed <G>...
 Of course you can start your local server on a different port, but this seems 
 trappy.
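
 One common way to avoid a hard-coded port in tests is to ask the operating system for a free one by binding to port 0. The Python sketch below shows the idea only; it is not what the attached patch does, and find_free_port is a made-up helper name.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS picks an unused port, then report it.

    Note: the port is released when the socket closes, so another process
    could in principle grab it before the test server starts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("test server could listen on port", find_free_port())
```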


[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2012-03-21 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2921:
-

Attachment: SOLR-2921-3x.patch

Here's a first cut at these. The tests in TestFoldingMultitermExtrasQuery are
especially weak; any help here would be extremely welcome.

Basically, I stole the patterns from the associated filters and removed the
ones that failed for reasons I didn't understand. And I haven't checked the
remaining ones all that carefully; I have some stuff coming up for most of the
rest of today and wanted to get the first cut out in front of people.

The attached patch applies against 3x, I'll need to tweak it for trunk but
won't bother until after we finalize this.

I also haven't run the full test suite, so this patch should NOT be committed
yet.

I'm not even going to try the following, I don't even know what to expect as
proper results. If nobody steps up I'll split these out into another JIRA and
hopefully someone with the appropriate knowledge (and keyboard) can volunteer:
ArabicNormalizationFilterFactory
HindiNormalizationFilterFactory
IndicNormalizationFilterFactory
PersianNormalizationFilterFactory
ICUTransformFilterFactory

Make any Filters, Tokenizers and CharFilters implement
MultiTermAwareComponent if they should
-

Key: SOLR-2921
URL: https://issues.apache.org/jira/browse/SOLR-2921
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-2921-3x.patch



[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2012-03-21 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2921:
-

Attachment: SOLR-2921-3x.patch

Fixes test cases in analysis-extras so they run from the command line, not only 
in IntelliJ.

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-2921-3x.patch, SOLR-2921-3x.patch




[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents

2012-03-20 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


Fix Version/s: (was: 3.6)

 Update Handlers abort with bad documents
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Erick Erickson
 Fix For: 4.0

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch? I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
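
 The two options can be sketched in Python as follows. This is a toy model, not Solr's update handler: the "index" is a plain list, validation only checks the id and the date format, and all names (validate, add_all_or_nothing, add_skipping_bad) are hypothetical.

```python
from datetime import datetime

# Assumed date format for this sketch (ISO 8601 with a trailing Z).
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

BATCH = [
    {"id": "1"},
    {"id": "2", "myDateField": "I_AM_A_BAD_DATE"},
    {"id": "3"},
]


def validate(doc):
    """Raise ValueError if the document cannot be indexed."""
    if "id" not in doc:
        raise ValueError("missing id")
    if "myDateField" in doc:
        datetime.strptime(doc["myDateField"], DATE_FORMAT)


def add_all_or_nothing(index, docs):
    """Option 1: validate the whole batch first, so one bad doc fails it all.
    The entire batch has to be held until validation finishes."""
    for doc in docs:
        validate(doc)
    index.extend(docs)


def add_skipping_bad(index, docs):
    """Option 2: add what can be added and report the failures."""
    errors = []
    for doc in docs:
        try:
            validate(doc)
        except ValueError as exc:
            errors.append((doc.get("id"), str(exc)))
            continue
        index.append(doc)
    return errors


if __name__ == "__main__":
    index = []
    print(add_skipping_bad(index, BATCH))
    print([doc["id"] for doc in index])
```

 Option 2 is what lets doc 3 through, at the cost of having to return the list of failures to the caller.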
  


[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2012-03-20 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2921:
-

Affects Version/s: (was: 3.6)

 Make any Filters, Tokenizers and CharFilters implement 
 MultiTermAwareComponent if they should
 -

 Key: SOLR-2921
 URL: https://issues.apache.org/jira/browse/SOLR-2921
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor



[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents

2012-03-20 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-445:


  Assignee: (was: Erick Erickson)
Issue Type: Improvement  (was: Bug)

 Update Handlers abort with bad documents
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
 Fix For: 4.0

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
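
A minimal sketch of Option 2 (record the failure and keep going), assuming nothing about Solr's actual update handler code. TolerantBatch, Result and addAll are hypothetical names, and date parsing stands in for whatever validation happens while indexing:

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names are Solr classes. */
class TolerantBatch {

    /** Outcome of one batch: ids that were added, plus an error message per failed id. */
    static final class Result {
        final List<String> added = new ArrayList<>();
        final Map<String, String> errors = new LinkedHashMap<>();
    }

    /**
     * Each doc is a two-element array {id, dateOrNull}.
     * A doc with an unparseable date is recorded in errors and skipped;
     * the remaining docs in the batch are still processed.
     */
    static Result addAll(List<String[]> docs) {
        Result result = new Result();
        for (String[] doc : docs) {
            String id = doc[0];
            try {
                if (doc[1] != null) {
                    LocalDate.parse(doc[1]); // validation stands in for indexing
                }
                result.added.add(id);
            } catch (DateTimeParseException e) {
                result.errors.put(id, e.getMessage());
            }
        }
        return result;
    }
}
```

With the three documents from the example above, ids 1 and 3 would end up in added and id 2 in errors, which is the extra information the API would have to return to the caller.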
  


[jira] [Updated] (SOLR-3196) partialResults response header not propagated in distributed search

2012-03-13 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3196:
-

Attachment: SOLR-3196-3x.patch

Patch didn't apply to 3x, apparently a few things moved around.

Russell: 
Could you take a quick check and see if this looks OK for 3x?

Any back-compat issues with changing what comes back in the responseHeader?

 partialResults response header not propagated in distributed search
 ---

 Key: SOLR-3196
 URL: https://issues.apache.org/jira/browse/SOLR-3196
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.5, 4.0
Reporter: Russell Black
  Labels: patch
 Attachments: SOLR-3196-3x.patch, SOLR-3196-partialResults-header.patch


 For {{timeAllowed=true}} requests, the response contains a {{partialResults}} 
 header that indicates when a search was terminated early due to running out 
 of time.  This header is being discarded by the collator.  Patch to follow.  


[jira] [Updated] (SOLR-3181) New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.

2012-03-06 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3181:
-

Attachment: SOLR-3181.patch

Should fix the problem with multiple escapes by using BytesRef.utf8ToString.

New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.
---

Key: SOLR-3181
URL: https://issues.apache.org/jira/browse/SOLR-3181
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 4.0
Environment: n/a
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-3181.patch, SOLR-3181.patch

When tracking down issues with ZK, the devs ask about various bits of data
from the cloud pages. It would be convenient to be able to just capture all
the data from the old /solr/admin/zookeeper.jsp page in the admin interface
to be able to send it to anyone debugging the info.
Perhaps just a "get debug info for Apache". Or, even cooler, "copy debug
info to clipboard" if that's possible. Is this just the raw data that the
cloud view is manipulating? It doesn't have to be pretty although indentation
would be nice.


[jira] [Updated] (SOLR-3079) Backport of Solr-1431 (CommComponent abstracted)

2012-02-16 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3079:
-

Attachment: SOLR-3079.patch

The patch isn't in SVN format, looks like you made it with Git? The git repo is 
a shadow repository, not used for released code as far as I know.

Through the magic of IntelliJ, I managed to apply the patch and I'm uploading 
that version. Can you take a look and see if it made it through the 
transformations OK?

And any Git people out there; is there magic to make Git produce an 
SVN-compatible patch? Seems like a good addition to the "How to contribute" 
page, lots of people seem to be using Git...

Beyond that, I'll run the tests with it and report back if there's a problem. 
I'd really like someone who knows what this is all about to take a look before 
committing.

Meanwhile, keep prompting G

 Backport of Solr-1431 (CommComponent abstracted)
 

 Key: SOLR-3079
 URL: https://issues.apache.org/jira/browse/SOLR-3079
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 3.5
Reporter: Greg Bowyer
 Attachments: 0001-Initial-backport-of-solr-cloud-ShardHandler.patch, 
 SOLR-3079.patch


 Initial attempt at backporting the work done for Solr-1431 into the 3.x series


[jira] [Updated] (SOLR-3132) Reorganize LukeRequestHandler

2012-02-13 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3132:
-

Attachment: SOLR-3132.patch

Clears up SOLR-3121 crossed wires.

 Reorganize LukeRequestHandler
 -

 Key: SOLR-3132
 URL: https://issues.apache.org/jira/browse/SOLR-3132
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-3132.patch


 The LukeRequestHandler could be made much easier to follow, and the overloading 
 of numTerms is confusing. This was made possible by the work on SOLR-3121 and 
 that patch should be applied first.


[jira] [Updated] (SOLR-3121) Make new admin UI work better with big indexes

2012-02-12 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3121:
-

Attachment: SOLR-3121.patch

Ryan:

This looks great, it does what I'd hoped.

I've never been all that happy with how the LukeRequestHandler was organized,
so I've attached a patch that builds on yours and refactors LukeRequestHandler
a bit. The old structure would go out and do the detailed information-gathering
and then use it later, overloading numTerms all over the place. The patch just
tries to get the detailed info when it should. It does require the fl field to
get detailed info at any time though.

Your patch changed the way we request fields, which made it possible to
untangle the handler itself.

Take a look and let me know.

This re-structuring probably does NOT play nice with the old admin UI though,
we really need to decide whether to stop worrying about the old UI and just cut
over to this one. I know the new UI doesn't deal with cloud leaf-node expansion
yet, see SOLR-3116.

And it seems like this handles SOLR-3094 too.

Make new admin UI work better with big indexes
--

Key: SOLR-3121
URL: https://issues.apache.org/jira/browse/SOLR-3121
Project: Solr
Issue Type: Improvement
Affects Versions: 4.0
Reporter: Ryan McKinley
Fix For: 4.0

Attachments: SOLR-3121-luke-admin-ui.patch,
SOLR-3121-luke-admin-ui.patch, SOLR-3121.patch

As reported in SOLR-2667, the admin UI gets pretty bad with big indexes.
Mostly this seems the fault of excessive calls to luke and not limiting the
number of terms


[jira] [Updated] (SOLR-3121) Make new admin UI work better with big indexes

2012-02-12 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3121:
-

Attachment: SOLR-3121.patch

Small change that restores the old Admin UI behavior.

NOTE: The old UI behavior is going to be slow for large indexes since it does 
the enumeration of all the fields when you click schema browser. The right 
fix is to incorporate the new parameters in the right place in the old admin 
UI, but at least this doesn't change the old behavior, it just doesn't make it 
as nice as the new.

 Make new admin UI work better with big indexes
 --

 Key: SOLR-3121
 URL: https://issues.apache.org/jira/browse/SOLR-3121
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: SOLR-3121-luke-admin-ui.patch, 
 SOLR-3121-luke-admin-ui.patch, SOLR-3121.patch, SOLR-3121.patch


 As reported in SOLR-2667, the admin UI gets pretty bad with big indexes.  
 Mostly this seems the fault of excessive calls to luke and not limiting the 
 number of terms


[jira] [Updated] (SOLR-3111) LukeRequestHandler does not properly handle multi-field fl params. Wildcard should also be honored

2012-02-08 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3111:
-

Attachment: SOLR-3111-3x.patch
SOLR-3111.patch

NOTE: this needs to be applied after SOLR-1931

 LukeRequestHandler does not properly handle multi-field fl params. Wildcard 
 should also be honored
 --

 Key: SOLR-3111
 URL: https://issues.apache.org/jira/browse/SOLR-3111
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-3111-3x.patch, SOLR-3111.patch


 Specifying fl=field1 field2 for the LukeRequestHandler results in trying 
 to find a single field named, you guessed it, "field1 field2".
 Additionally, it makes sense, for some future enhancements, to support fl=*.
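
A rough sketch of the intended behavior, not the code in the attached patches: split the fl value on commas and whitespace so each field is looked up separately, and treat a bare * as "all fields". FieldListParser and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of LukeRequestHandler. */
class FieldListParser {

    /** Splits an fl value such as "field1 field2" or "field1,field2" into individual names. */
    static List<String> parse(String fl) {
        List<String> fields = new ArrayList<>();
        if (fl == null) {
            return fields;
        }
        for (String name : fl.trim().split("[,\\s]+")) {
            if (!name.isEmpty()) {
                fields.add(name);
            }
        }
        return fields;
    }

    /** True when the caller asked for every field via the "*" wildcard. */
    static boolean wantsAllFields(List<String> fields) {
        return fields.contains("*");
    }
}
```

Splitting on both separators is an assumption made here for convenience; the handler could just as well accept only one of them.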


[jira] [Updated] (SOLR-3102) Document WordDelimiterFilterFactory types parameter.

2012-02-06 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3102:
-

Attachment: SOLR-3102.patch

Trivial patch updating javadocs to include types parameter

 Document WordDelimiterFilterFactory types parameter.
 --

 Key: SOLR-3102
 URL: https://issues.apache.org/jira/browse/SOLR-3102
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
  Labels: Javadocs
 Attachments: SOLR-3102.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 SOLR-2059 added the ability to customize the mapping of specific characters 
 to types (e.g. # could be considered an ALPHA character if desired). But there's 
 no documentation showing that this is an option. The Javadoc for the factory 
 and the Wiki should have this added.


[jira] [Updated] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified

2012-02-03 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3017:
-

Attachment: SOLR-3017.patch

new version that:
1 removes the new schema file and just modifies schema12 instead. All tests
pass with this change.

2 Adds null check to setStopwordFilterFactoryClass rather than where it's
called.

I guess theoretically someone could override this class, override
setStopwordFilterFactoryClass, call it with null and set the member var to null
then encounter an NPE in noStopwordFilterAnalyzer which they couldn't fix due
to scope issues. But that doesn't sound like something we need to guard against
at this point.

If nobody objects, I'll commit this over the weekend or early next week.

Allow edismax stopword filter factory implementation to be specified

Key: SOLR-3017
URL: https://issues.apache.org/jira/browse/SOLR-3017
Project: Solr
Issue Type: Improvement
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
Fix For: 4.0

Attachments: SOLR-3017-without-guava-alternative.patch,
SOLR-3017.patch, SOLR-3017.patch, edismax_stop_filter_factory.patch

Currently, the edismax query parser assumes that stopword filtering is being
done by StopFilter: the removal of the stop filter is performed by looking
for an instance of 'StopFilterFactory' (hard-coded) within the associated
field's analysis chain.
We'd like to be able to use our own stop filters whilst keeping the edismax
stopword removal goodness. The supplied patch allows the stopword filter
factory class to be supplied as a param, stopwordFilterClassName. If no
value is given, the default (StopFilterFactory) is used.
Another option I looked into was to extend StopFilterFactory to create our
own filter. Unfortunately, StopFilterFactory's 'create' method returns
StopFilter, not TokenStream. StopFilter is also final.


[jira] [Updated] (SOLR-3094) The statistics entry on the new admin UI is very slow

2012-02-03 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3094:
-

Attachment: SOLR-3094.patch

OK, anyone with good javascript skills, this would be a good time to chime in...

This is a variant of SOLR-1931. The new UI calls Luke at the top level in such
a way that it enumerates all the terms in all the fields to gather the
histogram data, which takes a long time. Note, this is what the old admin
UI/Luke handler did when you clicked schema browser link.

Once that data is accumulated, then clicking on the individual fields and
showing that data is very fast since the data is local. But this data is
accumulated *before* any field is selected from the schema browser drop-down
and stored away.

I think this design is too costly, especially the "get all the data for all the
fields up-front" bit. The users pay a penalty (many minutes demonstrated) even
when they may only care about one field. So here's what I propose.

1 Tweak the LukeRequestHandler so it *requires* the fieldName parameter to
gather the histogram data. That fixes the initial display of the stats issue
that sparked this JIRA. I can do that in a few minutes, patch attached (do not
commit yet, though). Problem is there is then no way at all to get the stats
data.

2 Tweak the javascript to call the luke request handler to collect the data
for individual fields only when the user selects them from the drop-down,
stowing them away at that point so they can be revisited if desired. Here's
where I could use some help, my javascript skills are rudimentary at best. If
anyone could work the javascript I'd be happy to field test. Or even just put
some comments in the code pointing me to them. Any trunk code from after 6-Jan
will have the right Luke handler in it (see SOLR-1931).
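
The fetch-on-selection idea in point 2 can be sketched as follows. This is written in Java purely for illustration (the real change would live in the admin UI's javascript), and LazyFieldStats and its loader are hypothetical names; the loader stands in for a per-field call to the Luke handler:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: loads stats for a field the first time it is selected, then reuses them. */
class LazyFieldStats<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final Function<String, V> loader;
    private int loads = 0;

    LazyFieldStats(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns cached stats for the field, calling the (expensive) loader only on first use. */
    V get(String field) {
        V stats = cache.get(field);
        if (stats == null) {
            stats = loader.apply(field);
            loads++;
            cache.put(field, stats);
        }
        return stats;
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loads;
    }
}
```

Nothing is fetched until a field is selected, so opening the page costs nothing, and revisiting a field costs nothing either.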

There's also something wrong with the display of the histogram, the bucket
and count in each bucket are mashed together on the bottom. With non-trivial
indexes, this is largely unreadable since they're side-by-side...

Anyway, the attached patch makes it so you can get into the admin page without
paying the above penalties, but you *never* get histogram data when you go into
schema browser. If someone applies this to work on the admin UI bit,
attaching fl=field1 field2 to the luke URL will cause the histogram data to
be returned for the field(s) specified.

If anyone has some spare cycles to help out here it would be outstanding.

I think something similar could be done for the old admin UI as well in terms
of only getting the fields when requested, otherwise the histogram data won't
be returned either...

The statistics entry on the new admin UI is very slow
-

Key: SOLR-3094
URL: https://issues.apache.org/jira/browse/SOLR-3094
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 4.0
Environment: trunk only, all environments
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-3094.patch

Prompted by Robert Reynolds (SOLR-2667), the entry point in the new Admin UI
core drill down (e.g. clicking "singlecore") takes a long time: 28-46
*minutes* on a 13M-23M doc set.
On an example Wikipedia index (11M docs), I see 21 seconds, compared to less
than 2 seconds in the old admin UI (I'm using the old admin UI linked to from
the new UI page on trunk). I have a very simple index layout compared to a
commercial site. Clearly something is not right. I suspect that all the terms
are being walked.
This is particularly an issue because this behavior happens when I click
singlecore, so getting to the really neat parts of the new UI is hard.
Robert reports on a separate thread that the same behavior happens just
hitting admin/luke in the URL which is also slow in the 3.x world, which
hints at where the problem lies.
I'm going to guess that the terms are being walked and we can use the tricks
used in SOLR-1931 to deal with the fact that admin/luke takes a long time,
and just change the call to the entry (singlecore) for this issue.
Robert: Thanks for pointing this out!


[jira] [Updated] (SOLR-3032) Deprecate logOnce from SolrException

2012-01-16 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3032:
-

Attachment: SOLR-3032-3x.patch

Just deprecates the various c'tors etc that are removed in the trunk patch.

 Deprecate logOnce from SolrException
 

 Key: SOLR-3032
 URL: https://issues.apache.org/jira/browse/SOLR-3032
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
  Labels: exceptions, logging
 Fix For: 4.0

 Attachments: SOLR-3032-3x.patch, SOLR-3032.patch


 There seems to be a growing consensus (well, Muir and Hoss agree at least) 
 that having this logOnce concept in SolrException is more trouble than it's 
 worth. Case in point: trunk (4x) fails to report anything useful in 
 the log file when you define a custom component and don't have any lib 
 statements going to the right place.
 So the proposal is to remove the whole logOnce process, supporting variables 
 etc. The first step here will be deprecating the various bits of code in 
 SolrException and starting to remove their usages.
 I'm opening this up for discussion, error reporting seems to be one of those 
 things that generates endless discussion and I'd like them aired before 
 putting too much work into this. My goal will be to have this in the code 
 base by next Tuesday, so speak up.


[jira] [Updated] (SOLR-3022) AbstractPluginLoader does not log caught exceptions

2012-01-12 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3022:
-

Attachment: SOLR-3022.patch

Final version of patch.

 AbstractPluginLoader does not log caught exceptions
 ---

 Key: SOLR-3022
 URL: https://issues.apache.org/jira/browse/SOLR-3022
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: James Dyer
Assignee: Erick Erickson
Priority: Trivial
 Fix For: 4.0

 Attachments: SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch, 
 SOLR-3022.patch


 I was setting up a new 4.x environment but forgot to put a custom Analyzer in 
 the classpath.  Unfortunately AbstractPluginLoader didn't log the exception 
 and it took a long time for me to figure out why No cores were created.  


[jira] [Updated] (SOLR-3022) AbstractPluginLoader does not log caught exceptions

2012-01-12 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-3022:
-

Affects Version/s: 3.6

 AbstractPluginLoader does not log caught exceptions
 ---

 Key: SOLR-3022
 URL: https://issues.apache.org/jira/browse/SOLR-3022
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6, 4.0
Reporter: James Dyer
Assignee: Erick Erickson
Priority: Trivial
 Fix For: 4.0

 Attachments: SOLR-3022.patch, SOLR-3022.patch, SOLR-3022.patch, 
 SOLR-3022.patch


 I was setting up a new 4.x environment but forgot to put a custom Analyzer in 
 the classpath.  Unfortunately AbstractPluginLoader didn't log the exception 
 and it took a long time for me to figure out why No cores were created.  


[jira] [Updated] (SOLR-3032) Deprecate logOnce from SolrException

2012-01-12 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-3032:
-

Attachment: SOLR-3032.patch

OK, here's a first cut. The rule I tried to follow (and I need to go over it
again with fresh eyes) was that if an exception was re-thrown, logging was
unnecessary so I took it out.
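
To illustrate the rule (a sketch only, not the code in the patch): the inner method rethrows with context and does not log, and the outermost caller logs exactly once. LogAtBoundary, loadPlugin and handleRequest are hypothetical names, and a plain list stands in for the logger:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not SolrException or SolrDispatchFilter. */
class LogAtBoundary {
    /** Stand-in for a real logger so the behavior is easy to observe. */
    static final List<String> LOG = new ArrayList<>();

    /** Rethrows with context; deliberately does not log, since the caller will. */
    static void loadPlugin(String className) {
        try {
            Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Could not load plugin " + className, e);
        }
    }

    /** Outermost boundary: the only place where the failure is logged. */
    static boolean handleRequest(String className) {
        try {
            loadPlugin(className);
            return true;
        } catch (RuntimeException e) {
            LOG.add(e.getMessage() + " (cause: " + e.getCause() + ")");
            return false;
        }
    }
}
```

Because the exception always reaches one place that logs it, there is no need for a "was this already logged?" flag.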

As a bonus, SolrConfig.severeErrors is gone as is all the stuff around
CoreContainer.abortOnConfigurationError.

Most of this is unutterably boring, but take a look at SolrDispatchFilter, the
real changes are there.

I'll add deprecation notices to the 3x code, but won't change anything else
there.

I'm putting this out for comments. All tests pass, but I'm not sure tests do
much to deal with logging so that probably only proves that things compile.

I'll look this over again tomorrow, then I expect I'll commit on Sunday/Monday
unless there are howls of protest.

And I just want to add that modern IDEs make this far too easy. Back in MY
day, *real* programmers used *real* editors. See: http://xkcd.com/378/

Deprecate logOnce from SolrException

Key: SOLR-3032
URL: https://issues.apache.org/jira/browse/SOLR-3032
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Labels: exceptions, logging
Fix For: 4.0

Attachments: SOLR-3032.patch

There seems to be a growing consensus (well, Muir and Hoss agree at least)
that having this logOnce concept in SolrException is more trouble than it's
worth. Case in point: trunk (4x) fails to report anything useful in
the log file when you define a custom component and don't have any lib
statements going to the right place.
So the proposal is to remove the whole logOnce process, supporting variables
etc. The first step here will be deprecating the various bits of code in
SolrException and starting to remove their usages.
I'm opening this up for discussion, error reporting seems to be one of those
things that generates endless discussion and I'd like them aired before
putting too much work into this. My goal will be to have this in the code
base by next Tuesday, so speak up.


[jira] [Updated] (SOLR-2987) ExternalFileField With Invalid TrieField Key

2012-01-10 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2987:
-

Attachment: SOLR-2987-3x.patch
SOLR-2987.patch

Latest patch

 ExternalFileField With Invalid TrieField Key
 

 Key: SOLR-2987
 URL: https://issues.apache.org/jira/browse/SOLR-2987
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.5, 3.6, 4.0
Reporter: Michael Garski
Priority: Minor
 Attachments: SOLR-2987-3x.patch, SOLR-2987.patch, eff_key_error.patch


 The current error handling in reading an external file field only catches an 
 error when parsing the float value on a line, which then skips that line.  If 
 the key field is a trie field, such as a TrieIntField, and the key value in 
 the file cannot be parsed to an int, loading of the entire file fails.  
 Shouldn't the call to get the indexed value of the key be in the same 
 try/catch as the float parsing for the line?
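
A minimal sketch of what the question suggests, assuming key=value lines and an int key; this is not the actual file-loading code, and ExternalFileSketch is a hypothetical name. Both the key and the float are parsed inside the same try/catch, so a bad key skips one line instead of failing the whole file:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of per-line error handling; not Solr's file loader. */
class ExternalFileSketch {

    /** Parses lines of the form "key=value"; any line with an unparseable key or value is skipped. */
    static Map<Integer, Float> load(List<String> lines) {
        Map<Integer, Float> values = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // no separator, nothing to parse
            }
            try {
                // Key and value parsing share one try/catch, so either failure skips only this line.
                int key = Integer.parseInt(line.substring(0, eq).trim());
                float value = Float.parseFloat(line.substring(eq + 1).trim());
                values.put(key, value);
            } catch (NumberFormatException e) {
                // Skip the bad line and keep loading the rest of the file.
            }
        }
        return values;
    }
}
```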


[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes

2012-01-04 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-1931:
-

Attachment: SOLR-1931-trunk.patch
SOLR-1931-3x.patch

Final patches attached. All honor unto whoever wrote the tests for the binary
writers, I discovered that a TreeMap is unacceptable. In other words, all the
tests pass now.

Unless there are objections, I intend to commit these tomorrow or Friday.

Schema Browser does not scale with large indexes

Key: SOLR-1931
URL: https://issues.apache.org/jira/browse/SOLR-1931
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 3.6, 4.0
Reporter: Lance Norskog
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch,
SOLR-1931-3x.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch,
SOLR-1931-trunk.patch, SOLR-1931-trunk.patch

The Schema Browser JSP by default causes the Luke handler to scan the
world. In large indexes this makes the UI useless.
On an index with 64m documents & 8gb of disk space, the Schema Browser took 6
minutes to open and hogged all disk I/O, making Solr useless.


[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes

2012-01-02 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-1931:
-

Attachment: SOLR-1931-3x.patch
SOLR-1931-trunk.patch

Thanks Robert and Yonik for pointing me at the new 4x capabilities, they make a
huge difference. But you knew that.

The killer for 3.x was getting the document counts via a range query, I don't
think there's a good way to get the counts and not pay the penalty, so there's
a new parameter recordDocCounts.

Here's my latest and close-to-last cut at this, both for 3x and 4x.

The data set is 89M documents, times in seconds.

3.5
  637  getting doc counts

3x with this patch
  552  getting doc counts
   53  stats without doc counts, but histogram etc. No option to do this
       before.

4x, original
  450  or so as I remember, getting doc counts, histograms, etc.

4x with patch, histograms still work
  158  getting the doc counts the old way (span queries). I mean, you guys
       *said* ranges were going to be faster.
   39  getting the doc counts with terms.getDocCount() (including histograms)

Here's my proposal. I'll probably commit this next weekend at the latest unless
there are objections:

1. I'll let these stew for a couple of days, and look them over again. Anyone
   who wants to look too, please feel free.

2. Live with getting the doc counts in 4x including the deleted docs and remove
   the reportDocCount parameter (it'll live in 3.6 and other 3x versions). I
   think the performance is fine without carrying that kind of kludgy option
   forward. I could be persuaded otherwise, but an optimized index will take
   care of the counting-of-deleted-documents problem if anyone really cares.

Schema Browser does not scale with large indexes


[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes

2012-01-02 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-1931:
-

Attachment: SOLR-1931-trunk.patch

A trunk patch that, you know, actually compiles or something. Mea culpa.

Also gets the 4x time down to 15 seconds after fixing a stupid oversight.
Really gotta let this stew for a while and look at it with less-tired eyes.

 Schema Browser does not scale with large indexes
 

 Key: SOLR-1931
 URL: https://issues.apache.org/jira/browse/SOLR-1931
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 3.6, 4.0
Reporter: Lance Norskog
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-1931-3x.patch, SOLR-1931-3x.patch, 
 SOLR-1931-trunk.patch, SOLR-1931-trunk.patch, SOLR-1931-trunk.patch


 The Schema Browser JSP by default causes the Luke handler to scan the
 world. In large indexes this makes the UI useless.
 On an index with 64M documents and 8 GB of disk space, the Schema Browser took
 6 minutes to open and hogged all disk I/O, making Solr useless.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes

2011-12-31 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-1931:
-

Affects Version/s: (was: 1.4)
   4.0
   3.6

 Schema Browser does not scale with large indexes
 

 Key: SOLR-1931
 URL: https://issues.apache.org/jira/browse/SOLR-1931
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 3.6, 4.0
Reporter: Lance Norskog
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-1931-3x.patch, SOLR-1931-trunk.patch


 The Schema Browser JSP by default causes the Luke handler to scan the
 world. In large indexes this makes the UI useless.
 On an index with 64M documents and 8 GB of disk space, the Schema Browser took
 6 minutes to open and hogged all disk I/O, making Solr useless.


[jira] [Updated] (SOLR-1931) Schema Browser does not scale with large indexes

2011-12-31 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-1931:
-

Attachment: SOLR-1931-trunk.patch
SOLR-1931-3x.patch

Well, there are a couple of issues here. I've attached patches for trunk and 3x
for consideration.

I fixed a structural flaw that traversed all the terms in all the fields twice,
once to get the total number of terms across all the fields and once to get the
individual counts.
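
The single-traversal idea can be sketched in plain Java. This is only an illustration under stated assumptions: the map of field name to term list is a hypothetical stand-in for the index's term enumeration, and SinglePassTermStats is not a class from the patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the per-field term enumeration; not the
// LukeRequestHandler code from the attached patches.
final class SinglePassTermStats {

    /** Per-field term counts plus the grand total, gathered in one traversal. */
    static final class Result {
        final Map<String, Integer> termsPerField = new LinkedHashMap<>();
        long totalTerms = 0;
    }

    static Result collect(Map<String, List<String>> termsByField) {
        Result result = new Result();
        for (Map.Entry<String, List<String>> field : termsByField.entrySet()) {
            int count = 0;
            for (String term : field.getValue()) {
                count++;             // individual count for this field
                result.totalTerms++; // running total, updated in the same loop
            }
            result.termsPerField.put(field.getKey(), count);
        }
        return result;
    }
}
```

Both numbers come out of the same loop, so no second pass over the terms is needed just to learn the total.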

But that's not where the bulk of the time gets spent. It turns out that getting
the count of documents in which each term appears is the culprit. These two
lines are executed for each field:

    Query q = new TermRangeQuery(fieldName, null, null, false, false);
    TopDocs top = searcher.search(q, 1);

and top.totalHits is reported. I have an index with 99M documents, mostly
integer data, that takes 360 seconds to return data when the above is executed
and 150 without. Both versions traverse all the terms once, so these times
would be greater without the patch due to the second traversal.

So the attached patches default to NOT doing the above and there's a new
parameter reportDocCount that can be set to true to collect that information.
What do people think? And is there a better way to get the count of documents
in which the term appears? And do any alternate methods respect deleted docs
like this one does?

I tried spinning through using TermDocs (3.6) but soon realized that the people
who wrote TermRangeQuery probably got there first.

So I guess my real question is whether people object to the change in behavior:
users must explicitly request doc counts. That also means that the
admin/schema browser doesn't report this by default, and I haven't made it
optional from that interface. I'm not inclined to since that interface is going
away, but if people feel strongly I might be persuaded. That info is available
via admin/luke?fl=myfield&reportDocCount=true in a less painful fashion for a
particular field anyway.

Along the way I alphabetized the fields without my other kludge of putting
comparators in other classes. I'll kill that JIRA if this one goes forward.

Note that this still doesn't scale all that well; on my test index it's still a
5 minute wait. But then I guess that this kind of data gathering will take time
by its nature.

If nobody objects, I'll commit this early next week after I've had a chance to
put it down for a while, look at it with fresh eyes and do some more
testing. I think there are some inefficiencies in the single pass that I can
wring out (about 30 seconds is spent just gathering the data in the single term
enumeration loop).

Schema Browser does not scale with large indexes


[jira] [Updated] (SOLR-2994) Solr no longer compiles in IntelliJ

2011-12-30 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2994:
-

Attachment: SOLR-2994.patch

Fixes the problem on my machine in 3x. I'll probably see about 4x later.

 Solr no longer compiles in IntelliJ
 ---

 Key: SOLR-2994
 URL: https://issues.apache.org/jira/browse/SOLR-2994
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 3.6, 4.0
 Environment: IntelliJ
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
  Labels: build
 Attachments: SOLR-2994.patch


 Running the ant idea target no longer creates an IntelliJ environment that
 is consistent; I'm getting "package org.apache.lucene.analysis.phonetic does
 not exist". It looks like the phonetic package moved from lucene to contrib?
 Note that the command-line ant task continues to work just fine.
 I'll attach a patch that fixes it for me, but I'd really like someone who
 understands the IDEA system (Steve, are you listening?) to take a look to see
 if it's OK. It's a magnificent single line in solr.iml.
 I'm assuming this is also a problem for 4.x; I'll probably be in that
 environment later today and see.
 I have no idea whether Eclipse suffers from the same problem.
 I've assigned it to myself just for tracking. Anyone who can glance at it and
 say "yeah, that's right", please feel free to just check it in for 3x and 4x
 if applicable.


[jira] [Updated] (SOLR-2989) Solr admin (Luke request handler) doesn't order the fields alphabetically

2011-12-28 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2989:
-

Attachment: SOLR-2989.patch

First cut at a patch. This is for 3x because that's where I happened to be
working, but if we carry this forward, I can put it on trunk too, I assume.

Solr admin (Luke request handler) doesn't order the fields alphabetically
-

Key: SOLR-2989
URL: https://issues.apache.org/jira/browse/SOLR-2989
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 3.6, 4.0
Environment: all
Reporter: Erick Erickson
Priority: Minor
Attachments: SOLR-2989.patch

It's always bugged me that the fields list for admin/schema browser hasn't
been alphabetical. We have users who have 100s of fields and it's hard to
orient in an unordered list.
I'll attach a patch momentarily that starts to move toward this. The thing I
need someone to render judgement on is whether implementing the Comparable
interface on SchemaField and FieldType is in any way dangerous. Note that
they only compare on name; secondary and tertiary sort keys are unnecessary, I
think.
The other interesting bit is that the list of fields is actually (apparently)
fetched in two stages. The first stage gets the ones in the schema and the
second one gets dynamic fields that have been realized. So the fields
section actually has two separate ordered sections. Which is kind of ugly,
but given the new admin interface coming in 4.x I don't feel the urge to fix
this.
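
For anyone weighing the Comparable question, here is a minimal sketch of name-only ordering. NamedField is a hypothetical, much-simplified stand-in; the real SchemaField and FieldType carry far more state, and this is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified field class, for illustration only.
final class NamedField implements Comparable<NamedField> {
    final String name;
    final String typeName;

    NamedField(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }

    /** Orders on name only; typeName is deliberately ignored. */
    @Override
    public int compareTo(NamedField other) {
        return name.compareTo(other.name);
    }

    /** Returns the field names in alphabetical order without mutating the input. */
    static List<String> sortedNames(List<NamedField> fields) {
        List<NamedField> copy = new ArrayList<>(fields);
        Collections.sort(copy);
        List<String> names = new ArrayList<>();
        for (NamedField f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```

One thing to keep in mind with this design: because equals is not overridden, compareTo can return 0 for two objects that are not equal. A TreeSet or TreeMap keyed on such objects would treat same-named fields as duplicates, which is the main place a name-only comparison could surprise someone.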


[jira] [Updated] (SOLR-2906) Implement LFU Cache

2011-12-26 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2906:
-

Attachment: SOLR-2906.patch

This should be the final patch. Added the code to actually read the timeDecay
parameter from solrconfig, which ages out the cache entries as we've
discussed. Added tests to ensure that it gets through from the config file.

Shawn: If you'd add some data to the Wiki about this new parameter, that would 
be a good thing.

If nobody objects, I'll probably check this in in the next couple of days. 
Since they're all new files, the patch will apply to both trunk and 3x cleanly.

 Implement LFU Cache
 ---

 Key: SOLR-2906
 URL: https://issues.apache.org/jira/browse/SOLR-2906
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 3.4
Reporter: Shawn Heisey
Assignee: Erick Erickson
Priority: Minor
 Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, 
 SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, 
 SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java


 Implement an LFU (Least Frequently Used) cache as the first step towards a 
 full ARC cache


[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field

2011-12-21 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2242:
-

Attachment: SOLR-2242.patch

First step in resurrecting this. This patch should apply cleanly to trunk. It 
incorporates the SOLR-2242.patch from 28-June and the NumFacetTermsFacetsTest
from 9-July. It accounts for the fact that things seem to have been moved 
around a bit.

 Get distinct count of names for a facet field
 -

 Key: SOLR-2242
 URL: https://issues.apache.org/jira/browse/SOLR-2242
 Project: Solr
  Issue Type: New Feature
  Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.0

 Attachments: NumFacetTermsFacetsTest.java, 
 SOLR-2242-notworkingtest.patch, SOLR-2242.patch, SOLR-2242.patch, 
 SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, 
 SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, 
 SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch


 When returning facet.field=<name of field> you will get a list of matches for
 distinct values. This is normal behavior. This patch tells you how many
 distinct values you have (# of rows). Use with limit=-1 and mincount=1.
 The feature is called namedistinct. Here is an example:
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price
 http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price
 This currently only works on facet.field.
 {code}
 <lst name="facet_fields">
   <lst name="price">
     <int name="numFacetTerms">14</int>
     <int name="0.0">3</int>
     <int name="11.5">1</int>
     <int name="19.95">1</int>
     <int name="74.99">1</int>
     <int name="92.0">1</int>
     <int name="179.99">1</int>
     <int name="185.0">1</int>
     <int name="279.95">1</int>
     <int name="329.95">1</int>
     <int name="350.0">1</int>
     <int name="399.0">1</int>
     <int name="479.95">1</int>
     <int name="649.99">1</int>
     <int name="2199.0">1</int>
   </lst>
 </lst>
 {code}
 Several people use this to get the group.field count (the # of groups).


[jira] [Updated] (SOLR-2906) Implement LFU Cache

2011-12-21 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2906:
-

Attachment: SOLR-2906.patch

Here's what I had in mind. At least I *think* this will do, but all I've done
is ensure that the code compiles and the current LFU test suite runs.

Look in the diff for timeDecay.

This still needs some proof that the new parameter comes through from a schema
file. Let me know if that presents a problem or if you can't get 'round to it;
I might have some time over Christmas.

I think maybe you were under the impression that this had already been done and 
were looking for it to be in the code already?

 Implement LFU Cache
 ---

 Key: SOLR-2906
 URL: https://issues.apache.org/jira/browse/SOLR-2906
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 3.4
Reporter: Shawn Heisey
Assignee: Erick Erickson
Priority: Minor
 Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, 
 SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, 
 SOLR-2906.patch, TestLFUCache.java


 Implement an LFU (Least Frequently Used) cache as the first step towards a 
 full ARC cache


[jira] [Updated] (SOLR-2906) Implement LFU Cache

2011-12-20 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2906:
-

Attachment: SOLR-2906.patch

Updated patch that divides by 2 and adds a unit test for aging out.

Shawn:

Could you add in the optional time decay as Yonik suggests? I agree that it
seems like the right thing is to have this on by default. At that point, I
think it'll be ready to check in. We can add documentation as time allows.

We could also check it in as is and raise another JIRA.

 Implement LFU Cache
 ---

 Key: SOLR-2906
 URL: https://issues.apache.org/jira/browse/SOLR-2906
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 3.4
Reporter: Shawn Heisey
Assignee: Erick Erickson
Priority: Minor
 Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, 
 SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, 
 TestLFUCache.java


 Implement an LFU (Least Frequently Used) cache as the first step towards a 
 full ARC cache


[jira] [Updated] (SOLR-2906) Implement LFU Cache

2011-12-19 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2906:
-

Attachment: SOLR-2906.patch

Mostly cosmetic changes:

Changed acceptableLimit to acceptableSize to keep it named consistently

Formatted all the files

Implemented Yonik's aging suggestion (but no tests, there doesn't seem to be a 
clean way to implement a test here without creating debug-only code).

I'm not wholly convinced that dividing by 4 is the right thing to do here;
it'll tend to flatten all the entries, making removal somewhat arbitrary, since
after a few passes anything with hits in the low range will collapse to zero.
That said, though, since the little adventure with lastAccessed, all entries
with the same number of hits will be treated as LRU, so I guess it works.

Marked code as experimental

Commented out some debugging code
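
The aging concern above can be seen in a toy model. This is a sketch under assumptions, not the code in ConcurrentLFUCache: the class, its fields, and the logical clock are all hypothetical, and the divisor is passed in so that different values can be compared.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of LFU bookkeeping with aging; names are hypothetical.
final class LfuAgingSketch {

    static final class Stats {
        long hits;
        long lastAccessed;
    }

    private final Map<String, Stats> entries = new HashMap<>();
    private long clock = 0; // logical access counter, not wall-clock time

    /** Records one access to the given key. */
    void touch(String key) {
        Stats s = entries.computeIfAbsent(key, k -> new Stats());
        s.hits++;
        s.lastAccessed = ++clock;
    }

    /** Ages every entry by integer-dividing its hit count. */
    void age(int divisor) {
        for (Stats s : entries.values()) {
            s.hits /= divisor;
        }
    }

    long hits(String key) {
        return entries.get(key).hits;
    }

    /** Fewest hits loses; among equal hit counts, the least recently accessed loses. */
    String evictionCandidate() {
        String victim = null;
        Stats worst = null;
        for (Map.Entry<String, Stats> me : entries.entrySet()) {
            Stats s = me.getValue();
            if (worst == null
                    || s.hits < worst.hits
                    || (s.hits == worst.hits && s.lastAccessed < worst.lastAccessed)) {
                victim = me.getKey();
                worst = s;
            }
        }
        return victim;
    }
}
```

With a divisor of 4, entries holding 2 or 3 hits both drop to zero after a single pass, so the choice between them falls entirely to lastAccessed. That is the flattening described above, and also why the LRU tie-break keeps the result reasonable.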

 Implement LFU Cache
 ---

 Key: SOLR-2906
 URL: https://issues.apache.org/jira/browse/SOLR-2906
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 3.4
Reporter: Shawn Heisey
Assignee: Erick Erickson
Priority: Minor
 Attachments: ConcurrentLFUCache.java, LFUCache.java, SOLR-2906.patch, 
 SOLR-2906.patch, SOLR-2906.patch, SOLR-2906.patch, TestLFUCache.java


 Implement an LFU (Least Frequently Used) cache as the first step towards a 
 full ARC cache


[jira] [Updated] (SOLR-2975) Solr test failure when running under Java 1.5

2011-12-16 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2975:
-

Attachment: SOLR-2975.patch

Running full test now, will check in shortly unless someone objects.

Applies to both trunk and 3x

 Solr test failure when running under Java 1.5
 -

 Key: SOLR-2975
 URL: https://issues.apache.org/jira/browse/SOLR-2975
 Project: Solr
  Issue Type: Test
  Components: contrib - DataImportHandler
Affects Versions: 3.5, 3.6
 Environment: Java 1.5 only. OS X
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2975.patch


 ant test -Dtestcase=TestSolrEntityProcessorUnit fails when running under Java
 1.5 because of faulty assumptions in the test.
 From e-mail thread (Hossman):
 "...those lines are assuming that row.entrySet will return something that
 has a predictable iteration order, but row is a Map of unknown creation
 (returned by the entityProcessor) ... so unless the entityProcessor is
 explicitly defined as returning something like SortedMap (which isn't
 suggested anywhere in this test) the test is making a really bad
 assumption."
 From e-mail (Steven Rowe):
 "FYI, I see this same failure when I run the branch_3x tests with Java 1.5,
 but not 1.6."
 and
 "Oh, and the reason Jenkins isn't seeing this failure is that it runs
 branch_3x tests using Java 1.6, after first *compiling* with Java 1.5."
 Even though we won't run Solr 4 under Java 1.5, I'll change it there anyway
 since this is a bad assumption in the test.
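
One way to avoid the iteration-order assumption is sketched below. This is not the change in SOLR-2975.patch; MapOrderSketch and its methods are hypothetical helpers that only show the two usual options: compare maps by content, or impose an order explicitly before walking the entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers illustrating order-independent assertions on a row map.
final class MapOrderSketch {

    /** Map.equals compares key/value pairs only, so iteration order is irrelevant. */
    static boolean sameContent(Map<String, Object> expected, Map<String, Object> actual) {
        return expected.equals(actual);
    }

    /** When a deterministic order is needed, copy into a TreeMap to sort by key. */
    static String canonical(Map<String, Object> row) {
        return new TreeMap<String, Object>(row).toString();
    }
}
```

Either approach gives the same answer whatever Map implementation the entity processor happens to return, which is the property the original test lacked.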


[jira] [Updated] (SOLR-2971) ExternalFileFields fail if valType='float', and valType should be optional

2011-12-15 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2971:
-

Attachment: SOLR-2971.patch
SOLR-2971-3x.patch

I think these patches may be ready to apply. The only thing that makes me at
all nervous is the magic of calling deleteCore in the tests. The 3x tests
consistently failed without it, but trunk worked just fine. So I put the call
in both.

Sorry, there's a bit of gratuitous formatting in there, but it's pretty much
whitespace only.

Of course the 3x tests were different enough from the 4x ones that they needed
a different patch. Siiigggh. The actual core code changes are identical though.

For an issue this small, is there any reason to add anything to CHANGES.txt?

ExternalFileFields fail if valType='float', and valType should be optional
--

Key: SOLR-2971
URL: https://issues.apache.org/jira/browse/SOLR-2971
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
Fix For: 3.6, 4.0

Attachments: SOLR-2971-3x.patch, SOLR-2971.patch, SOLR-2971.patch

valType has never done anything except throw an error; the underlying
ValueSource has always been a FileFloatSource. To add to the confusion, the
documentation says to use float, which has thrown an exception on Solr startup
ever since float was re-defined as a TrieFloatField. pfloat works currently
though.
Since valType is never used for anything, we should make it optional until
such a time as it is.
Additionally, TrieFloatField (valType=float|tfloat) types should be OK as a
field type along with FloatField (valType=pfloat).


[jira] [Updated] (SOLR-2971) ExternalFileFields fail if valType='float', and valType should be optional

2011-12-14 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2971:
-

Attachment: SOLR-2971.patch

Patch for trunk. I haven't run full regression tests against it yet, but I 
think it's pretty solid.

I'll commit in a day or two unless there are objections...

 ExternalFileFields fail if valType='float', and valType should be optional
 --

 Key: SOLR-2971
 URL: https://issues.apache.org/jira/browse/SOLR-2971
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.5, 4.0
 Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2971.patch


 valType has never done anything except throw an error; the underlying
 ValueSource has always been a FileFloatSource. To add to the confusion, the
 documentation says to use float, which has thrown an exception on Solr startup
 ever since float was re-defined as a TrieFloatField. pfloat works currently
 though.
 Since valType is never used for anything, we should make it optional until
 such a time as it is.
 Additionally, TrieFloatField (valType=float|tfloat) types should be OK as a
 field type along with FloatField (valType=pfloat).


[jira] [Updated] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1

2011-12-06 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2509:
-

Attachment: SOLR-2509.patch

Here's the updated patch. The only difference between this and the original is 
that I changed the failing test to expect pixmaa rather than 
pixma-a-b-c-d-e-f-g.

If nobody objects, I'll commit this tomorrow (7-Dec) on both trunk and 3x.

 spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
 --

 Key: SOLR-2509
 URL: https://issues.apache.org/jira/browse/SOLR-2509
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
 Environment: Debian Lenny
 JAVA Version 1.6.0_20
Reporter: Thomas Gambier
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-2509.patch, SOLR-2509.patch, SOLR-2509.patch, 
 document.xml, schema.xml, solrconfig.xml


 Hi,
 I'm a French user of SOLR and I've encountered a problem since I installed
 SOLR 3.1.
 I get an error with this query:
 cle_frbr:LYSROUGE1149-73190
 *SEE COMMENTS BELOW*
 I tried escaping the minus character and the query worked:
 cle_frbr:LYSROUGE1149(BACKSLASH)-73190
 But, strangely, if I change one letter in my query it works:
 cle_frbr:LASROUGE1149-73190
 I've tested the same query on SOLR 1.4 and it works!
 Can someone test the query on the next line on SOLR 3.1 and tell me whether
 they see the same problem?
 yourfield:LYSROUGE1149-73190
 Where does the problem come from?
 Thank you in advance for your help.
 Tom


[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-27 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438-3x.patch

Backport of the MultiTermAware version of this patch to 3.6. Again, before
applying this patch you probably need to apply the 3x patch from 25-Nov.

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Fix For: 3.6, 4.0

 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438_3x.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done by Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.


[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should

2011-11-27 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2921:
-

Description:
SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to
automatically assemble a multiterm analyzer that does the right thing
vis-a-vis transforming the individual terms of a multi-term query at query
time. Examples are: lower casing, folding accents, etc. Currently
(27-Nov-2011), the following classes implement MultiTermAwareComponent:

* ASCIIFoldingFilterFactory
* LowerCaseFilterFactory
* LowerCaseTokenizerFactory
* MappingCharFilterFactory
* PersianCharFilterFactory

When users put any of the above in their query analyzer, Solr will do the
right thing at query time and the perennial question users have, "why didn't
my wildcard query automatically lower-case (or accent fold or...) my terms?"
will be gone. Die question die!

But taking a quick look, for instance, at the various FilterFactories that
exist, there are a number of possibilities that *might* be good candidates for
implementing MultiTermAwareComponent. But I really don't understand the correct
behavior here well enough to know whether these should implement the interface
or not. And this doesn't include other CharFilters or Tokenizers.

Actually implementing the interface is often trivial, see the classes above for
examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the
right thing in this case.
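
To make the idea concrete, here is a self-contained sketch of how a marker interface can drive assembly of a multiterm analyzer. Everything in it is an assumption for illustration: MultiTermAware and TermTransform are simplified stand-ins, not the actual MultiTermAwareComponent interface or Lucene's analysis classes, whose signatures may differ.

```java
import java.util.Locale;

// Simplified stand-in for a per-term transformation (hypothetical).
interface TermTransform {
    String apply(String term);
}

// Simplified stand-in for the marker interface (hypothetical shape).
interface MultiTermAware {
    TermTransform getMultiTermComponent();
}

// A factory that opts in: lower-casing is safe to apply to wildcard terms.
final class LowerCaseFactorySketch implements MultiTermAware {
    @Override
    public TermTransform getMultiTermComponent() {
        return term -> term.toLowerCase(Locale.ROOT);
    }
}

final class MultiTermAnalyzerSketch {
    /** Applies only the components that opted in; everything else is skipped. */
    static String analyze(String term, Object... components) {
        String out = term;
        for (Object component : components) {
            if (component instanceof MultiTermAware) {
                out = ((MultiTermAware) component).getMultiTermComponent().apply(out);
            }
        }
        return out;
    }
}
```

In this model a component that does not implement the interface (a stemmer, say) is simply left out of the multiterm chain, which is why deciding which factories should opt in matters.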

Here is a quick cull of the Filters that, just from their names, might be
candidates. If anyone wants to take any of them on, that would be great. If all
you can do is provide test cases, I could probably do the code part, just let
me know.

ArabicNormalizationFilterFactory
GreekLowerCaseFilterFactory
HindiNormalizationFilterFactory
ICUFoldingFilterFactory
ICUNormalizer2FilterFactory
ICUTransformFilterFactory
IndicNormalizationFilterFactory
ISOLatin1AccentFilterFactory
PersianNormalizationFilterFactory
RussianLowerCaseFilterFactory
TurkishLowerCaseFilterFactory

was:
SOLR-2918, which drastically improves the approach of SOLR-2438 creates a new
MultiTermAwareComponent interface. This allows Solr to automatically assemble a
multiterm analyzer that does the right thing vis-a-vis transforming the
individual terms of a multi-term query at query time. Examples are: lower
casing, folding accents, etc. Currently (27-Nov-2011), the following classes
implement MultiTermAwareComponent:

* ASCIIFoldingFilterFactory
* LowerCaseFilterFactory
* LowerCaseTokenizerFactory
* MappingCharFilterFactory
* PersianCharFilterFactory

Actually implementing the interface is often trivial, see the classes above for
examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the
right thing in this case.

Make any Filters, Tokenizers and CharFilters implement
MultiTermAwareComponent if they should
-

[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-25 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch
SOLR-2438_3x.patch

Patches as of the commit

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Fix For: 3.6, 4.0

 Attachments: SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438_3x.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done by Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-22 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438-3x.patch

Here's what the 3x version would look like if anyone's interested. There's some 
refactoring that was done between 3.x and 4.0 that made reconciling these a bit 
of a pain. 

Still need to modify the CHANGES files.

I'll commit these tomorrow sometime if nobody objects.

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Attachments: SOLR-2438-3x.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch




[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-21 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch

OK, this patch does a better job with the matchVersion as per Muir. If nobody
objects I'll commit it this week, probably not before Wednesday though. Then I
should be able to do the backport to 3.6 shortly thereafter.

I still have to run all the tests yet again, but I don't really expect much of
a problem.

Should SOLR 218, 219 and 757 all be closed as part of 2438?

Case Insensitive Search for Wildcard Queries

Key: SOLR-2438
URL: https://issues.apache.org/jira/browse/SOLR-2438
Project: Solr
Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch,
SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch



[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-20 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch

I think this patch is ready for scrutiny. Tests run successfully.

I have yet to do several things:
1 update README
2 add an example to example/schema.xml
3 this is going to take a writeup on the Wiki I think, explaining that there's
another (optional) section to a fieldType. Any suggestions where that should
go?

Originally, I'd hoped to back-port it to 3.5, but the more I look at it the
more I'd like it to bake a while before being officially released and target
3.6 instead for the back-port. Can one back-port something like this after the
first RC is cut or is it better to wait until after the release? I can always
commit this to trunk and open another JIRA to backport after 3.5 is released.

Case Insensitive Search for Wildcard Queries


[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-17 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch

OK, this isn't nearly finished yet, but I wanted to run it by folks to see if 
the approach is what, particularly, Robert and Yonik have in mind.

I'm assuming that the flex stuff is out of scope for this JIRA, right?

Don't waste your time on details just yet, only the general approach.

I'm thinking of allowing a flag to the fields to disable this functionality but 
make this the default, thoughts?

Haven't even thought about back-porting to 3x, but it looks do-able on a quick 
glance.

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch




[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-15 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch

Here's a rough cut at what I *think* Yonik might have been talking about, 
comments?

I haven't done a thing about efficiency here, just seeing if the new method in 
the FilterFactories (processQueryTerm) makes sense to y'all.

One thing I'm not clear on: would it make more sense to just instantiate a new 
instance of the filter and run each term through it rather than steal bits from 
the underlying Filters (see ASCIIFoldingFilterFactory and 
LowerCaseFilterFactory for example)? I just hate duplicated code, but I'm not 
sure how efficient creating a new filter and running the token through would be 
for each and every token.

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Attachments: SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch




[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-11-14 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2438:
-

Attachment: SOLR-2438.patch

This is not at all ready for prime-time but I'm inviting comments on the
approach. It turns out that all the hard work has already been done, see
QueryParserBase. The attached patch is almost all tests...

But I greatly fear that I'm grossly misusing
QueryParserBase.lowercaseExpandedTerms, which looks like it's for parameters on
the query line? Where did *that* come from anyway? Or what the heck is it
supposed to be used for, anyone know?

A couple of things make me nervous about this approach. It depends in a pretty
hard-coded way on detecting LowerCaseFilterFactory and
LowerCaseTokenizerFactory; if anyone adds anything else in there it'll have to
be re-coded. Is there a better way? It almost seems like a flag on the field
definition, as Peter suggested, is a more robust way of going about things.
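
To make the trade-off concrete, here is a rough sketch (not the attached patch) of both ideas side by side. The class, the method names and the nullable per-field flag are all hypothetical; the only names taken from the discussion are the two factory class names being sniffed for.

```java
import java.util.List;
import java.util.Locale;

final class WildcardLowercasingSketch {
    /**
     * Decides whether wildcard terms should be lower-cased.
     * An explicit per-field flag (hypothetical) wins when present; otherwise
     * fall back to sniffing the analysis chain for the two known factories.
     */
    static boolean shouldLowercase(Boolean fieldFlag, List<String> factoryClassNames) {
        if (fieldFlag != null) {
            return fieldFlag;
        }
        for (String name : factoryClassNames) {
            // Compare the simple class name only, ignoring any package prefix.
            String simple = name.substring(name.lastIndexOf('.') + 1);
            if (simple.equals("LowerCaseFilterFactory")
                    || simple.equals("LowerCaseTokenizerFactory")) {
                return true;
            }
        }
        return false;
    }

    /** Lower-cases the raw wildcard term only when the decision above says so. */
    static String prepareWildcardTerm(String term, Boolean fieldFlag,
                                      List<String> factoryClassNames) {
        return shouldLowercase(fieldFlag, factoryClassNames)
                ? term.toLowerCase(Locale.ROOT)
                : term;
    }
}
```

The brittleness shows up immediately: a chain containing only a differently named lower-casing factory is not detected unless the name list is extended, whereas the explicit flag keeps working.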

Anyway, I'm getting way past the point of diminishing returns tonight, so I
thought I'd at least throw this out for comment.

Ignore everything with the ASCIIFoldingFilterFactory, I detect it but don't do
anything with it yet.

And I can't seem to make the reversed test work, even without the casing
switch. Which means I should put it down for the evening, I'm obviously fried.
Anybody feeling kind can uncomment the line that starts:

// make me work

and get the test class to work. It's probably trivial but I'm not seeing it.

Case Insensitive Search for Wildcard Queries

Key: SOLR-2438
URL: https://issues.apache.org/jira/browse/SOLR-2438
Project: Solr
Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
Attachments: SOLR-2438.patch, SOLR-2438.patch


[jira] [Updated] (SOLR-2134) Trie* fields should support sortMissingLast=true, and deprecate Sortable* Field Types

2011-11-13 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2134:
-

Attachment: SOLR-2134-tests.patch

Added some tests for dates.

 Trie* fields should support sortMissingLast=true, and deprecate Sortable* 
 Field Types
 -

 Key: SOLR-2134
 URL: https://issues.apache.org/jira/browse/SOLR-2134
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Ryan McKinley
Assignee: Erick Erickson
 Fix For: 4.0

 Attachments: SOLR-2134-SortMissingLast.patch, 
 SOLR-2134-SortMissingLast.patch, SOLR-2134-SortMissingLast.patch, 
 SOLR-2134-SortMissingLast.patch, SOLR-2134-tests.patch


 With the changes in LUCENE-2649, the FieldCache also returns if the bit is 
 valid or not.  This is enough to support sortMissingLast=true with Trie* 
 fields.  Then we can get rid of the Sortable* fields
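
As an illustration only (this is not the FieldCache API), the sketch below models the per-document values as a long[] and the "valid" bits as a java.util.BitSet, and assumes sortMissingLast=true means documents without a value sort after all others in both directions. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

final class SortMissingLastSketch {
    /**
     * Orders document ids by value, keeping documents without a value at the
     * end in both ascending and descending order.
     */
    static List<Integer> sortDocs(long[] values, BitSet hasValue, boolean reverse) {
        Comparator<Integer> byValue = Comparator.comparingLong(doc -> values[doc]);
        if (reverse) {
            byValue = byValue.reversed();
        }
        // false (has a value) orders before true (missing), independent of 'reverse'.
        Comparator<Integer> missingLast =
                Comparator.comparing((Integer doc) -> !hasValue.get(doc));

        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < values.length; doc++) {
            docs.add(doc);
        }
        docs.sort(missingLast.thenComparing(byValue));
        return docs;
    }
}
```

Without the valid bits, a missing document would be indistinguishable from one that really holds the placeholder value 0, which is why the extra bit is enough to make this ordering possible.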


[jira] [Updated] (SOLR-2881) Trie fields should support sortMissingLast=true

2011-11-13 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2881:
-

Attachment: SOLR-2881.patch

SOLR-2134 fixes this issue for 4.x, this patch applies only to the 3x branch

 Trie fields should support sortMissingLast=true
 -

 Key: SOLR-2881
 URL: https://issues.apache.org/jira/browse/SOLR-2881
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.5, 4.0
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 3.5, 4.0

 Attachments: SOLR-2881-3x.patch, SOLR-2881.patch


 Spinoff from SOLR-2134. The consensus is that the way sortMissingFirst is 
 done in 3x is superior to 4x and when that is done (see LUCENE-3443) then the 
 sortMissingFirst code should be incorporated into both.
 As of now, however, the Trie fields in 4.0 support sortMissingFirst but not 
 yet in 3.x


[jira] [Updated] (SOLR-2881) Trie fields should support sortMissingLast=true

2011-11-12 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2881:
-

Attachment: SOLR-2881-3x.patch

I think this is ready to commit if we clear one thing up. Look at the tests and
you'll see that default sorting for dates is a special case. The sorting
behavior for dates is, indeed, different from longs when sortMissingFirst/Last
are not specified. The behavior is consistent with 3.3, however (it was handy
to test 3.3 rather than 3.4), so neither LUCENE-3443 nor this patch changes
sorting in this case.

I'd like to commit this tomorrow (Sunday). Since the reconciliation process is
a bit interesting between Mike's and my changes, I think that a patch for
each is preferable, but we know I'm merge challenged.

Note also that Mike, as part of 3441, made the parallel set of changes for 4.x
already. That said, I'm going to create a small 4.x patch that changes the
example schema.xml and incorporates the date test from this patch. I'll attach
that file to SOLR-2134

Trie fields should support sortMissingLast=true
-

Key: SOLR-2881
URL: https://issues.apache.org/jira/browse/SOLR-2881
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 3.5, 4.0

Attachments: SOLR-2881-3x.patch



[jira] [Updated] (SOLR-2876) Precedence operator in conditionals with ternary operator needs to be examined.

2011-11-06 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2876:
-

Attachment: SOLR-2876.patch

I went ahead and looked at the ternary operator (those I could find with grep) 
and here are the results. Not sure it's worth doing, anyone want to chime in?

Although this construct is exciting...
luceneSort || sortMissingFirst && !reverse || sortMissingLast && reverse ? "" : 
"zzz";
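
For anyone squinting at that line, here is a small sketch of how Java groups an expression of this shape, assuming the flag pairs are joined by && and the two results are the string literals "" and "zzz". The class and method names are invented for the illustration. Since && binds tighter than ||, and both bind tighter than ?:, the whole boolean expression becomes the condition.

```java
final class TernaryGroupingSketch {
    /** The expression as written, relying on operator precedence alone. */
    static String implicit(boolean luceneSort, boolean sortMissingFirst,
                           boolean sortMissingLast, boolean reverse) {
        return luceneSort || sortMissingFirst && !reverse || sortMissingLast && reverse ? "" : "zzz";
    }

    /** The same expression with the grouping Java applies made explicit. */
    static String explicit(boolean luceneSort, boolean sortMissingFirst,
                           boolean sortMissingLast, boolean reverse) {
        return (luceneSort
                || (sortMissingFirst && !reverse)
                || (sortMissingLast && reverse)) ? "" : "zzz";
    }

    /** Compares both forms over all 16 combinations of the four flags. */
    static boolean agreeEverywhere() {
        for (int bits = 0; bits < 16; bits++) {
            boolean a = (bits & 1) != 0;
            boolean b = (bits & 2) != 0;
            boolean c = (bits & 4) != 0;
            boolean d = (bits & 8) != 0;
            if (!implicit(a, b, c, d).equals(explicit(a, b, c, d))) {
                return false;
            }
        }
        return true;
    }
}
```

So this particular line behaves as intended; the parentheses in the second form only spell out what the compiler already does.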

 Precedence operator in conditionals with ternary operator needs to be 
 examined.
 ---

 Key: SOLR-2876
 URL: https://issues.apache.org/jira/browse/SOLR-2876
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5, 4.0
 Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
  Labels: operator, precedence, ternary
 Attachments: SOLR-2876.patch


 This is an offshoot of 2829 where the root of the bug was that precedence in 
 the ternary operator along with && without appropriate parentheses was a 
 problem.
 && this.parser == null ? other.parser == null : this.parser.getClass() == 
 other.parser.getClass() (from ShortFieldSource.java).
 So that got me curious whether this pattern was repeated. A quick grep with 
 the following REs produced one hit that wasn't related to 2829 with && and more 
 with || (3x code base). I'll try to get to it over the weekend. Please don't 
 grab it just yet, I'm fixing this partially for 2829, but if anyone wants to 
 try the grep and see if I'm hallucinating, I'd appreciate it. I'd *really* 
 appreciate any tests for things people see...
 Some of the returns are false hits, but not others. See 
 SolrIndexSearcher.getDocListAndSetNC() 
 the last line is: return pf.filter==null && pf.postFilter==null ? 
 qr.getDocSet() : null; 
 REs (using them in IntelliJ)
 \|\|[\sa-z\.0-9A-Z]+==.*\?
 &&[\sa-z\.0-9A-Z]+==.*\?
 I got some hits with the above and didn't pursue it any further, but if 
 anyone wants to suggest more comprehensive REs, please attach. I'm trying for 
 && or || followed by anything without an open parenthesis followed by == 
 followed by anything followed by ? I'd rather get a manageable number of 
 false positives than miss things.
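
A rough sketch of running that search outside the IDE with java.util.regex, assuming the second expression begins with && and mirrors the || form. The class and method names are hypothetical, and the sketch works on a list of lines rather than walking a source tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TernaryGrepSketch {
    // && or || followed by characters other than '(' up to an ==, then anything, then '?'.
    private static final Pattern AND_FORM = Pattern.compile("&&[\\sa-z\\.0-9A-Z]+==.*\\?");
    private static final Pattern OR_FORM = Pattern.compile("\\|\\|[\\sa-z\\.0-9A-Z]+==.*\\?");

    /** Returns the lines that match either expression, in their original order. */
    static List<String> suspects(List<String> sourceLines) {
        List<String> hits = new ArrayList<>();
        for (String line : sourceLines) {
            if (AND_FORM.matcher(line).find() || OR_FORM.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

A line such as `return a && (b == null ? c : d);` is skipped because the open parenthesis is not in the character class, while a line that merely has an unrelated ?: after an && comparison is still reported, which is the kind of false hit mentioned above.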


[jira] [Updated] (SOLR-2876) Precedence operator in conditionals with ternary operator needs to be examined.

2011-11-06 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2876:
-

  Priority: Trivial  (was: Major)
Issue Type: Improvement  (was: Bug)

I don't see anything else that looks wrong, so what do people think about doing 
this?

 Precedence operator in conditionals with ternary operator needs to be 
 examined.
 ---

 Key: SOLR-2876
 URL: https://issues.apache.org/jira/browse/SOLR-2876
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.5, 4.0
 Environment: all
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
  Labels: operator, precedence, ternary
 Attachments: SOLR-2876.patch




[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

2011-11-05 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2829:
-

Attachment: SOLR-2829.patch

Final patch. Renamed variable as per Hoss. I hate it when he's right.

Filter queries have false-positive matches. Exposed by user's list titled
Regarding geodist and multiple location fields
--

Key: SOLR-2829
URL: https://issues.apache.org/jira/browse/SOLR-2829
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.4, 4.0
Environment: N/A
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
Fix For: 3.5

Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch,
SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch

I don't know how generic this is, whether it's just a
problem with fqs when combined with spatial or whether
it has wider applicability, but here's what I know so far.
Marc Tinnemeyer in a post titled:
"Regarding geodist and multiple location fields"
outlines this. I checked this on 3.4 and trunk and it's
weird in both cases.
HOLD THE PRESSES:
After looking at this a bit more, it looks like a caching
issue, NOT a geodist issue. When I bounce Solr
between changing the sfield from home to work,
it seems to work as expected.
Hmmm, very strange. If I comment out BOTH
the filterCache and queryResultCache then it works
fine. Switching from home to work in the query
finds/fails to find the document.
But commenting out only one of those caches
doesn't fix the problem.
On trunk I used this query, just flipping home to work and back:
http://localhost:8983/solr/select?q=id:1&fq={!geofilt sfield=home
pt=52.67,7.30 d=5}
The info below is what I used to test.
From Marc's posts:
<field name="home" type="location" indexed="true" stored="true"/>
<field name="work" type="location" indexed="true" stored="true"/>
<field name="elsewhere" type="location" indexed="true" stored="true"/>
At first I thought so too. Here is a simple document.
<add>
<doc>
<field name="id">1</field>
<field name="name">first</field>
<field name="work">48.60,11.61</field>
<field name="home">52.67,7.30</field>
</doc>
</add>
and here is the result that shouldn't be:
<response>
...
<str name="q">*:*</str>
<str name="fq">{!geofilt sfield=work pt=52.67,7.30 d=5}</str>
...
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="home">52.67,7.30</str>
<str name="id">1</str>
<str name="name">first</str>
<str name="work">48.60,11.61</str>
</doc>
</result>
</response>
Yonik's comment:
It's going to be a bug in an equals() implementation somewhere in the query.
The top level equals will be SpatialDistanceQuery.equals() (from
LatLonField.java)
On trunk, I already see a bug introduced when the new lucene field
cache stuff was done.
DoubleValueSource now just inherits its equals method from
NumericFieldCacheSource... and that equals() method only tests if the
CachedArrayCreator.getClass() is the same! That's definitely wrong.
I don't know why 3x would also have this behavior (unless there's more
than one bug!)
Anyway, first step is to modify the spatial tests to catch the bug...
from there it should be pretty easy to debug.
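
To show why a sloppy equals() surfaces as a caching problem, here is a toy model, not Solr code: a HashMap stands in for the filter cache, and the hypothetical GeoFilterKey stands in for the query object used as the cache key. The compareField switch exists only so the buggy and corrected comparisons can be run side by side.

```java
import java.util.HashMap;
import java.util.Map;

final class FilterCacheSketch {
    /** Hypothetical cache key; 'compareField' toggles a buggy or a correct equals(). */
    static final class GeoFilterKey {
        private final String sfield;
        private final String point;
        private final boolean compareField;

        GeoFilterKey(String sfield, String point, boolean compareField) {
            this.sfield = sfield;
            this.point = point;
            this.compareField = compareField;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof GeoFilterKey)) {
                return false;
            }
            GeoFilterKey other = (GeoFilterKey) o;
            if (!point.equals(other.point)) {
                return false;
            }
            if (!compareField) {
                return true; // buggy form: ignores which field the filter applies to
            }
            return sfield.equals(other.sfield);
        }

        @Override
        public int hashCode() {
            return point.hashCode(); // valid for both forms: equal keys share a hash
        }
    }

    /**
     * Caches a hit count for sfield=home, then looks up sfield=work with the
     * same point. Returns the cached count, or -1 on a cache miss.
     */
    static int lookupWorkAfterHome(boolean compareField) {
        Map<GeoFilterKey, Integer> filterCache = new HashMap<>();
        filterCache.put(new GeoFilterKey("home", "52.67,7.30", compareField), 1);
        Integer cached = filterCache.get(new GeoFilterKey("work", "52.67,7.30", compareField));
        return cached == null ? -1 : cached;
    }
}
```

With the buggy comparison the lookup for work is served the entry stored for home, which matches the symptom of a document being found when it should not be; with the field included in equals() the lookup misses and the filter would be recomputed.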


[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

2011-11-05 Thread Erick Erickson (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2829:
-

Fix Version/s: 4.0

Added fix version of 4.0

 Filter queries have false-positive matches. Exposed by user's list titled 
 Regarding geodist and multiple location fields
 --

 Key: SOLR-2829
 URL: https://issues.apache.org/jira/browse/SOLR-2829
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.4, 4.0
 Environment: N/A
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 3.5, 4.0

 Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, 
 SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch




[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

2011-11-05 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2829:
-

Attachment: SOLR-2829-3x.patch

Attached the 3x patch, reconciling these is kinda unpleasant.

Filter queries have false-positive matches. Exposed by user's list titled
Regarding geodist and multiple location fields
--

Attachments: SOLR-2829-3x.patch, SOLR-2829.patch, SOLR-2829.patch,
SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch


[jira] [Updated] (SOLR-2829) Filter queries have false-positive matches. Exposed by user's list titled Regarding geodist and multiple location fields

2011-11-04 Thread Erick Erickson (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Erick Erickson updated SOLR-2829:
-

Attachment: SOLR-2829.patch

Patch for the 3x code line, if I don't get any objections, I'll merge it with
trunk and commit over the weekend.

All tests pass.

The code changes aren't as interesting as the tests, anyone want to recommend
improvements? I verified that the tests catch short, float, long, byte and
double if the parens aren't added. Had to add a few types to the default
schema.xml.
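
For readers who haven't looked at the patch, the shape of the problem and of a test that catches it can be sketched like this. It is a simplified model rather than the real field source classes: the sameField boolean stands in for the superclass or field comparison, and the class and method names are made up.

```java
final class FieldSourceEqualsSketch {
    /**
     * Without parentheses this groups as (sameField && thisParser == null) ? ... : ...
     * so a differing field is ignored whenever thisParser is non-null.
     * (It also throws if thisParser is null and sameField is false.)
     */
    static boolean withoutParens(boolean sameField, Object thisParser, Object otherParser) {
        return sameField && thisParser == null
                ? otherParser == null
                : thisParser.getClass() == otherParser.getClass();
    }

    /** With parentheses the field comparison guards the whole parser comparison. */
    static boolean withParens(boolean sameField, Object thisParser, Object otherParser) {
        return sameField && (thisParser == null
                ? otherParser == null
                : otherParser != null && thisParser.getClass() == otherParser.getClass());
    }
}
```

A test in this spirit compares two sources on different fields that use the same parser type: the version without parentheses reports them equal, the parenthesized version does not.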

I realize that the tests specific to LatLon are redundant, they're caught by
the double test. But I don't see any harm leaving them in.

Filter queries have false-positive matches. Exposed by user's list titled
Regarding geodist and multiple location fields
--

Attachments: SOLR-2829.patch, SOLR-2829.patch, SOLR-2829.patch,
SOLR-2829.patch

