[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453746#comment-13453746 ]

Chris Male commented on LUCENE-4208:

I disagree that makeQuery shouldn't exist. There are optimizations to be had in Query code, such as using BooleanQuery and its associated highly optimized scorer algorithms. I think it should continue to exist, but it should have a default implementation that creates a ConstantScoreQuery by calling makeFilter.

Spatial distance relevancy should use score of 1/distance

Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0

The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles, if I recall -- might not do anything, which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
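The proposal reads naturally as a reciprocal transform. A minimal standalone sketch (not Lucene's SpatialStrategy API; the constant c stands in for the "different constant than 1" mentioned above, and c/(c + distance) avoids dividing by zero at distance 0):

```java
// Illustrative only: shows why raw distance is a poor relevancy score and how
// a reciprocal transform inverts it so that closer documents score higher,
// with a maximum score of 1 at distance 0.
public class ReciprocalDistanceScore {

    /** Reciprocal transform: monotonically decreasing in distance. */
    public static double score(double distance, double c) {
        return c / (c + distance);
    }

    public static void main(String[] args) {
        System.out.println(score(0.0, 1.0)); // 1.0: document at the query point
        System.out.println(score(9.0, 1.0)); // 0.1: far away, low relevancy
    }
}
```

With raw distance as the score, the far document would have "scored" 9x higher than the near one; the transform restores the expected ordering.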
Re: Collator-based facet sorting in Solr
On Tue, 2012-09-11 at 17:23 +0200, Robert Muir wrote:
> Just a concern where things could act a little funky: today, for example, if I set strength=primary, then it's going to fold Test and test to the same unique term, but under this scheme you would have bytesTest and bytestest as two terms. This could be undesirable in the typical case that you just want case-insensitive facets: but we don't provide any way to preprocess the text to avoid this.

I seem to be missing something here. The ICUCollationKeyFilter can be at the end of the analyzer chain, so why can't the input be normalized before entering this filter?

> Really a lot of this is because factory-based analysis chains have no way to specify the AttributeFactory, e.g. I guess if we really wanted to fix this right we would need to pass in the AttributeFactory to TokenizerFactory's create() method.

Sounds like a larger change.

> But for now from Solr it would be a little hacky, e.g. someone is gonna have to fold the case client-side or whatever if they don't want these problems.

That would be a serious impediment. For some of our uncontrolled fields, the same word can be cased very differently: CD, cd, Cd. To be on the safe side, the client would have to ask for 3 times the wanted amount of facet information. But if we cannot normalize at index time, de-duplication on the server would require changes to the faceting code.

Regardless, it sounds like the idea passes the initial sanity check. Should I open a JIRA issue for it?
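The case-folding behavior under discussion can be reproduced with the JDK's own collator (the thread concerns the ICU variant, but the strength semantics are analogous; this standalone sketch is not Solr's faceting code):

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch: a PRIMARY-strength collator already treats "CD", "cd"
// and "Cd" as equal, so if facet terms were deduplicated by collation key
// (rather than by raw term bytes) the three casings would collapse into one
// bucket -- which is the behavior the scheme above would lose.
public class CaseInsensitiveFacets {

    /** Counts distinct facet buckets when terms are deduplicated by collation key. */
    public static int distinctBuckets(List<String> terms, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // fold case (and accent) differences
        Set<String> keys = new HashSet<>();
        for (String term : terms) {
            // the collation key's bytes are what a collation key filter would index
            keys.add(Arrays.toString(collator.getCollationKey(term).toByteArray()));
        }
        return keys.size();
    }
}
```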
[jira] [Commented] (LUCENE-4376) Add Query subclasses for selecting documents where a field is empty or not
[ https://issues.apache.org/jira/browse/LUCENE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453811#comment-13453811 ]

Uwe Schindler commented on LUCENE-4376:

The filter is already there; only the QueryParser does not support it. To make this work for your use case, you can override Lucene's/Solr's QueryParser to return a ConstantScoreQuery wrapping the LUCENE-3593 filter as a replacement for the field:* only query. The positive and negative variants work by passing the boolean to the filter. To conclude: the Query is already there, so there is no need for the 2 new classes. The wanted functionality is:

{code:java}
new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate))
{code}

To find all documents with any term in the field use negate=false, otherwise negate=true. There is absolutely no need for a new Query class.

bq. Okay, so would it be straightforward and super-efficient for PrefixQuery to do exactly that if the prefix term is zero-length?

That's super-slow, as it will search for all terms in the field. This is what e.g. Solr currently does for field:* queries. Solr should use the filter, too; that would make this much more efficient.

Add Query subclasses for selecting documents where a field is empty or not

Key: LUCENE-4376
URL: https://issues.apache.org/jira/browse/LUCENE-4376
Project: Lucene - Core
Issue Type: Improvement
Components: core/query/scoring
Reporter: Jack Krupansky
Fix For: 5.0

Users frequently wish to select documents based on whether a specified sparsely-populated field has a value or not. Lucene should provide specific Query subclasses that optimize for these two cases, rather than force users to guess which workaround might be most efficient. It is simplest for users to use a pure wildcard term to check for non-empty fields, or a negated pure wildcard term to check for empty fields, but it has been suggested that this can be rather inefficient, especially for text fields with many terms.

1. Add NonEmptyFieldQuery - selects all documents that have a value for the specified field.
2. Add EmptyFieldQuery - selects all documents that do not have a value for the specified field.

The query parsers could turn a pure wildcard query (asterisk only) into a NonEmptyFieldQuery, and a negated pure wildcard query into an EmptyFieldQuery. Alternatively, maybe PrefixQuery could detect a pure wildcard and automatically rewrite it into a NonEmptyFieldQuery. My assumption is that if the actual values of the field are not needed, Lucene can much more efficiently simply detect whether values are present, rather than, for example, the user having to create a separate boolean "has value" field that they would query for true or false.
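The negate semantics Uwe describes can be sketched standalone (illustrative only; this is not the real org.apache.lucene.search.FieldValueFilter, which operates on index structures rather than document maps):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the field-presence selection discussed above:
// select documents by whether a field has a value, optionally negated.
public class FieldPresence {

    /** negate=false keeps docs that have the field; negate=true keeps docs that don't. */
    public static List<Map<String, String>> filter(
            List<Map<String, String>> docs, String field, boolean negate) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean hasValue = doc.get(field) != null;
            if (hasValue != negate) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

The point of the filter-based approach is exactly this single presence check per document, instead of enumerating every term of the field as a pure-wildcard PrefixQuery would.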
[jira] [Comment Edited] (LUCENE-4376) Add Query subclasses for selecting documents where a field is empty or not
[ https://issues.apache.org/jira/browse/LUCENE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453811#comment-13453811 ]

Uwe Schindler edited comment on LUCENE-4376 at 9/12/12 6:50 PM:

The filter is already there; only the QueryParser does not support it. To make this work for your use case, you can override Lucene's/Solr's QueryParser to return a ConstantScoreQuery wrapping the LUCENE-3593 filter as a replacement for the field:* only query. The positive and negative variants work by passing the boolean to the filter. To conclude: the Query is already there, so there is no need for the 2 new classes. The wanted functionality is:

{code:java}
new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate))
{code}

To find all documents with any term in the field use negate=false, otherwise negate=true. There is absolutely no need for a new Query class.

bq. Okay, so would it be straightforward and super-efficient for PrefixQuery to do exactly that if the prefix term is zero-length?

It would be straightforward, but we should not do this as the default (although PrefixQuery could rewrite to that). The problem is that it implicitly needs to build the FieldCache for that field, so such automatism is a no-go here. If you need that functionality, modify QueryParser.

was (Author: thetaphi):

The filter is already there; only the QueryParser does not support it. To make this work for your use case, you can override Lucene's/Solr's QueryParser to return a ConstantScoreQuery wrapping the LUCENE-3593 filter as a replacement for the field:* only query. The positive and negative variants work by passing the boolean to the filter. To conclude: the Query is already there, so there is no need for the 2 new classes. The wanted functionality is:

{code:java}
new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate))
{code}

To find all documents with any term in the field use negate=false, otherwise negate=true. There is absolutely no need for a new Query class.

bq. Okay, so would it be straightforward and super-efficient for PrefixQuery to do exactly that if the prefix term is zero-length?

That's super-slow, as it will search for all terms in the field. This is what e.g. Solr currently does for field:* queries. Solr should use the filter, too; that would make this much more efficient.
[jira] [Resolved] (LUCENE-4252) Detect/Fail tests when they leak RAM in static fields
[ https://issues.apache.org/jira/browse/LUCENE-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-4252.

Resolution: Fixed
Fix Version/s: 5.0, 4.0

Detect/Fail tests when they leak RAM in static fields

Key: LUCENE-4252
URL: https://issues.apache.org/jira/browse/LUCENE-4252
Project: Lucene - Core
Issue Type: Test
Components: general/test
Reporter: Robert Muir
Assignee: Dawid Weiss
Fix For: 5.0, 4.0
Attachments: LUCENE-4252.patch, LUCENE-4252.patch, sfi.patch

We run our JUnit tests without firing up a new JVM each time. But some tests initialize lots of stuff in @BeforeClass and don't properly null it out in an @AfterClass, which can cause a subsequent test in the same JVM to OOM, which is difficult to debug. Inspiration for this was me committing Mike's cool TestPostingsFormat, which forgot to do this: we were then seeing OOMs in several Jenkins runs. We should try to detect these leaks in LuceneTestCase with RAMUsageEstimator and fail the test.
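The check described above can be sketched with plain reflection (the actual implementation lives in LuceneTestCase and uses RAMUsageEstimator to measure byte sizes; class and method names here are illustrative):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: after a test class finishes, scan its static reference
// fields and flag any that were left non-null (i.e. not cleaned up in
// @AfterClass, so they keep their object graphs reachable in the shared JVM).
public class StaticLeakCheck {

    /** Returns names of static reference fields still holding objects. */
    public static List<String> leakedFields(Class<?> testClass) throws IllegalAccessException {
        List<String> leaks = new ArrayList<>();
        for (Field f : testClass.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) continue;
            if (f.getType().isPrimitive()) continue; // primitives can't pin an object graph
            f.setAccessible(true);
            if (f.get(null) != null) {
                leaks.add(f.getName());
            }
        }
        return leaks;
    }

    /** Hypothetical fixture: a test class that forgot to null its static state. */
    static class LeakyTest {
        static byte[] bigBuffer = new byte[1024]; // leaked: never nulled in @AfterClass
        static int counter = 42;                  // primitive, ignored by the check
    }
}
```

The real check goes further and fails the test only when the leaked static state exceeds a size threshold, which is where RAMUsageEstimator comes in.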
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453853#comment-13453853 ]

Tommaso Teofili commented on LUCENE-4345:

bq. Can we remove the ClassificationException? It only seems to box IOException... we can just throw IOException directly instead?

Sure, we can keep IOException for now.

bq. What is the scale that you expect this bayesian classifier to handle? How many training documents does it need?

I'm doing some benchmarking these days, so I should be able to say something about this shortly.

Create a Classification module

Key: LUCENE-4345
URL: https://issues.apache.org/jira/browse/LUCENE-4345
Project: Lucene - Core
Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch

Lucene/Solr can host huge sets of documents containing lots of information in fields, so these can be used as training examples (with features) in order to very quickly create classifiers to use on new documents and/or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already-seen data (the indexed documents/fields) to classify new documents/text fragments. The first version will contain a (simplistic) Lucene-based Naive Bayes classifier, but more implementations should be added in the future.
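For illustration, the kind of classifier the first version describes can be sketched as a standalone multinomial naive Bayes with add-one smoothing (the real module trains against indexed Lucene fields; all names here are illustrative, not the module's API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal multinomial naive Bayes sketch: trains on in-memory token lists and
// classifies by log prior + summed log likelihoods with Laplace smoothing.
public class NaiveBayes {
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> classTotals = new HashMap<>(); // tokens per class
    private final Map<String, Integer> docCounts = new HashMap<>();   // docs per class
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    public void train(String label, List<String> tokens) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            classTotals.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    public String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : tokenCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs); // log prior
            for (String t : tokens) {
                int count = tokenCounts.get(label).getOrDefault(t, 0);
                // add-one (Laplace) smoothing so unseen tokens don't zero out the class
                score += Math.log((count + 1.0) / (classTotals.get(label) + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }
}
```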
Re: Collator-based facet sorting in Solr
On Wed, Sep 12, 2012 at 3:44 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
> I seem to be missing something here. The ICUCollationKeyFilter can be at the end of the analyzer chain, so why can't the input be normalized before entering this filter?

ICUCollationKeyFilter is gone.

--
lucidworks.com
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453899#comment-13453899 ]

Tommaso Teofili commented on LUCENE-4345:

Side note: it seems a bit old, but I just realized something similar had been done in LUCENE-1039; maybe both implementations could then be added in the future.
Re: Collator-based facet sorting in Solr
On Wed, Sep 12, 2012 at 3:44 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
> That would be a serious impediment. For some of our uncontrolled fields, the same word can be cased very differently: CD, cd, Cd. To be on the safe side, the client would have to ask for 3 times the wanted amount of facet information. But if we cannot normalize at index time, de-duplication on the server would require changes to the faceting code.

I'll open an issue for this. We should at least fix the analysis factory APIs to support it, even if the Solr configuration XML doesn't yet have syntax.

> Regardless, it sounds like the idea passes the initial sanity check. Should I open a JIRA issue for it?

I think you should. As an ugly workaround to the above problem: you could actually construct a Lucene Analyzer with KeywordTokenizer(ICUCollationAtt) followed by LowerCase/etc/etc and load that up with analyzer class= in Solr. I think that will work fine.

--
lucidworks.com
[jira] [Created] (LUCENE-4379) Add AttributeFactory parameter to TokenizerFactory.create()
Robert Muir created LUCENE-4379:

Summary: Add AttributeFactory parameter to TokenizerFactory.create()
Key: LUCENE-4379
URL: https://issues.apache.org/jira/browse/LUCENE-4379
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

Currently the analysis factories don't support using a different attribute factory.
[jira] [Created] (SOLR-3828) Query Elevation component boosts excluded results in markExcludes mode
Alexey Serba created SOLR-3828:

Summary: Query Elevation component boosts excluded results in markExcludes mode
Key: SOLR-3828
URL: https://issues.apache.org/jira/browse/SOLR-3828
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 4.0-BETA
Reporter: Alexey Serba
Priority: Trivial
Fix For: 4.0

The Query Elevation component boosts excluded results in markExcludes=true mode, causing them to rank higher in the results than they should.
[jira] [Updated] (SOLR-3828) Query Elevation component boosts excluded results in markExcludes mode
[ https://issues.apache.org/jira/browse/SOLR-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serba updated SOLR-3828:

Attachment: SOLR-3828.patch

Attached patch (fix + test).
[jira] [Commented] (LUCENE-4377) consolidate various copyBytes() methods
[ https://issues.apache.org/jira/browse/LUCENE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453901#comment-13453901 ]

Michael McCandless commented on LUCENE-4377:

+1

consolidate various copyBytes() methods

Key: LUCENE-4377
URL: https://issues.apache.org/jira/browse/LUCENE-4377
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 5.0, 4.0
Attachments: LUCENE-4377.patch

Spinoff of LUCENE-4371:

{quote}
I don't think the default impl (SlicedIndexInput) should override BII's copyBytes? Seems ... spooky.
{quote}

There are copyBytes methods everywhere, mostly not really being used: particularly DataOutput.copyBytes(DataInput) versus IndexInput.copyBytes(IndexOutput). Bulk merging already uses DataOutput.copyBytes(DataInput); it's the most general (as it works on DataInput/Output), and it's in dst, src order. I think we should remove IndexInput.copyBytes; it's not necessary.
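The dst, src, stream-level copy the issue argues for can be sketched as follows (illustrative signature on java.io streams, not Lucene's DataOutput API):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch of a single general-purpose copy in (dst, src) order,
// working on abstract streams so one implementation serves every concrete
// input/output pair -- the shape of consolidation proposed above.
public class CopyBytes {

    public static void copyBytes(OutputStream dst, InputStream src, long numBytes)
            throws IOException {
        byte[] buffer = new byte[8192];
        while (numBytes > 0) {
            int chunk = (int) Math.min(buffer.length, numBytes);
            int read = src.read(buffer, 0, chunk);
            if (read == -1) {
                throw new EOFException("read past EOF with " + numBytes + " bytes left");
            }
            dst.write(buffer, 0, read);
            numBytes -= read;
        }
    }
}
```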
[jira] [Commented] (LUCENE-4377) consolidate various copyBytes() methods
[ https://issues.apache.org/jira/browse/LUCENE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453902#comment-13453902 ]

Uwe Schindler commented on LUCENE-4377:

+1, this has annoyed me for a long time!
[jira] [Resolved] (LUCENE-2163) Remove synchronized from DirReader.reopen/clone
[ https://issues.apache.org/jira/browse/LUCENE-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2163.

Resolution: Fixed
Fix Version/s: (was: 4.1) 5.0, 4.0

Remove synchronized from DirReader.reopen/clone

Key: LUCENE-2163
URL: https://issues.apache.org/jira/browse/LUCENE-2163
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Priority: Minor
Fix For: 5.0, 4.0
Attachments: LUCENE-2163.patch

Spinoff from LUCENE-2161, where the fact that DirReader.reopen is sync'd was dangerous in the context of NRT (it could block all searches against that reader when CMS was throttling). So, with LUCENE-2161, we're removing the synchronization when it's an NRT reader that you're reopening. But... why should we sync even for a normal reopen? There are various sync'd methods on IndexReader/DirReader (we are reducing that, with LUCENE-2161 and also LUCENE-2156), but in general it doesn't seem like a normal reopen really needs to be sync'd. Performing a reopen shouldn't incur any chance of blocking a search...
[jira] [Resolved] (LUCENE-2925) modules/* are excluded from the versioned site javadocs
[ https://issues.apache.org/jira/browse/LUCENE-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2925.

Resolution: Fixed
Fix Version/s: (was: 4.1) 5.0, 4.0

This is working now.

modules/* are excluded from the versioned site javadocs

Key: LUCENE-2925
URL: https://issues.apache.org/jira/browse/LUCENE-2925
Project: Lucene - Core
Issue Type: Bug
Components: general/website, modules/analysis, modules/benchmark
Affects Versions: 4.0-ALPHA
Reporter: Steven Rowe
Fix For: 5.0, 4.0

The {{javadocs}} target in {{lucene/build.xml}} builds javadocs for the versioned website, including for Lucene core and all contribs under {{lucene/contrib/}}. Nothing under {{modules/}} is included, but all modules there should be.
[jira] [Created] (SOLR-3829) Admin UI Logging events broken if schema.xml defines a catch-all dynamicField with type ignored
Andreas Hubold created SOLR-3829:

Summary: Admin UI Logging events broken if schema.xml defines a catch-all dynamicField with type ignored
Key: SOLR-3829
URL: https://issues.apache.org/jira/browse/SOLR-3829
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 4.0-BETA
Reporter: Andreas Hubold

The Solr Admin page does not show any log events. There are JavaScript errors:

{noformat}
TypeError: doc.logger.esc is not a function
... 'abbr title=' + doc.logger.esc() + '' + doc.logger.split( '.' ).pop().esc()...
{noformat}

This is because the response of the LoggingHandler added unexpected {{[ ... ]}} characters around the values for time, level, logger and message:

{noformat}
... history:{numFound:2,start:0,docs:[{time:[2012-09-11T15:07:05.453Z],level:[WARNING],logger:[org.apache.solr.core.SolrCore],message:[New index directory detected: ...
{noformat}

This is caused by the way the JSON is created. org.apache.solr.logging.LogWatcher#toSolrDocument creates a SolrDocument which is then formatted with an org.apache.solr.response.JSONResponseWriter. But the JSONResponseWriter uses the index schema to decide how to format the JSON. We have the following field declaration in schema.xml:

{noformat}
dynamicField name=* type=ignored /
{noformat}

The field type ignored has the attribute multiValued set to true. Because of this, JSONResponseWriter adds [] characters in org.apache.solr.response.JSONWriter#writeSolrDocument. The formatting should be independent of schema.xml.
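The bracket-wrapping behavior can be sketched in isolation (a hypothetical writer, not Solr's JSONWriter):

```java
// Illustrative sketch of the formatting difference described above: when a
// schema-driven writer treats a field as multiValued, every value is wrapped
// in a JSON array, which is what broke the admin UI's expectations for
// single-valued log fields like level and logger.
public class JsonFieldWriter {

    public static String writeField(String name, String value, boolean multiValued) {
        String quoted = "\"" + value + "\"";
        return "\"" + name + "\":" + (multiValued ? "[" + quoted + "]" : quoted);
    }
}
```

With the catch-all `dynamicField name=* type=ignored` in place, every log field matches a multiValued type, so each value comes back as `["..."]` instead of `"..."`.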
[jira] [Commented] (SOLR-3367) Show Logging Events in Admin UI
[ https://issues.apache.org/jira/browse/SOLR-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453919#comment-13453919 ]

Andreas Hubold commented on SOLR-3367:

This feature is broken in Solr 4.0-BETA - at least with certain schema.xml files. See SOLR-3829.

Show Logging Events in Admin UI

Key: SOLR-3367
URL: https://issues.apache.org/jira/browse/SOLR-3367
Project: Solr
Issue Type: New Feature
Components: web gui
Reporter: Ryan McKinley
Assignee: Stefan Matheis (steffkes)
Fix For: 4.0-ALPHA
Attachments: SOLR-3367.patch, SOLR-3367.patch, SOLR-3367.patch, SOLR-3367.patch, SOLR-3367.png

We can show logging events in the Admin UI.
[jira] [Resolved] (LUCENE-3000) Lucene release artifacts should be named apache-lucene-*
[ https://issues.apache.org/jira/browse/LUCENE-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3000.

Resolution: Won't Fix

Lucene release artifacts should be named apache-lucene-*

Key: LUCENE-3000
URL: https://issues.apache.org/jira/browse/LUCENE-3000
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.0-ALPHA
Reporter: Grant Ingersoll
Priority: Minor
Fix For: 4.1

Our artifact names should be prefixed with apache-, as in apache-lucene-4.0-src.tar.gz (or whatever).
[jira] [Resolved] (LUCENE-4377) consolidate various copyBytes() methods
[ https://issues.apache.org/jira/browse/LUCENE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4377.

Resolution: Fixed
[jira] [Updated] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4371:

Attachment: LUCENE-4371.patch

Just syncing the patch up to trunk. Part of the funkiness I don't like is e.g. that NIOFSIndexInput extends SimpleFSIndexInput. This is not good. I will see if I can clear that up in a separate issue.

consider refactoring slicer to indexinput.slice

Key: LUCENE-4371
URL: https://issues.apache.org/jira/browse/LUCENE-4371
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Attachments: LUCENE-4371.patch, LUCENE-4371.patch

From LUCENE-4364:

{quote}
In my opinion, we should maybe check if we can remove the whole Slicer in all IndexInputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in.
{quote}
[jira] [Created] (LUCENE-4380) fix simplefs/niofs hierarchy
Robert Muir created LUCENE-4380: --- Summary: fix simplefs/niofs hierarchy Key: LUCENE-4380 URL: https://issues.apache.org/jira/browse/LUCENE-4380 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4380.patch spinoff from LUCENE-4371: Currently NIOFSDirectory.NIOFSIndexInput extends SimpleFSDirectory.SimpleFSIndexInput, but this isn't an is-a relationship at all. Additionally SimpleFSDirectory has a funky Descriptor class that extends RandomAccessFile that is useless: {noformat} /** * Extension of RandomAccessFile that tracks if the file is * open. */ ... // remember if the file is open, so that we don't try to close it // more than once {noformat} RandomAccessFile is closeable, this is not necessary and I don't think we should be subclassing it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4380) fix simplefs/niofs hierarchy
[ https://issues.apache.org/jira/browse/LUCENE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4380: Attachment: LUCENE-4380.patch Here's a patch: I factored the shared logic into an FSIndexInput (parallel with FSIndexOutput) instead. fix simplefs/niofs hierarchy --- Key: LUCENE-4380 URL: https://issues.apache.org/jira/browse/LUCENE-4380 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4380.patch spinoff from LUCENE-4371: Currently NIOFSDirectory.NIOFSIndexInput extends SimpleFSDirectory.SimpleFSIndexInput, but this isn't an is-a relationship at all. Additionally SimpleFSDirectory has a funky Descriptor class that extends RandomAccessFile that is useless: {noformat} /** * Extension of RandomAccessFile that tracks if the file is * open. */ ... // remember if the file is open, so that we don't try to close it // more than once {noformat} RandomAccessFile is closeable, this is not necessary and I don't think we should be subclassing it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454028#comment-13454028 ] James Dyer commented on SOLR-3823: -- Hoss, I appreciate you fixing this, but I would rather get a fix that preserves the negative boost support (SOLR-3278). I guess I don't understand the bug this issue was addressing. Is it simply that bq would fail if extra whitespace was in the query? Could we write a failing testcase for that? Do you see a reason why it would be difficult to fix this and retain the negative boosts? The discussion of LUCENE-4378 is pertinent: we have products in our index that we either do not sell or we know most of our customers do not want. Yet they often score very high. The only way I can reliably prevent these from becoming top hits is to use a negative boost. I would imagine this is a frequent requirement. I'm more than willing to contribute for this, but I couldn't tell that this issue was an actual problem or a case of users putting whitespace where it doesn't belong and prior versions being more forgiving. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer Assignee: Hoss Man Fix For: 4.0, 5.0 When using a boost query (bq) that contains parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4373) BBoxStrategy should support query shapes of any type
[ https://issues.apache.org/jira/browse/LUCENE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454036#comment-13454036 ] David Smiley commented on LUCENE-4373: -- As part of this, I think a makeValueSource() might be modified to alter the area similarity to consider the query shape's percentage of the bbox that it fills. Perhaps something like this:
{code:java}
public ValueSource makeValueSource(SpatialArgs args) {
  Shape shape = args.getShape();
  double queryPowerFactor = 1;
  if (!(shape instanceof Rectangle)) {
    double queryBBoxArea = shape.getBoundingBox().getArea(ctx);
    double queryArea = shape.getArea(ctx);
    if (queryBBoxArea != 0)
      queryPowerFactor = queryArea / queryBBoxArea;
  }
  return new BBoxSimilarityValueSource(
      this,
      new AreaSimilarity(shape.getBoundingBox(), queryPower * queryPowerFactor, targetPower));
}
{code}
BBoxStrategy should support query shapes of any type Key: LUCENE-4373 URL: https://issues.apache.org/jira/browse/LUCENE-4373 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Priority: Minor It's great that BBoxStrategy has sophisticated shape area similarity based on bounding box, but I think that doesn't have to preclude having a non-rectangular query shape. The bbox to bbox query implemented already is probably pretty fast, as it can work via numeric range queries, but I'd like this to be the first stage, with the second being a FieldCache-based comparison to the query shape if it's not a rectangle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
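[Editorial note: as a quick sanity check on the factor in the snippet above, for a circular query shape the ratio queryArea / queryBBoxArea works out to pi/4, since a circle of radius r covers pi*r^2 of its (2r x 2r) bounding box. A standalone illustration with a hypothetical helper, not part of the patch:]

```java
// Illustrative helper mirroring the queryPowerFactor computation in the snippet above:
// the fraction of a query shape's bounding box that the shape itself covers.
class AreaFactorSketch {
    public static double queryPowerFactor(double shapeArea, double bboxArea) {
        if (bboxArea == 0) return 1;  // degenerate bbox: leave the power unchanged
        return shapeArea / bboxArea;
    }

    public static void main(String[] args) {
        // a circle of radius r inside its 2r x 2r bounding box covers pi/4 of it
        double r = 3.0;
        double factor = queryPowerFactor(Math.PI * r * r, (2 * r) * (2 * r));
        System.out.println(factor); // ~0.7854
    }
}
```

So a shape that nearly fills its bbox leaves the query power essentially unchanged, while a thin diagonal shape (tiny area, large bbox) would shrink the power factor accordingly.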
[jira] [Updated] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4173: - Attachment: LUCENE-4173_remove_IgnoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch Updated the patch: * renamed the test method with the underscore to be convertShapeFromGetDocuments instead * In BBoxStrategy.makeValueSource, I moved my TODO bbox shape similarity idea to a comment on a JIRA issue. And I modified this makeValueSource to fail if a rectangle is not given, instead of coalescing via getBoundingBox(). Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4173.patch, LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch, LUCENE-4173_remove_IgnoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3830) Rename LFUCache to FastLFUCache
Adrien Grand created SOLR-3830: -- Summary: Rename LFUCache to FastLFUCache Key: SOLR-3830 URL: https://issues.apache.org/jira/browse/SOLR-3830 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Adrien Grand Priority: Minor I find it a little disturbing that LFUCache shares most of its behavior (not strictly bounded size, good at concurrent reads, slow at writes unless eviction is performed in a separate thread) with FastLRUCache while it sounds like it is the LFU equivalent of LRUCache (strictly bounded size, synchronized reads, fast writes) so I'd like to rename it to FastLFUCache. Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non Fast alternatives in every way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
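[Editorial note: the "strictly bounded size, synchronized reads, fast writes" design attributed to LRUCache above can be sketched in a few lines. This is an illustration of the trade-off only, not Solr's implementation; the class name is hypothetical.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of an LRUCache-style design: strictly bounded size, eviction on
// insert, and a single lock that every read must take. Not Solr's code.
class BoundedLru<K, V> {
    private final Map<K, V> map;

    public BoundedLru(final int maxSize) {
        // access-order LinkedHashMap keeps entries ordered by recency of use
        map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;  // evict the least-recently-used entry
            }
        };
    }

    public synchronized V get(K key) { return map.get(key); }              // reads contend on one lock
    public synchronized void put(K key, V value) { map.put(key, value); }  // writes are cheap
}
```

The reads-contend-on-one-lock line is the whole story: a ConcurrentHashMap-based "Fast" variant trades that lock away for cheap concurrent reads, at the cost of approximate sizing and more expensive eviction, which is exactly the behavioral split the issue description draws.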
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454059#comment-13454059 ] Erick Erickson commented on SOLR-3823: -- James: The problem was quite the opposite. When there was NO space in the bq clause it'd fail like this, i.e. bq=(stuff). And when there was space, I don't think it worked at all But yeah, it'd be good to have both parens and negative boosts... Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer Assignee: Hoss Man Fix For: 4.0, 5.0 When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454067#comment-13454067 ] Yonik Seeley commented on SOLR-3830: bq. so I'd like to rename it to FastLFUCache. +1, it hasn't been used in the default solrconfig.xml, so this change shouldn't really affect anyone. I don't think we should rename the other ones that people are likely to have in their configs already though. There is a note right next to where one would configure these caches that tries to explain the difference. We should update that if it's not sufficient. Rename LFUCache to FastLFUCache --- Key: SOLR-3830 URL: https://issues.apache.org/jira/browse/SOLR-3830 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Adrien Grand Priority: Minor I find it a little disturbing that LFUCache shares most of its behavior (not strictly bounded size, good at concurrent reads, slow at writes unless eviction is performed in a separate thread) with FastLRUCache while it sounds like it is the LFU equivalent of LRUCache (strictly bounded size, synchronized reads, fast writes) so I'd like to rename it to FastLFUCache. Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non Fast alternatives in every way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: svn commit: r1384000 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java
Thanks! This was overlooked... - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: rm...@apache.org [mailto:rm...@apache.org] Sent: Wednesday, September 12, 2012 5:35 PM To: comm...@lucene.apache.org Subject: svn commit: r1384000 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java Author: rmuir Date: Wed Sep 12 15:34:56 2012 New Revision: 1384000 URL: http://svn.apache.org/viewvc?rev=1384000&view=rev Log: LUCENE-4306: dont upgrade this method to public in BaseCompositeReader Modified: lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java Modified: lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java?rev=1384000&r1=1383999&r2=1384000&view=diff == --- lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java (original) +++ lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/index/BaseCompositeReader.java Wed Sep 12 15:34:56 2012 @@ -151,7 +151,7 @@ public abstract class BaseCompositeReade } @Override - public final List<? extends R> getSequentialSubReaders() { + protected final List<? extends R> getSequentialSubReaders() { return subReadersList; } } - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454085#comment-13454085 ] Hoss Man commented on SOLR-3830: -1. Repeating my comment from SOLR-3393... {quote} #OhDearGodPleaseNotAnotherClassWithFastInTheName Please, please, please lets end the madness of subjective adjectives in class names ... if it's an LFU cache wrapped around a hawtdb why don't we just call it HawtDbLFUCache ? {quote} we should not be adding new names with Fast in front of them - it does nothing to help the user understand the value of the class. {quote} Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non Fast alternatives in every way. {quote} I would much rather rename FastLRUCache to something else (with a deprecated FastLRUCache stub subclass still provided for config backcompat) then see any more a new Fast*Foo class. Rename LFUCache to FastLFUCache --- Key: SOLR-3830 URL: https://issues.apache.org/jira/browse/SOLR-3830 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Adrien Grand Priority: Minor I find it a little disturbing that LFUCache shares most of its behavior (not strictly bounded size, good at concurrent reads, slow at writes unless eviction is performed in a separate thread) with FastLRUCache while it sounds like it is the LFU equivalent of LRUCache (strictly bounded size, synchronized reads, fast writes) so I'd like to rename it to FastLFUCache. Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non Fast alternatives in every way. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Error when Integrating wordnet to Lucene
Hi all, I want to integrate WordNet 3.0 with Lucene 4.0. This is my code: 1. String op = new Scanner(new File("E:\\...\\WNprolog-3.0\\prolog\\wn_s.pl")).useDelimiter("\\Z").next(); 2. WordnetSynonymParser parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(Version.LUCENE_40)); 3. parser.add(new StringReader(op)); 4. SynonymMap map = parser.build(); But when the 3rd line was executed, I got this error: "Invalid synonym rule at line 109". I don't know what the cause is. Could you please help me with this problem? Thank you so much. -- View this message in context: http://lucene.472066.n3.nabble.com/Error-when-Integrating-wordnet-to-Lucene-tp4007141.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
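[Editorial note: for context on what the parser is reading, wn_s.pl entries are Prolog facts of the form s(100002137,1,'abstraction',n,6,0)., and a rule fails to parse when a line does not match that shape; checking line 109 of the file itself is the obvious first step. A minimal sketch of extracting the quoted word from such a line, illustrative only and not WordnetSynonymParser's code (the class and method names are made up):]

```java
// Illustrative only -- not WordnetSynonymParser's code. Pulls the quoted word
// out of a WordNet prolog fact such as: s(100002137,1,'abstraction',n,6,0).
class WnLineSketch {
    public static String parseWord(String line) {
        int start = line.indexOf('\'');
        int end = line.lastIndexOf('\'');
        if (start < 0 || end <= start) {
            // mirrors the kind of failure reported above when a line has no quoted word
            throw new IllegalArgumentException("Invalid synonym rule: " + line);
        }
        // Prolog doubles quotes inside quoted atoms, e.g. 'o''clock' for o'clock
        return line.substring(start + 1, end).replace("''", "'");
    }
}
```

Entries whose word contains punctuation such as an embedded quote are a plausible place for a line-specific parse failure, so inspecting the exact line the exception names usually identifies the culprit quickly.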
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454104#comment-13454104 ] Hoss Man commented on SOLR-3823: bq. I couldn't tell that this issue was an actual problem or a case of users putting whitespace where it doesn't belong and prior versions being more forgiving. James: the core of the bug was your use of SolrPluginUtils.parseFieldBoosts to try and parse the bq params. This is not safe -- if you look at the method it is an extremely trivial utility that is specific for parsing qf/pf style strings containing a list of field names and boosts. It's _not_ a safe way to parse an arbitrary query string, and any non-trivial query string can cause problems with it. As you noted in SOLR-3278, parseFieldBoosts is used for parsing the bf param and that's actually a long-standing unsafe bug as well (SOLR-2014) but since functions tend to be much simpler, it's historically been less problematic. When people run into problems with it, the workaround is to use bq={!func}... instead. bq. I would rather get a fix that preserves the negative boost support Since SOLR-3278 had not been released publicly outside of the ALPHA/BETA, my first priority was to fix the regression compared to 3.x where non-trivial bq queries worked fine. The documented method of dealing with negative boosting in Solr is actually the type of query that was the crux of this bug report, and I updated the tests you added in SOLR-3278 to use that pattern... 
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F I have no objections to supporting true negative boosts, but I think the right way to do it is in the query parsers / QParsers themselves (so that the boosts can be on any clause) and not just as a special hack for bq/bf (the fact that it works in bf is actually just a fluke of its buggy implementation) but as you can see in LUCENE-4378 this is a contentious idea. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer Assignee: Hoss Man Fix For: 4.0, 5.0 When using a boost query (bq) that contains parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4173) Remove IgnoreIncompatibleGeometry for SpatialStrategys
[ https://issues.apache.org/jira/browse/LUCENE-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved LUCENE-4173. -- Resolution: Fixed I received Chris's blessing on these changes in chat and I committed now. Trunk: r1384026, 4x: r1384028 Remove IgnoreIncompatibleGeometry for SpatialStrategys -- Key: LUCENE-4173 URL: https://issues.apache.org/jira/browse/LUCENE-4173 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Reporter: Chris Male Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4173.patch, LUCENE-4173_remove_ignoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch, LUCENE-4173_remove_IgnoreIncompatibleGeometry,_fail_when_given_the_exact_shape_needed.patch Silently not indexing anything for a Shape is not okay. Users should get an Exception and then they can decide how to proceed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3823) Parentheses in a boost query cause errors
[ https://issues.apache.org/jira/browse/SOLR-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454126#comment-13454126 ] James Dyer commented on SOLR-3823: -- Hoss, Thank you for working through this and opening Lucene-4378 to at least investigate changing the parser grammar. I understand the issue with what I had done initially and appreciate your help on this. Parentheses in a boost query cause errors - Key: SOLR-3823 URL: https://issues.apache.org/jira/browse/SOLR-3823 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0-BETA Environment: Mac, jdk 1.6, Chrome Reporter: Mathos Marcer Assignee: Hoss Man Fix For: 4.0, 5.0 When using a boost query (bq) that contains a parentheses (like this example from the Relevancy Cookbook section of the wiki): {noformat} ? defType = dismax q = foo bar bq = (*:* -xxx)^999 {noformat} You get the following error: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '-xxx)': Encountered ) ) at line 1, column 12. Was expecting one of: EOF AND ... OR ... NOT ... + ... - ... BAREOPER ... ( ... * ... ^ ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... [ ... { ... NUMBER ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1223) Query Filter fq with OR operator
[ https://issues.apache.org/jira/browse/SOLR-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454135#comment-13454135 ] Ron Buchanan commented on SOLR-1223: If you care for input from a nobody that's fairly new to Solr, I like Hoss Man's idea - and I very, very much want this Though my thought was that it would make sense to use the v=$paramName facility and just add multiple instances of paramName Query Filter fq with OR operator Key: SOLR-1223 URL: https://issues.apache.org/jira/browse/SOLR-1223 Project: Solr Issue Type: New Feature Components: search Reporter: Brian Pearson See this [issue|http://lucene.472066.n3.nabble.com/Query-Filter-fq-with-OR-operator-td499172.html] for some background. Today, all of the Query filters specified with the fq parameter are AND'd together. This issue is about allowing a set of filters to be OR'd together (in addition to having another set of filters that are AND'd). The OR'd filters would of course be applied before any scoring is done. The advantage of this feature is that you will be able to break up complex filters into simple, more cacheable filters, which should improve performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
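[Editorial note: the proposal can be pictured with plain bitsets: each cached filter is a set of matching doc ids, the OR'd group is their union, and that union intersects the main query's matches before scoring. A sketch of the idea only, not Solr code.]

```java
import java.util.BitSet;

// Sketch of the feature request (not Solr's implementation): individually cached
// filter bitsets are OR'd into one composite filter, which is then applied to
// the main query's matches before any scoring happens.
class OrFilters {
    public static BitSet union(BitSet... filters) {
        BitSet acc = new BitSet();
        for (BitSet f : filters) acc.or(f);  // OR the cached filters together
        return acc;
    }

    public static void main(String[] args) {
        BitSet inStock = new BitSet(); inStock.set(0); inStock.set(2);
        BitSet onSale  = new BitSet(); onSale.set(1);
        BitSet either = union(inStock, onSale);   // docs 0, 1, 2

        BitSet queryMatches = new BitSet(); queryMatches.set(1); queryMatches.set(3);
        queryMatches.and(either);                 // the OR'd filter restricts the query
        System.out.println(queryMatches);         // {1}
    }
}
```

The caching win the issue describes falls out of this decomposition: `inStock` and `onSale` are each simple and highly reusable across requests, whereas a single pre-OR'd filter query would be cached as one monolithic, less reusable entry.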
[jira] [Commented] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454154#comment-13454154 ] Yonik Seeley commented on SOLR-3830: OK, let's leave things as they are then. Documentation is the key if we need to clarify anything. Rename LFUCache to FastLFUCache --- Key: SOLR-3830 URL: https://issues.apache.org/jira/browse/SOLR-3830 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Adrien Grand Priority: Minor I find it a little disturbing that LFUCache shares most of its behavior (not strictly bounded size, good at concurrent reads, slow at writes unless eviction is performed in a separate thread) with FastLRUCache while it sounds like it is the LFU equivalent of LRUCache (strictly bounded size, synchronized reads, fast writes) so I'd like to rename it to FastLFUCache. Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non Fast alternatives in every way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454167#comment-13454167 ] Robert Muir commented on LUCENE-4369: - How about WholeTextField? thats fine with me. Does anyone object? StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454172#comment-13454172 ] Robert Muir commented on LUCENE-4369: - ok just a few downsides of 'whole': * it seems similar to full, like full-text field. but StringField is not that. * then what is TextField, only partial? Guys i realistically dont think we are going to come up with a perfect name here that everyone likes. But I think enough people agree that StringField is bad. I seriously propose ASDFGHIJField in the interim, we gotta make some incremental progress. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454175#comment-13454175 ] Uwe Schindler commented on LUCENE-4369: --- WholeTextField sounds like Starbucks... I would like UntokenizedField. StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454186#comment-13454186 ] Steven Rowe commented on LUCENE-4369: - Some more choices: AsIsTextField, IntactTextField, UnSoiledTextField, HalfCaffLatteField StringFields name is unintuitive and not helpful Key: LUCENE-4369 URL: https://issues.apache.org/jira/browse/LUCENE-4369 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4369.patch There's a huge difference between TextField and StringField, StringField screws up scoring and bypasses your Analyzer. (see java-user thread Custom Analyzer Not Called When Indexing as an example.) The name we use here is vital, otherwise people will get bad results. I think we should rename StringField to MatchOnlyField. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454194#comment-13454194 ] Shai Erera commented on LUCENE-4369: bq. I would like UntokenizedField +1 for that. I don't think we should underestimate Lucene users to the point that they don't understand what an Analyzer is, or what tokenization means. When they create an IWC, they need to specify an Analyzer. I think, seriously, that Analyzer is as basic as Document.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454200#comment-13454200 ] Robert Muir commented on LUCENE-4369: - I am +1 for UntokenizedField too. This is much more intuitive than StringField!
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454204#comment-13454204 ] Hoss Man commented on LUCENE-4369: -- Didn't we specifically get rid of enums called TOKENIZED and UN_TOKENIZED because they convoluted the concept of tokenization with analysis? Weren't there users who wanted keyword tokenization combined with other token filters who thought UN_TOKENIZED was what they wanted? Perhaps TextField should be renamed AnalyzedTextField and StringField should be NonAnalyzedTextField?
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454205#comment-13454205 ] Shai Erera commented on LUCENE-4369: Great, then do we have a winner? :)
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454236#comment-13454236 ] Uwe Schindler commented on LUCENE-4369: --- I never understood the difference and why this was renamed in 2.4. For me the issue explains nothing, and the mailing list thread referenced from there is, in my opinion, unrelated. I am also fine with replacing tokenized with analyzed. Inert question: why is it called Tokenizer and not Analyzerator?
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454242#comment-13454242 ] Erick Erickson commented on LUCENE-4369: Shai: bq: ...I don't think we should underestimate Lucene users to the point that they don't understand what an Analyzer... I absolutely agree with you about _Lucene_ users, but I disagree when we're talking about _Solr_ users who are just using the schema.xml file. I flat-out guarantee that they don't always look under the covers. I've seen way more than one site with solr rocks as the firstSearcher/newSearcher queries. But all that said, I'm not doing the work, so whatever gets chosen is fine with me.
[jira] [Commented] (SOLR-2608) TestReplicationHandler is flakey
[ https://issues.apache.org/jira/browse/SOLR-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454243#comment-13454243 ] Hoss Man commented on SOLR-2608: I can't comment on the specific exceptions mentioned above, but having recently looked at TestReplicationHandler because of SOLR-3809 I noticed a few things I thought I'd comment on here... At some point it was annotated as @Slow - I believe the crux of why it can be very slow for some people is that the majority of the functionality being tested relies on the slave polling the master for replication, and the rQuery method used throughout the test will retry queries over and over (up to 30 seconds) until they pass. While we should definitely have some test that the polling works, a lot of the non-polling-specific functionality could probably be tested more reliably using on-demand snappull commands to the slave. TestReplicationHandler is flakey Key: SOLR-2608 URL: https://issues.apache.org/jira/browse/SOLR-2608 Project: Solr Issue Type: Bug Reporter: selckin I've been running some while(1) tests on trunk, and TestReplicationHandler is very flakey: it fails about every 10th run.
Probably not a bug, but the test not waiting correctly. {code}
[junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
[junit] Testcase: org.apache.solr.handler.TestReplicationHandler: FAILED
[junit] ERROR: SolrIndexSearcher opens=48 closes=47
[junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=48 closes=47
[junit] at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:131)
[junit] at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:74)
[junit]
[junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 40.772 sec
[junit]
[junit] - Standard Error -
[junit] 19-Jun-2011 21:26:44 org.apache.solr.handler.SnapPuller fetchLatestIndex
[junit] SEVERE: Master at: http://localhost:51817/solr/replication is not available. Index fetch failed. Exception: Connection refused
[junit] 19-Jun-2011 21:26:49 org.apache.solr.common.SolrException log
[junit] SEVERE: java.util.concurrent.RejectedExecutionException
[junit] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
[junit] at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
[junit] at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
[junit] at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
[junit] at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
[junit] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1149)
[junit] at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:346)
[junit] at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:483)
[junit] at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:332)
[junit] at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
[junit] at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:166)
[junit] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[junit] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
[junit] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit]
[junit] 19-Jun-2011 21:26:51 org.apache.solr.update.SolrIndexWriter finalize
[junit] SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
[junit] 19-Jun-2011 21:26:51 org.apache.solr.common.util.ConcurrentLRUCache finalize
[junit] SEVERE: ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
{code}
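The retry-queries-until-they-pass behavior Hoss describes (the test's rQuery method) is a generic poll-until-timeout loop. The sketch below is illustrative only, with hypothetical names, not the actual Solr test code:

```java
import java.util.function.Supplier;

// Sketch of a retry-until-pass helper: repeat a check until it succeeds
// or a timeout expires, which is why a polling-based test can burn up to
// 30 seconds per query when the slave hasn't replicated yet.
public class RetryUntil {
    static boolean retryUntil(Supplier<Boolean> check, long timeoutMs, long sleepMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.get()) return true;   // condition met: stop early
            try {
                Thread.sleep(sleepMs);      // back off before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return check.get();                 // one last attempt at the deadline
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Condition becomes true on the third poll, well before the deadline.
        boolean ok = retryUntil(() -> ++calls[0] >= 3, 5000, 10);
        System.out.println(ok); // true
    }
}
```

An on-demand snappull, by contrast, makes the replication point deterministic, so the check can run once instead of polling.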
[jira] [Updated] (SOLR-3809) Replication of config files fails when using sub directories
[ https://issues.apache.org/jira/browse/SOLR-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3809: --- Attachment: SOLR-3809.patch I modified TestReplicationHandler to demonstrate the original bug Emmanuel mentioned, and then merged in his patch to show that it fixed the problem -- however I then modified the fix quite a bit, as it was doing some wonky stuff (like equality comparisons between a string path and a File object). I think this patch is good to go. Replication of config files fails when using sub directories Key: SOLR-3809 URL: https://issues.apache.org/jira/browse/SOLR-3809 Project: Solr Issue Type: Bug Reporter: Emmanuel Espina Assignee: Hoss Man Fix For: 4.0 Attachments: SOLR-3809.patch, SOLR-3809.patch If you want to replicate a configuration file inside a subdirectory of the conf directory (e.g. conf/stopwords/english.txt), Solr fails because it cannot find the subdirectory.
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454252#comment-13454252 ] Steven Rowe commented on LUCENE-4369: - bq. I never understood the difference and why this was renamed in 2.4. For me the issue explains nothing and the mailing list thread referenced from there is in my opinion unrelated. Yeah, no. Totally related, see e.g. http://mail-archives.apache.org/mod_mbox/lucene-java-user/200808.mbox/%3c184419b1-6589-41cb-b5d4-3ea9c4215...@mikemccandless.com%3E
[jira] [Commented] (LUCENE-4369) StringFields name is unintuitive and not helpful
[ https://issues.apache.org/jira/browse/LUCENE-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454254#comment-13454254 ] Hoss Man commented on LUCENE-4369: -- bq. the mailing list thread referenced from there is in my opinion unrelated. Did you read the whole thread? It's littered with comments about confusion over how UN_TOKENIZED related to the Analyzer configured on the IndexWriter -- some people thought it meant the *tokenizer* in the Analyzer wouldn't be used, but the rest of their analyzer would. It's very representative of lots of other threads I'd seen over the years. bq. I disagree when we're talking about Solr users who are just using the schema.xml file I don't think anyone is talking about changing solr.StrField and solr.TextField -- this issue is about the convenience subclasses of oal.document.Field... https://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/document/Field.html
[jira] [Commented] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454291#comment-13454291 ] Adrien Grand commented on SOLR-3830: bq. we should not be adding new names with Fast in front of them This is why I also suggested renaming FastLRUCache to ConcurrentLRUCache in my second paragraph (or something else, I'm open to other ideas). bq. OK, let's leave things as they are then. Documentation is the key if we need to clarify anything. Why don't you like renaming FastLRUCache to something else and adding a deprecated FastLRUCache subclass for backward compatibility, as Chris suggests? Rename LFUCache to FastLFUCache --- Key: SOLR-3830 URL: https://issues.apache.org/jira/browse/SOLR-3830 Project: Solr Issue Type: Bug Affects Versions: 4.0-BETA Reporter: Adrien Grand Priority: Minor I find it a little disturbing that LFUCache shares most of its behavior (not strictly bounded size, good at concurrent reads, slow at writes unless eviction is performed in a separate thread) with FastLRUCache, while it sounds like it is the LFU equivalent of LRUCache (strictly bounded size, synchronized reads, fast writes), so I'd like to rename it to FastLFUCache. Maybe we should also rename these Fast*Cache to Concurrent*Cache so that people don't think that they are better than their non-Fast alternatives in every way.
[jira] [Commented] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454299#comment-13454299 ] Yonik Seeley commented on SOLR-3830: I have a higher bar for renaming things in config files and APIs. Solr has a large user base with tons of people who know what things do, and we often overlook the downside of destroying collective knowledge by renaming things that are only a slight improvement. I personally think Lucene has gone rename-crazy and wouldn't do many of those renames if it were up to me...
[jira] [Resolved] (SOLR-3830) Rename LFUCache to FastLFUCache
[ https://issues.apache.org/jira/browse/SOLR-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved SOLR-3830. Resolution: Won't Fix Given that we can neither rename LFUCache to FastLFUCache nor rename FastLRUCache to something else, I am marking this issue as won't fix since there is no way to have a consistent name for these two classes.
[jira] [Updated] (SOLR-3815) add hash range to shard
[ https://issues.apache.org/jira/browse/SOLR-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-3815: --- Attachment: SOLR-3815_addrange.patch Here's a start on adding ranges to shard properties. Seems to work at first but then currently gets lost on an update. Example: {code}
{collection1:{
  shard1:{replicas:{Rogue:8983_solr_collection1:{
      shard:shard1,
      leader:true,
      roles:null,
      state:active,
      core:collection1,
      collection:collection1,
      node_name:Rogue:8983_solr,
      base_url:http://Rogue:8983/solr}}},
  shard2:{
    range:0-7fff,
    replicas:{
{code} add hash range to shard --- Key: SOLR-3815 URL: https://issues.apache.org/jira/browse/SOLR-3815 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Attachments: SOLR-3815_addrange.patch, SOLR-3815.patch
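The range property above marks the slice of the hash space a shard owns (shard2 getting the non-negative half in the example). As a rough sketch of the idea, not Solr's actual routing code, dividing the signed 32-bit hash space into N equal shard ranges looks like:

```java
// Sketch (illustrative names, not Solr code): map a document's 32-bit hash
// into one of N equal slices of the signed int range, the way a shard's
// "range" property describes which hashes it owns.
public class HashRangeSketch {
    // Returns the shard index (0..numShards-1) whose range contains hash.
    static int shardFor(int hash, int numShards) {
        long min = Integer.MIN_VALUE;
        long span = 0x100000000L / numShards;      // width of each slice
        long idx = (hash - min) / span;            // offset from range start
        return (int) Math.min(idx, numShards - 1); // clamp the last slice
    }

    public static void main(String[] args) {
        // With two shards, negative hashes land in shard 0 and
        // non-negative hashes (the 0..7fffffff half) in shard 1.
        System.out.println(shardFor(-1, 2)); // 0
        System.out.println(shardFor(0, 2));  // 1
    }
}
```

Persisting the range alongside the replicas (as the patch attempts) means routing survives even when the shard count or layout changes later.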
[jira] [Created] (SOLR-3831) atomic updates to fields of type payloads do not distribute correctly
Jim Musil created SOLR-3831: --- Summary: atomic updates to fields of type payloads do not distribute correctly Key: SOLR-3831 URL: https://issues.apache.org/jira/browse/SOLR-3831 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0-BETA Environment: linux Reporter: Jim Musil After setting up two independent Solr nodes using the SolrCloud tutorial, atomic updates to a field of type payloads give an error when updating the destination node. The error is: SEVERE: java.lang.NumberFormatException: For input string: 100} The input sent to the first node is in the expected default format for a payload field (e.g. foo|100) and that update succeeds. I've found that the update always works for the first node, but never the second. I've tested each server running independently and found that this update works as expected.
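For context, a payloads field expects each token in token|payload form (the foo|100 in the report), with the part after the delimiter parsed as a number. The sketch below is illustrative only (not Solr's DelimitedPayloadTokenFilter); it shows how a stray trailing brace from a serialization envelope would surface as exactly the NumberFormatException above:

```java
// Hypothetical parser for the "token|payload" wire format: splits on the
// last '|' and parses the payload as a float. A trailing '}' leaked in
// from the forwarded update (as in the reported "100}") makes the numeric
// parse throw NumberFormatException.
public class PayloadParse {
    static float parsePayload(String tokenAndPayload) {
        int bar = tokenAndPayload.lastIndexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("no delimiter: " + tokenAndPayload);
        }
        return Float.parseFloat(tokenAndPayload.substring(bar + 1));
    }

    public static void main(String[] args) {
        System.out.println(parsePayload("foo|100")); // parses cleanly
        try {
            parsePayload("foo|100}"); // mirrors the reported failure
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

That the first node succeeds while the forwarded update fails suggests the distributed-update serialization, not the field type itself, is adding the extra character.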
[jira] [Commented] (SOLR-3796) I am getting 404 when accessing http://localhost:7101/wcoe-solr/admin
[ https://issues.apache.org/jira/browse/SOLR-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454405#comment-13454405 ] Erick Erickson commented on SOLR-3796: -- Is this still a problem? This is probably better raised on the user's list before making this a JIRA. I am getting 404 when accessing http://localhost:7101/wcoe-solr/admin - Key: SOLR-3796 URL: https://issues.apache.org/jira/browse/SOLR-3796 Project: Solr Issue Type: Bug Components: Build, web gui Affects Versions: 3.6.1 Environment: Windows XP/WebLogic Reporter: Sridharan I deployed solr.war successfully in WebLogic 9. I got the welcome page when I access http://localhost:7101/wcoe-solr/ but I get a 404 error when I access the admin page http://localhost:7101/wcoe-solr/admin. Please help.
Admin UI, schema browser, numbers squished together...
When I go into the new Admin UI schema browser and select a field that has lots of terms in it, the display isn't correct. The count field to the left of each term value is cut off, making it very hard to actually see the term counts. I'm in a situation where I have many thousands of docs that have a particular term. Worth a JIRA? I didn't see any relevant ones on a fast scan. Erick
[jira] [Updated] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4208: - Attachment: LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch This patch is the start of something I hope to finish tonight. makeValueSource is renamed to makeDistanceValueSource to make its purpose abundantly clear. TwoDoubles is getting overhauled to support the dateline and any query shape -- that should probably go into another issue. Spatial distance relevancy should use score of 1/distance - Key: LUCENE-4208 URL: https://issues.apache.org/jira/browse/LUCENE-4208 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Fix For: 4.0 Attachments: LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles, if I recall -- might not do anything, which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea.
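The two scoring formulas discussed in this issue can be compared with a small standalone sketch. This is not the Lucene ValueSource implementation; the c/(c+d) form is a variant of 1/distance that avoids dividing by zero at distance 0 (in the spirit of "maybe use a different constant than 1"):

```java
// Illustrative comparison of reciprocal vs. linear distance-to-score
// mappings (NOT Lucene's actual spatial scoring code). Both make closer
// documents score higher; reciprocal is bounded in (0, 1] and never
// negative, while the linear form hits exactly 0 at the antipode.
public class DistanceScore {
    static final double HALF_EARTH_CIRCUMFERENCE_KM = 20037.5; // approx.

    // Reciprocal: 1.0 at distance 0, decays smoothly toward 0.
    static double reciprocal(double distanceKm, double c) {
        return c / (c + distanceKm);
    }

    // Linear alternative from the issue: earthCircumference/2 - distance.
    static double linear(double distanceKm) {
        return HALF_EARTH_CIRCUMFERENCE_KM - distanceKm;
    }

    public static void main(String[] args) {
        System.out.println(reciprocal(0, 10));  // 1.0
        System.out.println(reciprocal(90, 10)); // 0.1
        System.out.println(linear(0));          // 20037.5
    }
}
```

The constant c also controls how sharply the reciprocal decays: a small c heavily favors very close documents, while a large c spreads scores more evenly.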
[jira] [Created] (LUCENE-4381) support unicode 6.2
Robert Muir created LUCENE-4381: --- Summary: support unicode 6.2 Key: LUCENE-4381 URL: https://issues.apache.org/jira/browse/LUCENE-4381 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Robert Muir Fix For: 4.1, 5.0 ICU will release a new version in about a month. They have a version for testing (http://site.icu-project.org/download/milestone) already out with some interesting features, e.g. dictionary-based CJK segmentation. This issue is just to test it out/integrate the new stuff/etc. We should try out the automation Steve did as well.
[jira] [Updated] (LUCENE-4381) support unicode 6.2
[ https://issues.apache.org/jira/browse/LUCENE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4381: Attachment: LUCENE-4381.patch A hacked-up patch for testing: I think it's nice to offer the CJK dictionary-based stuff as an option? I'm not sure how good results will be on average yet (maybe I can enlist Christian to help investigate). So as a test I just added a boolean option which, if enabled, keeps all han/hiragana/katakana marked as Chinese/Japanese (it uses the ISO 15924 Japanese code, but I overrode the toString to try to prevent confusion). Seems to work ok: some trivial snippets from smartcn and kuromoji are analyzed fine, and testRandomStrings is happy :)
[jira] [Commented] (LUCENE-4380) fix simplefs/niofs hierarchy
[ https://issues.apache.org/jira/browse/LUCENE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454495#comment-13454495 ] Michael McCandless commented on LUCENE-4380: +1, this is a very nice simplification. fix simplefs/niofs hierarchy --- Key: LUCENE-4380 URL: https://issues.apache.org/jira/browse/LUCENE-4380 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Attachments: LUCENE-4380.patch Spinoff from LUCENE-4371: Currently NIOFSDirectory.NIOFSIndexInput extends SimpleFSDirectory.SimpleFSIndexInput, but this isn't an is-a relationship at all. Additionally, SimpleFSDirectory has a funky Descriptor class that extends RandomAccessFile that is useless: {noformat}
/**
 * Extension of RandomAccessFile that tracks if the file is
 * open.
 */
...
// remember if the file is open, so that we don't try to close it
// more than once
{noformat} RandomAccessFile is closeable, so this is not necessary, and I don't think we should be subclassing it.
[jira] [Created] (LUCENE-4382) Unicode escape no longer works for non-prefix wildcard terms
Jack Krupansky created LUCENE-4382:
-----------------------------------
Summary: Unicode escape no longer works for non-prefix wildcard terms
Key: LUCENE-4382
URL: https://issues.apache.org/jira/browse/LUCENE-4382
Project: Lucene - Core
Issue Type: Bug
Components: core/queryparser
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Fix For: 4.0

LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery.

A suffix wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser:

{code}
assertQ("expected doc is missing (using escaped edismax w/field)",
    req("q", "t_special:literal\\:\\u0063olo*n", "defType", "edismax"),
    "//doc[1]/str[@name='id'][.='46']");
{code}
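The de-escaping step the issue says went missing can be sketched in isolation. Below is a minimal, hypothetical de-escaper (the class and method names are mine, not Lucene's) handling both simple backslash escapes and the four-hex-digit \u form, roughly the behavior QueryParserBase applies before calling getPrefixQuery:

```java
// Hypothetical sketch of query-parser-style de-escaping; not Lucene's actual code.
public class UnicodeDeEscape {
    static String deEscape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u') {
                // \uXXXX: decode four hex digits into a single char
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (c == '\\' && i + 1 < s.length()) {
                // simple escape (e.g. \: or \*): keep the escaped char literally
                out.append(s.charAt(i + 1));
                i += 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // the \u0063 from the test case decodes to 'c', leaving the * as a wildcard
        System.out.println(deEscape("\\u0063olo*n")); // colo*n
    }
}
```

With this in place, the pattern from the test case, `literal\:\u0063olo*n`, de-escapes to `literal:colo*n` with only the unescaped `*` left for WildcardQuery to interpret.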
[jira] [Updated] (LUCENE-4382) Unicode escape no longer works for non-prefix wildcard terms
[ https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated LUCENE-4382: --- Description: LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed. Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. I added it at the start of TestMultiAnalyzer.testMultiAnalyzer: {code} assertEquals(literal\\:\\u0063olo*n, qp.parse(literal\\:\\u0063olo*n).toString()); {code} Note: This case will always run correctly since it is only checking the input pattern string for WildcardQuery and not how the de-escaping was performed within WildcardQuery. was: LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. 
Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Unicode escape no longer works for non-prefix wildcard terms Key: LUCENE-4382 URL: https://issues.apache.org/jira/browse/LUCENE-4382 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.0-BETA Reporter: Jack Krupansky Fix For: 4.0 LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed. Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. 
I added it at the start of TestMultiAnalyzer.testMultiAnalyzer:

{code}
assertEquals("literal\\:\\u0063olo*n", qp.parse("literal\\:\\u0063olo*n").toString());
{code}

Note: This case will always run correctly since it is only checking the input pattern string for WildcardQuery and not how the de-escaping was performed within WildcardQuery.
[jira] [Updated] (LUCENE-4382) Unicode escape no longer works for non-suffix-only wildcard terms
[ https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated LUCENE-4382: --- Description: LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix-only wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed. Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. I added it at the start of TestMultiAnalyzer.testMultiAnalyzer: {code} assertEquals(literal\\:\\u0063olo*n, qp.parse(literal\\:\\u0063olo*n).toString()); {code} Note: This case will always run correctly since it is only checking the input pattern string for WildcardQuery and not how the de-escaping was performed within WildcardQuery. was: LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. 
Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed. Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. I added it at the start of TestMultiAnalyzer.testMultiAnalyzer: {code} assertEquals(literal\\:\\u0063olo*n, qp.parse(literal\\:\\u0063olo*n).toString()); {code} Note: This case will always run correctly since it is only checking the input pattern string for WildcardQuery and not how the de-escaping was performed within WildcardQuery. Summary: Unicode escape no longer works for non-suffix-only wildcard terms (was: Unicode escape no longer works for non-prefix wildcard terms) Unicode escape no longer works for non-suffix-only wildcard terms - Key: LUCENE-4382 URL: https://issues.apache.org/jira/browse/LUCENE-4382 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.0-BETA Reporter: Jack Krupansky Fix For: 4.0 LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. 
Support Unicode de-escaping in WildcardQuery. A suffix-only wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ(expected doc is missing (using escaped edismax w/field), req(q, t_special:literal\\:\\u0063olo*n, defType, edismax), //doc[1]/str[@name='id'][.='46']); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed.
[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token
[ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454533#comment-13454533 ] Naomi Dushay commented on SOLR-3589:

I may have stumbled into something. Try setting q.op explicitly. (baseurl)/select?q=fire-fly gives me a lot more results than (baseurl)/select?q=fire-fly&q.op=AND. Oddly, q.op=OR gives me the same results as setting it to AND.

Why did I stumble into this? From http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29:

In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml <solrQueryParser defaultOperator="xxx"/> entry. In older versions of Solr the default value is 100% (all clauses must match).

I have q.op set in my schema, thus: <solrQueryParser defaultOperator="AND"/> but when I use the q.op parameter, I experience something different. Wild! Does this give us any insights?

Edismax parser does not honor mm parameter if analyzer splits a token
----------------------------------------------------------------------
Key: SOLR-3589
URL: https://issues.apache.org/jira/browse/SOLR-3589
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.6, 4.0-BETA
Reporter: Tom Burton-West
Attachments: testSolr3589.xml.gz, testSolr3589.xml.gz

With edismax mm set to 100%, if one of the tokens is split into two tokens by the analyzer chain (i.e. fire-fly => fire fly), the mm parameter is ignored and the equivalent of an OR query for "fire OR fly" is produced. This is particularly a problem for languages that do not use white space to separate words, such as Chinese or Japanese.
See these messages for more discussion:
http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html
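The wiki rule Naomi quotes (when mm is absent, q.op=AND implies mm=100% and q.op=OR implies mm=0%) is mechanical enough to sketch. The class and method below are an illustrative stand-in of my own, not Solr's actual implementation:

```java
// Hypothetical sketch of the 3.x/trunk default-mm rule quoted from the wiki;
// names and structure are illustrative, not Solr's real code.
public class DefaultMm {
    static String effectiveMm(String mmParam, String qOp) {
        if (mmParam != null) {
            return mmParam;  // an explicit mm parameter always wins
        }
        // no mm given: derive the default from q.op
        return "AND".equalsIgnoreCase(qOp) ? "100%" : "0%";
    }

    public static void main(String[] args) {
        System.out.println(effectiveMm(null, "AND"));    // 100%
        System.out.println(effectiveMm(null, "OR"));     // 0%
        System.out.println(effectiveMm("2<75%", "AND")); // 2<75%
    }
}
```

Under this rule, setting q.op=AND (or defaultOperator="AND" in schema.xml) should behave like mm=100%, which is why the bug report's observation that q.op=OR and q.op=AND return the same results looks surprising.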
[jira] [Commented] (LUCENE-4208) Spatial distance relevancy should use score of 1/distance
[ https://issues.apache.org/jira/browse/LUCENE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454581#comment-13454581 ] Chris Male commented on LUCENE-4208:

bq. TwoDoubles is getting overhauled to support the dateline and any query shape--should probably go into another issue.

Yes please!

Spatial distance relevancy should use score of 1/distance
----------------------------------------------------------
Key: LUCENE-4208
URL: https://issues.apache.org/jira/browse/LUCENE-4208
Project: Lucene - Core
Issue Type: New Feature
Components: modules/spatial
Reporter: David Smiley
Fix For: 4.0
Attachments: LUCENE-4208_makeQuery_return_ConstantScoreQuery_and_remake_TwoDoublesStrategy.patch

The SpatialStrategy.makeQuery() at the moment uses the distance as the score (although some strategies -- TwoDoubles if I recall -- might not do anything, which would be a bug). The distance is a poor value to use as the score because the score should be related to relevancy, and the distance itself is inversely related to that. A score of 1/distance would be nice. Another alternative is earthCircumference/2 - distance, although I like 1/distance better. Maybe use a different constant than 1. Credit: this is Chris Male's idea.
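For intuition, the two scoring shapes the issue compares can be sketched directly. The k/(k+d) form below is a common reciprocal variant that avoids the divide-by-zero a literal 1/distance would hit at distance 0; the constant k, the 40075 km circumference figure, and all names here are my assumptions, not the patch's:

```java
// Illustrative sketch of the two relevancy shapes discussed in LUCENE-4208;
// not the actual SpatialStrategy code.
public class DistanceScore {
    // Reciprocal relevancy: closer docs score higher, score is 1.0 at distance 0
    // and decays toward 0; k controls how fast it decays.
    static double reciprocal(double distanceKm, double k) {
        return k / (k + distanceKm);
    }

    // The linear alternative from the issue: earthCircumference/2 - distance.
    static double linear(double distanceKm) {
        double halfCircumference = 40075.0 / 2;  // Earth's circumference in km (approx.)
        return halfCircumference - distanceKm;
    }

    public static void main(String[] args) {
        System.out.println(reciprocal(0, 10));   // 1.0
        System.out.println(reciprocal(10, 10));  // 0.5
    }
}
```

Both shapes are monotonically decreasing in distance, so either fixes the original inversion; the reciprocal form also bounds scores to (0, 1], which composes more naturally with other query scores.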
[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 1061 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/1061/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.index.TestTypePromotion Error Message: Clean up static fields (in @AfterClass?), your test seems to hang on to approximately 12,023,120 bytes (threshold is 10,485,760): - 2,409,168 bytes, public static org.junit.rules.TestRule org.apache.lucene.util.LuceneTestCase.classRules - 2,403,728 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.SORTED_BYTES - 2,403,568 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.UNSORTED_BYTES - 2,403,408 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.FLOATS - 2,403,248 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.INTEGERS Stack Trace: junit.framework.AssertionFailedError: Clean up static fields (in @AfterClass?), your test seems to hang on to approximately 12,023,120 bytes (threshold is 10,485,760): - 2,409,168 bytes, public static org.junit.rules.TestRule org.apache.lucene.util.LuceneTestCase.classRules - 2,403,728 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.SORTED_BYTES - 2,403,568 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.UNSORTED_BYTES - 2,403,408 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.FLOATS - 2,403,248 bytes, private static java.util.EnumSet org.apache.lucene.index.TestTypePromotion.INTEGERS at __randomizedtesting.SeedInfo.seed([C2434F9AAE110129]:0) at com.carrotsearch.randomizedtesting.rules.StaticFieldsInvariantRule$1.afterAlways(StaticFieldsInvariantRule.java:119) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43) at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:777) Build Log: [...truncated 787 lines...] [junit4:junit4] Suite: org.apache.lucene.index.TestTypePromotion [junit4:junit4] 2 NOTE: test params are: codec=Lucene40: {id=PostingsFormat(name=Lucene40WithOrds)}, sim=DefaultSimilarity, locale=kn_IN, timezone=Atlantic/Reykjavik [junit4:junit4] 2 NOTE: Linux 3.2.0-29-generic amd64/IBM Corporation 1.7.0 (64-bit)/cpus=8,threads=1,free=44841080,total=536805376 [junit4:junit4] 2 NOTE: All tests run in this JVM: [TestIndexWriterMergePolicy, TestPostingsOffsets, TestSizeBoundedForceMerge, TestToken, TestThreadedForceMerge, TestParallelTermEnum, TestRegexpRandom2, TestTermsEnum, TestPerFieldPostingsFormat, TestNumericRangeQuery32, TestSpanMultiTermQueryWrapper, TestNRTCachingDirectory, TestVersionComparator, TestNoMergePolicy, TestNRTManager, TestLockFactory, TestDuelingCodecs, TestIndexWriterDelete, TestSpansAdvanced, TestNorms, TestCopyBytes, TestBooleanMinShouldMatch, TestIOUtils, TestCachingTokenFilter, TestParallelAtomicReader, TestDateSort, TestIsCurrent, TestTopDocsCollector, TestPrefixQuery, TestDocument, TestLookaheadTokenFilter, TestTransactions, TestIndexWriterUnicode, TestPrefixCodedTerms, TestQueryWrapperFilter, TestMultiValuedNumericRangeQuery, TestFilteredSearch, TestTopDocsMerge, TestFSTs, TestConstantScoreQuery, TestPrefixRandom, TestTermRangeQuery, TestIndexWriterOnDiskFull, TestSentinelIntSet, 
Test2BTerms, TestBytesRefHash, TestStressIndexing, TestComplexExplanationsOfNonMatches, TestDocumentWriter, TestIndexInput, TestTransactionRollback, TestMultiThreadTermVectors, TestSpanFirstQuery, TestCheckIndex, TestBooleanQueryVisitSubscorers, TestIndexWriterLockRelease, TestFuzzyQuery, TestExplanations, TestBasics, TestMockDirectoryWrapper, TestTermVectors, TestSameTokenSamePosition, TestLucene40PostingsReader, TestSimilarityBase, TestDateTools, TestPackedInts, TestVersion, TestSpansAdvanced2, TestCharTermAttributeImpl, TestForceMergeForever, TestFilterIterator, TestPositionIncrement, TestDocumentsWriterStallControl, TestConcurrentMergeScheduler, TestRamUsageEstimatorOnWildAnimals, TestHugeRamFile, TestStressAdvance, TestMultiPhraseQuery, TestAtomicUpdate, TestIndexWriterMerging, TestOpenBitSet, TestSort, TestSearchWithThreads, TestLongPostings, TestIndexWriterCommit,
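The fix the leak detector asks for in this failure is mechanical: null out the offending static fields once the suite finishes. A stdlib-only sketch, with field names modeled on the failing test (in a real LuceneTestCase subclass the cleanup method would be annotated @AfterClass):

```java
import java.util.EnumSet;

// Sketch of the static-field cleanup StaticFieldsInvariantRule is asking for;
// the enum and field names are simplified stand-ins for TestTypePromotion's.
public class TypePromotionCleanup {
    enum Type { INT, FLOAT, SORTED_BYTES, UNSORTED_BYTES }

    // Static fields like these keep their contents reachable after the suite
    // finishes, which is what trips the ~10 MB threshold in the error message.
    static EnumSet<Type> INTEGERS = EnumSet.of(Type.INT);
    static EnumSet<Type> FLOATS = EnumSet.of(Type.FLOAT);

    // In a LuceneTestCase subclass this would be an @AfterClass method.
    static void afterClass() {
        INTEGERS = null;  // release the references so the rule's check passes
        FLOATS = null;
    }

    public static void main(String[] args) {
        afterClass();
        System.out.println(INTEGERS == null && FLOATS == null);  // true
    }
}
```

The rule only measures what is still reachable from static fields after the class runs, so nulling the fields (rather than clearing the collections) is sufficient.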
[jira] [Updated] (SOLR-3815) add hash range to shard
[ https://issues.apache.org/jira/browse/SOLR-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-3815:

Attachment: SOLR-3815_clusterState_immutable.patch

Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on?

add hash range to shard
-----------------------
Key: SOLR-3815
URL: https://issues.apache.org/jira/browse/SOLR-3815
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Attachments: SOLR-3815_addrange.patch, SOLR-3815_clusterState_immutable.patch, SOLR-3815.patch
[jira] [Comment Edited] (SOLR-3815) add hash range to shard
[ https://issues.apache.org/jira/browse/SOLR-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454608#comment-13454608 ] Yonik Seeley edited comment on SOLR-3815 at 9/13/12 2:38 PM: - Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. was (Author: ysee...@gmail.com): Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? add hash range to shard --- Key: SOLR-3815 URL: https://issues.apache.org/jira/browse/SOLR-3815 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Attachments: SOLR-3815_addrange.patch, SOLR-3815_clusterState_immutable.patch, SOLR-3815.patch -- This message is automatically generated by JIRA. 
[jira] [Comment Edited] (SOLR-3815) add hash range to shard
[ https://issues.apache.org/jira/browse/SOLR-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454608#comment-13454608 ] Yonik Seeley edited comment on SOLR-3815 at 9/13/12 2:49 PM: - Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. edit: in Overseer.run() we have ClusterState clusterState = reader.getClusterState(); and that is the state that is accidentally being modified (that accidentally makes things work). I assume this is OK, as the reader is supposed to update it's state via zookeeper - which means there is perhaps something wrong with reader.updateClusterState(true)? was (Author: ysee...@gmail.com): Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. 
add hash range to shard
-----------------------
Key: SOLR-3815
URL: https://issues.apache.org/jira/browse/SOLR-3815
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Attachments: SOLR-3815_addrange.patch, SOLR-3815_clusterState_immutable.patch, SOLR-3815.patch
[jira] [Comment Edited] (SOLR-3815) add hash range to shard
[ https://issues.apache.org/jira/browse/SOLR-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454608#comment-13454608 ] Yonik Seeley edited comment on SOLR-3815 at 9/13/12 2:49 PM: - Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. edit: in Overseer.run() we have ClusterState clusterState = reader.getClusterState(); and that is the state that is accidentally being modified (that accidentally makes things work). I assume the reader is supposed to update it's state via zookeeper - which means there is perhaps something wrong with reader.updateClusterState(true)? was (Author: ysee...@gmail.com): Folks, while working to add the replicas level to shards (to make room for other properties), I noticed that the Overseer.updateSlice() method changed the existing ClusterState (which is advertised as being immutable). I re-wrote the method to be much shorter, and immutable with respect to the existing ClusterState, and started getting a test failure. I eventually tried just adding back the part of the code that erroneously modified the existing ClusterState, and the test passed again (see the nocommit block in Overseer). Any idea what's going on? edit: the test that failed was LeaderElectionIntegrationTest. Not sure if it caused other failures. 
edit: in Overseer.run() we have ClusterState clusterState = reader.getClusterState(); and that is the state that is accidentally being modified (that accidentally makes things work). I assume this is OK, as the reader is supposed to update its state via ZooKeeper - which means there is perhaps something wrong with reader.updateClusterState(true)?

add hash range to shard
-----------------------
Key: SOLR-3815
URL: https://issues.apache.org/jira/browse/SOLR-3815
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Attachments: SOLR-3815_addrange.patch, SOLR-3815_clusterState_immutable.patch, SOLR-3815.patch
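The immutability contract Yonik describes (updateSlice must not mutate the shared snapshot; it should build and return a new state) can be illustrated with a stdlib-only sketch. The class and field names here are simplified stand-ins for Solr's ClusterState, not its real API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Copy-on-write sketch of an immutable cluster-state update; a stand-in for
// Solr's ClusterState/Overseer.updateSlice(), not the actual classes.
public class ClusterStateSketch {
    final Map<String, String> slices;  // stand-in for slice name -> slice data

    ClusterStateSketch(Map<String, String> slices) {
        // wrap so nothing can mutate this snapshot after construction
        this.slices = Collections.unmodifiableMap(slices);
    }

    // Returns a NEW state with the slice applied; the receiver is untouched,
    // so callers (like Overseer.run()'s local clusterState) must use the
    // return value rather than expecting their snapshot to change.
    ClusterStateSketch updateSlice(String name, String props) {
        Map<String, String> copy = new HashMap<>(slices);
        copy.put(name, props);
        return new ClusterStateSketch(copy);
    }

    public static void main(String[] args) {
        ClusterStateSketch before = new ClusterStateSketch(new HashMap<>());
        ClusterStateSketch after = before.updateSlice("shard1", "range=0-7fffffff");
        System.out.println(before.slices.isEmpty());      // true: old snapshot untouched
        System.out.println(after.slices.get("shard1"));   // range=0-7fffffff
    }
}
```

The failure mode in the comment thread follows from this design: if any reader holds on to the old snapshot and the code accidentally mutated it in place, tests can pass for the wrong reason, and a correct copy-on-write rewrite then exposes whoever was depending on the stale reference being updated.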
Re: [JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 1061 - Failure!
I'll fix this one.

D.

On Thu, Sep 13, 2012 at 4:37 AM, Policeman Jenkins Server jenk...@sd-datasolutions.de wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/1061/
> [...]