date:20091021

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768118#action_12768118
 ] 

Uwe Schindler commented on LUCENE-1998:
---

I tested it here; we have no backwards-compatibility problem (at least with 
"normal" usage). When old Java 1.4 code runs against the new enum classes, the 
Java dynamic linker has no problem with the replaced superclass: old code that 
uses Field.Store.XXX, compiled against lucene-core-2.9.jar (where the superclass 
is Parameter), works perfectly with the new lucene-core-3.0.jar. This works 
because we only used the Parameter class as a type-safe enumeration and did not 
call any of its methods (except maybe toString()), so the linker has no problem.
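
To illustrate why the swap links, here is a simplified sketch of the two patterns. The names LegacyParameter, LegacyStore and Store are hypothetical (in the real patch the enum keeps the old class name, Field.Store), and the sketch assumes client code only reads the constants. As far as we understand it, compiled client code refers to a constant like Field.Store.YES as a static field of type Field.Store, and both the old final class and the new enum expose exactly such a field, which is why the reference still resolves.

```java
import java.io.Serializable;

// Hypothetical stand-in for a pre-Java-5 "type-safe enumeration" base class.
// Simplified: a real one also needs readResolve() so that deserialization
// does not create duplicate instances.
abstract class LegacyParameter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected LegacyParameter(String name) { this.name = name; }

    @Override
    public String toString() { return name; }
}

// Old style: private constructor plus public static final instances.
final class LegacyStore extends LegacyParameter {
    private static final long serialVersionUID = 1L;

    private LegacyStore(String name) { super(name); }

    public static final LegacyStore YES = new LegacyStore("YES");
    public static final LegacyStore NO = new LegacyStore("NO");
}

// Java 5 style: the same constants as a real enum. Client code that only reads
// Store.YES / Store.NO (and perhaps calls toString()) performs the same kind of
// static field access in both versions.
enum Store { YES, NO }

class StoreDemo {
    public static void main(String[] args) {
        System.out.println(LegacyStore.YES + " " + Store.YES);
        System.out.println(Store.valueOf("NO") == Store.NO);
    }
}
```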

I would simply apply this patch to trunk. I would also remove the Parameter 
class completely, as that breaks no code (unless somebody has used the class 
for their own enums). Maybe we should deprecate Parameter in 2.9.1 and state 
that it will be removed in 3.0, as that version uses Java 5 enums. But it also 
does not hurt to keep it and mark it deprecated, as in the patch.

Regarding your patch: I only added the license header back to the Version 
class. It must be there.

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1998:
--

Attachment: LUCENE-1998_enum.patch

Patch with license header restored.

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

[jira] Assigned: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-1998:
-

Assignee: Uwe Schindler

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1998:
--

Attachment: LUCENE-1998_enum.patch

Some fine tuning: you defined package-private abstract methods, but made them 
public in the enum constants. I changed them all to public. This was also a 
backwards break in contrib/queryParser.

I think this is ready to commit.
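
A minimal sketch of the pattern being tuned here: an enum whose constants carry their own method bodies. CompareOperator and symbol() are hypothetical names, not the actual contrib/queryParser code. The abstract method is declared public; had it been package-private, callers in another package could not invoke it through the enum type, even though each constant body declares its override public, because the call is checked against the method declared in the enum itself.

```java
// Hypothetical enum with constant-specific method bodies.
enum CompareOperator {
    LE {
        @Override
        public String symbol() { return "<="; }
    },
    GE {
        @Override
        public String symbol() { return ">="; }
    };

    // Public here, matching the overrides. If this were package-private,
    // CompareOperator.LE.symbol() would not compile outside this package.
    public abstract String symbol();
}
```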

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768132#action_12768132
 ] 

Uwe Schindler commented on LUCENE-1998:
---

A small problem that may appear in the future: we renamed some enum constants 
in 2.9 (TOKENIZED -> ANALYZED). That is not an issue now, because the deprecated 
constants have been removed.

If we want to do the same in the future, we can do it the same way, but we need 
a hack (because it is not officially supported by Java 5):
[http://forums.sun.com/thread.jspa?threadID=5137742]

So it works, but not with switch statements. Just as a comment. But in my 
opinion, renaming enum constants is a bad thing...
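
One way to keep an old constant name after a rename is a deprecated static alias field inside the enum; this sketch is an assumption and not necessarily the technique described in the linked thread. The enum mirrors the 2.9 rename (TOKENIZED -> ANALYZED) but is a standalone, hypothetical Index type. The alias is the same object as the new constant, but it is not itself an enum constant, so it cannot appear as a case label, and Index.valueOf("TOKENIZED") fails.

```java
// Hypothetical enum keeping an old constant name as an alias.
enum Index {
    NO, ANALYZED, NOT_ANALYZED;

    /** @deprecated Old name; use {@link #ANALYZED} instead. */
    @Deprecated
    public static final Index TOKENIZED = ANALYZED;
}

class IndexAliasDemo {
    static String describe(Index i) {
        switch (i) {
            case ANALYZED:         // "case TOKENIZED:" would not compile: case
                return "analyzed"; // labels must be real enum constant names
            case NOT_ANALYZED:
                return "not analyzed";
            default:
                return "not indexed";
        }
    }
}
```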

[jira] Updated: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2000:
--

Attachment: LUCENE-2000-clone_covariance.patch

> Use covariant clone() return types
> --
>
> Key: LUCENE-2000
> URL: https://issues.apache.org/jira/browse/LUCENE-2000
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Uwe Schindler
> Attachments: LUCENE-2000-clone_covariance.patch
>
>
> *Paul Cown wrote in LUCENE-1257:*
> OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
> Haven't seen anyone discuss this, and don't believe any of the patches 
> address this, so thought I'd throw a patch out there (against SVN HEAD @ 
> revision 827821) which uses Java 5 covariant return types for (almost) all of 
> the Object#clone() implementations in core. 
> i.e. this:
> public Object clone() {
> changes to:
> public SpanNotQuery clone() {
> which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.
> if (clone == null) clone = (SpanNotQuery) this.clone();
> becomes
> if (clone == null) clone = this.clone();
> Almost everything has been done and all downcasts removed, in core, with the 
> exception of
> Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() 
> of a SpanQuery to a SpanQuery - this can't be made covariant without 
> declaring "abstract SpanQuery clone()" in SpanQuery itself, which breaks 
> those SpanQuerys that don't declare their own clone() 
> Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
> changing .clone() to return IndexReader, because it returns the result of 
> IndexReader.clone(boolean). We could use covariant types for THAT, which 
> would work fine, but that didn't follow the pattern of the others so that 
> could be a later commit. 
> Two changes were also made in contrib/, where not making the changes would 
> have broken code by trying to widen IndexInput#clone() back out to returning 
> Object, which is not permitted. contrib/ was otherwise left untouched.
> Let me know what you think, or if you have any other questions.

[jira] Created: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)

Use covariant clone() return types
--

 Key: LUCENE-2000
 URL: https://issues.apache.org/jira/browse/LUCENE-2000
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.0
Reporter: Uwe Schindler
 Attachments: LUCENE-2000-clone_covariance.patch

*Paul Cown wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address 
this, so thought I'd throw a patch out there (against SVN HEAD @ revision 
827821) which uses Java 5 covariant return types for (almost) all of the 
Object#clone() implementations in core. 
i.e. this:

public Object clone() {
changes to:
public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
if (clone == null) clone = this.clone();
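
A self-contained sketch of the covariant clone() pattern described above. BaseQuery and NotQuery are hypothetical stand-ins for Query and SpanNotQuery, and the boost field exists only to give the clone some state. The casts now live only inside the clone() implementations, so callers no longer need one.

```java
// Hypothetical stand-in for Query; not Lucene's real class.
class BaseQuery implements Cloneable {
    private float boost = 1.0f;

    float getBoost() { return boost; }
    void setBoost(float boost) { this.boost = boost; }

    // Covariant override: Object.clone() returns Object, this returns BaseQuery.
    @Override
    public BaseQuery clone() {
        try {
            return (BaseQuery) super.clone(); // cast kept inside clone()
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: BaseQuery is Cloneable
        }
    }
}

// Hypothetical stand-in for SpanNotQuery.
class NotQuery extends BaseQuery {
    // Narrow the return type again so NotQuery callers need no cast either.
    @Override
    public NotQuery clone() {
        return (NotQuery) super.clone();
    }

    // Before: NotQuery copy = (NotQuery) this.clone();
    NotQuery withBoost(float boost) {
        NotQuery copy = this.clone(); // no cast needed any more
        copy.setBoost(boost);
        return copy;
    }
}
```

Once BaseQuery.clone() returns BaseQuery, no subclass may override it with `public Object clone()`: an override's return type must be a subtype of the overridden one. That is the same constraint that forced the two contrib/ changes for IndexInput#clone().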

Almost everything has been done and all downcasts removed, in core, with the 
exception of

* Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of 
a SpanQuery to a SpanQuery - this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone()
* Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit.
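
The "covariant types for THAT" idea from the second point could look roughly like this, where clone() delegates to clone(boolean) and both are narrowed in the subclass. BaseReader, DirReader and isReadOnly() are hypothetical names, not IndexReader's actual API.

```java
// Hypothetical reader hierarchy; the names are illustrative, not IndexReader's API.
abstract class BaseReader {
    public abstract boolean isReadOnly();

    // clone() simply delegates, so its return type follows clone(boolean).
    @Override
    public BaseReader clone() {
        return clone(isReadOnly());
    }

    public abstract BaseReader clone(boolean openReadOnly);
}

class DirReader extends BaseReader {
    private final boolean readOnly;

    DirReader(boolean readOnly) { this.readOnly = readOnly; }

    @Override
    public boolean isReadOnly() { return readOnly; }

    // Covariant clone(boolean): a narrower return type is a legal override.
    @Override
    public DirReader clone(boolean openReadOnly) {
        return new DirReader(openReadOnly);
    }

    // With clone(boolean) narrowed, clone() can be narrowed too, without a cast.
    @Override
    public DirReader clone() {
        return clone(isReadOnly());
    }
}
```
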
Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.

[jira] Updated: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2000:
--

Description: 
*Paul Cown wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address 
this, so thought I'd throw a patch out there (against SVN HEAD @ revision 
827821) which uses Java 5 covariant return types for (almost) all of the 
Object#clone() implementations in core. 
i.e. this:

public Object clone() {
changes to:
public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
if (clone == null) clone = this.clone();

Almost everything has been done and all downcasts removed, in core, with the 
exception of

Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of 
a SpanQuery to a SpanQuery - this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone() 
Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit. 
Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.

  was:
*Paul Cown wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address 
this, so thought I'd throw a patch out there (against SVN HEAD @ revision 
827821) which uses Java 5 covariant return types for (almost) all of the 
Object#clone() implementations in core. 
i.e. this:

public Object clone() {
changes to:
public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
if (clone == null) clone = this.clone();

Almost everything has been done and all downcasts removed, in core, with the 
exception of

Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of 
a SpanQuery to a SpanQuery - this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone() 
Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit. 
Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Attachment: (was: LUCENE-1257-clone_covariance.patch)

> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_javacc_upgrade.patch, 
> LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
> LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_unnecessary_casts.patch, lucene1257surround1.patch, 
> lucene1257surround1.patch, shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
> Java 5 migration had been planned for 2.1 at some point in the past, but I 
> don't know when it is planned now. This patch against the trunk includes:
> - the most obvious generics usage (there are tons of usages of sets, ...; 
> those which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for-each constructs
> - removal of unnecessary unboxing
> In my opinion the code is much more readable with these features (you 
> actually *know* what is stored in collections when reading the code, without 
> needing to look up field definitions every time), and it simplifies many 
> algorithms.
> Note that this patch also includes an interface for the Query class. This has 
> been done for my company's needs for building custom Query classes which add 
> some behaviour to the base Lucene queries. It prevents multiple unnecessary 
> casts. I know this introduction is not wanted by the team, but it really 
> makes our developments easier to maintain. If you don't want to use this, 
> replace all /Queriable/ calls with standard /Query/.
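
As a rough illustration of the "PriorityQueue generification" and for-each items, here is a minimal generic binary heap in which subclasses supply lessThan(T, T). SimplePriorityQueue, ScoreQueue and GenericsDemo are hypothetical names; this is not Lucene's actual PriorityQueue implementation, only a sketch of how generics remove the casts from Object that a non-generic version needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic min-heap; subclasses define the ordering.
abstract class SimplePriorityQueue<T> {
    private final List<T> heap = new ArrayList<T>();

    // Typed comparison hook: no casts from Object are needed.
    protected abstract boolean lessThan(T a, T b);

    public void add(T element) {
        heap.add(element);
        int i = heap.size() - 1;
        while (i > 0) { // sift up
            int parent = (i - 1) / 2;
            if (!lessThan(heap.get(i), heap.get(parent))) break;
            swap(i, parent);
            i = parent;
        }
    }

    // Removes and returns the least element, or null if the queue is empty.
    public T pop() {
        if (heap.isEmpty()) return null;
        T top = heap.get(0);
        T last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {
            heap.set(0, last);
            int i = 0;
            while (true) { // sift down
                int left = 2 * i + 1, right = left + 1, smallest = i;
                if (left < heap.size() && lessThan(heap.get(left), heap.get(smallest))) smallest = left;
                if (right < heap.size() && lessThan(heap.get(right), heap.get(smallest))) smallest = right;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
        }
        return top;
    }

    public int size() { return heap.size(); }

    private void swap(int i, int j) {
        T tmp = heap.get(i);
        heap.set(i, heap.get(j));
        heap.set(j, tmp);
    }
}

class ScoreQueue extends SimplePriorityQueue<Float> {
    @Override
    protected boolean lessThan(Float a, Float b) {
        return a < b; // auto-unboxing, no casts
    }
}

class GenericsDemo {
    static float sum(List<Float> scores) {
        float total = 0f;
        for (float s : scores) total += s; // for-each instead of an indexed loop
        return total;
    }
}
```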

[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Comment: was deleted

(was: OK, thought I'd jump in and help out here with one of my Java 5 
favourites. Haven't seen anyone discuss this, and don't believe any of the 
patches address this, so thought I'd throw a patch out there (against SVN HEAD 
@ revision 827821) which uses Java 5 covariant return types for (almost) all of 
the Object#clone() implementations in core.

i.e. this:

  public Object clone() {
changes to:
  public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

  if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
  if (clone == null) clone = this.clone();

Almost everything has been done and all downcasts removed, in core, with the 
exception of

* Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() 
of a SpanQuery to a SpanQuery -- this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone()
* Some IndexReaders, e.g. DirectoryReader -- we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit.

Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.
)

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768136#action_12768136
 ] 

Uwe Schindler commented on LUCENE-1257:
---

Created a new issue out of the clone covariance patch: LUCENE-2000

[jira] Updated: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2000:
--

Description: 
*Paul Cowan wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address 
this, so thought I'd throw a patch out there (against SVN HEAD @ revision 
827821) which uses Java 5 covariant return types for (almost) all of the 
Object#clone() implementations in core. 
i.e. this:

public Object clone() {
changes to:
public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
if (clone == null) clone = this.clone();
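A minimal, self-contained sketch of the covariant pattern, using hypothetical stand-in classes `BaseQuery` and `SpanNotQuerySketch` rather than the real Lucene classes. Note that each override still casts the result of `super.clone()`; only the call site loses its cast.

```java
/** Hypothetical stand-in for a base query class with a covariant clone(). */
class BaseQuery implements Cloneable {
    float boost = 1.0f;

    @Override
    public BaseQuery clone() { // covariant: narrows Object to BaseQuery
        try {
            // Object.clone() still returns Object, so one cast remains here.
            return (BaseQuery) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}

/** Hypothetical stand-in for SpanNotQuery; narrows clone() once more. */
class SpanNotQuerySketch extends BaseQuery {
    String field = "body";

    @Override
    public SpanNotQuerySketch clone() {
        return (SpanNotQuerySketch) super.clone(); // super returns BaseQuery, so cast down
    }

    public static void main(String[] args) {
        SpanNotQuerySketch original = new SpanNotQuerySketch();
        SpanNotQuerySketch copy = original.clone(); // no cast needed at the call site
        System.out.println(copy != original && copy.field.equals(original.field));
    }
}
```

A subclass that does not override clone() again would still return BaseQuery as far as the caller's static type is concerned, which is the SpanQuery and IndexReader limitation described in this issue.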

Almost everything has been done and all downcasts removed, in core, with the 
exception of:

- Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of 
a SpanQuery to a SpanQuery - this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone().
- Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit.
Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.

  was:
*Paul Cown wrote in LUCENE-1257:*

OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
Haven't seen anyone discuss this, and don't believe any of the patches address 
this, so thought I'd throw a patch out there (against SVN HEAD @ revision 
827821) which uses Java 5 covariant return types for (almost) all of the 
Object#clone() implementations in core. 
i.e. this:

public Object clone() {
changes to:
public SpanNotQuery clone() {

which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.

if (clone == null) clone = (SpanNotQuery) this.clone();
becomes
if (clone == null) clone = this.clone();

Almost everything has been done and all downcasts removed, in core, with the 
exception of

Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() of 
a SpanQuery to a SpanQuery - this can't be made covariant without declaring 
"abstract SpanQuery clone()" in SpanQuery itself, which breaks those SpanQuerys 
that don't declare their own clone() 
Some IndexReaders, e.g. DirectoryReader - we can't be more specific than 
changing .clone() to return IndexReader, because it returns the result of 
IndexReader.clone(boolean). We could use covariant types for THAT, which would 
work fine, but that didn't follow the pattern of the others so that could be a 
later commit. 
Two changes were also made in contrib/, where not making the changes would have 
broken code by trying to widen IndexInput#clone() back out to returning Object, 
which is not permitted. contrib/ was otherwise left untouched.

Let me know what you think, or if you have any other questions.

   Priority: Minor  (was: Major)

> Use covariant clone() return types
> --
>
> Key: LUCENE-2000
> URL: https://issues.apache.org/jira/browse/LUCENE-2000
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.0
>Reporter: Uwe Schindler
>Priority: Minor
> Attachments: LUCENE-2000-clone_covariance.patch
>
>
> *Paul Cowan wrote in LUCENE-1257:*
> OK, thought I'd jump in and help out here with one of my Java 5 favourites. 
> Haven't seen anyone discuss this, and don't believe any of the patches 
> address this, so thought I'd throw a patch out there (against SVN HEAD @ 
> revision 827821) which uses Java 5 covariant return types for (almost) all of 
> the Object#clone() implementations in core. 
> i.e. this:
> public Object clone() {
> changes to:
> public SpanNotQuery clone() {
> which lets us get rid of a whole bunch of now-unnecessary casts, so e.g.
> if (clone == null) clone = (SpanNotQuery) this.clone();
> becomes
> if (clone == null) clone = this.clone();
> Almost everything has been done and all downcasts removed, in core, with the 
> exception of
> Some SpanQuery stuff, where it's assumed that it's safe to cast the clone() 
> of a SpanQuery to a SpanQuery - this can't be made covariant without 
> declaring "abstract SpanQuery clone()" in SpanQuery itself, which breaks 
> those SpanQuerys that don't declare their own clone() 
> Some IndexReaders, e.g. DirectoryRead

[jira] Commented: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768141#action_12768141
 ] 

Uwe Schindler commented on LUCENE-2000:
---

I moved this to a separate issue, because some discussion is needed.

I am strongly against this for various reasons:
- Java 5 itself does not override clone() with a covariant return type 
(nowhere!). So e.g. String.clone() always returns java.lang.Object.
- This is because of backwards-compatibility problems (which are not easy to 
explain): it has to do with a subclass, compiled against the Java 1.4 version 
of Lucene, that overrides clone() and calls super.clone(). Because of this, the 
JDK does not provide a String.clone() returning String. javac does its best to 
prevent problems here, but APIs that need to be backwards compatible should 
return Object as always.
- Covariant clone() return types require that *all* subclasses of a class that 
originally implemented a covariant clone() also override it covariantly, to be 
consistent. Because of this you get consistency problems (see your IndexReader 
problem), and this is not possible with backwards compatibility. Covariant 
clone() should therefore only be used for internal classes (package-private, 
private) or final classes. Another example of this problem is AttributeImpl, 
which defines an abstract clone method. Subclasses would need to override this 
covariant clone() method. Custom Attributes compiled against Lucene 2.9 would 
fail to do this -> MethodNotFoundException (I tried it out, it breaks).

Because of all these problems, I prefer to always cast the return value of 
clone(). This is not unsafe (and because of this you get no unchecked warning), 
because you always know how to cast the clone result. By the way: you still 
always have to cast the result of super.clone(), so you do not gain anything 
by using covariant return types.
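The cast-based style preferred here can be sketched with hypothetical stand-in classes `LegacyQuery` and `LegacySpanNotQuery`. The sketch shows the caller-side cast and, in a comment, the source-level side of the compatibility concern; it does not reproduce the binary-compatibility failure reported above.

```java
/** Hypothetical base class keeping the pre-Java-5 clone() signature. */
class LegacyQuery implements Cloneable {
    float boost = 1.0f;

    @Override
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}

/** Hypothetical subclass, e.g. third-party code written against an older release. */
class LegacySpanNotQuery extends LegacyQuery {
    String field = "body";

    // This override compiles only because the base class returns Object: if
    // LegacyQuery.clone() were changed to return LegacyQuery, javac would reject
    // this Object-returning override (an override may not widen the return type).
    @Override
    public Object clone() {
        LegacySpanNotQuery copy = (LegacySpanNotQuery) super.clone();
        return copy;
    }

    public static void main(String[] args) {
        LegacySpanNotQuery original = new LegacySpanNotQuery();
        // Callers cast; this is a checked cast, so javac reports no unchecked warning.
        LegacySpanNotQuery copy = (LegacySpanNotQuery) original.clone();
        System.out.println(copy != original);
    }
}
```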

I do not want to start a flame war here, but we should not do this.



[jira] Issue Comment Edited: (LUCENE-2000) Use covariant clone() return types

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768141#action_12768141
 ] 

Uwe Schindler edited comment on LUCENE-2000 at 10/21/09 9:14 AM:
-

I moved this to a separate issue, because some discussion is needed.

I am strongly against this for various reasons:
- Java 5 itself does not override clone() with a covariant return type 
(nowhere!). So e.g. String.clone() always returns java.lang.Object.
- This is because of backwards-compatibility problems (which are not easy to 
explain): it has to do with a subclass, compiled against the Java 1.4 version 
of Lucene, that overrides clone() and calls super.clone(). Because of this, the 
JDK does not provide a String.clone() returning String. javac does its best to 
prevent problems here, but APIs that need to be backwards compatible should 
return Object as always.
- Covariant clone() return types require that *all* subclasses of a class that 
originally implemented a covariant clone() also override it covariantly, to be 
consistent. Because of this you get consistency problems (see your IndexReader 
problem), and this is not possible with backwards compatibility. Covariant 
clone() should therefore only be used for internal classes (package-private, 
private) or final classes. Another example of this problem is AttributeImpl, 
which defines a clone() method. Subclasses would need to override this 
covariant clone() method. Custom Attributes compiled against Lucene 2.9 would 
fail to do this -> MethodNotFoundException (I tried it out, it breaks).

Because of all these problems, I prefer to always cast the return value of 
clone(). This is not unsafe (and because of this you get no unchecked warning), 
because you always know how to cast the clone result. By the way: you still 
always have to cast the result of super.clone(), so you do not gain anything 
by using covariant return types.

I do not want to start a flame war here, but we should not do this.



Re: lucene 2.9 sorting algorithm

2009-10-21 Thread Michael McCandless

OK, thanks.

I can help out if you've got questions on the python code... it's
rather straightforward: it just iterates over each set of params to
test, writes an alg file, runs it, opens the resulting output & parses
it for the best run, confirms both single & multi PQ gave precisely
the same doc IDs, and prints the results.

It's remotely possible the difference in the results is a bug/overhead
in contrib/benchmark itself, which'd be good to get to the bottom of
anyway.

Mike

On Tue, Oct 20, 2009 at 9:17 PM, John Wang wrote:
> Hi Mike:
>     That's weird. Let me take a look at the patch. Need to brush up on
> python though :)
> Thanks
> -John
>
> On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless wrote:
>>
>> OK I posted a patch that folds the MultiPQ approach into
>> contrib/benchmark, plus a simple python wrapper to run old/new tests
>> across different queries, sort, topN, etc.
>>
>> But I got different results... MultiPQ looks generally slower than
>> SinglePQ.  So I think we now need to reconcile what's different
>> between our tests.
>>
>> Mike
>>
>> On Mon, Oct 19, 2009 at 9:28 PM, John Wang wrote:
>> > Hi Michael:
>> >      Was wondering if you got a chance to take a look at this.
>> >      Since deprecated APIs are being removed in 3.0, I was wondering
>> > if/when we would decide whether the ScoreDocComparator API would be
>> > kept for Lucene 3.0.
>> > Thanks
>> > -John
>> >
>> > On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless wrote:
>> >>
>> >> Oh, no problem...
>> >>
>> >> Mike
>> >>
>> >> On Fri, Oct 16, 2009 at 12:33 PM, John Wang wrote:
>> >> > Mike, just a clarification on my first perf report email.
>> >> > In the first section, numHits is incorrectly labeled; it should be 20
>> >> > instead of 50. Sorry about the possible confusion.
>> >> > Thanks
>> >> > -John
>> >> >
>> >> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless wrote:
>> >> >>
>> >> >> Thanks John; I'll have a look.
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang wrote:
>> >> >> > Hi Michael:
>> >> >> >     I added classes: ScoreDocComparatorQueue and
>> >> >> > OneSortNoScoreCollector as a more general case. I think keeping the
>> >> >> > old api for ScoreDocComparator and SortComparatorSource would work.
>> >> >> >   Please take a look.
>> >> >> > Thanks
>> >> >> > -John
>> >> >> >
>> >> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang wrote:
>> >> >> >>
>> >> >> >> Hi Michael:
>> >> >> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
>> >> >> >>      I think I sent the https url instead, sorry.
>> >> >> >>     The multi PQ sorting is fairly self-contained; I have 2 versions,
>> >> >> >> 1 for string and 1 for int, each a Collector impl.
>> >> >> >>      I shouldn't say the Multi Q is faster on int sort; it is within
>> >> >> >> the error boundary. The diff is very, very small; I would say they
>> >> >> >> are more or less equal.
>> >> >> >>      If you think it is a good thing to go this way (if not for the
>> >> >> >> perf, just for the simpler api), I'd be happy to work on a patch.
>> >> >> >> Thanks
>> >> >> >> -John
>> >> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless wrote:
>> >> >> >>>
>> >> >> >>> John, looks like this requires login -- any plans to open that up,
>> >> >> >>> or post the code on an issue?
>> >> >> >>>
>> >> >> >>> How self-contained is your Multi PQ sorting?  E.g., is it a
>> >> >> >>> standalone Collector impl that I can test?
>> >> >> >>>
>> >> >> >>> Mike
>> >> >> >>>
>> >> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang wrote:
>> >> >> >>> > BTW, we have a little sandbox for these experiments, and all my
>> >> >> >>> > test code is there. It is not very polished.
>> >> >> >>> >
>> >> >> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >> >> >>> >
>> >> >> >>> > -John
>> >> >> >>> >
>> >> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang wrote:
>> >> >> >>> >>
>> >> >> >>> >> Numbers Mike requested for Int types:
>> >> >> >>> >>
>> >> >> >>> >> Only the time/cputime are posted; the others are all the same
>> >> >> >>> >> since the algorithm is the same.
>> >> >> >>> >>
>> >> >> >>> >> Lucene 2.9:
>> >> >> >>> >> numhits: 10
>> >> >> >>> >> time: 14619495
>> >> >> >>> >> cpu: 146126
>> >> >> >>> >>
>> >> >> >>> >> numhits: 20
>> >> >> >>> >> time: 14550568
>> >> >> >>> >> cpu: 163242
>> >> >> >>> >>
>> >> >> >>> >> numhits: 100
>> >> >> >>> >> time: 16467647
>> >> >> >>> >> cpu: 178379
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> my test:
>> >> >> >>> >> numHits: 10
>> >> >> >>> >> time: 14101094
>> >> >> >>> >> cpu: 144715
>>

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread Michael McCandless

On Tue, Oct 20, 2009 at 11:55 AM, John Wang wrote:

> the simpler api places less restriction on the type of custom
> sorting that can be done.

Just to verify: this is not a back-compat break, right?

Because, in 2.4, such an interesting custom sort must've been
operating at the top-level index reader level, which is easy to carry
over to 2.9 (you just rebase the docIDs).

But, of course in moving to 2.9, you would like to also switch your
custom sort to be per-segment (for faster reopen/near real-time perf),
but the new sort API makes this more difficult because it requires
that you are able to compare hits across different segments during the
search, not just at the end.

But then I don't understand the difficulty of doing that: if we had a
Collector with the MultiPQ approach, at the end during merge, you'd
also have to compare results across segments, i.e., upgrade your ords to
their real values.  The MultiPQ approach does this by calling
sortValue (returns Comparable) in the end.
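For comparison, the merge step of the MultiPQ approach can be sketched as follows, assuming each segment exposes a doc -> ord array and an ord -> value lookup for a String sort. All names are hypothetical and this is not John's Collector code: each segment keeps its own top-N by ord, and only at the end are ords resolved to values (the role sortValue plays above) so hits from different segments can be compared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class MultiPQMergeSketch {
    /** A hit after the per-segment phase, carrying its resolved sort value. */
    static final class Hit {
        final int segment;
        final int doc;
        final String value;

        Hit(int segment, int doc, String value) {
            this.segment = segment;
            this.doc = doc;
            this.value = value;
        }
    }

    /** Phase 1: top-n docs of one segment, compared only by segment-local ord. */
    static List<Integer> topDocsByOrd(int[] docOrds, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on ord, so the weakest (largest-ord) doc sits at the head.
        PriorityQueue<Integer> pq =
            new PriorityQueue<>(Comparator.comparingInt((Integer d) -> docOrds[d]).reversed());
        for (int doc = 0; doc < docOrds.length; doc++) {
            if (pq.size() < n) {
                pq.add(doc);
            } else if (docOrds[doc] < docOrds[pq.peek()]) {
                pq.poll();
                pq.add(doc);
            }
        }
        return new ArrayList<>(pq);
    }

    /** Phase 2: resolve ords to values and merge the per-segment queues. */
    static List<Hit> merge(String[][] lookups, int[][] docOrds, int n) {
        List<Hit> all = new ArrayList<>();
        for (int s = 0; s < lookups.length; s++) {
            for (int doc : topDocsByOrd(docOrds[s], n)) {
                all.add(new Hit(s, doc, lookups[s][docOrds[s][doc]]));
            }
        }
        all.sort(Comparator.comparing((Hit h) -> h.value));
        return all.subList(0, Math.min(n, all.size()));
    }
}
```

The cross-segment comparison happens only once, in merge(), over at most n hits per segment.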

Putting performance aside for now... when comparing bottom, you don't
actually have to "truly invert" Comparable -> ord on segment
transition.  You could, instead, get the Comparable for each and
compare, but then note the smallest ord for the current segment that
has failed to compete, and short-circuit the compareBottom test by
checking against that ord. That should enable carrying over the custom
sort to the single PQ API without needing to invert ord->value.
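A minimal sketch of that short-circuit for an ascending String sort, assuming the comparator keeps a per-segment lookup array (ord -> value, sorted ascending) and the bottom of a full queue as a value. `OrdShortCircuitSketch` and its methods are hypothetical, not Lucene's FieldComparator API. The sketch relies on the bottom only moving toward smaller values once the queue is full, so an ord that failed once keeps failing.

```java
/** Hypothetical comparator state for an ascending String sort over one segment at a time. */
final class OrdShortCircuitSketch {
    private String[] lookup;    // ord -> value for the current segment, sorted ascending
    private String bottomValue; // value of the weakest entry in the (full) queue
    private int minFailedOrd;   // smallest ord in this segment known not to compete
    int valueComparisons;       // how often the String values actually had to be compared

    void setNextSegment(String[] sortedLookup) {
        lookup = sortedLookup;
        minFailedOrd = Integer.MAX_VALUE; // ords are segment-local, so forget old failures
    }

    void setBottom(String value) {
        // Once the queue is full the bottom only moves toward smaller values, so an
        // ord that failed against the old bottom also fails against the new one.
        bottomValue = value;
    }

    /** Returns true if a doc with this ord sorts strictly before the bottom. */
    boolean competes(int ord) {
        if (ord >= minFailedOrd) {
            return false; // short-circuit: a smaller-or-equal ord already failed
        }
        valueComparisons++;
        boolean wins = lookup[ord].compareTo(bottomValue) < 0;
        if (!wins) {
            minFailedOrd = ord; // ord < minFailedOrd here, so this is the new minimum
        }
        return wins;
    }

    public static void main(String[] args) {
        OrdShortCircuitSketch c = new OrdShortCircuitSketch();
        c.setNextSegment(new String[] {"apple", "kiwi", "mango", "pear"});
        c.setBottom("lime");
        System.out.println(c.competes(1));      // true:  "kiwi" < "lime"
        System.out.println(c.competes(2));      // false: "mango" >= "lime"
        System.out.println(c.competes(3));      // false, without a value comparison
        System.out.println(c.valueComparisons); // 2
    }
}
```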

We'd obviously have to test performance...

Or, we could commit the MultiPQ approach as another sorting collector?
I know it's not great having two wildly different sort APIs, but both
APIs seem to have their strengths in different cases.

Mike


[jira] Commented: (LUCENE-1999) Match spotter for all query types

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768163#action_12768163
 ] 

Michael McCandless commented on LUCENE-1999:


Very clever!

Since you are wrapping arbitrary query objs, couldn't the wrapper make a 
separate data structure for tracking which clause matched (instead of encoding 
it into the score)?

Also: doesn't highlighter run, separately, on each doc?  And so it's OK if the 
scores are affected?  I.e., I would run my main search with a normal query, get 
the 10 results for the current page, then step through each of those 10 doc IDs, 
make a single-doc IndexSearcher, and run this wrapper?

{quote}
Avoiding these precision issues would require a change to Lucene core to record 
docId, score AND a matchFlag byte in ScoreDoc objects and collector APIs.
This may be something we should consider.
{quote}

+1  I would love to see the Scorer API extended to optionally provide details 
on matches.  Not just which clause matched which docs/fields, but the positions 
within the field where the match occurred.  I think we could do this by 
absorbing *SpanQuery into their normal Query counterparts, making the getSpans 
API [somehow] optional so that if you didn't invoke it you don't pay a 
performance price.

> Match spotter for all query types
> -
>
> Key: LUCENE-1999
> URL: https://issues.apache.org/jira/browse/LUCENE-1999
> Project: Lucene - Java
>  Issue Type: New Feature
>Affects Versions: 2.9
>Reporter: Mark Harwood
> Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight 
> NumericRangeQuery, spatial, cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query objects and record match 
> info as flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore 
> highlight) which fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here: 
> http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to 
> record docId, score AND a matchFlag byte in ScoreDoc objects and collector 
> APIs.
> This may be something we should consider.
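The description does not say exactly how the patch packs its flags into the score, so the following is only a minimal sketch of one possible encoding, assuming the flags are stored in the low mantissa bits of a non-negative float score. `ScoreFlagSketch` and the 8-bit flag width are hypothetical. It also shows where the precision loss comes from: the low-order bits of the score are overwritten.

```java
final class ScoreFlagSketch {
    // Hypothetical layout: the lowest 8 mantissa bits hold one match flag per clause.
    static final int FLAG_BITS = 8;
    static final int FLAG_MASK = (1 << FLAG_BITS) - 1;

    /** Overwrites the low mantissa bits of a (non-negative) score with the given flags. */
    static float encode(float score, int flags) {
        int bits = Float.floatToIntBits(score);
        bits = (bits & ~FLAG_MASK) | (flags & FLAG_MASK);
        return Float.intBitsToFloat(bits);
    }

    /** Recovers the flags written by encode(). */
    static int decode(float encodedScore) {
        return Float.floatToIntBits(encodedScore) & FLAG_MASK;
    }

    /** True if clause number `clause` (0-based) matched. */
    static boolean clauseMatched(float encodedScore, int clause) {
        return (decode(encodedScore) & (1 << clause)) != 0;
    }

    public static void main(String[] args) {
        float encoded = encode(1.5f, 0b101); // clauses 0 and 2 matched
        System.out.println(decode(encoded));                  // 5
        System.out.println(clauseMatched(encoded, 1));        // false
        System.out.println(Math.abs(encoded - 1.5f) < 1e-4f); // true: only low-order precision is lost
    }
}
```

Because the flags replace real score bits, two hits whose scores differ only in those bits can swap order, which is the kind of precision issue the description mentions.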


[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

A new patch which resolves the Benchmark problem by adding a static method in 
NewAnalyzerTask that loads an analyzer by class name:
{code}
import java.lang.reflect.Constructor;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.util.Version;

// in NewAnalyzerTask:
public static final Analyzer createAnalyzer(String className) throws Exception {
  final Class<? extends Analyzer> clazz =
    Class.forName(className).asSubclass(Analyzer.class);
  try {
    // first try to use a ctor with a Version parameter (needed for many new
    // Analyzers that no longer have a default one)
    Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class);
    return cnstr.newInstance(Version.LUCENE_CURRENT);
  } catch (NoSuchMethodException nsme) {
    // otherwise use the default ctor
    return clazz.newInstance();
  }
}
{code}

This method is reused at other places where an Analyzer is created by a config 
property.

This patch now passes all tests. There are still the problems with Analyzer and 
QueryParser with wrong default properties, but I would like to commit this first 
and then solve the problems, also in 2.9.1.

Mike, are you OK with that?

> Remove rest of analysis deprecations (Token, CharacterCache)
> 
>
> Key: LUCENE-1987
> URL: https://issues.apache.org/jira/browse/LUCENE-1987
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9.1, 3.0
>
> Attachments: LUCENE-1987-StopFilter-backport29.patch, 
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still 
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
> are deprecated, but we still have the VERSION constants. Not sure how to 
> proceed: keep the settings alive for index compatibility, or remove them 
> together with the version constants (which were undeprecated)?


[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768167#action_12768167
 ] 

Michael McCandless commented on LUCENE-1987:


bq. Mike, are you OK with that?

Looks great!  Not only am I OK with it, it's exactly what I proposed (above -- 
https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449).
Maybe you missed my response there?  (I also suggested adding Version to QP 
ctor.)


[jira] Commented: (LUCENE-1999) Match spotter for all query types

2009-10-21 Thread Mark Harwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768173#action_12768173
 ] 

Mark Harwood commented on LUCENE-1999:
--

bq. couldn't the wrapper make a separate data structure for tracking which 
clause matched 

I was trying to keep the processing cost super-low with no object allocations 
because this is in a very tight loop. We don't really want to be generating a 
lot of state/processing while we're still evaluating potentially millions of 
candidate matches.
That seems to be the challenge of doing this instrumentation in-line with the 
query execution.

bq. Also: doesn't highlighter run, separately, on each doc? And so it's OK if 
the scores are affected?

The use case I'm tackling right now involves search forms with lots of optional 
fields (spatial, numeric, "choice" etc) and I only needed a yes/no match flag 
for each field. This approach should give me these answers back immediately 
without impacting query processing speeds significantly. 
However, I can see the value in core Lucene capturing a richer data structure 
than a simple boolean where you choose to do a separate "highlight" pass on the 
top N documents. This would suggest that you might need 2 query expressions - 
one for execution and one for adding highlighter instrumentation. I suppose the 
client could add the instrumentation requests to the initial query which are 
passive during a Lucene "results-selection" mode and become active in 
"highlight mode".




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768178#action_12768178
 ] 

Uwe Schindler commented on LUCENE-1987:
---

I saw your comment yesterday and implemented the benchmark part that way.

The QP ctor with a Version param also looks good, but we would have to add it to 
2.9 as well, to be able to remove the no-arg ctor.

My patch still has a failing test in the ant task (missing no-arg ctor); I will 
look into it, but the fix is the same as for benchmark.

> Remove rest of analysis deprecations (Token, CharacterCache)
> 
>
> Key: LUCENE-1987
> URL: https://issues.apache.org/jira/browse/LUCENE-1987
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9.1, 3.0
>
> Attachments: LUCENE-1987-StopFilter-backport29.patch, 
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still 
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
> are deprecated, but we still have the VERSION constants. I do not know how to 
> proceed. Keep the settings alive for index compatibility? Or remove them 
> together with the version constants (which were undeprecated)?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2001) wordnet parsing bug

2009-10-21 Thread Robert Muir (JIRA)

wordnet parsing bug
---

 Key: LUCENE-2001
 URL: https://issues.apache.org/jira/browse/LUCENE-2001
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 2.9
Reporter: Robert Muir
Priority: Minor


A user reported that wordnet parses the prolog file incorrectly.

Also need to check the wordnet parser in the memory contrib for this problem.

If this is a false alarm, I'm not worried, because the test will be the first 
unit test the wordnet package ever had.

{noformat}
For example, looking up the synsets for the
word "king", we get:

java SynLookup wnindex king
baron
magnate
mogul
power
queen
rex
scrofula
struma
tycoon

Here, "scrofula" and "struma" are extraneous. This happens because the line
parser code in Syns2Index.java interprets the two consecutive single quotes
in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the
termination of the string, and separates out "king". This entry concerns the
synset of the words "scrofula" and "struma", so they get inserted into the
synset of "king". *There are 1382 such entries in wn_s.pl*, and more in other
WordNet Prolog database files where this use of two consecutive single quotes
appears.

We have resolved this by adding a statement in the line parsing portion of
Syns2Index.java, as follows:

   // parse line
   line = line.substring(2);
   line = line.replaceAll("''", "`"); // added statement
   int comma = line.indexOf(',');
   String num = line.substring(0, comma);  ... ... etc.

In short, we replace "''" with "`" (a back-quote). Then, on recreating the
index, we get:

java SynLookup zwnindex king
baron
magnate
mogul
power
queen
rex
tycoon
{noformat}
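The underlying rule is that Prolog escapes an apostrophe inside a quoted atom by doubling it (`''`). The back-quote substitution above avoids the bad split but changes the stored text. As an alternative, here is a minimal sketch of a quote-aware splitter for a single `s(...)` fact that restores the apostrophe instead; the class and method names are hypothetical, and this is not the code in Syns2Index.java or in any attached patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PrologFactSplitter {

    /**
     * Splits the arguments of a fact such as s(114144247,3,'king''s evil',n,1,1)
     * into fields. Inside a quoted atom, a doubled quote ('') is an escaped
     * apostrophe, not the end of the atom.
     */
    public static List<String> splitArgs(String fact) {
        int open = fact.indexOf('(');
        int close = fact.lastIndexOf(')');
        String body = fact.substring(open + 1, close);

        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c == '\'') {
                    if (i + 1 < body.length() && body.charAt(i + 1) == '\'') {
                        current.append('\''); // escaped apostrophe
                        i++;                  // skip the second quote
                    } else {
                        inQuotes = false;     // closing quote of the atom
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '\'') {
                inQuotes = true;              // opening quote of an atom
            } else if (c == ',') {
                fields.add(current.toString()); // field separator outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> f = splitArgs("s(114144247,3,'king''s evil',n,1,1)");
        System.out.println(f); // [114144247, 3, king's evil, n, 1, 1]
    }
}
```

With this approach the word field stays intact as "king's evil", so "scrofula" and "struma" would no longer be attached to "king".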

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

Fix ant task.

> Remove rest of analysis deprecations (Token, CharacterCache)
> 
>
> Key: LUCENE-1987
> URL: https://issues.apache.org/jira/browse/LUCENE-1987
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9.1, 3.0
>
> Attachments: LUCENE-1987-StopFilter-backport29.patch, 
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
> LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still 
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
> are deprecated, but we still have the VERSION constants. I do not know how to 
> proceed. Keep the settings alive for index compatibility? Or remove them 
> together with the version constants (which were undeprecated)?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1999) Match spotter for all query types

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768191#action_12768191
 ] 

Michael McCandless commented on LUCENE-1999:


I see, it sounds like your use case is different from the typical
highlighting use case in that 1) you don't need the positions of the
matches (just whether a given clause matched the doc or not), and 2)
you need it for every single doc visited by the query, not just for
the handful of docs that are being presented to the user on the
current "page".

bq. This would suggest that you might need 2 query expressions - one for 
execution and one for adding highlighter instrumentation.

I'm thinking it's the same query, but we fix the Scorer API for all
queries (= big change!!) to be able to produce match details on
demand, where those match details look something like what getSpans
now returns.  But for the normal case (only highlighting the docs
being shown on current page), we'd only get the match details for that
small set of docs.

Then we ideally would not need a separate mirrored set of span
queries.  I.e., SpanTermQuery would be absorbed into TermQuery, etc.

But I could easily be being too naive here :) Maybe there is some
serious performance cost to even adding the optional API in.

> Match spotter for all query types
> -
>
> Key: LUCENE-1999
> URL: https://issues.apache.org/jira/browse/LUCENE-1999
> Project: Lucene - Java
>  Issue Type: New Feature
>Affects Versions: 2.9
>Reporter: Mark Harwood
> Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight 
> NumericRangeQuery, spatial, cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query objects and record match 
> info as flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore 
> highlight) which fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here: 
> http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to 
> record docId, score AND a matchFlag byte in ScoreDoc objects and collector 
> APIs.
> This may be something we should consider.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1987.
---

Resolution: Fixed

Committed in 2.9, 3.0, backwards branch.

For the QueryParser problems and other additions of version constants I will 
open another issue.

> Remove rest of analysis deprecations (Token, CharacterCache)
> 
>
> Key: LUCENE-1987
> URL: https://issues.apache.org/jira/browse/LUCENE-1987
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9.1, 3.0
>
> Attachments: LUCENE-1987-StopFilter-backport29.patch, 
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
> LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still 
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
> are deprecated, but we still have the VERSION constants. I do not know how to 
> proceed. Keep the settings alive for index compatibility? Or remove them 
> together with the version constants (which were undeprecated)?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1998:
--

Attachment: LUCENE-1998_enum.patch

Updated patch (merged with StandardAnalyzer version constants). Also added 
Lucene version 3.0 for completeness, so that users can build apps without 
needing to use the CURRENT constant.

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Uwe Schindler (JIRA)

Add oal.util.Version ctor to QueryParser


 Key: LUCENE-2002
 URL: https://issues.apache.org/jira/browse/LUCENE-2002
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9, 3.0
Reporter: Uwe Schindler
 Fix For: 3.0, 2.9


This is a followup of LUCENE-1987:

If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
QueryParser, phrase queries that contain stop words will not match, because 
the StopFilter enables position increments for stop words, but QueryParser 
ignores them by default. The user has to enable them explicitly.

This issue would add a ctor taking the Version constant that automatically 
enables this setting. The same applies to the contrib queryparser. Perhaps 
StopAnalyzer should also get this version ctor.

To be able to remove the default ctor in 3.0 (removing a possible trap for 
users of QueryParser), it must be deprecated and the new one also added to 
2.9.1.
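A minimal sketch of the proposed behavior, assuming a simplified stand-in for `org.apache.lucene.util.Version` and a hypothetical `VersionedParser` class (not the real QueryParser). It also assumes, for illustration, that `LUCENE_29` is the cutoff at which position increments become the default; the actual cutoff is for the patch to decide.

```java
public class VersionedParserSketch {

    /** Simplified stand-in for org.apache.lucene.util.Version (assumption). */
    enum Version {
        LUCENE_24(2400), LUCENE_29(2900), LUCENE_30(3000), LUCENE_CURRENT(0);

        private final int v;

        Version(int v) { this.v = v; }

        boolean onOrAfter(Version other) {
            return v == 0 || v >= other.v;
        }
    }

    /** Hypothetical parser: the Version argument picks the default setting. */
    static class VersionedParser {
        private boolean enablePositionIncrements;

        VersionedParser(Version matchVersion) {
            // Follow the analyzer's stop-word gaps for newer versions, so a
            // phrase such as "foo the bar" keeps the gap left by "the".
            this.enablePositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
        }

        boolean getEnablePositionIncrements() { return enablePositionIncrements; }

        void setEnablePositionIncrements(boolean enable) { this.enablePositionIncrements = enable; }
    }

    public static void main(String[] args) {
        System.out.println(new VersionedParser(Version.LUCENE_24).getEnablePositionIncrements());      // false
        System.out.println(new VersionedParser(Version.LUCENE_CURRENT).getEnablePositionIncrements()); // true
    }
}
```

Users who pass an old Version keep the old default, and the setter remains available for anyone who needs to override it explicitly.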

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768215#action_12768215
 ] 

DM Smith commented on LUCENE-1998:
--

.bq I only added the license header back in the Version class. It must be there.

Sorry about whacking the license on Version. It must have been an accident. I 
know it needs to be there.

.bq Some fine tuning: You defined package protected abstract methods, but made 
them public in the enum constant. Changed to all-public. This was also a 
backwards-break in contrib/queryParser.

Thanks. Inadvertently, I was following the pattern for an Interface, where 
scoping does not matter.

.bq So it works, but not with switch statements.
IMHO: Having a switch statement (or cascading if-then-else) over the collection 
of values is generally indicative of a bad design (or an opportunity for an 
improved design :) By adding methods to each enum that return literals, we can 
eliminate this and at the same time, improve performance.
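A minimal sketch of the constant-specific-method pattern described above, using a hypothetical, simplified `IndexMode` enum rather than Lucene's actual `Field.Index`. The abstract methods are declared `public`, matching the public overrides in each constant body, in line with the access-modifier fix quoted above.

```java
public class EnumFlagsSketch {

    /**
     * Hypothetical indexing-mode enum. Each constant answers its own
     * questions, so callers never need a switch over the constants.
     */
    enum IndexMode {
        NO {
            @Override public boolean isIndexed() { return false; }
            @Override public boolean isAnalyzed() { return false; }
        },
        ANALYZED {
            @Override public boolean isIndexed() { return true; }
            @Override public boolean isAnalyzed() { return true; }
        },
        NOT_ANALYZED {
            @Override public boolean isIndexed() { return true; }
            @Override public boolean isAnalyzed() { return false; }
        };

        // Public abstract methods, matching the public overrides above.
        public abstract boolean isIndexed();
        public abstract boolean isAnalyzed();
    }

    /** Client code queries the constant's properties instead of its identity. */
    static String describe(IndexMode mode) {
        if (!mode.isIndexed()) {
            return "stored only";
        }
        return mode.isAnalyzed() ? "analyzed" : "indexed verbatim";
    }

    public static void main(String[] args) {
        for (IndexMode mode : IndexMode.values()) {
            System.out.println(mode + ": " + describe(mode));
        }
        // NO: stored only
        // ANALYZED: analyzed
        // NOT_ANALYZED: indexed verbatim
    }
}
```

Adding a new constant forces the author to implement every abstract method, which a `switch` with a `default` branch would silently skip.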

There is another tuning opportunity, which I didn't take. We are marshaling out 
the flags from the enums into member variables. I'm not sure how efficient the 
storage of a boolean vs an enum is. If it is a wash, then having an enum value 
as a replacement would be a good thing. It should clearly document what controls 
the flag.

The only complication would be the set/get for some of the flags. (E.g. 
AbstractField.setOmitNorms.) What's with that? Are the enum values merely a 
hint??? Does it make sense to allow omitNorms to be changed after an 
AbstractField is being used?



> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768234#action_12768234
 ] 

Uwe Schindler commented on LUCENE-1257:
---

Committed:
- LUCENE-1257_more_unnecessary_casts.patch
- Remove the rest of the unchecked warnings. I added a TODO where I do not 
understand the code and do not know for sure what's inside the collections. This 
could be fixed later. But the core code now compiles without any unchecked 
warnings.

Revision: 828011


> Port to Java5
> -
>
> Key: LUCENE-1257
> URL: https://issues.apache.org/jira/browse/LUCENE-1257
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis, Examples, Index, Other, Query/Scoring, 
> QueryParser, Search, Store, Term Vectors
>Affects Versions: 3.0
>Reporter: Cédric Champeau
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: instantiated_fieldable.patch, 
> LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
> LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
> LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
> LUCENE-1257-CompoundFileReaderWriter.patch, 
> LUCENE-1257-ConcurrentMergeScheduler.patch, 
> LUCENE-1257-DirectoryReader.patch, 
> LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
> LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
> LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, 
> LUCENE-1257-IndexDeleter.patch, 
> LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
> LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
> LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, 
> LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
> LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
> LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
> LUCENE-1257_BooleanFilter_Generics.patch, 
> LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_javacc_upgrade.patch, 
> LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, 
> LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, 
> LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
> LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
> LUCENE-1257_o_a_l_search_spans.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, 
> LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, 
> LUCENE-1257_unnecessary_casts.patch, lucene1257surround1.patch, 
> lucene1257surround1.patch, shinglematrixfilter_generified.patch
>
>
> For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
> Java 5 migration had been planned for 2.1 someday in the past, but don't know 
> when it is planned now. This patch against the trunk includes:
> - most obvious generics usage (there are tons of usages of sets, ... Those 
> which are commonly used have been generified)
> - PriorityQueue generification
> - replacement of indexed for loops with for each constructs
> - removal of unnecessary unboxing
> The code is in my opinion much more readable with those features (you 
> actually *know* what is stored in collections reading the code, without the 
> need to look up field definitions every time) and it simplifies many 
> algorithms.
> Note that this patch also includes an interface for the Query class. This has 
> been done for my company's needs for building custom Query classes which add 
> some behaviour to the base Lucene queries. It prevents multiple unnecessary 
> casts. I know this introduction is not wanted by the team, but it really 
> makes our developments easier to maintain. If you don't want to use this, 
> replace all /Queriable/ calls with standard /Query/.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768242#action_12768242
 ] 

Uwe Schindler commented on LUCENE-1998:
---

(it's "bq." not ".bq" :-) )

{quote}
bq. So it works, but not with switch statements.
IMHO: Having a switch statement (or cascading if-then-else) over the collection 
of values is generally indicative of a bad design (or an opportunity for an 
improved design :) By adding methods to each enum that return literals, we can 
eliminate this and at the same time, improve performance.
{quote}

You are right. My concern was more about client code of Lucene that may, for 
example, have a switch statement on Field.Index (e.g. Solr) to control some 
further indexing steps. If we renamed a constant, the switch statement would 
still work in already compiled code, but not once the code is recompiled 
against the modified version. That was my problem. In 3.0 this will not happen, 
as there are no deprecated enum constants, but it may happen later. In that 
case, a CHANGES.txt entry should be added.

bq. There is another tuning opportunity, which I didn't take. We are marshaling 
out the flags from the enums into member variables. I'm not sure how efficient 
the storage of a boolean vs an enum is. If it is a wash, then having an enum 
value as a replacement would be a good thing. It should clearly document what 
controls the flag.

This is currently not possible because of backwards compatibility: the fields 
are protected and were not deprecated in 2.9. I think with your change we are 
fine.

bq. The only complication would be the set/get for some of the flags. (E.g. 
AbstractField.setOmitNorms.) What's with that? Are the enum values merely a 
hint??? Does it make sense to allow omitNorms to be changed after an 
AbstractField is being used?

It is perfectly legal to change these constants after creating the field, so 
the setters must be there.

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768215#action_12768215
 ] 

DM Smith edited comment on LUCENE-1998 at 10/21/09 2:22 PM:


bq. I only added the license header back in the Version class. It must be there.

Sorry about whacking the license on Version. It must have been an accident. I 
know it needs to be there.

bq. Some fine tuning: You defined package protected abstract methods, but made 
them public in the enum constant. Changed to all-public. This was also a 
backwards-break in contrib/queryParser.

Thanks. Inadvertently, I was following the pattern for an Interface, where 
scoping does not matter.

bq. So it works, but not with switch statements.
IMHO: Having a switch statement (or cascading if-then-else) over the collection 
of values is generally indicative of a bad design (or an opportunity for an 
improved design :) By adding methods to each enum that return literals, we can 
eliminate this and at the same time, improve performance.

There is another tuning opportunity, which I didn't take. We are marshaling out 
the flags from the enums into member variables. I'm not sure how efficient the 
storage of a boolean vs an enum is. If it is a wash, then having an enum value 
as a replacement would be a good thing. It should clearly document what controls 
the flag.

The only complication would be the set/get for some of the flags. (E.g. 
AbstractField.setOmitNorms.) What's with that? Are the enum values merely a 
hint??? Does it make sense to allow omitNorms to be changed after an 
AbstractField is being used?



  was (Author: dmsmith):
.bq I only added the license header back in the Version class. It must be 
there.

Sorry about wacking the license on Version. It must have been an accident. I 
know it needs to be there.

.bq Some fine tuning: You defined package protected abstract methods, but made 
them public in the enum constant. Changed to all-public. This was also a 
backwards-break in contrib/queryParser.

Thanks. Inadvertently,  I was following the pattern for an Interface, where 
scoping does not matter.

.bq So it works, but not with switch statements.
IMHO: Having a switch statement (or cascading if-then-else) over the collection 
of values is generally indicative of a bad design (or an opportunity for an 
improved design :) By adding methods to each enum that return literals, we can 
eliminate this and at the same time, improve performance.

There is another tuning opportunity, which I didn't take. We are marshaling out 
the flags from the enums into member variables. I'm not sure how efficient the 
storage of a boolean vs an enum is. If it is a wash, then having an enum value 
as replacement would be a good thing. It sould clearly document what controls 
the flag.

The only complication would be the set/get for some of the flags. (E.g. 
AbstractField.setOmitNorms.) What's with that? Are the enum values merely a 
hint??? Does it make sense to allow omitNorms to be changed after an 
AbstractField is being used?


  
> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1999) Match spotter for all query types

2009-10-21 Thread Mark Harwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768257#action_12768257
 ] 

Mark Harwood commented on LUCENE-1999:
--

bq. and 2) you need it for every single doc visited by the query

Actually I don't need it for every doc, only the top ones; it just happens to 
be so cheap to produce that I can afford to run this in-line with the query. (I 
haven't actually benchmarked it at scale, but my gut feeling is it would be fast.)

I was thinking that this might be orthogonal to the existing "free-text" based 
highlighter. The logic for this being roughly that

1) Highlighting of free-text fields is reasonably well-catered for with 
summarisation etc.
2) The remaining problem areas for highlighting (NumericRangeQuery, Spatial, 
Cached term filters on enums, e.g. gender:male/female) are all likely to be 
non-free-text fields which don't require summarisation and only contain a 
single value.

I may be wrong in these assumptions about the existing state of play (any 
thoughts, Mark M?) but it might be useful to think of attacking the problem 
with these 2 different requirements in mind.

Regardless of type (e.g. int, long, etc.), I tend to think of fields as falling 
into these broad usage categories:

a) "Identifiers" (e.g. primary keys)
b) Quantifiers (e.g. numerics, dates, spatial)
c) Free-text 
d) Controlled vocabularies (e.g. enums such as gender:m/f)

Type a) is catered for with a straight TermQuery and therefore can be handled 
with the existing highlighter.
Type b) needs special indexes/queries (spatial/trie) and isn't catered for by 
the existing term/span-based Highlighter.
Type c) is catered for with the existing highlighter and its summarising 
features.
Type d) involves many TermDocs.next() reads, so it is usefully cached as filters 
and therefore not catered for by the existing Highlighter.

So this patch helps cater for types b) and d) where simply knowing the field 
matched is all that is required to highlight.
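One way to carry match flags inside a float score, and to see where the precision loss mentioned in the issue comes from, is to overwrite the lowest mantissa bits. The sketch below assumes up to 8 clause flags stored in the low 8 bits; it illustrates the general technique and is not necessarily what matchflagger.patch does.

```java
public class ScoreFlagSketch {

    private static final int FLAG_BITS = 8;
    private static final int FLAG_MASK = (1 << FLAG_BITS) - 1; // 0xFF

    /** Stores up to 8 clause-match flags in the low mantissa bits of a score. */
    static float encode(float score, int flags) {
        int bits = Float.floatToIntBits(score);
        bits = (bits & ~FLAG_MASK) | (flags & FLAG_MASK);
        return Float.intBitsToFloat(bits);
    }

    /** Recovers the flags from an encoded score. */
    static int decodeFlags(float encodedScore) {
        return Float.floatToIntBits(encodedScore) & FLAG_MASK;
    }

    /** True if clause number 'clause' (0 to 7) matched. */
    static boolean matched(float encodedScore, int clause) {
        return (decodeFlags(encodedScore) & (1 << clause)) != 0;
    }

    public static void main(String[] args) {
        float score = 0.7312f;
        int flags = (1 << 0) | (1 << 2); // clauses 0 and 2 matched
        float encoded = encode(score, flags);
        System.out.println(matched(encoded, 0)); // true
        System.out.println(matched(encoded, 1)); // false
        System.out.println(matched(encoded, 2)); // true
        // Only the 8 lowest of the 23 mantissa bits change, so for a normal
        // positive score the relative error stays below 2^-15.
        System.out.println(Math.abs(encoded - score) / score < 1e-4); // true
    }
}
```

Because the flags replace the least significant bits, two documents whose scores differ only in those bits can swap order after encoding, which is the kind of precision loss that a separate matchFlag byte in ScoreDoc would avoid.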


> Match spotter for all query types
> -
>
> Key: LUCENE-1999
> URL: https://issues.apache.org/jira/browse/LUCENE-1999
> Project: Lucene - Java
>  Issue Type: New Feature
>Affects Versions: 2.9
>Reporter: Mark Harwood
> Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight 
> NumericRangeQuery, spatial, cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query objects and record match 
> info as flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore 
> highlight) which fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here: 
> http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to 
> record docId, score AND a matchFlag byte in ScoreDoc objects and collector 
> APIs.
> This may be something we should consider.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768270#action_12768270
 ] 

DM Smith commented on LUCENE-1998:
--

I just noticed that enums are comparable. For the enum Version, we could take 
advantage of this and not store a number for each value. It would be important 
to maintain the order of versions in the file from earliest to latest.

Should we do this?

Then the current patch's code (comments removed for clarity):

public enum Version {
  LUCENE_CURRENT (0),
  LUCENE_20      (2000),
  LUCENE_21      (2100),
  LUCENE_22      (2200),
  LUCENE_23      (2300),
  LUCENE_24      (2400),
  LUCENE_29      (2900),
  LUCENE_30      (3000);

  private Version(int v) {
    this.v = v;
  }

  public boolean onOrAfter(Version other) {
    return v == 0 || v >= other.v;
  }

  private final int v;
}

Would become (the comment on strict ordering is necessary):

public enum Version {

  // These have to be ordered from the oldest to the newest version
  LUCENE_20,
  LUCENE_21,
  LUCENE_22,
  LUCENE_23,
  LUCENE_24,
  LUCENE_29,
  LUCENE_30,
  // This needs to be last
  LUCENE_CURRENT;

  /** A convenience method merely calling this.compareTo(other) >= 0 */
  public boolean onOrAfter(Version other) {
    return compareTo(other) >= 0;
  }

}


> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768273#action_12768273
 ] 

Uwe Schindler commented on LUCENE-1998:
---

I thought about that, too, but I would not do this, especially because I want 
to have the 0-version (current) as the first element for serialization purposes 
(changing the order of enum constants is bad; you should always add them at the 
end).

Perhaps we also want to make an accessor for the integer v public (for more 
specific comparisons and so on).
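A minimal sketch comparing the two `Version` designs discussed above, trimmed to a few constants; `intValue()` is a hypothetical name for the public accessor mentioned in the last comment. Both designs agree on ordinary comparisons, but they differ when the argument is `LUCENE_CURRENT`: with `v == 0 || v >= other.v`, every version counts as "on or after" `LUCENE_CURRENT` because its number is 0, whereas the ordinal design returns false.

```java
public class VersionCompareSketch {

    /** Explicit numbers; LUCENE_CURRENT first and encoded as 0, as in the patch. */
    enum NumberedVersion {
        LUCENE_CURRENT(0), LUCENE_24(2400), LUCENE_29(2900), LUCENE_30(3000);

        private final int v;

        NumberedVersion(int v) { this.v = v; }

        /** Hypothetical public accessor for the internal number. */
        public int intValue() { return v; }

        public boolean onOrAfter(NumberedVersion other) {
            return v == 0 || v >= other.v;
        }
    }

    /** Declaration order only; relies on Enum.compareTo (ordinal order). */
    enum OrdinalVersion {
        // These must be ordered from the oldest to the newest version.
        LUCENE_24, LUCENE_29, LUCENE_30,
        // This must stay last.
        LUCENE_CURRENT;

        public boolean onOrAfter(OrdinalVersion other) {
            return compareTo(other) >= 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(NumberedVersion.LUCENE_30.onOrAfter(NumberedVersion.LUCENE_29)); // true
        System.out.println(OrdinalVersion.LUCENE_30.onOrAfter(OrdinalVersion.LUCENE_29));   // true
        // The designs differ when comparing against LUCENE_CURRENT:
        System.out.println(NumberedVersion.LUCENE_24.onOrAfter(NumberedVersion.LUCENE_CURRENT)); // true
        System.out.println(OrdinalVersion.LUCENE_24.onOrAfter(OrdinalVersion.LUCENE_CURRENT));   // false
    }
}
```

If the numbered design is kept, an explicit accessor such as the hypothetical `intValue()` also lets callers do range checks that `onOrAfter` alone cannot express.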

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.


[jira] Updated: (LUCENE-2001) wordnet parsing bug

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2001:


Attachment: LUCENE-2001.patch

Fix and tests for the bug.
This only affects the wordnet contrib; the bug does not exist in the wordnet 
synonym filter from the memory package, but I added a test there too.


> wordnet parsing bug
> ---
>
> Key: LUCENE-2001
> URL: https://issues.apache.org/jira/browse/LUCENE-2001
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Robert Muir
>Priority: Minor
> Attachments: LUCENE-2001.patch
>
>
> A user reported that wordnet parses the prolog file incorrectly.
> Also need to check the wordnet parser in the memory contrib for this problem.
> If this is a false alarm, I'm not worried, because the test will be the first 
> unit test the wordnet package ever had.
> {noformat}
> For example, looking up the synsets for the
> word "king", we get:
> java SynLookup wnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> scrofula
> struma
> tycoon
> Here, "scrofula" and "struma" are extraneous. This happens because the line
> parser code in Syns2Index.java interprets the two consecutive single quotes
> in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the
> termination of the string and truncates it to "king". This entry concerns
> the synset of the words "scrofula" and "struma", so they get inserted into the
> synset of "king". *There are 1382 such entries in wn_s.pl*, and more in other
> WordNet Prolog database files where two consecutive single quotes are used
> this way.
> We have resolved this by adding a statement in the line parsing portion of
> Syns2Index.java, as follows:
>// parse line
>line = line.substring(2);
>   * line = line.replaceAll("\'\'", "`"); // added statement*
>int comma = line.indexOf(',');
>String num = line.substring(0, comma);  ... ... etc.
> In short, we replace "''" with "`" (a backquote). After recreating the
> index, we get:
> java SynLookup zwnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> tycoon
> {noformat}
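In Prolog, a single quote inside a quoted atom is written as two single quotes, so 'king''s evil' denotes the text king's evil. The sketch below reproduces the failure on the entry quoted above and shows two outcomes: the reporter's replaceAll workaround, which yields a backquote, and, as an alternative that is not taken from LUCENE-2001.patch, a quote-aware scan that restores the apostrophe. The class and method names are hypothetical and this is not the Syns2Index code:

```java
class WordNetLineDemo {

  /** Naive extraction: the word ends at the next single quote. */
  static String naiveWord(String line) {
    int start = line.indexOf('\'');
    int end = line.indexOf('\'', start + 1);
    return line.substring(start + 1, end);
  }

  /** Quote-aware extraction: inside a quoted Prolog atom, '' stands for one '. */
  static String quotedWord(String line) {
    int i = line.indexOf('\'') + 1;
    StringBuilder word = new StringBuilder();
    while (i < line.length()) {
      char c = line.charAt(i);
      if (c == '\'') {
        if (i + 1 < line.length() && line.charAt(i + 1) == '\'') {
          word.append('\''); // escaped quote: keep one, skip both
          i += 2;
          continue;
        }
        break; // a lone quote closes the atom
      }
      word.append(c);
      i++;
    }
    return word.toString();
  }

  public static void main(String[] args) {
    String line = "s(114144247,3,'king''s evil',n,1,1).";
    System.out.println(naiveWord(line));                       // king
    System.out.println(naiveWord(line.replaceAll("''", "`"))); // king`s evil
    System.out.println(quotedWord(line));                      // king's evil
  }
}
```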


[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2002:
---

Fix Version/s: (was: 3.0)
   (was: 2.9)
   2.9.1

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768294#action_12768294
 ] 

Michael McCandless commented on LUCENE-2002:


Uwe, I can take this if you want. Have you started?

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768295#action_12768295
 ] 

Uwe Schindler commented on LUCENE-2002:
---

Take it! I haven't started.

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768301#action_12768301
 ] 

Uwe Schindler commented on LUCENE-2002:
---

While working on LUCENE-1987, I also found a bug in Highlighter: it is also not 
able to handle the position increments of stop words correctly. Add another issue?

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Commented: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread DM Smith (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768304#action_12768304
 ] 

DM Smith commented on LUCENE-1998:
--

bq. changing the order of enum constants is bad, you should always add them at 
the end
Is this true?

I did not know how Java serializes enums, so I went looking.
See: http://java.sun.com/j2se/1.5.0/docs/guide/serialization/relnotes15.html

It turns out that it serializes the name of the enum constant plus class 
info. This is just like the Parameter class.

If I understand it correctly, this makes an enum resilient to changes in 
order. New constants can go in any place (for example, we can later add 
LUCENE_291 before LUCENE_30) without breaking serialization compatibility.

This is especially good for the future, as it provides a path for deprecations 
(e.g. deprecating o.a.l.d.Field.Store.COMPRESS).

So having LUCENE_CURRENT at the end is fine.

If we wanted it first (or anywhere else), we could define onOrAfter as:
public boolean onOrAfter(Version other) { return this == LUCENE_CURRENT || 
(other != LUCENE_CURRENT && compareTo(other) >= 0); }
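With LUCENE_CURRENT declared first, the special case has to cover both directions: LUCENE_CURRENT is on or after every version, and no concrete release is on or after LUCENE_CURRENT. A minimal sketch of that variant, with a hypothetical demo class and a trimmed constant list:

```java
class CurrentFirstDemo {

  enum Version {
    // Declared first here, so plain ordinal comparison would wrongly
    // treat it as the oldest constant.
    LUCENE_CURRENT,
    LUCENE_24, LUCENE_29, LUCENE_30;

    /** True if this version is the same as, or newer than, other. */
    public boolean onOrAfter(Version other) {
      if (this == LUCENE_CURRENT) {
        return true;                // CURRENT is on or after everything
      }
      if (other == LUCENE_CURRENT) {
        return false;               // no concrete release is on or after CURRENT
      }
      return compareTo(other) >= 0; // concrete releases: declaration order
    }
  }

  public static void main(String[] args) {
    System.out.println(Version.LUCENE_CURRENT.onOrAfter(Version.LUCENE_30)); // true
    System.out.println(Version.LUCENE_30.onOrAfter(Version.LUCENE_CURRENT)); // false
    System.out.println(Version.LUCENE_29.onOrAfter(Version.LUCENE_24));      // true
  }
}
```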

If we wanted to expose version numbering info in the future, I'd suggest the 
following pattern (names are unimportant):
LUCENE_29 {
   public int getMajor() { return 2; }
   public int getMinor() { return 9; }
   public int getFix()  { return 0; }
}
because it does not require storage and, unlike "2900", does not rely on 
positional notation for meaning (as a PIC code does), e.g. public int getMajor() 
{ return 2900 / 1000; }
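For the constant-specific bodies in that pattern to compile, the enum itself has to declare the methods, for example as abstract methods that every constant must implement. A minimal compilable sketch; the demo class and the LUCENE_291 constant are hypothetical:

```java
class VersionPartsDemo {

  enum Version {
    LUCENE_29 {
      @Override public int getMajor() { return 2; }
      @Override public int getMinor() { return 9; }
      @Override public int getFix()   { return 0; }
    },
    LUCENE_291 { // hypothetical bug-fix release constant
      @Override public int getMajor() { return 2; }
      @Override public int getMinor() { return 9; }
      @Override public int getFix()   { return 1; }
    },
    LUCENE_30 {
      @Override public int getMajor() { return 3; }
      @Override public int getMinor() { return 0; }
      @Override public int getFix()   { return 0; }
    };

    // Every constant must supply these; the compiler rejects a constant that omits one.
    public abstract int getMajor();
    public abstract int getMinor();
    public abstract int getFix();
  }

  public static void main(String[] args) {
    // Prints LUCENE_29 = 2.9.0, LUCENE_291 = 2.9.1, LUCENE_30 = 3.0.0 (one per line).
    for (Version v : Version.values()) {
      System.out.println(v + " = " + v.getMajor() + "." + v.getMinor() + "." + v.getFix());
    }
  }
}
```

Making the methods abstract forces each new constant to state its numbers; a non-abstract default that only some constants override would be the looser alternative.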

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.


[jira] Assigned: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2002:
--

Assignee: Michael McCandless

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768308#action_12768308
 ] 

Michael McCandless commented on LUCENE-2002:


bq. Add another issue?

+1!

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1998:
--

Attachment: LUCENE-1998_enum_BW.patch
LUCENE-1998_enum.patch

I changed the Version enum. All tests still pass. I also added a test for the 
backwards branch that checks that the transition from Parameter -> enum is 
binary compatible and supported by Java's linker.

I will commit soon.

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum_BW.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.


[jira] Updated: (LUCENE-2001) wordnet parsing bug

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2001:


Lucene Fields: [New, Patch Available]  (was: [New])
Fix Version/s: 3.0
   2.9.1

Committed revision 828091 to trunk.

I set the fix version to 2.9.1 as well, in case someone has some free time to 
commit the patch.

Thanks Parag! 

> wordnet parsing bug
> ---
>
> Key: LUCENE-2001
> URL: https://issues.apache.org/jira/browse/LUCENE-2001
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 2.9.1, 3.0
>
> Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch, 
> LUCENE-2001_branch.patch
>
>
> A user reported that wordnet parses the prolog file incorrectly.
> Also need to check the wordnet parser in the memory contrib for this problem.
> If this is a false alarm, I'm not worried, because the test will be the first 
> unit test the wordnet package ever had.
> {noformat}
> For example, looking up the synsets for the
> word "king", we get:
> java SynLookup wnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> scrofula
> struma
> tycoon
> Here, "scrofula" and "struma" are extraneous. This happens because the line
> parser code in Syns2Index.java interprets the two consecutive single quotes
> in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the
> termination of the string and truncates it to "king". This entry concerns
> the synset of the words "scrofula" and "struma", so they get inserted into the
> synset of "king". *There are 1382 such entries in wn_s.pl*, and more in other
> WordNet Prolog database files where two consecutive single quotes are used
> this way.
> We have resolved this by adding a statement in the line parsing portion of
> Syns2Index.java, as follows:
>// parse line
>line = line.substring(2);
>   * line = line.replaceAll("\'\'", "`"); // added statement*
>int comma = line.indexOf(',');
>String num = line.substring(0, comma);  ... ... etc.
> In short, we replace "''" with "`" (a backquote). After recreating the
> index, we get:
> java SynLookup zwnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> tycoon
> {noformat}


[jira] Updated: (LUCENE-2001) wordnet parsing bug

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2001:


Attachment: LUCENE-2001_branch.patch

Updated patch for the branch; I forgot about String.replace(CharSequence, 
CharSequence) being Java 5 only... sorry guys.
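For reference, String.replace(CharSequence, CharSequence) arrived in Java 5, while String.replaceAll(String, String) already exists in Java 1.4 (it takes a regex, which is harmless for the literal "''"). If a regex-free literal replacement is preferred on a 1.4 branch, a small indexOf loop works; this helper is hypothetical and not necessarily what LUCENE-2001_branch.patch does:

```java
class Java14ReplaceDemo {

  /**
   * Replaces every occurrence of target with replacement, using only
   * String and StringBuffer methods that already exist in Java 1.4.
   */
  static String replaceAllLiteral(String text, String target, String replacement) {
    if (target.length() == 0) {
      return text; // avoid an endless loop on an empty target
    }
    StringBuffer out = new StringBuffer(text.length());
    int from = 0;
    int hit;
    while ((hit = text.indexOf(target, from)) != -1) {
      out.append(text.substring(from, hit)).append(replacement);
      from = hit + target.length();
    }
    out.append(text.substring(from));
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(replaceAllLiteral("'king''s evil'", "''", "`")); // 'king`s evil'
  }
}
```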

> wordnet parsing bug
> ---
>
> Key: LUCENE-2001
> URL: https://issues.apache.org/jira/browse/LUCENE-2001
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Robert Muir
>Priority: Minor
> Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch, 
> LUCENE-2001_branch.patch
>
>
> A user reported that wordnet parses the prolog file incorrectly.
> Also need to check the wordnet parser in the memory contrib for this problem.
> If this is a false alarm, I'm not worried, because the test will be the first 
> unit test the wordnet package ever had.
> {noformat}
> For example, looking up the synsets for the
> word "king", we get:
> java SynLookup wnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> scrofula
> struma
> tycoon
> Here, "scrofula" and "struma" are extraneous. This happens because the line
> parser code in Syns2Index.java interprets the two consecutive single quotes
> in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the
> termination of the string and truncates it to "king". This entry concerns
> the synset of the words "scrofula" and "struma", so they get inserted into the
> synset of "king". *There are 1382 such entries in wn_s.pl*, and more in other
> WordNet Prolog database files where two consecutive single quotes are used
> this way.
> We have resolved this by adding a statement in the line parsing portion of
> Syns2Index.java, as follows:
>// parse line
>line = line.substring(2);
>   * line = line.replaceAll("\'\'", "`"); // added statement*
>int comma = line.indexOf(',');
>String num = line.substring(0, comma);  ... ... etc.
> In short, we replace "''" with "`" (a backquote). After recreating the
> index, we get:
> java SynLookup zwnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> tycoon
> {noformat}


[jira] Updated: (LUCENE-2001) wordnet parsing bug

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2001:


Attachment: LUCENE-2001_branch.patch

Patch for the 2.9 branch (the same, just without Java 5 constructs).

I will commit the trunk patch shortly. Can someone help with this one, if we 
think it should also be fixed in 2.9.1?

> wordnet parsing bug
> ---
>
> Key: LUCENE-2001
> URL: https://issues.apache.org/jira/browse/LUCENE-2001
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Robert Muir
>Priority: Minor
> Attachments: LUCENE-2001.patch, LUCENE-2001_branch.patch
>
>
> A user reported that wordnet parses the prolog file incorrectly.
> Also need to check the wordnet parser in the memory contrib for this problem.
> If this is a false alarm, I'm not worried, because the test will be the first 
> unit test the wordnet package ever had.
> {noformat}
> For example, looking up the synsets for the
> word "king", we get:
> java SynLookup wnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> scrofula
> struma
> tycoon
> Here, "scrofula" and "struma" are extraneous. This happens because the line
> parser code in Syns2Index.java interprets the two consecutive single quotes
> in the entry s(114144247,3,'king''s evil',n,1,1) in the wn_s.pl file as the
> termination of the string and truncates it to "king". This entry concerns
> the synset of the words "scrofula" and "struma", so they get inserted into the
> synset of "king". *There are 1382 such entries in wn_s.pl*, and more in other
> WordNet Prolog database files where two consecutive single quotes are used
> this way.
> We have resolved this by adding a statement in the line parsing portion of
> Syns2Index.java, as follows:
>// parse line
>line = line.substring(2);
>   * line = line.replaceAll("\'\'", "`"); // added statement*
>int comma = line.indexOf(',');
>String num = line.substring(0, comma);  ... ... etc.
> In short, we replace "''" with "`" (a backquote). After recreating the
> index, we get:
> java SynLookup zwnindex king
> baron
> magnate
> mogul
> power
> queen
> rex
> tycoon
> {noformat}


[jira] Updated: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1998:
--

Attachment: LUCENE-1998_enum_BW.patch

Better BW test

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum_BW.patch, LUCENE-1998_enum_BW.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.


[jira] Resolved: (LUCENE-1998) Use Java 5 enums

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1998.
---

Resolution: Fixed

Committed revision: 828156

Thanks DM Smith!

> Use Java 5 enums
> 
>
> Key: LUCENE-1998
> URL: https://issues.apache.org/jira/browse/LUCENE-1998
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0
>Reporter: DM Smith
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, LUCENE-1998_enum.patch, 
> LUCENE-1998_enum_BW.patch, LUCENE-1998_enum_BW.patch
>
>
> Replace the use of o.a.l.util.Parameter with Java 5 enums, deprecating 
> Parameter.
> Replace other custom enum patterns with Java 5 enums.


[jira] Created: (LUCENE-2003) Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on

2009-10-21 Thread Uwe Schindler (JIRA)

Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or 
simplier StopFilter with stopWordsPosIncr mode switched on
---

 Key: LUCENE-2003
 URL: https://issues.apache.org/jira/browse/LUCENE-2003
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9, 3.0
Reporter: Uwe Schindler
 Fix For: 2.9.1, 3.0


This is a followup on LUCENE-1987:

If you change the constant static final Version TEST_VERSION in HighlighterTest 
from Version.LUCENE_24 to LUCENE_29 or LUCENE_CURRENT, the test 
testSimpleQueryScorerPhraseHighlighting fails. Please note that currently 
(before LUCENE-2002 is fixed) you must also configure the QueryParser to respect 
position increments (posIncr).


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768393#action_12768393
 ] 

Uwe Schindler commented on LUCENE-2002:
---

Issue created!

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9.1
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.


[jira] Updated: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2003:
--

Summary: Highlighter has problems when you use StandardAnalyzer with 
LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on  (was: 
Highlighter ahs problems when you use StandardAnalyzer with LUCENE_29 or 
simplier StopFilter with stopWordsPosIncr mode switched on)

> Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or 
> simplier StopFilter with stopWordsPosIncr mode switched on
> ---
>
> Key: LUCENE-2003
> URL: https://issues.apache.org/jira/browse/LUCENE-2003
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
> Fix For: 2.9.1, 3.0
>
>
> This is a followup on LUCENE-1987:
> If you change the constant static final Version TEST_VERSION in HighlighterTest 
> from Version.LUCENE_24 to LUCENE_29 or LUCENE_CURRENT, the test 
> testSimpleQueryScorerPhraseHighlighting fails. Please note that currently 
> (before LUCENE-2002 is fixed) you must also configure the QueryParser to respect 
> position increments (posIncr).


[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2002:
---

Attachment: LUCENE-2002-29.patch

Attached patch, for 2.9.x.

I added a required Version param to QueryParser, MultiFieldQueryParser
and ComplexPhraseQueryParser (contrib); these enable position
increments when matchVersion >= LUCENE_29.

For the deprecated ctors it defaults to Version.LUCENE_24 for back
compat.

Unfortunately, JavaCC generates two public ctors for QueryParser (one taking
CharStream, another taking QueryParserTokenManager) that I don't know
how to override to take a Version param.
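The pattern described above, a Version-taking constructor that picks the default and a deprecated no-arg constructor pinned to LUCENE_24, can be sketched without Lucene on the classpath. Everything here is hypothetical (SketchParser is not QueryParser), and it assumes position increments are switched on from LUCENE_29 onward:

```java
class VersionedParserDemo {

  enum Version {
    LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_CURRENT;

    boolean onOrAfter(Version other) {
      return compareTo(other) >= 0;
    }
  }

  /** Hypothetical stand-in for a parser whose defaults depend on a Version. */
  static class SketchParser {
    private final boolean enablePositionIncrements;

    /** Preferred constructor: the caller states which release's behavior to match. */
    SketchParser(Version matchVersion) {
      // From 2.9 on, honor the position gaps a StopFilter leaves for removed stop words.
      this.enablePositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
    }

    /** Old-style constructor, kept for back compat with 2.4-era behavior. */
    @Deprecated
    SketchParser() {
      this(Version.LUCENE_24);
    }

    boolean getEnablePositionIncrements() {
      return enablePositionIncrements;
    }
  }

  public static void main(String[] args) {
    System.out.println(new SketchParser(Version.LUCENE_29).getEnablePositionIncrements()); // true
    System.out.println(new SketchParser().getEnablePositionIncrements());                  // false
  }
}
```

Pinning the deprecated constructor to the old version keeps existing callers' behavior unchanged, while anyone passing a current Version gets phrase matching that lines up with the stop-word gaps.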

> Add oal.util.Version ctor to QueryParser
> 
>
> Key: LUCENE-2002
> URL: https://issues.apache.org/jira/browse/LUCENE-2002
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9.1
>
> Attachments: LUCENE-2002-29.patch
>
>
> This is a followup of LUCENE-1987:
> If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses 
> QueryParser, phrase queries will not work, because the StopFilter enables 
> position increments for stop words, but QueryParser ignores them by default. 
> The user has to enable them explicitly.
> This issue would add a ctor taking the Version constant that automatically 
> enables this setting. The same applies to the contrib queryparser. Possibly 
> StopAnalyzer should also get this Version ctor.
> To be able to remove the default ctor for 3.0 (to remove a possible trap for 
> users of QueryParser), it must be deprecated and the new one also added to 
> 2.9.1.
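To make the failure mode in the description concrete, here is a toy model in plain Java (no Lucene classes; all names are hypothetical). The index side leaves a position gap for each removed stop word, as StopFilter does with position increments enabled, while the query side does not. An exact phrase match then looks for terms at adjacent positions that the index never produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model, not Lucene code, of index/query disagreement about stop-word gaps. */
class PhraseGapDemo {
  static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "of", "a"));

  /** A surviving term and the position it was assigned. */
  static final class PositionedTerm {
    final String term;
    final int position;

    PositionedTerm(String term, int position) {
      this.term = term;
      this.position = position;
    }
  }

  /**
   * Drops stop words. If keepGaps is true, each removed word still advances the
   * position counter, leaving a hole where it used to be.
   */
  static List<PositionedTerm> analyze(String text, boolean keepGaps) {
    List<PositionedTerm> out = new ArrayList<>();
    int position = -1;
    for (String word : text.trim().toLowerCase().split("\\s+")) {
      if (STOP_WORDS.contains(word)) {
        if (keepGaps) position++; // hole for the removed stop word
        continue;
      }
      position++;
      out.add(new PositionedTerm(word, position));
    }
    return out;
  }

  /** Exact phrase match: each query term must sit at (start + its offset from the first query term). */
  static boolean phraseMatches(List<PositionedTerm> doc, List<PositionedTerm> query) {
    if (query.isEmpty()) return false;
    int firstQueryPos = query.get(0).position;
    for (PositionedTerm anchor : doc) {
      if (!anchor.term.equals(query.get(0).term)) continue;
      boolean all = true;
      for (PositionedTerm q : query) {
        int wanted = anchor.position + (q.position - firstQueryPos);
        boolean found = false;
        for (PositionedTerm d : doc) {
          if (d.position == wanted && d.term.equals(q.term)) {
            found = true;
            break;
          }
        }
        if (!found) {
          all = false;
          break;
        }
      }
      if (all) return true;
    }
    return false;
  }
}
```

With "president of the united states", the indexed terms land at positions 0, 3 and 4, but a query analyzed without gaps produces 0, 1 and 2, so the phrase fails. Analyzing the query with the same gap handling makes it match again, which is what enabling position increments on the query side restores.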

[jira] Assigned: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on

2009-10-21 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2003:
--

Assignee: Michael McCandless

> Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or 
> simplier StopFilter with stopWordsPosIncr mode switched on
> ---
>
> Key: LUCENE-2003
> URL: https://issues.apache.org/jira/browse/LUCENE-2003
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.9, 3.0
>Reporter: Uwe Schindler
>Assignee: Michael McCandless
> Fix For: 2.9.1, 3.0
>
>
> This is a follow-up to LUCENE-1987:
> If you change the constant static final Version TEST_VERSION =
> Version.LUCENE_24 in HighlighterTest to LUCENE_29 or LUCENE_CURRENT, the test
> testSimpleQueryScorerPhraseHighlighting fails. Please note that currently
> (before LUCENE-2002 is fixed) you must also set the QueryParser to respect
> position increments.

[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768459#action_12768459
 ] 

Robert Muir commented on LUCENE-2002:
-

Mike, saw a couple of these and laughed a little :)

@param matchVersion Lucene version to *patch*; this is passed through to 
QueryParser.


[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser

2009-10-21 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768467#action_12768467
 ] 

Michael McCandless commented on LUCENE-2002:


Eek!  My fingers are doing the thinking, apparently :)  Been typing that word a 
bit too much!!  I'll fix.  Thanks.

Build failed in Hudson: Lucene-trunk #986

2009-10-21 Thread Apache Hudson Server

See 

Changes:

[uschindler] remove unneeded import

[uschindler] LUCENE-1998: Parameter -> Java 5 enum transition

[rmuir] LUCENE-2001: Fix parsing bug in wordnet contrib

[uschindler] Add varargs to MultiSearcher

[uschindler] Fix test failure because of wrong cast. Hard stuff :( Could be 
implemented better, the hq is used for 2 different types

[uschindler] LUCENE-1257: Remove the rest of unchecked warnings and some 
unneeded casts. I added a TODO, where I do not understand the code and not for 
sure know, whats inside the collections. This could be fixed some time later. 
But the core code now compiles without any unchecked warning.

[uschindler] LUCENE-1987: Remove rest of analysis deprecations 
(StandardAnalyzer, StopAnalyzer)

--
[...truncated 16035 lines...]
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 2.643 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 22, Failures: 0, Errors: 0, Time elapsed: 9.88 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.66 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestStressSort
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.973 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 6.51 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.968 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.761 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 3.004 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.951 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 8.635 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.565 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.637 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.751 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 146.423 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.309 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 3.117 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.529 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestValueSource
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.542 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.331 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 7.347 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestBasics
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 38.507 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 7.041 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 1.397 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 4.653 sec
[junit] 
[junit] - Standard Output ---
[junit] 
[junit] Spans Dump --
[junit] payloads for span:2
[junit] doc:0 s:3 e:6 three:Noise:5
[junit] doc:0 s:3 e:6 one:Entity:3
[junit] 
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] 
[jun

[jira] Updated: (LUCENE-1359) FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1359:


Lucene Fields: [New, Patch Available]  (was: [New])
Fix Version/s: 3.0
 Assignee: Robert Muir

> FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer
> 
>
> Key: LUCENE-1359
> URL: https://issues.apache.org/jira/browse/LUCENE-1359
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.2
>Reporter: Andrew Lynch
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1359.patch
>
>
> In {{Analyzer}} :
> {code}
> /** Creates a TokenStream which tokenizes all the text in the provided
> Reader.  Default implementation forwards to tokenStream(Reader) for 
> compatibility with older version.  Override to allow Analyzer to choose 
> strategy based on document and/or field.  Must be able to handle null
> field name for backward compatibility. */
>   public abstract TokenStream tokenStream(String fieldName, Reader reader);
> {code}
> and in {{FrenchAnalyzer}}
> {code}
>   public final TokenStream tokenStream(String fieldName, Reader reader) {
>     if (fieldName == null) throw new IllegalArgumentException("fieldName must not be null");
>     if (reader == null) throw new IllegalArgumentException("reader must not be null");
> {code}
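A toy sketch of the contract quoted above, using hypothetical classes rather than Lucene's Analyzer. The field name is only a hint for choosing a strategy and may be null, so an implementation can still validate the reader but must not reject a null fieldName the way the quoted FrenchAnalyzer code does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical analyzer-like base class; fieldName is a hint and may be null. */
abstract class ToyAnalyzer {
  abstract List<String> tokens(String fieldName, Reader reader);
}

/** Follows the contract: accepts a null fieldName, still rejects a null reader. */
class NullTolerantAnalyzer extends ToyAnalyzer {
  @Override
  List<String> tokens(String fieldName, Reader reader) {
    // A null reader is still a caller error; a null fieldName is allowed by the contract.
    if (reader == null) throw new IllegalArgumentException("reader must not be null");
    StringBuilder text = new StringBuilder();
    char[] buf = new char[256];
    try {
      int n;
      while ((n = reader.read(buf)) != -1) text.append(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Whitespace tokenization plus lowercasing; fieldName is never dereferenced here.
    List<String> out = new ArrayList<>();
    for (String t : text.toString().trim().split("\\s+")) {
      if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
    }
    return out;
  }
}
```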

[jira] Resolved: (LUCENE-1359) FrenchAnalyzer's tokenStream method does not honour the contract of Analyzer

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-1359.
-

Resolution: Fixed

Committed revision 828298.

This inconsistency annoyed me too.

Thanks, Andrew!

[jira] Updated: (LUCENE-1904) move wordnet based synonym code out of contrib/memory and into contrib/wordnet (or somewhere else)

2009-10-21 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1904:


Fix Version/s: 3.0
 Assignee: Robert Muir

I will bring this patch up to date. It's silly for this code to live in the
memory contrib instead of wordnet, where it belongs.

> move wordnet based synonym code out of contrib/memory and into 
> contrib/wordnet (or somewhere else)
> --
>
> Key: LUCENE-1904
> URL: https://issues.apache.org/jira/browse/LUCENE-1904
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hoss Man
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1904.patch, LUCENE-1904.patch
>
>
> see LUCENE-387 ... some synonym related code has been living in 
> contrib/memory for a very long time ... it should be refactored out.

RE: Build failed in Hudson: Lucene-trunk #986

2009-10-21 Thread Uwe Schindler

I'll fix it; my version comparison didn't seem to work on Hudson.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> From: Apache Hudson Server [mailto:hud...@hudson.zones.apache.org]
> Sent: Thursday, October 22, 2009 5:15 AM
> To: java-dev@lucene.apache.org
> Subject: Build failed in Hudson: Lucene-trunk #986

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread John Wang

Hi Mike:
 I have been playing with the patch, and I think I have some information
that you might like.

 Let me spend some time gathering more numbers, and I will update the JIRA issue.

Thanks

btw:

 About the conversion on multi-valued fields, I am not sure I get it
(sorry for being ignorant):

 Say bottom has ords 23, 45, 76, each corresponding to a string. When
moving to the next segment, you need to give bottom ords that are
comparable to other docs in this new segment, so you would need to find the
new ords for the values of 23, 45 and 76, wouldn't you? To find them, assuming
the values are s1, s2, s3, you would do a binary search on the new value array
to find the index of each of s1, s2, s3. That is 3 binary searches per convert;
I am not sure how you can short-circuit it. Are you suggesting we compare
Comparable values in compareBottom until some doc beats it? That would hurt
performance a lot though, no?
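A sketch of the conversion John describes, under assumptions: each segment exposes its terms as a sorted String array indexed by ord, and OrdConversion/convert are hypothetical names. Each of the bottom entry's values costs one java.util.Arrays.binarySearch; a value missing from the new segment is mapped halfway between its neighbours, so it still orders correctly against the segment's real (integer) ords.

```java
import java.util.Arrays;

/** Hypothetical helper: re-express bottom's values as ords in the next segment. */
class OrdConversion {
  /**
   * Returns, for each value, its ord in sortedSegmentValues. A value absent from
   * the segment gets (insertionPoint - 0.5), strictly between the neighbouring ords.
   */
  static double[] convert(String[] bottomValues, String[] sortedSegmentValues) {
    double[] ords = new double[bottomValues.length];
    for (int i = 0; i < bottomValues.length; i++) {
      // One binary search per value, e.g. 3 searches for a bottom with 3 values.
      int idx = Arrays.binarySearch(sortedSegmentValues, bottomValues[i]);
      // binarySearch returns -(insertionPoint) - 1 when the key is missing.
      ords[i] = idx >= 0 ? idx : (-idx - 1) - 0.5;
    }
    return ords;
  }
}
```

For example, converting {"b", "d", "z"} against a segment holding {"a", "b", "c", "e"} yields 1.0 for "b" (an exact hit), 2.5 for "d" (between "c" and "e") and 3.5 for "z" (after the last term).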

-John

On Wed, Oct 21, 2009 at 3:11 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Tue, Oct 20, 2009 at 11:55 AM, John Wang  wrote:
>
> > the simpler api places less restriction on the type of custom
> > sorting that can be done.
>
> Just to verify: this is not a back-compat break, right?
>
> Because, in 2.4, such an interesting custom sort must've been
> operating at the top-level index reader level, which is easy to carry
> over to 2.9 (you just rebase the docIDs).
>
> But, of course in moving to 2.9, you would like to also switch your
> custom sort to be per-segment (for faster reopen/near real-time perf),
> but the new sort API makes this more difficult because it requires
> that you are able to compare hits across different segments during the
> search, not just at the end.
>
> But then I don't understand the difficulty of doing that: if we had a
> Collector with the MultiPQ approach, at the end during merge, you'd
> also have to compare results across segments, ie, upgrade your ords to
> their real values.  The MultiPQ approach does this by calling
> sortValue (returns Comparable) in the end.
>
> Putting performance aside for now... when comparing bottom, you don't
> actually have to "truly invert" Comparable -> ord on segment
> transition.  You could, instead, get the Comparable for each and
> compare, but then note the smallest ord for the current segment that
> has failed to compete, and short-circuit the compareBottom test by
> checking against that ord. That should enable carrying over the custom
> sort to the single PQ API without needing to invert ord->value.
>
> We'd obviously have to test performance...
>
> Or, we could commit the MultiPQ approach as another sorting collector?
> I know it's not great having two wildly different sort APIs, but both
> APIs seem to have their strengths in different cases.
>
> Mike
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
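One reading of the short-circuit Mike suggests, sketched for an ascending sort on a single-valued field once the queue is full. ShortCircuitBottom and its methods are hypothetical, not Lucene's FieldComparator API. Within a segment, ords are ordered like their values, and the bottom only ever moves down, so once the doc at ord o fails to beat the bottom, every doc at ord >= o in that segment fails too and needs no value lookup.

```java
/** Hypothetical comparator fragment for an ascending sort on a single-valued field. */
class ShortCircuitBottom<T extends Comparable<T>> {
  private T bottomValue;                       // worst value currently kept in the full queue
  private T[] segmentValues;                   // ord -> value for the current segment
  private int minFailedOrd = Integer.MAX_VALUE; // smallest ord known to lose against bottom
  int valueComparisons;                        // counts the value comparisons actually done

  /** New segment: cached ords from the previous segment are meaningless, so reset. */
  void setNextSegment(T[] sortedValues) {
    this.segmentValues = sortedValues;
    this.minFailedOrd = Integer.MAX_VALUE;
  }

  /** The bottom only ever gets smaller, so failures cached so far stay valid. */
  void setBottom(T value) {
    this.bottomValue = value;
  }

  /** Returns true if the doc with this ord would beat the current bottom. */
  boolean competes(int ord) {
    if (ord >= minFailedOrd) return false; // short-circuit: no value lookup needed
    valueComparisons++;
    if (segmentValues[ord].compareTo(bottomValue) < 0) return true;
    minFailedOrd = ord; // ord < previous minFailedOrd here, so the bound only tightens
    return false;
  }
}
```

The design choice is to pay for a real value comparison only on ords below the cached bound; whether that beats the MultiPQ approach in practice is exactly the performance question raised in the thread.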

[jira] Created: (LUCENE-2004) Constants.LUCENE_MAIN_VERSION is inlined in code compiled against Lucene JAR, so version detection is incorrect

2009-10-21 Thread Uwe Schindler (JIRA)

Constants.LUCENE_MAIN_VERSION is inlined in code compiled against Lucene JAR, 
so version detection is incorrect
---

 Key: LUCENE-2004
 URL: https://issues.apache.org/jira/browse/LUCENE-2004
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0


When you compile your own code against the Lucene 2.9 JARs, use the
LUCENE_MAIN_VERSION constant, and then run the code against the 3.0 JAR,
the constant still contains 2.9, because javac inlines primitive and String
values into the calling class files if the fields are public static final and
are initialized with a compile-time constant (not a method call).

The attached fix prevents this inlining by initializing the constants through
an ident(String) function that simply returns its argument.

Will apply to 2.9, trunk and the 2.9 BW branch. Now I can also re-enable one
test I removed because of this.
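A minimal illustration of the inlining problem and the ident(String) workaround described above; VersionConstants and its fields are hypothetical names. Per the Java Language Specification, a final field of primitive or String type initialized with a compile-time constant expression is a constant variable, and javac copies its value into every class that reads it. A method call in the initializer makes it an ordinary field that callers read at run time.

```java
/** Hypothetical constants holder contrasting an inlined and a non-inlined field. */
final class VersionConstants {
  /** Constant variable: its value is copied into callers at compile time. */
  public static final String INLINED_VERSION = "2.9";

  /** Not a constant variable (the initializer is a method call), so callers read it at run time. */
  public static final String MAIN_VERSION = ident("2.9");

  /** Returns its argument unchanged; exists only to defeat constant inlining. */
  private static String ident(final String s) {
    return s;
  }

  private VersionConstants() {}
}
```

Running javap -c on a caller shows the difference: a read of INLINED_VERSION compiles to an ldc instruction carrying the literal "2.9", while a read of MAIN_VERSION compiles to a getstatic, so only the latter picks up a changed value when the JAR is swapped.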

[jira] Updated: (LUCENE-2004) Constants.LUCENE_MAIN_VERSION is inlined in code compiled against Lucene JAR, so version detection is incorrect

2009-10-21 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2004:
--

Attachment: LUCENE-2004.patch

See also: http://www.javaworld.com/community/node/3400
