[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764431#action_12764431
 ] 

Uwe Schindler commented on LUCENE-1946:
---

+1 from me too. The patch needs some more work: I stopped partway through 
contrib because it was not clear whether to remove things there for 3.0. I will 
do the rest now. The only problem is InstantiatedIndexWriter, which still uses 
the old API to consume. I will try to modify it. But maybe this contrib will 
get removed (see other issue LUCENE-1948).

So I will find out what to do.

> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: 3.0 release date

2009-10-11 Thread Michael McCandless
+1

Mike

On Sun, Oct 11, 2009 at 1:23 AM, Michael Busch  wrote:
> Hi all,
>
> I was wondering what our 3.0 release target date is? I think the outstanding
> issues are removal of the deprecated APIs and more Java 1.5 updates
> (especially adding generics to public APIs). Should we try to get it out
> early November, in about 3-4 weeks?
>
>  Michael
>



[jira] Updated: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Basem Narmok (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Basem Narmok updated LUCENE-1966:
-

Attachment: LUCENE-1966.patch

Robert, you are correct. To solve the problem we have two options: 
1- to remove words like علي and وفي
2- to use an unnormalized stopwords list, before the normalization filter.

I think the second option is best, so this patch only modifies the list 
(unnormalized). Please try it.
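
To illustrate why the order matters, here is a minimal sketch. It assumes the concern is that normalization can make a stopword and a content word look identical. The `normalize` method is a reduced, hypothetical stand-in (two character foldings only), not Lucene's actual normalizer, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for the analyzer chain; these are not Lucene classes. */
class StopwordOrderSketch {

    /** Hypothetical, reduced normalizer: folds alef-with-hamza to bare alef and alef maksura to yeh. */
    static String normalize(String term) {
        return term.replace('\u0623', '\u0627')
                   .replace('\u0649', '\u064A');
    }

    /** Option 2: filter against the unnormalized list first, then normalize what is left. */
    static List<String> stopThenNormalize(List<String> tokens, Set<String> rawStopwords) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            if (!rawStopwords.contains(t)) {
                out.add(normalize(t));
            }
        }
        return out;
    }

    /** The alternative order: normalize first, then filter against a normalized list. */
    static List<String> normalizeThenStop(List<String> tokens, Set<String> rawStopwords) {
        Set<String> normalizedStopwords = new HashSet<String>();
        for (String s : rawStopwords) {
            normalizedStopwords.add(normalize(s));
        }
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            String n = normalize(t);
            if (!normalizedStopwords.contains(n)) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String ala = "\u0639\u0644\u0649"; // preposition "on", a stopword
        String ali = "\u0639\u0644\u064A"; // the name "Ali", a content word
        Set<String> stopwords = new HashSet<String>(Arrays.asList(ala));
        List<String> tokens = Arrays.asList(ala, ali);

        // Stop filter first: only the preposition is removed, the name survives.
        System.out.println(stopThenNormalize(tokens, stopwords).size());
        // Normalization first: both tokens collapse to the same form and both are removed.
        System.out.println(normalizeThenStop(tokens, stopwords).size());
    }
}
```

With the stop filter placed first, the list has to contain the spellings as users actually type them, which is what an unnormalized list provides.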

> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.




RE: 3.0 release date

2009-10-11 Thread Uwe Schindler
+1. Adding generics internally should be left for later work. In particular,
the big Eclipse/IDEA-related code refactoring (see other thread)
proposed by Earwin and Timo Nentwig should be left for later, to get 3.0
out as early as possible.

For users, generification of public APIs really should be done for 3.0.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael Busch [mailto:busch...@gmail.com]
> Sent: Sunday, October 11, 2009 7:23 AM
> To: java-dev@lucene.apache.org
> Subject: 3.0 release date
> 
> Hi all,
> 
> I was wondering what our 3.0 release target date is? I think the
> outstanding issues are removal of the deprecated APIs and more Java 1.5
> updates (especially adding generics to public APIs). Should we try to
> get it out early November, in about 3-4 weeks?
> 
>   Michael
> 



[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764449#action_12764449
 ] 

Robert Muir commented on LUCENE-1966:
-

Basem, thanks. I like the new list.

I have one very minor question: in the list we have أيضا / ايضا twice.

I wanted to check with you, is this by accident or did you have some other 
spellings in mind?

If it is by accident, let me know, I can just remove the duplicates before 
committing.




Re: Change HashSet to Set in WordlistLoader - BackCompat Issue

2009-10-11 Thread Robert Muir
Simon, so I don't forget, we also have a custom WordlistLoader in
org.apache.lucene.analysis.nl that we can delete for 3.0 (it is deprecated).

For your question though, maybe one idea is to return HashSet/HashMap but
with a comment saying the return value will change to Set/Map in 3.1?
If the user reads this, and treats it as an interface in their code: Map x =
WordlistLoader.foo(), would their code still work in 3.1... would they need
to recompile?
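
On the recompile question, a minimal sketch of the distinction, under the assumption that what matters is how the JVM links call sites: a compiled call references a method by its full descriptor, and the return type is part of that descriptor. So code written as `Set x = loader()` stays source-compatible, but a class file compiled against the HashSet-returning version would most likely need recompiling once the return type becomes Set. `Loader` and its no-argument `getWordSet()` are hypothetical stand-ins, not the real WordlistLoader signatures, and the lookup uses java.lang.invoke (Java 7 or later) only to mimic exact-descriptor resolution.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashSet;
import java.util.Set;

class ReturnTypeLinkageSketch {

    /** Hypothetical stand-in for a helper that returns the concrete collection type. */
    static class Loader {
        public static HashSet<String> getWordSet() {
            HashSet<String> words = new HashSet<String>();
            words.add("the");
            return words;
        }
    }

    /** Looks the method up by exact descriptor, similar to how a compiled call site refers to it. */
    static boolean linksWithReturnType(Class<?> returnType) {
        try {
            MethodHandles.lookup().findStatic(
                    Loader.class, "getWordSet", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        } catch (IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Source level: assigning to the interface compiles against either signature.
        Set<String> words = Loader.getWordSet();
        System.out.println(words.contains("the"));

        // Link level: only the exact declared return type resolves.
        System.out.println(linksWithReturnType(HashSet.class));
        System.out.println(linksWithReturnType(Set.class));
    }
}
```

The usual way around this is the one raised in the thread: deprecate the old methods and add new ones with the interface return types under different names.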

On Sat, Oct 10, 2009 at 3:26 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> Hey there,
>
> in the context of https://issues.apache.org/jira/browse/LUCENE-1967 I
> was looking at org.apache.lucene.analysis.WordlistLoader. I noticed
> that all static methods return HashSet / HashMap instead of their
> corresponding interfaces Map / Set. As I'm going to add another static
> helper to this class I was wondering if we should change the concrete
> impl. to interface return values.
> I guess this would break back-compat. for 3.0 so would it make sense
> to mark the current static methods as deprecated and add new ones or
> should we just keep it as it is, even though it is not best practice?
>
> simon
>
>
>


-- 
Robert Muir
rcm...@gmail.com


Re: Change HashSet to Set in WordlistLoader - BackCompat Issue

2009-10-11 Thread Simon Willnauer
On Sun, Oct 11, 2009 at 3:48 PM, Robert Muir  wrote:
> Simon, so I don't forget, we also have a custom WordListLoader in
> org.apache.lucene.analysis.nl that we can delete for 3.0 (it is deprecated)
Saw that already -- thanks for pointing it out again
>
> For your question though, maybe one idea is to return HashSet/HashMap but
> with a comment saying the return value will change to Set/Map in 3.1?
> If the user reads this, and treats it as an interface in their code: Map x =
> WordListLoader.foo(), would their code still work in 3.1... would they need
> to recompile?
If they use the interface in the assignment, they could just drop
it in.
I would also leave it as it is and change it later. It's a
very minor case and for that reason simply a style issue.
I will add a comment that it can change in future releases.
Anything else would introduce too many problems for this minor change.

simon
>
> On Sat, Oct 10, 2009 at 3:26 PM, Simon Willnauer
>  wrote:
>>
>> Hey there,
>>
>> in the context of https://issues.apache.org/jira/browse/LUCENE-1967 I
>> was looking at org.apache.lucene.analysis.WordlistLoader. I noticed
>> that all static methods return HashSet / HashMap instead of their
>> corresponding interfaces Map / Set. As I'm going to add another static
>> helper to this class I was wondering if we should change the concrete
>> impl. to interface return values.
>> I guess this would break back-compat. for 3.0 so would it make sense
>> to mark the current static methods as deprecated and add new ones or
>> should we just keep it as it is while not being best practice though.
>>
>> simon
>>
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>




[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Basem Narmok (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764456#action_12764456
 ] 

Basem Narmok commented on LUCENE-1966:
--

Hi Robert,

Regarding ايضا / أيضا ...

No, not by accident, I included both formats (normalized, unnormalized). Arabic 
users tend to use both on the internet (different spellings); another example 
is words like أي / اي




[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764462#action_12764462
 ] 

Robert Muir commented on LUCENE-1966:
-

Basem, I meant: there are two entries for أيضا , and two entries for ايضا 
(total of four)





RE: svn commit: r824015 - in /lucene/java/trunk/src/java/org/apache/lucene/util: DocIdBitSet.java OpenBitSetIterator.java SortedVIntList.java

2009-10-11 Thread Uwe Schindler
Hi Michael,

this patch leads to failures in contrib/queries. We should first also remove
the deprecated methods in DocIdSetIterator and make the new ones abstract.
Because they are not abstract yet, we get no compile errors, but instead
UnsupportedOperationExceptions at some places. When the deprecated methods are
removed everywhere, we will know exactly what needs to be implemented differently.
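
A minimal sketch of the effect described above, assuming a transitional base class in which neither the old nor the new method is abstract. All class names here are hypothetical and deliberately simplified; this is not the actual DocIdSetIterator source.

```java
class DeprecationBridgeSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical transitional base class: old and new methods coexist, none is abstract. */
    static abstract class TransitionalIterator {
        /** Old-style API, kept only for backwards compatibility. */
        public boolean next() {
            throw new UnsupportedOperationException("next() not implemented");
        }

        /** New-style API; not abstract, so subclasses are not forced to implement it. */
        public int nextDoc() {
            throw new UnsupportedOperationException("nextDoc() not implemented");
        }
    }

    /** A subclass written only against the old API. It still compiles without complaint. */
    static class OldStyleIterator extends TransitionalIterator {
        private int remaining = 2;

        @Override
        public boolean next() {
            return remaining-- > 0;
        }
    }

    /** Returns true if consuming through the new API fails at runtime. */
    static boolean failsAtRuntime(TransitionalIterator it) {
        try {
            while (it.nextDoc() != NO_MORE_DOCS) {
                // a real consumer would collect the document here
            }
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The gap only shows up when the new method is actually called.
        System.out.println(failsAtRuntime(new OldStyleIterator()));
        // If nextDoc() were declared abstract, OldStyleIterator would be rejected
        // by the compiler instead, which is the behaviour asked for above.
    }
}
```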

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: busc...@apache.org [mailto:busc...@apache.org]
> Sent: Sunday, October 11, 2009 6:00 AM
> To: java-comm...@lucene.apache.org
> Subject: svn commit: r824015 - in
> /lucene/java/trunk/src/java/org/apache/lucene/util: DocIdBitSet.java
> OpenBitSetIterator.java SortedVIntList.java
> 
> Author: buschmi
> Date: Sun Oct 11 04:00:02 2009
> New Revision: 824015
> 
> URL: http://svn.apache.org/viewvc?rev=824015&view=rev
> Log:
> Remove deprecated methods in DocIdBitSet.
> 
> Modified:
> lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java
> 
> lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
> lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
> 
> Modified:
> lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java
> URL:
> http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
> util/DocIdBitSet.java?rev=824015&r1=824014&r2=824015&view=diff
> ==
> 
> --- lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java
> (original)
> +++ lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java Sun
> Oct 11 04:00:02 2009
> @@ -56,22 +56,10 @@
>this.docId = -1;
>  }
> 
> -/** @deprecated use {@link #docID()} instead. */
> -public int doc() {
> -  assert docId != -1;
> -  return docId;
> -}
> -
>  public int docID() {
>return docId;
>  }
> 
> -/** @deprecated use {@link #nextDoc()} instead. */
> -public boolean next() {
> -  // (docId + 1) on next line requires -1 initial value for docNr:
> -  return nextDoc() != NO_MORE_DOCS;
> -}
> -
>  public int nextDoc() {
>// (docId + 1) on next line requires -1 initial value for docNr:
>int d = bitSet.nextSetBit(docId + 1);
> @@ -80,11 +68,6 @@
>return docId;
>  }
> 
> -/** @deprecated use {@link #advance(int)} instead. */
> -public boolean skipTo(int skipDocNr) {
> -  return advance(skipDocNr) != NO_MORE_DOCS;
> -}
> -
>  public int advance(int target) {
>int d = bitSet.nextSetBit(target);
>// -1 returned by BitSet.nextSetBit() when exhausted
> 
> Modified:
> lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
> URL:
> http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
> util/OpenBitSetIterator.java?rev=824015&r1=824014&r2=824015&view=diff
> ==
> 
> ---
> lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
> (original)
> +++
> lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
> Sun Oct 11 04:00:02 2009
> @@ -128,11 +128,6 @@
>}
>**/
> 
> -  /** @deprecated use {@link #nextDoc()} instead. */
> -  public boolean next() {
> -return nextDoc() != NO_MORE_DOCS;
> -  }
> -
>public int nextDoc() {
>  if (indexArray == 0) {
>if (word != 0) {
> @@ -160,11 +155,6 @@
>  return curDocId = (i<<6) + bitIndex;
>}
> 
> -  /** @deprecated use {@link #advance(int)} instead. */
> -  public boolean skipTo(int target) {
> -return advance(target) != NO_MORE_DOCS;
> -  }
> -
>public int advance(int target) {
>  indexArray = 0;
>  i = target >> 6;
> @@ -195,11 +185,6 @@
>  return curDocId = (i<<6) + bitIndex;
>}
> 
> -  /** @deprecated use {@link #docID()} instead. */
> -  public int doc() {
> -return curDocId;
> -  }
> -
>public int docID() {
>  return curDocId;
>}
> 
> Modified:
> lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
> URL:
> http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
> util/SortedVIntList.java?rev=824015&r1=824014&r2=824015&view=diff
> ==
> 
> --- lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
> (original)
> +++ lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
> Sun Oct 11 04:00:02 2009
> @@ -204,18 +204,10 @@
>  }
>}
> 
> -  /** @deprecated use {@link #docID()} instead. */
> -  public int doc() {return lastInt;}
> -
>public int docID() {
>  return doc;
>}
> 
> -  /** @deprecated use {@link #nextDoc()} instead. */
> -  public boolean next() {
> -return nextDoc() != NO_MORE_DOC

[jira] Issue Comment Edited: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764462#action_12764462
 ] 

Robert Muir edited comment on LUCENE-1966 at 10/11/09 8:10 AM:
---

Basem, I meant: there are two entries for أيضا , and two entries for ايضا 
(total of four)

edit: here are the relevant line numbers from the new stopwords.txt:

Lines 72 and 73:
{noformat}
ايضا
أيضا
{noformat}

Lines 123 and 124:
{noformat}
ايضا
أيضا
{noformat}
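
A check like the one above can be automated. The following is a minimal sketch that reports every entry occurring more than once, together with its 1-based line numbers. The class name is made up for the example, and treating lines starting with `#` as comments is an assumption about the file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StopwordDuplicatesSketch {

    /** Maps each entry that occurs more than once to the 1-based line numbers where it appears. */
    static Map<String, List<Integer>> findDuplicates(List<String> lines) {
        Map<String, List<Integer>> seen = new LinkedHashMap<String, List<Integer>>();
        for (int i = 0; i < lines.size(); i++) {
            String word = lines.get(i).trim();
            if (word.length() == 0 || word.startsWith("#")) {
                continue; // skip blank lines and (assumed) comment lines
            }
            List<Integer> at = seen.get(word);
            if (at == null) {
                at = new ArrayList<Integer>();
                seen.put(word, at);
            }
            at.add(i + 1);
        }
        Map<String, List<Integer>> duplicates = new LinkedHashMap<String, List<Integer>>();
        for (Map.Entry<String, List<Integer>> e : seen.entrySet()) {
            if (e.getValue().size() > 1) {
                duplicates.put(e.getKey(), e.getValue());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String bare = "\u0627\u064A\u0636\u0627";   // spelling with bare alef
        String hamza = "\u0623\u064A\u0636\u0627";  // spelling with alef-hamza
        List<String> lines = Arrays.asList(bare, hamza, "# comment", bare, hamza);
        // Each spelling is reported once, with every line it appears on.
        System.out.println(findDuplicates(lines));
    }
}
```

Because the comparison is on the raw strings, the two spellings are treated as distinct entries, so intentional normalized/unnormalized pairs are not flagged.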

  was (Author: rcmuir):
Basem, I meant: there are two entries for أيضا , and two entries for ايضا 
(total of four)

  



[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764465#action_12764465
 ] 

Robert Muir commented on LUCENE-1966:
-

Basem, I can simply remove 123 & 124 if this is the case, but I did not want to 
do this without checking first.

The reason is, I wonder if perhaps you intended for these two to be أيضاً and 
ايضاً (with fathatan)




RE: svn commit: r824015 - in /lucene/java/trunk/src/java/org/apache/lucene/util: DocIdBitSet.java OpenBitSetIterator.java SortedVIntList.java

2009-10-11 Thread Uwe Schindler
Repaired test in contrib and committed. The source of the problem is still
on the TODO list.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


[jira] Updated: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1946:
--

Attachment: LUCENE-1946.patch

Here is the patch with all contribs fixed:
- PrecedenceQueryParser was missing the new TokenStream API; I fixed it somehow 
with States and restoreState. I also added a javacc-target, which was missing.
- InstantiatedIndexWriter was also changed to use the new TokenStream API. The 
fix is very hackish, but works for the beginning. The class uses lots of 
Lists/Sets with cloned Token instances inside, so I simply used an 
AttributeImpl iterator and used copyTo(token). This works in most cases (other 
cases are ignored by an empty Exception catch block). But this should really be 
fixed or the whole class removed (as suggested).

There is one question: I removed IsoLatin1Filter. This may be a backwards break, 
so that old indexes using this filter in the analyzer need to be reindexed. 
For most cases the AccentFilter would also work, but some hits may be missing 
when you query such an index. What should we do? Leave the deprecated analyzer 
in and remove it with 4.0, when old indexes can no longer be read?

> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!




[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764467#action_12764467
 ] 

Robert Muir commented on LUCENE-1946:
-

{quote}
There is one question: I removed IsoLatin1Filter. This may be a backwards break, 
so that old indexes using this filter in the analyzer need to be reindexed. But 
for most cases the AccentFilter would also work, but some hits may be missing 
when you query such an index. What should we do? Leave the deprecated analyzer 
in and remove it with 4.0 when all old indexes cannot be read anymore?
{quote}

I do not think this is a backwards break, because IsoLatin1Filter is deprecated?

{noformat}
 * @deprecated in favor of {@link ASCIIFoldingFilter} which covers a superset
 * of Latin 1. This class will be removed in Lucene 3.0.
{noformat}





Re: [jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Mark Miller

Waiting till 4 might not be a bad idea.

- Mark

http://www.lucidimagination.com (mobile)

On Oct 11, 2009, at 11:44 AM, "Robert Muir (JIRA)"   
wrote:




   [ https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764467#action_12764467 
 ]


Robert Muir commented on LUCENE-1946:
-

{quote}
There is one question: I removed IsoLatin1Filter. This may be a
backwards break, so that old indexes using this filter in the
analyzer need to be reindexed. But for most cases the AccentFilter
would also work, but some hits may be missing when you query such an
index. What should we do? Leave the deprecated analyzer in and
remove it with 4.0 when all old indexes cannot be read anymore?

{quote}

I do not think this is a backwards break, because IsoLatin1Filter is  
deprecated?


{noformat}
* @deprecated in favor of {@link ASCIIFoldingFilter} which covers a superset
* of Latin 1. This class will be removed in Lucene 3.0.
{noformat}






[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764472#action_12764472
 ] 

Uwe Schindler commented on LUCENE-1946:
---

This is correct. But e.g. NumberTools and DateTools will also stay deprecated 
in 3.0, because you need them to use indexes from previous versions.

So there is a difference between deprecated APIs and deprecated functionality 
that may still be needed as long as old indexes are around. With 4.0 we will 
change the index format and then the problem is gone.

But I agree with you: the differences between both classes are minimal and 
only Western European languages would have used the ISO filter. The superset 
only makes a difference when extended chars (>255) are involved.
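
To make the "missing hits" case concrete, here is a minimal sketch. It assumes an index whose terms were written with the narrower filter and queries that are analyzed with the broader one. The two mapping tables are hypothetical one-entry and two-entry stand-ins, not the real tables of either filter.

```java
import java.util.HashMap;
import java.util.Map;

class FoldingMismatchSketch {

    /** Hypothetical tiny table standing in for a Latin-1-only accent filter. */
    static final Map<Character, Character> LATIN1_ONLY = new HashMap<Character, Character>();
    /** Hypothetical superset table standing in for a broader folding filter. */
    static final Map<Character, Character> SUPERSET = new HashMap<Character, Character>();

    static {
        LATIN1_ONLY.put('\u00E9', 'e'); // e with acute, inside Latin-1 (<= 255)
        SUPERSET.putAll(LATIN1_ONLY);
        SUPERSET.put('\u0142', 'l');    // l with stroke, outside Latin-1 (> 255)
    }

    /** Replaces every character that has an entry in the table, leaves the rest untouched. */
    static String fold(String term, Map<Character, Character> table) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            Character mapped = table.get(c);
            sb.append(mapped != null ? mapped.charValue() : c);
        }
        return sb.toString();
    }

    /** True if a term indexed with the old table is still found when the query uses the new one. */
    static boolean stillMatches(String term) {
        return fold(term, LATIN1_ONLY).equals(fold(term, SUPERSET));
    }

    public static void main(String[] args) {
        // Only Latin-1 accents: both tables agree, the old index keeps working.
        System.out.println(stillMatches("caf\u00E9"));
        // A character above 255: indexed form and query form differ, so the hit is lost.
        System.out.println(stillMatches("\u0142ad"));
    }
}
```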




[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764473#action_12764473
 ] 

Robert Muir commented on LUCENE-1946:
-

Uwe, this class already uses the new TokenStream API, so it is not hurting 
anything, right?
Maybe follow what Mark said and keep it, and change the javadoc to say 'this 
class will be removed in Lucene 4.0'?


> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764473#action_12764473
 ] 

Robert Muir edited comment on LUCENE-1946 at 10/11/09 8:56 AM:
---

Uwe, this class already uses the new TokenStream API, so it is not hurting 
anything, right?
Maybe follow what Mark said and keep it, and change the javadoc to say 'this 
class will be removed in Lucene 4.0'?

Edit: another option would be to move it to contrib?


  
> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764474#action_12764474
 ] 

Uwe Schindler commented on LUCENE-1946:
---

Cool idea. I will move it to contrib/analyzers/common. But the class name will 
stay the same, so it will be the only one in the top-level dir (without a 
language suffix).

Can we do the same with NumberTools/DateField/DateTools? They would fit well in 
misc or somewhere else. But then we would have to remove support for them from 
core's query parser. Hmm, bad idea for a start. I will leave it there :-)

> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764476#action_12764476
 ] 

Robert Muir commented on LUCENE-1946:
-

bq. Cool idea. I move it to contrib/analyzers/common. But the class name will 
stay, the same so it will be in the top-level dir (without language suffix) as 
the only one. 

or maybe put it in misc also? I don't understand the Number/Date issues like 
you do, so maybe contrib is a bad idea... just throwing it out there.


> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1970) Remove deprecated DocIdSetIterator methods

2009-10-11 Thread Michael Busch (JIRA)
Remove deprecated DocIdSetIterator methods
--

 Key: LUCENE-1970
 URL: https://issues.apache.org/jira/browse/LUCENE-1970
 Project: Lucene - Java
  Issue Type: Task
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 3.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: svn commit: r824015 - in /lucene/java/trunk/src/java/org/apache/lucene/util: DocIdBitSet.java OpenBitSetIterator.java SortedVIntList.java

2009-10-11 Thread Michael Busch
Thanks for the fix, Uwe! Removing the DocIdSetIterator methods is still 
on my Todo list, I just opened LUCENE-1970.


 Michael

On 10/11/09 8:06 AM, Uwe Schindler wrote:

Hi Michael,

this patch leads to failures in contrib/queries. We should first also remove
the deprecated methods in DocIdSetIterator and make the new ones abstract. As
long as they are not abstract, we get no compile errors but
UnsupportedOperationExceptions in some places instead. Once the deprecated
methods are removed everywhere, we know exactly what needs to be implemented
differently.
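
To illustrate the point about abstract methods, here is a minimal Java sketch.
It is not the Lucene source: the class names (AbstractIteratorSketch,
LenientIterator, StrictIterator, ForgetfulIterator, ArrayIterator) are
hypothetical, and only the method names docID(), nextDoc(), advance(int) and
the NO_MORE_DOCS sentinel are taken from the diff below. A base class whose
new methods have throwing defaults lets an incomplete subclass compile and
fail only at runtime, while abstract methods would turn the same omission
into a compile error.

```java
/** Sketch only: hypothetical classes, not Lucene's DocIdSetIterator. */
final class AbstractIteratorSketch {

  /** Sentinel for "no more documents" (value chosen for this sketch). */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Transitional style: the new method has a throwing default, so a
   *  subclass that forgets to override it still compiles. */
  static abstract class LenientIterator {
    public int nextDoc() {
      throw new UnsupportedOperationException("nextDoc() not implemented");
    }
  }

  /** Strict style: a subclass missing any of these does not compile. */
  static abstract class StrictIterator {
    public abstract int docID();
    public abstract int nextDoc();
    public abstract int advance(int target);
  }

  /** Compiles, although nextDoc() was never overridden. */
  static final class ForgetfulIterator extends LenientIterator {}

  /** Complete implementation over a sorted array of doc ids. */
  static final class ArrayIterator extends StrictIterator {
    private final int[] docs; // must be sorted ascending
    private int pos = -1;
    private int doc = -1;

    ArrayIterator(int... docs) { this.docs = docs; }

    @Override public int docID() { return doc; }

    @Override public int nextDoc() {
      if (pos < docs.length) pos++;
      doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
      return doc;
    }

    @Override public int advance(int target) {
      int d;
      do { d = nextDoc(); } while (d < target); // NO_MORE_DOCS ends the loop
      return d;
    }
  }

  public static void main(String[] args) {
    StrictIterator it = new ArrayIterator(1, 3, 7);
    for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
      System.out.println("doc " + d);
    }
    try {
      new ForgetfulIterator().nextDoc(); // compiles, fails only when called
    } catch (UnsupportedOperationException e) {
      System.out.println("runtime failure: " + e.getMessage());
    }
  }
}
```

With the strict variant, deleting any of the three overrides in ArrayIterator
is rejected by the compiler, which is the behaviour asked for above.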

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

   

-Original Message-
From: busc...@apache.org [mailto:busc...@apache.org]
Sent: Sunday, October 11, 2009 6:00 AM
To: java-comm...@lucene.apache.org
Subject: svn commit: r824015 - in
/lucene/java/trunk/src/java/org/apache/lucene/util: DocIdBitSet.java
OpenBitSetIterator.java SortedVIntList.java

Author: buschmi
Date: Sun Oct 11 04:00:02 2009
New Revision: 824015

URL: http://svn.apache.org/viewvc?rev=824015&view=rev
Log:
Remove deprecated methods in DocIdBitSet.

Modified:
 lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java

lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
 lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java

Modified:
lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java
URL:
http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
util/DocIdBitSet.java?rev=824015&r1=824014&r2=824015&view=diff
==

--- lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java
(original)
+++ lucene/java/trunk/src/java/org/apache/lucene/util/DocIdBitSet.java Sun
Oct 11 04:00:02 2009
@@ -56,22 +56,10 @@
this.docId = -1;
  }

-/** @deprecated use {@link #docID()} instead. */
-public int doc() {
-  assert docId != -1;
-  return docId;
-}
-
  public int docID() {
return docId;
  }

-/** @deprecated use {@link #nextDoc()} instead. */
-public boolean next() {
-  // (docId + 1) on next line requires -1 initial value for docNr:
-  return nextDoc() != NO_MORE_DOCS;
-}
-
  public int nextDoc() {
// (docId + 1) on next line requires -1 initial value for docNr:
int d = bitSet.nextSetBit(docId + 1);
@@ -80,11 +68,6 @@
return docId;
  }

-/** @deprecated use {@link #advance(int)} instead. */
-public boolean skipTo(int skipDocNr) {
-  return advance(skipDocNr) != NO_MORE_DOCS;
-}
-
  public int advance(int target) {
int d = bitSet.nextSetBit(target);
// -1 returned by BitSet.nextSetBit() when exhausted

Modified:
lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
URL:
http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
util/OpenBitSetIterator.java?rev=824015&r1=824014&r2=824015&view=diff
==

---
lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
(original)
+++
lucene/java/trunk/src/java/org/apache/lucene/util/OpenBitSetIterator.java
Sun Oct 11 04:00:02 2009
@@ -128,11 +128,6 @@
}
**/

-  /** @deprecated use {@link #nextDoc()} instead. */
-  public boolean next() {
-return nextDoc() != NO_MORE_DOCS;
-  }
-
public int nextDoc() {
  if (indexArray == 0) {
if (word != 0) {
@@ -160,11 +155,6 @@
  return curDocId = (i<<6) + bitIndex;
}

-  /** @deprecated use {@link #advance(int)} instead. */
-  public boolean skipTo(int target) {
-return advance(target) != NO_MORE_DOCS;
-  }
-
public int advance(int target) {
  indexArray = 0;
  i = target >> 6;
@@ -195,11 +185,6 @@
  return curDocId = (i<<6) + bitIndex;
}

-  /** @deprecated use {@link #docID()} instead. */
-  public int doc() {
-return curDocId;
-  }
-
public int docID() {
  return curDocId;
}

Modified:
lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
URL:
http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/
util/SortedVIntList.java?rev=824015&r1=824014&r2=824015&view=diff
==

--- lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
(original)
+++ lucene/java/trunk/src/java/org/apache/lucene/util/SortedVIntList.java
Sun Oct 11 04:00:02 2009
@@ -204,18 +204,10 @@
  }
}

-  /** @deprecated use {@link #docID()} instead. */
-  public int doc() {return lastInt;}
-
public int docID() {
  return doc;
}

-  /** @deprecated use {@link #nextDoc()} instead. */
-  public boolean next() {
-return nextDoc() != NO_MORE_DOCS;
-  }
-
public int nextDoc() {
  if (bytePos>= lastByt

[jira] Updated: (LUCENE-1970) Remove deprecated DocIdSetIterator methods

2009-10-11 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-1970:
--

Attachment: lucene-1970.patch

All core & contrib tests pass.

> Remove deprecated DocIdSetIterator methods
> --
>
> Key: LUCENE-1970
> URL: https://issues.apache.org/jira/browse/LUCENE-1970
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1970.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1970) Remove deprecated DocIdSetIterator methods

2009-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764490#action_12764490
 ] 

Uwe Schindler commented on LUCENE-1970:
---

looks good!

> Remove deprecated DocIdSetIterator methods
> --
>
> Key: LUCENE-1970
> URL: https://issues.apache.org/jira/browse/LUCENE-1970
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1970.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: 3.0 release date

2009-10-11 Thread Michael Busch

Cool. So I guess we need a 3.0 release manager. Any volunteers? ;)

 Michael

On 10/11/09 2:57 AM, Uwe Schindler wrote:

+1. Adding generics internally should be left for later. In particular, the
big Eclipse/IDEA-driven code refactoring (see other thread) proposed by
Earwin and Timo Nentwig should wait, so that we can get 3.0 out as early as
possible.

For users, generification of the public APIs should really be done for 3.0.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


   

-Original Message-
From: Michael Busch [mailto:busch...@gmail.com]
Sent: Sunday, October 11, 2009 7:23 AM
To: java-dev@lucene.apache.org
Subject: 3.0 release date

Hi all,

I was wondering what our 3.0 release target date is? I think the
outstanding issues are removal of the deprecated APIs and more Java 1.5
updates (especially adding generics to public APIs). Should we try to
get it out early November, in about 3-4 weeks?

   Michael



   



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1970) Remove deprecated DocIdSetIterator methods

2009-10-11 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764491#action_12764491
 ] 

Michael Busch commented on LUCENE-1970:
---

Thanks for reviewing!  I'll commit soon...

> Remove deprecated DocIdSetIterator methods
> --
>
> Key: LUCENE-1970
> URL: https://issues.apache.org/jira/browse/LUCENE-1970
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1970.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1970) Remove deprecated DocIdSetIterator methods

2009-10-11 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-1970.
---

Resolution: Fixed

Committed revision 824111.

> Remove deprecated DocIdSetIterator methods
> --
>
> Key: LUCENE-1970
> URL: https://issues.apache.org/jira/browse/LUCENE-1970
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1970.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Basem Narmok (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764493#action_12764493
 ] 

Basem Narmok commented on LUCENE-1966:
--

Oh, my mistake, sorry. Yes, please remove the last two, on lines 123 & 124.

No, they are just duplicates of the ones on lines 72 & 73.



> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1946:
--

Attachment: LUCENE-1946.patch

Patch with ISOLatin1Filter deprecated, with a hint that it will be removed in 
4.0 and some additional text. I also made the remaining TokenStreams that are 
not intended to be extended final (see LUCENE-1753).

All tests pass. I will commit soon.

> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch, 
> LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764495#action_12764495
 ] 

Robert Muir commented on LUCENE-1966:
-

Basem, ok! Thanks a lot for your help here. I will commit soon.

> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: 3.0 release date

2009-10-11 Thread Uwe Schindler
As Mark was the one for 2.9 and I am the only new core committer, I will be
the next one... :-)

I only need a certificate and key to sign the artifacts. Maybe we can meet
face-to-face in Oakland!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael Busch [mailto:busch...@gmail.com]
> Sent: Sunday, October 11, 2009 7:14 PM
> To: java-dev@lucene.apache.org
> Subject: Re: 3.0 release date
> 
> Cool. So I guess we need a 3.0 release manager. Any volunteers? ;)
> 
>   Michael
> 
> On 10/11/09 2:57 AM, Uwe Schindler wrote:
> > +1. Adding generics internally should be left for later work. Especially
> > some big Eclipse/IDEA-related code refactoring (see other thread) as
> > proposed by Earwin and Timo Nentwig should be left for later code to get
> 3.0
> > out as early as posible.
> >
> > For users, generification of public APIs should be really done for 3.0.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >
> >> -Original Message-
> >> From: Michael Busch [mailto:busch...@gmail.com]
> >> Sent: Sunday, October 11, 2009 7:23 AM
> >> To: java-dev@lucene.apache.org
> >> Subject: 3.0 release date
> >>
> >> Hi all,
> >>
> >> I was wondering what our 3.0 release target date is? I think the
> >> outstanding issues are removal of the deprecated APIs and more Java 1.5
> >> updates (especially adding generics to public APIs). Should we try to
> >> get it out early November, in about 3-4 weeks?
> >>
> >>Michael
> >>
> >> -
> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >
> >
> >
> 
> 
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: 3.0 release date

2009-10-11 Thread Michael Busch

Awesome, thanks Uwe!

There should be a key signing session at the ApacheCon.

 Michael

On 10/11/09 10:26 AM, Uwe Schindler wrote:

As Mark was the one for 2.9 and I am the only new core committer, I will be
the next one... :-)

I only need a certificate and key to sign the artifacts. Maybe we can do a
meeting in Oakland for a face-to-face meeting!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

   

-Original Message-
From: Michael Busch [mailto:busch...@gmail.com]
Sent: Sunday, October 11, 2009 7:14 PM
To: java-dev@lucene.apache.org
Subject: Re: 3.0 release date

Cool. So I guess we need a 3.0 release manager. Any volunteers? ;)

   Michael

On 10/11/09 2:57 AM, Uwe Schindler wrote:
 

+1. Adding generics internally should be left for later work. Especially
some big Eclipse/IDEA-related code refactoring (see other thread) as
proposed by Earwin and Timo Nentwig should be left for later code to get 3.0
out as early as possible.

For users, generification of public APIs should be really done for 3.0.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



   

-Original Message-
From: Michael Busch [mailto:busch...@gmail.com]
Sent: Sunday, October 11, 2009 7:23 AM
To: java-dev@lucene.apache.org
Subject: 3.0 release date

Hi all,

I was wondering what our 3.0 release target date is? I think the
outstanding issues are removal of the deprecated APIs and more Java 1.5
updates (especially adding generics to public APIs). Should we try to
get it out early November, in about 3-4 weeks?

Michael



   



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1946) Remove deprecated TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1946.
---

Resolution: Fixed

Committed revision: 824116

> Remove deprecated TokenStream API
> -
>
> Key: LUCENE-1946
> URL: https://issues.apache.org/jira/browse/LUCENE-1946
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis, contrib/analyzers
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1946.patch, LUCENE-1946.patch, LUCENE-1946.patch, 
> LUCENE-1946.patch
>
>
> I looked into clover analysis: It seems to be no longer used since I removed 
> the tests yesterday - I am happy!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1753) Make not yet final core/contrib TokenStream/Filter implementations final

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1753.
---

Resolution: Fixed
  Assignee: Uwe Schindler

Committed revision: 824116

> Make not yet final core/contrib TokenStream/Filter implementations final
> 
>
> Key: LUCENE-1753
> URL: https://issues.apache.org/jira/browse/LUCENE-1753
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Analysis
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
>
> Lucene's analysis package is designed so that you can plug different 
> *implementations* of analysis into chains of TokenStreams and TokenFilters. An 
> analyzer is built from several TokenStreams/Filters that do the tokenization of 
> text. If you want to modify the behaviour of tokenization, you implement a 
> new subclass of TokenStream/-Filter/Tokenizer.
> Most classes in the core are correctly implemented like that. They are 
> either final themselves or their implementation methods are final (CharTokenizer).
> Many of the backwards-compatibility problems of LUCENE-1693 come from some 
> classes in Lucene's core/contrib that are not yet final:
> - KeywordTokenizer should be declared final or its implementation methods 
> should be final
> - StandardTokenizer should be declared final or its implementation methods 
> should be final
> - ISOLatin1Filter is deprecated, so it will be removed in 3.0, nothing to do.
> CharTokenizer is the abstract base class of several other classes. The design 
> is correct: Child classes cannot override the implementation, they can only 
> change the behaviour of this final implementation.
> Contrib should be checked to make sure that all implementation classes are 
> final or designed in the same way as CharTokenizer.
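
The CharTokenizer-style design described in the issue text can be sketched in
plain Java as follows. This is an illustration under stated assumptions, not
Lucene code: FinalTokenizerSketch, CharSplitter and LowerCaseLetterSplitter
are hypothetical names, and the hooks isTokenChar/normalize are modelled
loosely on the idea of a final implementation whose behaviour subclasses can
tune but not replace.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical classes, not Lucene's analysis API. */
final class FinalTokenizerSketch {

  /** Abstract base class: the tokenization loop is final, so subclasses
   *  cannot override the implementation, only adjust its behaviour. */
  static abstract class CharSplitter {

    /** Hook: does this character belong to a token? */
    protected abstract boolean isTokenChar(char c);

    /** Hook: optional per-character normalization (identity by default). */
    protected char normalize(char c) { return c; }

    /** The final implementation; child classes cannot replace it. */
    public final List<String> tokenize(String text) {
      List<String> tokens = new ArrayList<>();
      StringBuilder current = new StringBuilder();
      for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        if (isTokenChar(c)) {
          current.append(normalize(c));
        } else if (current.length() > 0) {
          tokens.add(current.toString()); // a non-token char closes the token
          current.setLength(0);
        }
      }
      if (current.length() > 0) tokens.add(current.toString());
      return tokens;
    }
  }

  /** Concrete class declared final: it only supplies the hooks. */
  static final class LowerCaseLetterSplitter extends CharSplitter {
    @Override protected boolean isTokenChar(char c) {
      return Character.isLetter(c);
    }
    @Override protected char normalize(char c) {
      return Character.toLowerCase(c);
    }
  }

  public static void main(String[] args) {
    System.out.println(new LowerCaseLetterSplitter().tokenize("Hello, World 42"));
  }
}
```

Because tokenize() is final and the leaf class is final, a later change to the
loop cannot be silently bypassed by a subclass that overrode it, which is the
backwards-compatibility concern the issue raises.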

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764501#action_12764501
 ] 

Robert Muir commented on LUCENE-1966:
-

Before I commit this, I want to solicit any comments/concerns about backwards 
compatibility, assuming the following notice:

{noformat}
Changes in runtime behavior

 * LUCENE-1966: Modified and cleaned the default Arabic stopwords list used
   by ArabicAnalyzer. You'll need to fully re-index any previously created 
   indexes.  (Basem Narmok via Robert Muir)
{noformat}

I know contrib has no backwards-compatibility guarantee, but I just want to 
double-check. Perhaps in the future someone will help fix the Persian stopwords 
file too, so this may happen again :)
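
For readers wondering why a stopword change calls for a full re-index, here is
a small self-contained Java sketch. It does not use Lucene or the real Arabic
list: StopwordReindexSketch, its methods, and the English placeholder words
are all assumptions made for illustration. The idea is that a word removed
from the stopword list was never written to the old index, so a query
analyzed with the new list looks for a term the old index does not contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch only: a toy analyzer and index, not Lucene's ArabicAnalyzer. */
final class StopwordReindexSketch {

  /** Lowercase, split on whitespace, drop stopwords. */
  static List<String> analyze(String text, Set<String> stopwords) {
    List<String> terms = new ArrayList<>();
    for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
      if (!word.isEmpty() && !stopwords.contains(word)) {
        terms.add(word);
      }
    }
    return terms;
  }

  /** "Index" one document: the set of terms that survived analysis. */
  static Set<String> index(String doc, Set<String> stopwords) {
    return new HashSet<>(analyze(doc, stopwords));
  }

  /** Conjunctive match: every analyzed query term must be in the index. */
  static boolean matches(Set<String> indexed, String query, Set<String> stopwords) {
    List<String> terms = analyze(query, stopwords);
    return !terms.isEmpty() && indexed.containsAll(terms);
  }

  public static void main(String[] args) {
    // Placeholder lists: "north" was wrongly treated as a stopword before.
    Set<String> oldStop = new HashSet<>(Arrays.asList("the", "north"));
    Set<String> newStop = new HashSet<>(Arrays.asList("the"));

    Set<String> staleIndex = index("the north gate", oldStop);
    Set<String> freshIndex = index("the north gate", newStop);

    System.out.println("stale index hit: " + matches(staleIndex, "north gate", newStop));
    System.out.println("fresh index hit: " + matches(freshIndex, "north gate", newStop));
  }
}
```

The stale index was built with the old list, so the query misses until the
document is indexed again with the new list.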


> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1971) Remove deprecated RangeQuery classes

2009-10-11 Thread Uwe Schindler (JIRA)
Remove deprecated RangeQuery classes


 Key: LUCENE-1971
 URL: https://issues.apache.org/jira/browse/LUCENE-1971
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0


Remove deprecated RangeQuery classes

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1972) Remove (deprecated) ExtendedFieldCache and Auto/Custom caches and sort

2009-10-11 Thread Uwe Schindler (JIRA)
Remove (deprecated) ExtendedFieldCache and Auto/Custom caches and sort
--

 Key: LUCENE-1972
 URL: https://issues.apache.org/jira/browse/LUCENE-1972
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0


Remove (deprecated) ExtendedFieldCache and Auto/Custom caches and sort

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1971) Remove deprecated RangeQuery classes

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1971:
--

Component/s: Search

> Remove deprecated RangeQuery classes
> 
>
> Key: LUCENE-1971
> URL: https://issues.apache.org/jira/browse/LUCENE-1971
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
>
> Remove deprecated RangeQuery classes

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1971) Remove deprecated RangeQuery classes

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1971:
--

Attachment: LUCENE-1971.patch

Patch.

Will commit when tests pass.

> Remove deprecated RangeQuery classes
> 
>
> Key: LUCENE-1971
> URL: https://issues.apache.org/jira/browse/LUCENE-1971
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1971.patch
>
>
> Remove deprecated RangeQuery classes




[jira] Created: (LUCENE-1973) Remove deprecated query components

2009-10-11 Thread Uwe Schindler (JIRA)
Remove deprecated query components
--

 Key: LUCENE-1973
 URL: https://issues.apache.org/jira/browse/LUCENE-1973
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Reporter: Uwe Schindler
 Fix For: 3.0


Remove deprecated query components around HitCollector




[jira] Resolved: (LUCENE-1932) Convert PrecedenceQueryParser to new TokenStream API

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1932.
---

Resolution: Fixed
  Assignee: Uwe Schindler  (was: Adriano Crestani)

I have done this during removal of old TokenStream API.

> Convert PrecedenceQueryParser to new TokenStream API
> 
>
> Key: LUCENE-1932
> URL: https://issues.apache.org/jira/browse/LUCENE-1932
> Project: Lucene - Java
>  Issue Type: Task
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1932.patch, LUCENE-1932.patch
>
>
> Adriano Crestani provided a patch, that updates the PQP to use the new 
> TokenStream API...all tests still pass. 
> I hope this helps to keep the PQP 




[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764512#action_12764512
 ] 

Uwe Schindler commented on LUCENE-1960:
---

Sorry, I forgot this one, will check tomorrow with some old indexes using 
compressed fields.

> Remove deprecated Field.Store.COMPRESS
> --
>
> Key: LUCENE-1960
> URL: https://issues.apache.org/jira/browse/LUCENE-1960
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1960-1.patch, lucene-1960.patch
>
>
> Also remove FieldForMerge and related code.




[jira] Resolved: (LUCENE-1971) Remove deprecated RangeQuery classes

2009-10-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1971.
---

Resolution: Fixed

Committed revision: 824175

> Remove deprecated RangeQuery classes
> 
>
> Key: LUCENE-1971
> URL: https://issues.apache.org/jira/browse/LUCENE-1971
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Search
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1971.patch
>
>
> Remove deprecated RangeQuery classes




[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Basem Narmok (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764515#action_12764515
 ] 

Basem Narmok commented on LUCENE-1966:
--

Seems good.

BTW, with FAST ESP we never used stopwords, as hits from stopwords get low 
relevancy (keywords with a high number of hits = low value, low importance, so 
less relevant), so such hits will never get into the top results. Also, 
removing stopwords affects phrase search, so most search engines avoid removing 
them. But in the end it depends on the client's application and what she 
really wants, as enterprise search could have very specific needs that differ 
from Internet search.

Anyway, I am still testing the Arabic Analyzer, and I will provide you with 
more comments soon. But the stopwords are good for now :)

> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.




[jira] Updated: (LUCENE-1756) contrib/memory: PatternAnalyzerTest is a very, very, VERY, bad unit test

2009-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1756:


Attachment: LUCENE-1756.patch

improved unit test for this analyzer

> contrib/memory: PatternAnalyzerTest is a very, very, VERY, bad unit test
> 
>
> Key: LUCENE-1756
> URL: https://issues.apache.org/jira/browse/LUCENE-1756
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Reporter: Hoss Man
>Priority: Minor
> Attachments: LUCENE-1756.patch
>
>
> While working on something else I started getting consistent 
> IllegalStateExceptions from PatternAnalyzerTest -- but only when running the 
> test from the top level.
> Digging into the test, I've found numerous things that are very scary...
> * instead of using assertions to test that token streams match, it throws an 
> IllegalStateException when they don't, and then logs a bunch of info about 
> the token streams to System.out -- having assertion messages that tell you 
> *exactly* what doesn't match would make a lot more sense.
> * it builds up a list of files to analyze using paths that it evaluates 
> relative to the current working directory -- which means you get different 
> files depending on whether you run the tests from the contrib level, or from 
> the top level build file
> * the list of files it looks for includes: "../../*.txt", "../../*.html", 
> "../../*.xml" ... so not only do you get different results when you run the 
> tests in the contrib vs at the top level, but different people running the 
> tests via the top level build file will get different results depending on 
> what types of text, html, and xml files they happen to have two directories 
> above where they checked out lucene.
> * the test comments indicate that its purpose is to show that 
> PatternAnalyzer produces the same tokens as other analyzers - but point out 
> this will fail for WhitespaceAnalyzer because of the 255 character token 
> limit WhitespaceTokenizer imposes -- the test then proceeds to compare 
> PatternAnalyzer to WhitespaceTokenizer, guaranteeing a test failure for anyone 
> who happens to have a text file containing more than 255 characters of 
> non-whitespace in a row somewhere in "../../" (in my case: my bookmarks.html 
> file, and the hex encoded favicon.gif images)
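
As a rough illustration of the first point in the description above (assertions that say exactly what doesn't match), here is a minimal sketch in plain Java. It is not taken from the attached patch: TokenAssert, firstMismatch and assertTokensEqual are hypothetical names, and tokens are modelled as plain strings rather than Lucene TokenStreams so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of Lucene or of the attached patch. */
final class TokenAssert {

    /** Returns null when both token lists match, else a message naming the first mismatch. */
    static String firstMismatch(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return "token " + i + ": expected <" + expected.get(i)
                        + "> but was <" + actual.get(i) + ">";
            }
        }
        if (expected.size() != actual.size()) {
            return "token count: expected " + expected.size()
                    + " but was " + actual.size();
        }
        return null;
    }

    /** Fails with an AssertionError that carries the mismatch message. */
    static void assertTokensEqual(List<String> expected, List<String> actual) {
        String mismatch = firstMismatch(expected, actual);
        if (mismatch != null) {
            throw new AssertionError(mismatch);
        }
    }

    public static void main(String[] args) {
        // Fixed in-memory input, so the result does not depend on the working directory.
        List<String> expected = Arrays.asList("quick", "brown", "fox");
        assertTokensEqual(expected, Arrays.asList("quick", "brown", "fox"));
        // prints: token 1: expected <brown> but was <red>
        System.out.println(firstMismatch(expected, Arrays.asList("quick", "red", "fox")));
    }
}
```

Feeding the test from fixed strings like this, instead of from whatever files sit in "../../", would also address the working-directory points in the list.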




[jira] Commented: (LUCENE-1756) contrib/memory: PatternAnalyzerTest is a very, very, VERY, bad unit test

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764518#action_12764518
 ] 

Robert Muir commented on LUCENE-1756:
-

I think this test was complex because it was trying to be both a test and a 
benchmark.

I think removing the benchmark stuff is ok, because we can use the benchmark 
package for that purpose instead?

> contrib/memory: PatternAnalyzerTest is a very, very, VERY, bad unit test
> 
>
> Key: LUCENE-1756
> URL: https://issues.apache.org/jira/browse/LUCENE-1756
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Reporter: Hoss Man
>Priority: Minor
> Attachments: LUCENE-1756.patch
>
>
> While working on something else I started getting consistent 
> IllegalStateExceptions from PatternAnalyzerTest -- but only when running the 
> test from the top level.
> Digging into the test, I've found numerous things that are very scary...
> * instead of using assertions to test that token streams match, it throws an 
> IllegalStateException when they don't, and then logs a bunch of info about 
> the token streams to System.out -- having assertion messages that tell you 
> *exactly* what doesn't match would make a lot more sense.
> * it builds up a list of files to analyze using paths that it evaluates 
> relative to the current working directory -- which means you get different 
> files depending on whether you run the tests from the contrib level, or from 
> the top level build file
> * the list of files it looks for includes: "../../*.txt", "../../*.html", 
> "../../*.xml" ... so not only do you get different results when you run the 
> tests in the contrib vs at the top level, but different people running the 
> tests via the top level build file will get different results depending on 
> what types of text, html, and xml files they happen to have two directories 
> above where they checked out lucene.
> * the test comments indicate that its purpose is to show that 
> PatternAnalyzer produces the same tokens as other analyzers - but point out 
> this will fail for WhitespaceAnalyzer because of the 255 character token 
> limit WhitespaceTokenizer imposes -- the test then proceeds to compare 
> PatternAnalyzer to WhitespaceTokenizer, guaranteeing a test failure for anyone 
> who happens to have a text file containing more than 255 characters of 
> non-whitespace in a row somewhere in "../../" (in my case: my bookmarks.html 
> file, and the hex encoded favicon.gif images)




[jira] Commented: (LUCENE-1966) Arabic Analyzer: Stopwords list needs enhancement

2009-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764519#action_12764519
 ] 

Robert Muir commented on LUCENE-1966:
-

Basem, yes I think the improvements are good.

My question is really: is it OK to commit this for 3.0 or should we wait for 
3.1?


> Arabic Analyzer: Stopwords list needs enhancement
> -
>
> Key: LUCENE-1966
> URL: https://issues.apache.org/jira/browse/LUCENE-1966
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Basem Narmok
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 3.0
>
> Attachments: arabic-stopwords-comments.txt, LUCENE-1966.patch, 
> LUCENE-1966.patch
>
>
> The provided Arabic stopwords list needs some enhancements (e.g. it contains 
> a lot of words that are not stopwords, and needs some cleanup). A patch will 
> be provided with this issue.




A problem about BooleanQuery

2009-10-11 Thread fulin tang
Hi, all:

I ran into a very confusing problem with BooleanQuery; maybe I can
describe it using the output of my code:


query: (name:tang*)
doc=5137 score=1.0  doc:Document>
doc=11377 score=1.0  doc:Document>
query: name:tang* name:notexistnames
doc=5137 score=0.048133932  doc:Document>


These are two queries on the same index: one is just a prefix query in a
boolean query, and the other is a prefix query plus a term query in a
boolean query, all with Occur.SHOULD.

What I wonder is why the latter query cannot find the doc=11377 doc?

The problem can be reproduced by the code in the attachment.


thanks very much!
package org.fulin.search.test;

import java.io.IOException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

/**
 * Reproduces the suspected bug in
 * 
 * 		BooleanScorer.score(Collector collector, int max, int firstDocID)
 * 
 * Line 273, end=8192, subScorerDocID=11378, then more got false?
 * 
 * @author tangfulin 
 *
 */
public class BooleanQueryTest extends TestCase {

	private static final String FIELD = "name";
	private static RAMDirectory directory = new RAMDirectory();
	private static String[] values = new String[] { "tangfulin" };

	protected void setUp() throws Exception {
		IndexWriter writer = new IndexWriter(directory,
				new WhitespaceAnalyzer(), true,
				IndexWriter.MaxFieldLength.LIMITED);

		// docs 0..5136: filler that matches neither query clause
		for (int i = 0; i < 5137; ++i) {
			Document doc = new Document();
			doc.add(new Field(FIELD, "meaninglessnames", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}

		// doc 5137: first document matching name:tang*
		for (int i = 0; i < values.length; i++) {
			Document doc = new Document();
			doc.add(new Field(FIELD, values[i], Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}

		// docs 5138..11376: more filler
		for (int i = 5138; i < 11377; ++i) {
			Document doc = new Document();
			doc.add(new Field(FIELD, "meaninglessnames", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}

		// doc 11377: second document matching name:tang*
		for (int i = 0; i < values.length; i++) {
			Document doc = new Document();
			doc.add(new Field(FIELD, values[i], Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}

		writer.close();
	}

	public void testBooleanPrefixQuery() {
		try {
			IndexSearcher indexSearcher = new IndexSearcher(directory, true);
			BooleanQuery query;
			ScoreDoc[] hits;

			PrefixQuery pq = new PrefixQuery(new Term(FIELD, "tang"));
			BooleanQuery booleanQuery1 = new BooleanQuery();
			booleanQuery1.add(pq, BooleanClause.Occur.SHOULD);

			query = new BooleanQuery();
			query.add(booleanQuery1, BooleanClause.Occur.SHOULD);
			hits = indexSearcher.search(query, null, 1000).scoreDocs;

			System.out.println("query: " + query);
			for (ScoreDoc hit : hits) {
				System.out.println(hit + "  doc:" + indexSearcher.doc(hit.doc));
			}

			assertEquals("Number of matched documents", 2, hits.length);

			query = new BooleanQuery();
			query.add(pq, BooleanClause.Occur.SHOULD);

			query.add(new TermQuery(new Term(FIELD, "notexistnames")),
					BooleanClause.Occur.SHOULD);

			hits = indexSearcher.search(query, null, 1000).scoreDocs;

			System.out.println("query: " + query);
			for (ScoreDoc hit : hits) {
				System.out.println(hit + "  doc:" + indexSearcher.doc(hit.doc));
			}

			assertEquals("Number of matched documents", 2, hits.length);

		} catch (IOException e) {
			fail(e.getMessage());
		}
	}

}
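
The doc ids in the printed output follow from the loop bounds in setUp(). The sketch below only redoes that arithmetic in plain Java; DocIdLayout is a hypothetical name, and it assumes doc ids are handed out in insertion order starting at 0, which agrees with the doc=5137 and doc=11377 lines above. It does not try to explain the scorer behaviour itself.

```java
/** Hypothetical helper that mirrors the four loops in setUp(); no Lucene needed. */
final class DocIdLayout {

    /** Returns the doc ids at which the two "tangfulin" documents are added. */
    static int[] matchingDocIds() {
        int nextId = 0;
        nextId += 5137;            // first filler loop: docs 0..5136
        int first = nextId++;      // first "tangfulin" doc
        nextId += 11377 - 5138;    // second filler loop: 6239 docs
        int second = nextId++;     // second "tangfulin" doc
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] ids = matchingDocIds();
        // prints: 5137 11377
        System.out.println(ids[0] + " " + ids[1]);
    }
}
```

So the first match lies below the end=8192 value quoted in the test's javadoc and the second lies above it, which may be relevant when stepping through BooleanScorer.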


[jira] Created: (LUCENE-1974) BooleanQuery can not find all matches in special condition

2009-10-11 Thread tangfulin (JIRA)
BooleanQuery can not find all matches in special condition
--

 Key: LUCENE-1974
 URL: https://issues.apache.org/jira/browse/LUCENE-1974
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9
Reporter: tangfulin


query: (name:tang*)
doc=5137 score=1.0  doc:Document>
doc=11377 score=1.0  doc:Document>
query: name:tang* name:notexistnames
doc=5137 score=0.048133932  doc:Document>

These are two queries on the same index: one is just a prefix query in a
boolean query, and the other is a prefix query plus a term query in a
boolean query, all with Occur.SHOULD.

What I wonder is why the latter query cannot find the doc=11377 doc?

The problem can be reproduced by the code in the attachment.




[jira] Updated: (LUCENE-1974) BooleanQuery can not find all matches in special condition

2009-10-11 Thread tangfulin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangfulin updated LUCENE-1974:
--

Attachment: BooleanQueryTest.java

> BooleanQuery can not find all matches in special condition
> --
>
> Key: LUCENE-1974
> URL: https://issues.apache.org/jira/browse/LUCENE-1974
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Query/Scoring
>Affects Versions: 2.9
>Reporter: tangfulin
> Attachments: BooleanQueryTest.java
>
>
> query: (name:tang*)
> doc=5137 score=1.0  doc:Document>
> doc=11377 score=1.0  doc:Document>
> query: name:tang* name:notexistnames
> doc=5137 score=0.048133932  doc:Document>
> These are two queries on the same index: one is just a prefix query in a
> boolean query, and the other is a prefix query plus a term query in a
> boolean query, all with Occur.SHOULD.
> What I wonder is why the latter query cannot find the doc=11377 doc?
> The problem can be reproduced by the code in the attachment.




[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764531#action_12764531
 ] 

Mark Miller commented on LUCENE-1458:
-

Okay, after all that poking around in the dark, tonight I decided to actually 
try turning on the DEBUG stuff you have and figuring out how things actually 
work ;) Always too lazy to open that instruction manual till I've wasted plenty 
of time spinning in circles.

So I've got it working -

When it was working like 99% I benched the speed at 6300-6500 r/s with the 
samerdr bench as compared to 9500-11000 with the trunk version I had checked 
out.

But that last 1% meant adding two TermRef clones, and that dropped things to 
about 5800 or so.

I'm sure I might have a few wasteful instructions and/or there can be a little 
more eked out, but I think it will still come up short.

I don't see seek(ord) being called using Eclipse (other than in tests), but it 
may be missing it? So I'm not really sure if it needs to be cached or not - no 
code to test it with at the moment.

> Further steps towards flexible indexing
> ---
>
> Key: LUCENE-1458
> URL: https://issues.apache.org/jira/browse/LUCENE-1458
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 2.9
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1458-back-compat.patch, 
> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, 
> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch, 
> LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch, 
> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, 
> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch, 
> LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, 
> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, 
> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2
>
>
> I attached a very rough checkpoint of my current patch, to get early
> feedback.  All tests pass, though back compat tests don't pass due to
> changes to package-private APIs plus certain bugs in tests that
> happened to work (eg call TermPositions.nextPosition() too many times,
> which the new API asserts against).
> [Aside: I think, when we commit changes to package-private APIs such
> that back-compat tests don't pass, we could go back, make a branch on
> the back-compat tag, commit changes to the tests to use the new
> package private APIs on that branch, then fix nightly build to use the
> tip of that branch?]
> There's still plenty to do before this is committable! This is a
> rather large change:
>   * Switches to a new more efficient terms dict format.  This still
> uses tii/tis files, but the tii only stores term & long offset
> (not a TermInfo).  At seek points, tis encodes term & freq/prox
> offsets absolutely instead of with deltas.  Also, tis/tii
> are structured by field, so we don't have to record field number
> in every term.
> .
> On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB
> -> 0.64 MB) and tis file is 9% smaller (75.5 MB -> 68.5 MB).
> .
> RAM usage when loading terms dict index is significantly less
> since we only load an array of offsets and an array of String (no
> more TermInfo array).  It should be faster to init too.
> .
> This part is basically done.
>   * Introduces modular reader codec that strongly decouples terms dict
> from docs/positions readers.  EG there is no more TermInfo used
> when reading the new format.
> .
> There's nice symmetry now between reading & writing in the codec
> chain -- the current docs/prox format is captured in:
> {code}
> FormatPostingsTermsDictWriter/Reader
> FormatPostingsDocsWriter/Reader (.frq file) and
> FormatPostingsPositionsWriter/Reader (.prx file).
> {code}
> This part is basically done.
>   * Introduces a new "flex" API for iterating through the fields,
> terms, docs and positions:
> {code}
> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
> {code}
> This replaces TermEnum/Docs/Positions.  SegmentReader emulates the
> old API on top of the new API to keep back-compat.
> 
> Next steps:
>   * Plug in new codecs (pulsing, pfor) to exercise the modularity /
> fix any hidden assumptions.
>   * Expose new API out of IndexReader, deprecate old API but emulate
> old API on top of new one, switch all core/contrib users to the
> new API.
>   * Maybe switch to AttributeSources as the base class for TermsEnum,
> DocsEnum, PostingsEnum -- this would give readers API flexibility
> (not just index-file-format flexibility).  EG if someone wanted
> to store payload at the term-doc level instead o

new sorting api and some perf numbers

2009-10-11 Thread John Wang
Hi guys:
The new FieldComparator api looks really scary :)

But after some perf testing with numbers I'd like to share, I guess it
is worth it:

HW: Mac Pro with 16G memory
jvm: 1.6.0_13
jvm arg: -Xms1g -Xmx1g -server

setup

index:
1M docs evenly split into 8 segments (to make sure the test is fair across
segment boundaries)
each doc has 3 fields:
1) id - stored
2) val - random number, indexed, not analyzed, no norms, omit tf
3) string - "even" or "odd" of the corresponding id, not analyzed, no norms,
omit tf

built with lucene 2.4.1 to keep the same index across lucene 2.4.1 and
lucene 2.9.0 search tests

Search:
query on the term: "even" (TermQuery, minimizes the overhead of the text
search), matches 500k docs across segment boundaries, sort by val, sort
type: string. Numhits, i.e. number of slots = 100.

ran 20 iterations of the same query for each test.

First query, includes loading

lucene 2.4.1: 4858ms, lucene 2.9.0: 816ms, about 5.95x faster

avg of the remaining 19 queries:

lucene 2.4.1: 32ms, lucene 2.9.0: 17ms, about 1.88x faster

I ran this test about 5 times, the findings are similar.

The performance gain is significant!

Great job!

-John
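
For anyone re-running this, the ratios behind the timings above can be recomputed with a few lines of Java. This is only a sketch: Speedup, ratio and percentSaved are hypothetical names, and the inputs are the millisecond figures quoted in this mail. Dividing the old time by the new time gives a speedup factor (about 5.95 and 1.88 here), which is not the same thing as a percentage increase.

```java
import java.util.Locale;

/** Hypothetical helper for comparing two wall-clock timings. */
final class Speedup {

    /** How many times faster the new run is: oldMillis / newMillis. */
    static double ratio(double oldMillis, double newMillis) {
        return oldMillis / newMillis;
    }

    /** Share of the old run time that was saved, in percent. */
    static double percentSaved(double oldMillis, double newMillis) {
        return 100.0 * (oldMillis - newMillis) / oldMillis;
    }

    public static void main(String[] args) {
        // First query (includes loading), then the average of the other 19 queries.
        double[][] runs = { { 4858, 816 }, { 32, 17 } };
        for (double[] run : runs) {
            System.out.println(String.format(Locale.ROOT,
                    "%.0fms -> %.0fms: %.2fx faster, %.1f%% less time",
                    run[0], run[1], ratio(run[0], run[1]),
                    percentSaved(run[0], run[1])));
        }
    }
}
```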


[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS

2009-10-11 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764563#action_12764563
 ] 

Michael Busch commented on LUCENE-1960:
---

I created an index with some compressed binary and String fields with 2.4 and 
verified that it gets decompressed correctly. The test fails currently on trunk 
(as expected) and passes with the latest patch.

However, there's one issue here: the compressed field gets silently 
uncompressed during merge, but *only* in the less efficient merge mode that 
doesn't use FieldsReader#rawDocs() and FieldsWriter#addRawDocuments(). So it 
doesn't sound like a great solution that we sometimes uncompress the 
fields automatically and sometimes don't. 

I think we have three options:
1. Change FieldsWriter#addRawDocuments() to uncompress on-the-fly
2. Revert the FieldForMerge changes too and never uncompress automatically 
during merge
3. Make it possible for the user to uncompress fields with CompressionTools, no 
matter which UTF format the data was stored with

I don't really want to do 1., because it will have a performance impact for all 
fields (you have to look at the field bits even in raw merge mode). With 2. we 
will have to keep most of the compress/uncompress code in Lucene until 4.0, 
we'll just not make it possible anymore to add Store.COMPRESS fields with 3.0 
(that's already how trunk is). For 3. we'd have to add a deprecated 
isCompressed() method that the user can call.
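
To make option 3 a bit more concrete, here is a minimal sketch of what user-side decompression could look like. It deliberately uses only java.util.zip and not Lucene's CompressionTools, so StoredFieldCodec and its methods are hypothetical names, and it assumes the stored bytes are a plain Deflater stream. The charset is passed in by the caller because the encoding of the stored string may differ between index versions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical user-side codec; assumes a plain java.util.zip deflate stream. */
final class StoredFieldCodec {

    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no way to make any: the input is truncated or unsupported.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    /** The caller chooses the charset the string was stored with. */
    static String decompressString(byte[] compressed, Charset charset)
            throws DataFormatException {
        return new String(decompress(compressed), charset);
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = compress("hello stored field".getBytes(StandardCharsets.UTF_8));
        // prints: hello stored field
        System.out.println(decompressString(stored, StandardCharsets.UTF_8));
    }
}
```

With something like this on the user side, the remaining question for option 3 is how the user learns that a field is compressed, which is what the deprecated isCompressed() method would answer.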

> Remove deprecated Field.Store.COMPRESS
> --
>
> Key: LUCENE-1960
> URL: https://issues.apache.org/jira/browse/LUCENE-1960
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.0
>
> Attachments: lucene-1960-1.patch, lucene-1960.patch
>
>
> Also remove FieldForMerge and related code.
