[jira] Commented: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-26 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735358#action_12735358
 ] 

Uwe Schindler commented on LUCENE-1762:
---

bq. setOnlyUseNewAPI(false) does not exist, it was removed with some of the 
patches lately. It gets automatically detected via reflection?

No, this is a static global switch in TokenStream. If you switch it on, 
TokenStreams and Filters are forced to use only the new API and therefore use 
the separate Attribute implementations from o.a.l.analysis.tokenattributes. If 
it is switched off, an old Token instance is used instead, see 
[http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/analysis/TokenStream.html#setOnlyUseNewAPI(boolean)].
The red color bug is fixed in trunk now :)
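
A rough sketch of what such a static global switch amounts to. Every class and interface name below is a hypothetical stand-in and not the real Lucene code; it only assumes the behaviour described above: switch on means separate attribute implementations, switch off means one shared old-style token object.

```java
// Hypothetical stand-ins for illustration; these are NOT the real Lucene classes.
interface TermAttr { String term(); void setTerm(String term); }
interface OffsetAttr { int startOffset(); void setStartOffset(int offset); }

// New-API style: one small implementation class per attribute.
class TermAttrImpl implements TermAttr {
    private String term = "";
    public String term() { return term; }
    public void setTerm(String term) { this.term = term; }
}

class OffsetAttrImpl implements OffsetAttr {
    private int start;
    public int startOffset() { return start; }
    public void setStartOffset(int offset) { this.start = offset; }
}

// Old-API style: a single token object that carries every attribute.
class LegacyToken implements TermAttr, OffsetAttr {
    private String term = "";
    private int start;
    public String term() { return term; }
    public void setTerm(String term) { this.term = term; }
    public int startOffset() { return start; }
    public void setStartOffset(int offset) { this.start = offset; }
}

class StreamSketch {
    // Static, so the setting is global for every stream created afterwards.
    private static boolean onlyUseNewAPI = false;

    static void setOnlyUseNewAPI(boolean value) { onlyUseNewAPI = value; }

    final TermAttr termAttr;
    final OffsetAttr offsetAttr;

    StreamSketch() {
        if (onlyUseNewAPI) {
            // Switch on: separate implementation objects per attribute.
            termAttr = new TermAttrImpl();
            offsetAttr = new OffsetAttrImpl();
        } else {
            // Switch off: both attributes are views of one old-style token.
            LegacyToken token = new LegacyToken();
            termAttr = token;
            offsetAttr = token;
        }
    }

    public static void main(String[] args) {
        setOnlyUseNewAPI(false);
        StreamSketch legacy = new StreamSketch();
        System.out.println((Object) legacy.termAttr == legacy.offsetAttr); // prints true

        setOnlyUseNewAPI(true);
        StreamSketch separate = new StreamSketch();
        System.out.println((Object) separate.termAttr == separate.offsetAttr); // prints false
    }
}
```

Because the flag is static, it only affects streams constructed after it is set, which is one reason a global switch like this is awkward to test.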

There is one problem with the 6 new single-attribute implementations: they 
duplicate code from Token but have no test. I think I should add the missing 
test, similar to TestToken.java, and run the same checks against the 6 
Attribute implementations.

I will review the other changes later, I have no time today.

> Slightly more readable code in TermAttributeImpl 
> -
>
> Key: LUCENE-1762
> URL: https://issues.apache.org/jira/browse/LUCENE-1762
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Eks Dev
> Assignee: Uwe Schindler
> Priority: Trivial
> Attachments: LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch
>
>
> No big deal. 
> growTermBuffer(int newSize) was using correct, but slightly hard to follow 
> code. 
> The method returned either a reallocated buffer or null; null was a hint to 
> the calling code that the current termBuffer already has enough space.
> This patch simplifies the logic by making the method only reallocate the 
> buffer, nothing more.
> It reduces the number of if(null) checks in a few methods and the amount of 
> code. 
> All tests pass.
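
To make the described change concrete, here is a minimal sketch of the two styles side by side. It is not the code from the attached patch: the class, the method names other than growTermBuffer, and the sizing policy (grow to exactly the requested size) are assumptions made for illustration.

```java
// Hypothetical sketch; not the actual TermAttributeImpl code or the patch.
class TermBufferSketch {
    private char[] termBuffer = new char[4];
    private int termLength;

    // Old style, as described in the issue: a null return is a hint that the
    // current buffer is already large enough, so every caller must check it.
    private char[] growOldStyle(int newSize) {
        return termBuffer.length < newSize ? new char[newSize] : null;
    }

    void setTermOldStyle(String s) {
        char[] grown = growOldStyle(s.length());
        if (grown != null) {
            termBuffer = grown;
        }
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    // Simplified style: the method only reallocates when needed and returns
    // nothing, so callers need no null check. Old contents are not preserved
    // here because the caller overwrites the buffer right away.
    private void growTermBuffer(int newSize) {
        if (termBuffer.length < newSize) {
            termBuffer = new char[newSize];
        }
    }

    void setTerm(String s) {
        growTermBuffer(s.length());
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() { return new String(termBuffer, 0, termLength); }

    int capacity() { return termBuffer.length; }

    public static void main(String[] args) {
        TermBufferSketch t = new TermBufferSketch();
        t.setTerm("lucene");
        System.out.println(t.term() + " " + t.capacity()); // prints "lucene 6"
    }
}
```

Both setters behave the same from the outside; the second one simply moves the "is it big enough" decision entirely into the grow method.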

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-07-26 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-1460:
--

Attachment: lucene-1460.patch

Some more progress - mostly in contrib/memory.

> Change all contrib TokenStreams/Filters to use the new TokenStream API
> --
>
> Key: LUCENE-1460
> URL: https://issues.apache.org/jira/browse/LUCENE-1460
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.9
>
> Attachments: lucene-1460.patch, lucene-1460.patch, lucene-1460.patch, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all 
> contrib modules to use it.




MergePolicy and IndexWriter methods argument

2009-07-26 Thread Shai Erera
Hi

While reading LogMergePolicy I noticed that it uses IndexWriter's member and
method arg inconsistently:
1) Some methods that receive IW as a parameter, do: this.indexWriter =
indexWriter, and then use the member instance.
2) Others set the member instance, but continue to use the method arg.
3) Others don't set the member instance at all.
4) Some use the member, w/ the possibility of hitting NPE (if, say, the
findMerge* methods were not called yet).

As far as I understand, the member instance is defined just for methods that
need to use IW, but since the class does not require IW to be passed during
construction, they rely on one of the findMerge* methods to set the member
instance to the one they got. Is that right? I guess it is possible for the
same MergePolicy instance to receive different IW instances during its life
span, but is it something we should support?

Leaving back-compat aside for a moment, if an MP lives within an IndexWriter,
why not require an IW instance to be passed during MP construction
(passing 'this' when IW instantiates its own MP)? Then we can remove the IW
method arg and rely, safely, on the existence of IW.
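
A small sketch of the difference between the two designs. All names here are hypothetical stand-ins, not the real LogMergePolicy or IndexWriter, and the merge arithmetic is a placeholder.

```java
import java.util.Objects;

// Hypothetical stand-in for IndexWriter.
class WriterSketch {
    private final int segments;
    WriterSketch(int segments) { this.segments = segments; }
    int segmentCount() { return segments; }
}

// Today's pattern as described above: the member is only set as a side
// effect of findMerges, so other methods can hit an NPE if called first.
class LazyPolicySketch {
    private WriterSketch writer; // stays null until findMerges is called

    int findMerges(WriterSketch writer) {
        this.writer = writer;
        return writer.segmentCount() / 2; // placeholder for real merge logic
    }

    boolean useCompoundFile() {
        return writer.segmentCount() > 1; // NPE if findMerges was never called
    }
}

// Proposed pattern: the writer is required at construction time, so every
// method can rely on the member and no method needs a writer argument.
class BoundPolicySketch {
    private final WriterSketch writer;

    BoundPolicySketch(WriterSketch writer) {
        this.writer = Objects.requireNonNull(writer, "writer");
    }

    int findMerges() {
        return writer.segmentCount() / 2; // placeholder for real merge logic
    }

    boolean useCompoundFile() {
        return writer.segmentCount() > 1;
    }
}
```

With the bound variant, a policy instance can no longer be handed a different writer halfway through its life, which matches the question raised above about whether that case needs support at all.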

Shai


Re: MergePolicy and IndexWriter methods argument

2009-07-26 Thread Michael McCandless
I agree it's messy now.  I think requiring the writer to be specified
on creating the merge policy would make sense.  You can't safely share
an LMP today across multiple writers, yet the class "pretends" that you
can...

You'd also need to deprecate the public methods that take a writer in
favor of new methods that don't take one (and use the member instead)?
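
One possible shape for that deprecation, as a sketch only. The names are hypothetical, and rejecting a mismatched writer in the deprecated overload is an assumption of this sketch, not something decided in this thread.

```java
// Hypothetical stand-in for IndexWriter.
class WriterStub {
    private final int segments;
    WriterStub(int segments) { this.segments = segments; }
    int segmentCount() { return segments; }
}

class DeprecatingPolicySketch {
    private final WriterStub writer;

    DeprecatingPolicySketch(WriterStub writer) {
        if (writer == null) {
            throw new IllegalArgumentException("writer is required");
        }
        this.writer = writer;
    }

    // New method: no writer argument, uses the member set at construction.
    int findMerges() {
        return writer.segmentCount() / 2; // placeholder for real merge logic
    }

    /**
     * Old signature kept for back-compat.
     * @deprecated use {@link #findMerges()} instead
     */
    @Deprecated
    int findMerges(WriterStub writer) {
        // Assumption: fail loudly rather than silently ignore a different writer.
        if (writer != this.writer) {
            throw new IllegalArgumentException("policy is bound to a different writer");
        }
        return findMerges();
    }
}
```

Existing callers keep compiling against the old signature, while the real logic lives only in the new method.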

Wanna cons up a patch?

Mike

On Sun, Jul 26, 2009 at 7:30 AM, Shai Erera wrote:
> Hi
>
> While reading LogMergePolicy I noticed that it uses IndexWriter's member and
> method arg inconsistently:
> 1) Some methods that receive IW as a parameter, do: this.indexWriter =
> indexWriter, and then use the member instance.
> 2) Others set the member instance, but continue to use the method arg.
> 3) Others don't set the member instance at all.
> 4) Some use the member, w/ the possibility of hitting NPE (if, say, the
> findMerge* methods were not called yet).
>
> As far as I understand, the member instance is defined just for methods that
> need to use IW, but since the class does not require IW to be passed during
> construction, they rely on one of the findMerge* methods to set the member
> instance to the one they got. Is that right? I guess it is possible for the
> same MergePolicy instance to receive different IW instances during its life
> span, but is it something we should support?
>
> Leaving back-compat aside for a moment, if a MP lives within an IndexWriter,
> why not require an IW instance to be passed during an MP construction
> (passing 'this' for IW own instantiation)? Then we can remove the IW method
> arg and rely, safely, on the existence of IW.
>
> Shai
>
>




[jira] Updated: (LUCENE-1758) improve arabic analyzer: light8 -> light10

2009-07-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1758:


Attachment: LUCENE-1758.patch

Also updated the stopwords list; it was in need of much improvement.


> improve arabic analyzer: light8 -> light10
> --
>
> Key: LUCENE-1758
> URL: https://issues.apache.org/jira/browse/LUCENE-1758
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Robert Muir
> Priority: Minor
> Attachments: LUCENE-1758.patch, LUCENE-1758.txt
>
>
> Someone mentioned on the java-user list that the Arabic analysis was not as 
> good as they would like.
> This patch adds the لل- prefix (light10 algorithm versus light8 algorithm).
> In the light10 paper, this improves precision from .390 to .413.
> They mention this is not statistically significant, but it makes linguistic 
> sense and at least has been shown not to hurt.
> In the future, I hope openrelevance will allow us to try some more 
> approaches. 
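
For readers unfamiliar with light stemming, here is a minimal sketch of prefix stripping with and without the لل prefix. It is not the code from the attached patch. The prefix lists (the definite-article forms وال, بال, كال, فال, ال for light8, plus لل for light10) and the rule of keeping at least two characters after stripping are assumptions of this sketch; the real stemmer also handles suffixes and other conditions not shown here. The strings are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of light-stemmer prefix stripping; not the patch itself.
class LightPrefixSketch {
    // Assumed light8 article prefixes, in order:
    // waw-alef-lam, ba-alef-lam, kaf-alef-lam, fa-alef-lam, alef-lam.
    static final List<String> LIGHT8_PREFIXES = Arrays.asList(
        "\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0643\u0627\u0644",
        "\u0641\u0627\u0644", "\u0627\u0644");

    // Assumed light10 list: the same prefixes plus lam-lam.
    static final List<String> LIGHT10_PREFIXES = Arrays.asList(
        "\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0643\u0627\u0644",
        "\u0641\u0627\u0644", "\u0627\u0644", "\u0644\u0644");

    // Strips the first matching prefix, provided at least two characters
    // remain afterwards (the minimum length is an assumption of this sketch).
    static String stripPrefix(String word, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (word.startsWith(prefix) && word.length() >= prefix.length() + 2) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        String word = "\u0644\u0644\u0643\u062A\u0627\u0628"; // lam-lam + "kitab"
        System.out.println(stripPrefix(word, LIGHT8_PREFIXES).length());  // prints 6
        System.out.println(stripPrefix(word, LIGHT10_PREFIXES).length()); // prints 4
    }
}
```

With the light8 list the example word is left untouched, while the light10 list reduces it to the same stem that the plain ال form produces, which is the linguistic motivation mentioned in the issue.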




[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-07-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735444#action_12735444
 ] 

Robert Muir commented on LUCENE-1460:
-

Michael, I looked at your patch. 

What do you think about the remaining ones? Should they be left as is for now?
Or do you think some of these should still expose Token (i.e. in their 
public/protected methods), but just as back compat/convenience, and work w/ the 
new API behind the scenes?


> Change all contrib TokenStreams/Filters to use the new TokenStream API
> --
>
> Key: LUCENE-1460
> URL: https://issues.apache.org/jira/browse/LUCENE-1460
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.9
>
> Attachments: lucene-1460.patch, lucene-1460.patch, lucene-1460.patch, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all 
> contrib modules to use it.




[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API

2009-07-26 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1460:


Attachment: LUCENE-1460.patch

with analyzers/compound

> Change all contrib TokenStreams/Filters to use the new TokenStream API
> --
>
> Key: LUCENE-1460
> URL: https://issues.apache.org/jira/browse/LUCENE-1460
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1460.patch, lucene-1460.patch, lucene-1460.patch, 
> lucene-1460.patch, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, 
> LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all 
> contrib modules to use it.




Re: MergePolicy and IndexWriter methods argument

2009-07-26 Thread Shai Erera
I'll open an issue and work out a patch. Though this deprecation stuff is
what I was worried about - it always tends to expand more than I plan :).

Shai

On Sun, Jul 26, 2009 at 9:44 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> I agree it's messy now.  I think requiring the writer to be specified
> on creating the merge policy would make sense.  You can't safely share
> an LMP today across multiple writers, yet the class "pretends" that you
> can...
>
> You'd also need to deprecate the public methods that take a writer in
> favor of new methods that don't take one (and use the member instead)?
>
> Wanna cons up a patch?
>
> Mike
>
> On Sun, Jul 26, 2009 at 7:30 AM, Shai Erera wrote:
> > Hi
> >
> > While reading LogMergePolicy I noticed that it uses IndexWriter's member and
> > method arg inconsistently:
> > 1) Some methods that receive IW as a parameter, do: this.indexWriter =
> > indexWriter, and then use the member instance.
> > 2) Others set the member instance, but continue to use the method arg.
> > 3) Others don't set the member instance at all.
> > 4) Some use the member, w/ the possibility of hitting NPE (if, say, the
> > findMerge* methods were not called yet).
> >
> > As far as I understand, the member instance is defined just for methods that
> > need to use IW, but since the class does not require IW to be passed during
> > construction, they rely on one of the findMerge* methods to set the member
> > instance to the one they got. Is that right? I guess it is possible for the
> > same MergePolicy instance to receive different IW instances during its life
> > span, but is it something we should support?
> >
> > Leaving back-compat aside for a moment, if a MP lives within an IndexWriter,
> > why not require an IW instance to be passed during an MP construction
> > (passing 'this' for IW own instantiation)? Then we can remove the IW method
> > arg and rely, safely, on the existence of IW.
> >
> > Shai
> >
> >
>