[jira] Commented: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735358#action_12735358 ] Uwe Schindler commented on LUCENE-1762: --- bq. setOnlyUseNewAPI(false) does not exist; it was removed with some of the recent patches. Is it detected automatically via reflection? No, this is a static global switch in TokenStream. If you switch it on, TokenStreams and Filters are forced to use only the new API and therefore use the separate Attribute implementations from o.a.l.analysis.tokenattributes. If it is switched off, an old-style Token instance is used instead; see [http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/analysis/TokenStream.html#setOnlyUseNewAPI(boolean)]. The red color bug is fixed in trunk now :) There is one problem with the 6 new single-attribute implementations: they duplicate code from Token but have no tests. I think I should add the missing test, similar to TestToken.java, and run the same checks against the 6 Attribute implementations. I will review the other changes later; I have no time today. > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis > Reporter: Eks Dev > Assignee: Uwe Schindler > Priority: Trivial > Attachments: LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct but slightly hard-to-follow code. > The method returned null as a hint to the upstream code that the current termBuffer already had enough space; otherwise it returned the reallocated buffer. > This patch simplifies the logic so that the method only reallocates the buffer, nothing more. > It reduces the number of if (null) checks in a few methods and reduces the amount of code. > All tests pass. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
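The growTermBuffer refactoring described in LUCENE-1762 can be illustrated with a minimal sketch. TermBufferSketch, setTermOld, setTerm and the doubling growth policy are hypothetical names and assumptions chosen for illustration; this is not the actual Token or TermAttributeImpl source. The old style returns null when the buffer is already big enough, so every caller repeats a null check; the new style only reallocates (keeping existing contents), and callers simply use termBuffer afterwards.

```java
import java.util.Arrays;

// Hypothetical sketch of the two growTermBuffer styles; NOT the real Lucene source.
public class TermBufferSketch {
  private char[] termBuffer = new char[4];
  private int termLength = 0;

  // Old style: null means "already big enough"; otherwise a fresh, empty array
  // that the caller must install (and copy into, if it needs the old contents).
  private char[] growTermBufferOld(int newSize) {
    if (termBuffer.length >= newSize) {
      return null;
    }
    // Doubling is an assumed growth policy for this sketch.
    return new char[Math.max(newSize, termBuffer.length * 2)];
  }

  void setTermOld(String s) {
    char[] newBuffer = growTermBufferOld(s.length());
    if (newBuffer != null) { // every caller repeats this null check
      termBuffer = newBuffer;
    }
    s.getChars(0, s.length(), termBuffer, 0);
    termLength = s.length();
  }

  // New style: the method only ensures capacity (preserving contents); no return value to check.
  private void growTermBuffer(int newSize) {
    if (termBuffer.length < newSize) {
      termBuffer = Arrays.copyOf(termBuffer, Math.max(newSize, termBuffer.length * 2));
    }
  }

  void setTerm(String s) {
    growTermBuffer(s.length());
    s.getChars(0, s.length(), termBuffer, 0);
    termLength = s.length();
  }

  String term() { return new String(termBuffer, 0, termLength); }

  int capacity() { return termBuffer.length; }

  public static void main(String[] args) {
    TermBufferSketch t = new TermBufferSketch();
    t.setTerm("abc");             // fits in the initial 4-char buffer, no reallocation
    t.setTerm("lucene-analysis"); // 15 chars, so the buffer grows to max(15, 8) = 15
    System.out.println(t.term() + " capacity=" + t.capacity());
  }
}
```

Both styles store the same term; the difference is only in where the "is it big enough?" decision lives, which is what removes the scattered if (null) checks.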
[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1460: -- Attachment: lucene-1460.patch Some more progress - mostly in contrib/memory. > Change all contrib TokenStreams/Filters to use the new TokenStream API > -- > > Key: LUCENE-1460 > URL: https://issues.apache.org/jira/browse/LUCENE-1460 > Project: Lucene - Java > Issue Type: Task >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.9 > > Attachments: lucene-1460.patch, lucene-1460.patch, lucene-1460.patch, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt > > > Now that we have the new TokenStream API (LUCENE-1422) we should change all > contrib modules to use it.
MergePolicy and IndexWriter methods argument
Hi While reading LogMergePolicy I noticed that it uses its IndexWriter member and the IndexWriter method argument inconsistently: 1) Some methods that receive IW as a parameter do this.indexWriter = indexWriter and then use the member. 2) Others set the member, but continue to use the method argument. 3) Others don't set the member at all. 4) Some use the member with the possibility of hitting an NPE (if, say, the findMerge* methods have not been called yet). As far as I understand, the member is defined just for methods that need to use IW, but since the class does not require IW to be passed during construction, they rely on one of the findMerge* methods to set the member to the instance they received. Is that right? I guess it is possible for the same MergePolicy instance to receive different IW instances during its life span, but is that something we should support? Leaving back-compat aside for a moment, if an MP lives within an IndexWriter, why not require an IW instance to be passed when the MP is constructed (IW would pass 'this' when it instantiates its own MP)? Then we can remove the IW method argument and safely rely on the IW being present. Shai
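The two patterns Shai contrasts can be sketched side by side. Everything here (MergePolicyDemo, Writer, LazyPolicy, BoundPolicy, numSegments) is a hypothetical stand-in, not Lucene's IndexWriter or LogMergePolicy API. LazyPolicy caches the writer only when findMerges is called, so a helper that reads the member earlier hits an NPE (case 4 above) and a second writer silently replaces the first; BoundPolicy takes the writer in its constructor and keeps it in a final field.

```java
import java.util.Objects;

// Hypothetical stand-ins; NOT Lucene's real IndexWriter/MergePolicy classes.
public class MergePolicyDemo {

  static class Writer {
    final String name;
    Writer(String name) { this.name = name; }
    int numSegments() { return 3; } // fixed value, for illustration only
  }

  // Current style described in the email: the writer is cached lazily by findMerges.
  static class LazyPolicy {
    private Writer writer; // null until findMerges is first called

    String findMerges(Writer w) {
      this.writer = w; // overwritten on every call, so a second writer silently wins
      return "merges for " + w.name;
    }

    int segmentCount() {
      return writer.numSegments(); // NullPointerException if findMerges was never called
    }
  }

  // Proposed style: the writer is required at construction and never changes.
  static class BoundPolicy {
    private final Writer writer;

    BoundPolicy(Writer writer) {
      this.writer = Objects.requireNonNull(writer, "writer");
    }

    String findMerges() {
      return "merges for " + writer.name;
    }

    int segmentCount() {
      return writer.numSegments(); // safe: writer is final and non-null
    }
  }

  public static void main(String[] args) {
    LazyPolicy lazy = new LazyPolicy();
    try {
      lazy.segmentCount();
    } catch (NullPointerException e) {
      System.out.println("lazy policy: NPE before findMerges");
    }
    BoundPolicy bound = new BoundPolicy(new Writer("w1"));
    System.out.println(bound.findMerges() + ", segments=" + bound.segmentCount());
  }
}
```

In the constructor-injection variant, a writer creating its own policy would pass this, which is the arrangement proposed above; the trade-off is that one policy instance can no longer be shared across writers, which the email already questions supporting.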
Re: MergePolicy and IndexWriter methods argument
I agree it's messy now. I think requiring the writer to be specified on creating the merge policy would make sense. You can't safely share a LMP today across multiple writers, yet the class "pretends" that you can... You'd also need to deprecate the public methods that take a writer in favor of new methods that don't take one (and use the member instead)? Wanna cons up a patch? Mike On Sun, Jul 26, 2009 at 7:30 AM, Shai Erera wrote:
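The deprecation step Mike mentions could look roughly like this: keep the old writer-taking signature as a deprecated overload that delegates to a new no-argument method using the member. The names (DeprecationSketch, Writer, BoundPolicy, findMerges) are hypothetical, not Lucene's API. Rejecting a mismatched writer, rather than silently switching to it, is an assumption made here because sharing one policy across writers is not safe.

```java
import java.util.Objects;

// Hypothetical sketch of a deprecation path; names are illustrative, NOT Lucene's API.
public class DeprecationSketch {

  static class Writer {
    final String name;
    Writer(String name) { this.name = name; }
  }

  static class BoundPolicy {
    private final Writer writer;

    BoundPolicy(Writer writer) {
      this.writer = Objects.requireNonNull(writer, "writer");
    }

    /** New method: uses the writer supplied at construction. */
    String findMerges() {
      return "merges for " + writer.name;
    }

    /**
     * Old signature kept only for back-compat.
     *
     * @deprecated use {@link #findMerges()}; the writer is fixed at construction.
     */
    @Deprecated
    String findMerges(Writer w) {
      // Assumed design choice: refuse a different writer instead of silently switching.
      if (w != writer) {
        throw new IllegalArgumentException("policy is bound to writer " + writer.name);
      }
      return findMerges();
    }
  }

  public static void main(String[] args) {
    Writer w1 = new Writer("w1");
    BoundPolicy policy = new BoundPolicy(w1);
    System.out.println(policy.findMerges(w1)); // existing callers keep working
    try {
      policy.findMerges(new Writer("w2"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Once the deprecated overload is removed in a later release, only the no-argument method remains and the member is the single source of truth.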
[jira] Updated: (LUCENE-1758) improve arabic analyzer: light8 -> light10
[ https://issues.apache.org/jira/browse/LUCENE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1758: Attachment: LUCENE-1758.patch Also updated the stopwords list, which was in need of much improvement. > improve arabic analyzer: light8 -> light10 > -- > > Key: LUCENE-1758 > URL: https://issues.apache.org/jira/browse/LUCENE-1758 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/analyzers > Reporter: Robert Muir > Priority: Minor > Attachments: LUCENE-1758.patch, LUCENE-1758.txt > > > Someone mentioned on the java-user list that the Arabic analysis was not as good as they would like. > This patch adds the لل- prefix (the light10 algorithm versus the light8 algorithm). > In the light10 paper, this improves precision from .390 to .413. > The authors mention this is not statistically significant, but it makes linguistic sense and has at least been shown not to hurt. > In the future, I hope openrelevance will allow us to try some more approaches.
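The light8 to light10 change amounts to one extra entry (لل, lam-lam) in the prefix list. The sketch below is only an illustration under stated assumptions: the prefix lists, their order, the "strip at most one prefix" rule and the minimum-remaining-length guards reflect one reading of the light stemming rules, and ArabicPrefixSketch is a hypothetical class, not Lucene's ArabicStemmer. Letters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; NOT Lucene's ArabicStemmer. Prefix lists, order and
// length guards are assumptions based on a reading of the light8/light10 rules.
public class ArabicPrefixSketch {
  static final String ALEF = "\u0627", LAM = "\u0644", WAW = "\u0648",
      BEH = "\u0628", KAF = "\u0643", FEH = "\u0641";

  // light8-style prefixes: the definite article, its attached-particle forms, and lone waw.
  static final List<String> LIGHT8 = Arrays.asList(
      ALEF + LAM, WAW + ALEF + LAM, BEH + ALEF + LAM, KAF + ALEF + LAM, FEH + ALEF + LAM, WAW);

  // light10-style prefixes: the same list with lam-lam added before the lone waw.
  static final List<String> LIGHT10 = Arrays.asList(
      ALEF + LAM, WAW + ALEF + LAM, BEH + ALEF + LAM, KAF + ALEF + LAM, FEH + ALEF + LAM,
      LAM + LAM, WAW);

  /** Strips at most one prefix: the first listed one that matches and leaves enough letters. */
  static String stripPrefix(String word, List<String> prefixes) {
    for (String p : prefixes) {
      int minRemaining = p.length() == 1 ? 3 : 2; // a lone waw requires a longer remainder
      if (word.startsWith(p) && word.length() - p.length() >= minRemaining) {
        return word.substring(p.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    String kitab = KAF + "\u062A" + ALEF + BEH; // "kitab" (book)
    String lilKitab = LAM + LAM + kitab;        // "lil-kitab" (for the book)
    System.out.println(stripPrefix(lilKitab, LIGHT8).equals(lilKitab)); // light8 has no lam-lam entry
    System.out.println(stripPrefix(lilKitab, LIGHT10).equals(kitab));   // light10 strips lam-lam
  }
}
```

With the light8-style list, لل followed by كتاب is left untouched; with the light10-style list it reduces to كتاب, the same stem that الكتاب yields under both lists.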
[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735444#action_12735444 ] Robert Muir commented on LUCENE-1460: - Michael, I looked at your patch. What do you think about the remaining ones? Should they be left as is for now, or do you think some of them should still expose Token (i.e. in their public/protected methods) purely for back-compat/convenience, while working with the new API behind the scenes? > Change all contrib TokenStreams/Filters to use the new TokenStream API > -- > > Key: LUCENE-1460 > URL: https://issues.apache.org/jira/browse/LUCENE-1460 > Project: Lucene - Java > Issue Type: Task >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.9 > > Attachments: lucene-1460.patch, lucene-1460.patch, lucene-1460.patch, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt > > > Now that we have the new TokenStream API (LUCENE-1422) we should change all > contrib modules to use it.
[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1460: Attachment: LUCENE-1460.patch with analyzers/compound > Change all contrib TokenStreams/Filters to use the new TokenStream API > -- > > Key: LUCENE-1460 > URL: https://issues.apache.org/jira/browse/LUCENE-1460 > Project: Lucene - Java > Issue Type: Task >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1460.patch, lucene-1460.patch, lucene-1460.patch, > lucene-1460.patch, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_core.txt, LUCENE-1460_partial.txt > > > Now that we have the new TokenStream API (LUCENE-1422) we should change all > contrib modules to use it.
Re: MergePolicy and IndexWriter methods argument
I'll open an issue and work out a patch, though this deprecation stuff is what I was worried about: it always tends to expand more than I plan :). Shai On Sun, Jul 26, 2009 at 9:44 PM, Michael McCandless <luc...@mikemccandless.com> wrote: