[jira] [Commented] (LUCENE-6584) Docs on StandardTokenizer don't mention the behaviour change in Version.LUCENE_4_7_0

2015-06-19 Thread Daniel Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593306#comment-14593306
 ] 

Daniel Collins commented on LUCENE-6584:


I think the point is that in Lucene 4.7, this update was made:

{quote}
LUCENE-5357: Upgrade StandardTokenizer and UAX29URLEmailTokenizer to Unicode 
6.3; update UAX29URLEmailTokenizer's recognized top level domains in URLs and 
Emails from the IANA Root Zone Database. 
{quote}

but that never made it to the Javadoc page.

 Docs on StandardTokenizer don't mention the behaviour change in 
 Version.LUCENE_4_7_0
 

              Key: LUCENE-6584
              URL: https://issues.apache.org/jira/browse/LUCENE-6584
          Project: Lucene - Core
       Issue Type: Bug
       Components: modules/analysis
 Affects Versions: 4.10.4
         Reporter: Trejkaz
         Priority: Minor

 The following test shows that the behaviour of StandardTokenizer differs once 
 you start passing Version.LUCENE_4_7_0 or greater:
 {code}
 import java.io.StringReader;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.standard.StandardTokenizer;
 import org.apache.lucene.util.Version;
 import org.junit.Test;
 import static org.hamcrest.Matchers.is;
 import static org.junit.Assert.assertThat;

 public class TestStandardTokenizerStandalone
 {
     @Test
     public void testLucene4_6_1() throws Exception
     {
         doTest(Version.LUCENE_4_6_1);
     }

     @Test
     public void testLucene4_7_0() throws Exception
     {
         doTest(Version.LUCENE_4_7_0);
     }

     public void doTest(Version version) throws Exception
     {
         // Tokenize one long run of 'x' characters and expect no token to be emitted.
         try (TokenStream stream = new StandardTokenizer(version,
                 new StringReader(makeLongString(2550))))
         {
             stream.reset();
             assertThat(stream.incrementToken(), is(false));
         }
     }

     // Builds a string of the given length consisting only of 'x'.
     private String makeLongString(int length)
     {
         StringBuilder builder = new StringBuilder(length);
         for (int i = 0; i < length; i++)
         {
             builder.append('x');
         }
         return builder.toString();
     }
 }
 {code}
 However, the Javadoc only mentions the behaviour changes in versions 3.1 and 
 3.4.
 The constructor for passing the version is deprecated, presumably under the 
 false impression that no changes occurred during Lucene 4. I know the Version 
 parameter was killed off entirely in version 5, which presumably means that 
 people who tokenised stuff in Lucene 4.6 or earlier have now been trapped and 
 have to copy the tokeniser from Lucene 4 to keep their queries working.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6584) Docs on StandardTokenizer don't mention the behaviour change in Version.LUCENE_4_7_0

2015-06-18 Thread Trejkaz (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592934#comment-14592934
 ] 

Trejkaz commented on LUCENE-6584:

http://lucene.apache.org/core/4_10_4/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html




[jira] [Commented] (LUCENE-6584) Docs on StandardTokenizer don't mention the behaviour change in Version.LUCENE_4_7_0

2015-06-18 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592827#comment-14592827
 ] 

Ryan Ernst commented on LUCENE-6584:


StandardTokenizerFactory now handles versioning, as with other analysis 
components: pass luceneMatchVersion in the factory args. You can also 
construct the legacy tokenizer directly: 
{{org.apache.lucene.analysis.standard.std40.StandardTokenizer40}}
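
For readers unfamiliar with version gating, the idea can be sketched roughly as below. This is only an illustration, not Lucene's actual factory code: the class VersionGateSketch, its nested MatchVersion type and the chooseImplementation method are all hypothetical names, and the 4.7.0 cut-over is assumed from the test in this issue. It uses only the JDK, so it returns the name of the implementation that would be picked rather than constructing a tokenizer.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of version-gated selection; not Lucene's real code.
final class VersionGateSketch {

    // Minimal stand-in for a major.minor.bugfix version (assumption, not Lucene's Version class).
    static final class MatchVersion implements Comparable<MatchVersion> {
        final int major;
        final int minor;
        final int bugfix;

        MatchVersion(int major, int minor, int bugfix) {
            this.major = major;
            this.minor = minor;
            this.bugfix = bugfix;
        }

        // Parses strings such as "4.6.1"; rejects anything without exactly three parts.
        static MatchVersion parse(String text) {
            String[] parts = text.split("\\.");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected major.minor.bugfix: " + text);
            }
            return new MatchVersion(Integer.parseInt(parts[0]),
                    Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        }

        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }

        // Numeric comparison, so 4.10.4 sorts after 4.7.0.
        @Override
        public int compareTo(MatchVersion o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(bugfix, o.bugfix);
        }
    }

    // Assumed cut-over point, taken from the behaviour change reported in this issue.
    static final MatchVersion CUT_OVER = new MatchVersion(4, 7, 0);
    static final String VERSION_ARG = "luceneMatchVersion";

    // Returns the name of the tokenizer class a version-aware factory might pick.
    static String chooseImplementation(Map<String, String> args) {
        String raw = args.get(VERSION_ARG);
        if (raw == null) {
            throw new IllegalArgumentException("missing argument: " + VERSION_ARG);
        }
        return MatchVersion.parse(raw).onOrAfter(CUT_OVER)
                ? "StandardTokenizer"
                : "StandardTokenizer40";
    }

    public static void main(String[] unused) {
        // prints "StandardTokenizer40"
        System.out.println(chooseImplementation(Collections.singletonMap(VERSION_ARG, "4.6.1")));
        // prints "StandardTokenizer"
        System.out.println(chooseImplementation(Collections.singletonMap(VERSION_ARG, "4.7.0")));
    }
}
```

The version is compared numerically rather than as a string so that a value like 4.10.4 is treated as later than 4.7.0.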

To which javadocs are you referring?
