[jira] [Updated] (LUCENE-8186) CustomAnalyzer with a LowerCaseTokenizerFactory fails to normalize multiterms

2018-03-04 Thread Robert Muir (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-8186:

Attachment: LUCENE-8186.patch

> CustomAnalyzer with a LowerCaseTokenizerFactory fails to normalize multiterms 
> --
>
> Key: LUCENE-8186
> URL: https://issues.apache.org/jira/browse/LUCENE-8186
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LUCENE-8186.patch
>
>
> While working on SOLR-12034, a unit test that relied on the 
> LowerCaseTokenizerFactory failed.
> After some digging, I was able to replicate this at the Lucene level.
> Unit test:
> {noformat}
>   @Test
>   public void testLCTokenizerFactoryNormalize() throws Exception {
> Analyzer analyzer = CustomAnalyzer.builder().withTokenizer(LowerCaseTokenizerFactory.class).build();
> //fails
> assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));
> 
> //now try an integration test with the classic query parser
> QueryParser p = new QueryParser("f", analyzer);
> Query q = p.parse("Hello");
> //passes
> assertEquals(new TermQuery(new Term("f", "hello")), q);
> q = p.parse("Hello*");
> //fails
> assertEquals(new PrefixQuery(new Term("f", "hello")), q);
> q = p.parse("Hel*o");
> //fails
> assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
>   }
> {noformat}
> The problem is that CustomAnalyzer#normalize iterates through the token filters 
> but never calls the tokenizer, which, in the case of LowerCaseTokenizer, is 
> where the lowercasing actually happens.
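For the wildcard cases above, the expectation is that the parser lowercases each literal fragment of the term while leaving the wildcard characters themselves intact ("Hel*o" should become "hel*o"). A self-contained sketch of that fragment-wise normalization, with no Lucene dependency (the class and method names here are hypothetical, not Lucene API):

```java
import java.util.function.UnaryOperator;

public class WildcardNormalizeSketch {
    // Normalize each literal fragment of a wildcard term, preserving * and ?.
    static String normalizeWildcard(String term, UnaryOperator<String> normalize) {
        StringBuilder out = new StringBuilder();
        StringBuilder fragment = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                out.append(normalize.apply(fragment.toString()));
                fragment.setLength(0);
                out.append(c); // wildcard characters pass through untouched
            } else {
                fragment.append(c);
            }
        }
        out.append(normalize.apply(fragment.toString()));
        return out.toString();
    }

    public static void main(String[] args) {
        UnaryOperator<String> lower = s -> s.toLowerCase();
        System.out.println(normalizeWildcard("Hello*", lower)); // hello*
        System.out.println(normalizeWildcard("Hel*o", lower));  // hel*o
    }
}
```

This is only an illustration of the contract the failing assertions rely on; the real work in Lucene happens inside the query parser's per-fragment calls to the analyzer's normalize method.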



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8186) CustomAnalyzer with a LowerCaseTokenizerFactory fails to normalize multiterms

2018-02-26 Thread Tim Allison (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated LUCENE-8186:

Description: 
While working on SOLR-12034, a unit test that relied on the 
LowerCaseTokenizerFactory failed.

After some digging, I was able to replicate this at the Lucene level.

Unit test:
{noformat}
  @Test
  public void testLCTokenizerFactoryNormalize() throws Exception {

Analyzer analyzer = CustomAnalyzer.builder().withTokenizer(LowerCaseTokenizerFactory.class).build();

//fails
assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));

//now try an integration test with the classic query parser
QueryParser p = new QueryParser("f", analyzer);
Query q = p.parse("Hello");
//passes
assertEquals(new TermQuery(new Term("f", "hello")), q);

q = p.parse("Hello*");
//fails
assertEquals(new PrefixQuery(new Term("f", "hello")), q);

q = p.parse("Hel*o");
//fails
assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
  }
{noformat}

The problem is that CustomAnalyzer#normalize iterates through the token filters 
but never calls the tokenizer, which, in the case of LowerCaseTokenizer, is 
where the lowercasing actually happens.
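The mechanism can be sketched without any Lucene dependency: if normalization replays only the token-filter chain, then character-level work that lives inside the tokenizer itself (as lowercasing does in LowerCaseTokenizer) is silently skipped. A minimal, self-contained illustration (all class and method names are hypothetical stand-ins, not Lucene API):

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class NormalizeSketch {
    // Stand-in for an analyzer whose tokenizer does char-level work
    // (like LowerCaseTokenizer) in addition to a token-filter chain.
    final UnaryOperator<String> tokenizerSideNormalize;
    final List<UnaryOperator<String>> filters;

    NormalizeSketch(UnaryOperator<String> tokenizerSideNormalize,
                    List<UnaryOperator<String>> filters) {
        this.tokenizerSideNormalize = tokenizerSideNormalize;
        this.filters = filters;
    }

    // Buggy behavior: only the filter chain runs, so tokenizer-side
    // lowercasing is lost.
    String normalizeFiltersOnly(String term) {
        String s = term;
        for (UnaryOperator<String> f : filters) {
            s = f.apply(s);
        }
        return s;
    }

    // Fixed behavior: the tokenizer's normalization step must run as well.
    String normalizeWithTokenizer(String term) {
        String s = tokenizerSideNormalize.apply(term);
        for (UnaryOperator<String> f : filters) {
            s = f.apply(s);
        }
        return s;
    }

    public static void main(String[] args) {
        // LowerCaseTokenizer analogue: lowercasing happens in the tokenizer,
        // and the filter chain is empty.
        NormalizeSketch a = new NormalizeSketch(s -> s.toLowerCase(), List.of());
        System.out.println(a.normalizeFiltersOnly("Hello"));   // Hello (the reported bug)
        System.out.println(a.normalizeWithTokenizer("Hello")); // hello
    }
}
```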

  was:
While working on SOLR-12034, a unit test that relied on the 
LowerCaseTokenizerFactory failed.

After some digging, I was able to replicate this at the Lucene level.

Unit test:
{noformat}
  @Test
  public void testLCTokenizerFactoryNormalize() throws Exception {

Analyzer analyzer = CustomAnalyzer.builder().withTokenizer(new LowerCaseTokenizerFactory(Collections.EMPTY_MAP)).build();

//fails
assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));

//now try an integration test with the classic query parser
QueryParser p = new QueryParser("f", analyzer);
Query q = p.parse("Hello");
//passes
assertEquals(new TermQuery(new Term("f", "hello")), q);

q = p.parse("Hello*");
//fails
assertEquals(new PrefixQuery(new Term("f", "hello")), q);

q = p.parse("Hel*o");
//fails
assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
  }
{noformat}

The problem is that the CustomAnalyzer iterates through the tokenfilters, but 
does not call the tokenizer, which, in the case of the LowerCaseTokenizer, does 
the filtering work.








[jira] [Updated] (LUCENE-8186) CustomAnalyzer with a LowerCaseTokenizerFactory fails to normalize multiterms

2018-02-26 Thread Tim Allison (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated LUCENE-8186:

Description: 
While working on SOLR-12034, a unit test that relied on the 
LowerCaseTokenizerFactory failed.

After some digging, I was able to replicate this at the Lucene level.

Unit test:
{noformat}
  @Test
  public void testLCTokenizerFactoryNormalize() throws Exception {

Analyzer analyzer = CustomAnalyzer.builder().withTokenizer(new LowerCaseTokenizerFactory(Collections.EMPTY_MAP)).build();

//fails
assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));

//now try an integration test with the classic query parser
QueryParser p = new QueryParser("f", analyzer);
Query q = p.parse("Hello");
//passes
assertEquals(new TermQuery(new Term("f", "hello")), q);

q = p.parse("Hello*");
//fails
assertEquals(new PrefixQuery(new Term("f", "hello")), q);

q = p.parse("Hel*o");
//fails
assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
  }
{noformat}

The problem is that CustomAnalyzer#normalize iterates through the token filters 
but never calls the tokenizer, which, in the case of LowerCaseTokenizer, is 
where the lowercasing actually happens.

  was:
While working on SOLR-12034, a unit test that relied on the 
LowerCaseTokenizerFactory failed.

After some digging, I was able to replicate this at the Lucene level.

Unit test:
{noformat}
  @Test
  public void testLCTokenizerFactoryNormalize() throws Exception {

Analyzer analyzer = CustomAnalyzer.builder().withTokenizer(new LowerCaseTokenizerFactory(Collections.EMPTY_MAP)).build();

//fails
assertEquals(new BytesRef("hello"), analyzer.normalize("f", "Hello"));

//now try an integration test with the classic query parser
QueryParser p = new QueryParser("f", analyzer);
Query q = p.parse("Hello");
//passes
assertEquals(new TermQuery(new Term("f", "hello")), q);

q = p.parse("Hello*");
//fails
assertEquals(new PrefixQuery(new Term("f", "hello")), q);

q = p.parse("Hel*o");
//fails
assertEquals(new WildcardQuery(new Term("f", "hel*o")), q);
  }
{noformat}

The problem is that the CustomAnalyzer iterates through the tokenfilters, but 
does not call the tokenizer, which, in the case of the LowerCaseAnalyzer, does 
the filtering work.




