[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4656:
-

Attachment: LUCENE-4656.patch

Patch. I wasn't sure whether to add a CharTermAttribute to EmptyTokenizer or to 
try fixing BaseTokenStreamTestCase. I couldn't think of a non-trivial 
tokenizer that wouldn't have a CharTermAttribute, so I left in place the 
assertion that checks that a token stream always has a CharTermAttribute.

 Fix EmptyTokenizer
 --

 Key: LUCENE-4656
 URL: https://issues.apache.org/jira/browse/LUCENE-4656
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Trivial
 Attachments: LUCENE-4656.patch


 TestRandomChains can fail because EmptyTokenizer doesn't have a 
 CharTermAttribute and doesn't compute the end offset (if the offset attribute 
 was added by a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4656:
-

Attachment: LUCENE-4656.patch

Alternative patch that fixes BaseTokenStreamTestCase. I needed a quick hack 
that adds a TermToBytesRefAttribute when the token stream doesn't have one, so 
that TermsHashPerField doesn't complain during indexing that it can't find 
this attribute.




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656-IW-bug.patch

Here is a patch showing the bug that occurs when the public class 
EmptyTokenStream from analysis-common is used together with IndexWriter.

It also has a test verifying that assertTokenStreamContents actually works.




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656-IW-fix.patch

Here is the fix that solves the DocInverterPerField issue (it also removes the 
horrible for(;;) loop whose first statement is an if ... break).

Now only BaseTokenStreamTestCase still needs to handle the missing 
attribute: it should *only* complain when tokens are actually emitted.
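
To illustrate the "only complain when tokens are actually emitted" idea, here 
is a minimal sketch in plain Java. It is not the attached patch and does not 
use Lucene's API: ToyStream, its attribute map, the "term" key, and consume() 
are all hypothetical stand-ins chosen for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyAttributeCheck {

    /** Toy stand-in for a token stream: a bag of attributes plus the tokens it will emit. */
    static final class ToyStream {
        final Map<String, Object> attributes = new HashMap<>();
        final List<String> tokens;

        ToyStream(List<String> tokens) {
            this.tokens = tokens;
        }
    }

    /**
     * Consumes the stream and returns the number of tokens seen.
     * The (hypothetical) "term" attribute is demanded only once a token shows up,
     * so a stream that emits nothing is accepted even without that attribute.
     */
    static int consume(ToyStream stream) {
        int seen = 0;
        for (String token : stream.tokens) {
            if (!stream.attributes.containsKey("term")) {
                throw new IllegalStateException(
                    "stream emitted token '" + token + "' but has no term attribute");
            }
            seen++;
        }
        return seen;
    }
}
```

With this shape, an empty stream without the attribute passes, while a stream 
that emits at least one token without it fails on the first token.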




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656-IW-fix.patch

Better patch: it uses do...while, which is more readable.
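
For readers without the patch at hand, here is a toy comparison of the two 
loop shapes. This is not the DocInverterPerField code: LoopShapes and both 
method names are hypothetical, and a plain java.util.Iterator stands in for 
the token stream.

```java
import java.util.Iterator;

class LoopShapes {

    /** Old shape: an endless for(;;) whose first statement is the exit test. */
    static int countWithForBreak(Iterator<String> tokens) {
        int count = 0;
        boolean hasMore = tokens.hasNext();
        for (;;) {
            if (!hasMore) break;
            tokens.next();
            count++;
            hasMore = tokens.hasNext();
        }
        return count;
    }

    /** New shape: guard once, then loop with do...while. */
    static int countWithDoWhile(Iterator<String> tokens) {
        int count = 0;
        if (tokens.hasNext()) {
            // Per-stream setup that only makes sense when there is at least
            // one token would go here, before the loop body runs.
            do {
                tokens.next();
                count++;
            } while (tokens.hasNext());
        }
        return count;
    }
}
```

Both methods return the same count for any input. The do...while version makes 
the "is there a first token at all?" decision explicit and keeps the exit 
condition at the bottom of the loop.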




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4656:


Attachment: LUCENE-4656_bttc.patch

Here's a patch for BaseTokenStreamTestCase. I think it should work for this 
EmptyTokenizer too.




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656.patch

New patch merged with Adrien's. I am not sure whether the fix in 
BaseTokenStreamTestCase is correct, because if you pass the String[] you expect 
tokens, and the fix is different from the one for offsets or position increments.




[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656.patch

Patch merged with Robert's.

 Fix EmptyTokenizer
 --

 Key: LUCENE-4656
 URL: https://issues.apache.org/jira/browse/LUCENE-4656
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Reporter: Adrien Grand
Assignee: Uwe Schindler
Priority: Trivial
 Attachments: LUCENE-4656_bttc.patch, LUCENE-4656-IW-bug.patch, 
 LUCENE-4656-IW-fix.patch, LUCENE-4656-IW-fix.patch, LUCENE-4656.patch, 
 LUCENE-4656.patch, LUCENE-4656.patch, LUCENE-4656.patch





[jira] [Updated] (LUCENE-4656) Fix EmptyTokenizer

2013-01-03 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4656:
--

Attachment: LUCENE-4656.patch

Added a check that the document is really in the IndexWriter (IW) after indexing.
