[jira] [Commented] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-04-24 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640816#comment-13640816 ]

Tim Allison commented on LUCENE-4880:
-

Thank you!

 Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

              Key: LUCENE-4880
              URL: https://issues.apache.org/jira/browse/LUCENE-4880
          Project: Lucene - Core
       Issue Type: Bug
       Components: core/index
 Affects Versions: 4.2
      Environment: Windows 7 (probably irrelevant)
         Reporter: Tim Allison
          Fix For: 5.0, 4.3
      Attachments: LUCENE-4880.patch, MemoryIndexVsRamDirZeroLengthTermTest.java


 MemoryIndex skips tokens that have length == 0 when building the index; the 
 result is that it does not increment the token offset (nor does it store the 
 position offsets if that option is set) for tokens of length == 0.  A regular 
 index (via, say, RAMDirectory) does not appear to do this.
 When using the ICUFoldingFilter, it is possible to have a term of zero length 
 (the \u0640 character separated by spaces).  If that occurs in a document, 
 the offsets returned at search time differ between the MemoryIndex and a 
 regular index.  
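
 The effect described above can be illustrated with a small, self-contained
 sketch. This is not Lucene code: the Token and Posting classes, the index()
 method, and the sample offsets are hypothetical stand-ins, and the sketch
 assumes that the skip happens before any position accounting, as the
 description implies. It shows only why dropping an empty term changes what a
 reader sees for the terms that follow it.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroLengthTermSketch {

    /** Hypothetical token: term text plus start/end character offsets. */
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;

        public Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical index entry: one term occurrence with its position and offsets. */
    public static final class Posting {
        public final String term;
        public final int position;
        public final int start;
        public final int end;

        public Posting(String term, int position, int start, int end) {
            this.term = term;
            this.position = position;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "'" + term + "'@pos" + position + "[" + start + "," + end + ")";
        }
    }

    /**
     * Toy indexer. With skipEmptyTerms == true, a zero-length term is dropped
     * before the position counter advances and before its offsets are stored.
     * With skipEmptyTerms == false, every token is recorded.
     */
    public static List<Posting> index(List<Token> tokens, boolean skipEmptyTerms) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        for (Token t : tokens) {
            if (skipEmptyTerms && t.term.isEmpty()) {
                continue; // no position increment, no offsets recorded
            }
            position++;
            postings.add(new Posting(t.term, position, t.start, t.end));
        }
        return postings;
    }

    /** Tokens for the text "a X b", where X is a character folded to an empty term. */
    public static List<Token> sampleTokens() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("a", 0, 1));
        tokens.add(new Token("", 2, 3)); // folded to zero length
        tokens.add(new Token("b", 4, 5));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("skip empty terms: " + index(sampleTokens(), true));
        System.out.println("keep empty terms: " + index(sampleTokens(), false));
    }
}
```

 In this toy model the term "b" lands at position 1 when empty terms are
 skipped and at position 2 when they are kept, so anything that depends on
 positions or stored offsets would disagree between the two indexes.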

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-03-26 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614073#comment-13614073 ]

Robert Muir commented on LUCENE-4880:
-

Thanks for raising this, Timothy.

I think it's a bug in MemoryIndex: it shouldn't skip terms that are of zero 
length.




[jira] [Commented] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-03-26 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614080#comment-13614080 ]

Uwe Schindler commented on LUCENE-4880:
---

Yes, I think this is a bug in MemoryIndex. In earlier Lucene versions I think 
we skipped empty terms in the standard IndexWriter, but that's no longer the 
case. So MemoryIndex must be consistent.




[jira] [Commented] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory

2013-03-26 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614086#comment-13614086 ]

Robert Muir commented on LUCENE-4880:
-

I also think it's stupid that you get \u0640 as a token by itself in any case. 
I don't agree with the Unicode property of letter for this character, as that 
doesn't make sense to me; in my opinion it should be format. I sure hope there 
is some good reason for this, but to me it's crazy.
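
For reference, the property in question can be checked from plain Java. The 
sketch below is only an illustration (the class and method names are made up 
for this example); it relies on the JDK's own Unicode tables, in which U+0640 
ARABIC TATWEEL has general category Lm (modifier letter) rather than Cf 
(format).

```java
public class TatweelCategory {

    /** U+0640 ARABIC TATWEEL, the character discussed in this issue. */
    public static final char TATWEEL = '\u0640';

    /** True if the JDK treats the character as a letter of any kind. */
    public static boolean isLetter() {
        return Character.isLetter(TATWEEL);
    }

    /** True if the general category is Lm (modifier letter). */
    public static boolean isModifierLetter() {
        return Character.getType(TATWEEL) == Character.MODIFIER_LETTER;
    }

    /** True if the general category is Cf (format). */
    public static boolean isFormat() {
        return Character.getType(TATWEEL) == Character.FORMAT;
    }

    public static void main(String[] args) {
        System.out.println("isLetter:         " + isLetter());
        System.out.println("isModifierLetter: " + isModifierLetter());
        System.out.println("isFormat:         " + isFormat());
    }
}
```

Because the character counts as a letter, a letter-based tokenizer can emit 
it as a token of its own, and a folding filter that removes it then leaves 
a zero-length term behind.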
