[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-07-18, ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712377#comment-13712377 ] ASF subversion and git services commented on LUCENE-5030: Commit
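The issue title's point can be illustrated with a small, self-contained Java sketch. This is not Lucene code, and the class and method names are hypothetical. It computes plain Levenshtein distance (transpositions are not counted here) twice: once over UTF-8 byte labels and once over Unicode code point labels. For English text the two distances agree, because each letter is one byte. A single Cyrillic substitution or insertion, however, can cost two byte edits, so a byte-labelled automaton with an edit budget of one would miss a match that is only one letter away.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCodePointDistance {

    // Plain Levenshtein distance (insert, delete, substitute each cost 1)
    // over two label sequences; transpositions are not counted.
    static int levenshtein(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: prev now holds the finished row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // One label per UTF-8 byte (0..255), as in a byte-labelled automaton.
    static int[] utf8Labels(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF;
        }
        return labels;
    }

    // One label per Unicode code point.
    static int[] codePointLabels(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"cat", "cut"},                                      // English: one letter substituted
            {"\u043c\u0430\u043b", "\u043c\u044f\u043b"},        // Cyrillic "mal" vs "myal": one letter substituted
            {"\u043a\u043e\u0442", "\u043a\u043e\u0442\u044b"}   // Cyrillic "kot" vs "koty": one letter appended
        };
        for (String[] p : pairs) {
            System.out.printf("bytes=%d codePoints=%d%n",
                levenshtein(utf8Labels(p[0]), utf8Labels(p[1])),
                levenshtein(codePointLabels(p[0]), codePointLabels(p[1])));
        }
    }
}
```

Running `main` prints `bytes=1 codePoints=1`, then `bytes=2 codePoints=1` twice. The Cyrillic letters a (U+0430, bytes D0 B0) and ya (U+044F, bytes D1 8F) differ in both UTF-8 bytes. Not every Cyrillic substitution costs two byte edits, though: letters that share a lead byte cost one.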

2013-07-18, ASF subversion and git services (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712380#comment-13712380 ] ASF subversion and git services commented on LUCENE-5030: Commit

2013-07-18, Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712408#comment-13712408 ] Uwe Schindler commented on LUCENE-5030: JUH! :-) Thanks for heavy committing -

2013-07-18, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712488#comment-13712488 ] Artem Lukanin commented on LUCENE-5030: Great! Thanks for reviewing.

2013-07-17, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710797#comment-13710797 ] Artem Lukanin commented on LUCENE-5030: Then I have to override (and copy a lot of

2013-07-17, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710889#comment-13710889 ] Artem Lukanin commented on LUCENE-5030: Michael, I got your idea. I will refactor

2013-07-17, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711050#comment-13711050 ] Michael McCandless commented on LUCENE-5030: Patch looks great! Thanks

2013-07-16, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709695#comment-13709695 ] Michael McCandless commented on LUCENE-5030: Sorry for the long delay here

2013-07-03, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698951#comment-13698951 ] Michael McCandless commented on LUCENE-5030: Maybe we should rename

2013-07-03, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699074#comment-13699074 ] Michael McCandless commented on LUCENE-5030: Hmm also ant precommit is

2013-07-03, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699304#comment-13699304 ] Artem Lukanin commented on LUCENE-5030: In ant precommit I get this error:

2013-07-03, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699329#comment-13699329 ] Michael McCandless commented on LUCENE-5030: OK no problem, I can fix it.

2013-07-02, Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697704#comment-13697704 ] Robert Muir commented on LUCENE-5030: {noformat} + /** Include this flag in the

2013-07-01, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697173#comment-13697173 ] Michael McCandless commented on LUCENE-5030: I plan to commit the last patch

2013-07-01, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697537#comment-13697537 ] Artem Lukanin commented on LUCENE-5030: Cool! FuzzySuggester has

2013-06-27, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694610#comment-13694610 ] Artem Lukanin commented on LUCENE-5030: BTW, for your {code}// TODO: is there a

2013-06-27, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694801#comment-13694801 ] Michael McCandless commented on LUCENE-5030: Thanks Artem, new patch looks

2013-06-26, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693931#comment-13693931 ] Michael McCandless commented on LUCENE-5030: Hmm, testStolenBytes should be

2013-06-25, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693227#comment-13693227 ] Michael McCandless commented on LUCENE-5030: Thanks Artem! I don't

2013-06-24, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691744#comment-13691744 ] Artem Lukanin commented on LUCENE-5030: I have added UNICODE_AWARE option in

2013-06-21, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690102#comment-13690102 ] Artem Lukanin commented on LUCENE-5030: I'm uploading 3 results of benchmarking:

2013-06-21, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690124#comment-13690124 ] Artem Lukanin commented on LUCENE-5030: OK, I will add a new option UNICODE_AWARE

2013-06-21, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690183#comment-13690183 ] Michael McCandless commented on LUCENE-5030: I'm a little confused by the

2013-06-21, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690614#comment-13690614 ] Michael McCandless commented on LUCENE-5030: Oh, duh, the conversion from

2013-06-20, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688924#comment-13688924 ] Artem Lukanin commented on LUCENE-5030: I ran this command: {code}ant

2013-06-20, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689231#comment-13689231 ] Artem Lukanin commented on LUCENE-5030: OK, in general the performance is worse

2013-06-20, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689269#comment-13689269 ] Michael McCandless commented on LUCENE-5030: Hmm can you post the full output

2013-06-20, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689296#comment-13689296 ] Artem Lukanin commented on LUCENE-5030: The last patch with INFO_SEP/2 was posted

2013-06-20, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689323#comment-13689323 ] Michael McCandless commented on LUCENE-5030: Oh, woops, I missed it. Thanks.

2013-06-19, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687822#comment-13687822 ] Artem Lukanin commented on LUCENE-5030: I see that some tests in

2013-06-19, Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687902#comment-13687902 ] Robert Muir commented on LUCENE-5030: I don't think changing SEP_LABEL from a single

2013-06-19, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687917#comment-13687917 ] Artem Lukanin commented on LUCENE-5030: Possibly we should change it to INFO_SEP2

2013-06-19, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688185#comment-13688185 ] Michael McCandless commented on LUCENE-5030: The easy performance tester to

2013-06-18, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686447#comment-13686447 ] Artem Lukanin commented on LUCENE-5030: You already have private static final int

2013-06-17, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685469#comment-13685469 ] Michael McCandless commented on LUCENE-5030: Hmm POS_SEP and HOLE are still
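Switching labels from bytes to code points also changes which special label values are safe; the snippets in this thread mention POS_SEP, HOLE, SEP_LABEL and INFO_SEP. The actual values Lucene uses are not shown here, so the constants in this Java sketch are hypothetical. A reserved value just above the byte range (0x100 or 0x101) can never equal a UTF-8 byte label, because those labels are always 0..255. But 0x100 is also a valid code point (U+0100, Latin capital A with macron), so it becomes ambiguous once labels are code points. ASCII control characters such as U+001F and U+001E avoid that particular clash, at the cost of reserving those characters in the input.

```java
import java.nio.charset.StandardCharsets;

public class ReservedLabelCheck {

    // Hypothetical reserved labels just above the byte range 0..255.
    static final int SEP_ABOVE_BYTES = 0x100;
    static final int HOLE_ABOVE_BYTES = 0x101;

    // Hypothetical reserved labels taken from ASCII control characters
    // (U+001F INFORMATION SEPARATOR ONE, U+001E INFORMATION SEPARATOR TWO).
    static final int SEP_CONTROL = 0x1F;
    static final int HOLE_CONTROL = 0x1E;

    // True if any label equals one of the reserved values.
    private static boolean containsAny(int[] labels, int... reserved) {
        for (int label : labels) {
            for (int r : reserved) {
                if (label == r) {
                    return true;
                }
            }
        }
        return false;
    }

    // Code point labels: a reserved value that is also a real code point is ambiguous.
    static boolean collides(String input, int... reserved) {
        return containsAny(input.codePoints().toArray(), reserved);
    }

    // UTF-8 byte labels always fall in 0..255.
    static boolean collidesAsBytes(String input, int... reserved) {
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF;
        }
        return containsAny(labels, reserved);
    }

    public static void main(String[] args) {
        String word = "\u0100dam"; // starts with U+0100, Latin capital A with macron
        System.out.println(collidesAsBytes(word, SEP_ABOVE_BYTES, HOLE_ABOVE_BYTES)); // false
        System.out.println(collides(word, SEP_ABOVE_BYTES, HOLE_ABOVE_BYTES));        // true
        System.out.println(collides(word, SEP_CONTROL, HOLE_CONTROL));                // false
    }
}
```

The control-character choice only moves the problem: an input that really contains U+001F would still collide. A suggester using such labels would have to reject or escape those characters.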

2013-06-17, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685481#comment-13685481 ] Artem Lukanin commented on LUCENE-5030: Sorry for autoformatting, I will upload

2013-06-17, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685516#comment-13685516 ] Artem Lukanin commented on LUCENE-5030: BTW, if I replace it with

2013-06-17, Artem Lukanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685551#comment-13685551 ] Artem Lukanin commented on LUCENE-5030: I see, the patch still has autoformatting

2013-06-17, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685567#comment-13685567 ] Michael McCandless commented on LUCENE-5030: Oh, right, we can't just use

2013-06-13, Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682162#comment-13682162 ] Michael McCandless commented on LUCENE-5030: Thanks Artem! I think we need