[jira] [Updated] (LUCENE-5838) hunspell buggy with over 64k affixes

2014-07-21 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5838:


Attachment: LUCENE-5838.patch

patch with a test that generates its own file, so it doesn't need a 1MB test 
data file.
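
As a rough illustration of how a test could generate its own oversized affix file instead of shipping one, here is a minimal sketch. It is not taken from the attached patch: the class name `LargeAffixFileGenerator`, the method `writeAffixFile`, the file name `generated.aff`, and the choice of hex strings as suffix text are all assumptions made for this example. It relies only on the standard hunspell .aff layout (an `SFX flag cross_product count` header followed by one `SFX flag strip suffix condition` line per rule).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargeAffixFileGenerator {

    // Writes a hunspell-style .aff file containing affixCount suffix rules
    // under a single numeric flag, and returns the path of the new file.
    public static Path writeAffixFile(Path dir, int affixCount) throws IOException {
        Path affix = dir.resolve("generated.aff"); // hypothetical file name
        try (BufferedWriter w = Files.newBufferedWriter(affix, StandardCharsets.UTF_8)) {
            w.write("SET UTF-8\n");
            w.write("FLAG num\n");                        // flags are decimal numbers
            w.write("SFX 1 Y " + affixCount + "\n");      // header: flag 1, rule count
            for (int i = 0; i < affixCount; i++) {
                // Each rule strips nothing ("0") and appends a distinct suffix.
                w.write("SFX 1 0 " + Integer.toHexString(i) + " .\n");
            }
        }
        return affix;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("affix-gen");
        // One rule more than 65536, so rule indices no longer fit in 16 bits.
        Path affix = writeAffixFile(dir, 65537);
        System.out.println(affix + ": " + Files.size(affix) + " bytes");
    }
}
```

Generating the file at test time keeps the repository small, at the cost of a little extra I/O on every test run.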

 hunspell buggy with over 64k affixes
 

 Key: LUCENE-5838
 URL: https://issues.apache.org/jira/browse/LUCENE-5838
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5838.patch, LUCENE-5838.patch


 currently we build TreeMap<String,List<Character>> in ram, to sort before 
 adding to the FST (which encodes the list as IntsRef). 
 char overflows here if there are more than 64k affixes (e.g. basque).
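
The overflow described above can be reproduced in isolation. The following is a minimal sketch, not the Lucene code itself: the class `AffixIdOverflow` and the method `storeAsChar` are hypothetical names used only for illustration. It shows that a Java `char` is an unsigned 16-bit value, so an affix index of 65536 or more silently wraps around when stored in a list of `Character`, while a list of `Integer` keeps it intact.

```java
import java.util.ArrayList;
import java.util.List;

public class AffixIdOverflow {

    // Narrowing an int to char keeps only the low 16 bits of the value.
    public static int storeAsChar(int affixId) {
        char narrowed = (char) affixId;
        return narrowed; // widened back to int for inspection
    }

    public static void main(String[] args) {
        List<Character> asChars = new ArrayList<>();
        List<Integer> asInts = new ArrayList<>();
        int[] ids = {0, 65535, 65536, 70000};

        for (int id : ids) {
            asChars.add((char) id); // wraps for ids above 65535
            asInts.add(id);         // keeps the full value
        }

        for (int i = 0; i < ids.length; i++) {
            int fromChar = asChars.get(i).charValue();
            System.out.println(ids[i] + " -> char " + fromChar + ", int " + asInts.get(i));
        }
        // 65536 comes back as 0 and 70000 as 4464 (70000 - 65536),
        // so two different affixes would end up sharing one index.
    }
}
```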



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5838) hunspell buggy with over 64k affixes

2014-07-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5838:


Attachment: LUCENE-5838.patch

patch with a minimal test.

 hunspell buggy with over 64k affixes
 

 Key: LUCENE-5838
 URL: https://issues.apache.org/jira/browse/LUCENE-5838
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5838.patch


 currently we build TreeMap<String,List<Character>> in ram, to sort before 
 adding to the FST (which encodes the list as IntsRef). 
 char overflows here if there are more than 64k affixes (e.g. basque).


