[ https://issues.apache.org/jira/browse/LUCENE-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright reassigned LUCENE-8219: ----------------------------------- Assignee: Karl Wright > LevenshteinAutomata should estimate the number of states and transitions > ------------------------------------------------------------------------ > > Key: LUCENE-8219 > URL: https://issues.apache.org/jira/browse/LUCENE-8219 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Christian Ziech > Assignee: Karl Wright > Priority: Major > > Currently the toAutomaton() method of the LevenshteinAutomata class uses the > default constructor of the Automaton although it exactly knows how many > states the automaton will have and can do a reasonable estimation on how many > transitions it will need as well. > I suggest changing the lines > {code:language=java|firstline=154|linenumbers=true} > // the number of states is based on the length of the word and n > int numStates = description.size(); > Automaton a = new Automaton(); > int lastState; > {code} > to > {code:language=java|firstline=154|linenumbers=true} > // the number of states is based on the length of the word and n > final int numStates = description.size(); > final int numTransitions = numStates * Math.min(1 + 2 * n, alphabet.length); > final int prefixStates = prefix != null ? prefix.codePointCount(0, > prefix.length()) : 0; > final Automaton a = new Automaton(numStates + prefixStates, numTransitions); > int lastState; > {code} > For my test data this cut down on the total amount of memory needed for int > arrays by factor 4. The estimation of "1 + 2 * editDistance" should maybe > rather be replaced by a value coming from the ParametricDescription used. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org