[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Fix Version/s: (was: 7.1) 7.0 > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Fix For: 7.0, master (8.0) > > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, > LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Fix Version/s: 7.1 master (8.0) > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Fix For: master (8.0), 7.1 > > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, > LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Attachment: LUCENE-7914.patch Thanks for the great feedback [~jpountz] and [~rcmuir] I pushed another iteration that replaces the ISEs in IAEs and cleans up the unnecessary changes. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, > LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Attachment: LUCENE-7914.patch It's not, it's just a copy/paste from another test in the class. I pushed another iteration that removes the serialization/deserialization from TestRegExp completely. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, > LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Attachment: LUCENE-7914.patch I removed the new constructor in AutomatonQuery and the optim in PrefixQuery. This is useless if we have the max recursion in place and I agree that it makes the constructor ugly. So here is a new patch. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Description: When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates: {code} a{1} {code} The example above creates a single path with 10k states already determinized so maxDeterminizedStates is not checked. Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive and the maximum level of recursion depends on the longest path in the automaton. So a large automaton like above can exceed java's stack. In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial cases where a large automaton is created from a small input so I think we should also have safeguards in the recursive methods. I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger than 1000 would throw an IllegalStateException. The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create and throw a TooComplex..Exception when this limit is reached. Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses this as an optimization for PrefixQuery that uses infinite automatons only. was: When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates: {code} a{1} {code} The example above creates a single path with 10k states already determinized so maxDeterminizedStates is not checked. Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive and the maximum level of recursion depends on the longest path in the automaton. So a large automaton like above can exceed java's stack. In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial cases where a large automaton is created from a small input so I think we should also have safeguards in the recursive methods. I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger than 1000 would throw an IllegalStateException. The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create and throw a TooComplex..Exception when this limit is reached. Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses this as an optimization for PrefixQuery that uses finite automatons only. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Attachment: LUCENE-7914.patch Right sorry, a prefix query is indeed infinite. I reverted the logic in the patch. This is just a very small optimization and I can remove it from the patch if it makes the constructor too complex. It's just that setting a limit on a prefix query just because it might lead to StackOverflow is error prone. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > finite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Attachment: LUCENE-7914.patch > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > finite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Ferenczi updated LUCENE-7914: - Description: When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates: {code} a{1} {code} The example above creates a single path with 10k states already determinized so maxDeterminizedStates is not checked. Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive and the maximum level of recursion depends on the longest path in the automaton. So a large automaton like above can exceed java's stack. In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial cases where a large automaton is created from a small input so I think we should also have safeguards in the recursive methods. I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger than 1000 would throw an IllegalStateException. The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create and throw a TooComplex..Exception when this limit is reached. Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses this as an optimization for PrefixQuery that uses finite automatons only. was: When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates: a{1} The example above creates a single path with 10k states already determinized so maxDeterminizedStates is not checked. Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive and the maximum level of recursion depends on the longest path in the automaton. So a large automaton like above can exceed java's stack. In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial cases where a large automaton is created from a small input so I think we should also have safeguards in the recursive methods. I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger than 1000 would throw an IllegalStateException. The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create and throw a TooComplex..Exception when this limit is reached. Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses this as an optimization for PrefixQuery that uses finite automatons only. > Add safeguards to RegExp.toAutomaton and Operations > --- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{1} > {code} > The example above creates a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > finite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org