[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-10 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Fix Version/s: (was: 7.1)
   7.0

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: 7.0, master (8.0)
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-04 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Fix Version/s: 7.1
   master (8.0)

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Attachment: LUCENE-7914.patch

Thanks for the great feedback [~jpountz] and [~rcmuir]

I pushed another iteration that replaces the ISEs in IAEs and cleans up the 
unnecessary changes.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Attachment: LUCENE-7914.patch

It's not, it's just a copy/paste from another test in the class. I pushed 
another iteration that removes the serialization/deserialization from 
TestRegExp completely.

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Attachment: LUCENE-7914.patch

I removed the new constructor in AutomatonQuery and the optim in PrefixQuery. 
This is useless if we have the max recursion in place and I agree that it makes 
the constructor ugly. 
So here is a new patch.

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Description: 
When creating an automaton from a regexp, some operators can create more states 
than maxDeterminizedStates:

{code}
a{1}
{code}

The example above creates a single path with 10k states already 
determinized so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or 
Operations.topoSortStates are recursive and the maximum level of recursion 
depends on the longest path in the automaton. So a large automaton like above 
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will 
always be adversarial cases where a large automaton is created from a small 
input so I think we should also have safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite 
and Operations.topoSortStates in order to limit stack overflows. The limit is 
set to 1000 so any automaton with a path bigger than 1000 would throw an 
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a 
repeat operator can create and throw a TooComplex..Exception when this limit is 
reached.
Finally the patch adds the ability to skip Operations.isFinite on 
AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
infinite automatons only.


  was:
When creating an automaton from a regexp, some operators can create more states 
than maxDeterminizedStates:

{code}
a{1}
{code}

The example above creates a single path with 10k states already 
determinized so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or 
Operations.topoSortStates are recursive and the maximum level of recursion 
depends on the longest path in the automaton. So a large automaton like above 
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will 
always be adversarial cases where a large automaton is created from a small 
input so I think we should also have safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite 
and Operations.topoSortStates in order to limit stack overflows. The limit is 
set to 1000 so any automaton with a path bigger than 1000 would throw an 
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a 
repeat operator can create and throw a TooComplex..Exception when this limit is 
reached.
Finally the patch adds the ability to skip Operations.isFinite on 
AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
finite automatons only.



> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Attachment: LUCENE-7914.patch

Right sorry, a prefix query is indeed infinite. I reverted the logic in the 
patch. 
This is just a very small optimization and I can remove it from the patch if it 
makes the constructor too complex.
It's just that setting a limit on a prefix query just because it might lead to 
StackOverflow is error prone.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> finite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Attachment: LUCENE-7914.patch

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> finite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Jim Ferenczi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-7914:
-
Description: 
When creating an automaton from a regexp, some operators can create more states 
than maxDeterminizedStates:

{code}
a{1}
{code}

The example above creates a single path with 10k states already 
determinized so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or 
Operations.topoSortStates are recursive and the maximum level of recursion 
depends on the longest path in the automaton. So a large automaton like above 
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will 
always be adversarial cases where a large automaton is created from a small 
input so I think we should also have safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite 
and Operations.topoSortStates in order to limit stack overflows. The limit is 
set to 1000 so any automaton with a path bigger than 1000 would throw an 
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a 
repeat operator can create and throw a TooComplex..Exception when this limit is 
reached.
Finally the patch adds the ability to skip Operations.isFinite on 
AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
finite automatons only.


  was:
When creating an automaton from a regexp, some operators can create more states 
than maxDeterminizedStates:


a{1}


The example above creates a single path with 10k states already 
determinized so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or 
Operations.topoSortStates are recursive and the maximum level of recursion 
depends on the longest path in the automaton. So a large automaton like above 
can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will 
always be adversarial cases where a large automaton is created from a small 
input so I think we should also have safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite 
and Operations.topoSortStates in order to limit stack overflows. The limit is 
set to 1000 so any automaton with a path bigger than 1000 would throw an 
IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a 
repeat operator can create and throw a TooComplex..Exception when this limit is 
reached.
Finally the patch adds the ability to skip Operations.isFinite on 
AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
finite automatons only.



> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> finite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org