Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-07-01 Thread Jonathan Wakely

On 30/06/14 19:17 -0700, Tim Shen wrote:

On Sat, Jun 28, 2014 at 2:45 AM, Jonathan Wakely jwak...@redhat.com wrote:

On 28/06/14 00:18 -0700, Tim Shen wrote:
Please put a space above this new function, otherwise it looks like
the Dummy implementations comment applies to this function, but in
fact that function is used for dfs mode.

Similarly, maybe there should be a Dummy implementation comment on
the _State_info__bfs, RV::_M_get_sol_pos function.

OK with that change, thanks for the fix.



Committed.


Great, thanks again.


Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-06-30 Thread Tim Shen
On Sat, Jun 28, 2014 at 2:45 AM, Jonathan Wakely jwak...@redhat.com wrote:
 On 28/06/14 00:18 -0700, Tim Shen wrote:
 Please put a space above this new function, otherwise it looks like
 the Dummy implementations comment applies to this function, but in
 fact that function is used for dfs mode.

 Similarly, maybe there should be a Dummy implementation comment on
 the _State_info__bfs, RV::_M_get_sol_pos function.

 OK with that change, thanks for the fix.


Committed.

Thanks!


-- 
Regards,
Tim Shen


Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-06-28 Thread Tim Shen
On Tue, Jun 10, 2014 at 1:56 PM, Tim Shen timshe...@gmail.com wrote:
 On Tue, Jun 10, 2014 at 9:54 AM, Jonathan Wakely jwak...@redhat.com wrote:
 I'm sure this is because I still don't understand all the regex code,
 but doesn't this change mean that for an extended mode regex with
 backrefs, the user could define _GLIBCXX_REGEX_USE_THOMPSON_NFA and
 backrefs wouldn't work?

 Sorry I missed that basic POSIX (BRE) has back-references (damn!), but
 extended POSIX (ERE) doesn't. So it should look like:
 -  if (!__re._M_automaton-_M_has_backref
 +  if (!(__re._M_automaton-_M_has_backref || (__re._M_flags 
 regex_constants::ECMAScript))
 ...and all deleted _M_has_backref lines should be undeleted.

 This patch is a temporary (I'm not sure how long though) workaround;
 BFS's support for ECMAScript with no back-references shall be done
 finally.

Sorry for late; this is the complete patch.

Bootstrapped and tested.

Thanks!


-- 
Regards,
Tim Shen
commit 10c1e55884525ff4c3ad959d29181b85966630c1
Author: Tim Shen tims...@google.com
Date:   Fri Jun 27 19:24:29 2014 -0700

PR libstdc++/61424
* include/bits/regex.tcc (__regex_algo_impl): Use DFS for ECMAScript,
not just regex containing back-references.
* include/bits/regex_compiler.tcc (_Compiler::_M_disjunction):
exchange _M_next and _M_alt for alternative operator,
making matching from left to right.
* include/bits/regex_executor.h (_State_info::_M_get_sol_pos):
Add position tracking fom DFS.
* include/bits/regex_executor.tcc (_Executor::_M_main_dispatch,
_Executor::_M_dfs): Likewise.
* include/bits/regex_scanner.h: Remove unused enum entry.
* testsuite/28_regex/algorithms/regex_search/61424.cc: New
testcase from PR.

diff --git a/libstdc++-v3/include/bits/regex.tcc 
b/libstdc++-v3/include/bits/regex.tcc
index a81f517..3322379 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -71,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _GLIBCXX_REGEX_USE_THOMPSON_NFA if they need to use this approach.
   bool __ret;
   if (!__re._M_automaton-_M_has_backref
+  !(__re._M_flags  regex_constants::ECMAScript)
 #ifndef _GLIBCXX_REGEX_USE_THOMPSON_NFA
   __policy == _RegexExecutorPolicy::_S_alternate
 #endif
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc 
b/libstdc++-v3/include/bits/regex_compiler.tcc
index 0df10cc..f15f7dd 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -103,9 +103,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  auto __end = _M_nfa._M_insert_dummy();
  __alt1._M_append(__end);
  __alt2._M_append(__end);
+ // __alt2 is state._M_next, __alt1 is state._M_alt. The executor
+ // executes _M_alt before _M_next, as well as executing left
+ // alternative before right one.
  _M_stack.push(_StateSeqT(_M_nfa,
-  _M_nfa._M_insert_alt(__alt1._M_start,
-   __alt2._M_start, false),
+  _M_nfa._M_insert_alt(__alt2._M_start,
+   __alt1._M_start, false),
   __end));
}
 }
diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index 1991c00..e02fa65 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -173,6 +173,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  void _M_queue(_StateIdT __i, const _ResultsVec __res)
  { _M_match_queue.emplace_back(__i, __res); }
 
+ _BiIter* _M_get_sol_pos() { return nullptr; }
+
  // Saves states that need to be considered for the next character.
  vectorpair_StateIdT, _ResultsVec  _M_match_queue;
  // Indicates which states are already visited.
@@ -191,12 +193,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Dummy implementations for DFS mode.
  bool _M_visited(_StateIdT) const { return false; }
  void _M_queue(_StateIdT, const _ResultsVec) { }
+ _BiIter* _M_get_sol_pos() { return _M_sol_pos; }
 
  // To record current solution.
  _StateIdT _M_start;
+ _BiIter   _M_sol_pos;
};
 
-
 public:
   _ResultsVec   _M_cur_results;
   _BiIter   _M_current;
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index aefa8f4..38b8ff2 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -82,6 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_main_dispatch(_Match_mode __match_mode, __dfs)
 {
   _M_has_sol = false;
+  

Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-06-28 Thread Jonathan Wakely

On 28/06/14 00:18 -0700, Tim Shen wrote:

diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index 1991c00..e02fa65 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -173,6 +173,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  void _M_queue(_StateIdT __i, const _ResultsVec __res)
  { _M_match_queue.emplace_back(__i, __res); }

+ _BiIter* _M_get_sol_pos() { return nullptr; }
+
  // Saves states that need to be considered for the next character.
  vectorpair_StateIdT, _ResultsVec  _M_match_queue;
  // Indicates which states are already visited.
@@ -191,12 +193,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Dummy implementations for DFS mode.
  bool _M_visited(_StateIdT) const { return false; }
  void _M_queue(_StateIdT, const _ResultsVec) { }
+ _BiIter* _M_get_sol_pos() { return _M_sol_pos; }


Please put a space above this new function, otherwise it looks like
the Dummy implementations comment applies to this function, but in
fact that function is used for dfs mode.

Similarly, maybe there should be a Dummy implementation comment on
the _State_info__bfs, RV::_M_get_sol_pos function.

OK with that change, thanks for the fix.



Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-06-10 Thread Jonathan Wakely

diff --git a/libstdc++-v3/include/bits/regex.tcc 
b/libstdc++-v3/include/bits/regex.tcc
index a81f517..6a1faaf 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -70,7 +70,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // without defining a macro. Users should define
  // _GLIBCXX_REGEX_USE_THOMPSON_NFA if they need to use this approach.
  bool __ret;
-  if (!__re._M_automaton-_M_has_backref
+  if (!(__re._M_flags  regex_constants::ECMAScript)
#ifndef _GLIBCXX_REGEX_USE_THOMPSON_NFA
   __policy == _RegexExecutorPolicy::_S_alternate
#endif


I'm sure this is because I still don't understand all the regex code,
but doesn't this change mean that for an extended mode regex with
backrefs, the user could define _GLIBCXX_REGEX_USE_THOMPSON_NFA and
backrefs wouldn't work?



Re: [PR 61424] std::regex matches right to left, not leftmost longest

2014-06-10 Thread Tim Shen
On Tue, Jun 10, 2014 at 9:54 AM, Jonathan Wakely jwak...@redhat.com wrote:
 I'm sure this is because I still don't understand all the regex code,
 but doesn't this change mean that for an extended mode regex with
 backrefs, the user could define _GLIBCXX_REGEX_USE_THOMPSON_NFA and
 backrefs wouldn't work?

Sorry I missed that basic POSIX (BRE) has back-references (damn!), but
extended POSIX (ERE) doesn't. So it should look like:
-  if (!__re._M_automaton-_M_has_backref
+  if (!(__re._M_automaton-_M_has_backref || (__re._M_flags 
regex_constants::ECMAScript))
...and all deleted _M_has_backref lines should be undeleted.

This patch is a temporary (I'm not sure how long though) workaround;
BFS's support for ECMAScript with no back-references shall be done
finally.


-- 
Regards,
Tim Shen