[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-23 Thread STINNER Victor

STINNER Victor  added the comment:

I tested set_match_tests() on python2.7 with:

python2.7 -m test --list-cases > cases
python2.7 -m test.bisect -i cases

Good news: the set().__contains__ matcher can be used!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-23 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 35d99830f1878867e3964577741d9a2d5a7fc8f7 by Victor Stinner in 
branch '2.7':
bpo-31324: Optimize support._match_test() (#4523) (#4524)
https://github.com/python/cpython/commit/35d99830f1878867e3964577741d9a2d5a7fc8f7


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-23 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4461

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-23 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 70b2f8797146a56a6880743424f0bedf4fc30c62 by Victor Stinner in 
branch '3.6':
[3.6] bpo-31324: Optimize support._match_test() (#4523)
https://github.com/python/cpython/commit/70b2f8797146a56a6880743424f0bedf4fc30c62


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-23 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4460

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-22 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset bb11c3c967afaf263e00844d4ab461b7fafd6d36 by Victor Stinner in 
branch 'master':
bpo-31324: Fix test.support.set_match_tests(None) (#4505)
https://github.com/python/cpython/commit/bb11c3c967afaf263e00844d4ab461b7fafd6d36


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-22 Thread STINNER Victor

STINNER Victor  added the comment:

Oops, the commit 803ddd8ce22f0de3ab42fb98a225a704c000ef06 broke the GCC job on 
Travis CI:
---
Run tests sequentially

0:00:00 load avg: 48.91 [  1/404] test_grammar

0:00:00 load avg: 48.91 [  2/404] test_opcodes

0:00:00 load avg: 48.91 [  3/404] test_dict

0:00:00 load avg: 48.91 [  4/404] test_builtin

0:00:00 load avg: 48.91 [  5/404] test_exceptions

0:00:01 load avg: 48.91 [  6/404] test_types

0:00:01 load avg: 48.91 [  7/404] test_unittest

0:00:03 load avg: 48.91 [  8/404] test_doctest

0:00:04 load avg: 48.11 [  9/404] test_doctest2

0:00:04 load avg: 48.11 [ 10/404] test_support

0:00:07 load avg: 48.11 [ 11/404] test___all__

Traceback (most recent call last):

  File "/home/travis/build/python/cpython/Lib/test/__main__.py", line 2, in 


main()

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/main.py", line 
585, in main

Regrtest().main(tests=tests, **kwargs)

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/main.py", line 
510, in main

self._main(tests, kwargs)

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/main.py", line 
536, in _main

self.run_tests()

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/main.py", line 
458, in run_tests

self.run_tests_sequential()

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/main.py", line 
379, in run_tests_sequential

result = runtest(self.ns, test)

  File "/home/travis/build/python/cpython/Lib/test/libregrtest/runtest.py", 
line 105, in runtest

support.set_match_tests(ns.match_tests)

  File "/home/travis/build/python/cpython/Lib/test/support/__init__.py", line 
1958, in set_match_tests

_match_test_patterns = tuple(patterns)

TypeError: 'NoneType' object is not iterable
---

Attached PR 4505 fixes it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-22 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4443

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-21 Thread STINNER Victor

STINNER Victor  added the comment:

I merged my PR 4421 which is based on Serhiy's PR 4420. Thank you Serhiy for 
your reviews!

--

Serhiy Storchaka: "We could split patterns on two parts and create both 
matchers (...) I don't know whether it is worth to do."

My use bisect is to find a regression using the test.bisect tool. According to 
my tests, the fastest set().__contains__ matcher is always used, except of 
test_json. If you bisect a bug in test_json, it will work, but just be a little 
bit slower.

Honestly, I don't think that we can go further in term of optimization. In my 
tests, it seems like creating the matcher function and filter all tests take 
less than 1 second, so it's enough for my use case ;-)

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-21 Thread STINNER Victor

STINNER Victor  added the comment:


New changeset 803ddd8ce22f0de3ab42fb98a225a704c000ef06 by Victor Stinner in 
branch 'master':
bpo-31324: Optimize support._match_test() (#4421)
https://github.com/python/cpython/commit/803ddd8ce22f0de3ab42fb98a225a704c000ef06


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-21 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

We could split patterns on two parts and create both matchers. Then the final 
matching function could look like:

return _match_test_patterns is None or test_id in id_set or 
regex_match(test_id) or any(map(regex_match, test_id.split(".")))


I don't know whether it is worth to do.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-21 Thread STINNER Victor

STINNER Victor  added the comment:

I created a list of all test cases using:

./python -m test --list-cases > all_cases

The list contains 29,569 test cases. Sadly, the set().__contains__ matcher of 
my PR 4421 cannot be taken because test_json produces two test cases called 
"json". IMHO it's a bug in test_json, all test identifiers should be "fully 
qualified". test_builtin has a similar issue: it produces identifiers like 
"builtins.bin" instead of "test.test_builtin.builtins.bin".

If I removed the two "json" from all_cases, "./python -u -m test.bisect -i 
all_cases -v" takes less than one second before running the first test method, 
and so it's fast enough.

With PR 4420, it takes around 17 seconds before running the first test. The 
compilation of the giant regex and matching test identifiers are slow. My PR 
has the same bad performance when the regex path is taken.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I know :-)

Unfortunately it is not obviously that these optimization actually optimize 
regular expressions. Due to the implementation particularities sometimes 
"unoptimized" code is faster than "optimized". I'm going to implement these 
optimizations in the re module, but it needs to be very careful, and may need 
changing the matching engine.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread STINNER Victor

STINNER Victor  added the comment:

> There is good opportunity for optimizing the regular expression. All cases 
> have the literal 'test.test_asyncio.' prefix, and even longer literal 
> prefixes are common for tens cases.

I implemented such code long time ago :-)

https://github.com/haypo/hachoir3/blob/master/hachoir/regex/regex.py

I'm not sure that this code works on Python 3, it was written for Python 2 but 
not only ported to Python 3 partially.

It only supports a subset of regular expressions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +4369

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

$ time ./python -m test.bisect --fail-env-changed -o bisect test_asyncio -N1
...
Bisection failed after 2 iterations and 0:00:11

real0m10,441s
user0m8,726s
sys 0m0,292s

There is good opportunity for optimizing the regular expression. All cases have 
the literal 'test.test_asyncio.' prefix, and even longer literal prefixes are 
common for tens cases.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread STINNER Victor

STINNER Victor  added the comment:

I tested manually Serhiy's PR 4420: it works as expected!

haypo@selma$ ./python -m test test_os --list-cases -m 
test.test_os.FileTests.test_access
test.test_os.FileTests.test_access

haypo@selma$ ./python -m test test_os --list-cases -m test_access
test.test_os.FileTests.test_access

haypo@selma$ ./python -m test test_os --list-cases -m FileTests
test.test_os.FileTests.test_access
test.test_os.FileTests.test_closerange
test.test_os.FileTests.test_fdopen
test.test_os.FileTests.test_large_read
test.test_os.FileTests.test_open_keywords
test.test_os.FileTests.test_read
test.test_os.FileTests.test_rename
test.test_os.FileTests.test_replace
test.test_os.FileTests.test_symlink_keywords
test.test_os.FileTests.test_write
test.test_os.FileTests.test_write_windows_console

haypo@selma$ ./python -m test test_os --list-cases -m 'FileTest*'
test.test_os.FileTests.test_access
test.test_os.FileTests.test_closerange
test.test_os.FileTests.test_fdopen
test.test_os.FileTests.test_large_read
test.test_os.FileTests.test_open_keywords
test.test_os.FileTests.test_read
test.test_os.FileTests.test_rename
test.test_os.FileTests.test_replace
test.test_os.FileTests.test_symlink_keywords
test.test_os.FileTests.test_write
test.test_os.FileTests.test_write_windows_console

haypo@selma$ ./python -m test test_os --list-cases -m 'test_acc*'
test.test_os.FileTests.test_access
test.test_os.StatAttributeTests.test_access_denied

haypo@selma$ ./python -m test test_os --list-cases -m '*FileTests.test_access'
test.test_os.FileTests.test_access

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-11-16 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +4368
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

_match_test1 = re.compile('|'.join(map(fnmatch.translate, match_tests)), 
re.I).match

def _match_test(test):
test_id = test.id()
return bool(_match_test1(test_id) or any(map(_match_test1, 
test_id.split(".")))

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

STINNER Victor added the comment:

Oops. Ignore my previous comment, I was using my temporary workaround! 
(msg301126)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

STINNER Victor added the comment:

Oh, by the way, -m test -m TestCaseName doesn't work fully on test_asyncio. It 
only works if I pass the full test identifier.

With TestCaseName, no test is run:

haypo@selma$ ./python -m test -u all test_asyncio --fail-env-changed -m 
PollEventLoopTests 
Run tests sequentially
0:00:00 load avg: 6.97 [1/1] test_asyncio
1 test OK.

Total duration: 458 ms
Tests result: SUCCESS


With a full test identifier, a test is run:


haypo@selma$ ./python -m test -u all test_asyncio --fail-env-changed -m 
test.test_asyncio.test_events.PollEventLoopTests.test_create_datagram_endpoint_sock
Run tests sequentially
0:00:00 load avg: 7.06 [1/1] test_asyncio
Warning -- threading_cleanup() detected 1 leaked threads (count: 1, dangling: 2)
test_asyncio failed (env changed)

1 test altered the execution environment:
test_asyncio

Total duration: 561 ms
Tests result: ENV CHANGED

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

STINNER Victor added the comment:

Workaround to run test.bisect on test_asyncio:

diff --git a/Lib/test/support/__init__.py b/Lib/test/support/__init__.py
index 522804be60..201d0665b2 100644
--- a/Lib/test/support/__init__.py
+++ b/Lib/test/support/__init__.py
@@ -1911,15 +1911,7 @@ def _match_test(test):
 if match_tests is None:
 return True
 test_id = test.id()
-
-for match_test in match_tests:
-if fnmatch.fnmatchcase(test_id, match_test):
-return True
-
-for name in test_id.split("."):
-if fnmatch.fnmatchcase(name, match_test):
-return True
-return False
+return (test_id in match_tests)
 
 
 def run_unittest(*classes):


With this patch, running 756 tests takes 12 seconds, as expected.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

STINNER Victor added the comment:

Example of the performance issue:

$ ./python -m test.bisect --fail-env-changed -o bisect test_asyncio

[+] Iteration 1: run 756 tests/1512
(...)

The first iteration takes forever because it runs tests with 
support.match_tests which contains 756 patterns.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

Changes by STINNER Victor :


--
nosy: +louielu

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31324] support._match_test() used by test.bisect is very inefficient

2017-09-01 Thread STINNER Victor

New submission from STINNER Victor:

support._match_test() uses a nested loop calling fnmatch.fnmatchcase(). This 
function creates a temporary regular expression object. The cache of the re 
module works around the performance... if the length of support.match_tests 
fits into the cache. But when the list is longer, the function becomes dead 
slow...

def _match_test(test):
global match_tests

   
if match_tests is None:
return True 
test_id = test.id() 
   

for match_test in match_tests:
if fnmatch.fnmatchcase(test_id, match_test):
   
return True

   
for name in test_id.split("."): 
   
if fnmatch.fnmatchcase(name, match_test):   
   
return True
return False

Maybe we should build a giant regex matching test_id at each, but cache the 
regex since support.match_tests can be modified anytime. I implemented this 
once, but I lost the code :-)

Currently, it's possible to match 3 things:

* test method name: test_exit
* test class name: SysModuleTest
* full test id: test.test_sys.SysModuleTest.test_exit

It's also possible to use "*" joker character in a test name. I would like to 
keep these convenient CLI.

--
messages: 301122
nosy: haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: support._match_test() used by test.bisect is very inefficient
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com