Stefan Behnel <stefan...@behnel.de> added the comment:
Well, there's XPath for a standard:
https://www.w3.org/TR/xpath/
ElementPath deviates from it in its namespace syntax (it allows "{ns}tag" where
XPath requires "p:tag" prefixes), but that's about it. All other
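Both spellings can be seen side by side in the stdlib ElementTree. This is a minimal sketch; the namespace URI and tag names are invented for the example. ElementPath accepts the "{ns}tag" form directly, and it accepts the XPath-like "p:tag" form when a prefix mapping is passed in.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this example.
NS = "http://example.com/ns"
DOC = f'<root xmlns:p="{NS}"><p:tag>a</p:tag><p:tag>b</p:tag></root>'

root = ET.fromstring(DOC)

# ElementPath style: the namespace URI is spelled out as "{ns}tag".
by_uri = root.findall(f"{{{NS}}}tag")

# XPath style: a prefix is used and resolved through an explicit mapping.
by_prefix = root.findall("p:tag", namespaces={"p": NS})

print([e.text for e in by_uri], [e.text for e in by_prefix])
```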
Change by Stefan Behnel <stefan...@behnel.de>:
--
keywords: +patch
pull_requests: +3816
stage: -> patch review
___
Python tracker <rep...@bugs.python.org>
<https://bugs.pyt
New submission from Stefan Behnel <stefan...@behnel.de>:
* Allow whitespace around predicate parts, i.e. "[a = 'text']" instead of
requiring the less readable "[a='text']".
* Add support for text comparison of the current node, like "[.='text']".
Both curre
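Here is a short sketch of the predicate forms under discussion, run against the stdlib ElementTree with an invented document. The "[.='text']" form needs Python 3.7 or later. The whitespace-tolerant spelling is left out of this sketch.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><item><a>text</a></item><item><a>other</a></item><b>text</b></root>"
)

# Child-text comparison: <item> elements whose <a> child has exactly this text.
items = root.findall("item[a='text']")

# Current-node text comparison: <b> elements whose own text content is 'text'.
bs = root.findall("b[.='text']")

print(len(items), len(bs))
```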
Change by Stefan Behnel <stefan...@behnel.de>:
--
nosy: +haypo
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31465>
___
Change by Stefan Behnel <stefan...@behnel.de>:
--
nosy: -scoder
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue30576>
___
Stefan Behnel <stefan...@behnel.de> added the comment:
FWIW, both the feature and the PR look ok to me. Code formatting is a little
funny at times, but the implementation looks good.
--
nosy: +scoder
___
Python tracker <rep...@bugs.p
Stefan Behnel added the comment:
I'm also against changing re.compile() to not compile.
And I often write code like this:
replace_whitespace = re.compile(r"\s+").sub
which is not covered by your current proposed change.
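This idiom keeps only the bound `sub` method of the compiled pattern. The pattern is compiled once, when the name is defined, and the pattern object itself is never referenced by name again. That is why a change that defers compilation inside re.compile() would also need to account for it. A minimal sketch:

```python
import re

# Keep only the bound method; the regex is compiled once at definition time.
replace_whitespace = re.compile(r"\s+").sub

print(replace_whitespace(" ", "a  b\t\nc"))
```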
--
Stefan Behnel added the comment:
Still ready for merging :)
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31336>
___
Stefan Behnel added the comment:
Any comments on this?
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31465>
___
Stefan Behnel added the comment:
> The question is more why/how the code didn't crash before? :-)
Typical case of a Schroedinbug.
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.or
Changes by Stefan Behnel <stefan...@behnel.de>:
--
pull_requests: +3632
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31465>
___
Stefan Behnel added the comment:
Question: Do you think it's ok to change the signature of _PyType_Lookup() in
this way by adding an error flag, or should I add a new function instead?
There is no performance difference compared to PR 3279, since gcc should properly optimise this
flag away in most
Stefan Behnel added the comment:
Thanks for confirming, Victor.
I hadn't realised that the first update of expat was already back in June. That
means it's not ruled out yet as a source of this crash. Bisecting is probably a
good idea.
--
___
Stefan Behnel added the comment:
Minimal reproducer seems to be this:
--
import xml.etree.ElementTree as etree
def test():
    parser = etree.XMLParser()
    try:
        parser.close()
    except etree.ParseError as exc:
        e = exc  # must keep local reference!
test
Stefan Behnel added the comment:
Sorry, wrong line number. Was using an installed Py3.7, not a fresh build.
However, my crashing installed version is from September 1st, *before* the
expat update, which was apparently on September 5th.
With a clean debug build, I get a reproducible crash
New submission from Stefan Behnel:
I'm seeing crashes in the latest Py3.7 when I run this test (taken from lxml's
compatibility test suite):
etree = xml.etree.ElementTree
def test_feed_parser_error_position(self):
    ParseError = etree.ParseError
    parser = XMLParser
Stefan Behnel added the comment:
Test suite passes now. The crash was due to an uninitialised error flag in one
case, which led the C compiler to apply incorrect optimisations based on undefined
behaviour.
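For comparison, here is a sketch of what the error-position test checks, written against the public API only. On a CPython build with the fix it should simply return the (line, column) of the error. The helper name is hypothetical.

```python
import xml.etree.ElementTree as etree

def parse_error_position(data):
    """Feed `data` to a fresh parser; return ParseError.position, or None."""
    parser = etree.XMLParser()
    try:
        parser.feed(data)
        parser.close()
    except etree.ParseError as exc:
        err = exc  # keep a local reference, as in the reproducer
        return err.position  # (line, column) tuple
    return None

print(parse_error_position(""))         # empty input: error raised by close()
print(parse_error_position("<root>"))   # unclosed root element
print(parse_error_position("<root/>"))  # well-formed input
```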
--
___
Python tracker <rep...@bugs.python.
Changes by Stefan Behnel <stefan...@behnel.de>:
--
keywords: +patch
pull_requests: +3607
stage: -> patch review
___
Python tracker <rep...@bugs.python.org>
<https://bugs.pyt
Stefan Behnel added the comment:
I'm working on a PR for this, but after changing all usages and fixing up some
error handling here and there, it results in an interpreter crash for me. I'll
try to debug it during the next days.
--
nosy: +pitrou, serhiy.storchaka
Stefan Behnel added the comment:
One more thing: the fact that the lookup does not propagate exceptions leaves
some space for ambiguity. If, in a chain of MRO lookups, one would fail and a
later one would succeed, is it correct to keep trying? What if the first
failure actually failed to see
New submission from Stefan Behnel:
Follow-up to issue 31336:
The fact that _PyType_Lookup() does not propagate exceptions leaves some space
for ambiguity. If, in a chain of MRO lookups, one would fail and a later one
would succeed, is it correct to keep trying? What if the first failure
Stefan Behnel added the comment:
> Is it correct to call _PyType_Lookup() with an exception set?
The general rule of thumb is that it's not safe to call any user code with a
live exception set, and lookups can call into user code.
I quickly looked through all occurrences (there are
Stefan Behnel added the comment:
Feel free to provide a separate pull request. These issues seem independent of
the exception handling problem that I wrote a fix for.
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/i
Changes by Stefan Behnel <stefan...@behnel.de>:
--
keywords: +patch
pull_requests: +3542
stage: -> patch review
___
Python tracker <rep...@bugs.python.org>
<https://bugs.pyt
New submission from Stefan Behnel:
The "XMLParser.__init__()" method in "_elementtree.c" contains this code:
self->handle_start = PyObject_GetAttrString(target, "start");
self->handle_data = PyObject_GetAttrString(target, "data");
se
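On the Python side, that lookup corresponds to the parser "target" protocol. A target object only needs to define the handler methods it wants, and the C code fetches them by name. Here is a minimal sketch with a hypothetical target class that records the events it receives:

```python
import xml.etree.ElementTree as ET

class CollectTags:
    """Hypothetical parser target that records start/data/end events."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, text):
        self.events.append(("data", text))

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.events

parser = ET.XMLParser(target=CollectTags())
parser.feed("<a><b>x</b></a>")
events = parser.close()
print(events)
```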
Stefan Behnel added the comment:
No, that one was addressed. I think only Victor's comment is still open, that's
why I asked back.
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/i
Stefan Behnel added the comment:
Any more comments on the proposed implementation? A 13-15% speedup seems worth it to me.
@Victor, or are you saying "PyId, or no change at all"?
--
___
Python tracker <rep...@bugs.python.org>
<https://bugs.py
Stefan Behnel added the comment:
I'm a bit torn on this. On the one hand, it's basically saying, "Cython is
probably going to do it right anyway, so let's just assume it does". That's
nice, and might be applicable to other cases as well. But that also feels like
it could need
Stefan Behnel added the comment:
I was kinda guessing that modifying the slot list wasn't a good idea. ;)
My current use case is that I implement the "create" slot because it makes it
very easy to intercept the spec and its configuration. It is not passed into
"exec"
Stefan Behnel added the comment:
Marcel proposed to disallow main-execution if the extension *might* return
anything but a real object (not only if it actually does), but that seems
excessive to me. The actual problem is that we consider it unsafe if the module
is executed more than once
Stefan Behnel added the comment:
OTOH, if the created "module" is not a module object, then we could argue that
the extension implementation is on its own with that case, and has to do its
own re-execution safety checks.
--
___
Python tr
Stefan Behnel added the comment:
BTW, it seems that Yury's dict copy optimisation would also help here. When I
use a benchmark scenario with a simple non-empty method/attribute dict (from
Cython this time), almost 10% of the creation time is spent copying that dict,
which should essentially
Stefan Behnel added the comment:
Since I'm getting highly reproducible results on re-runs, I tend to trust these
numbers.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
I updated the pull request with a split version of _PyType_Lookup() that
bypasses the method cache during slot updates. I also ran the benchmarks with
PGO enabled now to get realistic results. The overall gain is around 15%.
Original:
$ ./python -m timeit
Stefan Behnel added the comment:
Since applications that get by without any file access are probably rare enough to be
irrelevant, "os" and "io" feel like sufficiently widely used
modules to merit being part of a "usual Python startup" bench
Stefan Behnel added the comment:
> I would prefer to use the _Py_IDENTIFIER API rather than using
> _PyDict_GetItem_KnownHash().
Do you mean for the table of slot descriptions? I'm not sure that the effect
would be comparable.
> Maybe there are other opportunities for optimization?
Stefan Behnel added the comment:
It's the slot names in "slotdefs". See "update_one_slot()".
The time that is saved is mostly the overhead of calling PyDict_GetItem(). I
actually tried PyDict_GetItemWithError() first, which is faster due to the
lower error handling overhe
Stefan Behnel added the comment:
Comparing against CPython master as of 122e88a8354e3f75aeaf6211232dac88ac296d54
I rebuilt my CPython to get clean results, and that still gave me almost 15%
overall speedup.
Original:
$ ./python -m timeit 'class Test: pass'
2 loops, best of 5: 9.55 usec
Stefan Behnel added the comment:
I literally just ran timeit on "class Test: pass", but I'll see if I can
provide proper numbers.
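The same measurement can be scripted with the timeit module, which makes it easier to repeat across builds. This is a minimal sketch; the helper name and loop counts are arbitrary choices.

```python
import timeit

def class_creation_usec(number=10_000, repeat=5):
    """Best-of-`repeat` time in microseconds to execute `class Test: pass`."""
    timings = timeit.repeat("class Test: pass", number=number, repeat=repeat)
    return min(timings) / number * 1e6

print(f"{class_creation_usec():.2f} usec per loop")
```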
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.py
New submission from Stefan Behnel:
The method lookup fast path in _PyType_Lookup() does not apply during type
creation, which is highly dominated by the performance of the dict lookups
along the mro chain. Pre-calculating the name hash speeds up the creation of an
empty class (i.e. "
Changes by Stefan Behnel <stefan...@behnel.de>:
--
pull_requests: +3322
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31336>
___
Stefan Behnel added the comment:
Regarding the user side of the problem, you might(!) be able to work around the
crash by merging nested if-conditions into and-expressions if they have no
elif/else. That's also why the split into multiple files doesn't help, it's the
depth of the nesting
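Here is a sketch of the suggested rewrite on a toy function; the function names are illustrative. Three nested ifs without elif/else collapse into one condition joined with `and`, which reduces the nesting depth the compiler has to recurse through. Because `and` short-circuits, the conditions are still evaluated in the same order and only as far as needed.

```python
from itertools import product

def nested(a, b, c):
    if a:
        if b:
            if c:
                return "all true"
    return "not all"

def flattened(a, b, c):
    # One nesting level instead of three. This is only equivalent because
    # the nested ifs above have no elif/else branches of their own.
    if a and b and c:
        return "all true"
    return "not all"

# Exhaustively check that both versions agree on every input combination.
assert all(nested(*args) == flattened(*args) for args in product([False, True], repeat=3))
```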
Stefan Behnel added the comment:
I've looked at the file and it contains a huge amount of deeply nested
if-statements. Given that parsers and compilers are typically recursive, I can
well imagine that this is a problem, and my guess is that it's most likely just
the different C level stack
Stefan Behnel added the comment:
Wouldn't this be a typical case where we'd expect a module to evolve and gain
usage on PyPI first, before adding it to the stdlib?
Searching for "grapheme" in PyPI gives some results for me. Even if they do not
cover what this ticket asks for, they
Stefan Behnel added the comment:
1) Is this reproducible?
2) Getting a crash in compile.c indicates that this is happening at
parse/compile time and not when your Python code is executing. Can you confirm
that? Does it generate a .pyc file on import that would indicate a successful
byte code
Stefan Behnel added the comment:
Looks like the switch from PyObject_IsSubclass() to PyType_IsSubtype() was made
during the original Py3 development cycle. It should thus be safe to assume
that the semantics are "as designed". :)
What about applying the patch also to 3.6
New submission from Stefan Behnel:
PyObject *exception, *value, *tb;
PyErr_Fetch(&exception, &value, &tb);
/* PyObject_IsSubclass() can recurse and therefore is
   not safe (see test_bad_getattr in test.pickletester). */
res = PyType_IsSubtype((PyTypeObject *)err, (PyTypeObject
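The comment in that snippet is about user code. PyObject_IsSubclass() (the C counterpart of issubclass()) can dispatch to a metaclass's `__subclasscheck__`, while PyType_IsSubtype() only walks the MRO. From Python, the difference can be observed with a deliberately misbehaving, hypothetical metaclass:

```python
class Meta(type):
    def __subclasscheck__(cls, subclass):
        # Arbitrary user code runs here during issubclass().
        return True

class Base(metaclass=Meta):
    pass

class Unrelated:
    pass

# issubclass() semantics (PyObject_IsSubclass): the metaclass hook decides.
print(issubclass(Unrelated, Base))   # True
# Plain MRO walk (what PyType_IsSubtype does): no user code involved.
print(Base in Unrelated.__mro__)     # False
```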
Stefan Behnel added the comment:
This has been resolved by PEP 489, issue 24268.
The module initialisation process receives the complete ModuleSpec now,
starting with CPython 3.5, and can do with it whatever it likes, before
executing any user code.
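The same create-then-exec split is visible at the Python level through importlib. That is a reasonable mental model for what PEP 489 offers extension modules through the Py_mod_create and Py_mod_exec slots. This sketch only loads a pure Python module, and the file and module names are made up.

```python
import importlib.util
import os
import tempfile

def load_module_from_path(name, path):
    """Load a Python source module in two explicit steps."""
    spec = importlib.util.spec_from_file_location(name, path)
    # Step 1: create the module object; the loader receives the full spec.
    module = importlib.util.module_from_spec(spec)
    # Step 2: execute the module's code in the new module's namespace.
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write("VALUE = 42\n")
    mod = load_module_from_path("demo_mod", path)
    print(mod.VALUE, mod.__spec__.name)
```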
--
resolution: -> duplicate
Stefan Behnel added the comment:
FYI, I've finally managed to find the time for implementing PEP 489 style
module initialisation in Cython. It was so easy that I'm sorry it took me so
long to get started. Cython 0.26 is fresh out, so the feature should go into
0.27.
https://github.com/cython
Stefan Behnel added the comment:
> Are all uses of internal CPython details optional?
Well, what classifies as a "CPython detail" sometimes just becomes clear when
other implementations don't have it. ;-)
But yes, the C code that Cython generates selects alternative implement
Stefan Behnel added the comment:
For future reference, this change is supported by Cython 0.26 (which is
currently close to release).
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
Sorry for not responding, missed the message, it seems.
Cython has to support old-style relative imports also in Py3 because that's how
the user code was originally written, using Py2-style syntax and semantics.
Most Cython code has not been converted to Py3
Stefan Behnel added the comment:
Can the PR be applied then? It looks good to me.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
I do not see this as a matter of performance but as a matter of usability.
Basically, CPython could do just fine with just a single catch-all calling
convention that packs all pos/kw arguments into C arguments and passes them
over, leaving it entirely
Stefan Behnel added the comment:
I looked up this change again and was surprised that it still wasn't applied.
It feels to me that it makes sense already for reasons of consistency. Any time
frame for changing it? I'd like to use METH_FASTCALL in Cython in a
future-proof way
New submission from Stefan Behnel:
I'm seeing doctest failures in Cython's test suite with Py3.7 due to the change
of an error message:
Failed example:
func1(arg=None)
Expected:
Traceback (most recent call last):
...
TypeError: func1() takes no keyword arguments
Got
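One way to keep such doctests stable across CPython versions is doctest's IGNORE_EXCEPTION_DETAIL option, which compares only the exception type and ignores the message. This is not necessarily what Cython did. In the self-contained sketch below, `func1` is a stand-in that raises a TypeError whose message differs from the expected one.

```python
import doctest

def func1(*args):
    """
    >>> func1(arg=None)  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
    ...
    TypeError: func1() takes no keyword arguments
    """
    return args

def run_doctests(func):
    """Run the doctests in `func`'s docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)

print(run_doctests(func1))
```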
Stefan Behnel added the comment:
Patch replaced by pull request.
https://github.com/python/cpython/pull/1823
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Changes by Stefan Behnel <sco...@users.sourceforge.net>:
Removed file: http://bugs.python.org/file46906/lxml_elpath_empty_prefix.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python
Stefan Behnel added the comment:
Agreed that this should be added. I think the key should be None, though, not
the empty string. I attached a quick patch for lxml's corresponding file. It's
mostly the same for ET.
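For reference, the stdlib ElementTree in Python 3.8 and later ended up using the empty string as the key for this. Here is a minimal sketch with an invented default namespace:

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical default namespace
root = ET.fromstring(f'<root xmlns="{NS}"><item>1</item><item>2</item></root>')

# Without a mapping, the unprefixed name does not match namespaced elements.
print(root.findall("item"))
# With the "" prefix, unprefixed names in the path are put into NS.
print([e.text for e in root.findall("item", namespaces={"": NS})])
```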
--
keywords: +patch
Added file: http://bugs.python.org/file46906
Stefan Behnel added the comment:
Thanks for bringing me in. The PoC implementation looks nice. Whether I'd like
to support this in Cython? Absolutely. Requires some work, though, since Cython
still doesn't implement PEP 489. But it shouldn't be hard, if I remember the
discussions from back
Stefan Behnel added the comment:
Looks good to me (didn't test it).
Note that getchildren() is not deprecated in lxml because it's actually the
fastest way to build a list of the children. It's faster than list(element)
because it avoids the Python (C-level) iteration overhead. However
Stefan Behnel added the comment:
Thanks for asking. Cython doesn't use METH_FASTCALL yet, so this doesn't
introduce any problems.
Generally speaking, if Cython generated user code stops working with a new
CPython version, we expect people to regenerate their code with the newest
Cython
Stefan Behnel added the comment:
I'm ok with the deprecations.
Regarding the cElementTree module, this is a bit problematic. The idiomatic
import has lost its use in Py2.5 when ET and cET were added to the stdlib, so
code that was written for Py2.5 or later (e.g. because it uses generators
Stefan Behnel added the comment:
Actually, it seems that calling urlcleanup() is sufficient as a work-around.
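Here is a minimal sketch of applying that work-around after each download. The wrapper name is hypothetical, and a local file:// URL stands in for the FTP download so that the example runs offline. It does not reproduce the FTP failure itself.

```python
import pathlib
import tempfile
from urllib.request import urlcleanup, urlretrieve

def download(url, filename):
    """Download `url` to `filename`, then clean up urlretrieve() leftovers."""
    try:
        return urlretrieve(url, filename)
    finally:
        # Clean up state left behind by urlretrieve() calls.
        urlcleanup()

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "src.txt")
    src.write_text("payload")
    dest = pathlib.Path(tmp, "dest.txt")
    download(src.as_uri(), str(dest))
    content = dest.read_text()
    print(content)
```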
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
This bug makes the installation of lxml fail on many systems (especially MacOS)
when pip installs from sources, because it tries to download its library
dependencies from FTP and crashes due to the FTP "error". I guess the current
fix is to not
Stefan Behnel added the comment:
Removing HAVE_LONG_LONG entirely causes breakage of third party code that uses
this macro to enable PY_LONG_LONG support. Could you please always define it
instead of removing it?
--
nosy: +scoder
___
Python tracker
Stefan Behnel added the comment:
Raymond, you might have meant me when assigning the ticket and not Stefan Krah,
but since I'm actually not a core dev, I can't commit the patch myself.
See my last comment, though, I reviewed the patch and it should get committed
Stefan Behnel added the comment:
Definitely not a bug since this isn't required by the XML spec. As said in
issue 2647, you shouldn't rely on exact lexical characteristics of an XML byte
stream, unless you request canonical serialisation (C14N
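To compare documents without depending on such lexical details, you can compare their canonical forms. Here is a sketch using ET.canonicalize() (Python 3.8+) on two invented serialisations that differ only in attribute order, quoting and empty-element syntax:

```python
import xml.etree.ElementTree as ET

# Two lexically different serialisations of the same document.
doc_a = '<root b="2" a="1"/>'
doc_b = "<root a='1'  b='2'></root>"

# canonicalize() returns the C14N 2.0 form as a string when no `out` is given.
canon_a = ET.canonicalize(xml_data=doc_a)
canon_b = ET.canonicalize(xml_data=doc_b)
print(canon_a == canon_b, canon_a)
```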
Stefan Behnel added the comment:
Let's close this as outdated. New bugs for the new project should be reported
on GitHub anyway.
--
resolution: -> out of date
status: open -> closed
___
Python tracker <rep...@bugs.python.o
Stefan Behnel added the comment:
> By the way, I'm surprised that the special encoding "unicode" relies on the
> *current* locale encoding when the XML declaration is requested.
That seems a weird choice. Since it serialises to a Unicode string, it
shouldn't have any XML de
Stefan Behnel added the comment:
> So this benchmark cannot be used to show the superiority of exact fractions.
I don't see how a benchmark would be a way to show that. It's certainly not the
goal of this benchmark to show that one is computationally better than the
other. But if a benchm
Stefan Behnel added the comment:
Looks like I forgot about this. My final fix still hasn't been applied, so the
code in Py3.4+ is incorrect now.
No, this cannot be tested from the Python level.
--
___
Python tracker <rep...@bugs.python.org>
Stefan Behnel added the comment:
I haven't seen any crashes in the wild here, but this is still the case in the
latest code base. The change doesn't seem invasive, so I don't see why it
shouldn't get implemented.
--
nosy: +pitrou, scoder, serhiy.storchaka
versions: +Python 3.5, Python
Changes by Stefan Behnel <sco...@users.sourceforge.net>:
--
resolution: -> fixed
status: open -> closed
___
Python tracker <rep...@bugs.python.org>
<http://bugs.
Stefan Behnel added the comment:
Done:
https://github.com/python/performance/pull/10
Note that I didn't replace the existing telco benchmark as it is more specific
to Decimal. The new benchmark makes it possible to compare the decimal and
fractions modules for similar operations, though
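Here is a toy version of such a side-by-side comparison. It is not the benchmark from the pull request, and the workload and loop counts are made up. It runs the same arithmetic once with Decimal and once with Fraction:

```python
import timeit
from decimal import Decimal
from fractions import Fraction

def harmonic_sum(n, num_type):
    """Sum 1/1 + 1/2 + ... + 1/n using the given number type."""
    total = num_type(0)
    for k in range(1, n + 1):
        total += num_type(1) / num_type(k)
    return total

for num_type in (Decimal, Fraction):
    seconds = timeit.timeit(lambda: harmonic_sum(50, num_type), number=200)
    print(f"{num_type.__name__}: {seconds:.4f} s")
```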
Stefan Behnel added the comment:
If you care so much about C stack space, you could also try to create two or
three entry point functions that keep (say) a 4, 8 and 16 items array on the
stack respectively, and then pass the pointer (and the overall length if you
need it) of that array
Stefan Behnel added the comment:
I just took a quick look at the fastcall_kwargs-2.patch for now. It looks ok in
general, but it also adds quite some special code for the dict-to-locals
mapping. Is the keyword argument calling case really that important? I mean, it
requires creating a dict
Stefan Behnel added the comment:
I like the oneArg/noArg etc. macros. We use something similar in Cython. You
can even use them to inline the METH_O and METH_NOARGS call cases (although I
use inline functions for that in Cython).
--
___
Stefan Behnel added the comment:
> What do you mean by "I copied your (no-kwargs) implementation"?
I copied what you committed into CPython for _PyFunction_FastCall():
https://github.com/cython/cython/commit/8f3d3bd199a3d7f2a9fdfec0af57145b3ab363ca
and then enabled its usag
Stefan Behnel added the comment:
FYI: I copied your (no-kwargs) implementation over into Cython and I get around
17% faster calls to Python functions with 2 positional arguments.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.p
Stefan Behnel added the comment:
Extensive callback interfaces like map() come to mind, where a large number of
calls becomes excessively time critical and might thus have made people
implement their own special purpose calling code.
However, I don't know any such code (outside of Cython
Stefan Behnel added the comment:
I agree that this would be cool. There is a tiny bit of a backwards
compatibility concern as the new function signature would be incompatible with
anything we had before, but I would guess that any code that chooses to bypass
PyObject_Call() & friends w
Changes by Stefan Behnel <sco...@users.sourceforge.net>:
--
nosy: +scoder
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27810>
___
Changes by Stefan Behnel <sco...@users.sourceforge.net>:
--
nosy: +scoder
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27128>
___
Stefan Behnel added the comment:
You can easily see it by running timeit on fstrings, e.g. patched:
$ ./python -m timeit 'f"{34276394612:15}"'
100 loops, best of 3: 0.352 usec per loop
$ ./python -m timeit 'f"{34.276394612:8.6f}"'
100 loops, best of 3: 0.497 usec pe
New submission from Stefan Behnel:
I noticed that a significant share of the time spent in number formatting goes into parsing the
format spec. The attached patch speeds up float formatting by 5-15% and integer
formatting by 20-30% for me overall when using f-strings (Ubuntu 16.04, 64bit).
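The part after the colon in an f-string field is the same format spec that the builtin format() accepts. Both spellings pass it to the value's `__format__()`, which parses it on every call. A small sketch using the values from the timings:

```python
value_int = 34276394612
value_float = 34.276394612

# The f-string field spec and format()'s second argument are the same thing.
assert f"{value_int:15}" == format(value_int, "15")
assert f"{value_float:8.6f}" == format(value_float, "8.6f")

print(repr(f"{value_int:15}"))      # '    34276394612'
print(repr(f"{value_float:8.6f}"))  # '34.276395'
```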
--
components
Stefan Behnel added the comment:
On second thought, I think it should be supported (also?) in the parser.
Otherwise, using it with an async parser would be different from (and more
involved than) one-shot parsing. That seems wrong.
--
___
Stefan Behnel added the comment:
Here is a proposed patch for a new function "strip_namespaces(tree)" that
discards all namespaces from tags and attributes in a (sub-)tree, so that
subsequent processing does not have to deal with them.
The "__all__" test is failing (ha
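The attached patch itself is not reproduced here. As a rough illustration of the idea, a hypothetical pure Python version might look like this:

```python
import xml.etree.ElementTree as ET

def strip_namespaces(tree):
    """Hypothetical sketch: drop "{uri}" prefixes from tags and attribute names.

    Works in place on an Element or ElementTree. Comments and processing
    instructions (whose tag is not a string) are left alone. Two attributes
    that differ only by namespace collide after stripping, and one value
    silently overwrites the other.
    """
    for elem in tree.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        for name in [n for n in elem.attrib if n.startswith("{")]:
            value = elem.attrib.pop(name)
            elem.attrib[name.split("}", 1)[1]] = value
    return tree

root = ET.fromstring(
    '<p:root xmlns:p="urn:x" xmlns:q="urn:y" q:attr="1"><p:child/></p:root>'
)
strip_namespaces(root)
print(ET.tostring(root))
```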
Stefan Behnel added the comment:
Etree patch looks straightforward to me, feel free to apply it.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
Our CI build server says it's all fine. The fix will eventually be released,
certainly before Py3.6 comes out.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
Ah, thanks. Here's my implementation then:
https://github.com/cython/cython/pull/499/files
It seems that tests for valid complex literals are missing. I've added these to
the end of the list:
'1_00_00.5j',
'1_00_00.5e5',
'1_00_00j
Stefan Behnel added the comment:
Nice one. While reimplementing it for Cython, I noticed that the grammar
described in the PEP isn't exactly as it's implemented, though. The grammar says
digit (["_"] digit)*
whereas the latest patch (v4) says
`digit` (`digit` | "
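The rule quoted from the PEP can be checked mechanically. The sketch below compares a regular expression that encodes `digit (["_"] digit)*` with what the running CPython tokenizer accepts. All test strings start with a non-zero digit, so the separate leading-zero rule for decimal integers does not interfere.

```python
import re

# PEP 515 digit run: digit (["_"] digit)*
# Underscores are allowed only singly and only between two digits.
DIGITPART = re.compile(r"[0-9](?:_?[0-9])*")

def matches_digitpart(text):
    return DIGITPART.fullmatch(text) is not None

def python_accepts(literal):
    """Return True if the running Python compiles `literal` as an expression."""
    try:
        compile(literal, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

for s in ["1_000", "10_0", "1__0", "1_", "1_0_"]:
    print(s, matches_digitpart(s), python_accepts(s))

# Complex and float literals built from underscored digit parts.
print(eval("1_00_00.5j"), eval("1_00_00.5e5"))
```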
Stefan Behnel added the comment:
I like Serhiy's patch, too, but it feels like the single-digit case should be
enough. I found this comment by Yury a good argument:
"""
I can see improvements in micro benchmarks, but even more importantly, Serhiy's
patch reduces memory frag
Stefan Behnel added the comment:
For reference, the bug in Cython is fixed here:
https://github.com/cython/cython/commit/ececb3e9473f6aaa65f29467921594c316ec2f06
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
May I ask how difficult it is for any of the core developers to fix a known
typo in a Python source file?
--
nosy: +scoder
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
LGTM
--
nosy: +scoder
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25047>
___
Stefan Behnel added the comment:
Let's say the change minimises the dependencies. That is a reasonable goal, too.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
Would there be a way to expose these internals rather than hiding them?
--
nosy: +scoder
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Stefan Behnel added the comment:
Understood and agreed. Second patch looks good to me.
Cython calls PyThreadState_GET() in pretty much every helper function that
deals with exceptions, but I doubt that the potential speed difference is going
to be relevant in the real world. And we target
Stefan Behnel added the comment:
> _collections sounds cool, but the flip side is any python without the C
> implemntation would still have the slower startup, right?
I wouldn't bother too much with that, certainly not given the order we are
talking about here. Jython's startup time is