[issue30011] HTMLParser class is not thread safe

2017-04-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30011] HTMLParser class is not thread safe

2017-04-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:


New changeset 50f948edda0e6465e194ecc50b85fa2646039b8d by Serhiy Storchaka in 
branch '2.7':
bpo-30011: Fixed race condition in HTMLParser.unescape(). (#1140)
https://github.com/python/cpython/commit/50f948edda0e6465e194ecc50b85fa2646039b8d
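
The changeset title only names the race. One way to remove such a race is to build the table in a local variable and publish it with a single assignment. The sketch below is a simplified, hypothetical model of that idea, not the real 2.7 HTMLParser code: the `Parser` class and its `lookup()` method are illustrative stand-ins, and `html.entities` is the Python 3 counterpart of 2.7's `htmlentitydefs`.

```python
import html.entities


class Parser:
    """Hypothetical stand-in for the lazily built table in 2.7's HTMLParser."""

    entitydefs = None  # shared by all instances, built on first use

    def lookup(self, name):
        if Parser.entitydefs is None:
            # Fill a private dict first, so no other thread can see it half-built.
            table = {'apos': "'"}
            for k, v in html.entities.name2codepoint.items():
                table[k] = chr(v)
            # Publish with one assignment: readers see None or the full table.
            Parser.entitydefs = table
        try:
            return Parser.entitydefs[name]
        except KeyError:
            return '&' + name + ';'  # unknown entity is left alone


p = Parser()
print(p.lookup('amp'), p.lookup('nosuch'))
```

A thread that loses the race may still build its own copy, but under CPython it never reads a partially filled one.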


--




[issue30011] HTMLParser class is not thread safe

2017-04-15 Thread Alessandro Vesely

Alessandro Vesely added the comment:

Serhiy's analysis is correct.  If anything more than a comment is going
to make its way into the code, I'd suggest moving the dictionary building
into its own function, so that it can be called either on first use (as
now) or before threading, if the user is concerned.

I agree there is nothing wrong with multiple builds.  My point is just a
minor, bearable inefficiency, which can be neglected.  Its most annoying
case is probably test suites, which are more likely to fire up a bunch
of new threads all at once.

Greetings
Ale

--




[issue30011] HTMLParser class is not thread safe

2017-04-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There is nothing wrong with building entitydefs multiple times, since the result 
is the same. An alternative is locking, but that is a more cumbersome solution. 
And building entitydefs is much faster than importing the threading module.
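
For comparison, the locking alternative could be written as double-checked locking, sketched here under stated assumptions: `LockedParser`, `lookup()` and the `builds` counter are hypothetical, and `html.entities` stands in for 2.7's `htmlentitydefs`. It guarantees a single build, but the module then needs `threading` and a lock object, which is the import cost Serhiy's second timing measures.

```python
import html.entities
import threading


class LockedParser:
    """Hypothetical lock-based variant, for comparison only."""

    entitydefs = None
    _lock = threading.Lock()
    builds = 0  # how many times the table was built

    def lookup(self, name):
        if LockedParser.entitydefs is None:  # cheap check without the lock
            with LockedParser._lock:
                if LockedParser.entitydefs is None:  # re-check while holding it
                    table = {'apos': "'"}
                    for k, v in html.entities.name2codepoint.items():
                        table[k] = chr(v)
                    LockedParser.builds += 1
                    LockedParser.entitydefs = table
        return LockedParser.entitydefs.get(name, '&' + name + ';')


start = threading.Barrier(8)  # release all threads at roughly the same moment
results = []


def worker():
    start.wait()
    results.append(LockedParser().lookup('amp'))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```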

$ ./python -m timeit -s 'from HTMLParser import HTMLParser; p = HTMLParser()' -- 'HTMLParser.entitydefs = None; p.unescape("")'
1000 loops, best of 3: 412 usec per loop

$ ./python -m timeit -s 'import sys; m = sys.modules.copy()' -- 'import threading; sys.modules.clear(); sys.modules.update(m)'
100 loops, best of 3: 5.43 msec per loop

The current solution is faster in the single-threaded case, and correct and 
likely fast enough in the multi-threaded case.

--
nosy: +serhiy.storchaka




[issue30011] HTMLParser class is not thread safe

2017-04-14 Thread Alessandro Vesely

Alessandro Vesely added the comment:

On Fri 14/Apr/2017 19:44:29 +0200 Serhiy Storchaka wrote:
> 
> Changes by Serhiy Storchaka :
> 
> 
> --
> pull_requests: +1272

Thank you for your fix, Serhiy.  It makes the class behave consistently.
However, busy processes may concurrently build multiple temporary
entitydefs objects before one of them wins, which is probably worse than
the eager initialization that such lazy initialization tries to avoid in
the first place.  Doesn't that design deserve a comment in the code, at
least?
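
The concern above can be made concrete in a simplified model of lock-free lazy initialization. `Parser`, `lookup()` and the `all_saw_none` barrier are hypothetical; the barrier forces the worst case described, in which every thread sees `None` and builds its own temporary table before one assignment wins. `html.entities` stands in for 2.7's `htmlentitydefs`.

```python
import html.entities
import threading

N = 4
# Hypothetical test hook: holds every thread until all N have seen None.
all_saw_none = threading.Barrier(N)


class Parser:
    """Hypothetical model of the fixed, lock-free lazy initialization."""

    entitydefs = None
    builds = []  # every table that got built, kept for inspection

    def lookup(self, name):
        if Parser.entitydefs is None:
            all_saw_none.wait()  # force the worst case: all threads build
            table = {'apos': "'"}
            for k, v in html.entities.name2codepoint.items():
                table[k] = chr(v)
            Parser.builds.append(table)
            Parser.entitydefs = table  # whichever thread assigns last wins
        return Parser.entitydefs.get(name, '&' + name + ';')


results = []
threads = [threading.Thread(target=lambda: results.append(Parser().lookup('gt')))
           for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All N tables are equal, so the cost is duplicated work only. Without the barrier, the number of builds depends on thread scheduling.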

Greetings
Ale

--




[issue30011] HTMLParser class is not thread safe

2017-04-14 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests: +1272




[issue30011] HTMLParser class is not thread safe

2017-04-07 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
keywords: +easy
nosy: +ezio.melotti




[issue30011] HTMLParser class is not thread safe

2017-04-07 Thread Alessandro Vesely

New submission from Alessandro Vesely:

SYMPTOM:
When used in a multithreaded program, instances of a class derived from 
HTMLParser may convert an entity or leave it alone, in an apparently random 
fashion.

CAUSE:
The class has a class attribute, entitydefs, which, on first use, is 
initialized from None to a dictionary of entity definitions.  Initialization is 
not atomic: the dictionary becomes visible through the attribute before it is 
filled.  Instances in concurrent threads therefore see a non-None value, assume 
that initialization is complete, and hit a KeyError if the entity at hand 
hasn't been added yet.  In that case, the entity is left alone as if it were 
invalid.
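
The interleaving described above can be replayed deterministically in a simplified model, assuming (as the report implies) that the shared dictionary is published before it is filled. The `RacyParser` class, its `lookup()` method and the `on_published` hook are hypothetical; the hook marks the point where a real thread switch could let another caller in. `html.entities` is the Python 3 counterpart of 2.7's `htmlentitydefs`.

```python
import html.entities


class RacyParser:
    """Hypothetical model of the non-atomic lazy initialization."""

    entitydefs = None
    on_published = None  # hypothetical hook standing in for a thread switch

    def lookup(self, name):
        if RacyParser.entitydefs is None:
            # BUG: the shared dict is published while it holds only 'apos'.
            entitydefs = RacyParser.entitydefs = {'apos': "'"}
            if RacyParser.on_published is not None:
                RacyParser.on_published()  # another thread may run here
            for k, v in html.entities.name2codepoint.items():
                entitydefs[k] = chr(v)
        try:
            return RacyParser.entitydefs[name]
        except KeyError:
            return '&' + name + ';'  # entity left alone, as if invalid


# Simulate a second thread calling lookup() while the first is still filling.
seen_by_other_thread = []
RacyParser.on_published = lambda: seen_by_other_thread.append(
    RacyParser().lookup('amp'))
seen_by_first_thread = RacyParser().lookup('amp')
RacyParser.on_published = None
```

The first caller converts `amp`, while the simulated second caller gets it back unchanged: the same input, two different results.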

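WORKAROUND:
from HTMLParser import HTMLParser  # Python 2.7 module name

class Dummy(HTMLParser):
    """this class is defined here so that we can initialize its base class"""
    def __init__(self):
        HTMLParser.__init__(self)

# Initialize HTMLParser by loading htmlentitydefs
dummy = Dummy()
# Note: feed('') by itself parses nothing; the table is built only when
# unescape() meets a named entity, which the parser does for attribute values.
dummy.feed('')
del dummy, Dummy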

--
components: Library (Lib)
messages: 291256
nosy: ale2017
priority: normal
severity: normal
status: open
title: HTMLParser class is not thread safe
type: behavior
versions: Python 2.7
