[issue41989] htmlparser unclosed script tag causes data loss

2020-10-11 Thread Waylan Limberg


Change by Waylan Limberg:


--
keywords: +patch
pull_requests: +21635
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22658

___
Python tracker 
<https://bugs.python.org/issue41989>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41989] htmlparser unclosed script tag causes data loss

2020-10-09 Thread Waylan Limberg


New submission from Waylan Limberg:

When the `close` method of `HTMLParser` is called, any buffered text data is 
normally flushed and passed to `handle_data`, except when the parser is in 
CDATA mode. Specifically, if an unclosed `script` or `style` tag has been 
encountered, a call to `close` does not flush the remaining data.

A simple test which demonstrates the issue is attached.
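
The attached file is not reproduced in this message. A minimal sketch of the kind of check it describes might look like the following; the names `Collector` and `collect` are illustrative and are not taken from the attachment.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect(html):
    """Feed the input, close the parser, and return all data seen."""
    parser = Collector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Unclosed ordinary tag: the trailing text is flushed by close().
print(repr(collect("<p>hello")))

# Unclosed script tag: on affected versions the text never reaches
# handle_data, so the result differs from the case above.
print(repr(collect("<script>hello")))
```

Comparing the two printed values on a given interpreter shows whether that version drops the data inside the unclosed `script` tag.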

In Lib/html/parser.py#L244-L249 there are two nested `if` statements 
that both check for `not self.cdata_elem`. If the outer check passes, the 
inner one can never fail, so the repeated check is redundant. This block of 
code needs a branch for the case where `self.cdata_elem` is set.
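
To make the control flow concrete, here is a standalone model of that block. It paraphrases the shape described above rather than quoting parser.py; the function `flush_tail`, its parameters, and the `elif` branch are all hypothetical and only show one possible form the missing branch could take.

```python
from html import unescape


def flush_tail(rawdata, i, end, cdata_elem, convert_charrefs, emit):
    """Model of the end-of-input flush; returns the text left in the buffer."""
    n = len(rawdata)
    if end and i < n and not cdata_elem:
        # The inner "not cdata_elem" repeats the outer check and is
        # therefore always true once this point is reached.
        if convert_charrefs and not cdata_elem:
            emit(unescape(rawdata[i:n]))
        else:
            emit(rawdata[i:n])
        i = n
    elif end and i < n and cdata_elem:
        # Hypothetical extra branch: flush the raw text of an unclosed
        # script/style element instead of silently keeping it buffered.
        emit(rawdata[i:n])
        i = n
    return rawdata[i:]


seen = []
leftover = flush_tail("alert(1)", 0, True, "script", True, seen.append)
print(seen, repr(leftover))
```

Without the `elif` branch, the call above would emit nothing and return the whole buffer, which is the data loss described in this report.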

I should note that the input is invalid HTML. However, the existing behavior 
results in data loss. Within any other unclosed tag (anything other than 
`script` or `style`), the data is still flushed and passed to `handle_data`. 
I would expect the same behavior here, although the flushed data should 
perhaps receive the same escaping treatment as data within properly closed 
tags.
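
Until the parser itself handles this case, the expected behavior can be approximated in a subclass. The sketch below is an assumption-laden workaround, not a proposed patch: `FlushingParser` is a hypothetical name, and it relies on the undocumented `rawdata` and `cdata_elem` attributes of `HTMLParser`.

```python
from html.parser import HTMLParser


class FlushingParser(HTMLParser):
    """Hypothetical subclass that flushes leftover script/style text on close."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def close(self):
        super().close()
        # rawdata and cdata_elem are internal attributes (an assumption of
        # this sketch). If text is still buffered inside an unclosed
        # script/style element, pass it on unmodified.
        if self.cdata_elem and self.rawdata:
            self.handle_data(self.rawdata)
            self.rawdata = ""


parser = FlushingParser()
parser.feed("<script>alert(1)")
parser.close()
print("".join(parser.chunks))
```

The leftover text is passed through without unescaping, which matches how data inside a properly closed `script` or `style` element is reported.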

--
components: Library (Lib)
files: test_html.py
messages: 378359
nosy: waylan
priority: normal
severity: normal
status: open
title: htmlparser unclosed script tag causes data loss
type: behavior
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file49505/test_html.py
