Hello community,

here is the log from the commit of package python-beautifulsoup4 for openSUSE:Factory checked in at 2017-04-28 10:37:50
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
 and      /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-beautifulsoup4"

Fri Apr 28 10:37:50 2017 rev:19 rq:487697 version:4.5.3

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes 2016-09-28 11:30:29.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes 2017-04-28 10:37:52.049857260 +0200
@@ -1,0 +2,15 @@
+Sat Apr 8 17:35:17 UTC 2017 - aloi...@gmx.com
+
+- update to version 4.5.3:
+  * Fixed foster parenting when html5lib is the tree builder. Thanks
+    to Geoffrey Sneddon for a patch and test.
+  * Fixed yet another problem that caused the html5lib tree builder to
+    create a disconnected parse tree. [bug=1629825]
+  changes from version 4.5.2:
+  * Apart from the version number, this release is identical to
+    4.5.3. Due to user error, it could not be completely uploaded to
+    PyPI. Use 4.5.3 instead.
+
+- Converted to single-spec
+
+-------------------------------------------------------------------

Old:
----
  beautifulsoup4-4.5.1.tar.gz

New:
----
  beautifulsoup4-4.5.3.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.IpwFZ5/_old 2017-04-28 10:37:52.869741374 +0200
+++ /var/tmp/diff_new_pack.IpwFZ5/_new 2017-04-28 10:37:52.873740809 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package python-beautifulsoup4
 #
-# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany.
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -16,8 +16,9 @@
 #
 
 
+%{?!python_module:%define python_module() python-%{**} python3-%{**}}
 Name: python-beautifulsoup4
-Version: 4.5.1
+Version: 4.5.3
 Release: 0
 Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
 License: MIT
@@ -26,21 +27,21 @@
 Source: https://pypi.io/packages/source/b/beautifulsoup4/beautifulsoup4-%{version}.tar.gz
 # PATCH-FIX-UPSTREAM speili...@suse.com -- Backport of https://code.launchpad.net/~saschpe/beautifulsoup/beautifulsoup/+merge/200849
 Patch0: beautifulsoup4-lxml-fixes.patch
-BuildRoot: %{_tmppath}/%{name}-%{version}-build
-BuildRequires: python-devel >= 2.6
-BuildRequires: python-html5lib >= 0.999999
-BuildRequires: python-lxml >= 3.4.4
 # Documentation requirements:
-BuildRequires: python-Sphinx
-# Test requirements:
-BuildRequires: python-nose >= 1.3.7
+BuildRequires: %{python_module devel >= 2.6}
+BuildRequires: %{python_module html5lib >= 0.999999}
+BuildRequires: %{python_module lxml >= 3.4.4}
+BuildRequires: %{python_module setuptools}
+# Test requirements
+BuildRequires: %{python_module nose}
+BuildRequires: fdupes
+BuildRequires: python-rpm-macros
+BuildRequires: python3-Sphinx
 Requires: python-html5lib >= 0.999999
 Requires: python-lxml >= 3.4.4
-%if 0%{?suse_version} && 0%{?suse_version} <= 1110
-%{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
-%else
+BuildRoot: %{_tmppath}/%{name}-%{version}-build
 BuildArch: noarch
-%endif
+%python_subpackages
 
 %description
 Beautiful Soup is a Python HTML/XML parser designed for quick turnaround
@@ -72,9 +73,7 @@
 %package doc
 Summary: Documentation for %{name}
 Group: Development/Libraries/Python
-%if 0%{?suse_version}
 Recommends: %{name} = %{version}
-%endif
 
 %description doc
 Documentation and help files for %{name}
@@ -84,22 +83,31 @@
 %patch0 -p1
 
 %build
-python setup.py build
-cd doc && make html && rm build/html/.buildinfo
+%python_build
+pushd doc && make html && rm build/html/.buildinfo build/html/objects.inv && popd
+%{_python_use_flavor python3}
+%__python3 %{_bindir}/2to3 -w -n build/lib/bs4
 
 %install
-python setup.py install --prefix=%{_prefix} --root=%{buildroot}
+%python_install
+# until it can be fixed
+find %{buildroot}%{python3_sitelib} -name test_soup.* -delete
+%python_expand %fdupes -s %{buildroot}%{$python_sitelib}
 
 %check
-nosetests
+%{python_expand export PYTHONPATH="%{buildroot}%{$python_sitelib}"
+  pushd $PYTHONPATH
+  $python %{_bindir}/nosetests-%{$python_version}
+  popd
+}
 
-%files
+%files %{python_files}
 %defattr(-,root,root)
 %doc AUTHORS.txt COPYING.txt
 %{python_sitelib}/bs4/
 %{python_sitelib}/beautifulsoup4-%{version}-py*.egg-info
 
-%files doc
+%files %{python_files doc}
 %defattr(-,root,root)
 %doc NEWS.txt README.txt TODO.txt doc/build/html
 

++++++ beautifulsoup4-4.5.1.tar.gz -> beautifulsoup4-4.5.3.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/COPYING.txt new/beautifulsoup4-4.5.3/COPYING.txt
--- old/beautifulsoup4-4.5.1/COPYING.txt 2016-07-16 17:25:37.000000000 +0200
+++ new/beautifulsoup4-4.5.3/COPYING.txt 2017-01-02 15:58:02.000000000 +0100
@@ -1,6 +1,6 @@
 Beautiful Soup is made available under the MIT license:
 
- Copyright (c) 2004-2016 Leonard Richardson
+ Copyright (c) 2004-2017 Leonard Richardson
 
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/NEWS.txt new/beautifulsoup4-4.5.3/NEWS.txt
--- old/beautifulsoup4-4.5.1/NEWS.txt 2016-08-03 04:39:17.000000000 +0200
+++ new/beautifulsoup4-4.5.3/NEWS.txt 2017-01-02 16:00:18.000000000 +0100
@@ -1,3 +1,17 @@
+= 4.5.3 (20170102) =
+
+* Fixed foster parenting when html5lib is the tree builder. Thanks to
+  Geoffrey Sneddon for a patch and test.
+
+* Fixed yet another problem that caused the html5lib tree builder to
+  create a disconnected parse tree. [bug=1629825]
+
+= 4.5.2 (20170102) =
+
+* Apart from the version number, this release is identical to
+  4.5.3. Due to user error, it could not be completely uploaded to
+  PyPI. Use 4.5.3 instead.
+
 = 4.5.1 (20160802) =
 
 * Fixed a crash when passing Unicode markup that contained a
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/PKG-INFO new/beautifulsoup4-4.5.3/PKG-INFO
--- old/beautifulsoup4-4.5.1/PKG-INFO 2016-08-03 04:43:18.000000000 +0200
+++ new/beautifulsoup4-4.5.3/PKG-INFO 2017-01-02 16:08:01.000000000 +0100
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: beautifulsoup4
-Version: 4.5.1
+Version: 4.5.3
 Summary: Screen-scraping library
 Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
 Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/beautifulsoup4.egg-info/PKG-INFO new/beautifulsoup4-4.5.3/beautifulsoup4.egg-info/PKG-INFO
--- old/beautifulsoup4-4.5.1/beautifulsoup4.egg-info/PKG-INFO 2016-08-03 04:43:18.000000000 +0200
+++ new/beautifulsoup4-4.5.3/beautifulsoup4.egg-info/PKG-INFO 2017-01-02 16:08:01.000000000 +0100
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: beautifulsoup4
-Version: 4.5.1
+Version: 4.5.3
 Summary: Screen-scraping library
 Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
 Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/beautifulsoup4.egg-info/SOURCES.txt new/beautifulsoup4-4.5.3/beautifulsoup4.egg-info/SOURCES.txt
--- old/beautifulsoup4-4.5.1/beautifulsoup4.egg-info/SOURCES.txt 2016-08-03 04:43:18.000000000 +0200
+++ new/beautifulsoup4-4.5.3/beautifulsoup4.egg-info/SOURCES.txt 2017-01-02 16:08:01.000000000 +0100
@@ -13,6 +13,7 @@
 beautifulsoup4.egg-info/dependency_links.txt
 beautifulsoup4.egg-info/requires.txt
 beautifulsoup4.egg-info/top_level.txt
+bs4/1631353.py
 bs4/__init__.py
 bs4/dammit.py
 bs4/diagnose.py
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/bs4/1631353.py new/beautifulsoup4-4.5.3/bs4/1631353.py
--- old/beautifulsoup4-4.5.1/bs4/1631353.py 1970-01-01 01:00:00.000000000 +0100
+++ new/beautifulsoup4-4.5.3/bs4/1631353.py 2016-12-10 20:12:55.000000000 +0100
@@ -0,0 +1,5 @@
+doc = """<script>
+h=window.location.protocol+"//",r='<body onload="';
+</script>"""
+from bs4.diagnose import diagnose
+diagnose(doc)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/bs4/__init__.py new/beautifulsoup4-4.5.3/bs4/__init__.py
--- old/beautifulsoup4-4.5.1/bs4/__init__.py 2016-08-03 04:40:04.000000000 +0200
+++ new/beautifulsoup4-4.5.3/bs4/__init__.py 2017-01-02 15:57:54.000000000 +0100
@@ -21,8 +21,8 @@
 # found in the LICENSE file.
 
 __author__ = "Leonard Richardson (leona...@segfault.org)"
-__version__ = "4.5.1"
-__copyright__ = "Copyright (c) 2004-2016 Leonard Richardson"
+__version__ = "4.5.3"
+__copyright__ = "Copyright (c) 2004-2017 Leonard Richardson"
 __license__ = "MIT"
 
 __all__ = ['BeautifulSoup']
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/bs4/builder/_html5lib.py new/beautifulsoup4-4.5.3/bs4/builder/_html5lib.py
--- old/beautifulsoup4-4.5.1/bs4/builder/_html5lib.py 2016-07-17 17:31:37.000000000 +0200
+++ new/beautifulsoup4-4.5.3/bs4/builder/_html5lib.py 2016-12-20 02:25:47.000000000 +0100
@@ -6,6 +6,7 @@
     ]
 
 import warnings
+import re
 from bs4.builder import (
     PERMISSIVE,
     HTML,
@@ -17,7 +18,10 @@
     whitespace_re,
 )
 import html5lib
-from html5lib.constants import namespaces
+from html5lib.constants import (
+    namespaces,
+    prefixes,
+    )
 from bs4.element import (
     Comment,
     Doctype,
@@ -83,7 +87,7 @@
 
     def create_treebuilder(self, namespaceHTMLElements):
         self.underlying_builder = TreeBuilderForHtml5lib(
-            self.soup, namespaceHTMLElements)
+            namespaceHTMLElements, self.soup)
         return self.underlying_builder
 
     def test_fragment_to_document(self, fragment):
@@ -93,8 +97,12 @@
 
 class TreeBuilderForHtml5lib(treebuilder_base.TreeBuilder):
 
-    def __init__(self, soup, namespaceHTMLElements):
-        self.soup = soup
+    def __init__(self, namespaceHTMLElements, soup=None):
+        if soup:
+            self.soup = soup
+        else:
+            from bs4 import BeautifulSoup
+            self.soup = BeautifulSoup("", "html.parser")
         super(TreeBuilderForHtml5lib, self).__init__(namespaceHTMLElements)
 
     def documentClass(self):
@@ -117,7 +125,8 @@
         return TextNode(Comment(data), self.soup)
 
     def fragmentClass(self):
-        self.soup = BeautifulSoup("")
+        from bs4 import BeautifulSoup
+        self.soup = BeautifulSoup("", "html.parser")
         self.soup.name = "[document_fragment]"
         return Element(self.soup, self.soup, None)
 
@@ -131,6 +140,56 @@
     def getFragment(self):
         return treebuilder_base.TreeBuilder.getFragment(self).element
 
+    def testSerializer(self, element):
+        from bs4 import BeautifulSoup
+        rv = []
+        doctype_re = re.compile(r'^(.*?)(?: PUBLIC "(.*?)"(?: "(.*?)")?| SYSTEM "(.*?)")?$')
+
+        def serializeElement(element, indent=0):
+            if isinstance(element, BeautifulSoup):
+                pass
+            if isinstance(element, Doctype):
+                m = doctype_re.match(element)
+                if m:
+                    name = m.group(1)
+                    if m.lastindex > 1:
+                        publicId = m.group(2) or ""
+                        systemId = m.group(3) or m.group(4) or ""
+                        rv.append("""|%s<!DOCTYPE %s "%s" "%s">""" %
+                                  (' ' * indent, name, publicId, systemId))
+                    else:
+                        rv.append("|%s<!DOCTYPE %s>" % (' ' * indent, name))
+                else:
+                    rv.append("|%s<!DOCTYPE >" % (' ' * indent,))
+            elif isinstance(element, Comment):
+                rv.append("|%s<!-- %s -->" % (' ' * indent, element))
+            elif isinstance(element, NavigableString):
+                rv.append("|%s\"%s\"" % (' ' * indent, element))
+            else:
+                if element.namespace:
+                    name = "%s %s" % (prefixes[element.namespace],
+                                      element.name)
+                else:
+                    name = element.name
+                rv.append("|%s<%s>" % (' ' * indent, name))
+                if element.attrs:
+                    attributes = []
+                    for name, value in element.attrs.items():
+                        if isinstance(name, NamespacedAttribute):
+                            name = "%s %s" % (prefixes[name.namespace], name.name)
+                        if isinstance(value, list):
+                            value = " ".join(value)
+                        attributes.append((name, value))
+
+                    for name, value in sorted(attributes):
+                        rv.append('|%s%s="%s"' % (' ' * (indent + 2), name, value))
+                indent += 2
+                for child in element.children:
+                    serializeElement(child, indent)
+        serializeElement(element, 0)
+
+        return "\n".join(rv)
+
 class AttrList(object):
     def __init__(self, element):
         self.element = element
@@ -182,8 +241,10 @@
             child = node
         elif node.element.__class__ == NavigableString:
             string_child = child = node.element
+            node.parent = self
         else:
             child = node.element
+            node.parent = self
 
         if not isinstance(child, basestring) and child.parent is not None:
             node.element.extract()
@@ -221,6 +282,8 @@
                 most_recent_element=most_recent_element)
 
     def getAttributes(self):
+        if isinstance(self.element, Comment):
+            return {}
         return AttrList(self.element)
 
     def setAttributes(self, attributes):
@@ -248,11 +311,11 @@
     attributes = property(getAttributes, setAttributes)
 
     def insertText(self, data, insertBefore=None):
+        text = TextNode(self.soup.new_string(data), self.soup)
         if insertBefore:
-            text = TextNode(self.soup.new_string(data), self.soup)
-            self.insertBefore(data, insertBefore)
+            self.insertBefore(text, insertBefore)
         else:
-            self.appendChild(data)
+            self.appendChild(text)
 
     def insertBefore(self, node, refNode):
         index = self.element.index(refNode.element)
@@ -274,6 +337,7 @@
         # print "MOVE", self.element.contents
         # print "FROM", self.element
         # print "TO", new_parent.element
+
         element = self.element
         new_parent_element = new_parent.element
         # Determine what this tag's next_element will be once all the children
@@ -292,7 +356,6 @@
             new_parents_last_descendant_next_element = new_parent_element.next_element
 
         to_append = element.contents
-        append_after = new_parent_element.contents
         if len(to_append) > 0:
             # Set the first child's previous_element and previous_sibling
             # to elements within the new parent
@@ -309,12 +372,19 @@
             if new_parents_last_child:
                 new_parents_last_child.next_sibling = first_child
 
-            # Fix the last child's next_element and next_sibling
-            last_child = to_append[-1]
-            last_child.next_element = new_parents_last_descendant_next_element
+            # Find the very last element being moved. It is now the
+            # parent's last descendant. It has no .next_sibling and
+            # its .next_element is whatever the previous last
+            # descendant had.
+            last_childs_last_descendant = to_append[-1]._last_descendant(False, True)
+
+            last_childs_last_descendant.next_element = new_parents_last_descendant_next_element
             if new_parents_last_descendant_next_element:
-                new_parents_last_descendant_next_element.previous_element = last_child
-            last_child.next_sibling = None
+                # TODO: This code has no test coverage and I'm not sure
+                # how to get html5lib to go through this path, but it's
+                # just the other side of the previous line.
+                new_parents_last_descendant_next_element.previous_element = last_childs_last_descendant
+            last_childs_last_descendant.next_sibling = None
 
         for child in to_append:
             child.parent = new_parent_element
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/bs4/dammit.py new/beautifulsoup4-4.5.3/bs4/dammit.py
--- old/beautifulsoup4-4.5.1/bs4/dammit.py 2016-07-17 21:14:33.000000000 +0200
+++ new/beautifulsoup4-4.5.3/bs4/dammit.py 2016-12-20 02:45:50.000000000 +0100
@@ -310,7 +310,7 @@
         else:
             xml_endpos = 1024
             html_endpos = max(2048, int(len(markup) * 0.05))
-
+
         declared_encoding = None
         declared_encoding_match = xml_encoding_re.search(markup, endpos=xml_endpos)
         if not declared_encoding_match and is_html:
@@ -736,7 +736,7 @@
     0xde : b'\xc3\x9e', # Þ
     0xdf : b'\xc3\x9f', # ß
     0xe0 : b'\xc3\xa0', # à
-    0xe1 : b'\xa1', # á
+    0xe1 : b'\xa1', # á
     0xe2 : b'\xc3\xa2', # â
     0xe3 : b'\xc3\xa3', # ã
     0xe4 : b'\xc3\xa4', # ä
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/bs4/tests/test_html5lib.py new/beautifulsoup4-4.5.3/bs4/tests/test_html5lib.py
--- old/beautifulsoup4-4.5.1/bs4/tests/test_html5lib.py 2016-07-17 17:42:40.000000000 +0200
+++ new/beautifulsoup4-4.5.3/bs4/tests/test_html5lib.py 2016-12-20 02:24:17.000000000 +0100
@@ -95,6 +95,22 @@
         assert space1.next_element is tbody1
         assert tbody2.next_element is space2
 
+    def test_reparented_markup_containing_children(self):
+        markup = '<div><a>aftermath<p><noscript>target</noscript>aftermath</a></p></div>'
+        soup = self.soup(markup)
+        noscript = soup.noscript
+        self.assertEqual("target", noscript.next_element)
+        target = soup.find(string='target')
+
+        # The 'aftermath' string was duplicated; we want the second one.
+        final_aftermath = soup.find_all(string='aftermath')[-1]
+
+        # The <noscript> tag was moved beneath a copy of the <a> tag,
+        # but the 'target' string within is still connected to the
+        # (second) 'aftermath' string.
+        self.assertEqual(final_aftermath, target.next_element)
+        self.assertEqual(target, final_aftermath.previous_element)
+
     def test_processing_instruction(self):
         """Processing instructions become comments."""
         markup = b"""<?PITarget PIContent?>"""
@@ -107,3 +123,8 @@
         a1, a2 = soup.find_all('a')
         self.assertEqual(a1, a2)
         assert a1 is not a2
+
+    def test_foster_parenting(self):
+        markup = b"""<table><td></tbody>A"""
+        soup = self.soup(markup)
+        self.assertEqual(u"<body>A<table><tbody><tr><td></td></tr></tbody></table></body>", soup.body.decode())
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/doc/source/index.rst new/beautifulsoup4-4.5.3/doc/source/index.rst
--- old/beautifulsoup4-4.5.1/doc/source/index.rst 2016-07-27 03:31:45.000000000 +0200
+++ new/beautifulsoup4-4.5.3/doc/source/index.rst 2016-12-19 23:43:28.000000000 +0100
@@ -360,34 +360,34 @@
 ^^^^^^^^^^
 
 A tag may have any number of attributes. The tag ``<b
-class="boldest">`` has an attribute "class" whose value is
+id="boldest">`` has an attribute "id" whose value is
 "boldest". You can access a tag's attributes by treating the tag like
 a dictionary::
 
- tag['class']
+ tag['id']
  # u'boldest'
 
 You can access that dictionary directly as ``.attrs``::
 
  tag.attrs
- # {u'class': u'boldest'}
+ # {u'id': 'boldest'}
 
 You can add, remove, and modify a tag's attributes. Again, this is
 done by treating the tag as a dictionary::
 
- tag['class'] = 'verybold'
- tag['id'] = 1
+ tag['id'] = 'verybold'
+ tag['another-attribute'] = 1
  tag
- # <blockquote class="verybold" id="1">Extremely bold</blockquote>
+ # <b another-attribute="1" id="verybold"></b>
 
- del tag['class']
  del tag['id']
+ del tag['another-attribute']
  tag
- # <blockquote>Extremely bold</blockquote>
+ # <b></b>
 
- tag['class']
- # KeyError: 'class'
- print(tag.get('class'))
+ tag['id']
+ # KeyError: 'id'
+ print(tag.get('id'))
  # None
 
 .. _multivalue:
@@ -1050,7 +1050,7 @@
 ^^^^^^^^^^^^^^^^^^^^
 
 If you pass in a regular expression object, Beautiful Soup will filter
-against that regular expression using its ``match()`` method. This code
+against that regular expression using its ``search()`` method. This code
 finds all the tags whose names start with the letter "b"; in this case,
 the <body> tag and the <b> tag::
 
@@ -1262,6 +1262,17 @@
  data_soup.find_all(attrs={"data-foo": "value"})
  # [<div data-foo="value">foo!</div>]
 
+You can't use a keyword argument to search for HTML's 'name' element,
+because Beautiful Soup uses the ``name`` argument to contain the name
+of the tag itself. Instead, you can give a value to 'name' in the
+``attrs`` argument.
+
+ name_soup = BeautifulSoup('<input name="email"/>')
+ name_soup.find_all(name="email")
+ # []
+ name_soup.find_all(attrs={"name": "email"})
+ # [<input name="email"/>]
+
 .. _attrs:
 
 Searching by CSS class
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.5.1/setup.py new/beautifulsoup4-4.5.3/setup.py
--- old/beautifulsoup4-4.5.1/setup.py 2016-08-03 04:39:03.000000000 +0200
+++ new/beautifulsoup4-4.5.3/setup.py 2017-01-02 15:57:45.000000000 +0100
@@ -5,7 +5,7 @@
 
 setup(
     name="beautifulsoup4",
-    version = "4.5.1",
+    version = "4.5.3",
     author="Leonard Richardson",
     author_email='leona...@segfault.org',
     url="http://www.crummy.com/software/BeautifulSoup/bs4/",
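
Note on the index.rst hunk that changes ``match()`` to ``search()``: the distinction only shows up for patterns that are not anchored at the start of the string. The following is a minimal sketch using nothing but the standard `re` module. The `tag_names` list and the `filter_names` helper are hypothetical stand-ins made up for illustration; they are not part of Beautiful Soup and do not reproduce its internal filtering code.

```python
import re

# Hypothetical tag names standing in for the tags of a parsed document.
tag_names = ["html", "head", "title", "body", "b", "p"]


def filter_names(pattern, names, method):
    """Return the names for which pattern.<method>(name) finds a match.

    `method` is the name of a compiled-pattern method, "search" or "match".
    """
    matcher = getattr(pattern, method)
    return [name for name in names if matcher(name)]


contains_t = re.compile("t")      # unanchored: "t" anywhere in the name
starts_with_b = re.compile("^b")  # anchored: name must begin with "b"

# search() scans the whole string; match() only tests the start of it.
print(filter_names(contains_t, tag_names, "search"))     # ['html', 'title']
print(filter_names(contains_t, tag_names, "match"))      # ['title']
print(filter_names(starts_with_b, tag_names, "search"))  # ['body', 'b']
```

With an anchored pattern such as `^b` both methods agree, which is presumably why the old wording went unnoticed for the "starts with b" example in the documentation.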