Hello community,

here is the log from the commit of package python-beautifulsoup4 for openSUSE:Factory checked in at 2015-11-17 14:23:36
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
 and      /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "python-beautifulsoup4"

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes 2015-08-10 09:15:53.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes 2015-11-17 14:23:37.000000000 +0100
@@ -1,0 +2,15 @@
+Sun Nov 15 16:31:46 UTC 2015 - idon...@suse.com
+
+- Update to version 4.4.1
+  * Fixed a bug that deranged the tree when part of it was
+    removed. Thanks to Eric Weiser for the patch and John Wiseman for a
+    test. lp#1481520
+  * Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
+    Kramer for the patch. lp#1483781
+  * Improved the implementation of CSS selector grouping. Thanks to
+    Orangain for the patch. lp#1484543
+  * Fixed the test_detect_utf8 test so that it works when chardet is
+    installed. lp#1471359
+  * Corrected the output of Declaration objects. lp#1477847
+
+-------------------------------------------------------------------

Old:
----
  beautifulsoup4-4.4.0.tar.gz

New:
----
  beautifulsoup4-4.4.1.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.PhJNU9/_old 2015-11-17 14:23:37.000000000 +0100
+++ /var/tmp/diff_new_pack.PhJNU9/_new 2015-11-17 14:23:37.000000000 +0100
@@ -17,7 +17,7 @@
 
 
 Name: python-beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
 Release: 0
 Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
 License: MIT

++++++ beautifulsoup4-4.4.0.tar.gz -> beautifulsoup4-4.4.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/NEWS.txt new/beautifulsoup4-4.4.1/NEWS.txt
--- old/beautifulsoup4-4.4.0/NEWS.txt 2015-07-03 17:14:43.000000000 +0200
+++ new/beautifulsoup4-4.4.1/NEWS.txt 2015-09-29 01:53:36.000000000 +0200
@@ -1,3 +1,21 @@
+= 4.4.1 (20150928) =
+
+* Fixed a bug that deranged the tree when part 
of it was
+  removed. Thanks to Eric Weiser for the patch and John Wiseman for a
+  test. [bug=1481520]
+
+* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
+  Kramer for the patch. [bug=1483781]
+
+* Improved the implementation of CSS selector grouping. Thanks to
+  Orangain for the patch. [bug=1484543]
+
+* Fixed the test_detect_utf8 test so that it works when chardet is
+  installed. [bug=1471359]
+
+* Corrected the output of Declaration objects. [bug=1477847]
+
+
 = 4.4.0 (20150703) =
 
 Especially important changes:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/PKG-INFO new/beautifulsoup4-4.4.1/PKG-INFO
--- old/beautifulsoup4-4.4.0/PKG-INFO 2015-07-03 17:23:03.000000000 +0200
+++ new/beautifulsoup4-4.4.1/PKG-INFO 2015-09-29 02:19:48.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
 Summary: Screen-scraping library
 Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
 Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/beautifulsoup4.egg-info/PKG-INFO new/beautifulsoup4-4.4.1/beautifulsoup4.egg-info/PKG-INFO
--- old/beautifulsoup4-4.4.0/beautifulsoup4.egg-info/PKG-INFO 2015-07-03 17:23:03.000000000 +0200
+++ new/beautifulsoup4-4.4.1/beautifulsoup4.egg-info/PKG-INFO 2015-09-29 02:19:48.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
 Summary: Screen-scraping library
 Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
 Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/__init__.py new/beautifulsoup4-4.4.1/bs4/__init__.py
--- old/beautifulsoup4-4.4.0/bs4/__init__.py 2015-07-03 14:39:51.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/__init__.py 2015-09-29 02:09:17.000000000 +0200
@@ -17,7 +17,7 @@ """ __author__ = "Leonard Richardson (leona...@segfault.org)" -__version__ = "4.4.0" +__version__ = "4.4.1" __copyright__ = "Copyright (c) 2004-2015 Leonard Richardson" __license__ = "MIT" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/builder/_html5lib.py new/beautifulsoup4-4.4.1/bs4/builder/_html5lib.py --- old/beautifulsoup4-4.4.0/bs4/builder/_html5lib.py 2015-06-28 21:37:27.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/builder/_html5lib.py 2015-09-29 01:48:58.000000000 +0200 @@ -120,7 +120,10 @@ if (name in list_attr['*'] or (self.element.name in list_attr and name in list_attr[self.element.name])): - value = whitespace_re.split(value) + # A node that is being cloned may have already undergone + # this procedure. + if not isinstance(value, list): + value = whitespace_re.split(value) self.element[name] = value def items(self): return list(self.attrs.items()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/dammit.py new/beautifulsoup4-4.4.1/bs4/dammit.py --- old/beautifulsoup4-4.4.0/bs4/dammit.py 2015-07-03 15:20:06.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/dammit.py 2015-09-29 01:58:41.000000000 +0200 @@ -6,6 +6,7 @@ Feed Parser. It works best on XML and HTML, but it does not rewrite the XML or HTML to reflect a new encoding; that's the tree builder's job. 
""" +__license__ = "MIT" from pdb import set_trace import codecs diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/diagnose.py new/beautifulsoup4-4.4.1/bs4/diagnose.py --- old/beautifulsoup4-4.4.0/bs4/diagnose.py 2015-06-27 17:11:31.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/diagnose.py 2015-09-29 01:56:24.000000000 +0200 @@ -1,4 +1,7 @@ """Diagnostic functions, mainly for use when doing tech support.""" + +__license__ = "MIT" + import cProfile from StringIO import StringIO from HTMLParser import HTMLParser diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/element.py new/beautifulsoup4-4.4.1/bs4/element.py --- old/beautifulsoup4-4.4.0/bs4/element.py 2015-06-28 20:57:57.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/element.py 2015-09-29 01:56:01.000000000 +0200 @@ -1,3 +1,5 @@ +__license__ = "MIT" + from pdb import set_trace import collections import re @@ -262,19 +264,19 @@ next_element = last_child.next_element if (self.previous_element is not None and - self.previous_element != next_element): + self.previous_element is not next_element): self.previous_element.next_element = next_element - if next_element is not None and next_element != self.previous_element: + if next_element is not None and next_element is not self.previous_element: next_element.previous_element = self.previous_element self.previous_element = None last_child.next_element = None self.parent = None if (self.previous_sibling is not None - and self.previous_sibling != self.next_sibling): + and self.previous_sibling is not self.next_sibling): self.previous_sibling.next_sibling = self.next_sibling if (self.next_sibling is not None - and self.next_sibling != self.previous_sibling): + and self.next_sibling is not self.previous_sibling): self.next_sibling.previous_sibling = self.previous_sibling self.previous_sibling = self.next_sibling = None return self @@ 
-287,13 +289,15 @@ last_child = self while isinstance(last_child, Tag) and last_child.contents: last_child = last_child.contents[-1] - if not accept_self and last_child == self: + if not accept_self and last_child is self: last_child = None return last_child # BS3: Not part of the API! _lastRecursiveChild = _last_descendant def insert(self, position, new_child): + if new_child is None: + raise ValueError("Cannot insert None into a tag.") if new_child is self: raise ValueError("Cannot insert a tag into itself.") if (isinstance(new_child, basestring) @@ -750,8 +754,8 @@ class Declaration(PreformattedString): - PREFIX = u'<!' - SUFFIX = u'!>' + PREFIX = u'<?' + SUFFIX = u'?>' class Doctype(PreformattedString): @@ -1286,9 +1290,23 @@ def select(self, selector, _candidate_generator=None, limit=None): """Perform a CSS selection operation on the current element.""" - # Remove whitespace directly after the grouping operator ',' - # then split into tokens. - tokens = re.sub(',[\s]*',',', selector).split() + # Handle grouping selectors if ',' exists, ie: p,a + if ',' in selector: + context = [] + for partial_selector in selector.split(','): + partial_selector = partial_selector.strip() + if partial_selector == '': + raise ValueError('Invalid group selection syntax: %s' % selector) + candidates = self.select(partial_selector, limit=limit) + for candidate in candidates: + if candidate not in context: + context.append(candidate) + + if limit and len(context) >= limit: + break + return context + + tokens = selector.split() current_context = [self] if tokens[-1] in self._selector_combinators: @@ -1298,198 +1316,192 @@ if self._select_debug: print 'Running CSS selector "%s"' % selector - for index, token_group in enumerate(tokens): + for index, token in enumerate(tokens): new_context = [] new_context_ids = set([]) - # Grouping selectors, ie: p,a - grouped_tokens = token_group.split(',') - if '' in grouped_tokens: - raise ValueError('Invalid group selection syntax: %s' % 
token_group) - if tokens[index-1] in self._selector_combinators: # This token was consumed by the previous combinator. Skip it. if self._select_debug: print ' Token was consumed by the previous combinator.' continue - for token in grouped_tokens: - if self._select_debug: - print ' Considering token "%s"' % token - recursive_candidate_generator = None - tag_name = None - - # Each operation corresponds to a checker function, a rule - # for determining whether a candidate matches the - # selector. Candidates are generated by the active - # iterator. - checker = None - - m = self.attribselect_re.match(token) - if m is not None: - # Attribute selector - tag_name, attribute, operator, value = m.groups() - checker = self._attribute_checker(operator, attribute, value) - - elif '#' in token: - # ID selector - tag_name, tag_id = token.split('#', 1) - def id_matches(tag): - return tag.get('id', None) == tag_id - checker = id_matches - - elif '.' in token: - # Class selector - tag_name, klass = token.split('.', 1) - classes = set(klass.split('.')) - def classes_match(candidate): - return classes.issubset(candidate.get('class', [])) - checker = classes_match - - elif ':' in token: - # Pseudo-class - tag_name, pseudo = token.split(':', 1) - if tag_name == '': - raise ValueError( - "A pseudo-class must be prefixed with a tag name.") - pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo) - found = [] - if pseudo_attributes is None: - pseudo_type = pseudo - pseudo_value = None - else: - pseudo_type, pseudo_value = pseudo_attributes.groups() - if pseudo_type == 'nth-of-type': - try: - pseudo_value = int(pseudo_value) - except: - raise NotImplementedError( - 'Only numeric values are currently supported for the nth-of-type pseudo-class.') - if pseudo_value < 1: - raise ValueError( - 'nth-of-type pseudo-class value must be at least 1.') - class Counter(object): - def __init__(self, destination): - self.count = 0 - self.destination = destination - - def 
nth_child_of_type(self, tag): - self.count += 1 - if self.count == self.destination: - return True - if self.count > self.destination: - # Stop the generator that's sending us - # these things. - raise StopIteration() - return False - checker = Counter(pseudo_value).nth_child_of_type - else: + if self._select_debug: + print ' Considering token "%s"' % token + recursive_candidate_generator = None + tag_name = None + + # Each operation corresponds to a checker function, a rule + # for determining whether a candidate matches the + # selector. Candidates are generated by the active + # iterator. + checker = None + + m = self.attribselect_re.match(token) + if m is not None: + # Attribute selector + tag_name, attribute, operator, value = m.groups() + checker = self._attribute_checker(operator, attribute, value) + + elif '#' in token: + # ID selector + tag_name, tag_id = token.split('#', 1) + def id_matches(tag): + return tag.get('id', None) == tag_id + checker = id_matches + + elif '.' in token: + # Class selector + tag_name, klass = token.split('.', 1) + classes = set(klass.split('.')) + def classes_match(candidate): + return classes.issubset(candidate.get('class', [])) + checker = classes_match + + elif ':' in token: + # Pseudo-class + tag_name, pseudo = token.split(':', 1) + if tag_name == '': + raise ValueError( + "A pseudo-class must be prefixed with a tag name.") + pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo) + found = [] + if pseudo_attributes is None: + pseudo_type = pseudo + pseudo_value = None + else: + pseudo_type, pseudo_value = pseudo_attributes.groups() + if pseudo_type == 'nth-of-type': + try: + pseudo_value = int(pseudo_value) + except: raise NotImplementedError( - 'Only the following pseudo-classes are implemented: nth-of-type.') - - elif token == '*': - # Star selector -- matches everything - pass - elif token == '>': - # Run the next token as a CSS selector against the - # direct children of each tag in the current context. 
- recursive_candidate_generator = lambda tag: tag.children - elif token == '~': - # Run the next token as a CSS selector against the - # siblings of each tag in the current context. - recursive_candidate_generator = lambda tag: tag.next_siblings - elif token == '+': - # For each tag in the current context, run the next - # token as a CSS selector against the tag's next - # sibling that's a tag. - def next_tag_sibling(tag): - yield tag.find_next_sibling(True) - recursive_candidate_generator = next_tag_sibling - - elif self.tag_name_re.match(token): - # Just a tag name. - tag_name = token + 'Only numeric values are currently supported for the nth-of-type pseudo-class.') + if pseudo_value < 1: + raise ValueError( + 'nth-of-type pseudo-class value must be at least 1.') + class Counter(object): + def __init__(self, destination): + self.count = 0 + self.destination = destination + + def nth_child_of_type(self, tag): + self.count += 1 + if self.count == self.destination: + return True + if self.count > self.destination: + # Stop the generator that's sending us + # these things. + raise StopIteration() + return False + checker = Counter(pseudo_value).nth_child_of_type else: - raise ValueError( - 'Unsupported or invalid CSS selector: "%s"' % token) - if recursive_candidate_generator: - # This happens when the selector looks like "> foo". - # - # The generator calls select() recursively on every - # member of the current context, passing in a different - # candidate generator and a different selector. - # - # In the case of "> foo", the candidate generator is - # one that yields a tag's direct children (">"), and - # the selector is "foo". 
- next_token = tokens[index+1] - def recursive_select(tag): - if self._select_debug: - print ' Calling select("%s") recursively on %s %s' % (next_token, tag.name, tag.attrs) - print '-' * 40 - for i in tag.select(next_token, recursive_candidate_generator): - if self._select_debug: - print '(Recursive select picked up candidate %s %s)' % (i.name, i.attrs) - yield i - if self._select_debug: - print '-' * 40 - _use_candidate_generator = recursive_select - elif _candidate_generator is None: - # By default, a tag's candidates are all of its - # children. If tag_name is defined, only yield tags - # with that name. + raise NotImplementedError( + 'Only the following pseudo-classes are implemented: nth-of-type.') + + elif token == '*': + # Star selector -- matches everything + pass + elif token == '>': + # Run the next token as a CSS selector against the + # direct children of each tag in the current context. + recursive_candidate_generator = lambda tag: tag.children + elif token == '~': + # Run the next token as a CSS selector against the + # siblings of each tag in the current context. + recursive_candidate_generator = lambda tag: tag.next_siblings + elif token == '+': + # For each tag in the current context, run the next + # token as a CSS selector against the tag's next + # sibling that's a tag. + def next_tag_sibling(tag): + yield tag.find_next_sibling(True) + recursive_candidate_generator = next_tag_sibling + + elif self.tag_name_re.match(token): + # Just a tag name. + tag_name = token + else: + raise ValueError( + 'Unsupported or invalid CSS selector: "%s"' % token) + if recursive_candidate_generator: + # This happens when the selector looks like "> foo". + # + # The generator calls select() recursively on every + # member of the current context, passing in a different + # candidate generator and a different selector. + # + # In the case of "> foo", the candidate generator is + # one that yields a tag's direct children (">"), and + # the selector is "foo". 
+ next_token = tokens[index+1] + def recursive_select(tag): if self._select_debug: - if tag_name: - check = "[any]" - else: - check = tag_name - print ' Default candidate generator, tag name="%s"' % check + print ' Calling select("%s") recursively on %s %s' % (next_token, tag.name, tag.attrs) + print '-' * 40 + for i in tag.select(next_token, recursive_candidate_generator): + if self._select_debug: + print '(Recursive select picked up candidate %s %s)' % (i.name, i.attrs) + yield i if self._select_debug: - # This is redundant with later code, but it stops - # a bunch of bogus tags from cluttering up the - # debug log. - def default_candidate_generator(tag): - for child in tag.descendants: - if not isinstance(child, Tag): - continue - if tag_name and not child.name == tag_name: - continue - yield child - _use_candidate_generator = default_candidate_generator + print '-' * 40 + _use_candidate_generator = recursive_select + elif _candidate_generator is None: + # By default, a tag's candidates are all of its + # children. If tag_name is defined, only yield tags + # with that name. + if self._select_debug: + if tag_name: + check = "[any]" else: - _use_candidate_generator = lambda tag: tag.descendants + check = tag_name + print ' Default candidate generator, tag name="%s"' % check + if self._select_debug: + # This is redundant with later code, but it stops + # a bunch of bogus tags from cluttering up the + # debug log. 
+ def default_candidate_generator(tag): + for child in tag.descendants: + if not isinstance(child, Tag): + continue + if tag_name and not child.name == tag_name: + continue + yield child + _use_candidate_generator = default_candidate_generator else: - _use_candidate_generator = _candidate_generator + _use_candidate_generator = lambda tag: tag.descendants + else: + _use_candidate_generator = _candidate_generator - count = 0 - for tag in current_context: - if self._select_debug: - print " Running candidate generator on %s %s" % ( - tag.name, repr(tag.attrs)) - for candidate in _use_candidate_generator(tag): - if not isinstance(candidate, Tag): - continue - if tag_name and candidate.name != tag_name: - continue - if checker is not None: - try: - result = checker(candidate) - except StopIteration: - # The checker has decided we should no longer - # run the generator. + count = 0 + for tag in current_context: + if self._select_debug: + print " Running candidate generator on %s %s" % ( + tag.name, repr(tag.attrs)) + for candidate in _use_candidate_generator(tag): + if not isinstance(candidate, Tag): + continue + if tag_name and candidate.name != tag_name: + continue + if checker is not None: + try: + result = checker(candidate) + except StopIteration: + # The checker has decided we should no longer + # run the generator. + break + if checker is None or result: + if self._select_debug: + print " SUCCESS %s %s" % (candidate.name, repr(candidate.attrs)) + if id(candidate) not in new_context_ids: + # If a tag matches a selector more than once, + # don't include it in the context more than once. + new_context.append(candidate) + new_context_ids.add(id(candidate)) + if limit and len(new_context) >= limit: break - if checker is None or result: - if self._select_debug: - print " SUCCESS %s %s" % (candidate.name, repr(candidate.attrs)) - if id(candidate) not in new_context_ids: - # If a tag matches a selector more than once, - # don't include it in the context more than once. 
- new_context.append(candidate) - new_context_ids.add(id(candidate)) - if limit and len(new_context) >= limit: - break - elif self._select_debug: - print " FAILURE %s %s" % (candidate.name, repr(candidate.attrs)) + elif self._select_debug: + print " FAILURE %s %s" % (candidate.name, repr(candidate.attrs)) current_context = new_context diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/testing.py new/beautifulsoup4-4.4.1/bs4/testing.py --- old/beautifulsoup4-4.4.0/bs4/testing.py 2015-06-28 21:51:27.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/testing.py 2015-09-29 01:56:34.000000000 +0200 @@ -1,5 +1,7 @@ """Helper classes for tests.""" +__license__ = "MIT" + import pickle import copy import functools @@ -556,6 +558,11 @@ self.assertEqual( soup.encode(), b'<?xml version="1.0" encoding="utf-8"?>\n<root/>') + def test_xml_declaration(self): + markup = b"""<?xml version="1.0" encoding="utf8"?>\n<foo/>""" + soup = self.soup(markup) + self.assertEqual(markup, soup.encode("utf8")) + def test_real_xhtml_document(self): """A real XHTML document should come out *exactly* the same as it went in.""" markup = b"""<?xml version="1.0" encoding="utf-8"?> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_html5lib.py new/beautifulsoup4-4.4.1/bs4/tests/test_html5lib.py --- old/beautifulsoup4-4.4.0/bs4/tests/test_html5lib.py 2014-12-12 04:21:39.000000000 +0100 +++ new/beautifulsoup4-4.4.1/bs4/tests/test_html5lib.py 2015-09-29 01:51:22.000000000 +0200 @@ -89,3 +89,10 @@ markup = b"""<?PITarget PIContent?>""" soup = self.soup(markup) assert str(soup).startswith("<!--?PITarget PIContent?-->") + + def test_cloned_multivalue_node(self): + markup = b"""<a class="my_class"><p></a>""" + soup = self.soup(markup) + a1, a2 = soup.find_all('a') + self.assertEqual(a1, a2) + assert a1 is not a2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' 
'--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_soup.py new/beautifulsoup4-4.4.1/bs4/tests/test_soup.py --- old/beautifulsoup4-4.4.0/bs4/tests/test_soup.py 2015-06-27 15:30:31.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/tests/test_soup.py 2015-07-05 19:19:39.000000000 +0200 @@ -299,10 +299,11 @@ dammit.unicode_markup, """<foo>''""</foo>""") def test_detect_utf8(self): - utf8 = b"\xc3\xa9" + utf8 = b"Sacr\xc3\xa9 bleu! \xe2\x98\x83" dammit = UnicodeDammit(utf8) - self.assertEqual(dammit.unicode_markup, u'\xe9') self.assertEqual(dammit.original_encoding.lower(), 'utf-8') + self.assertEqual(dammit.unicode_markup, u'Sacr\xe9 bleu! \N{SNOWMAN}') + def test_convert_hebrew(self): hebrew = b"\xed\xe5\xec\xf9" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_tree.py new/beautifulsoup4-4.4.1/bs4/tests/test_tree.py --- old/beautifulsoup4-4.4.0/bs4/tests/test_tree.py 2015-06-28 21:50:14.000000000 +0200 +++ new/beautifulsoup4-4.4.1/bs4/tests/test_tree.py 2015-09-29 01:42:21.000000000 +0200 @@ -23,6 +23,7 @@ PY3K, CData, Comment, + Declaration, Doctype, NavigableString, SoupStrainer, @@ -1084,6 +1085,31 @@ self.assertEqual(foo_2, soup.a.string) self.assertEqual(bar_2, soup.b.string) + def test_extract_multiples_of_same_tag(self): + soup = self.soup(""" +<html> +<head> +<script>foo</script> +</head> +<body> + <script>bar</script> + <a></a> +</body> +<script>baz</script> +</html>""") + [soup.script.extract() for i in soup.find_all("script")] + self.assertEqual("<body>\n\n<a></a>\n</body>", unicode(soup.body)) + + + def test_extract_works_when_element_is_surrounded_by_identical_strings(self): + soup = self.soup( + '<html>\n' + '<body>hi</body>\n' + '</html>') + soup.find('body').extract() + self.assertEqual(None, soup.find('body')) + + def test_clear(self): """Tag.clear()""" soup = self.soup("<p><a>String <em>Italicized</em></a> and another</p>") @@ -1592,6 +1618,9 
@@ soup.insert(1, doctype) self.assertEqual(soup.encode(), b"<!DOCTYPE foo>\n") + def test_declaration(self): + d = Declaration("foo") + self.assertEqual("<?foo?>", d.output_ready()) class TestSoupSelector(TreeTest): @@ -1942,22 +1971,25 @@ # Test the selector grouping operator (the comma) def test_multiple_select(self): - self.assertSelects('x, y',['xid','yid']) + self.assertSelects('x, y', ['xid', 'yid']) def test_multiple_select_with_no_space(self): - self.assertSelects('x,y',['xid','yid']) + self.assertSelects('x,y', ['xid', 'yid']) def test_multiple_select_with_more_space(self): - self.assertSelects('x, y',['xid', 'yid']) + self.assertSelects('x, y', ['xid', 'yid']) + + def test_multiple_select_duplicated(self): + self.assertSelects('x, x', ['xid']) def test_multiple_select_sibling(self): - self.assertSelects('x, y ~ p[lang=fr]',['lang-fr']) + self.assertSelects('x, y ~ p[lang=fr]', ['xid', 'lang-fr']) - def test_multiple_select(self): - self.assertSelects('x, y > z', ['zida', 'zidb', 'zidab', 'zidac']) + def test_multiple_select_tag_and_direct_descendant(self): + self.assertSelects('x, y > z', ['xid', 'zidb']) - def test_multiple_select_direct_descendant(self): - self.assertSelects('div > x, y, z', ['xid', 'yid']) + def test_multiple_select_direct_descendant_and_tags(self): + self.assertSelects('div > x, y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac']) def test_multiple_select_indirect_descendant(self): self.assertSelects('div x,y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac']) @@ -1966,14 +1998,14 @@ self.assertRaises(ValueError, self.soup.select, ',x, y') self.assertRaises(ValueError, self.soup.select, 'x,,y') - def test_multiple_select(self): - self.assertSelects('p[lang=en], p[lang=en-gb]',['lang-en','lang-en-gb']) + def test_multiple_select_attrs(self): + self.assertSelects('p[lang=en], p[lang=en-gb]', ['lang-en', 'lang-en-gb']) def test_multiple_select_ids(self): - self.assertSelects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['zida', 
'zidb','zidab']) + self.assertSelects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['xid', 'zidb', 'zidab']) def test_multiple_select_nested(self): - self.assertSelects('body > div > x, y > z', ['zida', 'zidb', 'zidab', 'zidac']) + self.assertSelects('body > div > x, y > z', ['xid', 'zidb']) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/doc/source/conf.py new/beautifulsoup4-4.4.1/doc/source/conf.py --- old/beautifulsoup4-4.4.0/doc/source/conf.py 2013-05-14 14:20:54.000000000 +0200 +++ new/beautifulsoup4-4.4.1/doc/source/conf.py 2015-07-03 17:31:12.000000000 +0200 @@ -41,7 +41,7 @@ # General information about the project. project = u'Beautiful Soup' -copyright = u'2012, Leonard Richardson' +copyright = u'2004-2015, Leonard Richardson' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the @@ -50,7 +50,7 @@ # The short X.Y version. version = '4' # The full version, including alpha/beta/rc tags. -release = '4.2.0' +release = '4.4.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/doc/source/index.rst new/beautifulsoup4-4.4.1/doc/source/index.rst --- old/beautifulsoup4-4.4.0/doc/source/index.rst 2015-06-28 21:33:53.000000000 +0200 +++ new/beautifulsoup4-4.4.1/doc/source/index.rst 2015-09-29 00:46:53.000000000 +0200 @@ -29,7 +29,7 @@ This documentation has been translated into other languages by Beautiful Soup users: -* `这篇文档当然还有中文版. <http://www.crummy.com/software/BeautifulSoup/bs4/doc/index.cn.html>`_ +* `这篇文档当然还有中文版. <http://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/>`_ * このページは日本語で利用できます(`外部リンク <http://kondou.com/BS4/>`_) * 이 문서는 한국어 번역도 가능합니다. 
(`외부 링크 <http://coreapython.hosting.paran.com/etc/beautifulsoup4.html>`_) @@ -1130,7 +1130,7 @@ If you pass in a function to filter on a specific attribute like ``href``, the argument passed into the function will be the attribute value, not the whole tag. Here's a function that finds all ``a`` tags -whose ``href`` attribute _does not_ match a regular expression:: +whose ``href`` attribute *does not* match a regular expression:: def not_lacie(href): return href and not re.compile("lacie").search(href) @@ -1359,6 +1359,12 @@ soup.find_all("a", string="Elsie") # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>] +The ``string`` argument is new in Beautiful Soup 4.4.0. In earlier +versions it was called ``text``:: + + soup.find_all("a", text="Elsie") + # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>] + .. _limit: The ``limit`` argument @@ -3120,11 +3126,11 @@ their values, not strings. This may affect the way you search by CSS class. -If you pass one of the ``find*`` methods both :ref:`text <text>` `and` +If you pass one of the ``find*`` methods both :ref:`string <string>` `and` a tag-specific argument like :ref:`name <name>`, Beautiful Soup will search for tags that match your tag-specific criteria and whose -:ref:`Tag.string <.string>` matches your value for :ref:`text -<text>`. It will `not` find the strings themselves. Previously, +:ref:`Tag.string <.string>` matches your value for :ref:`string +<string>`. It will `not` find the strings themselves. Previously, Beautiful Soup ignored the tag-specific arguments and looked for strings. 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/setup.py new/beautifulsoup4-4.4.1/setup.py
--- old/beautifulsoup4-4.4.0/setup.py 2015-07-03 17:18:16.000000000 +0200
+++ new/beautifulsoup4-4.4.1/setup.py 2015-09-29 02:11:15.000000000 +0200
@@ -5,7 +5,7 @@
 
 setup(
     name="beautifulsoup4",
-    version = "4.4.0",
+    version = "4.4.1",
     author="Leonard Richardson",
     author_email='leona...@segfault.org',
     url="http://www.crummy.com/software/BeautifulSoup/bs4/",
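
Note on the CSS selector grouping change (lp#1484543): the new code path at the top of select() in bs4/element.py can be read in isolation. The following is only a minimal sketch of that logic, not the upstream implementation. The names select_grouped, select_single, FAKE_DOCUMENT and fake_select are hypothetical and exist only for this illustration; select_single stands in for the non-grouped part of select(), and the fake lookup table replaces a real parsed document so the sketch runs without bs4 installed.

```python
def select_grouped(selector, select_single, limit=None):
    """Sketch of comma-grouped selection, modelled on the diff above.

    select_single is a hypothetical callback: it takes one selector
    string without commas and returns a list of matches.
    """
    if ',' not in selector:
        return select_single(selector)

    context = []
    for partial in selector.split(','):
        partial = partial.strip()
        if partial == '':
            # Covers inputs such as ',x, y' and 'x,,y'.
            raise ValueError('Invalid group selection syntax: %s' % selector)
        for candidate in select_single(partial):
            # Keep first-seen order and drop duplicates across groups.
            if candidate not in context:
                context.append(candidate)
        # As in the diff, the limit is checked once per group; this
        # sketch does not truncate the result.
        if limit and len(context) >= limit:
            break
    return context


# Hypothetical stand-in for a parsed document: selector -> matching ids.
FAKE_DOCUMENT = {'x': ['xid'], 'y': ['yid'], 'y > z': ['zidb']}


def fake_select(selector):
    return list(FAKE_DOCUMENT.get(selector, []))


print(select_grouped('x, y > z', fake_select))  # ['xid', 'zidb']
print(select_grouped('x, x', fake_select))      # ['xid']
```

Each comma-separated part is now evaluated as a complete selector of its own, which is why the updated tests expect 'x, y > z' to return the x match as well as the z children of y, instead of applying '> z' to both x and y.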