Hello community,

here is the log from the commit of package python-w3lib for openSUSE:Leap:15.2 checked in at 2020-03-02 13:24:44

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Leap:15.2/python-w3lib (Old)
 and      /work/SRC/openSUSE:Leap:15.2/.python-w3lib.new.26092 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-w3lib" Mon Mar 2 13:24:44 2020 rev:11 rq:777269 version:1.21.0 Changes: -------- --- /work/SRC/openSUSE:Leap:15.2/python-w3lib/python-w3lib.changes 2020-01-15 15:54:08.947622176 +0100 +++ /work/SRC/openSUSE:Leap:15.2/.python-w3lib.new.26092/python-w3lib.changes 2020-03-02 13:24:44.714563932 +0100 @@ -1,0 +2,40 @@ +Thu Aug 29 13:15:56 UTC 2019 - Marketa Calabkova <[email protected]> + +- update to 1.21.1 + * Add the "encoding" and "path_encoding" parameters to + w3lib.url.safe_download_url (issue #118) + * w3lib.url.safe_url_string now also removes tabs and new lines + (issue #133) + * w3lib.html.remove_comments now also removes truncated comments + (issue #129) + * w3lib.html.remove_tags_with_content no longer removes tags which + start with the same text as one of the specified tags (issue #114) + +------------------------------------------------------------------- +Fri Mar 29 09:53:27 UTC 2019 - [email protected] + +- version update to 1.20.0 + * Fix url_query_cleaner to do not append "?" to urls without a + query string (issue #109) + * Add support for Python 3.7 and drop Python 3.3 (issue #113) + * Add `w3lib.url.add_or_replace_parameters` helper (issue #117) + * Documentation fixes (issue #115) + +------------------------------------------------------------------- +Tue Dec 4 12:56:15 UTC 2018 - Matej Cepl <[email protected]> + +- Remove superfluous devel dependency for noarch package + +------------------------------------------------------------------- +Fri Nov 16 18:49:26 UTC 2018 - Todd R <[email protected]> + +- Update to version 1.19.0 + * Add a workaround for CPython segfault (https://bugs.python.org/issue32583) + which affect w3lib.encoding functions. This is technically **backwards + incompatible** because it changes the way non-decodable bytes are replaced + (in some cases instead of two ``\ufffd`` chars you can get one). + As a side effect, the fix speeds up decoding in Python 3.4+. + * Add 'encoding' parameter for w3lib.http.basic_auth_header. + * Fix pypy testing setup, add pypy3 to CI. + +------------------------------------------------------------------- Old: ---- w3lib-1.18.0.tar.gz New: ---- w3lib-1.21.0.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-w3lib.spec ++++++ --- /var/tmp/diff_new_pack.tX3T3v/_old 2020-03-02 13:24:45.054564608 +0100 +++ /var/tmp/diff_new_pack.tX3T3v/_new 2020-03-02 13:24:45.058564616 +0100 @@ -1,7 +1,7 @@ # # spec file for package python-w3lib # -# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany. +# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany. # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -12,25 +12,25 @@ # license that conforms to the Open Source Definition (Version 1.9) # published by the Open Source Initiative. 
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
 #
 
 
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 Name:           python-w3lib
-Version:        1.18.0
+Version:        1.21.0
 Release:        0
 Summary:        Library of Web-Related Functions
 License:        BSD-3-Clause
 Group:          Development/Languages/Python
 Url:            http://github.com/scrapy/w3lib
 Source:         https://files.pythonhosted.org/packages/source/w/w3lib/w3lib-%{version}.tar.gz
-BuildRequires:  %{python_module devel}
 BuildRequires:  %{python_module setuptools}
 BuildRequires:  %{python_module six} >= 1.4.1
 BuildRequires:  fdupes
 BuildRequires:  python-rpm-macros
 BuildArch:      noarch
+
 %python_subpackages
 
 %description
@@ -70,7 +70,8 @@
 %python_exec setup.py test
 
 %files %{python_files}
-%doc README.rst LICENSE
+%doc README.rst
+%license LICENSE
 %{python_sitelib}/*
 
 %changelog

++++++ w3lib-1.18.0.tar.gz -> w3lib-1.21.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/PKG-INFO new/w3lib-1.21.0/PKG-INFO
--- old/w3lib-1.18.0/PKG-INFO   2017-08-03 15:25:28.000000000 +0200
+++ new/w3lib-1.21.0/PKG-INFO   2019-08-09 13:00:36.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: w3lib
-Version: 1.18.0
+Version: 1.21.0
 Summary: Library of web-related functions
 Home-page: https://github.com/scrapy/w3lib
 Author: Scrapy project
@@ -15,10 +15,10 @@
 Classifier: Programming Language :: Python :: 2
 Classifier: Programming Language :: Python :: 2.7
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.3
 Classifier: Programming Language :: Python :: 3.4
 Classifier: Programming Language :: Python :: 3.5
 Classifier: Programming Language :: Python :: 3.6
+Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: Implementation :: CPython
 Classifier: Programming Language :: Python :: Implementation :: PyPy
 Classifier: Topic :: Internet :: WWW/HTTP
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/README.rst new/w3lib-1.21.0/README.rst
--- old/w3lib-1.18.0/README.rst 2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/README.rst 2019-08-09 13:00:00.000000000 +0200
@@ -27,7 +27,7 @@
 Requirements
 ============
 
-Python 2.7 or Python 3.3+
+Python 2.7 or Python 3.4+
 
 Install
 =======
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/docs/conf.py new/w3lib-1.21.0/docs/conf.py
--- old/w3lib-1.18.0/docs/conf.py       2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/docs/conf.py       2019-08-09 13:00:00.000000000 +0200
@@ -53,7 +53,7 @@
 # built documents.
 #
 # The full version, including alpha/beta/rc tags.
-release = '1.18.0'
+release = '1.21.0'
 # The short X.Y version.
 version = '.'.join(release.split('.')[:2])
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/docs/index.rst new/w3lib-1.21.0/docs/index.rst
--- old/w3lib-1.18.0/docs/index.rst     2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/docs/index.rst     2019-08-09 13:00:00.000000000 +0200
@@ -8,7 +8,7 @@
 
 * remove comments, or tags from HTML snippets
 * extract base url from HTML snippets
-* translate entites on HTML strings
+* translate entities on HTML strings
 * convert raw HTTP headers to dicts and vice-versa
 * construct HTTP auth header
 * converting HTML pages to unicode
@@ -39,7 +39,7 @@
 Tests
 =====
 
-`nose`_ is the preferred way to run tests. Just run: ``nosetests`` from the
+`pytest`_ is the preferred way to run tests. Just run: ``pytest`` from the
 root directory to execute tests using the default Python interpreter.
 
 `tox`_ could be used to run tests for all supported Python versions.
@@ -48,7 +48,7 @@
 Python interpreters.
 
 .. _tox: http://tox.testrun.org
-.. _nose: http://readthedocs.org/docs/nose/en/latest/
+.. _pytest: https://docs.pytest.org/en/latest/
 
 
 Changelog
@@ -74,4 +74,3 @@
 * :ref:`genindex`
 * :ref:`modindex`
 * :ref:`search`
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/setup.py new/w3lib-1.21.0/setup.py
--- old/w3lib-1.18.0/setup.py   2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/setup.py   2019-08-09 13:00:00.000000000 +0200
@@ -3,7 +3,7 @@
 
 setup(
     name='w3lib',
-    version='1.18.0',
+    version='1.21.0',
     license='BSD',
     description='Library of web-related functions',
     author='Scrapy project',
@@ -21,10 +21,10 @@
         'Programming Language :: Python :: 2',
         'Programming Language :: Python :: 2.7',
         'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.3',
         'Programming Language :: Python :: 3.4',
         'Programming Language :: Python :: 3.5',
         'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
         'Programming Language :: Python :: Implementation :: CPython',
         'Programming Language :: Python :: Implementation :: PyPy',
         'Topic :: Internet :: WWW/HTTP',
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/tests/test_encoding.py new/w3lib-1.21.0/tests/test_encoding.py
--- old/w3lib-1.18.0/tests/test_encoding.py     2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/tests/test_encoding.py     2019-08-09 13:00:00.000000000 +0200
@@ -144,9 +144,9 @@
     def test_invalid_utf8_encoded_body_with_valid_utf8_BOM(self):
         # unlike scrapy, the BOM is stripped
         self._assert_encoding('utf-8', b"\xef\xbb\xbfWORD\xe3\xabWORD2",
-                              'utf-8', u'WORD\ufffd\ufffdWORD2')
+                              'utf-8', u'WORD\ufffdWORD2')
         self._assert_encoding(None, b"\xef\xbb\xbfWORD\xe3\xabWORD2",
-                              'utf-8', u'WORD\ufffd\ufffdWORD2')
+                              'utf-8', u'WORD\ufffdWORD2')
 
     def test_utf8_unexpected_end_of_data_with_valid_utf8_BOM(self):
         # Python implementations handle unexpected end of UTF8 data
@@ -220,6 +220,18 @@
         self._assert_encoding('utf-16', u"hi".encode('utf-16-be'), 'utf-16-be', u"hi")
         self._assert_encoding('utf-32', u"hi".encode('utf-32-be'), 'utf-32-be', u"hi")
 
+    def test_python_crash(self):
+        import random
+        from io import BytesIO
+        random.seed(42)
+        buf = BytesIO()
+        for i in range(150000):
+            buf.write(bytes([random.randint(0, 255)]))
+        to_unicode(buf.getvalue(), 'utf-16-le')
+        to_unicode(buf.getvalue(), 'utf-16-be')
+        to_unicode(buf.getvalue(), 'utf-32-le')
+        to_unicode(buf.getvalue(), 'utf-32-be')
+
     def test_html_encoding(self):
         # extracting the encoding from raw html is tested elsewhere
         body = b"""blah blah < meta http-equiv="Content-Type"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/tests/test_html.py new/w3lib-1.21.0/tests/test_html.py
--- old/w3lib-1.18.0/tests/test_html.py 2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/tests/test_html.py 2019-08-09 13:00:00.000000000 +0200
@@ -106,6 +106,8 @@
         self.assertEqual(remove_comments(b"test <!--textcoment--> whatever"), u'test whatever')
         self.assertEqual(remove_comments(b"test <!--\ntextcoment\n--> whatever"), u'test whatever')
 
+        self.assertEqual(remove_comments(b"test <!--"), u'test ')
+
 
 class RemoveTagsTest(unittest.TestCase):
     def test_returns_unicode(self):
@@ -184,6 +186,10 @@
         # text with empty tags
         self.assertEqual(remove_tags_with_content(u'<br/>a<br />', which_ones=('br',)), u'a')
 
+    def test_tags_with_shared_prefix(self):
+        # https://github.com/scrapy/w3lib/issues/114
+        self.assertEqual(remove_tags_with_content(u'<span></span><s></s>', which_ones=('s',)), u'<span></span>')
+
 
 class ReplaceEscapeCharsTest(unittest.TestCase):
     def test_returns_unicode(self):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/tests/test_http.py new/w3lib-1.21.0/tests/test_http.py
--- old/w3lib-1.18.0/tests/test_http.py 2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/tests/test_http.py 2019-08-09 13:00:00.000000000 +0200
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+
 import unittest
 from collections import OrderedDict
 from w3lib.http import (basic_auth_header,
@@ -14,6 +16,13 @@
         self.assertEqual(b'Basic c29tZXVzZXI6QDx5dTk-Jm8_UQ==',
                          basic_auth_header('someuser', '@<yu9>&o?Q'))
 
+    def test_basic_auth_header_encoding(self):
+        self.assertEqual(b'Basic c29tw6Z1c8Oocjpzw7htZXDDpHNz',
+                         basic_auth_header(u'somæusèr', u'sømepäss', encoding='utf8'))
+        # default encoding (ISO-8859-1)
+        self.assertEqual(b'Basic c29t5nVz6HI6c_htZXDkc3M=',
+                         basic_auth_header(u'somæusèr', u'sømepäss'))
+
     def test_headers_raw_dict_none(self):
         self.assertIsNone(headers_raw_to_dict(None))
         self.assertIsNone(headers_dict_to_raw(None))
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/tests/test_url.py new/w3lib-1.21.0/tests/test_url.py
--- old/w3lib-1.18.0/tests/test_url.py  2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/tests/test_url.py  2019-08-09 13:00:00.000000000 +0200
@@ -5,7 +5,7 @@
 from w3lib.url import (is_url, safe_url_string, safe_download_url,
     url_query_parameter, add_or_replace_parameter, url_query_cleaner,
     file_uri_to_path, parse_data_uri, path_to_file_uri, any_to_uri,
-    urljoin_rfc, canonicalize_url, parse_url)
+    urljoin_rfc, canonicalize_url, parse_url, add_or_replace_parameters)
 from six.moves.urllib.parse import urlparse
 
 
@@ -59,6 +59,20 @@
         self.assertTrue(isinstance(safe_url_string(b'http://example.com/'), str))
 
+    def test_safe_url_string_remove_ascii_tab_and_newlines(self):
+        self.assertEqual(safe_url_string("http://example.com/test\n.html"),
+                         "http://example.com/test.html")
+        self.assertEqual(safe_url_string("http://example.com/test\t.html"),
+                         "http://example.com/test.html")
+        self.assertEqual(safe_url_string("http://example.com/test\r.html"),
+                         "http://example.com/test.html")
+        self.assertEqual(safe_url_string("http://example.com/test\r.html\n"),
+                         "http://example.com/test.html")
+        self.assertEqual(safe_url_string("http://example.com/test\r\n.html\t"),
+                         "http://example.com/test.html")
+        self.assertEqual(safe_url_string("http://example.com/test\a\n.html"),
+                         "http://example.com/test%07.html")
+
     def test_safe_url_string_unsafe_chars(self):
         safeurl = safe_url_string(r"http://localhost:8001/unwise{,},|,\,^,[,],`?|=[]&[]=|")
         self.assertEqual(safeurl, r"http://localhost:8001/unwise%7B,%7D,|,%5C,%5E,[,],%60?|=[]&[]=|")
@@ -203,6 +217,19 @@
                          'http://www.example.org/image')
         self.assertEqual(safe_download_url('http://www.example.org/dir/'),
                          'http://www.example.org/dir/')
+        self.assertEqual(safe_download_url(b'http://www.example.org/dir/'),
+                         'http://www.example.org/dir/')
+
+        # Encoding related tests
+        self.assertEqual(safe_download_url(b'http://www.example.org?\xa3',
+                                           encoding='latin-1', path_encoding='latin-1'),
+                         'http://www.example.org/?%A3')
+        self.assertEqual(safe_download_url(b'http://www.example.org?\xc2\xa3',
+                                           encoding='utf-8', path_encoding='utf-8'),
+                         'http://www.example.org/?%C2%A3')
+        self.assertEqual(safe_download_url(b'http://www.example.org/\xc2\xa3?\xc2\xa3',
+                                           encoding='utf-8', path_encoding='latin-1'),
+                         'http://www.example.org/%A3?%C2%A3')
 
     def test_is_url(self):
         self.assertTrue(is_url('http://www.example.org'))
@@ -283,7 +310,21 @@
         self.assertEqual(add_or_replace_parameter(url, 'pageurl', 'test'),
                          'http://example.com/?version=1&pageurl=test&param2=value2')
 
+    def test_add_or_replace_parameters(self):
+        url = 'http://domain/test'
+        self.assertEqual(add_or_replace_parameters(url, {'arg': 'v'}),
+                         'http://domain/test?arg=v')
+        url = 'http://domain/test?arg1=v1&arg2=v2&arg3=v3'
+        self.assertEqual(add_or_replace_parameters(url, {'arg4': 'v4'}),
+                         'http://domain/test?arg1=v1&arg2=v2&arg3=v3&arg4=v4')
+        self.assertEqual(add_or_replace_parameters(url, {'arg4': 'v4', 'arg3': 'v3new'}),
+                         'http://domain/test?arg1=v1&arg2=v2&arg3=v3new&arg4=v4')
+
     def test_url_query_cleaner(self):
+        self.assertEqual('product.html',
+                         url_query_cleaner("product.html?"))
+        self.assertEqual('product.html',
+                         url_query_cleaner("product.html?&"))
         self.assertEqual('product.html?id=200',
                          url_query_cleaner("product.html?id=200&foo=bar&name=wired", ['id']))
         self.assertEqual('product.html?id=200',
@@ -308,6 +349,10 @@
                          url_query_cleaner("product.html?id=2&foo=bar&name=wired", ['id', 'foo'], remove=True))
         self.assertEqual('product.html?foo=bar&name=wired',
                          url_query_cleaner("product.html?id=2&foo=bar&name=wired", ['id', 'footo'], remove=True))
+        self.assertEqual('product.html',
+                         url_query_cleaner("product.html", ['id'], remove=True))
+        self.assertEqual('product.html',
+                         url_query_cleaner("product.html?&", ['id'], remove=True))
         self.assertEqual('product.html?foo=bar',
                          url_query_cleaner("product.html?foo=bar&name=wired", 'foo'))
         self.assertEqual('product.html?foobar=wired',
@@ -321,7 +366,7 @@
 
     def test_path_to_file_uri(self):
         if os.name == 'nt':
-            self.assertEqual(path_to_file_uri("C:\\windows\clock.avi"),
+            self.assertEqual(path_to_file_uri(r"C:\\windows\clock.avi"),
                              "file:///C:/windows/clock.avi")
         else:
             self.assertEqual(path_to_file_uri("/some/path.txt"),
@@ -329,13 +374,13 @@
 
         fn = "test.txt"
         x = path_to_file_uri(fn)
-        self.assert_(x.startswith('file:///'))
+        self.assertTrue(x.startswith('file:///'))
         self.assertEqual(file_uri_to_path(x).lower(), os.path.abspath(fn).lower())
 
     def test_file_uri_to_path(self):
         if os.name == 'nt':
             self.assertEqual(file_uri_to_path("file:///C:/windows/clock.avi"),
-                             "C:\\windows\clock.avi")
+                             r"C:\\windows\clock.avi")
             uri = "file:///C:/windows/clock.avi"
             uri2 = path_to_file_uri(file_uri_to_path(uri))
             self.assertEqual(uri, uri2)
@@ -353,7 +398,7 @@
 
     def test_any_to_uri(self):
         if os.name == 'nt':
-            self.assertEqual(any_to_uri("C:\\windows\clock.avi"),
+            self.assertEqual(any_to_uri(r"C:\\windows\clock.avi"),
                              "file:///C:/windows/clock.avi")
         else:
             self.assertEqual(any_to_uri("/some/path.txt"),
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/tox.ini new/w3lib-1.21.0/tox.ini
--- old/w3lib-1.18.0/tox.ini    2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/tox.ini    2019-08-09 13:00:00.000000000 +0200
@@ -4,7 +4,7 @@
 # and then run "tox" from this directory.
 
 [tox]
-envlist = py27, pypy, py33, py34, py35, py36
+envlist = py27, pypy, py34, py35, py36, py37, pypy3
 
 [testenv]
 deps =
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib/__init__.py new/w3lib-1.21.0/w3lib/__init__.py
--- old/w3lib-1.18.0/w3lib/__init__.py  2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/w3lib/__init__.py  2019-08-09 13:00:00.000000000 +0200
@@ -1,3 +1,3 @@
-__version__ = "1.18.0"
+__version__ = "1.21.0"
 version_info = tuple(int(v) if v.isdigit() else v
                      for v in __version__.split('.'))
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib/encoding.py new/w3lib-1.21.0/w3lib/encoding.py
--- old/w3lib-1.18.0/w3lib/encoding.py  2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/w3lib/encoding.py  2019-08-09 13:00:00.000000000 +0200
@@ -3,6 +3,7 @@
 Functions for handling encoding of web pages
 """
 import re, codecs, encodings
+from sys import version_info
 
 _HEADER_ENCODING_RE = re.compile(r'charset=([\w-]+)', re.I)
 
@@ -22,7 +23,7 @@
 
 # regexp for parsing HTTP meta tags
 _TEMPLATE = r'''%s\s*=\s*["']?\s*%s\s*["']?'''
-_SKIP_ATTRS = '''(?x)(?:\\s+
+_SKIP_ATTRS = '''(?:\\s+
 [^=<>/\\s"'\x00-\x1f\x7f]+ # Attribute name
 (?:\\s*=\\s*
 (?: # ' and " are entity encoded (&apos;, &quot;), so no need for \', \"
@@ -32,7 +33,7 @@
 |
 [^'"\\s]+ # attr having no ' nor "
 ))?
-)*?'''
+)*?''' # must be used with re.VERBOSE flag
 _HTTPEQUIV_RE = _TEMPLATE % ('http-equiv', 'Content-Type')
 _CONTENT_RE = _TEMPLATE % ('content', r'(?P<mime>[^;]+);\s*charset=(?P<charset>[\w-]+)')
 _CONTENT2_RE = _TEMPLATE % ('charset', r'(?P<charset2>[\w-]+)')
@@ -41,8 +42,9 @@
 # check for meta tags, or xml decl. and stop search if a body tag is encountered
 _BODY_ENCODING_PATTERN = r'<\s*(?:meta%s(?:(?:\s+%s|\s+%s){2}|\s+%s)|\?xml\s[^>]+%s|body)' % (
     _SKIP_ATTRS, _HTTPEQUIV_RE, _CONTENT_RE, _CONTENT2_RE, _XML_ENCODING_RE)
-_BODY_ENCODING_STR_RE = re.compile(_BODY_ENCODING_PATTERN, re.I)
-_BODY_ENCODING_BYTES_RE = re.compile(_BODY_ENCODING_PATTERN.encode('ascii'), re.I)
+_BODY_ENCODING_STR_RE = re.compile(_BODY_ENCODING_PATTERN, re.I | re.VERBOSE)
+_BODY_ENCODING_BYTES_RE = re.compile(_BODY_ENCODING_PATTERN.encode('ascii'),
+                                     re.I | re.VERBOSE)
 
 def html_body_declared_encoding(html_body_str):
     '''Return the encoding specified in meta tags in the html body,
@@ -173,7 +175,7 @@
 
 # Python decoder doesn't follow unicode standard when handling
 # bad utf-8 encoded strings. see http://bugs.python.org/issue8271
-codecs.register_error('w3lib_replace', lambda exc: (u'\ufffd', exc.start+1))
+codecs.register_error('w3lib_replace', lambda exc: (u'\ufffd', exc.end))
 
 def to_unicode(data_str, encoding):
     """Convert a str object to unicode using the encoding given
@@ -181,7 +183,7 @@
     Characters that cannot be converted will be converted to ``\\ufffd``
     (the unicode replacement character).
     """
-    return data_str.decode(encoding, 'w3lib_replace')
+    return data_str.decode(encoding, 'replace' if version_info[0:2] >= (3, 3) else 'w3lib_replace')
 
 def html_to_unicode(content_type_header, html_body_str,
                     default_encoding='utf8', auto_detect_fun=None):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib/html.py new/w3lib-1.21.0/w3lib/html.py
--- old/w3lib-1.18.0/w3lib/html.py      2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/w3lib/html.py      2019-08-09 13:00:00.000000000 +0200
@@ -122,7 +122,7 @@
 
     return _tag_re.sub(token, to_unicode(text, encoding))
 
-_REMOVECOMMENTS_RE = re.compile(u'<!--.*?-->', re.DOTALL)
+_REMOVECOMMENTS_RE = re.compile(u'<!--.*?(?:-->|$)', re.DOTALL)
 def remove_comments(text, encoding=None):
     """ Remove HTML Comments.
 
@@ -220,7 +220,7 @@
 
     text = to_unicode(text, encoding)
     if which_ones:
-        tags = '|'.join([r'<%s.*?</%s>|<%s\s*/>' % (tag, tag, tag) for tag in which_ones])
+        tags = '|'.join([r'<%s\b.*?</%s>|<%s\s*/>' % (tag, tag, tag) for tag in which_ones])
         retags = re.compile(tags, re.DOTALL | re.IGNORECASE)
         text = retags.sub(u'', text)
     return text
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib/http.py new/w3lib-1.21.0/w3lib/http.py
--- old/w3lib-1.18.0/w3lib/http.py      2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/w3lib/http.py      2019-08-09 13:00:00.000000000 +0200
@@ -78,7 +78,7 @@
 
     return b'\r\n'.join(raw_lines)
 
-def basic_auth_header(username, password):
+def basic_auth_header(username, password, encoding='ISO-8859-1'):
     """
     Return an `Authorization` header field value for
     `HTTP Basic Access Authentication (RFC 2617)`_
@@ -95,5 +95,5 @@
     # XXX: RFC 2617 doesn't define encoding, but ISO-8859-1
     # seems to be the most widely used encoding here. See also:
     # http://greenbytes.de/tech/webdav/draft-ietf-httpauth-basicauth-enc-latest.html
-    auth = auth.encode('ISO-8859-1')
+    auth = auth.encode(encoding)
     return b'Basic ' + urlsafe_b64encode(auth)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib/url.py new/w3lib-1.21.0/w3lib/url.py
--- old/w3lib-1.18.0/w3lib/url.py       2017-08-03 15:24:36.000000000 +0200
+++ new/w3lib-1.21.0/w3lib/url.py       2019-08-09 13:00:00.000000000 +0200
@@ -9,7 +9,7 @@
 import posixpath
 import warnings
 import string
-from collections import namedtuple
+from collections import namedtuple, OrderedDict
 import six
 from six.moves.urllib.parse import (urljoin, urlsplit, urlunsplit,
                                     urldefrag, urlencode, urlparse,
@@ -34,9 +34,12 @@
 
 _safe_chars = RFC3986_RESERVED + RFC3986_UNRESERVED + EXTRA_SAFE_CHARS + b'%'
 
+_ascii_tab_newline_re = re.compile(r'[\t\n\r]')  # see https://infra.spec.whatwg.org/#ascii-tab-or-newline
+
 def safe_url_string(url, encoding='utf8', path_encoding='utf8'):
     """Convert the given URL into a legal URL by escaping unsafe characters
-    according to RFC-3986.
+    according to RFC-3986. Also, ASCII tabs and newlines are removed
+    as per https://url.spec.whatwg.org/#url-parsing.
 
     If a bytes URL is given, it is first converted to `str` using the given
     encoding (which defaults to 'utf-8'). 'utf-8' encoding is used for
@@ -56,8 +59,8 @@
     #   encoded with the supplied encoding (or UTF8 by default)
     # - if the supplied (or default) encoding chokes,
     #   percent-encode offending bytes
-    parts = urlsplit(to_unicode(url, encoding=encoding,
-                                errors='percentencode'))
+    decoded = to_unicode(url, encoding=encoding, errors='percentencode')
+    parts = urlsplit(_ascii_tab_newline_re.sub('', decoded))
 
     # IDNA encoding can fail for too long labels (>63 characters)
     # or missing labels (e.g. http://.example.com)
@@ -84,7 +87,7 @@
 
 _parent_dirs = re.compile(r'/?(\.\./)+')
 
-def safe_download_url(url):
+def safe_download_url(url, encoding='utf8', path_encoding='utf8'):
     """ Make a url for download. This will call safe_url_string
     and then strip the fragment, if one exists. The path will
     be normalised.
@@ -92,11 +95,11 @@
     If the path is outside the document root, it will be changed
     to be within the document root.
     """
-    safe_url = safe_url_string(url)
+    safe_url = safe_url_string(url, encoding, path_encoding)
     scheme, netloc, path, query, _ = urlsplit(safe_url)
     if path:
         path = _parent_dirs.sub('', posixpath.normpath(path))
-        if url.endswith('/') and not path.endswith('/'):
+        if safe_url.endswith('/') and not path.endswith('/'):
             path += '/'
     else:
         path = '/'
@@ -182,6 +185,8 @@
     seen = set()
     querylist = []
     for ksv in query.split(sep):
+        if not ksv:
+            continue
         k, _, _ = ksv.partition(kvsep)
         if unique and k in seen:
             continue
@@ -198,6 +203,17 @@
 
     return url
 
+def _add_or_replace_parameters(url, params):
+    parsed = urlsplit(url)
+    args = parse_qsl(parsed.query, keep_blank_values=True)
+
+    new_args = OrderedDict(args)
+    new_args.update(params)
+
+    query = urlencode(new_args)
+    return urlunsplit(parsed._replace(query=query))
+
+
 def add_or_replace_parameter(url, name, new_value):
     """Add or remove a parameter to a given url
 
@@ -211,23 +227,22 @@
     >>>
 
     """
-    parsed = urlsplit(url)
-    args = parse_qsl(parsed.query, keep_blank_values=True)
+    return _add_or_replace_parameters(url, {name: new_value})
 
-    new_args = []
-    found = False
-    for name_, value_ in args:
-        if name_ == name:
-            new_args.append((name_, new_value))
-            found = True
-        else:
-            new_args.append((name_, value_))
-    if not found:
-        new_args.append((name, new_value))
 
-    query = urlencode(new_args)
-    return urlunsplit(parsed._replace(query=query))
+def add_or_replace_parameters(url, new_parameters):
+    """Add or remove a parameters to a given url
+
+    >>> import w3lib.url
+    >>> w3lib.url.add_or_replace_parameters('http://www.example.com/index.php', {'arg': 'v'})
+    'http://www.example.com/index.php?arg=v'
+    >>> args = {'arg4': 'v4', 'arg3': 'v3new'}
+    >>> w3lib.url.add_or_replace_parameters('http://www.example.com/index.php?arg1=v1&arg2=v2&arg3=v3', args)
+    'http://www.example.com/index.php?arg1=v1&arg2=v2&arg3=v3new&arg4=v4'
+    >>>
+
+    """
+    return _add_or_replace_parameters(url, new_parameters)
 
 
 def path_to_file_uri(path):
@@ -291,6 +306,7 @@
 
 _ParseDataURIResult = namedtuple("ParseDataURIResult",
                                  "media_type media_type_parameters data")
 
+
 def parse_data_uri(uri):
     """
 
@@ -355,6 +371,7 @@
 
 
 __all__ = ["add_or_replace_parameter",
+           "add_or_replace_parameters",
            "any_to_uri",
            "canonicalize_url",
            "file_uri_to_path",
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/w3lib-1.18.0/w3lib.egg-info/PKG-INFO new/w3lib-1.21.0/w3lib.egg-info/PKG-INFO
--- old/w3lib-1.18.0/w3lib.egg-info/PKG-INFO    2017-08-03 15:25:28.000000000 +0200
+++ new/w3lib-1.21.0/w3lib.egg-info/PKG-INFO    2019-08-09 13:00:36.000000000 +0200
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: w3lib
-Version: 1.18.0
+Version: 1.21.0
 Summary: Library of web-related functions
 Home-page: https://github.com/scrapy/w3lib
 Author: Scrapy project
@@ -15,10 +15,10 @@
 Classifier: Programming Language :: Python :: 2
 Classifier: Programming Language :: Python :: 2.7
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.3
 Classifier: Programming Language :: Python :: 3.4
 Classifier: Programming Language :: Python :: 3.5
 Classifier: Programming Language :: Python :: 3.6
+Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: Implementation :: CPython
 Classifier: Programming Language :: Python :: Implementation :: PyPy
 Classifier: Topic :: Internet :: WWW/HTTP
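
As a quick smoke test of the URL helpers touched by this update (the new add_or_replace_parameters helper, the encoding/path_encoding parameters of safe_download_url, the tab/newline stripping in safe_url_string, and the url_query_cleaner fix), the short sketch below can be run against the installed package. The expected values are copied from tests/test_url.py in the tarball above; the script itself is only an illustration and is not shipped or executed by the spec file.

    # Exercise the w3lib 1.21.0 URL helpers; expected values come from
    # tests/test_url.py shipped in the new tarball.
    from w3lib.url import (add_or_replace_parameters, safe_download_url,
                           safe_url_string, url_query_cleaner)

    # New helper: update several query parameters at once (issue #117).
    assert add_or_replace_parameters(
        'http://domain/test?arg1=v1&arg2=v2&arg3=v3',
        {'arg4': 'v4', 'arg3': 'v3new'},
    ) == 'http://domain/test?arg1=v1&arg2=v2&arg3=v3new&arg4=v4'

    # New encoding/path_encoding parameters of safe_download_url (issue #118).
    assert safe_download_url(
        b'http://www.example.org?\xc2\xa3',
        encoding='utf-8', path_encoding='utf-8',
    ) == 'http://www.example.org/?%C2%A3'

    # ASCII tabs and newlines are now stripped by safe_url_string (issue #133).
    assert safe_url_string("http://example.com/test\r\n.html\t") == \
        "http://example.com/test.html"

    # url_query_cleaner no longer appends "?" to URLs without a query (issue #109).
    assert url_query_cleaner("product.html?&") == 'product.html'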

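The HTML and HTTP changes can be checked the same way. Again, the expected values are taken from tests/test_html.py and tests/test_http.py in the tarball; this is only an illustrative sketch, not part of the package.

    # -*- coding: utf-8 -*-
    # Exercise the w3lib 1.21.0 HTML/HTTP behaviour changes; expected values
    # come from tests/test_html.py and tests/test_http.py above.
    from w3lib.html import remove_comments, remove_tags_with_content
    from w3lib.http import basic_auth_header

    # Truncated comments are now removed as well (issue #129).
    assert remove_comments(b"test <!--") == u'test '

    # Tags that merely share a prefix with a listed tag are kept (issue #114).
    assert remove_tags_with_content(u'<span></span><s></s>',
                                    which_ones=('s',)) == u'<span></span>'

    # basic_auth_header() gained an 'encoding' parameter; ISO-8859-1 stays the default.
    assert basic_auth_header(u'somæusèr', u'sømepäss', encoding='utf8') == \
        b'Basic c29tw6Z1c8Oocjpzw7htZXDDpHNz'
    assert basic_auth_header(u'somæusèr', u'sømepäss') == \
        b'Basic c29t5nVz6HI6c_htZXDkc3M='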