commit python-pytidylib for openSUSE:Factory

root Thu, 16 May 2019 13:07:21 -0700

Hello community,

here is the log from the commit of package python-pytidylib for 
openSUSE:Factory checked in at 2019-05-16 22:06:57
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-pytidylib (Old)
 and      /work/SRC/openSUSE:Factory/.python-pytidylib.new.5148 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Package is "python-pytidylib"

Thu May 16 22:06:57 2019 rev:2 rq:702989 version:0.3.2

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-pytidylib/python-pytidylib.changes  2016-09-13 22:23:40.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-pytidylib.new.5148/python-pytidylib.changes  2019-05-16 22:06:59.526452945 +0200
@@ -1,0 +2,8 @@
+Tue May 14 20:22:54 UTC 2019 - John Jolly <[email protected]>
+
+- Updated to 0.3.2
+  + No upstream changelog
+- Added %check section
+  + Excluded the large file check
+
+-------------------------------------------------------------------

Old:
----
  pytidylib-0.2.4.tar.gz

New:
----
  pytidylib-0.3.2.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-pytidylib.spec ++++++
--- /var/tmp/diff_new_pack.wiketI/_old  2019-05-16 22:07:01.430451061 +0200
+++ /var/tmp/diff_new_pack.wiketI/_new  2019-05-16 22:07:01.442451049 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package python-pytidylib
 #
-# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -12,27 +12,25 @@
 # license that conforms to the Open Source Definition (Version 1.9)
 # published by the Open Source Initiative.
 
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
 #
 
 
-%if 0%{?suse_version} && 0%{?suse_version} <= 1110
-%{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
-%else
-BuildArch:      noarch
-%endif
+%{?!python_module:%define python_module() python-%{**} python3-%{**}}
 Name:           python-pytidylib
-Version:        0.2.4
+Version:        0.3.2
 Release:        0
 Summary:        Python wrapper for HTML Tidy (tidylib) on Python 2 and 3
 License:        MIT
 Group:          Development/Languages/Python
-Url:            http://countergram.com/open-source/pytidylib/
-Source:         https://pypi.python.org/packages/b4/a0/b70cf2b7b4ee1f9d8fa0f1b4abbbac081a2638a580dabf29b8fb554d5fc1/pytidylib-%{version}.tar.gz
+URL:            http://countergram.com/open-source/pytidylib/
+Source:         https://files.pythonhosted.org/packages/source/p/pytidylib/pytidylib-%{version}.tar.gz
+BuildRequires:  %{python_module pytest}
+BuildRequires:  %{python_module setuptools}
+BuildRequires:  fdupes
 BuildRequires:  libtidy-devel
-BuildRequires:  python-devel
-Requires:       libtidy5
-BuildRoot:      %{_tmppath}/%{name}-%{version}-build
+BuildArch:      noarch
+%python_subpackages
 
 %description
 `PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
@@ -62,14 +60,20 @@
 %setup -q -n pytidylib-%{version}
 
 %build
-python setup.py build
+%python_build
 
 %install
-python setup.py install --prefix=%{_prefix} --root=%{buildroot}
+%python_install
+%python_expand %fdupes %{buildroot}%{$python_sitelib}
 
-%files
-%defattr(-,root,root,-)
-%doc LICENSE README
+%check
+# The large document test is excluded as it produces inconsistent
+# results across architectures.
+%pytest -k 'not test_large_document'
+
+%files %{python_files}
+%doc README
+%license LICENSE
 %{python_sitelib}/*
 
 %changelog

++++++ pytidylib-0.2.4.tar.gz -> pytidylib-0.3.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/PKG-INFO new/pytidylib-0.3.2/PKG-INFO
--- old/pytidylib-0.2.4/PKG-INFO        2014-12-20 05:22:41.000000000 +0100
+++ new/pytidylib-0.3.2/PKG-INFO        2016-11-16 02:52:52.000000000 +0100
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: pytidylib
-Version: 0.2.4
+Version: 0.3.2
 Summary: Python wrapper for HTML Tidy (tidylib) on Python 2 and 3
 Home-page: http://countergram.com/open-source/pytidylib/
 Author: Jason Stitt
@@ -18,12 +18,22 @@
         * Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
           which some (X)HTML indenting code overlooks.
         
-        Version usage
-        =============
+        Changes
+        =======
         
-        * Windows: 0.2.0 and later
-        * Python 3: Tests pass on 0.2.3
-        * tidylib itself is not actively updated and may have problems with newer HTML
+        * 0.3.2: Initialization bug fix
+        
+        * 0.3.1: find_library support while still allowing a list of library names
+        
+        * 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
+        functional interface (which will lazily create a global Tidy() object) for
+        backward compatibility. You can now pass a list of library names and base
+        options when instantiating Tidy. The keep_doc argument is now deprecated
+        and does nothing; use PersistentTidy.
+        
+        * 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.
+        
+        * 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.
         
         Small example of use
         ====================
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/README new/pytidylib-0.3.2/README
--- old/pytidylib-0.2.4/README  2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/README  2016-09-22 23:42:09.000000000 +0200
@@ -1,14 +1,10 @@
-For documentation, see docs/html/index.html in this distribution, or
-http://countergram.com/open-source/pytidylib/
+This is a Python wrapper around the HTML Tidy library. Quick start example:
 
-Small example of use:
+from tidylib import Tidy
+tidy = Tidy()
+document, errors = tidy.tidy_document('<p>f&otilde;o <img src="bar.jpg">',
+    options={'alt-text': 'baz'})
+print(document)
+print(errors)
 
-from tidylib import tidy_document
-document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
-    options={'numeric-entities':1})
-print document
-print errors
-
-NOTE: HTML Tidy itself has currently not been updated for a long time, and may
-not be, and it may have trouble with newer HTML. This is just a thin Python
-wrapper around HTML Tidy, which is a separate project.
+For full documentation, see the docs/ directory.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/setup.py new/pytidylib-0.3.2/setup.py
--- old/pytidylib-0.2.4/setup.py        2014-12-20 05:19:09.000000000 +0100
+++ new/pytidylib-0.3.2/setup.py        2016-11-16 02:52:32.000000000 +0100
@@ -1,4 +1,4 @@
-# Copyright 2009 Jason Stitt
+# Copyright 2009-2015 Jason Stitt
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -33,12 +33,22 @@
 * Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
   which some (X)HTML indenting code overlooks.
 
-Version usage
-=============
+Changes
+=======
 
-* Windows: 0.2.0 and later
-* Python 3: Tests pass on 0.2.3
-* tidylib itself is not actively updated and may have problems with newer HTML
+* 0.3.2: Initialization bug fix
+
+* 0.3.1: find_library support while still allowing a list of library names
+
+* 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
+functional interface (which will lazily create a global Tidy() object) for
+backward compatibility. You can now pass a list of library names and base
+options when instantiating Tidy. The keep_doc argument is now deprecated
+and does nothing; use PersistentTidy.
+
+* 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.
+
+* 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.
 
 Small example of use
 ====================
@@ -61,7 +71,7 @@
 .. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/
 """
 
-VERSION = "0.2.4"
+VERSION = "0.3.2"
 
 setup(
     name="pytidylib",
@@ -73,16 +83,15 @@
     url="http://countergram.com/open-source/pytidylib/",
     packages=['tidylib'],
     classifiers=[
-          'Development Status :: 5 - Production/Stable',
-          'Environment :: Other Environment',
-          'Intended Audience :: Developers',
-          'License :: OSI Approved :: MIT License',
-          'Programming Language :: Python',
-          'Programming Language :: Python :: 3',
-          'Natural Language :: English',
-          'Topic :: Utilities',
-          'Topic :: Text Processing :: Markup :: HTML',
-          'Topic :: Text Processing :: Markup :: XML',
-          ],
-    )
-
+        'Development Status :: 5 - Production/Stable',
+        'Environment :: Other Environment',
+        'Intended Audience :: Developers',
+        'License :: OSI Approved :: MIT License',
+        'Programming Language :: Python',
+        'Programming Language :: Python :: 3',
+        'Natural Language :: English',
+        'Topic :: Utilities',
+        'Topic :: Text Processing :: Markup :: HTML',
+        'Topic :: Text Processing :: Markup :: XML',
+    ],
+)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_docs.py new/pytidylib-0.3.2/tests/test_docs.py
--- old/pytidylib-0.2.4/tests/test_docs.py      2014-12-20 05:05:51.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_docs.py      2016-09-29 04:23:12.000000000 +0200
@@ -22,7 +22,7 @@
 from __future__ import unicode_literals
 
 import unittest
-from tidylib import tidy_document, release_tidy_doc, thread_local_doc
+from tidylib import Tidy, PersistentTidy, tidy_document
 
 DOC = u'''<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
 <html>
@@ -48,20 +48,20 @@
 
     def test_alt_added_to_img(self):
         h = "<img src='foo'>"
-        expected = DOC % '''<img src='foo' alt="">'''
-        doc, err = tidy_document(h)
+        expected = DOC % '''<img src='foo' alt="bar">'''
+        doc, err = tidy_document(h, {'alt-text': 'bar'})
         self.assertEqual(doc, expected)
 
     def test_entity_preserved_using_bytes(self):
         h = b"&eacute;"
         expected = (DOC % "&eacute;").encode('utf-8')
-        doc, err = tidy_document(h)
+        doc, err = tidy_document(h, {'preserve-entities': 1})
         self.assertEqual(doc, expected)
 
     def test_numeric_entities_using_bytes(self):
         h = b"&eacute;"
         expected = (DOC % "&#233;").encode('utf-8')
-        doc, err = tidy_document(h, {'numeric-entities': 1})
+        doc, err = tidy_document(h, {'numeric-entities': 1, 'output-encoding': 'ascii'})
         self.assertEqual(doc, expected)
 
     def test_non_ascii_preserved(self):
@@ -76,6 +76,28 @@
         doc, err = tidy_document(h)
         self.assertEqual(doc, expected)
 
+    def test_can_use_two_tidy_instances(self):
+        t1 = Tidy()
+        t2 = Tidy()
+        self.assertEqual(t1.tidy_document(DOC % 'a')[0], DOC % 'a')
+        self.assertEqual(t2.tidy_document(DOC % 'b')[0], DOC % 'b')
+
+    def test_tidy_doesnt_persist_options(self):
+        tidy = Tidy()
+        # This option makes it a fragment
+        doc, err = tidy.tidy_document(DOC % 'a', {'show-body-only': 1})
+        self.assertEqual(doc, 'a\n')
+        doc, err = tidy.tidy_document(DOC % 'a')
+        self.assertEqual(doc, DOC % 'a')
+
+    def test_persistent_tidy_does_persist_options(self):
+        tidy = PersistentTidy()
+        # This option makes it a fragment
+        doc, err = tidy.tidy_document(DOC % 'a', {'show-body-only': 1})
+        self.assertEqual(doc, 'a\n')
+        doc, err = tidy.tidy_document(DOC % 'a')
+        self.assertEqual(doc, 'a\n')
+
     def test_xmlns_large_document_xml_corner_case(self):
         # Test for a super weird edge case in Tidy that can cause it to return
         # the wrong required buffer size.
@@ -84,16 +106,6 @@
         doc, err = tidy_document(html, {'output-xml': 1})
         self.assertEqual(doc.strip()[-7:], "</html>")
 
-    def test_keep_document(self):
-        h = "hello"
-        expected = DOC % h
-        for i in range(4):
-            doc, err = tidy_document(h, keep_doc=True)
-            self.assertEqual(doc, expected)
-        assert hasattr(thread_local_doc, 'doc')
-        release_tidy_doc()
-        assert not hasattr(thread_local_doc, 'doc')
-
 
 if __name__ == '__main__':
     unittest.main()
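The two new tests above, test_tidy_doesnt_persist_options and test_persistent_tidy_does_persist_options, pin down the one behavioural difference between Tidy and PersistentTidy: whether options set on one call are still in effect on the next. The sketch below is a toy model of that difference only. It does not load libtidy, and ToyDoc, ToyTidy, ToyPersistentTidy and TOY_BASE_OPTIONS are hypothetical stand-ins, not part of pytidylib.

```python
# Toy model only: no libtidy involved, all names here are hypothetical.
TOY_BASE_OPTIONS = {"indent": 1}


class ToyDoc(object):
    """Stands in for a Tidy document object, which remembers its options."""

    def __init__(self):
        self.options = {}


class ToyTidy(object):
    """Creates a fresh document per call, so nothing carries over."""

    def _doc(self):
        return ToyDoc()

    def effective_options(self, options=None):
        doc = self._doc()
        merged = dict(TOY_BASE_OPTIONS)
        merged.update(options or {})
        doc.options.update(merged)
        return dict(doc.options)


class ToyPersistentTidy(ToyTidy):
    """Reuses one document, so earlier options remain set on it."""

    def __init__(self):
        self._shared = ToyDoc()

    def _doc(self):
        return self._shared


fresh = ToyTidy()
fresh.effective_options({"show-body-only": 1})
second_fresh = fresh.effective_options()

sticky = ToyPersistentTidy()
sticky.effective_options({"show-body-only": 1})
second_sticky = sticky.effective_options()
```

In this model second_fresh no longer contains show-body-only while second_sticky still does, which mirrors the assertions in the two tests.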
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_fragments.py new/pytidylib-0.3.2/tests/test_fragments.py
--- old/pytidylib-0.2.4/tests/test_fragments.py 2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_fragments.py 2016-09-29 04:23:12.000000000 +0200
@@ -36,20 +36,20 @@
 
     def test_alt_added_to_img(self):
         h = "<img src='foo'>"
-        expected = '''<img src='foo' alt="">'''
-        doc, err = tidy_fragment(h)
+        expected = '''<img src='foo' alt="bar">'''
+        doc, err = tidy_fragment(h, {'alt-text': 'bar'})
         self.assertEqual(doc, expected)
 
     def test_entity_preserved_using_bytes(self):
         h = b"&eacute;"
         expected = b"&eacute;"
-        doc, err = tidy_fragment(h)
+        doc, err = tidy_fragment(h, {'preserve-entities': 1})
         self.assertEqual(doc, expected)
 
     def test_numeric_entities_using_bytes(self):
         h = b"&eacute;"
         expected = b"&#233;"
-        doc, err = tidy_fragment(h, {'numeric-entities': 1})
+        doc, err = tidy_fragment(h, {'numeric-entities': 1, 'output-encoding': 'ascii'})
         self.assertEqual(doc, expected)
 
     def test_non_ascii_preserved(self):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_init.py new/pytidylib-0.3.2/tests/test_init.py
--- old/pytidylib-0.2.4/tests/test_init.py      1970-01-01 01:00:00.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_init.py      2016-11-16 02:47:54.000000000 +0100
@@ -0,0 +1,32 @@
+# -*- coding: utf-8 -*-
+# Copyright 2009-2016 Jason Stitt
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+from __future__ import unicode_literals
+
+import unittest
+from tidylib import Tidy, PersistentTidy, tidy_document
+
+
+class TestDocs1(unittest.TestCase):
+
+    def test_not_find_lib(self):
+        with self.assertRaises(OSError):
+            tidy = Tidy(lib_names=[])
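The new test_init.py checks that Tidy(lib_names=[]) raises OSError, the case where none of the candidate library names can be loaded. The sketch below shows the same try-each-name pattern using only ctypes from the standard library. load_first is a hypothetical helper written for illustration, and unlike the upstream code it always uses ctypes.cdll rather than switching to windll on Windows.

```python
import ctypes


def load_first(names):
    """Return the first shared library in `names` that loads.

    Hypothetical helper, not part of pytidylib. Raises OSError when the
    list is empty or when every candidate fails to load.
    """
    if isinstance(names, str):
        names = [names]  # accept a single name as well as a list
    for name in names:
        try:
            return ctypes.cdll.LoadLibrary(name)
        except OSError:
            continue  # try the next candidate
    raise OSError(
        "Could not load a library using any of these names: "
        + ",".join(names))


def loads_ok(names):
    """Report whether any of the names could be loaded."""
    try:
        load_first(names)
    except OSError:
        return False
    return True
```

With an empty list the loop body never runs, so the function falls through to the final raise, which is the situation the new test exercises.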
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/threadsafety.py new/pytidylib-0.3.2/tests/threadsafety.py
--- old/pytidylib-0.2.4/tests/threadsafety.py   2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/tests/threadsafety.py   2016-09-22 23:42:09.000000000 +0200
@@ -24,9 +24,8 @@
 
 error_queue = Queue()
 
-DOC = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml">
+DOC = '''<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
+<html>
   <head>
     <title></title>
   </head>
@@ -63,5 +62,5 @@
 if __name__ == '__main__':
     run_test()
     if not error_queue.empty():
-        print "About %s errors out of %s" % (error_queue.qsize(), NUM_THREADS * NUM_TRIES)
-        print error_queue.get()
+        print("About %s errors out of %s" % (error_queue.qsize(), NUM_THREADS * NUM_TRIES))
+        print(error_queue.get())
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tidylib/__init__.py new/pytidylib-0.3.2/tidylib/__init__.py
--- old/pytidylib-0.2.4/tidylib/__init__.py     2014-12-20 05:21:13.000000000 +0100
+++ new/pytidylib-0.3.2/tidylib/__init__.py     2016-09-22 23:42:09.000000000 +0200
@@ -1,203 +1 @@
-# Copyright 2009-2014 Jason Stitt
-#
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-import ctypes
-import threading
-import platform
-from tidylib.sink import create_sink, destroy_sink
-
-__all__ = ['tidy_document', 'tidy_fragment', 'release_tidy_doc']
-
-# -------------------------------------------------------------------------- #
-# Constants
-
-LIB_NAMES = ['libtidy', 'libtidy.so', 'libtidy-0.99.so.0', 'cygtidy-0-99-0',
-             'tidylib', 'libtidy.dylib', 'tidy']
-ENOMEM = -12
-BASE_OPTIONS = {
-    "indent": 1,           # Pretty; not too much of a performance hit
-    "tidy-mark": 0,        # No tidy meta tag in output
-    "wrap": 0,             # No wrapping
-    "alt-text": "",        # Help ensure validation
-    "doctype": 'strict',   # Little sense in transitional for tool-generated markup...
-    "force-output": 1,     # May not get what you expect but you will get something
-}
-
-# Note: These are meant as sensible defaults. If you don't like these being
-# applied by default, just set tidylib.BASE_OPTIONS = {} after importing.
-# You can of course override any of these options when you call the
-# tidy_document() or tidy_fragment() function
-
-# -------------------------------------------------------------------------- #
-# Globals
-
-tidy = None
-thread_local_doc = threading.local()
-
-# Fix for Windows b/c tidy uses stdcall on Windows
-if "Windows" == platform.system():
-    load_library = ctypes.windll.LoadLibrary
-else:
-    load_library = ctypes.cdll.LoadLibrary
-
-for name in LIB_NAMES:
-    try:
-        tidy = load_library(name)
-        break
-    except OSError:
-        pass
-
-if tidy is None:
-    raise OSError("Could not load libtidy using any of these names: %s" % (",".join(LIB_NAMES)))
-
-tidy.tidyCreate.restype = ctypes.POINTER(ctypes.c_void_p)  # Fix for 64-bit systems
-
-# -------------------------------------------------------------------------- #
-# 3.x/2.x cross-compatibility
-
-try:
-    unicode  # 2.x
-
-    def is_unicode(obj):
-        return isinstance(obj, unicode)
-
-    def encode_key_value(k, v):
-        return unicode(k).encode('utf-8'), unicode(v).encode('utf-8')
-except NameError:
-    # 3.x
-    def is_unicode(obj):
-        return isinstance(obj, str)
-
-    def encode_key_value(k, v):
-        return str(k).encode('utf-8'), str(v).encode('utf-8')
-
-# -------------------------------------------------------------------------- #
-# Functions
-
-
-def tidy_document(text, options=None, keep_doc=False):
-    """ Run a string with markup through HTML Tidy; return the corrected one.
-
-    text: The markup, which may be anything from an empty string to a complete
-    (X)HTML document. If you pass in a unicode type (py3 str, py2 unicode) you
-    get one back out, and tidy will have some options set that may affect
-    behavior (e.g. named entities converted to plain unicode characters). If
-    you pass in a bytes type (py3 bytes, py2 str) you will get one of those
-    back.
-
-    options (dict): Options passed directly to HTML Tidy; see the HTML Tidy docs
-    (http://tidy.sourceforge.net/docs/quickref.html) or run tidy -help-config
-    from the command line.
-
-    keep_doc (boolean): If True, store 1 document object per thread and re-use
-    it, for a slight performance boost especially when tidying very large numbers
-    of very short documents.
-
-    returns (str, str): The tidied markup and unparsed warning/error messages.
-    Warnings and errors are returned just as tidylib returns them.
-    """
-    global tidy, option_names
-
-    # Unicode approach is to encode as string, then decode libtidy output
-    use_unicode = False
-    if is_unicode(text):
-        use_unicode = True
-        text = text.encode('utf-8')
-
-    # Manage thread-local storage of persistent document object
-    if keep_doc:
-        if not hasattr(thread_local_doc, 'doc'):
-            thread_local_doc.doc = tidy.tidyCreate()
-        doc = thread_local_doc.doc
-    else:
-        doc = tidy.tidyCreate()
-
-    # This is where error messages are sent by libtidy
-    sink = create_sink()
-    tidy.tidySetErrorSink(doc, sink)
-
-    try:
-        # Set options on the document
-        # If keep_doc=True, options will persist between calls, but they can
-        # be overridden, and the BASE_OPTIONS will be set each time
-        tidy_options = dict(BASE_OPTIONS)
-        if options:
-            tidy_options.update(options)
-        if use_unicode:
-            tidy_options['input-encoding'] = 'utf8'
-            tidy_options['output-encoding'] = 'utf8'
-        for key in tidy_options:
-            value = tidy_options[key]
-            key = key.replace('_', '-')
-            if value is None:
-                value = ''
-            key, value = encode_key_value(key, value)
-            tidy.tidyOptParseValue(doc, key, value)
-            error = str(sink)
-            if error:
-                raise ValueError("(tidylib) " + error)
-
-        # The point of the whole thing
-        tidy.tidyParseString(doc, text)
-        tidy.tidyCleanAndRepair(doc)
-
-        # Guess at buffer size; tidy returns ENOMEM if the buffer is too
-        # small and puts the required size into out_length
-        out_length = ctypes.c_int(8192)
-        out = ctypes.c_buffer(out_length.value)
-        while ENOMEM == tidy.tidySaveString(doc, out, ctypes.byref(out_length)):
-            out = ctypes.c_buffer(out_length.value)
-
-        document = out.value
-        if use_unicode:
-            document = document.decode('utf-8')
-        errors = str(sink)
-    finally:
-        destroy_sink(sink)
-        if not keep_doc:
-            tidy.tidyRelease(doc)
-
-    return (document, errors)
-
-
-def tidy_fragment(text, options=None, keep_doc=False):
-    """ Tidy a string with markup and return only the <body> contents.
-
-    HTML Tidy normally returns a full (X)HTML document; this function returns only
-    the contents of the <body> element and is meant to be used for snippets.
-    Calling tidy_fragment on elements that don't go in the <body>, like <title>,
-    will produce incorrect behavior.
-
-    Arguments and return value are the same as tidy_document. Note that HTML
-    Tidy will always complain about the lack of a doctype and <title> element
-    in fragments, and these errors are not stripped out for you. """
-    options = dict(options) if options else dict()
-    options["show-body-only"] = 1
-    document, errors = tidy_document(text, options, keep_doc)
-    document = document.strip()
-    return document, errors
-
-
-def release_tidy_doc():
-    """ Release the stored document object in the current thread. Only useful
-    if you have called tidy_document or tidy_fragament with keep_doc=True. """
-    if hasattr(thread_local_doc, 'doc'):
-        tidy.tidyRelease(thread_local_doc.doc)
-        del thread_local_doc.doc
+from .tidy import Tidy, PersistentTidy, tidy_document, tidy_fragment, release_tidy_doc
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tidylib/tidy.py new/pytidylib-0.3.2/tidylib/tidy.py
--- old/pytidylib-0.2.4/tidylib/tidy.py 1970-01-01 01:00:00.000000000 +0100
+++ new/pytidylib-0.3.2/tidylib/tidy.py 2016-11-16 02:49:58.000000000 +0100
@@ -0,0 +1,239 @@
+# Copyright 2009-2015 Jason Stitt
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+import ctypes
+import ctypes.util
+import threading
+import platform
+import warnings
+from contextlib import contextmanager
+from .sink import create_sink, destroy_sink
+
+__all__ = ['Tidy', 'PersistentTidy']
+
+# Default search order for library names if nothing is passed in
+LIB_NAMES = ['libtidy', 'libtidy.so', 'libtidy-0.99.so.0', 'cygtidy-0-99-0',
+             'tidylib', 'libtidy.dylib', 'tidy']
+
+# Error code from library
+ENOMEM = -12
+
+# Default options; can be overriden with argument to Tidy()
+BASE_OPTIONS = {
+    "indent": 1,           # Pretty; not too much of a performance hit
+    "tidy-mark": 0,        # No tidy meta tag in output
+    "wrap": 0,             # No wrapping
+    "alt-text": "",        # Help ensure validation
+    "doctype": 'strict',   # Little sense in transitional for tool-generated markup...
+    "force-output": 1,     # May not get what you expect but you will get something
+}
+
+KEEP_DOC_WARNING = "keep_doc and release_tidy_doc are no longer used. Create a PersistentTidy object instead."
+
+# Fix for Windows b/c tidy uses stdcall on Windows
+if "Windows" == platform.system():
+    load_library = ctypes.windll.LoadLibrary
+else:
+    load_library = ctypes.cdll.LoadLibrary
+
+# -------------------------------------------------------------------------- #
+# 3.x/2.x cross-compatibility
+
+try:
+    unicode  # 2.x
+
+    def is_unicode(obj):
+        return isinstance(obj, unicode)
+
+    def encode_key_value(k, v):
+        return unicode(k).encode('utf-8'), unicode(v).encode('utf-8')
+except NameError:
+    # 3.x
+    def is_unicode(obj):
+        return isinstance(obj, str)
+
+    def encode_key_value(k, v):
+        return str(k).encode('utf-8'), str(v).encode('utf-8')
+
+# -------------------------------------------------------------------------- #
+# The main python interface
+
+
+class Tidy(object):
+
+    """ Wrapper around the HTML Tidy library for cleaning up possibly invalid
+    HTML and XHTML. """
+
+    def __init__(self, lib_names=None):
+        self._tidy = None
+        if lib_names is None:
+            lib_names = ctypes.util.find_library('tidy') or LIB_NAMES
+        if isinstance(lib_names, str):
+            lib_names = [lib_names]
+        for name in lib_names:
+            try:
+                self._tidy = load_library(name)
+                break
+            except OSError:
+                continue
+        if self._tidy is None:
+            raise OSError(
+                "Could not load libtidy using any of these names: "
+                + ",".join(lib_names))
+        self._tidy.tidyCreate.restype = ctypes.POINTER(ctypes.c_void_p)  # Fix for 64-bit systems
+
+    @contextmanager
+    def _doc_and_sink(self):
+        " Create and cleanup a Tidy document and error sink "
+        doc = self._tidy.tidyCreate()
+        sink = create_sink()
+        self._tidy.tidySetErrorSink(doc, sink)
+        yield (doc, sink)
+        destroy_sink(sink)
+        self._tidy.tidyRelease(doc)
+
+    def tidy_document(self, text, options=None):
+        """ Run a string with markup through HTML Tidy; return the corrected one
+        and any error output.
+
+        text: The markup, which may be anything from an empty string to a complete
+        (X)HTML document. If you pass in a unicode type (py3 str, py2 unicode) you
+        get one back out, and tidy will have some options set that may affect
+        behavior (e.g. named entities converted to plain unicode characters). If
+        you pass in a bytes type (py3 bytes, py2 str) you will get one of those
+        back.
+
+        options (dict): Options passed directly to HTML Tidy; see the HTML Tidy docs
+        (http://tidy.sourceforge.net/docs/quickref.html) or run tidy -help-config
+        from the command line.
+
+        returns (str, str): The tidied markup and unparsed warning/error messages.
+        Warnings and errors are returned just as tidylib returns them.
+        """
+
+        # Unicode approach is to encode as string, then decode libtidy output
+        use_unicode = False
+        if is_unicode(text):
+            use_unicode = True
+            text = text.encode('utf-8')
+
+        with self._doc_and_sink() as (doc, sink):
+            tidy_options = dict(BASE_OPTIONS)
+            if options:
+                tidy_options.update(options)
+            if use_unicode:
+                tidy_options['input-encoding'] = 'utf8'
+                tidy_options['output-encoding'] = 'utf8'
+            for key in tidy_options:
+                value = tidy_options[key]
+                key = key.replace('_', '-')
+                if value is None:
+                    value = ''
+                key, value = encode_key_value(key, value)
+                self._tidy.tidyOptParseValue(doc, key, value)
+                error = str(sink)
+                if error:
+                    raise ValueError("(tidylib) " + error)
+
+            self._tidy.tidyParseString(doc, text)
+            self._tidy.tidyCleanAndRepair(doc)
+
+            # Guess at buffer size; tidy returns ENOMEM if the buffer is too
+            # small and puts the required size into out_length
+            out_length = ctypes.c_int(8192)
+            out = ctypes.c_buffer(out_length.value)
+            while ENOMEM == self._tidy.tidySaveString(doc, out, ctypes.byref(out_length)):
+                out = ctypes.c_buffer(out_length.value)
+
+            document = out.value
+            if use_unicode:
+                document = document.decode('utf-8')
+            errors = str(sink)
+
+        return (document, errors)
+
+    def tidy_fragment(self, text, options=None):
+        """ Tidy a string with markup and return only the <body> contents.
+
+        HTML Tidy normally returns a full (X)HTML document; this function returns only
+        the contents of the <body> element and is meant to be used for snippets.
+        Calling tidy_fragment on elements that don't go in the <body>, like <title>,
+        will produce incorrect behavior.
+
+        Arguments and return value are the same as tidy_document. Note that HTML
+        Tidy will always complain about the lack of a doctype and <title> element
+        in fragments, and these errors are not stripped out for you. """
+        in fragments, and these errors are not stripped out for you. """
+        options = dict(options) if options else dict()
+        options["show-body-only"] = 1
+        document, errors = self.tidy_document(text, options)
+        document = document.strip()
+        return document, errors
+
+
+class PersistentTidy(Tidy):
+
+    """ Functions the same as the Tidy class but keeps a persistent reference
+    to one Tidy document object. This increases performance slightly when
+    tidying many documents in a row. It also persists all options (not just
+    the base options) between runs, which could lead to unexpected behavior.
+    If you plan to use different options on each run with PersistentTidy, set
+    all options that could change on every call. Note that passing in unicode
+    text will result in the input-encoding and output-encoding options being
+    automatically set. Thread-local storage is used for the document object
+    (one document per thread). """
+
+    def __init__(self, lib_names=None):
+        Tidy.__init__(self, lib_names)
+        self._local = threading.local()
+        self._local.doc = self._tidy.tidyCreate()
+
+    def __del__(self):
+        self._tidy.tidyRelease(self._local.doc)
+
+    @contextmanager
+    def _doc_and_sink(self):
+        " Create and cleanup an error sink but use the persistent doc object "
+        sink = create_sink()
+        self._tidy.tidySetErrorSink(self._local.doc, sink)
+        yield (self._local.doc, sink)
+        destroy_sink(sink)
+
+
+def tidy_document(text, options=None, keep_doc=False):
+    if keep_doc:
+        warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
+    return get_module_tidy().tidy_document(text, options)
+
+
+def tidy_fragment(text, options=None, keep_doc=False):
+    if keep_doc:
+        warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
+    return get_module_tidy().tidy_fragment(text, options)
+
+
+def get_module_tidy():
+    global _tidy
+    if '_tidy' not in globals():
+        _tidy = Tidy()
+    return _tidy
+
+
+def release_tidy_doc():
+    warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
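A note on the _doc_and_sink context managers in the new tidy.py: as shown in the diff, the cleanup calls follow a bare yield. With contextlib.contextmanager, statements after the yield are skipped when the body of the with block raises, which can happen here because tidy_document raises ValueError for a rejected option. Whether upstream relies on that is not stated in this commit. The sketch below shows the usual try/finally form, with stand-in names (events, doc_and_sink) and strings in place of the real document and sink objects.

```python
from contextlib import contextmanager

# Records what the stand-in context manager did, in order.
events = []


@contextmanager
def doc_and_sink():
    """Stand-in for a create/cleanup pair; not the upstream implementation."""
    events.append("create")
    try:
        yield ("doc", "sink")
    finally:
        # Runs whether the with-body finishes normally or raises.
        events.append("release")


try:
    with doc_and_sink() as (doc, sink):
        raise ValueError("(tidylib) stand-in option error")
except ValueError:
    pass
```

After this runs, events holds both "create" and "release" even though the body raised.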

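The module-level functions at the end of tidy.py keep the old functional interface alive in two steps: a Tidy object is created lazily on first use, and the retired keep_doc argument only triggers a DeprecationWarning. The sketch below reproduces that shape without libtidy. StandInTidy, STAND_IN_WARNING and _instance are hypothetical names, and the stand-in simply echoes its input instead of tidying it.

```python
import warnings

STAND_IN_WARNING = "keep_doc is no longer used (stand-in message)."


class StandInTidy(object):
    """Hypothetical replacement for the real Tidy class; echoes its input."""

    def tidy_document(self, text, options=None):
        return text, ""


_instance = None


def get_module_tidy():
    """Create the shared object on first use, then keep returning it."""
    global _instance
    if _instance is None:
        _instance = StandInTidy()
    return _instance


def tidy_document(text, options=None, keep_doc=False):
    """Backward-compatible wrapper: keep_doc is accepted but ignored."""
    if keep_doc:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(STAND_IN_WARNING, DeprecationWarning, stacklevel=2)
    return get_module_tidy().tidy_document(text, options)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = tidy_document("<p>x</p>", keep_doc=True)
```

DeprecationWarning is hidden by default in many contexts, so callers who want to see it typically need a filter such as the simplefilter("always") call used here.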