Hello community,
here is the log from the commit of package python-pytidylib for
openSUSE:Factory checked in at 2019-05-16 22:06:57
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-pytidylib (Old)
and /work/SRC/openSUSE:Factory/.python-pytidylib.new.5148 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-pytidylib"
Thu May 16 22:06:57 2019 rev:2 rq:702989 version:0.3.2
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-pytidylib/python-pytidylib.changes 2016-09-13 22:23:40.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-pytidylib.new.5148/python-pytidylib.changes 2019-05-16 22:06:59.526452945 +0200
@@ -1,0 +2,8 @@
+Tue May 14 20:22:54 UTC 2019 - John Jolly <[email protected]>
+
+- Updated to 0.3.2
+ + No upstream changelog
+- Added %check section
+ + Excluded the large file check
+
+-------------------------------------------------------------------
Old:
----
pytidylib-0.2.4.tar.gz
New:
----
pytidylib-0.3.2.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-pytidylib.spec ++++++
--- /var/tmp/diff_new_pack.wiketI/_old 2019-05-16 22:07:01.430451061 +0200
+++ /var/tmp/diff_new_pack.wiketI/_new 2019-05-16 22:07:01.442451049 +0200
@@ -1,7 +1,7 @@
#
# spec file for package python-pytidylib
#
-# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -12,27 +12,25 @@
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
-%if 0%{?suse_version} && 0%{?suse_version} <= 1110
-%{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
-%else
-BuildArch: noarch
-%endif
+%{?!python_module:%define python_module() python-%{**} python3-%{**}}
Name: python-pytidylib
-Version: 0.2.4
+Version: 0.3.2
Release: 0
Summary: Python wrapper for HTML Tidy (tidylib) on Python 2 and 3
License: MIT
Group: Development/Languages/Python
-Url: http://countergram.com/open-source/pytidylib/
-Source: https://pypi.python.org/packages/b4/a0/b70cf2b7b4ee1f9d8fa0f1b4abbbac081a2638a580dabf29b8fb554d5fc1/pytidylib-%{version}.tar.gz
+URL: http://countergram.com/open-source/pytidylib/
+Source: https://files.pythonhosted.org/packages/source/p/pytidylib/pytidylib-%{version}.tar.gz
+BuildRequires: %{python_module pytest}
+BuildRequires: %{python_module setuptools}
+BuildRequires: fdupes
BuildRequires: libtidy-devel
-BuildRequires: python-devel
-Requires: libtidy5
-BuildRoot: %{_tmppath}/%{name}-%{version}-build
+BuildArch: noarch
+%python_subpackages
%description
`PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
@@ -62,14 +60,20 @@
%setup -q -n pytidylib-%{version}
%build
-python setup.py build
+%python_build
%install
-python setup.py install --prefix=%{_prefix} --root=%{buildroot}
+%python_install
+%python_expand %fdupes %{buildroot}%{$python_sitelib}
-%files
-%defattr(-,root,root,-)
-%doc LICENSE README
+%check
+# The large document test is excluded as it produces inconsistent
+# results across architectures.
+%pytest -k 'not test_large_document'
+
+%files %{python_files}
+%doc README
+%license LICENSE
%{python_sitelib}/*
%changelog
++++++ pytidylib-0.2.4.tar.gz -> pytidylib-0.3.2.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/PKG-INFO new/pytidylib-0.3.2/PKG-INFO
--- old/pytidylib-0.2.4/PKG-INFO 2014-12-20 05:22:41.000000000 +0100
+++ new/pytidylib-0.3.2/PKG-INFO 2016-11-16 02:52:52.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: pytidylib
-Version: 0.2.4
+Version: 0.3.2
Summary: Python wrapper for HTML Tidy (tidylib) on Python 2 and 3
Home-page: http://countergram.com/open-source/pytidylib/
Author: Jason Stitt
@@ -18,12 +18,22 @@
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
which some (X)HTML indenting code overlooks.
- Version usage
- =============
+ Changes
+ =======
- * Windows: 0.2.0 and later
- * Python 3: Tests pass on 0.2.3
- * tidylib itself is not actively updated and may have problems with newer HTML
+ * 0.3.2: Initialization bug fix
+
+ * 0.3.1: find_library support while still allowing a list of library names
+
+ * 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
+ functional interface (which will lazily create a global Tidy() object) for
+ backward compatibility. You can now pass a list of library names and base
+ options when instantiating Tidy. The keep_doc argument is now deprecated
+ and does nothing; use PersistentTidy.
+
+ * 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.
+
+ * 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.
Small example of use
====================
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/README new/pytidylib-0.3.2/README
--- old/pytidylib-0.2.4/README 2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/README 2016-09-22 23:42:09.000000000 +0200
@@ -1,14 +1,10 @@
-For documentation, see docs/html/index.html in this distribution, or
-http://countergram.com/open-source/pytidylib/
+This is a Python wrapper around the HTML Tidy library. Quick start example:
-Small example of use:
+from tidylib import Tidy
+tidy = Tidy()
+document, errors = tidy.tidy_document('<p>fõo <img src="bar.jpg">',
+ options={'alt-text': 'baz'})
+print(document)
+print(errors)
-from tidylib import tidy_document
-document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
- options={'numeric-entities':1})
-print document
-print errors
-
-NOTE: HTML Tidy itself has currently not been updated for a long time, and may
-not be, and it may have trouble with newer HTML. This is just a thin Python
-wrapper around HTML Tidy, which is a separate project.
+For full documentation, see the docs/ directory.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/setup.py new/pytidylib-0.3.2/setup.py
--- old/pytidylib-0.2.4/setup.py 2014-12-20 05:19:09.000000000 +0100
+++ new/pytidylib-0.3.2/setup.py 2016-11-16 02:52:32.000000000 +0100
@@ -1,4 +1,4 @@
-# Copyright 2009 Jason Stitt
+# Copyright 2009-2015 Jason Stitt
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -33,12 +33,22 @@
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
which some (X)HTML indenting code overlooks.
-Version usage
-=============
+Changes
+=======
-* Windows: 0.2.0 and later
-* Python 3: Tests pass on 0.2.3
-* tidylib itself is not actively updated and may have problems with newer HTML
+* 0.3.2: Initialization bug fix
+
+* 0.3.1: find_library support while still allowing a list of library names
+
+* 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
+functional interface (which will lazily create a global Tidy() object) for
+backward compatibility. You can now pass a list of library names and base
+options when instantiating Tidy. The keep_doc argument is now deprecated
+and does nothing; use PersistentTidy.
+
+* 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.
+
+* 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.
Small example of use
====================
@@ -61,7 +71,7 @@
.. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/
"""
-VERSION = "0.2.4"
+VERSION = "0.3.2"
setup(
name="pytidylib",
@@ -73,16 +83,15 @@
url="http://countergram.com/open-source/pytidylib/",
packages=['tidylib'],
classifiers=[
- 'Development Status :: 5 - Production/Stable',
- 'Environment :: Other Environment',
- 'Intended Audience :: Developers',
- 'License :: OSI Approved :: MIT License',
- 'Programming Language :: Python',
- 'Programming Language :: Python :: 3',
- 'Natural Language :: English',
- 'Topic :: Utilities',
- 'Topic :: Text Processing :: Markup :: HTML',
- 'Topic :: Text Processing :: Markup :: XML',
- ],
- )
-
+ 'Development Status :: 5 - Production/Stable',
+ 'Environment :: Other Environment',
+ 'Intended Audience :: Developers',
+ 'License :: OSI Approved :: MIT License',
+ 'Programming Language :: Python',
+ 'Programming Language :: Python :: 3',
+ 'Natural Language :: English',
+ 'Topic :: Utilities',
+ 'Topic :: Text Processing :: Markup :: HTML',
+ 'Topic :: Text Processing :: Markup :: XML',
+ ],
+)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_docs.py new/pytidylib-0.3.2/tests/test_docs.py
--- old/pytidylib-0.2.4/tests/test_docs.py 2014-12-20 05:05:51.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_docs.py 2016-09-29 04:23:12.000000000 +0200
@@ -22,7 +22,7 @@
from __future__ import unicode_literals
import unittest
-from tidylib import tidy_document, release_tidy_doc, thread_local_doc
+from tidylib import Tidy, PersistentTidy, tidy_document
DOC = u'''<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
@@ -48,20 +48,20 @@
def test_alt_added_to_img(self):
h = "<img src='foo'>"
- expected = DOC % '''<img src='foo' alt="">'''
- doc, err = tidy_document(h)
+ expected = DOC % '''<img src='foo' alt="bar">'''
+ doc, err = tidy_document(h, {'alt-text': 'bar'})
self.assertEqual(doc, expected)
def test_entity_preserved_using_bytes(self):
h = b"&eacute;"
expected = (DOC % "&eacute;").encode('utf-8')
- doc, err = tidy_document(h)
+ doc, err = tidy_document(h, {'preserve-entities': 1})
self.assertEqual(doc, expected)
def test_numeric_entities_using_bytes(self):
h = b"&eacute;"
expected = (DOC % "&#233;").encode('utf-8')
- doc, err = tidy_document(h, {'numeric-entities': 1})
+ doc, err = tidy_document(h, {'numeric-entities': 1, 'output-encoding': 'ascii'})
self.assertEqual(doc, expected)
def test_non_ascii_preserved(self):
@@ -76,6 +76,28 @@
doc, err = tidy_document(h)
self.assertEqual(doc, expected)
+ def test_can_use_two_tidy_instances(self):
+ t1 = Tidy()
+ t2 = Tidy()
+ self.assertEqual(t1.tidy_document(DOC % 'a')[0], DOC % 'a')
+ self.assertEqual(t2.tidy_document(DOC % 'b')[0], DOC % 'b')
+
+ def test_tidy_doesnt_persist_options(self):
+ tidy = Tidy()
+ # This option makes it a fragment
+ doc, err = tidy.tidy_document(DOC % 'a', {'show-body-only': 1})
+ self.assertEqual(doc, 'a\n')
+ doc, err = tidy.tidy_document(DOC % 'a')
+ self.assertEqual(doc, DOC % 'a')
+
+ def test_persistent_tidy_does_persist_options(self):
+ tidy = PersistentTidy()
+ # This option makes it a fragment
+ doc, err = tidy.tidy_document(DOC % 'a', {'show-body-only': 1})
+ self.assertEqual(doc, 'a\n')
+ doc, err = tidy.tidy_document(DOC % 'a')
+ self.assertEqual(doc, 'a\n')
+
def test_xmlns_large_document_xml_corner_case(self):
# Test for a super weird edge case in Tidy that can cause it to return
# the wrong required buffer size.
@@ -84,16 +106,6 @@
doc, err = tidy_document(html, {'output-xml': 1})
self.assertEqual(doc.strip()[-7:], "</html>")
- def test_keep_document(self):
- h = "hello"
- expected = DOC % h
- for i in range(4):
- doc, err = tidy_document(h, keep_doc=True)
- self.assertEqual(doc, expected)
- assert hasattr(thread_local_doc, 'doc')
- release_tidy_doc()
- assert not hasattr(thread_local_doc, 'doc')
-
if __name__ == '__main__':
unittest.main()
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_fragments.py new/pytidylib-0.3.2/tests/test_fragments.py
--- old/pytidylib-0.2.4/tests/test_fragments.py 2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_fragments.py 2016-09-29 04:23:12.000000000 +0200
@@ -36,20 +36,20 @@
def test_alt_added_to_img(self):
h = "<img src='foo'>"
- expected = '''<img src='foo' alt="">'''
- doc, err = tidy_fragment(h)
+ expected = '''<img src='foo' alt="bar">'''
+ doc, err = tidy_fragment(h, {'alt-text': 'bar'})
self.assertEqual(doc, expected)
def test_entity_preserved_using_bytes(self):
h = b"&eacute;"
expected = b"&eacute;"
- doc, err = tidy_fragment(h)
+ doc, err = tidy_fragment(h, {'preserve-entities': 1})
self.assertEqual(doc, expected)
def test_numeric_entities_using_bytes(self):
h = b"&eacute;"
expected = b"&#233;"
- doc, err = tidy_fragment(h, {'numeric-entities': 1})
+ doc, err = tidy_fragment(h, {'numeric-entities': 1, 'output-encoding': 'ascii'})
self.assertEqual(doc, expected)
def test_non_ascii_preserved(self):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/test_init.py new/pytidylib-0.3.2/tests/test_init.py
--- old/pytidylib-0.2.4/tests/test_init.py 1970-01-01 01:00:00.000000000 +0100
+++ new/pytidylib-0.3.2/tests/test_init.py 2016-11-16 02:47:54.000000000 +0100
@@ -0,0 +1,32 @@
+# -*- coding: utf-8 -*-
+# Copyright 2009-2016 Jason Stitt
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+from __future__ import unicode_literals
+
+import unittest
+from tidylib import Tidy, PersistentTidy, tidy_document
+
+
+class TestDocs1(unittest.TestCase):
+
+ def test_not_find_lib(self):
+ with self.assertRaises(OSError):
+ tidy = Tidy(lib_names=[])
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tests/threadsafety.py new/pytidylib-0.3.2/tests/threadsafety.py
--- old/pytidylib-0.2.4/tests/threadsafety.py 2014-12-11 19:31:37.000000000 +0100
+++ new/pytidylib-0.3.2/tests/threadsafety.py 2016-09-22 23:42:09.000000000 +0200
@@ -24,9 +24,8 @@
error_queue = Queue()
-DOC = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml">
+DOC = '''<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
+<html>
<head>
<title></title>
</head>
@@ -63,5 +62,5 @@
if __name__ == '__main__':
run_test()
if not error_queue.empty():
- print "About %s errors out of %s" % (error_queue.qsize(), NUM_THREADS * NUM_TRIES)
- print error_queue.get()
+ print("About %s errors out of %s" % (error_queue.qsize(), NUM_THREADS * NUM_TRIES))
+ print(error_queue.get())
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tidylib/__init__.py new/pytidylib-0.3.2/tidylib/__init__.py
--- old/pytidylib-0.2.4/tidylib/__init__.py 2014-12-20 05:21:13.000000000 +0100
+++ new/pytidylib-0.3.2/tidylib/__init__.py 2016-09-22 23:42:09.000000000 +0200
@@ -1,203 +1 @@
-# Copyright 2009-2014 Jason Stitt
-#
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice shall be included in
-# all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-# THE SOFTWARE.
-
-import ctypes
-import threading
-import platform
-from tidylib.sink import create_sink, destroy_sink
-
-__all__ = ['tidy_document', 'tidy_fragment', 'release_tidy_doc']
-
-# -------------------------------------------------------------------------- #
-# Constants
-
-LIB_NAMES = ['libtidy', 'libtidy.so', 'libtidy-0.99.so.0', 'cygtidy-0-99-0',
- 'tidylib', 'libtidy.dylib', 'tidy']
-ENOMEM = -12
-BASE_OPTIONS = {
- "indent": 1, # Pretty; not too much of a performance hit
- "tidy-mark": 0, # No tidy meta tag in output
- "wrap": 0, # No wrapping
- "alt-text": "", # Help ensure validation
- "doctype": 'strict', # Little sense in transitional for tool-generated markup...
- "force-output": 1, # May not get what you expect but you will get something
-}
-
-# Note: These are meant as sensible defaults. If you don't like these being
-# applied by default, just set tidylib.BASE_OPTIONS = {} after importing.
-# You can of course override any of these options when you call the
-# tidy_document() or tidy_fragment() function
-
-# -------------------------------------------------------------------------- #
-# Globals
-
-tidy = None
-thread_local_doc = threading.local()
-
-# Fix for Windows b/c tidy uses stdcall on Windows
-if "Windows" == platform.system():
- load_library = ctypes.windll.LoadLibrary
-else:
- load_library = ctypes.cdll.LoadLibrary
-
-for name in LIB_NAMES:
- try:
- tidy = load_library(name)
- break
- except OSError:
- pass
-
-if tidy is None:
- raise OSError("Could not load libtidy using any of these names: %s" % (",".join(LIB_NAMES)))
-
-tidy.tidyCreate.restype = ctypes.POINTER(ctypes.c_void_p) # Fix for 64-bit systems
-
-# -------------------------------------------------------------------------- #
-# 3.x/2.x cross-compatibility
-
-try:
- unicode # 2.x
-
- def is_unicode(obj):
- return isinstance(obj, unicode)
-
- def encode_key_value(k, v):
- return unicode(k).encode('utf-8'), unicode(v).encode('utf-8')
-except NameError:
- # 3.x
- def is_unicode(obj):
- return isinstance(obj, str)
-
- def encode_key_value(k, v):
- return str(k).encode('utf-8'), str(v).encode('utf-8')
-
-# -------------------------------------------------------------------------- #
-# Functions
-
-
-def tidy_document(text, options=None, keep_doc=False):
- """ Run a string with markup through HTML Tidy; return the corrected one.
-
- text: The markup, which may be anything from an empty string to a complete
- (X)HTML document. If you pass in a unicode type (py3 str, py2 unicode) you
- get one back out, and tidy will have some options set that may affect
- behavior (e.g. named entities converted to plain unicode characters). If
- you pass in a bytes type (py3 bytes, py2 str) you will get one of those
- back.
-
- options (dict): Options passed directly to HTML Tidy; see the HTML Tidy docs
- (http://tidy.sourceforge.net/docs/quickref.html) or run tidy -help-config
- from the command line.
-
- keep_doc (boolean): If True, store 1 document object per thread and re-use
- it, for a slight performance boost especially when tidying very large numbers
- of very short documents.
-
- returns (str, str): The tidied markup and unparsed warning/error messages.
- Warnings and errors are returned just as tidylib returns them.
- """
- global tidy, option_names
-
- # Unicode approach is to encode as string, then decode libtidy output
- use_unicode = False
- if is_unicode(text):
- use_unicode = True
- text = text.encode('utf-8')
-
- # Manage thread-local storage of persistent document object
- if keep_doc:
- if not hasattr(thread_local_doc, 'doc'):
- thread_local_doc.doc = tidy.tidyCreate()
- doc = thread_local_doc.doc
- else:
- doc = tidy.tidyCreate()
-
- # This is where error messages are sent by libtidy
- sink = create_sink()
- tidy.tidySetErrorSink(doc, sink)
-
- try:
- # Set options on the document
- # If keep_doc=True, options will persist between calls, but they can
- # be overridden, and the BASE_OPTIONS will be set each time
- tidy_options = dict(BASE_OPTIONS)
- if options:
- tidy_options.update(options)
- if use_unicode:
- tidy_options['input-encoding'] = 'utf8'
- tidy_options['output-encoding'] = 'utf8'
- for key in tidy_options:
- value = tidy_options[key]
- key = key.replace('_', '-')
- if value is None:
- value = ''
- key, value = encode_key_value(key, value)
- tidy.tidyOptParseValue(doc, key, value)
- error = str(sink)
- if error:
- raise ValueError("(tidylib) " + error)
-
- # The point of the whole thing
- tidy.tidyParseString(doc, text)
- tidy.tidyCleanAndRepair(doc)
-
- # Guess at buffer size; tidy returns ENOMEM if the buffer is too
- # small and puts the required size into out_length
- out_length = ctypes.c_int(8192)
- out = ctypes.c_buffer(out_length.value)
- while ENOMEM == tidy.tidySaveString(doc, out, ctypes.byref(out_length)):
- out = ctypes.c_buffer(out_length.value)
-
- document = out.value
- if use_unicode:
- document = document.decode('utf-8')
- errors = str(sink)
- finally:
- destroy_sink(sink)
- if not keep_doc:
- tidy.tidyRelease(doc)
-
- return (document, errors)
-
-
-def tidy_fragment(text, options=None, keep_doc=False):
- """ Tidy a string with markup and return only the <body> contents.
-
- HTML Tidy normally returns a full (X)HTML document; this function returns only
- the contents of the <body> element and is meant to be used for snippets.
- Calling tidy_fragment on elements that don't go in the <body>, like <title>,
- will produce incorrect behavior.
-
- Arguments and return value are the same as tidy_document. Note that HTML
- Tidy will always complain about the lack of a doctype and <title> element
- in fragments, and these errors are not stripped out for you. """
- options = dict(options) if options else dict()
- options["show-body-only"] = 1
- document, errors = tidy_document(text, options, keep_doc)
- document = document.strip()
- return document, errors
-
-
-def release_tidy_doc():
- """ Release the stored document object in the current thread. Only useful
- if you have called tidy_document or tidy_fragament with keep_doc=True. """
- if hasattr(thread_local_doc, 'doc'):
- tidy.tidyRelease(thread_local_doc.doc)
- del thread_local_doc.doc
+from .tidy import Tidy, PersistentTidy, tidy_document, tidy_fragment, release_tidy_doc
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pytidylib-0.2.4/tidylib/tidy.py new/pytidylib-0.3.2/tidylib/tidy.py
--- old/pytidylib-0.2.4/tidylib/tidy.py 1970-01-01 01:00:00.000000000 +0100
+++ new/pytidylib-0.3.2/tidylib/tidy.py 2016-11-16 02:49:58.000000000 +0100
@@ -0,0 +1,239 @@
+# Copyright 2009-2015 Jason Stitt
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+# THE SOFTWARE.
+
+import ctypes
+import ctypes.util
+import threading
+import platform
+import warnings
+from contextlib import contextmanager
+from .sink import create_sink, destroy_sink
+
+__all__ = ['Tidy', 'PersistentTidy']
+
+# Default search order for library names if nothing is passed in
+LIB_NAMES = ['libtidy', 'libtidy.so', 'libtidy-0.99.so.0', 'cygtidy-0-99-0',
+ 'tidylib', 'libtidy.dylib', 'tidy']
+
+# Error code from library
+ENOMEM = -12
+
+# Default options; can be overriden with argument to Tidy()
+BASE_OPTIONS = {
+ "indent": 1, # Pretty; not too much of a performance hit
+ "tidy-mark": 0, # No tidy meta tag in output
+ "wrap": 0, # No wrapping
+ "alt-text": "", # Help ensure validation
+ "doctype": 'strict', # Little sense in transitional for tool-generated markup...
+ "force-output": 1, # May not get what you expect but you will get something
+}
+
+KEEP_DOC_WARNING = "keep_doc and release_tidy_doc are no longer used. Create a PersistentTidy object instead."
+
+# Fix for Windows b/c tidy uses stdcall on Windows
+if "Windows" == platform.system():
+ load_library = ctypes.windll.LoadLibrary
+else:
+ load_library = ctypes.cdll.LoadLibrary
+
+# -------------------------------------------------------------------------- #
+# 3.x/2.x cross-compatibility
+
+try:
+ unicode # 2.x
+
+ def is_unicode(obj):
+ return isinstance(obj, unicode)
+
+ def encode_key_value(k, v):
+ return unicode(k).encode('utf-8'), unicode(v).encode('utf-8')
+except NameError:
+ # 3.x
+ def is_unicode(obj):
+ return isinstance(obj, str)
+
+ def encode_key_value(k, v):
+ return str(k).encode('utf-8'), str(v).encode('utf-8')
+
+# -------------------------------------------------------------------------- #
+# The main python interface
+
+
+class Tidy(object):
+
+ """ Wrapper around the HTML Tidy library for cleaning up possibly invalid
+ HTML and XHTML. """
+
+ def __init__(self, lib_names=None):
+ self._tidy = None
+ if lib_names is None:
+ lib_names = ctypes.util.find_library('tidy') or LIB_NAMES
+ if isinstance(lib_names, str):
+ lib_names = [lib_names]
+ for name in lib_names:
+ try:
+ self._tidy = load_library(name)
+ break
+ except OSError:
+ continue
+ if self._tidy is None:
+ raise OSError(
+ "Could not load libtidy using any of these names: "
+ + ",".join(lib_names))
+ self._tidy.tidyCreate.restype = ctypes.POINTER(ctypes.c_void_p) # Fix for 64-bit systems
+
+ @contextmanager
+ def _doc_and_sink(self):
+ " Create and cleanup a Tidy document and error sink "
+ doc = self._tidy.tidyCreate()
+ sink = create_sink()
+ self._tidy.tidySetErrorSink(doc, sink)
+ yield (doc, sink)
+ destroy_sink(sink)
+ self._tidy.tidyRelease(doc)
+
+ def tidy_document(self, text, options=None):
+ """ Run a string with markup through HTML Tidy; return the corrected one
+ and any error output.
+
+ text: The markup, which may be anything from an empty string to a complete
+ (X)HTML document. If you pass in a unicode type (py3 str, py2 unicode) you
+ get one back out, and tidy will have some options set that may affect
+ behavior (e.g. named entities converted to plain unicode characters). If
+ you pass in a bytes type (py3 bytes, py2 str) you will get one of those
+ back.
+
+ options (dict): Options passed directly to HTML Tidy; see the HTML Tidy docs
+ (http://tidy.sourceforge.net/docs/quickref.html) or run tidy -help-config
+ from the command line.
+
+ returns (str, str): The tidied markup and unparsed warning/error messages.
+ Warnings and errors are returned just as tidylib returns them.
+ """
+
+ # Unicode approach is to encode as string, then decode libtidy output
+ use_unicode = False
+ if is_unicode(text):
+ use_unicode = True
+ text = text.encode('utf-8')
+
+ with self._doc_and_sink() as (doc, sink):
+ tidy_options = dict(BASE_OPTIONS)
+ if options:
+ tidy_options.update(options)
+ if use_unicode:
+ tidy_options['input-encoding'] = 'utf8'
+ tidy_options['output-encoding'] = 'utf8'
+ for key in tidy_options:
+ value = tidy_options[key]
+ key = key.replace('_', '-')
+ if value is None:
+ value = ''
+ key, value = encode_key_value(key, value)
+ self._tidy.tidyOptParseValue(doc, key, value)
+ error = str(sink)
+ if error:
+ raise ValueError("(tidylib) " + error)
+
+ self._tidy.tidyParseString(doc, text)
+ self._tidy.tidyCleanAndRepair(doc)
+
+ # Guess at buffer size; tidy returns ENOMEM if the buffer is too
+ # small and puts the required size into out_length
+ out_length = ctypes.c_int(8192)
+ out = ctypes.c_buffer(out_length.value)
+ while ENOMEM == self._tidy.tidySaveString(doc, out, ctypes.byref(out_length)):
+ out = ctypes.c_buffer(out_length.value)
+
+ document = out.value
+ if use_unicode:
+ document = document.decode('utf-8')
+ errors = str(sink)
+
+ return (document, errors)
+
+ def tidy_fragment(self, text, options=None):
+ """ Tidy a string with markup and return only the <body> contents.
+
+ HTML Tidy normally returns a full (X)HTML document; this function returns only
+ the contents of the <body> element and is meant to be used for snippets.
+ Calling tidy_fragment on elements that don't go in the <body>, like <title>,
+ will produce incorrect behavior.
+
+ Arguments and return value are the same as tidy_document. Note that HTML
+ Tidy will always complain about the lack of a doctype and <title> element
+ in fragments, and these errors are not stripped out for you. """
+ options = dict(options) if options else dict()
+ options["show-body-only"] = 1
+ document, errors = self.tidy_document(text, options)
+ document = document.strip()
+ return document, errors
+
+
+class PersistentTidy(Tidy):
+
+ """ Functions the same as the Tidy class but keeps a persistent reference
+ to one Tidy document object. This increases performance slightly when
+ tidying many documents in a row. It also persists all options (not just
+ the base options) between runs, which could lead to unexpected behavior.
+ If you plan to use different options on each run with PersistentTidy, set
+ all options that could change on every call. Note that passing in unicode
+ text will result in the input-encoding and output-encoding options being
+ automatically set. Thread-local storage is used for the document object
+ (one document per thread). """
+
+ def __init__(self, lib_names=None):
+ Tidy.__init__(self, lib_names)
+ self._local = threading.local()
+ self._local.doc = self._tidy.tidyCreate()
+
+ def __del__(self):
+ self._tidy.tidyRelease(self._local.doc)
+
+ @contextmanager
+ def _doc_and_sink(self):
+ " Create and cleanup an error sink but use the persistent doc object "
+ sink = create_sink()
+ self._tidy.tidySetErrorSink(self._local.doc, sink)
+ yield (self._local.doc, sink)
+ destroy_sink(sink)
+
+
+def tidy_document(text, options=None, keep_doc=False):
+ if keep_doc:
+ warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
+ return get_module_tidy().tidy_document(text, options)
+
+
+def tidy_fragment(text, options=None, keep_doc=False):
+ if keep_doc:
+ warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
+ return get_module_tidy().tidy_fragment(text, options)
+
+
+def get_module_tidy():
+ global _tidy
+ if '_tidy' not in globals():
+ _tidy = Tidy()
+ return _tidy
+
+
+def release_tidy_doc():
+ warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
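
Note on the compatibility shim at the end of tidy.py: the module-level functions still accept keep_doc, emit a DeprecationWarning when it is set, and delegate to a lazily created shared Tidy object. The sketch below illustrates that pattern only. StandInTidy is a hypothetical placeholder that does not load libtidy, and _module_tidy and demo are names chosen for this illustration, not taken from the package.

```python
import warnings

KEEP_DOC_WARNING = ("keep_doc and release_tidy_doc are no longer used. "
                    "Create a PersistentTidy object instead.")


class StandInTidy(object):
    """Hypothetical stand-in for tidylib.Tidy; it does not call libtidy."""

    def tidy_document(self, text, options=None):
        # Return the input unchanged together with an empty error string.
        return text, ""


_module_tidy = None  # shared object, created on first use


def get_module_tidy():
    """Create the shared object lazily and reuse it on later calls."""
    global _module_tidy
    if _module_tidy is None:
        _module_tidy = StandInTidy()
    return _module_tidy


def tidy_document(text, options=None, keep_doc=False):
    """Functional wrapper: keep_doc is accepted but only triggers a warning."""
    if keep_doc:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(KEEP_DOC_WARNING, DeprecationWarning, stacklevel=2)
    return get_module_tidy().tidy_document(text, options)


def demo():
    """Call the wrapper with keep_doc=True and collect the warning categories."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        document, errors = tidy_document("<p>hello</p>", keep_doc=True)
    return document, errors, [w.category for w in caught]
```

With the real package, callers that relied on keep_doc=True would instead create a PersistentTidy object once and call its tidy_document method repeatedly, as the deprecation message suggests.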