This port adds both the Python library libmat2 and a command-line tool called mat2, for removing metadata from various files.
https://0xacab.org/jvoisin/mat2

Tests are disabled because the tarball at https://pypi.org/project/mat2/
doesn't include the test documents. The test documents are, however,
present in https://0xacab.org/jvoisin/mat2, so cloning that repository
separately and running the tests yields the attached test-results.txt
file. It looks like some tests fail (7 failures and 10 errors out of 125
tests), among them some on video files, which I'll look into, but it
mostly works, at least on my own personal files!

This library can also be a building block for apps that use mat2, like
https://gitlab.com/rmnvgr/metadata-cleaner.

The library also requires a couple of runtime libraries to be installed,
and they can be checked by running mat2 --check-dependencies:

$ mat2 --check-dependencies
Dependencies for mat2 0.13.4:
- Cairo: yes
- Exiftool: yes (optional)
- Ffmpeg: yes (optional)
- GLib from PyGobject: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes

Please test! Works on my files on -current/amd64.

OK?

-- 
jagtalon.net
weirder.earth/@jag
Attachment: py3-mat2.tar.gz (application/gzip)

Attachment: test-results.txt
jag@big ~/D/mat2 (master)> coverage run --branch -m unittest discover -s tests/
...E.....FF..FF..........EEERROR:root:Something went wrong during the
processing of ./tests/data/clean.avi: Command '['/usr/local/bin/ffmpeg', '-i',
'./tests/data/clean.avi', '-y', '-map', '0', '-codec', 'copy', '-loglevel',
'panic', '-hide_banner', '-map_metadata', '-1', '-map_chapters', '-1',
'-disposition', '0', '-fflags', '+bitexact', '-flags:v', '+bitexact',
'-flags:a', '+bitexact', './tests/data/clean.cleaned.avi']' returned non-zero
exit status 1.
.ERROR:root:Something went wrong during the processing of
./tests/data/--output.avi: Command '['/usr/local/bin/ffmpeg', '-i',
'./tests/data/--output.avi', '-y', '-map', '0', '-codec', 'copy', '-loglevel',
'panic', '-hide_banner', '-map_metadata', '-1', '-map_chapters', '-1',
'-disposition', '0', '-fflags', '+bitexact', '-flags:v', '+bitexact',
'-flags:a', '+bitexact', './tests/data/--output.cleaned.avi']' returned
non-zero exit status 1.
...ERROR:root:Unable to parse /tmp/tmp5je1k6bq/OEBPS/content.opf in
./tests/data/clean.epub.
WARNING:root:Something went wrong during deep cleaning of OEBPS/content.opf in
./tests/data/clean.epub
..........FWARNING:root:Not a valid bencoded string: 137
WARNING:root:Not a valid bencoded string: 137
WARNING:root:Not a valid bencoded string:
WARNING:root:Not a valid bencoded string:
WARNING:root:Not a valid bencoded string:
WARNING:root:Invalid bencoded value (data after valid prefix)
..F............................[+] Testing pdf
[+] Testing png
[+] Testing jpg
[+] Testing wav
[+] Testing aiff
[+] Testing mp3
[+] Testing ogg
[+] Testing flac
[+] Testing docx
[+] Testing odt
[+] Testing tiff
Warning: [minor] Can't delete IFD0 from TIFF - ./tests/data/clean.tiff
[+] Testing bmp
[+] Testing torrent
[+] Testing odf
[+] Testing odg
[+] Testing txt
[+] Testing gif
[+] Testing css
[+] Testing svg
[+] Testing ppm
[+] Testing avi
[+] Testing mp4
WARNING:root:The format of "./tests/data/clean.mp4" (video/mp4) has some
mandatory metadata fields; mat2 filled them with standard data.
WARNING:root:The format of "./tests/data/clean.cleaned.mp4" (video/mp4) has
some mandatory metadata fields; mat2 filled them with standard data.
[+] Testing wmv
WARNING:root:The format of "./tests/data/clean.wmv" (video/x-ms-wmv) has some
mandatory metadata fields; mat2 filled them with standard data.
WARNING:root:The format of "./tests/data/clean.cleaned.wmv" (video/x-ms-wmv)
has some mandatory metadata fields; mat2 filled them with standard data.
[+] Testing heic
Warning: ICC_Profile deleted. Image colors may be affected -
./tests/data/clean.heic
Warning: ICC_Profile deleted. Image colors may be affected -
./tests/data/clean.cleaned.heic
...EEEEEWARNING:root:./tests/data/clean.pptx contains invalid cNvPr: {1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 20, 22, 24}
................E....FE..........ERROR:root:In file ./tests/data/clean.docx,
element word/media/setup.py's format (text/x-python) isn't supported
.ERROR:root:In file ./tests/data/clean.odt, element Pictures/setup.py's format
(text/x-python) isn't supported
.....Warning: [minor] Can't delete IFD0 from TIFF - ./tests/data/clean.tiff
..WARNING:root:In file ./tests/data/clean.docx, keeping unknown element
word/media/setup.py (format: text/x-python)
.WARNING:root:In file ./tests/data/clean.docx, omitting unknown element
word/media/setup.py (format: text/x-python)
..
======================================================================
ERROR: test_different (test_climat2.TestCommandLineParallel.test_different)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 269, in test_different
    shutil.copytree(src, dst)
  File "/usr/local/lib/python3.11/shutil.py", line 573, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/shutil.py", line 471, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: './tests/data/parallel'
======================================================================
ERROR: test_docx (test_corrupted_files.TestCorruptedEmbedded.test_docx)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 69, in test_docx
    parser.remove_all()
    ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'remove_all'
======================================================================
ERROR: test_odt (test_corrupted_files.TestCorruptedEmbedded.test_odt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 77, in test_odt
    self.assertFalse(parser.remove_all())
                     ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'remove_all'
======================================================================
ERROR: test_tar (test_libmat2.TestCleaningArchives.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 679, in test_tar
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'
======================================================================
ERROR: test_tarbz2 (test_libmat2.TestCleaningArchives.test_tarbz2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 749, in test_tarbz2
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'
======================================================================
ERROR: test_targz (test_libmat2.TestCleaningArchives.test_targz)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 714, in test_targz
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'
======================================================================
ERROR: test_tarxz (test_libmat2.TestCleaningArchives.test_tarxz)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 784, in test_tarxz
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'
======================================================================
ERROR: test_zip (test_libmat2.TestCleaningArchives.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 649, in test_zip
    self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'
======================================================================
ERROR: test_tar (test_libmat2.TestGetMeta.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 241, in test_tar
    self.assertEqual(meta['./tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'comments'
======================================================================
ERROR: test_zip (test_libmat2.TestGetMeta.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 189, in test_zip
    self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'comments'
======================================================================
FAIL: test_docx (test_climat2.TestGetMeta.test_docx)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 203, in test_docx
    self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
AssertionError: b'Application: LibreOffice/5.4.5.1$Linux_X86_64' not found in b"[-] ./tests/data/dirty.docx's format (None) is not supported\n"
======================================================================
FAIL: test_flac (test_climat2.TestGetMeta.test_flac)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 226, in test_flac
    self.assertIn(b'comments: Thank you for using MAT !', stdout)
AssertionError: b'comments: Thank you for using MAT !' not found in b"[-] ./tests/data/dirty.flac's format (None) is not supported\n"
======================================================================
FAIL: test_odt (test_climat2.TestGetMeta.test_odt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 211, in test_odt
    self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
AssertionError: b'generator: LibreOffice/3.3$Unix' not found in b"[-] ./tests/data/dirty.odt's format (None) is not supported\n"
======================================================================
FAIL: test_ogg (test_climat2.TestGetMeta.test_ogg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 234, in test_ogg
    self.assertIn(b'comments: Thank you for using MAT !', stdout)
AssertionError: b'comments: Thank you for using MAT !' not found in b"[-] ./tests/data/dirty.ogg's format (None) is not supported\n"
======================================================================
FAIL: test_tar (test_corrupted_files.TestCorruptedFiles.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 320, in test_tar
    with self.assertRaises(ValueError):
AssertionError: ValueError not raised
======================================================================
FAIL: test_zip (test_corrupted_files.TestCorruptedFiles.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 242, in test_zip
    with self.assertRaises(ValueError):
AssertionError: ValueError not raised
======================================================================
FAIL: test_wmv (test_libmat2.TestGetMeta.test_wmv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 206, in test_wmv
    self.assertEqual(mimetype, 'video/x-ms-wmv')
AssertionError: None != 'video/x-ms-wmv'
----------------------------------------------------------------------
Ran 125 tests in 97.346s
FAILED (failures=7, errors=10)
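The ffmpeg errors logged for clean.avi and --output.avi don't say why ffmpeg exited with status 1, because mat2 passes -loglevel panic. A minimal Python sketch for looking into that, assuming it is run from the root of the mat2 checkout with ffmpeg in PATH: it copies the logged argument list verbatim, swaps the log level for "error", and reruns ffmpeg to capture its stderr. LOGGED_ARGS, with_loglevel and rerun are hypothetical helpers for debugging, not part of libmat2.

```python
import shutil
import subprocess

# Hypothetical: the ffmpeg arguments mat2 logged for ./tests/data/clean.avi,
# copied from test-results.txt (everything after the ffmpeg path).
LOGGED_ARGS = [
    "-i", "./tests/data/clean.avi", "-y", "-map", "0", "-codec", "copy",
    "-loglevel", "panic", "-hide_banner", "-map_metadata", "-1",
    "-map_chapters", "-1", "-disposition", "0", "-fflags", "+bitexact",
    "-flags:v", "+bitexact", "-flags:a", "+bitexact",
    "./tests/data/clean.cleaned.avi",
]


def with_loglevel(args, level="error"):
    """Return a copy of args with the value after -loglevel replaced.

    "panic" hides almost everything; "error" (or "debug") lets ffmpeg
    report why it failed. The input list is left unchanged.
    """
    out = list(args)
    idx = out.index("-loglevel")
    out[idx + 1] = level
    return out


def rerun(args, ffmpeg="ffmpeg"):
    """Run ffmpeg with args and return (exit status, stderr text)."""
    exe = shutil.which(ffmpeg)
    if exe is None:
        raise FileNotFoundError(f"{ffmpeg} not found in PATH")
    proc = subprocess.run([exe, *args], capture_output=True, text=True)
    return proc.returncode, proc.stderr
```

For example, code, err = rerun(with_loglevel(LOGGED_ARGS)) returns ffmpeg's exit status and whatever it printed to stderr. Because of -y, a successful run overwrites ./tests/data/clean.cleaned.avi, just as mat2's own invocation would.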
