Your test program works in artful with Python 3.6 for me. I guess something got updated to fix it, but I am not going to dig into why unless you really want me to...
** Changed in: python3.5 (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to file in Ubuntu.
https://bugs.launchpad.net/bugs/1677244

Title:
  "UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in
  position 69: surrogates not allowed" with mime.file() on path from
  os.walk

Status in file package in Ubuntu:
  New
Status in python3.5 package in Ubuntu:
  Fix Released

Bug description:
  The following script works fine on 16.04 LTS:

  #!/usr/bin/python3

  import magic
  import os

  dir = "/usr/share/ca-certificates/mozilla"
  mime = magic.open(magic.MAGIC_MIME)
  mime.load()
  for root, dirnames, filenames in os.walk(dir):
      for f in filenames:
          fn = os.path.join(root, f)
          print("%s: %s" % (fn, mime.file(fn)))

  Eg:
  $ python3 /tmp/test.py
  /usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt: text/plain; charset=us-ascii
  ...
  (notice the last filename before the ellipsis)

  But on 17.04, this happens:

  $ python3 /tmp/test.py
  /usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
  /usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
  Traceback (most recent call last):
    File "/home/ubuntu/test.py", line 15, in <module>
      print("%s: %s" % (fn, mime.file(fn)))
    File "/usr/lib/python3/dist-packages/magic.py", line 130, in file
      bi = bytes(filename, 'utf-8')
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 69: surrogates not allowed

  I'm guessing this is a change in python3 that python3-magic hasn't
  accounted for, but I'm not sure. Adding python3 task just in case.
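For reference, the failure in the traceback can be reproduced without libmagic. The sketch below is only an illustration of the mechanism, not a diagnosis of what changed between releases. It assumes the interpreter decoded the file name as ASCII with the surrogateescape error handler (as happens under a C/POSIX locale on these Python versions), which is one way a path from os.walk ends up containing '\udcc4'. The helper names to_bytes_strict and to_bytes_roundtrip are hypothetical and are not part of magic.py.

```python
# Directory and file name taken from the bug report.
DIRECTORY = "/usr/share/ca-certificates/mozilla/"
FILENAME = "EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt"

# The bytes stored on disk for this path, assuming a UTF-8 encoded name.
raw_path = (DIRECTORY + FILENAME).encode("utf-8")

# Assumption: the filesystem encoding is ASCII, so every byte >= 0x80 is
# mapped to a lone surrogate (0xC4 becomes '\udcc4') by surrogateescape.
path = raw_path.decode("ascii", errors="surrogateescape")


def to_bytes_strict(filename):
    """Mirror the failing line in the traceback: strict UTF-8 encoding."""
    return bytes(filename, "utf-8")


def to_bytes_roundtrip(filename):
    """Encode with surrogateescape so the escaped bytes come back unchanged."""
    return filename.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    try:
        to_bytes_strict(path)
    except UnicodeEncodeError as exc:
        print("strict encoding failed:", exc)
    print("round trip intact:", to_bytes_roundtrip(path) == raw_path)
```

On POSIX systems, os.fsencode() applies the same surrogateescape handler using the interpreter's filesystem encoding, so it is the usual way to turn a path obtained from os.walk back into the original bytes.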