On 2022-05-07 19:47, Stefan Ram wrote:
Marco Sulla <marco.sulla.pyt...@gmail.com> writes:
Well, ok, but I need a generic method to get LF and CR for any
encoding a user can input.

   "LF" and "CR" come from US-ASCII. It is theoretically
   possible that there might be some encodings out there
   (not for Unicode) that are not based on US-ASCII and
   have no LF or no CR.
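
For encodings that do have these characters, one way to obtain their byte sequences is to encode the characters themselves. This is only a minimal sketch: the helper name `line_break_bytes` is hypothetical and not something from the thread, and it assumes the user-supplied name is a codec Python knows about.

```python
def line_break_bytes(encoding):
    """Return the (LF, CR) byte sequences for the given encoding name."""
    # str.encode raises LookupError for an unknown encoding name and
    # UnicodeEncodeError if the codec cannot represent the character.
    # Codecs that emit a byte order mark (for example plain "utf-16")
    # will include it in the result, so a BOM-free variant such as
    # "utf-16-le" is usually what you want here.
    return "\n".encode(encoding), "\r".encode(encoding)


lf, cr = line_break_bytes("utf-16-le")
```

A caller that accepts arbitrary user input would probably want to catch both exceptions and report that the encoding has no usable LF or CR.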

is good for any encoding? Furthermore, is there a way to get the
encoding of an opened file object?
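
On the second question: text-mode file objects returned by open() carry an `encoding` attribute. The sketch below creates a throwaway file just to be self-contained; the file name and contents are made up for illustration. Note that the attribute reports the encoding the file was opened with, not one detected from its contents, and that files opened in binary mode have no such attribute.

```python
import os
import tempfile

# Write and reopen a temporary file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="latin_1") as f:
        f.write("caf\u00e9\n")
    with open(path, encoding="latin_1") as f:
        # The name used when the file was opened.
        opened_with = f.encoding
```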

   I have written a function that might be able to detect one
   of a few encodings based on a heuristic algorithm.

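import pathlib

def encoding( name ):
     # Try each candidate in order and return the first one that
     # decodes the whole file without a UnicodeDecodeError.
     path = pathlib.Path( name )
     for encoding in ( "utf_8", "latin_1", "cp1252" ):
         try:
             with path.open( encoding=encoding, errors="strict" ) as file:
                 text = file.read()
             return encoding
         except UnicodeDecodeError:
             pass
     return "ascii"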

   Yes, it's potentially slow and might be wrong.
   The result "ascii" might mean it's a binary file.

"latin-1" will decode any sequence of bytes, so the function will never try "cp1252" or fall back to "ascii". Falling back to "ascii" is wrong anyway, because the file could contain bytes in the range 0x80..0xFF, which that encoding doesn't support.
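
The point about "latin-1" can be checked directly, and it suggests trying the stricter encodings first and keeping "latin-1" as the last resort. The following is only a sketch and not the function from the thread: the name `guess_encoding`, the candidate order, and the choice to work on bytes rather than a path are all assumptions made for illustration.

```python
# Every one of the 256 byte values decodes under latin_1,
# so a strict latin_1 decode can never fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin_1")


def guess_encoding(data, candidates=("ascii", "utf_8", "cp1252")):
    """Return the first candidate that strictly decodes data, else latin_1."""
    for candidate in candidates:
        try:
            data.decode(candidate, errors="strict")
            return candidate
        except UnicodeDecodeError:
            continue
    # latin_1 accepts any byte sequence, so it only makes sense as the
    # final fallback. A result of latin_1 may also mean the data is binary.
    return "latin_1"
```

It remains a heuristic: a successful decode only shows that the bytes are valid in that encoding, not that the file was written with it.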
--
https://mail.python.org/mailman/listinfo/python-list
