Re: Post request and encoding

2020-11-03 Thread Hernán De Angelis
I see. Should be "encoding". Thanks. /H. On 2020-11-03 19:30, Dieter Maurer wrote: Hernán De Angelis wrote at 2020-11-2 10:06 +0100: ... My request has the form: header = {'Content-type':'application/xml', 'charset':'utf-8'} Not your probl

Re: Post request and encoding

2020-11-03 Thread Dieter Maurer
Hernán De Angelis wrote at 2020-11-2 10:06 +0100: > ... >My request has the form: > >header = {'Content-type':'application/xml', 'charset':'utf-8'} Not your problem (which you have already resolved) but: `charset` is not an individual header but a parameter for the `Content-Type` header. For `xml`
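A minimal sketch of the point above, assuming a header dictionary like the one in the original post: `charset` is written as a parameter inside the `Content-Type` value rather than as a separate key. The standard-library `email.message.Message` class is used here only to show how such a value is parsed; the `headers` dict itself is illustrative.

```python
from email.message import Message

# Illustrative header dict: charset is a parameter of Content-Type,
# not a header of its own.
headers = {"Content-Type": "application/xml; charset=utf-8"}

# Parse the value with the stdlib to see how it is interpreted.
msg = Message()
msg["Content-Type"] = headers["Content-Type"]
content_type = msg.get_content_type()   # media type without parameters
charset = msg.get_param("charset")      # the charset parameter
```

A dict of this shape could then be passed as `headers=` in the `requests.post(...)` call quoted in the thread.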

Re: Post request and encoding

2020-11-02 Thread Grant Edwards
On 2020-11-02, Ethan Furman wrote: > On 11/2/20 9:32 AM, Karsten Hilbert wrote: > >> because .encode() does not operate in-place. > > Yeah, none of the string operations do, and it's embarrassing how > many times that still bites me. :-/ I've been writing Python for a little over 20 years. In an

Re: Post request and encoding

2020-11-02 Thread Variable Starlight
Thanks, I have now learned the lesson. 👍 On Mon, 2 Nov 2020 at 18:58, Ethan Furman wrote: > On 11/2/20 9:32 AM, Karsten Hilbert wrote: > > > because .encode() does not operate in-place. > > Yeah, none of the string operations do, and it's embarrassing how many > times that still bites me. :-/ > > -- > ~

Re: Post request and encoding

2020-11-02 Thread Variable Starlight
No worries ☺ On Mon, 2 Nov 2020 at 19:05, Karsten Hilbert wrote: > On Mon, Nov 02, 2020 at 06:43:20PM +0100, Hernán De Angelis wrote: > > > I see, my mistake was (tacitly) assuming that encode() could work in > place. > > > > Now I see that it should work in a previous line as you wrote. > > > > Tha

Re: Post request and encoding

2020-11-02 Thread Karsten Hilbert
On Mon, Nov 02, 2020 at 06:43:20PM +0100, Hernán De Angelis wrote: > I see, my mistake was (tacitly) assuming that encode() could work in place. > > Now I see that it should work in a previous line as you wrote. > > Thank you! Sure, and excuse my perhaps slightly terse tone in that earlier mail .

Re: Post request and encoding

2020-11-02 Thread Ethan Furman
On 11/2/20 9:32 AM, Karsten Hilbert wrote: because .encode() does not operate in-place. Yeah, none of the string operations do, and it's embarrassing how many times that still bites me. :-/ -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
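A short sketch of what "does not operate in-place" means in practice. The `request` string is a hypothetical stand-in for the XML payload from the original post; the key point is that `str.encode()` returns a new `bytes` object and leaves the string unchanged, so the result has to be assigned or passed on.

```python
# Hypothetical payload containing the Swedish characters å, ö, ä.
request = "<q>\u00e5\u00f6\u00e4</q>"

request.encode("utf-8")                # result is discarded; request is untouched
still_text = isinstance(request, str)

body = request.encode("utf-8")         # keep the returned bytes instead
```

Either `body` or the inline expression `request.encode("utf-8")` can be handed to the call that sends the data; both carry the same bytes.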

Re: Post request and encoding

2020-11-02 Thread Hernán De Angelis
I see, my mistake was (tacitly) assuming that encode() could work in place. Now I see that it should work in a previous line as you wrote. Thank you! /H. On 2020-11-02 18:32, Karsten Hilbert wrote: On Mon, Nov 02, 2020 at 06:21:15PM +0100, Hernán De Angelis wrote: For the record: Just repl

Re: Post request and encoding

2020-11-02 Thread Karsten Hilbert
On Mon, Nov 02, 2020 at 06:21:15PM +0100, Hernán De Angelis wrote: For the record: > Just reply to myself and whoever might find this useful. > > encode() must be done within the request call: Nope (but it can, as you showed). > header = {'Content-type':'application/xml', 'charset':'UTF-8'} > r

Re: Post request and encoding

2020-11-02 Thread Chris Angelico
On Tue, Nov 3, 2020 at 4:22 AM Hernán De Angelis wrote: > > Just reply to myself and whoever might find this useful. > > encode() must be done within the request call: > > header = {'Content-type':'application/xml', 'charset':'UTF-8'} > response = requests.post(server, data=request.encode('utf-8')

Re: Post request and encoding

2020-11-02 Thread Hernán De Angelis
Just reply to myself and whoever might find this useful. encode() must be done within the request call: header = {'Content-type':'application/xml', 'charset':'UTF-8'} response = requests.post(server, data=request.encode('utf-8'), headers=header) not in a previous separate line as I did. Now

Post request and encoding

2020-11-02 Thread Hernán De Angelis
Hi everyone, I am writing a program that sends a post request to a server. The post request may include keywords with Swedish characters (åöä). I noticed that requests that include strings without those characters return a useful expected response. On the other hand, posts including those ch

Re: Another 2 to 3 mail encoding problem

2020-08-31 Thread Peter J. Holzer
On 2020-08-27 09:34:47 +0100, Chris Green wrote: > Peter J. Holzer wrote: > > The problem is that the message contains a '\ufeff' character (byte > > order mark) where email/generator.py expects only ASCII characters. > > > > I see two possible reasons for this: [...] > > Both reasons are weird.

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread MRAB
On 2020-08-27 17:29, Barry Scott wrote: On 26 Aug 2020, at 16:10, Chris Green wrote: UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 4: ordinal not in range(128) So what do I need to do to the message I'm adding with mbx.add(msg) to fix this? (I assume that'

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Barry Scott
> On 26 Aug 2020, at 16:10, Chris Green wrote: > > UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in > position 4: ordinal not in range(128) > > So what do I need to do to the message I'm adding with mbx.add(msg) to > fix this? (I assume that's what I need to do). >>> i

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Barry
> On 27 Aug 2020, at 10:40, Chris Green wrote: > > Karsten Hilbert wrote: >>> Terry Reedy wrote: > On 8/26/2020 11:10 AM, Chris Green wrote: > >> I have a simple[ish] local mbox mail delivery module as follows:- > ... >> It has run faultlessly for many years under Python

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
ither as raw 8 > bit bytes on systems that are 8-bit clean, or for systems that are not, > they will need to be encoded either as base-64 or using quote-printable > encoding. These characters are to interpreted in the character set > defined (or presumed) in the header, or even some other

Aw: Re: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> > > Because of this, the Python 3 str type is not suitable to store an email > > > message, since it insists on the string being Unicode encoded, > > > > I should greatly appreciate to be enlightened as to what > > a "string being Unicode encoded" is intended to say ? > > > > A Python 3 "str" or

Re: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Angelico
On Thu, Aug 27, 2020 at 11:10 PM Karsten Hilbert wrote: > > > Because of this, the Python 3 str type is not suitable to store an email > > message, since it insists on the string being Unicode encoded, > > I should greatly appreciate to be enlightened as to what > a "string being Unicode encoded"

Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Because of this, the Python 3 str type is not suitable to store an email > message, since it insists on the string being Unicode encoded, I should greatly appreciate to be enlightened as to what a "string being Unicode encoded" is intended to say ? Thanks, Karsten -- https://mail.python.org/ma

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Richard Damon
s that are not, they will need to be encoded either as base-64 or using quote-printable encoding. These characters are to interpreted in the character set defined (or presumed) in the header, or even some other binary object like and image or executable if the content type isn't text. Because

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Karsten Hilbert wrote: > > Terry Reedy wrote: > > > On 8/26/2020 11:10 AM, Chris Green wrote: > > > > > > > I have a simple[ish] local mbox mail delivery module as follows:- > > > ... > > > > It has run faultlessly for many years under Python 2. I've now > > > > changed the calling program to Py

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Cameron Simpson
I even get to see messages in other >scripts such as arabic, chinese, etc. See: https://docs.python.org/3/library/email.generator.html#module-email.generator While is conservatively writes ASCII (and email has extensive support for encoding other character sets into ASCII), you might profi

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Peter Otten
Chris Green wrote: > To add a little to this, the problem is definitely when I receive a > message with UTF8 (or at least non-ascci) characters in it. My code > is basically very simple, the main program reads an E-Mail message > received from .forward on its standard input and makes it into an m

Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Terry Reedy wrote: > > On 8/26/2020 11:10 AM, Chris Green wrote: > > > > > I have a simple[ish] local mbox mail delivery module as follows:- > > ... > > > It has run faultlessly for many years under Python 2. I've now > > > changed the calling program to Python 3 and while it handles most > > >

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
on-ascii >characters are QP or base64 encoded, and some higher layer uses 8bit >instead. > > * A mime-part is declared as charset=us-ascii but contains really >Unicode characters. > > Both reasons are weird. > > The first would be an unreasonable assumpti

Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Chris Green
Terry Reedy wrote: > On 8/26/2020 11:10 AM, Chris Green wrote: > > > I have a simple[ish] local mbox mail delivery module as follows:- > ... > > It has run faultlessly for many years under Python 2. I've now > > changed the calling program to Python 3 and while it handles most > > E-Mail OK I ha

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Terry Reedy
On 8/26/2020 11:10 AM, Chris Green wrote: I have a simple[ish] local mbox mail delivery module as follows:- ... It has run faultlessly for many years under Python 2. I've now changed the calling program to Python 3 and while it handles most E-Mail OK I have just got the following error:-

Unsubscrip (Re: Another 2 to 3 mail encoding problem)

2020-08-26 Thread Terry Reedy
On 8/26/2020 11:27 AM, Alexa Oña wrote: Don’t send me more emails -- https://mail.python.org/mailman/listinfo/python-list Unsubscribe yourself by going to the indicated url. -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Michael Torrie
On 8/26/20 9:27 AM, Alexa Oña wrote: > Don’t send me more emails > > https://mail.python.org/mailman/listinfo/python-list ^ Please unsubscribe from the mailing list. Click on the link above. Thank you. -- https://mail.python.org/mailman/listinfo/pyth

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Peter J. Holzer
code character '\ufeff' in > position 4: ordinal not in range(128) The problem is that the message contains a '\ufeff' character (byte order mark) where email/generator.py expects only ASCII characters. I see two possible reasons for this: * The mbox writing code assumes that
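The error message quoted in this thread can be reproduced in isolation. This is only a sketch with a made-up string, not the poster's mail data: it places a byte order mark at index 4 and shows that the ASCII codec rejects it. Removing the character afterwards is an assumption about one possible workaround, not something the thread prescribes.

```python
# Made-up text with U+FEFF (byte order mark) at index 4.
text = "abcd\ufeffefg"

error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc                      # 'ascii' codec can't encode '\ufeff'

# One possible workaround (assumption): drop the BOM before encoding.
cleaned = text.replace("\ufeff", "")
ascii_bytes = cleaned.encode("ascii")
```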

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Python
Alexa Oña wrote: Don’t send me more emails Get Outlook for iOS You are the one spamming the mailing list with unrelated posts. STOP. -- https://mail.python.org/mailman/listinfo/python-list

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Chris Green
To add a little to this, the problem is definitely when I receive a message with UTF8 (or at least non-ascci) characters in it. My code is basically very simple, the main program reads an E-Mail message received from .forward on its standard input and makes it into an mbox message as follows:-

Re: Another 2 to 3 mail encoding problem

2020-08-26 Thread Alexa Oña
Don’t send me more emails Get Outlook for iOS<https://aka.ms/o0ukef> From: Python-list on behalf of Chris Green Sent: Wednesday, August 26, 2020 5:10:35 PM To: python-list@python.org Subject: Another 2 to 3 mail encoding problem I'm unear

Another 2 to 3 mail encoding problem

2020-08-26 Thread Chris Green
I'm unearthing a few issues here trying to convert my mail filter and delivery programs from 2 to 3! I have a simple[ish] local mbox mail delivery module as follows:- import mailbox import logging import logging.handlers import os import time # # # Class derived

Re: Reversible malformed UTF-8 to malformed UTF-16 encoding

2019-03-19 Thread Florian Weimer
* MRAB: > On 2019-03-19 20:32, Florian Weimer wrote: >> I've seen occasional proposals like this one coming up: >> >> | I therefore suggested 1999-11-02 on the unic...@unicode.org mailing >> | list the following approach. Instead of using U+FFFD, simply encode >> | malformed UTF-8 sequences as ma

Re: Reversible malformed UTF-8 to malformed UTF-16 encoding

2019-03-19 Thread MRAB
emerge. <http://hyperreal.org/~est/utf-8b/releases/utf-8b-20060413043934/kuhn-utf-8b.html> Has this ever been implemented in any Python version? I seem to remember something like that, but all I could find was me talking about this in 2000. It's not entirely clear whether this is a good id

Reversible malformed UTF-8 to malformed UTF-16 encoding

2019-03-19 Thread Florian Weimer
8b/releases/utf-8b-20060413043934/kuhn-utf-8b.html> Has this ever been implemented in any Python version? I seem to remember something like that, but all I could find was me talking about this in 2000. It's not entirely clear whether this is a good idea as the default encoding for security reason

Re: Convert a list with wrong encoding to utf8

2019-02-15 Thread Piet van Oostrum
vergos.niko...@gmail.com writes: > On Thursday, 14 February 2019 at 8:56:31 PM UTC+2, MRAB wrote: > >> It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode >> string that was decoded wrongly from the bytes. > > Yes it doesnt have the 'b' prefix so that hexadecimal

Re: Convert a list with wrong encoding to utf8

2019-02-15 Thread Gregory Ewing
ing fetced as utf8, right? No, I don't think so. As far as I can tell from a brief reading of the MySQL docs, that only sets the *connection* encoding, which is concerned with transferring data over the connection between the client and the server. It has no bearing on the encoding used to d

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Gregory Ewing
se encoded as utf-8, but some of it is actually using a different encoding, and you're getting confused by the resulting inconsistencies. I suggest you look carefully at *all* the names in the list, straight after getting them from the database. If some of them look okay and some of them look like

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Michael Torrie
latin1') step. >> >> encode() is something that turns text into bytes >> decode() is something that turns bytes into text >> >> So, if you already have bytes and you need text, you should only want to be >> doing a decode() and you just need to specific the co

RE: Convert a list with wrong encoding to utf8

2019-02-14 Thread David Raymond
tting the module deal with encodings, you gave it the raw UTF-8 encoding, and the module or db server said "let me encode that into the field or database defined default of latin-1 for you"... or something like that. -Original Message- From: Python-list [mailto:pyth

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
I'm using Python3 and pymysql and already have charset presnt [python] con = pymysql.connect( db = 'clientele', user = 'vergos', passwd = '**', charset = 'utf8' ) cur = con.cursor() [/python] From that i understand that the names being fetched from the db to pyhton script are being fetced a

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
On Thursday, 14 February 2019 at 8:56:31 PM UTC+2, MRAB wrote: > It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode > string that was decoded wrongly from the bytes. Yes it doesnt have the 'b' prefix so that hexadecimal are representation of strings and not rep

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Νίκος Βέργος
in1') step. > > > > > > encode() is something that turns text into bytes > > > decode() is something that turns bytes into text > > > > > > So, if you already have bytes and you need text, you should only want > to be > > > doing a

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
> > decode() is something that turns bytes into text > > > > > > So, if you already have bytes and you need text, you should only want to > > > be > > > doing a decode() and you just need to specific the correct encoding. > > > > I Agree but I don't

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Igor Korot
'latin1') step. > > > > encode() is something that turns text into bytes > > decode() is something that turns bytes into text > > > > So, if you already have bytes and you need text, you should only want to be > > doing a decode() and you just need to specifi

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
is something that turns bytes into text > > So, if you already have bytes and you need text, you should only want to be > doing a decode() and you just need to specific the correct encoding. I Agree but I don't know in what encoding the string is encoded into. I just tried names =

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread MRAB
ould only want to be doing a decode() and you just need to specific the correct encoding. It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode string that was decoded wrongly from the bytes. On Thu, Feb 14, 2019 at 12:15 PM wrote: Τη Πέμπτη, 14 Φεβρουα

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Calvin Spealman
you just need the decode('utf8') part and *not* the encode('latin1') step. encode() is something that turns text into bytes decode() is something that turns bytes into text So, if you already have bytes and you need text, you should only want to be doing a decode() and you j
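The direction of `encode()` and `decode()` can be sketched as follows. The Greek name is a made-up sample, not data from the poster's database. The last two lines cover the other case raised in the thread, a `str` that was already decoded with the wrong codec; re-encoding with `latin1` to recover the bytes is a common repair, shown here as an assumption about what went wrong rather than a diagnosis.

```python
# Sample text (hypothetical): the Greek name "Nikos" in Greek letters.
name = "\u039d\u03af\u03ba\u03bf\u03c2"

raw = name.encode("utf-8")       # encode(): text -> bytes
text = raw.decode("utf-8")       # decode(): bytes -> text, same codec

# If UTF-8 bytes were decoded as latin1, the result is garbled text...
garbled = raw.decode("latin1")
# ...which can be undone by reversing that exact step, then decoding correctly.
repaired = garbled.encode("latin1").decode("utf-8")
```

If the value is already `bytes`, only the single `decode("utf-8")` step is needed, as the message above says.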

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
On Thursday, 14 February 2019 at 6:45:29 PM UTC+2, Calvin Spealman wrote: > You can only decode FROM the same encoding you've encoded TO. Any decoding > must know the input it receives follows the rules of its encoding scheme. > latin1 is not utf8. > > Howeve

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Calvin Spealman
You can only decode FROM the same encoding you've encoded TO. Any decoding must know the input it receives follows the rules of its encoding scheme. latin1 is not utf8. However, in your case, you aren't seeing problem with the decoding. That step is never reached. It is failing to

Re: Convert a list with wrong encoding to utf8

2019-02-14 Thread Chris Angelico
On Fri, Feb 15, 2019 at 3:41 AM wrote: > > Hello, i have tried the following to chnage encoding to utf8 because for some > reason it has changed regarding list names > > [python] > #populate client listing into list >

Convert a list with wrong encoding to utf8

2019-02-14 Thread vergos . nikolas
Hello, i have tried the following to chnage encoding to utf8 because for some reason it has changed regarding list names [python] #populate client listing into list names.append( name ) names.append( '' ) names.sort() for nam

Re: bytes() or .encode() for encoding str's as bytes?

2018-08-31 Thread Cameron Simpson
) I always use the former. I wonder why that is. I guess the aesthetic rule is something along the lines: use a dot if you can. I also use the former. You could give any class an encode method; the latter works only for strings. Of course, an encode method accepting a Unicode encoding n

Re: bytes() or .encode() for encoding str's as bytes?

2018-08-31 Thread Marko Rauhamaa
Malcolm Greene : > Is there a benefit to using one of these techniques over the other? > Is one approach more Pythonic and preferred over the other for > style reasons? > message = message.encode('UTF-8') > message = bytes(message, 'UTF-8') I always use the former. I wonder why that is. I guess th

bytes() or .encode() for encoding str's as bytes?

2018-08-31 Thread Malcolm Greene
Is there a benefit to using one of these techniques over the other? Is one approach more Pythonic and preferred over the other for style reasons? message = message.encode('UTF-8') message = bytes(message, 'UTF-8') Thank you, Malcolm -- https://mail.python.org/mailman/listinfo/python-list
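For a `str` input the two spellings in the question produce the same bytes, which a quick sketch can confirm. The sample string is arbitrary.

```python
message = "caf\u00e9"   # arbitrary sample text with one non-ASCII character

via_method = message.encode("UTF-8")        # method on the str object
via_constructor = bytes(message, "UTF-8")   # bytes constructor with an encoding
```

The practical difference is scope: `bytes(...)` with an encoding argument accepts only `str`, whereas any class may define its own `encode` method.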

Re: String encoding in Py2.7

2018-05-29 Thread Chris Angelico
> Using Python 2.7 (will switch to Py3 soon but Before I'd like to >>>> understand how string encoding worked) >>> >>> Oh dear. This is probably the exact wrong way to go about it: the >>> interplay between string encoding, unicode and bytes is much

Re: String encoding in Py2.7

2018-05-29 Thread Steven D'Aprano
On Tue, 29 May 2018 09:19:52 +, Fabien LUCE wrote: > May 29 2018 11:12 AM, "Thomas Jollans" wrote: >> On 2018-05-29 09:55, f...@lutix.org wrote: >> >>> Hello, >>> Using Python 2.7 (will switch to Py3 soon but Before I'd like to >>> un

Re: String encoding in Py2.7

2018-05-29 Thread Chris Angelico
On Tue, May 29, 2018 at 5:55 PM, wrote: > Hello, > Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand > how string encoding worked) > Could you please tell me is I understood well what occurs in Python's mind: > in a .py file: > if I wri

Re: String encoding in Py2.7

2018-05-29 Thread Fabien LUCE
May 29 2018 11:12 AM, "Thomas Jollans" wrote: > On 2018-05-29 09:55, f...@lutix.org wrote: > >> Hello, >> Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand >> how string encoding >> worked) > > Oh dear. This

Re: String encoding in Py2.7

2018-05-29 Thread Thomas Jollans
On 2018-05-29 09:55, f...@lutix.org wrote: > Hello, > Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand > how string encoding worked) Oh dear. This is probably the exact wrong way to go about it: the interplay between string encoding, unicode and byte

String encoding in Py2.7

2018-05-29 Thread ftg
Hello, Using Python 2.7 (will switch to Py3 soon but Before I'd like to understand how string encoding worked) Could you please tell me is I understood well what occurs in Python's mind: in a .py file: if I write s="héhéhé", if my file is declared as unicode coding, python wil

rot13 as I/O encoding (was Re: Python-list Digest, Vol 171, Issue 7)

2017-12-06 Thread Chris Angelico
formation. >>>> vzcbeg fgngvfgvpf >>>> vzcbeg enaqbz >>>> aEbyyf = 50 >>>> ebyyf = [enaqbz.enaqvag(1, 6) sbe wax va enatr(aEbyyf)] >>>> fgngvfgvpf.zrna(ebyyf) > 3.5 > Curiously, that doesn't seem to affect everything. >>

Re: JSON encoding PDF or Excel files in Python 2.7

2017-07-21 Thread Skip Montanaro
> JSON supports floats, ints, (Unicode) strings, lists and dicts (with string > keys). It doesn't support bytestrings (raw bytes). Thanks, MRAB and Irmen. It looks like bson does what I need. Skip -- https://mail.python.org/mailman/listinfo/python-list
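The thread settles on bson. As an alternative that is not from the thread, raw bytes can also be carried inside ordinary JSON by base64-encoding them first. The sketch below is written for Python 3 (the question concerns 2.7), and the byte string and `filename` key are placeholders.

```python
import base64
import json

# Placeholder for the contents of a binary file read with mode "rb".
raw = b"%PDF-1.4\x00\xff\xfe binary data"

# base64 turns arbitrary bytes into ASCII text that JSON can hold.
payload = json.dumps({
    "filename": "somefile.pdf",
    "data": base64.b64encode(raw).decode("ascii"),
})

# The receiver reverses the two steps.
restored = base64.b64decode(json.loads(payload)["data"])
```

The cost is size: base64 output is roughly a third larger than the input.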

Re: JSON encoding PDF or Excel files in Python 2.7

2017-07-21 Thread MRAB
On 2017-07-21 19:52, Skip Montanaro wrote: I would like to JSON encode some PDF and Excel files. I can read the content: pdf = open("somefile.pdf", "rb").read() but now what? json.dumps() insists on treating it as a string to be interpreted as utf-8, and bytes == str in Python 2.x. I can't jso

Re: JSON encoding PDF or Excel files in Python 2.7

2017-07-21 Thread Irmen de Jong
On 21/07/2017 20:52, Skip Montanaro wrote: > I would like to JSON encode some PDF and Excel files. I can read the content: > > pdf = open("somefile.pdf", "rb").read() > > but now what? json.dumps() insists on treating it as a string to be > interpreted as utf-8, and bytes == str in Python 2.x. I

JSON encoding PDF or Excel files in Python 2.7

2017-07-21 Thread Skip Montanaro
I would like to JSON encode some PDF and Excel files. I can read the content: pdf = open("somefile.pdf", "rb").read() but now what? json.dumps() insists on treating it as a string to be interpreted as utf-8, and bytes == str in Python 2.x. I can't json.dumps() a bytearray. I can pickle the raw c

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
is is a BAD idea, and doing it by "reflex" without very careful thought is > just cargo-cult programming. You should not thoughtlessly change the > default encoding without knowing what you are doing -- and if you know what > you are doing, you won't change it at all. > > The Pyth

Re: UTF-8 Encoding Error

2016-12-29 Thread Steve D'Aprano
lex of mine, whenever I encounter Python 2 Unicode > errors: > > import sys > reload(sys) > sys.setdefaultencoding('utf8') This is a BAD idea, and doing it by "reflex" without very careful thought is just cargo-cult programming. You should not thoughtlessly chan

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
On Friday, December 30, 2016 at 3:35:56 AM UTC+5:30, subhaba...@gmail.com wrote: > On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote: > > Try utf-8-sig > > On 25 Dec 2016 2:57 AM, "Grady Martin" <> wrote: > > > > > On 2016-12-22 22:38, wrote: > > > > > >> I am getting the

Re: UTF-8 Encoding Error

2016-12-29 Thread subhabangalore
On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote: > Try utf-8-sig > On 25 Dec 2016 2:57 AM, "Grady Martin" <> wrote: > > > On 2016-12-22 22:38, wrote: > > > >> I am getting the error: > >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: > >> inval

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Steve D'Aprano
On Wed, 28 Dec 2016 02:05 am, Skip Montanaro wrote: > I am trying to parse some XML which doesn't specify an encoding (Python > 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII > data. No great surprise there, but I'm having trouble getting it to use

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Peter Otten wrote: > works, but to go back to the bytes that the XML parser needs the > "preferred encoding", in your case ASCII, will be used. Correction: it's probably sys.getdefaultencoding() rather than locale.getdefaultencoding(). So all systems with a sane configura

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
.tree.ElementTree.parse(f).getroot() > > Thanks, that worked. Would appreciate an explanation of why binary > mode was necessary. It would seem that since the file contents are > text, just in a non-ASCII encoding, that specifying the encoding when > opening the file should do th

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Skip Montanaro
ssary. It would seem that since the file contents are text, just in a non-ASCII encoding, that specifying the encoding when opening the file should do the trick. Skip -- https://mail.python.org/mailman/listinfo/python-list

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Skip Montanaro wrote: > I am trying to parse some XML which doesn't specify an encoding (Python > 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII > data. No great surprise there, but I'm having trouble getting it to use > another encoding. First,

encoding="utf8" ignored when parsing XML

2016-12-27 Thread Skip Montanaro
I am trying to parse some XML which doesn't specify an encoding (Python 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII data. No great surprise there, but I'm having trouble getting it to use another encoding. First, I tried specifying the encoding when o

Re: UTF-8 Encoding Error

2016-12-25 Thread Gonzalo V
Try utf-8-sig On 25 Dec 2016 2:57 AM, "Grady Martin" wrote: > On 2016-12-22 22:38, subhabangal...@gmail.com wrote: > >> I am getting the error: >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: >> invalid start byte >> > > The following is a reflex of mine, whenever

Re: UTF-8 Encoding Error

2016-12-24 Thread Grady Martin
On 2016-12-22 22:38, subhabangal...@gmail.com wrote: I am getting the error: UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte The following is a reflex of mine, whenever I encounter Python 2 Unicode errors: import sys reload(sys) sys.setdefaultencod

Re: UTF-8 Encoding Error

2016-12-22 Thread Cameron Simpson
nothing is of much help. It would help to see a very small program that produces your error message. Generally you need to open text files in the same encoding used for thei text. Which sounds obvious, but I'm presuming you've not done that. Normally, when you open a file you can spe

UTF-8 Encoding Error

2016-12-22 Thread subhabangalore
I am getting the error: UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15: invalid start byte as I try to read some files through TaggedCorpusReader. TaggedCorpusReader is a module of NLTK. My files are saved in ANSI format in MS-Windows default. I am using Python2.7 on MS-
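A sketch of why byte 0x96 trips the UTF-8 codec, assuming the "ANSI" files are in Windows code page 1252 (the post does not name the code page, so `cp1252` is a guess). In cp1252 that byte is an en dash; in UTF-8 it is a continuation byte that cannot start a character. The sample bytes are invented.

```python
# Invented sample: ASCII text with a cp1252 en dash (0x96) at index 8.
raw = b"price 10\x9620"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc                  # 0x96 is not a valid UTF-8 start byte

# Decoding with the codec the file was (presumably) written in succeeds.
text = raw.decode("cp1252")
```

The same idea applies when opening files: pass the encoding the file was saved in to the reader instead of relying on UTF-8.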

Re: filesystem encoding 'strict' on Windows

2016-09-30 Thread eryk sun
On Fri, Sep 30, 2016 at 5:58 AM, iMath wrote: > the doc of os.fsencode(filename) says Encode filename to the filesystem > encoding 'strict' > on Windows, what does 'strict' mean ? "strict" is the error handler for the encoding. It raises a UnicodeE
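The effect of the `strict` error handler can be seen with a plain `str.encode()` call. This sketch uses the ASCII codec purely to force a failure; `os.fsencode()` uses the filesystem encoding instead, and the filename is hypothetical.

```python
name = "caf\u00e9.txt"   # hypothetical filename with one non-ASCII character

# "strict" (the default handler) raises instead of guessing.
try:
    name.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Other handlers substitute something instead of raising.
replaced = name.encode("ascii", "replace")
```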

filesystem encoding 'strict' on Windows

2016-09-29 Thread iMath
the doc of os.fsencode(filename) says Encode filename to the filesystem encoding 'strict' on Windows, what does 'strict' mean ? -- https://mail.python.org/mailman/listinfo/python-list

Python grammar extension via encoding (pyxl style) questions

2016-09-29 Thread Pavel Velikhov
Hi everybody! We’re building an experimental extension to Python, we’re extending Python’s comprehensions into a full-scale query language. And we’d love to use the trick that was done in pyxl, where a special encoding of the file will trigger the preprocessor to run and compile our

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Malcolm Greene
> You could use win_unicode_console enabled in sitecustomize or usercustomize. > https://pypi.python.org/pypi/win_unicode_console The pypi link you shared has an excellent summary of the issues associated when working Unicode from the Windows terminal. Thank you Eryk. Malcolm -- https://mail.py

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread eryk sun
On Wed, Aug 3, 2016 at 3:35 PM, Malcolm Greene wrote: > But under Windows, the stdout is workstation specific and *never* UTF-8. So > the > occasional non-ASCII string trips up our diagnostic output when tested under > Windows. You could use win_unicode_console enabled in sitecustomize or usercu

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Malcolm Greene
Chris, > Don't forget that the print function can simply be shadowed. I did forget! Another excellent option. Thank you! Malcolm -- https://mail.python.org/mailman/listinfo/python-list

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Chris Angelico
On Thu, Aug 4, 2016 at 1:35 AM, Malcolm Greene wrote: > My use case was diagnostic output being (temporarily) output to stdout > via debug related print statements. The output is for debugging only and > not meant for production. I was looking for a solution that would allow > me to output to the

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Malcolm Greene
as few changes to the original scripts as possible, eg. non-invasive except for the print statements themselves. When debugging under Linux/OSX, standard print statements work fine because their stdouts' encoding is UTF-8. But under Windows, the stdout is workstation specific and *never* UTF-

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Random832
On Wed, Aug 3, 2016, at 11:09, Peter Otten wrote: > I'm unsure about this myself -- wouldn't it be better to detach the > underlying raw stream? Like Well, "better" depends on your point of view. > The ValueError raised if you try to write to the original stdout > looks like a feature to me. Ma

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Peter Otten
Random832 wrote: > On Wed, Aug 3, 2016, at 08:29, Malcolm Greene wrote: >> Looking for a way to use the Python 3 print() function with encoding and >> errors parameters. >> Are there any concerns with closing and re-opening sys.stdout so >> sys.stdout has a specific e

Re: print() function with encoding= and errors= parameters?

2016-08-03 Thread Random832
On Wed, Aug 3, 2016, at 08:29, Malcolm Greene wrote: > Looking for a way to use the Python 3 print() function with encoding and > errors parameters. > > Are there any concerns with closing and re-opening sys.stdout so > sys.stdout has a specific encoding and errors behavior? W

print() function with encoding= and errors= parameters?

2016-08-03 Thread Malcolm Greene
Looking for a way to use the Python 3 print() function with encoding and errors parameters. Are there any concerns with closing and re-opening sys.stdout so sys.stdout has a specific encoding and errors behavior? Would this break other standard libraries that depend on sys.stdout being configured
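One way to get `print()` output with a chosen encoding and error handler, sketched here against an in-memory buffer so it can be run anywhere: wrap a binary stream in `io.TextIOWrapper` and pass it as `file=`. The ASCII codec and `backslashreplace` handler are arbitrary choices for the demonstration.

```python
import io

# In-memory binary stream standing in for a real output stream.
buffer = io.BytesIO()

# Text layer with an explicit encoding and error handler.
# newline="\n" keeps line endings identical on every platform.
stream = io.TextIOWrapper(buffer, encoding="ascii",
                          errors="backslashreplace", newline="\n")

print("caf\u00e9", file=stream)   # non-ASCII character is escaped, not raised
stream.flush()
output = buffer.getvalue()
```

For real console output the same wrapper can be built around `sys.stdout.buffer`; on Python 3.7 and later, `sys.stdout.reconfigure(errors="backslashreplace")` adjusts the existing stream without replacing it.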

Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Marko Rauhamaa
Random832 : > On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote: >> I think Windows also gets it almost write: NTFS uses UTF-16, and (I >> think) only allow valid Unicode file names. > > Nope. Windows allows any sequence of 16-bit units (except for a dozen or > so ASCII characters) in filename

Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Random832
On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote: > I think Windows also gets it almost write: NTFS uses UTF-16, and (I > think) only allow valid Unicode file names. Nope. Windows allows any sequence of 16-bit units (except for a dozen or so ASCII characters) in filenames. Of course, you're

Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Steven D'Aprano
epair the file name yourself, by deleting or replacing the surrogates. > But that's not nice. Once I need to compare filename with some string I'll > have to convert strings to bytes. That's not hard. 'my filename.txt'.encode('utf-8') # or whatever encodi

how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Peter Volkov
Hi, everybody. What is a best practice to deal with filenames in python3? The problem is that os.walk(src_dir), os.listdir(src_dir), ... return "surrogate" strings as filenames. It is impossible to assume that they are normal strings that could be print()'ed on unicode terminal or saved as as stri
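A sketch of where the "surrogate" strings come from and how they round-trip, assuming a UTF-8 filesystem encoding. The byte string is an invented filename containing one byte that is not valid UTF-8. Python decodes such names with the `surrogateescape` handler, which maps the bad byte to a lone surrogate; encoding with the same handler restores the original bytes, while a strict encode fails.

```python
# Invented on-disk filename with one byte (0xff) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# What os.listdir() would hand back on a UTF-8 system: the bad byte
# becomes the lone surrogate U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")          # strict encode: what a database driver may do
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

round_trip = name.encode("utf-8", "surrogateescape")   # original bytes again
printable = name.encode("utf-8", "replace").decode("utf-8")  # lossy, for display
```

Storing `round_trip` (bytes) keeps the name exact; `printable` is safe to show or save as text but can no longer be mapped back to the file.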

encoding issue help

2016-04-25 Thread 李洋
hi: i want to decompress the string "\x1F\x8B\x08\x00\x00\x00\x00\x00\x00\x00UP]k\xC3 \x14\xFD+\xC3\xE7\xCD\xA8\xF9X\xE2\xEBX\xA1\x0CF\x1F\xBA\xEE%\x10\xAC\xB1\xAD\xC4h\x88f%\x8C\xFD\xF7]\x1B\xDA\xAD\xF8\xE29\xE7z\xEE9~#\xE7\x11G\xAF\xBB\x1C=\x22\xDFv_j\x04H+@\xBAW\x1A\xEEe\x91>SF\x18+i\x9Ef\x0
