Re: convert string to bytes without changing data (encoding)

2012-08-30 Thread Nobody
On Wed, 29 Aug 2012 19:39:15 -0400, Piet van Oostrum wrote: Reading from stdin/a file gets you bytes, and not a string, because Python cannot automagically guess what format the input is in. Huh? Oh, it can certainly guess (in the absence of any other information, it uses the current

Re: convert string to bytes without changing data (encoding)

2012-08-29 Thread Piet van Oostrum
Ross Ridge rri...@csclub.uwaterloo.ca writes: But it is in fact only stored in one particular way, as a series of bytes. No, it can be stored in different ways. Certainly in Python 3.3 and beyond. And in 3.2 also, depending on wide/narrow build. -- Piet van Oostrum p...@vanoostrum.org WWW:

Re: convert string to bytes without changing data (encoding)

2012-08-29 Thread Piet van Oostrum
Heiko Wundram modeln...@modelnine.org writes: Reading from stdin/a file gets you bytes, and not a string, because Python cannot automagically guess what format the input is in. Huh? Python 3.3.0rc1 (v3.3.0rc1:8bb5c7bc46ba, Aug 25 2012, 10:09:29) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

Re: convert string to bytes without changing data (encoding)

2012-03-30 Thread Michael Ströder
Steven D'Aprano wrote: On Thu, 29 Mar 2012 17:36:34 +, Prasad, Ramit wrote: Technically, ASCII goes up to 256 but they are not A-z letters. Technically, ASCII is 7-bit, so it goes up to 127. No, ASCII only defines 0-127. Values >=128 are not ASCII. From

Re: convert string to bytes without changing data (encoding)

2012-03-30 Thread Serhiy Storchaka
28.03.12 21:13, Heiko Wundram wrote: Reading from stdin/a file gets you bytes, and not a string, because Python cannot automagically guess what format the input is in. In Python 3, reading from stdin gets you a string. Use sys.stdin.buffer.raw for access to the byte stream. And reading from
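A minimal sketch of the point above, assuming Python 3: a text stream such as sys.stdin decodes its input, while its .buffer attribute exposes the undecoded bytes (the post goes one layer further down with .buffer.raw). The names read_all_bytes and fake_stdin are hypothetical, and an in-memory stream stands in for the real stdin so the example runs anywhere.

```python
import io


def read_all_bytes(stream):
    """Return the stream's contents as bytes, bypassing text decoding."""
    # Text streams (like sys.stdin) expose the underlying binary stream
    # as .buffer; binary streams have no .buffer and are read directly.
    binary = getattr(stream, "buffer", stream)
    return binary.read()


# In-memory stand-in for stdin. 0xE9 is not valid UTF-8 on its own,
# so reading this stream as text would raise UnicodeDecodeError.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"caf\xe9\n"), encoding="utf-8")

data = read_all_bytes(fake_stdin)
print(data)
```

In a real script the call would be read_all_bytes(sys.stdin) after importing sys.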

Re: convert string to bytes without changing data (encoding)

2012-03-30 Thread Chris Angelico
On Sat, Mar 31, 2012 at 6:06 AM, Serhiy Storchaka storch...@gmail.com wrote: 28.03.12 21:13, Heiko Wundram wrote: Reading from stdin/a file gets you bytes, and not a string, because Python cannot automagically guess what format the input is in. In Python3 reading from stdin gets you

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Mark Lawrence
On 29/03/2012 04:58, Ross Ridge wrote: Chris Angelico ros...@gmail.com wrote: Actually, he is justified. It's one thing to work in C or assembly and write code that depends on certain bit-pattern representations of data (although even that causes trouble - assuming that

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Steven D'Aprano
On Wed, 28 Mar 2012 23:58:53 -0400, Ross Ridge wrote: How does that in anyway justify Evan Driscoll maliciously lying about code he's never seen? You are perfectly justified to complain about Evan making sweeping generalisations about your code when he has not seen it; you are NOT justified

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Peter Daum
On 2012-03-28 23:37, Terry Reedy wrote: 2. Decode as if the text were latin-1 and ignore the non-ascii 'latin-1' chars. When done, encode back to 'latin-1' and the non-ascii chars will be as they originally were. ... actually, in the beginning of my quest, I ran into a decoding exception
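The latin-1 round trip suggested above can be sketched as follows, assuming Python 3. It relies on latin-1 mapping each byte value 0-255 to the code point with the same number, so decoding cannot fail and encoding back restores the original bytes. The sample bytes and the replace() step are illustrative only.

```python
# Arbitrary bytes, including values above 127 that are not valid UTF-8.
raw = bytes([0x61, 0x62, 0xE4, 0xFF, 0x80])

# latin-1 maps every byte 0-255 to the code point with the same number,
# so this decode never raises and loses no information.
text = raw.decode("latin-1")

# Work on the ASCII portion as ordinary text.
text = text.replace("ab", "AB")

# Encoding back leaves the untouched non-ASCII bytes exactly as they were.
result = text.encode("latin-1")
print(result)
```

The resulting str is only meaningful for its ASCII part; the characters above 127 are placeholders for bytes whose real encoding is unknown.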

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Ross Ridge
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Your reaction is to make an equally unjustified estimate of Evan's mindset, namely that he is not just wrong about you, but *deliberately and maliciously* lying about you in the full knowledge that he is wrong. No, Evan in his own

Re: Re: Re: Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Evan Driscoll
On 01/-10/-28163 01:59 PM, Ross Ridge wrote: Evan Driscoll drisc...@cs.wisc.edu wrote: People like you -- who write to assumptions which are not even remotely guaranteed by the spec -- are part of the reason software sucks. ... This email is a bit harsher than it deserves -- but I feel not by

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Terry Reedy
On 3/29/2012 11:30 AM, Ross Ridge wrote: No, Evan in his own words admitted that his post was meant to be harsh, I agree that he should have restrained and censored his writing. Just because I refuse to drink the "it's impossible to represent strings as a series of bytes" kool-aid I do not

RE: convert string to bytes without changing data (encoding)

2012-03-29 Thread Prasad, Ramit
: Wednesday, March 28, 2012 2:50 PM To: python-list@python.org Subject: Re: convert string to bytes without changing data (encoding) On 28/03/2012 20:02, Prasad, Ramit wrote: The right way to convert bytes to strings, and vice versa, is via encoding and decoding operations. If you want

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Ross Ridge
Ross Ridge wrote: Just because I refuse to drink the "it's impossible to represent strings as a series of bytes" kool-aid Terry Reedy tjre...@udel.edu wrote: I do not believe *anyone* has made that claim. Is this meant to be a wild exaggeration? As wild as Evan's? Sorry, it would've been more

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Chris Angelico
On Fri, Mar 30, 2012 at 5:00 AM, Ross Ridge rri...@csclub.uwaterloo.ca wrote: Sorry, it would've been more accurate to label the flavour of kool-aid Chris Angelico was trying to push as it's impossible ... without encoding: What is a string? It's not a series of bytes. You can't

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Steven D'Aprano
On Thu, 29 Mar 2012 17:36:34 +, Prasad, Ramit wrote: Technically, ASCII goes up to 256 but they are not A-z letters. Technically, ASCII is 7-bit, so it goes up to 127. No, ASCII only defines 0-127. Values >=128 are not ASCII. From https://en.wikipedia.org/wiki/ASCII: ASCII

Re: convert string to bytes without changing data (encoding)

2012-03-29 Thread Steven D'Aprano
On Thu, 29 Mar 2012 11:30:19 -0400, Ross Ridge wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Your reaction is to make an equally unjustified estimate of Evan's mindset, namely that he is not just wrong about you, but *deliberately and maliciously* lying about you in the

Re: Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Evan Driscoll drisc...@cs.wisc.edu wrote: People like you -- who write to assumptions which are not even remotely guaranteed by the spec -- are part of the reason software sucks. ... This email is a bit harsher than it deserves -- but I feel not by much. I don't see how you could feel the least

Re: Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Chris Angelico
On Thu, Mar 29, 2012 at 2:04 PM, Ross Ridge rri...@csclub.uwaterloo.ca wrote: Evan Driscoll drisc...@cs.wisc.edu wrote: People like you -- who write to assumptions which are not even remotely guaranteed by the spec -- are part of the reason software sucks. ... This email is a bit harsher than

Re: Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Chris Angelico ros...@gmail.com wrote: Actually, he is justified. It's one thing to work in C or assembly and write code that depends on certain bit-pattern representations of data (although even that causes trouble - assuming that sizeof(int)==sizeof(int*) isn't good for portability), but in

convert string to bytes without changing data (encoding)

2012-03-28 Thread Peter Daum
Hi, is there any way to convert a string to bytes without interpreting the data in any way? Something like: s='abcde' b=bytes(s, unchanged) Regards, Peter

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Chris Angelico
On Wed, Mar 28, 2012 at 7:56 PM, Peter Daum ga...@cs.tu-berlin.de wrote: Hi, is there any way to convert a string to bytes without interpreting the data in any way? Something like: s='abcde' b=bytes(s, unchanged) What is a string? It's not a series of bytes. You can't convert it without
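A short sketch of why the conversion needs an encoding, assuming Python 3: the same five-character string yields different byte sequences under different encodings, so there is no single "unchanged" form to copy. The variable names are illustrative.

```python
s = "abcde"

# str.encode(...) and bytes(s, ...) are equivalent ways to name the encoding.
as_ascii = s.encode("ascii")
same_thing = bytes(s, "ascii")

# A different encoding produces different bytes for the same string.
as_utf16 = s.encode("utf-16-le")

print(as_ascii)  # five bytes, one per character
print(as_utf16)  # ten bytes, two per character
```

For pure-ASCII text, the ascii, latin-1 and utf-8 codecs all give the same bytes, which is probably what the original question was after.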

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Stefan Behnel
Peter Daum, 28.03.2012 10:56: is there any way to convert a string to bytes without interpreting the data in any way? Something like: s='abcde' b=bytes(s, unchanged) If you can tell us what you actually want to achieve, i.e. why you want to do this, we may be able to tell you how to do what

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Peter Daum
On 2012-03-28 11:02, Chris Angelico wrote: On Wed, Mar 28, 2012 at 7:56 PM, Peter Daum ga...@cs.tu-berlin.de wrote: is there any way to convert a string to bytes without interpreting the data in any way? Something like: s='abcde' b=bytes(s, unchanged) What is a string? It's not a series

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Heiko Wundram
On 28.03.2012 11:43, Peter Daum wrote: ... in my example, the variable s points to a string, i.e. a series of bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters. No; a string contains a series of codepoints from the unicode plane, representing natural language characters (at
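To illustrate the distinction between code points and bytes, here is a small sketch assuming Python 3. The string has three code points regardless of how it is later encoded, while its byte length depends entirely on the chosen encoding. The sample characters are arbitrary.

```python
s = "a\u00e9\u20ac"  # 'a', e-acute, euro sign

# A str is a sequence of code points; ord() gives each one's number.
code_points = [ord(ch) for ch in s]

# The byte length only exists once an encoding has been chosen.
utf8_len = len(s.encode("utf-8"))       # 1 + 2 + 3 bytes
utf16_len = len(s.encode("utf-16-le"))  # 2 bytes per character here
utf32_len = len(s.encode("utf-32-le"))  # 4 bytes per character

print(len(s), code_points, utf8_len, utf16_len, utf32_len)
```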

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Stefan Behnel
Peter Daum, 28.03.2012 11:43: What I am looking for is a general way to just copy the raw data from a string object to a byte object without any attempt to decode or encode anything ... That's why I asked about your use case - where does the data come from and why is it contained in a

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Chris Angelico ros...@gmail.com wrote: What is a string? It's not a series of bytes. Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes. What he's asking for may not be very useful or practical, but if that's your

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Chris Angelico
On Thu, Mar 29, 2012 at 2:36 AM, Ross Ridge rri...@csclub.uwaterloo.ca wrote: Chris Angelico ros...@gmail.com wrote: What is a string? It's not a series of bytes. Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes.

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Grant Edwards
On 2012-03-28, Chris Angelico ros...@gmail.com wrote: for all you know, it might actually be stored as a sequence of apples in a refrigerator [...] There's no logical Python way to turn that into a series of bytes. There's got to be a joke there somewhere about how to eat an apple... --

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Dave Angel
On 03/28/2012 04:56 AM, Peter Daum wrote: Hi, is there any way to convert a string to bytes without interpreting the data in any way? Something like: s='abcde' b=bytes(s, unchanged) Regards, Peter You needed to specify that you are using Python 3.x . In

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Peter Daum
On 2012-03-28 12:42, Heiko Wundram wrote: On 28.03.2012 11:43, Peter Daum wrote: ... in my example, the variable s points to a string, i.e. a series of bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters. No; a string contains a series of codepoints from the unicode plane,

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Steven D'Aprano
On Wed, 28 Mar 2012 11:36:10 -0400, Ross Ridge wrote: Chris Angelico ros...@gmail.com wrote: What is a string? It's not a series of bytes. Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes. You don't know that.

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Heiko Wundram
On 28.03.2012 19:43, Peter Daum wrote: As it seems, this would be far easier with python 2.x. With python 3 and its strict distinction between str and bytes, things get syntactically pretty awkward and error-prone (something as innocent-looking as s=s+'/' hidden in a rarely reached branch

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Ross Ridge rri...@csclub.uwaterloo.ca wrote: Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes. Chris Angelico ros...@gmail.com wrote: Note that distinction. I said that a string is not a series of bytes; you say

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Steven D'Aprano
On Wed, 28 Mar 2012 11:43:52 +0200, Peter Daum wrote: ... in my example, the variable s points to a string, i.e. a series of bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters. No. Strings are not sequences of bytes (except in the trivial sense that everything in computer memory

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Jussi Piitulainen
Peter Daum writes: ... I was under the illusion that python (like e.g. perl) stored strings internally in utf-8. In this case the conversion would simply mean to re-label the data. Unfortunately, as I meanwhile found out, this is not the case (nor the apple encoding ;-), so it would indeed

RE: convert string to bytes without changing data (encoding)

2012-03-28 Thread Prasad, Ramit
As it seems, this would be far easier with python 2.x. With python 3 and its strict distinction between str and bytes, things get syntactically pretty awkward and error-prone (something as innocent-looking as s=s+'/' hidden in a rarely reached branch and a seemingly correct program will

RE: convert string to bytes without changing data (encoding)

2012-03-28 Thread Prasad, Ramit
You can read as bytes and decode as ASCII but ignoring the troublesome non-text characters: print(open('text.txt', 'br').read().decode('ascii', 'ignore')) The bit not used for ASCII can also be used for error-correction purposes (parity bit) on the communication lines or for other
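A runnable variant of the one-liner above, assuming Python 3 and using an in-memory bytes object instead of the file text.txt. It also shows the 'replace' error handler for comparison. Note that 'ignore' silently drops every non-ASCII byte, so unlike the latin-1 approach it cannot be reversed.

```python
# Sample input: ASCII text with two bytes above 127 mixed in.
data = b"na\xefve caf\xe9"

# 'ignore' drops each byte that is not valid ASCII.
ignored = data.decode("ascii", "ignore")

# 'replace' substitutes U+FFFD (the replacement character) for each one.
replaced = data.decode("ascii", "replace")

print(ignored)
print(replaced)
```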

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ian Kelly
On Wed, Mar 28, 2012 at 11:43 AM, Peter Daum ga...@cs.tu-berlin.de wrote: ... I was under the illusion that python (like e.g. perl) stored strings internally in utf-8. In this case the conversion would simply mean to re-label the data. Unfortunately, as I meanwhile found out, this is not the

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Terry Reedy
On 3/28/2012 11:36 AM, Ross Ridge wrote: Chris Angelico ros...@gmail.com wrote: What is a string? It's not a series of bytes. Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes. *If* it is stored in byte

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Steven D'Aprano
On Wed, 28 Mar 2012 19:43:36 +0200, Peter Daum wrote: The longer story of my question is: I am new to python (obviously), and since I am not familiar with either one, I thought it would be advisory to go for python 3.x. The biggest problem that I am facing is, that I am often dealing with

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The right way to convert bytes to strings, and vice versa, is via encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you don't need to do anything more that. You don't

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ethan Furman
Peter Daum wrote: On 2012-03-28 12:42, Heiko Wundram wrote: On 28.03.2012 11:43, Peter Daum wrote: ... in my example, the variable s points to a string, i.e. a series of bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters. No; a string contains a series of codepoints from the

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Tim Chase
On 03/28/12 13:05, Ross Ridge wrote: Ross Ridge rri...@csclub.uwaterloo.ca wrote: But a Python Unicode string might be stored in several ways; for all you know, it might actually be stored as a sequence of apples in a refrigerator, just as long as they can be referenced correctly. But it is in

Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Evan Driscoll
On 01/-10/-28163 01:59 PM, Ross Ridge wrote: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The right way to convert bytes to strings, and vice versa, is via encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Tim Chase python.l...@tim.thechases.com wrote: Internally, they're a series of bytes, but they are MEANINGLESS bytes unless you know how they are encoded internally. Those bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other possible encodings[1]. If you get the internal byte

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Albert W. Hopkins
On Wed, 2012-03-28 at 14:05 -0400, Ross Ridge wrote: Ross Ridge rri...@csclub.uwaterloo.ca wrote: Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes. Chris Angelico ros...@gmail.com wrote: Note that

RE: convert string to bytes without changing data (encoding)

2012-03-28 Thread Prasad, Ramit
The right way to convert bytes to strings, and vice versa, is via encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you don't need to do anything more that. You don't need to pretend like Chris Angelico that there's isn't a

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ethan Furman
Prasad, Ramit wrote: You can read as bytes and decode as ASCII but ignoring the troublesome non-text characters: print(open('text.txt', 'br').read().decode('ascii', 'ignore')) The bit not used for ASCII can also be used for error-correction purposes (parity bit) on the communication lines or

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread John Nagle
On 3/28/2012 10:43 AM, Peter Daum wrote: On 2012-03-28 12:42, Heiko Wundram wrote: On 28.03.2012 11:43, Peter Daum wrote: The longer story of my question is: I am new to python (obviously), and since I am not familiar with either one, I thought it would be advisory to go for python 3.x.

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Grant Edwards
On 2012-03-28, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Wed, 28 Mar 2012 19:43:36 +0200, Peter Daum wrote: The longer story of my question is: I am new to python (obviously), and since I am not familiar with either one, I thought it would be advisory to go for python

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread MRAB
On 28/03/2012 20:02, Prasad, Ramit wrote: The right way to convert bytes to strings, and vice versa, is via encoding and decoding operations. If you want to dictate to the original poster the correct way to do things then you don't need to do anything more that. You don't need to pretend

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Grant Edwards
On 2012-03-28, Prasad, Ramit ramit.pra...@jpmorgan.com wrote: You can't generally just deal with the ascii portions without knowing something about the encoding. Say you encounter a byte greater than 127. Is it a single non-ASCII character, or is it the leading byte of a multi-byte character?
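The ambiguity described above can be shown in a few lines, assuming Python 3. The same two bytes decode to one character under UTF-8 and to two characters under latin-1, and nothing in the bytes themselves says which reading is right. The byte values are chosen for illustration.

```python
raw = b"\xc3\xa9"  # two bytes, both greater than 127

# Read as UTF-8, this is a single two-byte character (e-acute).
as_utf8 = raw.decode("utf-8")

# Read as latin-1, each byte is its own character.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), len(as_latin1))
```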

Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Ross Ridge
Evan Driscoll drisc...@cs.wisc.edu wrote: So yes, you can say that pretending there's not a mapping of strings to internal representation is silly, because there is. However, there's nothing you can say about that mapping. I'm not the one labeling anything as being silly. I'm the one labeling

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Mark Lawrence
On 28/03/2012 20:43, Ross Ridge wrote: Evan Driscoll drisc...@cs.wisc.edu wrote: So yes, you can say that pretending there's not a mapping of strings to internal representation is silly, because there is. However, there's nothing you can say about that mapping. I'm not the one labeling

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Neil Cerutti
On 2012-03-28, Ross Ridge rri...@csclub.uwaterloo.ca wrote: Evan Driscoll drisc...@cs.wisc.edu wrote: So yes, you can say that pretending there's not a mapping of strings to internal representation is silly, because there is. However, there's nothing you can say about that mapping. I'm not

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Terry Reedy
On 3/28/2012 1:43 PM, Peter Daum wrote: The longer story of my question is: I am new to python (obviously), and since I am not familiar with either one, I thought it would be advisory to go for python 3.x. I strongly agree with that unless you have reason to use 2.7. Python 3.3 (.0a1 in

Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Steven D'Aprano
On Wed, 28 Mar 2012 15:43:31 -0400, Ross Ridge wrote: I can in fact say what the internal byte string representation of strings is any given build of Python 3. Don't keep us in suspense! Given: Python 3.2.2 (default, Mar 4 2012, 10:50:33) [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2

Re: Re: Re: convert string to bytes without changing data (encoding)

2012-03-28 Thread Evan Driscoll
On 3/28/2012 14:43, Ross Ridge wrote: Evan Driscoll drisc...@cs.wisc.edu wrote: So yes, you can say that pretending there's not a mapping of strings to internal representation is silly, because there is. However, there's nothing you can say about that mapping. I'm not the one labeling