Re: [Python-Dev] Python3 "complexity"

2014-01-11 Thread Matěj Cepl
On 2014-01-10, 17:34 GMT, you wrote: > From my experience, the concept of a default locale is deeply > flawed. What if I log into a (Linux) machine using an old > latin-1 putty from the Windows XP era, have most file names > and contents in UTF-8 e

Re: [Python-Dev] Python3 "complexity" - 2 use cases

2014-01-10 Thread Ben Finney
"Jim J. Jewett" writes: > > > Steven D'Aprano wrote: > >> I think that heuristics to guess the encoding have their role to play, > >> if the caller understands the risks. > > Ben Finney wrote: > > In my opinion, content-type guessing heuristics certainly don't belong > > in the standard library

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Ethan Furman
On 01/10/2014 03:22 PM, Mark Lawrence wrote: On 10/01/2014 22:06, Chris Barker wrote: I'm not so sure -- it could be used (abused?) for that, but I'm suggesting it be used for mixed ascii-binary data. I don't know that there IS a "right" way to do that -- at least not an efficient or easy to re

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Chris Barker
On Fri, Jan 10, 2014 at 3:22 PM, Mark Lawrence wrote: > The correct way is to read the interface specification which tells you > what should be in the data. Or do people not use interface specifications > these days, preferring to guess what they've got instead? > No one is suggesting guessing (

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Mark Lawrence
On 10/01/2014 22:06, Chris Barker wrote: On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore wrote: > Using the 'latin-1' to mean unknown encoding can easily result > in Mojibake (unreadable text) entering your application with > dangerous effects on your othe

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Chris Barker
On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore wrote: > > Using the 'latin-1' to mean unknown encoding can easily result > > in Mojibake (unreadable text) entering your application with > > dangerous effects on your other text data. > > Agreed. The latin-1 suggestion is purely for people who object

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Serhiy Storchaka
10.01.14 18:27, Baptiste Carvello wrote: would it make sense to be more general, and allow a "lenient mode", where all files implicitly opened with the default encoding would also use the surrogateescape error handler ? The surrogateescape error handler is compatible only with ASCII-comp

[Python-Dev] Python3 "complexity" - 2 use cases

2014-01-10 Thread Jim J. Jewett
> Steven D'Aprano wrote: >> I think that heuristics to guess the encoding have their role to play, >> if the caller understands the risks. Ben Finney wrote: > In my opinion, content-type guessing heuristics certainly don't belong > in the standard library. It would be great if there were never

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Greg Ewing
INADA Naoki wrote: latin1 is OK but is it Pythonic? Latin is most certainly a Pythonic subject: http://www.youtube.com/watch?v=IIAdHEwiAy8 -- Greg ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Philip Jenvey
On Jan 10, 2014, at 7:35 AM, Nick Coghlan wrote: > Putting this here because I found out today it's not in any of the > PEPs and folks have to go digging in mailing list archives to find it. > I'll add it to my Python 3 Q&A at some point. > > The reason Python 3 currently tries to rely on the PO

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Serhiy Storchaka
10.01.14 14:19, M.-A. Lemburg wrote: BTW: Perhaps it would be a good idea to backport the surrogateescape error handler to Python 2.7 to simplify writing code which works in both Python 2 and 3. You also should change the UTF-8 codec so that it will reject surrogates (i.e. u'\ud880'.enco

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Stefan Ring
On Fri, Jan 10, 2014 at 4:35 PM, Nick Coghlan wrote: > On 10 January 2014 13:32, Lennart Regebro wrote: >> No, because your environment has a default language. And Python has a >> default encoding. You only get problems when some file doesn't use the >> default encoding. > > The reason Python 3

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Baptiste Carvello
On 10/01/2014 16:35, Nick Coghlan wrote: > One idea we're considering for Python 3.5 is to have a report of > "ascii" on a POSIX OS imply the surrogateescape error handler (at > least for the standard streams, and perhaps in other contexts), since > the OS reporting the POSIX/C locale almost ce

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread INADA Naoki
Now I feel it is a bad thing to encourage using unicode for binary data with the latin-1 encoding or the surrogateescape error handler. Handling binary data in the str type using latin-1 is just a hack. Surrogateescape is just a workaround to keep undecodable bytes in text. Encouraging binary data in str type wi

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Stefan Krah
Nick Coghlan wrote: > One idea we're considering for Python 3.5 is to have a report of > "ascii" on a POSIX OS imply the surrogateescape error handler (at > least for the standard streams, and perhaps in other contexts), since > the OS reporting the POSIX/C locale almost certainly indicates a > co

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Nick Coghlan
On 10 January 2014 13:32, Lennart Regebro wrote: > On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson > wrote: >> Do I speak Chinese to my grocer because china is a growing force in the >> world? Or start every discussion with my children with a negotiation on >> what language to use? > >

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Matěj Cepl
On 2014-01-10, 12:19 GMT, you wrote: > Using the 'latin-1' to mean unknown encoding can easily result > in Mojibake (unreadable text) entering your application with > dangerous effects on your other text data. > > E.g. "Marc-André" read using 'latin-1'
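The Mojibake warning quoted above can be reproduced in a few lines. This is only a sketch: it assumes the name was stored as UTF-8, which the truncated excerpt does not confirm, and the variable names are made up for illustration.

```python
# Sketch: UTF-8 bytes read back as latin-1 (the UTF-8 source is an assumption).
name = "Marc-André"
utf8_bytes = name.encode("utf-8")        # "é" becomes the two bytes C3 A9
misread = utf8_bytes.decode("latin-1")   # every byte becomes one character
print(misread)                           # the accented letter shows up as two characters

# The damage is reversible only while nothing else has modified the string.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == name)
```

The decode never raises, which is exactly why the problem is easy to miss: the wrong text flows into the rest of the application silently.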

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread Paul Moore
On 10 January 2014 12:19, M.-A. Lemburg wrote: > Just a word of caution: > > Using the 'latin-1' to mean unknown encoding can easily result > in Mojibake (unreadable text) entering your application with > dangerous effects on your other text data. Agreed. The latin-1 suggestion is purely for peop

Re: [Python-Dev] Python3 "complexity"

2014-01-10 Thread M.-A. Lemburg
On 09.01.2014 22:45, Antoine Pitrou wrote: > On Thu, 9 Jan 2014 13:36:05 -0800 > Chris Barker wrote: >> >> Some folks have suggested using latin-1 (or other 8-bit encoding) -- is >> that guaranteed to work with any binary data, and round-trip accurately? > > Yes, it is. Just a word of caution:

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Stephen J. Turnbull
Chris Angelico writes: > I'm not saying that chardet is bad, but I *am* saying, and I stand > by this, that an auto-detect option on file open is a bad idea. I have used it by default in Emacs and XEmacs since 1990, and I certainly haven't experienced it as a bad idea at *any* time in more than

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Stephen J. Turnbull
INADA Naoki writes: > latin1 is OK but is it Pythonic? Yes. EIBTI, including being explicit that you're doing something that has semantics that you are ignoring but may come back to bite you or somebody who naively uses your module. There's nothing un-Pythonic about using potentially dangerous

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Ben Finney
Steven D'Aprano writes: > I think that heuristics to guess the encoding have their role to play, > if the caller understands the risks. I think, for a language whose developers espouse a principle “In the face of ambiguity, refuse the temptation to guess”, heuristics have no role to play in the

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Angelico
On Fri, Jan 10, 2014 at 1:39 PM, Steven D'Aprano wrote: > On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote: >> On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik >> wrote: >> > 2. introduce autodetect mode to open functions >> > 1. read and transform on the fly, maintaining

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Lennart Regebro
On Fri, Jan 10, 2014 at 2:03 AM, Joao S. O. Bueno wrote: > On 9 January 2014 04:50, Lennart Regebro wrote: >> To be honest, you can define text as "A stream of bytes that are split >> up in lines separated by a linefeed", and do some basic text >> processing like that. Just very *basic*, but stil

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Lennart Regebro
On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson wrote: > Do I speak Chinese to my grocer because china is a growing force in the > world? Or start every discussion with my children with a negotiation on what > language to use? No, because your environment has a default language. And P

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Steven D'Aprano
On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote: > On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik > wrote: > > 2. introduce autodetect mode to open functions > > 1. read and transform on the fly, maintaining a buffer that > > stores original bytes > > and their

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Terry Reedy
On 1/9/2014 6:25 PM, Chris Barker wrote: as so -- I want to replace a bit of ascii text surrounded by arbitrary binary: (apologies for the py2...) In [24]: b Out[24]: '\x01\x00\xd1\x80\xd1a name\xd0\x80' In [25]: u = b.decode('latin-1') In [26]: u2 = u.replace('a name', 'a different name') In [
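For readers on Python 3, the Python 2 session quoted above might look roughly like the following. It is a sketch rather than the poster's code; the byte string is copied from the excerpt, and the `bytes.replace` variant is added here for comparison.

```python
# Sketch: replace an ASCII fragment embedded in arbitrary binary data.
data = b"\x01\x00\xd1\x80\xd1a name\xd0\x80"

# Route 1: the latin-1 trick discussed in the thread.
text = data.decode("latin-1")                 # never fails, one char per byte
edited = text.replace("a name", "a different name")
result = edited.encode("latin-1")             # back to the original byte values

# Route 2: stay in bytes; bytes objects have their own replace().
direct = data.replace(b"a name", b"a different name")

print(result == direct)
```

Both routes leave the surrounding binary bytes untouched; the second avoids creating a str whose non-ASCII characters mean nothing.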

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Steven D'Aprano
On Thu, Jan 09, 2014 at 02:08:57PM -0800, Ethan Furman wrote: > If latin1 is used to convert binary to text, how convoluted is it to then > take chunks of that text and convert to int, or some other variety of > unicode? > > For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' > > If that were dec

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Angelico
On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik wrote: > 2. introduce autodetect mode to open functions > 1. read and transform on the fly, maintaining a buffer that > stores original bytes > and their mapping to letters. The mapping is updated as bytes > frequency >

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Joao S. O. Bueno
On 9 January 2014 04:50, Lennart Regebro wrote: > To be honest, you can define text as "A stream of bytes that are split > up in lines separated by a linefeed", and do some basic text > processing like that. Just very *basic*, but still. Replacing > characters. Extracting certain lines etc. That

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread anatoly techtonik
On Thu, Jan 9, 2014 at 10:00 AM, Mark Lawrence wrote: > On 09/01/2014 06:50, Lennart Regebro wrote: >> >> On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney >> wrote: >>> >>> Kristján Valur Jónsson writes: >>> Believe it or not, sometimes you really don't care about encodings. Sometimes you ju

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread INADA Naoki
latin1 is OK but is it Pythonic? I've posted a suggestion to add 'bytes' as an alias for 'latin1'. http://comments.gmane.org/gmane.comp.python.ideas/10315 I want one Pythonic way to handle "binary containing ascii (or latin1 or utf-8 or other ascii compatible)". On Fri, Jan 10, 2014 at 8:53 AM

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Barker
On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman wrote: > Sorry, I was too short with my example. My use case is binary files, with > ASCII metadata and binary metadata, as well as ASCII-encoded numeric > values, binary-coded numeric values, ASCII-encoded boolean values, and > who-knows-what-(before

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Ethan Furman
On 01/09/2014 02:54 PM, Paul Moore wrote: On 9 January 2014 22:08, Ethan Furman wrote: For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' If that were decoded using latin1 how would I then get the first two bytes to the integer 256 and the last six bytes to their Cyrillic meaning? (Apologies for

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Barker
On Thu, Jan 9, 2014 at 2:54 PM, Paul Moore > For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' > > > > If that were decoded using latin1 how would I then get the first two > bytes > > to the integer 256 and the last six bytes to their Cyrillic meaning? > > (Apologies for not testing myself, short

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Ethan Furman
On 01/09/2014 02:54 PM, Paul Moore wrote: On 9 January 2014 22:08, Ethan Furman wrote: For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' If that were decoded using latin1 how would I then get the first two bytes to the integer 256 and the last six bytes to their Cyrillic meaning? (Apologies for

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Paul Moore
On 9 January 2014 22:08, Ethan Furman wrote: > For example: b'\x01\x00\xd1\x80\xd1\83\xd0\x80' > > If that were decoded using latin1 how would I then get the first two bytes > to the integer 256 and the last six bytes to their Cyrillic meaning? > (Apologies for not testing myself, short on time.)
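One way to answer the question above is sketched below. Two assumptions are made that the excerpt does not confirm: the `\83` in the quoted literal is taken to mean `\x83`, and the two leading bytes are read as a big-endian integer.

```python
# Sketch under assumptions: "\83" read as "\x83", integer is big-endian.
data = b"\x01\x00\xd1\x80\xd1\x83\xd0\x80"

# After a latin-1 decode, each character still stands for one original byte,
# so any slice can be turned back into bytes and interpreted properly.
as_text = data.decode("latin-1")
number = int.from_bytes(as_text[:2].encode("latin-1"), "big")
cyrillic = as_text[2:].encode("latin-1").decode("utf-8")

# The same result without the detour through str.
number_direct = int.from_bytes(data[:2], "big")
cyrillic_direct = data[2:].decode("utf-8")

print(number, cyrillic)
```

The detour works, but every field has to be re-encoded before it can be interpreted, which is the convolution the question was worried about.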

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Ethan Furman
On 01/09/2014 02:00 PM, Chris Barker wrote: On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: Chris Barker wrote: latin-1 guaranteed to work with any binary data, and round-trip accurately? Yes, it is. and will surrogateescape work for arbitrary binary data? Yes, it will. Then ma

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Paul Moore
On 9 January 2014 22:00, Chris Barker wrote: > On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: >> >> > latin-1 guaranteed to work with any binary data, and round-trip >> > accurately? >> >> Yes, it is. >> >> > and will surrogateescape work for arbitrary binary data? >> >> Yes, it will. > >

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Brett Cannon
On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker wrote: > On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: > >> > latin-1 guaranteed to work with any binary data, and round-trip >> accurately? >> >> Yes, it is. >> >> > and will surrogateescape work for arbitrary binary data? >> >> Yes, it will.

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Barker
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote: > > latin-1 guaranteed to work with any binary data, and round-trip > accurately? > > Yes, it is. > > > and will surrogateescape work for arbitrary binary data? > > Yes, it will. > Then maybe this is really a documentation issue, after all.

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Antoine Pitrou
On Thu, 9 Jan 2014 13:36:05 -0800 Chris Barker wrote: > > Some folks have suggested using latin-1 (or other 8-bit encoding) -- is > that guaranteed to work with any binary data, and round-trip accurately? Yes, it is. > and will surrogateescape work for arbitrary binary data? Yes, it will. Reg

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Barker
This has all gotten a bit complicated because everyone has been thinking in terms of actual encodings and actual text files. But I think the use-case here is something different: A file with a bunch of bytes in it, _some_of which are ascii, and the rest are other bytes (maybe binary data, maybe no

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
...@gmail.com] Sent: Thursday, January 09, 2014 18:08 To: Kristján Valur Jónsson Cc: Victor Stinner; Antoine Pitrou; python-dev@python.org Subject: Re: [Python-Dev] Python3 "complexity" http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html is currently linke

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Nick Coghlan
On 9 Jan 2014 22:25, "Kristján Valur Jónsson" wrote: > > > > > -Original Message- > > From: Victor Stinner [mailto:victor.stin...@gmail.com] > > Sent: 9. janúar 2014 13:51 > > To: Kristján Valur Jónsson > > Cc: Antoine Pitrou; python-dev

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-09 Thread Nick Coghlan
On 9 Jan 2014 22:08, "Antoine Pitrou" wrote: > > On Thu, 9 Jan 2014 09:03:40 -0500 > Daniel Holth wrote: > > They emphatically do not want the Python 2 > > model especially not implicit coercion. They only want additional > > tools for text or string processing in Python 3. > > That's a good poin

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Stephen J. Turnbull
Steven D'Aprano writes: > If it were, we wouldn't need text strings :-) Speak for yourself, Kemosabe. Red man need Unicode, full meal not just a few bytes.

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Victor Stinner [mailto:victor.stin...@gmail.com] > Sent: 9. janúar 2014 13:51 > To: Kristján Valur Jónsson > Cc: Antoine Pitrou; python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > 2014/1/9 Kristján V

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Steven D'Aprano
On Thu, Jan 09, 2014 at 01:00:59PM +, Kristján Valur Jónsson wrote: > Which reminds me, can Python3 read text files with BOM automatically yet? I'm not sure what you mean by that. If you mean, can Python3 distinguish between UTF-16BE and UTF-16LE on the basis of a BOM, then it's been able t
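The excerpt is cut off, so it is not clear which reading of the BOM question was meant. As a hedged illustration of both readings, the standard `utf-16` and `utf-8-sig` codecs behave as follows (the sample text is made up):

```python
import codecs

# Reading 1: the generic "utf-16" codec picks the byte order from the BOM.
le = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
from_le = le.decode("utf-16")
from_be = be.decode("utf-16")

# Reading 2: a UTF-8 file that starts with a BOM.
with_sig = codecs.BOM_UTF8 + "hi".encode("utf-8")
kept = with_sig.decode("utf-8")          # BOM survives as U+FEFF
stripped = with_sig.decode("utf-8-sig")  # BOM is consumed by the codec

print(from_le, from_be, repr(kept), repr(stripped))
```

The same codec names can be passed as the `encoding` argument of `open()`.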

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-09 Thread Antoine Pitrou
On Thu, 9 Jan 2014 09:03:40 -0500 Daniel Holth wrote: > They emphatically do not want the Python 2 > model especially not implicit coercion. They only want additional > tools for text or string processing in Python 3. That's a good point. Now it's up to people who need those additional tools to p

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-09 Thread Daniel Holth
So the customer you're looking for is the person who cares a lot about encodings, knows how to do Unicode correctly, and has noticed that certain valid cases not limited to imperialist simpletons (dealing with specific common things invented before 1996, dealing with mixed encodings, doing what Nic

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Kristján Valur > Jónsson > Sent: 9. janúar 2014 13:37 > To: Antoine Pitrou; python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexit

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Victor Stinner
2014/1/9 Kristján Valur Jónsson : > This definition is funny, because according to Wikipedia, it is a "superset" > of 8859-1 (latin1) Bytes 0x80..0x9f are unassigned in ISO/IEC 8859-1... but are assigned in (IANA's) ISO-8859-1. Python implements the latter, ISO-8859-1. Wikipedia says "This enc

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou > Sent: 9. janúar 2014 13:18 > To: python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > On Thu, 9

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Paul Moore
On 9 January 2014 13:00, Kristján Valur Jónsson wrote: >> You don't say what problems, but I assume encoding/decoding errors. So the >> files apparently weren't in the system encoding. OK, at that point I'd >> probably say to heck with it and use latin-1. Assuming I was sure that (a) >> I'd >> ne

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Antoine Pitrou
On Thu, 9 Jan 2014 12:55:35 + Kristján Valur Jónsson wrote: > > If you don't "care" about the encoding, why don't you use latin1? > > Things will roundtrip fine and work as well as under Python 2. > > Because latin1 does not define all code points, giving you errors there. >>> b = bytes(rang
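The interpreter session above is truncated, so the following is an independent sketch of the same point rather than a reconstruction of it: Python's latin-1 codec maps all 256 byte values, while cp1252 leaves a few bytes undefined.

```python
# Sketch: latin-1 round-trips every possible byte value.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
round_trip_ok = text.encode("latin-1") == all_bytes
code_points = [ord(ch) for ch in text]   # byte N becomes code point N

# cp1252, by contrast, has holes; 0x81 is one of them in Python's codec.
try:
    b"\x81".decode("cp1252")
    cp1252_decodes_0x81 = True
except UnicodeDecodeError:
    cp1252_decodes_0x81 = False

print(round_trip_ok, cp1252_decodes_0x81)
```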

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou > Sent: 9. janúar 2014 12:42 > To: python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > On Thu, 9

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Martin v. Löwis
> Right. But even latin-1, or better, cp1252 (on windows) does not solve it > because these have undefined > code points. That's not true. latin-1 does not have undefined code points. Regards, Martin

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Paul Moore [mailto:p.f.mo...@gmail.com] > Sent: 9. janúar 2014 10:53 > To: Kristján Valur Jónsson > Cc: Stefan Ring; python-dev@python.org > > Moving to python 3, I found that this quickly caused problems. > > You don't say what problems, but I assume encodin

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-09 Thread Antoine Pitrou
On Thu, 9 Jan 2014 17:09:10 +1000 Nick Coghlan wrote: > > There's also the fact that POSIX folks are used to "r" and "rb" being > the same thing. Which fails immediately under Windows :-) Regards Antoine.

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Antoine Pitrou
On Thu, 9 Jan 2014 10:15:08 + Kristján Valur Jónsson wrote: > > Moving to python 3, I found that this quickly caused problems. So, I > explicitly added an encoding. Better guess an encoding, something that is > likely, e.g. cp1252 > with open(fn1, encoding='cp1252') as f1: > with open

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Steven D'Aprano
On Thu, Jan 09, 2014 at 05:11:06PM +1000, Nick Coghlan wrote: > On 9 January 2014 10:07, Ben Finney wrote: > > So, if what you want is to parse text and not get gibberish, you need to > > *tell* Python what the encoding is. That's a brute fact of the world of > > text in computing. > > Set the m

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Paul Moore
On 9 January 2014 10:15, Kristján Valur Jónsson wrote: > Also, the problem I'm describing has to do with real world stuff. > This is the python 2 program: > with open(fn1) as f1: > with open(fn2, 'w') as f2: > f2.write(process_text(f1.read())) > > Moving to python 3, I found that this q

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Stefan Ring > Sent: 9. janúar 2014 09:32 > To: python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > > just

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Stephen J. Turnbull
Paul Moore writes: > So I think that if this discussion is to be of any real benefit, a > specific example is needed. I honestly don't think I've ever > encountered a case where "Sometimes [I] just want to parse text > files" and code that uses the default encoding (i.e., looks pretty > much

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Stefan Ring
> just became harder to use for that purpose. The entire discussion reminds me very much of the situation with file names in OS X. Whenever I want to look at an old zip file or tarball which happens to have been lying around on my hard drive for a decade or more, I can't because OS X insists that f

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Paul Moore
On 9 January 2014 09:01, Mark Shannon wrote: > On 09/01/14 00:07, Ben Finney wrote: >> >> Kristján Valur Jónsson writes: >> >>> Believe it or not, sometimes you really don't care about encodings. >>> Sometimes you just want to parse text files. >> >> >> Files don't contain text, they contain byte

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Kristján Valur Jónsson
> -Original Message- > From: Python-Dev [mailto:python-dev- > bounces+kristjan=ccpgames@python.org] On Behalf Of Ben Finney > Sent: 9. janúar 2014 00:50 > To: python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > Kristján Valu

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Mark Shannon
On 09/01/14 00:07, Ben Finney wrote: Kristján Valur Jónsson writes: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Files don't contain text, they contain bytes. Bytes only become text when filtered through the correct encoding

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Lennart Regebro
On Thu, Jan 9, 2014 at 8:16 AM, Ben Finney wrote: > Nick Coghlan writes: >> Set the mode to "rb", process it as binary. Done. > > Which entails abandoning the stated goal of “just want to parse text > files” :-) Only if your definition of "text files" means it's unicode.

Re: [Python-Dev] Python3 "complexity"

2014-01-09 Thread Chris Angelico
On Thu, Jan 9, 2014 at 5:50 PM, Lennart Regebro wrote: > To be honest, you can define text as "A stream of bytes that are split > up in lines separated by a linefeed", and do some basic text > processing like that. Just very *basic*, but still. Replacing > characters. Extracting certain lines etc.

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Nick Coghlan
On 9 January 2014 15:22, Greg Ewing wrote: > Kristján Valur Jónsson wrote: >> >> all you want is to open that .txt >> file on the drive and extract some phone numbers and merge in some email >> addresses. What encoding does the file have? Do I care? Must I care? > > > To some extent, yes. If the e

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Ben Finney
Nick Coghlan writes: > On 9 January 2014 10:07, Ben Finney wrote: > > Kristján Valur Jónsson writes: > > > >> Believe it or not, sometimes you really don't care about encodings. > >> Sometimes you just want to parse text files. > > > > Files don't contain text, they contain bytes. Bytes only be

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Nick Coghlan
On 9 January 2014 10:07, Ben Finney wrote: > Kristján Valur Jónsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes. Bytes only become text > when filtered through the

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Nick Coghlan
-Dev [python-dev-bounces+kristjan=ccpgames@python.org] on > behalf of Ben Finney [ben+pyt...@benfinney.id.au] > Sent: Thursday, January 09, 2014 00:07 > To: python-dev@python.org > Subject: Re: [Python-Dev] Python3 "complexity" > > Kristján Valur Jónsson writes: > >>

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Mark Lawrence
On 09/01/2014 06:50, Lennart Regebro wrote: On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney wrote: Kristján Valur Jónsson writes: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Files don't contain text, they contain bytes. Bytes

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Lennart Regebro
On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney wrote: > Kristján Valur Jónsson writes: > >> Believe it or not, sometimes you really don't care about encodings. >> Sometimes you just want to parse text files. > > Files don't contain text, they contain bytes. Bytes only become text > when filtered thro

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Stephen J. Turnbull
Ben Finney writes: > That's a much better analogy. The customer may not care, but the > question is essential and must be answered; if the supplier guesses what > the customer wants, they are doing the customer a disservice. It is a much better analogy for me on my desktop, and for programmers

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Stephen J. Turnbull
Kristján Valur Jónsson writes: > Still playing the devil's advocate: > I didn't used to must. Why must I must now? Did the universe just > shift when I fired up python3? No. Go look at the Economist's tag cloud and notice how big "China" and "India" are most days. The universe has been shi

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Greg Ewing
Kristján Valur Jónsson wrote: all you want is to open that .txt file on the drive and extract some phone numbers and merge in some email addresses. What encoding does the file have? Do I care? Must I care? To some extent, yes. If the encoding happens to be an ascii-compatible one, such as latin

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Dan Stromberg
On Wed, Jan 8, 2014 at 2:04 PM, Kristján Valur Jónsson wrote: > > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to think > about abstract concepts like encodings when all you want is to open that .txt > fil

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Chris Angelico
On Thu, Jan 9, 2014 at 11:21 AM, MRAB wrote: > On the other hand: > > "I need a new battery." > > "What kind of battery?" > > "I don't care!" Or, bringing it back to Python: How do you write a set out to a file? foo = {1, 2, 4, 8, 16, 32} open("foo.txt","w").write(foo) # Uh... nope!
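The analogy above can be made concrete. In this sketch an in-memory `io.StringIO` stands in for the file, and JSON is an arbitrary choice of serialization made here for illustration, not something the post proposes.

```python
import io
import json

foo = {1, 2, 4, 8, 16, 32}
buf = io.StringIO()          # stands in for open("foo.txt", "w")

# Writing the set directly fails: a text stream only accepts str.
try:
    buf.write(foo)
    error = None
except TypeError as exc:
    error = exc

# The caller has to pick a representation, just as with an encoding.
serialized = json.dumps(sorted(foo))
buf.write(serialized)
restored = set(json.loads(buf.getvalue()))

print(type(error).__name__, serialized, restored == foo)
```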

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Terry Reedy
On 1/8/2014 5:04 PM, Kristján Valur Jónsson wrote: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and ex

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread R. David Murray
On Thu, 09 Jan 2014 00:12:57 +, wrote: > I think there might be a different analogy: Having to specify an > encoding is like having strong typing. In Python 2.7, we _can_ forego > that and just duck-type our strings :) Python is a strongly typed language. Saying that python2 let you duck t

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Mark Lawrence
On 09/01/2014 00:12, Kristján Valur Jónsson wrote: Just to avoid confusion, let me state up front that I am very well aware of encodings and all that, having internationalized one largish app in python 2.x. I know the problems that 2.x had with tracking down the source of errors and understan

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Ben Finney
Kristján Valur Jónsson writes: > I didn't used to must. Why must I must now? Did the universe just > shift when I fired up python3? In a sense, yes. The world of software has been shifting for decades, as a result of broader changes in how different segments of humanity have changed their int

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Kristján Valur Jónsson
, 2014 23:40 To: python-dev@python.org Subject: Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) Why *do* you care? Isn't your system configured for utf-8, and all your .txt files encoded with utf-8 by default? Or at least configured with a single consist

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Ben Finney
MRAB writes: > On 2014-01-09 00:07, Ben Finney wrote: > > Kristján Valur Jónsson writes: > >> Python 3 forces you to think about abstract concepts like encodings > >> when all you want is to open that .txt file on the drive and > >> extract some phone numbers and merge in some email addresses. W

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Kristján Valur Jónsson
s+kristjan=ccpgames@python.org] on behalf of Ben Finney [ben+pyt...@benfinney.id.au] Sent: Thursday, January 09, 2014 00:07 To: python-dev@python.org Subject: Re: [Python-Dev] Python3 "complexity" Kristján Valur Jónsson writes: > Python 3 forces you to think about abstract concept

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Mark Lawrence
On 09/01/2014 00:21, MRAB wrote: "I need a new battery." "What kind of battery?" "I don't care!" A neat summary of the draft requirements specification for Python 2.8. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Isaac Morland
On Wed, 8 Jan 2014, Kristján Valur Jónsson wrote: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Python 3 forces you to think about abstract concepts like encodings when all you want is to open that .txt file on the drive and

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread MRAB
On 2014-01-09 00:07, Ben Finney wrote: Kristján Valur Jónsson writes: Believe it or not, sometimes you really don't care about encodings. Sometimes you just want to parse text files. Files don't contain text, they contain bytes. Bytes only become text when filtered through the correct encodi

Re: [Python-Dev] Python3 "complexity"

2014-01-08 Thread Ben Finney
Kristján Valur Jónsson writes: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Files don't contain text, they contain bytes. Bytes only become text when filtered through the correct encoding. Python should not guess the encodi

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread R. David Murray
On Wed, 08 Jan 2014 22:04:56 +, wrote: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to > think about abstract concepts like encodings when all you want is to > open that .txt file on the drive and extr

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Victor Stinner
Hi, > Python 3 forces you to think about abstract concepts like encodings when all > you want is to open that .txt file on the drive and extract some phone > numbers and merge in some email addresses. You can open a text file using ascii + surrogateescape, or just open the file in binary. Vic
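A minimal sketch of the first suggestion, assuming a made-up file whose contents include one byte that is not valid ASCII. The file name, sample data, and phone numbers are all hypothetical.

```python
import os
import tempfile

# Hypothetical sample: mostly ASCII, plus one latin-1 byte (0xE9).
raw = b"name: Andr\xe9\nphone: 555-0100\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contacts.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # ascii + surrogateescape: undecodable bytes become lone surrogates
    # instead of raising, and are restored unchanged on the way out.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
        text = f.read()

    edited = text.replace("555-0100", "555-0199")

    with open(path, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
        f.write(edited)

    with open(path, "rb") as f:
        out = f.read()

print(out)
```

`newline=""` is passed only to keep line endings untouched on every platform. The second suggestion, opening with `"rb"`, avoids decoding altogether at the cost of working with bytes methods throughout.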

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Joao S. O. Bueno
On 8 January 2014 20:04, Kristján Valur Jónsson wrote: > Believe it or not, sometimes you really don't care about encodings. > Sometimes you just want to parse text files. Python 3 forces you to think > about abstract concepts like encodings when all you want is to open that .txt > file on the

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread Kristján Valur Jónsson
__ From: Python-Dev [python-dev-bounces+kristjan=ccpgames@python.org] on behalf of R. David Murray [rdmur...@bitdance.com] Sent: Wednesday, January 08, 2014 21:29 To: python-dev@python.org Subject: Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...) ... It

Re: [Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

2014-01-08 Thread R. David Murray
On Wed, 08 Jan 2014 19:22:08 +, "Matt Billenstein" wrote: > I started in Python blissfully unaware of unicode - it was a different time > for > sure, but what I knew from C worked pretty much the same in Python - I could > read some binary data out of a file, twiddle some bits, and write it b