Chris Angelico writes:
> I'm not saying that chardet is bad, but I *am* saying, and I stand
> by this, that an auto-detect option on file open is a bad idea.
I have used it by default in Emacs and XEmacs since 1990, and I
certainly haven't experienced it as a bad idea at *any* time in more
than
INADA Naoki writes:
> latin1 is OK but is it Pythonic?
Yes. EIBTI, including being explicit that you're doing something that
has semantics that you are ignoring but may come back to bite you or
somebody who naively uses your module.
There's nothing un-Pythonic about using potentially dangerous
Steven D'Aprano writes:
> I think that heuristics to guess the encoding have their role to play,
> if the caller understands the risks.
I think, for a language whose developers espouse a principle “In the
face of ambiguity, refuse the temptation to guess”, heuristics have no
role to play in the
On Fri, Jan 10, 2014 at 1:39 PM, Steven D'Aprano wrote:
> On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote:
>> On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik
>> wrote:
>> > 2. introduce autodetect mode to open functions
>> > 1. read and transform on the fly, maintaining
On Fri, Jan 10, 2014 at 2:03 AM, Joao S. O. Bueno wrote:
> On 9 January 2014 04:50, Lennart Regebro wrote:
>> To be honest, you can define text as "A stream of bytes that are split
>> up in lines separated by a linefeed", and do some basic text
>> processing like that. Just very *basic*, but stil
On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson
wrote:
> Do I speak Chinese to my grocer because China is a growing force in the
> world? Or start every discussion with my children with a negotiation on what
> language to use?
No, because your environment has a default language. And P
On Fri, Jan 10, 2014 at 12:22:02PM +1100, Chris Angelico wrote:
> On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik
> wrote:
> > 2. introduce autodetect mode to open functions
> > 1. read and transform on the fly, maintaining a buffer that
> > stores original bytes
> > and their
On 1/9/2014 6:25 PM, Chris Barker wrote:
as so -- I want to replace a bit of ascii text surrounded by arbitrary
binary:
(apologies for the py2...)
In [24]: b
Out[24]: '\x01\x00\xd1\x80\xd1a name\xd0\x80'
In [25]: u = b.decode('latin-1')
In [26]: u2 = u.replace('a name', 'a different name')
In [
On Thu, Jan 09, 2014 at 02:08:57PM -0800, Ethan Furman wrote:
> If latin1 is used to convert binary to text, how convoluted is it to then
> take chunks of that text and convert to int, or some other variety of
> unicode?
>
> For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
>
> If that were dec
On 10 Jan 2014 03:32, "Antoine Pitrou" wrote:
>
> On Fri, 10 Jan 2014 05:26:04 +1000
> Nick Coghlan wrote:
> >
> > We should probably include format_map for consistency with the str API.
>
> Yes, you're right.
>
> > >However, I
> > > also added bytearray into the mix, as bytearray objects should
On Fri, Jan 10, 2014 at 11:53 AM, anatoly techtonik wrote:
> 2. introduce autodetect mode to open functions
> 1. read and transform on the fly, maintaining a buffer that
> stores original bytes
> and their mapping to letters. The mapping is updated as bytes
> frequency
>
On 9 January 2014 04:50, Lennart Regebro wrote:
> To be honest, you can define text as "A stream of bytes that are split
> up in lines separated by a linefeed", and do some basic text
> processing like that. Just very *basic*, but still. Replacing
> characters. Extracting certain lines etc.
That
On Thu, Jan 9, 2014 at 10:00 AM, Mark Lawrence wrote:
> On 09/01/2014 06:50, Lennart Regebro wrote:
>>
>> On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney
>> wrote:
>>>
>>> Kristján Valur Jónsson writes:
>>>
Believe it or not, sometimes you really don't care about encodings.
Sometimes you ju
latin1 is OK but is it Pythonic?
I've posted a suggestion to add 'bytes' as an alias for 'latin1'.
http://comments.gmane.org/gmane.comp.python.ideas/10315
I want one Pythonic way to handle "binary containing ascii (or latin1 or
utf-8 or other ascii compatible)".
On Fri, Jan 10, 2014 at 8:53 AM
On Thu, Jan 9, 2014 at 3:14 PM, Ethan Furman wrote:
> Sorry, I was too short with my example. My use case is binary files, with
> ASCII metadata and binary metadata, as well as ASCII-encoded numeric
> values, binary-coded numeric values, ASCII-encoded boolean values, and
> who-knows-what-(before
On 01/09/2014 02:54 PM, Paul Moore wrote:
On 9 January 2014 22:08, Ethan Furman wrote:
For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
If that were decoded using latin1 how would I then get the first two bytes
to the integer 256 and the last six bytes to their Cyrillic meaning?
(Apologies for
On Thu, Jan 9, 2014 at 2:54 PM, Paul Moore
> For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
> >
> > If that were decoded using latin1 how would I then get the first two bytes
> > to the integer 256 and the last six bytes to their Cyrillic meaning?
> > (Apologies for not testing myself, short
On 9 January 2014 22:08, Ethan Furman wrote:
> For example: b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'
>
> If that were decoded using latin1 how would I then get the first two bytes
> to the integer 256 and the last six bytes to their Cyrillic meaning?
> (Apologies for not testing myself, short on time.)
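One way to answer the question above, as a minimal sketch rather than anything posted in the thread: slice the bytes and convert each piece on its own. The sketch assumes the fourth escape was meant to be `\x83` (so that the tail really is six bytes), that the integer is big-endian, and that the Cyrillic text is UTF-8; none of those is stated in the original message.

```python
# Assumed sample: '\x83' in place of the '\83' typed in the message.
data = b'\x01\x00\xd1\x80\xd1\x83\xd0\x80'

# Working on the bytes directly: no latin-1 detour is needed.
number = int.from_bytes(data[:2], byteorder='big')  # assumed big-endian
text = data[2:].decode('utf-8')                     # assumed UTF-8

# If the whole blob had already been decoded as latin-1, each slice can be
# turned back into the original bytes first and then converted as above.
as_latin1 = data.decode('latin-1')
number_via_latin1 = int.from_bytes(as_latin1[:2].encode('latin-1'), byteorder='big')
text_via_latin1 = as_latin1[2:].encode('latin-1').decode('utf-8')
```

The latin-1 route works only because every slice is encoded back to bytes before being reinterpreted, which is the extra step the question is asking about.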
On 01/09/2014 02:00 PM, Chris Barker wrote:
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
Chris Barker wrote:
latin-1 guaranteed to work with any binary data, and round-trip accurately?
Yes, it is.
and will surrogateescape work for arbitrary binary data?
Yes, it will.
Then ma
On 9 January 2014 22:00, Chris Barker wrote:
> On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
>>
>> > latin-1 guaranteed to work with any binary data, and round-trip
>> > accurately?
>>
>> Yes, it is.
>>
>> > and will surrogateescape work for arbitrary binary data?
>>
>> Yes, it will.
>
>
On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker wrote:
> On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
>
>> > latin-1 guaranteed to work with any binary data, and round-trip accurately?
>>
>> Yes, it is.
>>
>> > and will surrogateescape work for arbitrary binary data?
>>
>> Yes, it will.
On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou wrote:
> > latin-1 guaranteed to work with any binary data, and round-trip accurately?
>
> Yes, it is.
>
> > and will surrogateescape work for arbitrary binary data?
>
> Yes, it will.
>
Then maybe this is really a documentation issue, after all.
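The two answers quoted above can be checked with a short sketch. The sample values (`blob`, `mixed`) are made up for illustration; the codec names and the `surrogateescape` error handler are standard Python 3.

```python
# Every possible byte value, to stand in for arbitrary binary data.
blob = bytes(range(256))

# latin-1 maps each byte to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
latin1_text = blob.decode('latin-1')
latin1_roundtrip = latin1_text.encode('latin-1')

# surrogateescape smuggles undecodable bytes through as lone surrogates
# (U+DC80..U+DCFF) and turns them back into bytes on encoding.
mixed = b'name=\xff\xfe'
escaped_text = mixed.decode('ascii', errors='surrogateescape')
escaped_roundtrip = escaped_text.encode('ascii', errors='surrogateescape')
```

In both cases the round trip only holds if the same codec and error handler are used in each direction.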
On Thu, 9 Jan 2014 13:36:05 -0800
Chris Barker wrote:
>
> Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
> that guaranteed to work with any binary data, and round-trip accurately?
Yes, it is.
> and will surrogateescape work for arbitrary binary data?
Yes, it will.
Reg
I'm not sure how format_map helps in porting from 2 to 3, since it
doesn't exist in any version of 2.
Although that said, it's no doubt a useful feature, just not useful in
code that supports both 2 and 3 with a single code base or when porting
to 3.
Eric.
On 1/9/2014 4:02 PM, antoine.pitrou wro
This has all gotten a bit complicated because everyone has been thinking in
terms of actual encodings and actual text files. But I think the use-case
here is something different:
A file with a bunch of bytes in it, _some_ of which are ascii, and the rest
are other bytes (maybe binary data, maybe no
Thanks Nick. This does seem to cover it all. Perhaps it is worth mentioning
cp1252 as the Windows counterpart of latin1, which does _not_ define every byte
value and hence requires surrogateescape for a best-effort solution.
K
From: Nick Coghlan [ncogh...@gmail.com
On Fri, 10 Jan 2014 05:26:04 +1000
Nick Coghlan wrote:
>
> We should probably include format_map for consistency with the str API.
Yes, you're right.
> >However, I
> > also added bytearray into the mix, as bytearray objects should
> > generally support the same operations as bytes (and they can
On 9 Jan 2014 06:43, "Antoine Pitrou" wrote:
>
>
> Hi,
>
> With Victor's consent, I overhauled PEP 460 and made the feature set
> more restricted and consistent with the bytes/str separation.
+1
I was initially dubious about the idea, but the proposed semantics look
good to me.
We should probab
On 9 Jan 2014 22:25, "Kristján Valur Jónsson" wrote:
>
>
>
> > -----Original Message-----
> > From: Victor Stinner [mailto:victor.stin...@gmail.com]
> > Sent: 9 January 2014 13:51
> > To: Kristján Valur Jónsson
> > Cc: Antoine Pitrou; python-dev@python.org
> > Subject: Re: [Python-Dev] Python3 "co
On 9 Jan 2014 22:08, "Antoine Pitrou" wrote:
>
> On Thu, 9 Jan 2014 09:03:40 -0500
> Daniel Holth wrote:
> > They emphatically do not want the Python 2
> > model especially not implicit coercion. They only want additional
> > tools for text or string processing in Python 3.
>
> That's a good poin
On Jan 08, 2014, at 01:51 PM, Stephen J. Turnbull wrote:
>Benjamin Peterson writes:
>
> > I agree. This is a very important, much-requested feature for low-level
> > networking code.
>
>I hear it's much-requested, but is there any description of typical
>use cases?
The two unported libraries that
(Resending with an adjusted Subject and not through Gmane. Apologies for
duplicates.)
On Jan 08, 2014, at 01:51 PM, Stephen J. Turnbull wrote:
>Benjamin Peterson writes:
>
> > I agree. This is a very important, much-requested feature for low-level
> > networking code.
>
>I hear it's much-request
Steven D'Aprano writes:
> If it were, we wouldn't need text strings :-)
Speak for yourself, Kemosabe. Red man need Unicode, full meal not
just a few bytes.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/pyt
On 01/09/2014 03:39 AM, Serhiy Storchaka wrote:
07.01.14 22:51, Ethan Furman wrote:
AFAIK you don't write much C code. So perhaps C sources maintainability is not
too valuable for you.
I don't write much C code yet, no, but C source maintainability is even more important to me because o
> -----Original Message-----
> From: Victor Stinner [mailto:victor.stin...@gmail.com]
> Sent: 9 January 2014 13:51
> To: Kristján Valur Jónsson
> Cc: Antoine Pitrou; python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> 2014/1/9 Kristján Valur Jónsson :
> > This definition i
On Thu, Jan 09, 2014 at 01:00:59PM +, Kristján Valur Jónsson wrote:
> Which reminds me, can Python3 read text files with BOM automatically yet?
I'm not sure what you mean by that. If you mean, can Python3 distinguish
between UTF-16BE and UTF-16LE on the basis of a BOM, then it's been able
t
On Thu, 9 Jan 2014 09:03:40 -0500
Daniel Holth wrote:
> They emphatically do not want the Python 2
> model especially not implicit coercion. They only want additional
> tools for text or string processing in Python 3.
That's a good point. Now it's up to people who need those additional
tools to p
So the customer you're looking for is the person who cares a lot about
encodings, knows how to do Unicode correctly, and has noticed that
certain valid cases not limited to imperialist simpletons (dealing
with specific common things invented before 1996, dealing with mixed
encodings, doing what Nic
> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames@python.org] On Behalf Of Kristján Valur
> Jónsson
> Sent: 9 January 2014 13:37
> To: Antoine Pitrou; python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> This definition is f
2014/1/9 Kristján Valur Jónsson :
> This definition is funny, because according to Wikipedia, it is a "superset"
> of 8859-1 (latin1)
Bytes 0x80..0x9f are unassigned in ISO/IEC 8859-1... but are assigned
in (IANA's) ISO-8859-1.
Python implements the latter, ISO-8859-1.
Wikipedia says "This enc
> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou
> Sent: 9 January 2014 13:18
> To: python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> On Thu, 9 Jan 2014 12:55:35 +
> Kristján V
On 9 January 2014 13:00, Kristján Valur Jónsson wrote:
>> You don't say what problems, but I assume encoding/decoding errors. So the
>> files apparently weren't in the system encoding. OK, at that point I'd
>> probably say to heck with it and use latin-1. Assuming I was sure that (a)
>> I'd
>> ne
On Thu, 9 Jan 2014 12:55:35 +
Kristján Valur Jónsson wrote:
> > If you don't "care" about the encoding, why don't you use latin1?
> > Things will roundtrip fine and work as well as under Python 2.
>
> Because latin1 does not define all code points, giving you errors there.
>>> b = bytes(rang
> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames@python.org] On Behalf Of Antoine Pitrou
> Sent: 9 January 2014 12:42
> To: python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> On Thu, 9 Jan 2014 10:15:08 +
> Kristján V
> Right. But even latin-1, or better, cp1252 (on windows) does not solve it
> because these have undefined
> code points.
That's not true. latin-1 does not have undefined code points.
Regards,
Martin
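The disagreement above about undefined code points can be tested directly. This is a minimal sketch; the helper name `decodes_cleanly` is hypothetical, and the result reflects how Python's own `latin-1` and `cp1252` codecs behave, not the wording of either standard.

```python
def decodes_cleanly(data, encoding):
    """Return True if data decodes under encoding with the default strict handler."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


every_byte = bytes(range(256))

# Python's latin-1 codec accepts all 256 byte values.
latin1_ok = decodes_cleanly(every_byte, 'latin-1')

# Python's cp1252 codec rejects the byte values it leaves undefined.
cp1252_ok = decodes_cleanly(every_byte, 'cp1252')
undefined_in_cp1252 = [
    value for value in range(256)
    if not decodes_cleanly(bytes([value]), 'cp1252')
]
```

Printing `undefined_in_cp1252` shows which byte values the cp1252 codec refuses, which is the case where an error handler such as surrogateescape becomes relevant.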
> -----Original Message-----
> From: Paul Moore [mailto:p.f.mo...@gmail.com]
> Sent: 9 January 2014 10:53
> To: Kristján Valur Jónsson
> Cc: Stefan Ring; python-dev@python.org
> > Moving to python 3, I found that this quickly caused problems.
>
> You don't say what problems, but I assume encodin
On Thu, 9 Jan 2014 17:09:10 +1000
Nick Coghlan wrote:
>
> There's also the fact that POSIX folks are used to "r" and "rb" being
> the same thing.
Which fails immediately under Windows :-)
Regards
Antoine.
On Thu, 09 Jan 2014 03:54:13 +
MRAB wrote:
> I'm thinking that the "i" format could be used for signed integers and
> the "u" for unsigned integers. The width would be the number of bytes.
> You would also need to have a way of specifying the endianness.
>
> For example:
>
> >>> b'{:<2i}'.f
On Thu, 9 Jan 2014 10:15:08 +
Kristján Valur Jónsson wrote:
>
> Moving to python 3, I found that this quickly caused problems. So, I
> explicitly added an encoding. Better guess an encoding, something that is
> likely, e.g. cp1252
> with open(fn1, encoding='cp1252') as f1:
> with open
On Thu, Jan 09, 2014 at 05:11:06PM +1000, Nick Coghlan wrote:
> On 9 January 2014 10:07, Ben Finney wrote:
> > So, if what you want is to parse text and not get gibberish, you need to
> > *tell* Python what the encoding is. That's a brute fact of the world of
> > text in computing.
>
> Set the m
07.01.14 22:51, Ethan Furman wrote:
On 01/07/2014 12:39 PM, Serhiy Storchaka wrote:
* It clutters up hg log and hg blame results. Every time when you
change clinic.py to generate different output, it
touches multiple lines in all files which use Argument Clinic and
clutters up their histor
On 9 January 2014 10:15, Kristján Valur Jónsson wrote:
> Also, the problem I'm describing has to do with real world stuff.
> This is the python 2 program:
> with open(fn1) as f1:
>     with open(fn2, 'w') as f2:
>         f2.write(process_text(f1.read()))
>
> Moving to python 3, I found that this q
On 06.01.14 17:26, Michael Urman wrote:
> Here's some more guesswork. Does it seem possible that msiexec is
> trying to verify the revocation status of the certificate used to sign
> the python .msi file? Per
> http://blogs.technet.com/b/pki/archive/2006/11/30/basic-crl-checking-with-certutil.asp
On 9 Jan 2014 11:29, "INADA Naoki" wrote:
>
>
>> And I think everyone was well intentioned - and python3 covers most of the
>> bases, but working with binary data is not only a "wire-protocol programmer's"
>> problem.
If you're working with binary data, use the binary API offered by bytes,
bytear
> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames@python.org] On Behalf Of Stefan Ring
> Sent: 9 January 2014 09:32
> To: python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> > just became harder to use for that purpose.
>
>
Paul Moore writes:
> So I think that if this discussion is to be of any real benefit, a
> specific example is needed. I honestly don't think I've ever
> encountered a case where "Sometimes [I] just want to parse text
> files" and code that uses the default encoding (i.e., looks pretty
> much
On 08.01.14 16:03, Nick Coghlan wrote:
> On 9 January 2014 00:43, Bob Hanson wrote:
>> When I read this comment of yours, Guido, I immediately started
>> wondering about this. You may well be right -- indeed, I have a
>> very old install (c.2007) which has not been updated (other than
>> one or
> just became harder to use for that purpose.
The entire discussion reminds me very much of the situation with file
names in OS X. Whenever I want to look at an old zip file or tarball
which happens to have been lying around on my hard drive for a decade
or more, I can't because OS X insists that f
On 9 January 2014 09:01, Mark Shannon wrote:
> On 09/01/14 00:07, Ben Finney wrote:
>>
>> Kristján Valur Jónsson writes:
>>
>>> Believe it or not, sometimes you really don't care about encodings.
>>> Sometimes you just want to parse text files.
>>
>>
>> Files don't contain text, they contain byte
> -----Original Message-----
> From: Python-Dev [mailto:python-dev-
> bounces+kristjan=ccpgames@python.org] On Behalf Of Ben Finney
> Sent: 9 January 2014 00:50
> To: python-dev@python.org
> Subject: Re: [Python-Dev] Python3 "complexity"
>
> Kristján Valur Jónsson writes:
>
> > I didn't us
On 09/01/14 00:07, Ben Finney wrote:
Kristján Valur Jónsson writes:
Believe it or not, sometimes you really don't care about encodings.
Sometimes you just want to parse text files.
Files don't contain text, they contain bytes. Bytes only become text
when filtered through the correct encoding
On Thu, Jan 9, 2014 at 8:16 AM, Ben Finney wrote:
> Nick Coghlan writes:
>> Set the mode to "rb", process it as binary. Done.
>
> Which entails abandoning the stated goal of “just want to parse text
> files” :-)
Only if your definition of "text files" means it's unicode.
On Thu, Jan 9, 2014 at 5:50 PM, Lennart Regebro wrote:
> To be honest, you can define text as "A stream of bytes that are split
> up in lines separated by a linefeed", and do some basic text
> processing like that. Just very *basic*, but still. Replacing
> characters. Extracting certain lines etc.
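The kind of basic, encoding-agnostic processing described above can be sketched with the bytes API alone. The sample data and the `key:` prefix are invented for illustration; the point is only that splitting, filtering and replacing work without ever decoding.

```python
# Made-up sample: ASCII-looking lines with some non-ASCII bytes mixed in.
raw = b'key: value\nname: \xff\xfe binary\nkey: other\n'

# Split into lines on the linefeed byte.
lines = raw.split(b'\n')

# Extract certain lines by their ASCII prefix.
key_lines = [line for line in lines if line.startswith(b'key:')]

# Replace an ASCII token; the non-ASCII bytes pass through untouched.
replaced = raw.replace(b'key', b'KEY')
```

This stays correct only for ASCII-compatible data, where the linefeed byte and the ASCII tokens cannot appear inside a multi-byte character; with UTF-16, for example, the same code would split in the wrong places.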