[Python-Dev] Mailbox module - timings and functionality changes
I hope this is an appropriate dev topic. It seems to me that the unicode discussions of recent days are well highlighted by difficulties I am having using the mailbox module (hardly surprising given the difficulties of handling email generally) even though it passes its tests.
I can't find anything related in the issue tracker (symptoms: one program that works fine under Python 2 in under twenty seconds takes forever (over ten minutes) to fail while creating the (start, stop) index to the mailbox). My code reads Thunderbird mailboxen from file store on my Windows Vista system under 3.1. The failures I am experiencing could easily be encoding issues so I won't post any detail yet, but I am concerned about the timing - even when the code is "fixed", if it needs to be, the performance may still make the module of dubious value.
Can someone who is set up to do so easily just do a timing of test_mailbox under 2.6 and 3.2, to verify they see the same disparity as me? The test takes about twice as long under 3.1 here (and I am concerned that unexercised aspects of the code may extend real-world problem run times by an order of magnitude or more).
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
Ian Dury, 1942-2000
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Mailbox module - timings and functionality changes
Hello Steve,
> Can someone who is set up to do easily just do a timing of test_mailbox
> under 2.6 and 3.2, to verify they see the same disparity as me? The test
> takes about twice as long under 3.1 here
On Ubuntu timing was:
Python 2.6.5: 23.8sec
Python 2.7rc2: 32.7sec
Python 3.1.2: 32.3sec
All the best,
--
Miki
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, Jun 29, 2010 at 09:56:11AM -0400, Steve Holden wrote:
> Can someone who is set up to do easily just do a timing of test_mailbox
> under 2.6 and 3.2, to verify they see the same disparity as me? The test
Actually, No.
Python 2.7b2+ (trunk:81685M, Jun 4 2010, 21:52:06)
Ran 274 tests in 27.231s
OK
real 0m27.769s
user 0m1.110s
sys 0m0.440s
Python 3.2a0 (py3k:82364M, Jun 29 2010, 19:37:27)
Ran 268 tests in 24.444s
OK
real 0m25.126s
user 0m2.810s
sys 0m0.270s
This is under Ubuntu 64 Bit. Perhaps the problem you are observing is Windows only?
--
Senthil
Banectomy, n.: The removal of bruises on a banana.
-- Rich Hall, "Sniglets"
Re: [Python-Dev] Mailbox module - timings and functionality changes
Command line: ./python -m test.regrtest -v test_mailbox
trunk: Ran 274 tests in 25.239s
py3k: Ran 268 tests in 26.263s
So I don't see any substantial difference on a Kubuntu 10.04 box (both builds are recent'ish, but not completely up to date).
However, the underlying IO access is significantly different between POSIX and Windows, so there could still be something pathological happening at the filesystem manipulation layer. My comparisons are also 2.7 vs 3.2 rather than 2.6 vs 3.1.
Cheers,
Nick.
--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Mailbox module - timings and functionality changes
Nick Coghlan wrote:
> Command line: ./python -m test.regrtest -v test_mailbox
>
> trunk: Ran 274 tests in 25.239s
> py3k: Ran 268 tests in 26.263s
>
> So I don't see any substantial difference on a Kubuntu 10.04 box (both
> builds are recent'ish, but not completely up to date).
>
> However, the underlying IO access is significantly different between
> POSIX and Windows, so there could still be something pathological
> happening at the filesystem manipulation layer. My comparisons are
> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>
> Cheers,
> Nick.
>
Thanks for all the timings! If a Windows user could do the same thing that would help ...
regards
Steve
Re: [Python-Dev] Mailbox module - timings and functionality changes
Steve Holden wrote:
> Nick Coghlan wrote:
>> Command line: ./python -m test.regrtest -v test_mailbox
>>
>> trunk: Ran 274 tests in 25.239s
>> py3k: Ran 268 tests in 26.263s
>>
>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>> builds are recent'ish, but not completely up to date).
>>
>> However, the underlying IO access is significantly different between
>> POSIX and Windows, so there could still be something pathological
>> happening at the filesystem manipulation layer. My comparisons are
>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>
>> Cheers,
>> Nick.
>>
> Thanks for all the timings! If a Windows user could do the same thing
> that would help ...
>
And there is *definitely* a performance issue. I created a Thunderbird folder of 26 Google alerts and just parsed them all after reading them in from the mailbox.
2.5 (!): 0.78 sec
3.1: 42.80 sec
Rather than debate the code here perhaps I should just open an issue for this? I can then provide both a program and some data, which can be added to the tests if appropriate. The issue can clearly stand some investigation.
regards
Steve
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote:
>How many Python users will compile Python in debug mode ?
How many Python users compile Python at all? :)
>The point is that the default build of Python should use
>the correct production settings for the C compiler out of
>the box and that's what AC_PROG_CC is all about.
Sure.
>I'm pretty sure that Python developers who want to use a
>debug build have enough code foo to get the -O2 turned into a -O0
>either by adjust OPT and/or by providing their own CFLAGS env var.
Yes, but it's a PITA for several reasons, IMO:
* It's pretty underdocumented
* It's obscure
* It's hard to remember the exact fu needed because you do it infrequently
* I usually only remember my mistake when gdb acts funny
I strongly suggest that --with-pydebug should be all you need to ensure the best debugging environment, which means turning off compiler optimization. Last time I tried, the -O0 was added and it worked well. (I know this has been in flux though.)
>Also note that in some cases you may actually want to have
>a debug build with optimizations turned on, e.g. to track down
>a compiler optimization bug.
Yes, but that's *much* more rare than wanting to step through some bit of C code without going crazy.
-Barry
Re: [Python-Dev] Mailbox module - timings and functionality changes
On 29/06/2010 15:26, Steve Holden wrote:
> Nick Coghlan wrote:
>> Command line: ./python -m test.regrtest -v test_mailbox
>>
>> trunk: Ran 274 tests in 25.239s
>> py3k: Ran 268 tests in 26.263s
>>
>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>> builds are recent'ish, but not completely up to date).
>>
>> However, the underlying IO access is significantly different between
>> POSIX and Windows, so there could still be something pathological
>> happening at the filesystem manipulation layer. My comparisons are
>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>
>> Cheers,
>> Nick.
>>
> Thanks for all the timings! If a Windows user could do the same thing
> that would help ...
WinXP SP3
2.6 Ran 272 tests in 13.172s
3.1 Ran 267 tests in 15.735s
py3k
A *lot* of ERROR and FAIL tests
WinXP SP3
TJG
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
On Jun 28, 2010, at 06:03 PM, M.-A. Lemburg wrote:
>OPT already uses -O0 if --with-pydebug is used and the
>compiler supports -g. Since OPT gets added after CFLAGS, the override
>already happens...
So nobody's proposing to drop that? Good! Ignore my last message then. :)
-Barry
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, Jun 29, 2010 at 7:49 AM, Steve Holden wrote:
> Steve Holden wrote:
>> Nick Coghlan wrote:
>>> Command line: ./python -m test.regrtest -v test_mailbox
>>>
>>> trunk: Ran 274 tests in 25.239s
>>> py3k: Ran 268 tests in 26.263s
>>>
>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>>> builds are recent'ish, but not completely up to date).
>>>
>>> However, the underlying IO access is significantly different between
>>> POSIX and Windows, so there could still be something pathological
>>> happening at the filesystem manipulation layer. My comparisons are
>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>>
>>> Cheers,
>>> Nick.
>>>
>> Thanks for all the timings! If a Windows user could do the same thing
>> that would help ...
>>
> And there is *definitely a performance issue. I created a Thunderbird
> folder of 26 Google alerts and just parsed then all after reading them
> in from the mailbox.
>
> 2.5 (!): 0.78 sec
> 3.1 : 42.80 sec
>
> Rather than debate the code here perhaps I should just open an issue for
> this? I can then provide both a program and some data, which can be
> added to the tests if appropriate. The issue can clearly stand some
> investigation.
Since you have such a great reproducible test case, could you point the profiler at it? (Perhaps on a reduced dataset... The profiler multiplies your run time by some number between 2 and 10 IIRC.)
--
--Guido van Rossum (python.org/~guido)
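A minimal sketch of what pointing the profiler at a reduced dataset might look like, using the standard cProfile and pstats modules. The two-message mbox built on the fly, the file name sample.mbox, and the helpers parse_all and profile_parse are hypothetical stand-ins for Steve's actual program and Thunderbird data; on a current Python the report will not reproduce the 3.1 numbers above.

```python
import cProfile
import io
import mailbox
import os
import pstats
import tempfile

# Hypothetical stand-in for the real Thunderbird folder: two tiny messages.
SAMPLE = (
    "From alice@example.com Tue Jun 29 09:00:00 2010\n"
    "Subject: first\n"
    "\n"
    "Hello\n"
    "\n"
    "From bob@example.com Tue Jun 29 09:05:00 2010\n"
    "Subject: second\n"
    "\n"
    "World\n"
)


def parse_all(path):
    """Read every message in the mbox at *path* and return the subjects."""
    box = mailbox.mbox(path)
    try:
        return [msg["subject"] for msg in box]
    finally:
        box.close()


def profile_parse(path, top=10):
    """Run parse_all() under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(parse_all, path)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(top)  # top N by cumulative time
    return result, report.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", newline="") as f:
            f.write(SAMPLE)
        subjects, report = profile_parse(path)
        print(subjects)
        print(report)
```

Sorting by cumulative time puts the expensive call chain (here, whatever sits under the mbox iteration) near the top; running the whole script with `python -m cProfile`, as Steve does later in the thread, gives the same data without code changes.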
Re: [Python-Dev] Mailbox module - timings and functionality changes
On 29/06/2010 15:51, Tim Golden wrote:
> On 29/06/2010 15:26, Steve Holden wrote:
>> Nick Coghlan wrote:
>>> Command line: ./python -m test.regrtest -v test_mailbox
>>>
>>> trunk: Ran 274 tests in 25.239s
>>> py3k: Ran 268 tests in 26.263s
>>>
>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>>> builds are recent'ish, but not completely up to date).
>>>
>>> However, the underlying IO access is significantly different between
>>> POSIX and Windows, so there could still be something pathological
>>> happening at the filesystem manipulation layer. My comparisons are
>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>>
>>> Cheers,
>>> Nick.
>>>
>> Thanks for all the timings! If a Windows user could do the same thing
>> that would help ...
>
> WinXP SP3
> 2.6 Ran 272 tests in 13.172s
> 3.1 Ran 267 tests in 15.735s
>
> py3k
> A *lot* of ERROR and FAIL tests
py3k HEAD on Win7
Ran 268 tests in 34.055s
TJG
Re: [Python-Dev] Pickle security and remote logging
anatoly techtonik gmail.com> writes:
> insecure. SocketHandler and DatagramHandler docs should at least
> contain a warning about danger of exposing unpickling interfaces to
> insecure networks.
I've updated the documentation of SocketHandler.makePickle to mention security concerns, and that the method can be overridden to use a more secure implementation (e.g. HMAC-signed pickles).
Regards,
Vinay Sajip
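The HMAC-signed pickle idea can be sketched with the standard hmac, hashlib and pickle modules: the sender prefixes each pickle with a keyed digest, and the receiver refuses to call pickle.loads() on anything whose digest does not verify. The key, the tag-then-payload layout and the function names sign_pickle and load_signed_pickle are illustrative assumptions, not part of the logging package. This only protects against senders who do not hold the key, which is the limitation raised in the reply below.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; a real deployment would load it from
# configuration, never from the network.
SECRET_KEY = b"example-shared-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign_pickle(obj, key=SECRET_KEY):
    """Pickle *obj* and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed_pickle(data, key=SECRET_KEY):
    """Check the tag first; only unpickle if it matches."""
    tag, payload = data[:TAG_SIZE], data[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("HMAC verification failed; refusing to unpickle")
    return pickle.loads(payload)


if __name__ == "__main__":
    blob = sign_pickle({"msg": "hello", "levelno": 20})
    print(load_signed_pickle(blob))
```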
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
Barry Warsaw wrote:
> On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote:
>
>> How many Python users will compile Python in debug mode ?
>
> How many Python users compile Python at all? :)
>
>> The point is that the default build of Python should use
>> the correct production settings for the C compiler out of
>> the box and that's what AC_PROG_CC is all about.
>
> Sure.
>
>> I'm pretty sure that Python developers who want to use a
>> debug build have enough code foo to get the -O2 turned into a -O0
>> either by adjust OPT and/or by providing their own CFLAGS env var.
>
> Yes, but it's a PITA for several reasons, IMO:
>
> * It's pretty underdocumented
> * It's obscure
> * It's hard to remember the exact fu needed because you do it infrequently
> * I usually only remember my mistake when gdb acts funny
>
> I strongly suggest that --with-pydebug should be all you need to ensure the
> best debugging environment, which means turning off compiler optimization.
> Last time I tried, the -O0 was added and it worked well. (I know this has
> been in flux though.)
>
>> Also note that in some cases you may actually want to have
>> a debug build with optimizations turned on, e.g. to track down
>> a compiler optimization bug.
>
> Yes, but that's *much* more rare than wanting to step through some bit of C
> code without going crazy.
I agree - trying to step through -O2 optimized code isn't going to help debug your code, it's going to help you debug the optimizer. That's a very rare use case.
regards
Steve
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, 29 Jun 2010 11:40:50 -0400
Steve Holden wrote:
> Sure. I attach the outputs of both files, as well as the program and the
> data. With profiling (python -m cProfile test3.py) the run took less
> than a third of a second under 2.5, and 168 seconds under 3.1. I'd say
> that was problematical :)
>
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().
Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but still too much of it, is spent in TextIOWrapper.tell(). This seems to imply that mailbox files are opened in text mode, which sounds wrong to me. Perhaps Andrew can shed more light on this?
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, Jun 29, 2010 at 07:56:22AM -0700, Guido van Rossum wrote:
> Since you have such a great reproducible test case, could you point
> the profiler at it? (Perhaps on a reduced dataset... The profiler
> multiples your run time by some number between 2 and 10 IIRC.)
Let me underline Guido's suggestion. Steve, I've done a lot of mailbox.py stuff and can look at your problem, but off the top of my head, my suspicion would be that I/O is the culprit, and a profile could confirm that.
My thought is that mailbox.py is opening the file in some reading mode that ends up doing a lot more processing on Windows than on Unix because of universal newlines or something like that.
--amk
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().
The problem is actually in _generate_toc(), which is reading through
the entire file to figure out where all the 'From' lines that start
messages are located. TextIOWrapper()'s tell() method seems to be
very slow, so one help is to only call tell() when necessary; patch:
-> svn diff Lib/
Index: Lib/mailbox.py
===
--- Lib/mailbox.py (revision 82346)
+++ Lib/mailbox.py (working copy)
@@ -775,13 +775,14 @@
         starts, stops = [], []
         self._file.seek(0)
         while True:
-            line_pos = self._file.tell()
             line = self._file.readline()
             if line.startswith('From '):
+                line_pos = self._file.tell()
                 if len(stops) < len(starts):
                     stops.append(line_pos - len(os.linesep))
                 starts.append(line_pos)
             elif not line:
+                line_pos = self._file.tell()
                 stops.append(line_pos)
                 break
         self._toc = dict(enumerate(zip(starts, stops)))
But should mailboxes really be opened in a UTF-8 encoding, or should
they be treated as 7-bit text? I'll have to think about this.
--amk
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, 29 Jun 2010 18:34:22 +0200, Antoine Pitrou wrote:
> On Tue, 29 Jun 2010 11:40:50 -0400
> Steve Holden wrote:
> > Sure. I attach the outputs of both files, as well as the program and the
> > data. With profiling (python -m cProfile test3.py) the run took less
> > than a third of a second under 2.5, and 168 seconds under 3.1. I'd say
> > that was problematical :)
> >
> > I will leave the profiler output to speak for itself, since I can find
> > nothing much to say about it except that there's a hell of a lot of
> > decoding going on inside mailbox.iterkeys().
>
> Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but
> still too much of it, is spent in TextIOWrapper.tell(). This seems to
> imply that mailbox files are opened in text mode, which sounds wrong to
> me. Perhaps Andrew can shed more light on this?
Given the current state of the email package for python3, it makes sense that it would open them in text mode. email can't currently process bytes, only text.
--
R. David Murray www.bitdance.com
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, 29 Jun 2010 12:52:28 -0400
"A.M. Kuchling" wrote:
>
> But should mailboxes really be opened in a UTF-8 encoding, or should
> they be treated as 7-bit text? I'll have to think about this.
I don't see how you can assume UTF-8 for mailbox files, given that each message will have its particular encoding. Besides, Steve's profile results show that you are not using UTF-8, but rather the local encoding, which is cp1252 under his Windows setup.
Regards
Antoine.
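To illustrate the point that each message carries its own encoding, here is a small sketch that decodes a message body using the charset declared in that message's own Content-Type header. It relies on email.message_from_bytes, which was added in Python 3.2 and so is not available in the 3.1 release under discussion; the sample messages and the decode_body helper are hypothetical, and the ASCII fallback for a missing charset parameter is an assumption.

```python
import email

# Two hypothetical messages from the same mailbox, each declaring its own
# charset and using an 8bit transfer encoding.
LATIN1_MSG = (
    b"Subject: one\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)
UTF8_MSG = (
    b"Subject: two\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)


def decode_body(raw):
    """Parse one message from bytes and decode its body with its own charset."""
    msg = email.message_from_bytes(raw)
    body = msg.get_payload(decode=True)  # bytes, transfer encoding undone
    # Assumption: fall back to ASCII when no charset parameter is present.
    charset = msg.get_content_charset() or "ascii"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    for raw in (LATIN1_MSG, UTF8_MSG):
        print(repr(decode_body(raw)))
```

The same byte 0xE9 means something different in each message, which is why no single encoding chosen when the mailbox file is opened can be right for all of them.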
Re: [Python-Dev] Mailbox module - timings and functionality changes
A.M. Kuchling wrote:
> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>> I will leave the profiler output to speak for itself, since I can find
>> nothing much to say about it except that there's a hell of a lot of
>> decoding going on inside mailbox.iterkeys().
>
> The problem is actually in _generate_toc(), which is reading through
> the entire file to figure out where all the 'From' lines that start
> messages are located. TextIOWrapper()'s tell() method seems to be
> very slow, so one help is to only call tell() when necessary; patch:
>
> -> svn diff Lib/
> Index: Lib/mailbox.py
> ===
> --- Lib/mailbox.py (revision 82346)
> +++ Lib/mailbox.py (working copy)
> @@ -775,13 +775,14 @@
>          starts, stops = [], []
>          self._file.seek(0)
>          while True:
> -            line_pos = self._file.tell()
>              line = self._file.readline()
>              if line.startswith('From '):
> +                line_pos = self._file.tell()
>                  if len(stops) < len(starts):
>                      stops.append(line_pos - len(os.linesep))
>                  starts.append(line_pos)
>              elif not line:
> +                line_pos = self._file.tell()
>                  stops.append(line_pos)
>                  break
>          self._toc = dict(enumerate(zip(starts, stops)))
>
> But should mailboxes really be opened in a UTF-8 encoding, or should
> they be treated as 7-bit text? I'll have to think about this.
Neither! You can't open them as 7-bit text, because real-world email
does contain bytes whose ordinal value exceeds 127. You can't open them
using a text encoding because theoretically there might be ASCII headers
that indicate that parts of the content are in specific character sets
or encodings.
If only we had a data structure that easily allowed us to manipulate
8-bit characters ...
regards
Steve
Re: [Python-Dev] Mailbox module - timings and functionality changes
It should probably be opened in binary mode. Binary files do have a
.readline() method (returning a bytes object), and bytes objects have
a .startswith() method. The tell positions computed this way are even
compatible with those used by the text file. So you could do it this
way:
- open binary stream
- compute TOC by reading through it using .readline() and .tell()
- rewind (don't close)
- wrap the binary stream in a text stream
- use that for the rest of the code
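The steps above can be sketched roughly as follows. This is not mailbox.py's actual code: build_toc and toc_then_text are hypothetical names, the ASCII-plus-surrogateescape text layer is an assumption (so stray 8-bit bytes survive instead of raising), and unlike the stops computed in _generate_toc() these offsets are not trimmed by len(os.linesep). The position is taken with tell() before readline(), so each start offset points at the beginning of its 'From ' line. For a stateless codec such as ASCII, seeking the text layer to one of these byte offsets works in CPython, which is the compatibility described above; for other codecs that is an assumption to check.

```python
import io


def build_toc(binary_file):
    """Return {key: (start, stop)} byte offsets of the messages in an mbox.

    Offsets are taken before each readline(), so a start points at the
    first byte of its 'From ' line.
    """
    starts, stops = [], []
    binary_file.seek(0)
    while True:
        pos = binary_file.tell()       # a plain byte offset on a binary stream
        line = binary_file.readline()  # bytes, so no decoding happens here
        if line.startswith(b"From "):
            if len(stops) < len(starts):
                stops.append(pos)      # previous message ends where this starts
            starts.append(pos)
        elif not line:                 # end of file
            stops.append(pos)
            break
    return dict(enumerate(zip(starts, stops)))


def toc_then_text(binary_file, encoding="ascii", errors="surrogateescape"):
    """Compute the TOC on the binary stream, rewind, then wrap it as text."""
    toc = build_toc(binary_file)
    binary_file.seek(0)                # rewind rather than close and reopen
    text = io.TextIOWrapper(binary_file, encoding=encoding,
                            errors=errors, newline="")
    return toc, text
```

Only the TOC scan works on raw bytes; everything that wants str objects reads through the TextIOWrapper layered on the same open stream.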
--Guido
On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden wrote:
> A.M. Kuchling wrote:
>> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>>> I will leave the profiler output to speak for itself, since I can find
>>> nothing much to say about it except that there's a hell of a lot of
>>> decoding going on inside mailbox.iterkeys().
>>
>> The problem is actually in _generate_toc(), which is reading through
>> the entire file to figure out where all the 'From' lines that start
>> messages are located. TextIOWrapper()'s tell() method seems to be
>> very slow, so one help is to only call tell() when necessary; patch:
>>
>> -> svn diff Lib/
>> Index: Lib/mailbox.py
>> ===
>> --- Lib/mailbox.py (revision 82346)
>> +++ Lib/mailbox.py (working copy)
>> @@ -775,13 +775,14 @@
>>          starts, stops = [], []
>>          self._file.seek(0)
>>          while True:
>> -            line_pos = self._file.tell()
>>              line = self._file.readline()
>>              if line.startswith('From '):
>> +                line_pos = self._file.tell()
>>                  if len(stops) < len(starts):
>>                      stops.append(line_pos - len(os.linesep))
>>                  starts.append(line_pos)
>>              elif not line:
>> +                line_pos = self._file.tell()
>>                  stops.append(line_pos)
>>                  break
>>          self._toc = dict(enumerate(zip(starts, stops)))
>>
>> But should mailboxes really be opened in a UTF-8 encoding, or should
>> they be treated as 7-bit text? I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
>
> regards
> Steve
Re: [Python-Dev] Mailbox module - timings and functionality changes
Guido van Rossum wrote:
> It should probably be opened in binary mode. Binary files do have a
> .readline() method (returning a bytes object), and bytes objects have
> a .startswith() method. The tell positions computed this way are even
> compatible with those used by the text file. So you could do it this
> way:
>
> - open binary stream
> - compute TOC by reading through it using .readline() and .tell()
> - rewind (don't close)
Because closing is inefficient, or because it breaks the algorithm?
> - wrap the binary stream in a text stream
"wrap" how? The ultimate destiny of the text is twofold:
1) To be stored as some kind of LOB in a database, and
2) Therefrom to be reconstituted and parsed into email.Message objects.
Is the wrapping a one-off operation or a software layer? Sorry, being a
bit dense here, I know.
regards
Steve
> - use that for the rest of the code
>
> --Guido
Re: [Python-Dev] Pickle security and remote logging
On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote:
>
> I've updated the documentation of SocketHandler.makePickle to mention security
> concerns, and that the method can be overridden to use a more secure
> implementation (e.g. HMAC-signed pickles).
Thanks. But I doubt the HMAC complication helps to protect the logging server: if the shared key is compromised, the server becomes vulnerable. I would prefer an approach where no code execution is possible, i.e. some alternative way of serializing log data structures for transmission over the network. Protocol buffers come to mind first, but they seem to be overkill, and the stdlib doesn't include any implementation.
--
anatoly t.
Re: [Python-Dev] Pickle security and remote logging
On Tue, Jun 29, 2010 at 4:22 PM, anatoly techtonik wrote:
> On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote:
>>
>> I've updated the documentation of SocketHandler.makePickle to mention security
>> concerns, and that the method can be overridden to use a more secure
>> implementation (e.g. HMAC-signed pickles).
>
> Thanks. But I doubt HMAC complication helps to protect logging server.
> If shared key is compromised -server becomes vulnerable. I would
> prefer approach when no code execution is possible. Some alternative
> serialization way for transmitting log data structures over network.
> Protocol buffers first come in mind, but they seem to be an overkill,
> and stdlib doesn't include any implementation.
You could use marshal by default. It does not execute code when unmarshalling. A limitation is that it only supports built-in types like list, dict, string etc. but that might be just fine for logging data. Another option would be JSON. (Or XML, if you want bulky. :-)
As for protocol buffers, assuming its absence (so far :-) from the stdlib is the only objection, how hard would it be to make the logging package "prepared" so that if one *did* have protocol buffers installed, it would be a one-line config setting to use them?
--
--Guido van Rossum (python.org/~guido)
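Of the options listed above, JSON is easy to sketch: a SocketHandler subclass can override makePickle() to send a JSON document instead of a pickle, keeping the same 4-byte big-endian length prefix that the stock makePickle() uses, and the receiver can rebuild the record with logging.makeLogRecord(). JSONSocketHandler and decode_record are hypothetical names, not an API the logging package provides. Tracebacks are flattened to text (via exc_text), and anything else that is not JSON-serializable is converted with str().

```python
import json
import logging
import logging.handlers
import struct


class JSONSocketHandler(logging.handlers.SocketHandler):
    """Hypothetical SocketHandler variant that sends JSON, not a pickle."""

    def makePickle(self, record):
        # Same framing as the stock makePickle(): a 4-byte big-endian length
        # prefix followed by the payload; only the payload format changes.
        if record.exc_info:
            self.format(record)  # side effect: fills in record.exc_text
        d = dict(record.__dict__)
        d["msg"] = record.getMessage()  # merge args into the message text
        d["args"] = None
        d["exc_info"] = None  # traceback objects cannot be serialized
        d.pop("message", None)
        payload = json.dumps(d, default=str).encode("utf-8")
        return struct.pack(">L", len(payload)) + payload


def decode_record(frame):
    """Receiver side: rebuild a LogRecord from one length-prefixed frame."""
    (length,) = struct.unpack(">L", frame[:4])
    data = json.loads(frame[4:4 + length].decode("utf-8"))
    return logging.makeLogRecord(data)  # plain data in, no code executed
```

Attaching it is the usual logger.addHandler(JSONSocketHandler(host, port)); the receiving end would call decode_record() where it previously called pickle.loads().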
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote:
> A.M. Kuchling wrote:
> > But should mailboxes really be opened in a UTF-8 encoding, or should
> > they be treated as 7-bit text? I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
email6 *will* handle this use case. When it exists :) But note that it is *not* just a matter of easily handling 8 bit characters. There are a whole bunch of algorithms needed for interpreting that 7 and 8 bit data. All the info is there in the email headers, but being able to do string operations on 8 bit byte strings doesn't get you the answers you need by itself.
It really is the case that the Python3 bytes/unicode split forces us to redo most of the algorithms so that they handle bytes and text *correctly*. This isn't a trivial undertaking, but the end result will be well worth it.
--
R. David Murray www.bitdance.com
Re: [Python-Dev] Mailbox module - timings and functionality changes
On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden wrote:
> Guido van Rossum wrote:
> >
> > - wrap the binary stream in a text stream
>
> "wrap" how? The ultimate destiny of the text is twofold:

I would imagine Guido is talking about an io.TextIOWrapper... in other
words, take the binary file you've just finished grabbing info from,
and reread it as a text file in order to grab the actual message content.

If you have messages in your files that are using an 8bit content
transfer encoding, then you (currently) will have some problems unless
the charset happens to be the one you use when you wrap the binary
stream as a text stream.

--
R. David Murray      www.bitdance.com
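The two-pass approach David describes could be sketched as follows. First, build the (start, stop) index on the raw bytes. Then re-read each message through an `io.TextIOWrapper`.

This is a minimal illustration, not how the `mailbox` module actually does it. Both helper names are hypothetical, and the "From " boundary test ignores mbox `>From` quoting and any Thunderbird-specific details.

```python
import io


def index_mbox(binary):
    """Hypothetical helper: (start, stop) byte offsets of each mbox message.

    Works purely on bytes, so 8-bit content can never raise a decode error.
    """
    starts = []
    pos = 0
    binary.seek(0)
    for line in binary:
        # Simplification: every line starting with b"From " opens a message.
        if line.startswith(b"From "):
            starts.append(pos)
        pos += len(line)
    stops = starts[1:] + [pos]
    return list(zip(starts, stops))


def message_text(binary, start, stop, encoding):
    """Hypothetical helper: re-read one message through a text wrapper."""
    binary.seek(start)
    chunk = binary.read(stop - start)
    # Default errors="strict": a charset mismatch raises UnicodeDecodeError
    # instead of silently producing mojibake. newline="" leaves line
    # endings untranslated.
    with io.TextIOWrapper(io.BytesIO(chunk), encoding=encoding, newline="") as text:
        return text.read()
```

The caveat in the message applies directly here. An 8bit body in ISO-8859-1 reads correctly only when the wrapper's `encoding` matches, and wrapping it as UTF-8 fails.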
Re: [Python-Dev] Mailbox module - timings and functionality changes
R. David Murray wrote:
> On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote:
>> A.M. Kuchling wrote:
>>> But should mailboxes really be opened in a UTF-8 encoding, or should
>>> they be treated as 7-bit text? I'll have to think about this.
>> Neither! You can't open them as 7-bit text, because real-world email
>> does contain bytes whose ordinal value exceeds 127. You can't open them
>> using a text encoding because theoretically there might be ASCII headers
>> that indicate that parts of the content are in specific character sets
>> or encodings.
>>
>> If only we had a data structure that easily allowed us to manipulate
>> 8-bit characters ...
>
> email6 *will* handle this use case. When it exists :) But note that it
> is *not* just a matter of easily handling 8 bit characters. There are
> a whole bunch of algorithms needed for interpreting that 7 and 8 bit data.
> All the info is there in the email headers, but being able to do string
> operations on 8 bit byte strings doesn't get you the answers you need
> by itself.
>
> It really is the case that the Python3 bytes/unicode split forces us
> to redo most of the algorithms so that they handle bytes and text
> *correctly*. This isn't a trivial undertaking, but the end result
> will be well worth it.
>
I completely agree. The unusual thing here is that I, of all people,
should find myself running into these issues, since my use of Python is
normally pretty conservative. Since the course I am currently writing is
already overdue, I have to find answers now to problems that were present
in the initial 3.0 release and have not received much attention since.

You know that I support your work to revise the email package. I hope
that we can eventually have it incorporate mailbox readers as well.

regards
 Steve
--
Steve Holden           +1 571 484 6266   +1 800 494 3119
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
                                     Ian Dury, 1942-2000
[Python-Dev] OS X buildbots: why am I skipping these tests?
My Leopard and Tiger PPC buildbots are momentarily green! But I'm
looking into why I'm skipping some tests. My buildbots are up-to-date
OS-wise and very vanilla, with the latest applicable Xcode.
4 skips unexpected on darwin:
test_gdb test_ioctl test_readline test_ttk_guionly
Three of these (gdb, readline, ttk_guionly) are just bad predictions of
which tests should skip on Darwin, I think -- gdb is only version 6, so
that test won't run, readline doesn't get built, ttk doesn't work
without Tcl/Tk 8.5. But the skip of test_ioctl baffles me.
"test_ioctl skipped -- Unable to open /dev/tty"
But when I log in via ssh and try it with the system python:
~ wjanssen$ python
Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/dev/tty")
>>>
Seems to work fine. So this I don't understand. Any ideas, anyone?
Bill
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
Steve Holden writes:
> I agree - trying to step through -O2 optimized code isn't going to
> help debug your code, it's going to help you debug the
> optimizer. That's a very rare use case.

Not really. I don't have a lot of practice in debugging at that level,
so take it with a grain of salt, but what I've found with XEmacs code is
that debugging at -O0 is less often helpful than debugging at -O2.
Quite often a naive compilation strategy is used, which basically turns
those C statements into macros for the underlying assembler, and the
code works the way the author thinks it should. But his assumptions are
invalid, and when optimized it fails. So I guess you can call that
"debugging the optimizer" if you like.
Re: [Python-Dev] OS X buildbots: why am I skipping these tests?
On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen wrote:
> My Leopard and Tiger PPC buildbots are momentarily green! But I'm
> looking into why I'm skipping some tests. My buildbots are up-to-date
> OS-wise and very vanilla, with the latest applicable Xcode.
>
> 4 skips unexpected on darwin:
> test_gdb test_ioctl test_readline test_ttk_guionly
>
> Three of these (gdb, readline, ttk_guionly) are just bad predictions of
> which tests should skip on Darwin, I think -- gdb is only version 6, so
> that test won't run, readline doesn't get built, ttk doesn't work
> without Tcl/Tk 8.5.
So it looks like you could get readline and ttk to run and pass by
separately downloading and installing readline (I've done this many
times before) and Tcl/Tk (no idea but I suppose it should work).
> But the the skip of test_ioctl baffles me.
>
> "test_ioctl skipped -- Unable to open /dev/tty"
>
> But when I log in via ssh and try it with the system python:
>
> ~ wjanssen$ python
> Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34)
> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> open("/dev/tty")
> >>>
>
> Seems to work fine. So this I don't understand. Any ideas, anyone?
Maybe the buildbot runs the tests as a tty-less daemon process. If you
ask me it's pretty crazy to have a test that requires a tty. But there
you have it -- and it's the same in Python 3. (But then again, who
knows, I might have written that test. ;-)
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] OS X buildbots: why am I skipping these tests?
> Seems to work fine. So this I don't understand. Any ideas, anyone?

Didn't we discuss this before? The buildbot slave has no controlling
terminal anymore, hence it cannot open /dev/tty.

If you are curious, just patch your checkout to output the exact errno
(e.g. to stdout), and trigger a build through the web.

Regards,
Martin
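Martin's suggestion to print the exact errno could be done with a small probe like the one below, run from the buildbot. It is a hedged sketch: `probe_tty` is a hypothetical helper, not part of the test suite.

On Linux, opening /dev/tty without a controlling terminal typically fails with ENXIO. Whether macOS reports the same code is an assumption worth checking with exactly this probe.

```python
import errno
import os


def probe_tty(path="/dev/tty"):
    """Hypothetical helper: report why the terminal can't be opened, if it can't."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. "ENXIO".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    print("open /dev/tty:", probe_tty())
```

Run interactively over ssh, this should print "ok". Run under the buildbot daemon, the printed code shows whether the missing controlling terminal is the cause.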
[Python-Dev] Taking over the Mercurial Migration
It seems that both Dirkjan and Brett are very caught up with real life
for the coming months. So I suggest that some other committer who favors
the Mercurial transition steps forward and takes over this project.

If nobody volunteers, I propose that we release 3.2 from Subversion, and
reconsider Mercurial migration next year.

Regards,
Martin
[Python-Dev] Taking over the Mercurial Migration
"Martin v. Löwis" writes:
> It seems that both Dirkjan and Brett are very caught up
> with real life for the coming months. So I suggest that
> some other committer who favors the Mercurial transition
> steps forward and takes over this project.

I am not a committer, and am not intimately familiar with PEP 385, so I
am not the appropriate person to become the proponent, I think. However,
I am one of the PEP 374 co-authors, and have experience with a previous
transition to Mercurial of similar scale (XEmacs). I can promise to
devote time to the transition in July and August, in support of whoever
might step forward. I hope someone does.

> If nobody volunteers, I propose that we release 3.2
> from Subversion, and reconsider Mercurial migration
> next year.

In the absence of a volunteer, I think that's probably necessary.
