[Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Steve Holden
I hope this is an appropriate dev topic.

It seems to me that the unicode discussions of recent days are well
highlighted by difficulties I am having using the mailbox module (hardly
surprising given the difficulties of handling email generally) even
though it passes its tests.

I can't find anything related in the issue tracker (symptoms: one
program that works fine under Python 2 in under twenty seconds takes
forever (over ten minutes) to fail while creating the (start, stop)
index to the mailbox). My code reads Thunderbird mailboxen from file
store on my Windows Vista system under 3.1.

The failures I am experiencing could easily be encoding issues so I
won't post any detail yet, but I am concerned about the timing - even
when the code is fixed, if it needs to be, the performance may still
make the module of dubious value.

Can someone who is set up to do so easily just run a timing of test_mailbox
under 2.6 and 3.2, to verify they see the same disparity as me? The test
takes about twice as long under 3.1 here (and I am concerned that
unexercised aspects of the code may extend real-world problem run times
by an order of magnitude or more).

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Miki Tebeka
Hello Steve,

> Can someone who is set up to do easily just do a timing of test_mailbox
> under 2.6 and 3.2, to verify they see the same disparity as me? The test
> takes about twice as long under 3.1 here

On Ubuntu timing was:

Python 2.6.5:  23.8sec
Python 2.7rc2: 32.7sec
Python 3.1.2:  32.3sec

All the best,
--
Miki


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Senthil Kumaran
On Tue, Jun 29, 2010 at 09:56:11AM -0400, Steve Holden wrote:
> Can someone who is set up to do easily just do a timing of test_mailbox
> under 2.6 and 3.2, to verify they see the same disparity as me? The test

Actually, No.

Python 2.7b2+ (trunk:81685M, Jun  4 2010, 21:52:06) 
Ran 274 tests in 27.231s

OK

real    0m27.769s
user    0m1.110s
sys     0m0.440s

Python 3.2a0 (py3k:82364M, Jun 29 2010, 19:37:27)

Ran 268 tests in 24.444s

OK

real    0m25.126s
user    0m2.810s
sys     0m0.270s
07:39 PM:senthil@:~/python/py3k

This is under Ubuntu 64 Bit.
Perhaps, the problem you are observing is Windows Only?

-- 
Senthil

Banectomy, n.:
The removal of bruises on a banana.
-- Rich Hall, Sniglets


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Nick Coghlan
Command line: ./python -m test.regrtest -v test_mailbox

trunk: Ran 274 tests in 25.239s
py3k: Ran 268 tests in 26.263s

So I don't see any substantial difference on a Kubuntu 10.04 box (both
builds are recent'ish, but not completely up to date).

However, the underlying IO access is significantly different between
POSIX and Windows, so there could still be something pathological
happening at the filesystem manipulation layer. My comparisons are
also 2.7 vs 3.2 rather than 2.6 vs 3.1.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Steve Holden
Nick Coghlan wrote:
> Command line: ./python -m test.regrtest -v test_mailbox
>
> trunk: Ran 274 tests in 25.239s
> py3k: Ran 268 tests in 26.263s
>
> So I don't see any substantial difference on a Kubuntu 10.04 box (both
> builds are recent'ish, but not completely up to date).
>
> However, the underlying IO access is significantly different between
> POSIX and Windows, so there could still be something pathological
> happening at the filesystem manipulation layer. My comparisons are
> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>
> Cheers,
> Nick.

Thanks for all the timings! If a Windows user could do the same thing
that would help ...

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-29 Thread Barry Warsaw
On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote:

> How many Python users will compile Python in debug mode ?

How many Python users compile Python at all? :)

> The point is that the default build of Python should use
> the correct production settings for the C compiler out of
> the box and that's what AC_PROG_CC is all about.

Sure.

> I'm pretty sure that Python developers who want to use a
> debug build have enough code foo to get the -O2 turned into a -O0
> either by adjust OPT and/or by providing their own CFLAGS env var.

Yes, but it's a PITA for several reasons, IMO:

* It's pretty underdocumented
* It's obscure
* It's hard to remember the exact fu needed because you do it infrequently
* I usually only remember my mistake when gdb acts funny

I strongly suggest that --with-pydebug should be all you need to ensure the
best debugging environment, which means turning off compiler optimization.
Last time I tried, the -O0 was added and it worked well.  (I know this has
been in flux though.)

> Also note that in some cases you may actually want to have
> a debug build with optimizations turned on, e.g. to track down
> a compiler optimization bug.

Yes, but that's *much* more rare than wanting to step through some bit of C
code without going crazy.

-Barry




Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Tim Golden

On 29/06/2010 15:26, Steve Holden wrote:
> Nick Coghlan wrote:
>> Command line: ./python -m test.regrtest -v test_mailbox
>>
>> trunk: Ran 274 tests in 25.239s
>> py3k: Ran 268 tests in 26.263s
>>
>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>> builds are recent'ish, but not completely up to date).
>>
>> However, the underlying IO access is significantly different between
>> POSIX and Windows, so there could still be something pathological
>> happening at the filesystem manipulation layer. My comparisons are
>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>
>> Cheers,
>> Nick.
>
> Thanks for all the timings! If a Windows user could do the same thing
> that would help ...


WinXP SP3

2.6 Ran 272 tests in 13.172s
3.1 Ran 267 tests in 15.735s
py3k A *lot* of ERROR and FAIL tests

WinXP SP3

TJG


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-29 Thread Barry Warsaw
On Jun 28, 2010, at 06:03 PM, M.-A. Lemburg wrote:

> OPT already uses -O0 if --with-pydebug is used and the
> compiler supports -g. Since OPT gets added after CFLAGS, the override
> already happens...

So nobody's proposing to drop that?  Good!  Ignore my last message then. :)

-Barry




Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Tim Golden

On 29/06/2010 15:51, Tim Golden wrote:
> On 29/06/2010 15:26, Steve Holden wrote:
>> Nick Coghlan wrote:
>>> Command line: ./python -m test.regrtest -v test_mailbox
>>>
>>> trunk: Ran 274 tests in 25.239s
>>> py3k: Ran 268 tests in 26.263s
>>>
>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both
>>> builds are recent'ish, but not completely up to date).
>>>
>>> However, the underlying IO access is significantly different between
>>> POSIX and Windows, so there could still be something pathological
>>> happening at the filesystem manipulation layer. My comparisons are
>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1.
>>>
>>> Cheers,
>>> Nick.
>>
>> Thanks for all the timings! If a Windows user could do the same thing
>> that would help ...
>
> WinXP SP3
>
> 2.6 Ran 272 tests in 13.172s
> 3.1 Ran 267 tests in 15.735s
> py3k A *lot* of ERROR and FAIL tests


py3k HEAD on Win7 Ran 268 tests in 34.055s

TJG


Re: [Python-Dev] Pickle security and remote logging

2010-06-29 Thread Vinay Sajip
anatoly techtonik <techtonik at gmail.com> writes:

> insecure. SocketHandler and DatagramHandler docs should at least
> contain a warning about danger of exposing unpickling interfaces to
> insecure networks.

I've updated the documentation of SocketHandler.makePickle to mention security
concerns, and that the method can be overridden to use a more secure
implementation (e.g. HMAC-signed pickles).
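
A minimal sketch of what such an override might look like, assuming a
shared secret that is distributed out of band. The names
SignedSocketHandler, verify_and_load and SECRET are hypothetical and do
not come from the logging package or from the updated documentation; the
receiving side must check the digest before it unpickles anything.

```python
import hashlib
import hmac
import logging
import logging.handlers
import pickle
import struct

SECRET = b"hypothetical-shared-key"          # assumption: agreed out of band
DIGEST_SIZE = hashlib.sha256().digest_size   # 32 bytes for SHA-256


class SignedSocketHandler(logging.handlers.SocketHandler):
    """Hypothetical handler that prepends an HMAC to each pickled record."""

    def makePickle(self, record):
        framed = super().makePickle(record)   # 4-byte length prefix + pickle
        payload = framed[4:]
        digest = hmac.new(SECRET, payload, hashlib.sha256).digest()
        body = digest + payload
        return struct.pack(">L", len(body)) + body


def verify_and_load(body):
    """Check the HMAC first; unpickle only if the signature matches."""
    digest, payload = body[:DIGEST_SIZE], body[DIGEST_SIZE:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("bad signature, refusing to unpickle")
    return pickle.loads(payload)


# No connection is made here; the socket is only created on first send.
handler = SignedSocketHandler("localhost", 9020)
record = logging.makeLogRecord({"name": "demo", "msg": "hello"})
frame = handler.makePickle(record)
attrs = verify_and_load(frame[4:])
handler.close()
```

Signing only authenticates the sender. Anyone who holds the key can still
send a malicious pickle, so this narrows the exposure rather than removing
it.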

Regards,

Vinay Sajip



Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-29 Thread Steve Holden
Barry Warsaw wrote:
> On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote:
>
>> How many Python users will compile Python in debug mode ?
>
> How many Python users compile Python at all? :)
>
>> The point is that the default build of Python should use
>> the correct production settings for the C compiler out of
>> the box and that's what AC_PROG_CC is all about.
>
> Sure.
>
>> I'm pretty sure that Python developers who want to use a
>> debug build have enough code foo to get the -O2 turned into a -O0
>> either by adjust OPT and/or by providing their own CFLAGS env var.
>
> Yes, but it's a PITA for several reasons, IMO:
>
> * It's pretty underdocumented
> * It's obscure
> * It's hard to remember the exact fu needed because you do it infrequently
> * I usually only remember my mistake when gdb acts funny
>
> I strongly suggest that --with-pydebug should be all you need to ensure the
> best debugging environment, which means turning off compiler optimization.
> Last time I tried, the -O0 was added and it worked well.  (I know this has
> been in flux though.)
>
>> Also note that in some cases you may actually want to have
>> a debug build with optimizations turned on, e.g. to track down
>> a compiler optimization bug.
>
> Yes, but that's *much* more rare than wanting to step through some bit of C
> code without going crazy.

I agree - trying to step through -O2 optimized code isn't going to help
debug your code, it's going to help you debug the optimizer. That's a
very rare use case.

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000



Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Antoine Pitrou
On Tue, 29 Jun 2010 11:40:50 -0400
Steve Holden <st...@holdenweb.com> wrote:
> Sure. I attach the outputs of both files, as well as the program and the
> data. With profiling (python -m cProfile test3.py) the run took less
> than a third of a second under 2.5, and 168 seconds under 3.1. I'd say
> that was problematical :)
>
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().

Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but
still too much of it, is spent in TextIOWrapper.tell(). This seems to
imply that mailbox files are opened in text mode, which sounds wrong to
me. Perhaps Andrew can shed more light on this?





Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread A.M. Kuchling
On Tue, Jun 29, 2010 at 07:56:22AM -0700, Guido van Rossum wrote:
> Since you have such a great reproducible test case, could you point
> the profiler at it? (Perhaps on a reduced dataset... The profiler
> multiplies your run time by some number between 2 and 10 IIRC.)

Let me underline Guido's suggestion.  Steve, I've done a lot of
mailbox.py stuff and can look at your problem, but off the top of my
head, my suspicion would be that I/O is the culprit, and a profile
could confirm that.  My thought is that mailbox.py is opening the file
in some reading mode that ends up doing a lot more processing on
Windows than on Unix because of universal newlines or something like
that.
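
A minimal sketch of pointing the profiler at mailbox indexing from inside
a script instead of via "python -m cProfile". The synthetic 200-message
mbox file and the helper name count_messages are assumptions made for
illustration; they are not Steve's test3.py or his data.

```python
import cProfile
import io
import mailbox
import os
import pstats
import tempfile


def count_messages(path):
    """Force the mbox table of contents to be built and count the keys."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(list(box.iterkeys()))
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.mbox")
    # Write a small synthetic mailbox; each message ends with a blank line.
    with open(path, "wb") as f:
        for i in range(200):
            f.write(b"From sender@example.com Tue Jun 29 09:56:11 2010\n")
            f.write(b"Subject: message %d\n\nbody\n\n" % i)

    profiler = cProfile.Profile()
    message_count = profiler.runcall(count_messages, path)

# Collect the ten most expensive entries by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Sorting by cumulative time makes it easier to see whether the cost sits in
the mailbox code itself or in the I/O and decoding layers underneath it.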

--amk


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread A.M. Kuchling
On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().

The problem is actually in _generate_toc(), which is reading through
the entire file to figure out where all the 'From' lines that start
messages are located.  TextIOWrapper()'s tell() method seems to be
very slow, so one help is to only call tell() when necessary; patch:

-> svn diff Lib/
Index: Lib/mailbox.py
===================================================================
--- Lib/mailbox.py  (revision 82346)
+++ Lib/mailbox.py  (working copy)
@@ -775,13 +775,14 @@
         starts, stops = [], []
         self._file.seek(0)
         while True:
-            line_pos = self._file.tell()
             line = self._file.readline()
             if line.startswith('From '):
+                line_pos = self._file.tell()
                 if len(stops) < len(starts):
                     stops.append(line_pos - len(os.linesep))
                 starts.append(line_pos)
             elif not line:
+                line_pos = self._file.tell()
                 stops.append(line_pos)
                 break
         self._toc = dict(enumerate(zip(starts, stops)))

But should mailboxes really be opened in a UTF-8 encoding, or should
they be treated as 7-bit text?  I'll have to think about this.

--amk


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread R. David Murray
On Tue, 29 Jun 2010 18:34:22 +0200, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Tue, 29 Jun 2010 11:40:50 -0400
> Steve Holden <st...@holdenweb.com> wrote:
>> Sure. I attach the outputs of both files, as well as the program and the
>> data. With profiling (python -m cProfile test3.py) the run took less
>> than a third of a second under 2.5, and 168 seconds under 3.1. I'd say
>> that was problematical :)
>>
>> I will leave the profiler output to speak for itself, since I can find
>> nothing much to say about it except that there's a hell of a lot of
>> decoding going on inside mailbox.iterkeys().
>
> Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but
> still too much of it, is spent in TextIOWrapper.tell(). This seems to
> imply that mailbox files are opened in text mode, which sounds wrong to
> me. Perhaps Andrew can shed more light on this?

Given the current state of the email package for python3, it makes
sense that it would open them in text mode.  email can't currently
process bytes, only text.

--
R. David Murray  www.bitdance.com


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Antoine Pitrou
On Tue, 29 Jun 2010 12:52:28 -0400
A.M. Kuchling <a...@amk.ca> wrote:
>
> But should mailboxes really be opened in a UTF-8 encoding, or should
> they be treated as 7-bit text?  I'll have to think about this.

I don't see how you can assume UTF-8 for mailbox files, given that each
message will have its particular encoding.
Besides, Steve's profile results show that you are not using UTF-8, but
rather the local encoding, which is cp1252 under his Windows setup.
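
A minimal sketch of the point about the local encoding, assuming a Python 3
release of this era where text streams created without an explicit
encoding fall back to the locale's preferred encoding (later releases may
choose the default differently). The sample bytes are made up. The second
half shows one way such a default can fail on real mail: byte 0x81 has no
mapping in Python's cp1252 codec, while latin-1 maps every byte.

```python
import io
import locale

# The encoding text-mode streams fall back to when none is given.
default_encoding = locale.getpreferredencoding(False)

# A text wrapper created without encoding=... reports what it picked.
wrapper = io.TextIOWrapper(io.BytesIO(b"From nobody\n"))
wrapper_encoding = wrapper.encoding
print(default_encoding, wrapper_encoding)

# An 8-bit byte that cp1252 cannot decode but latin-1 can.
raw = b"From nobody\nbody byte \x81\n"
decode_error = None
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    decode_error = exc
latin1_text = raw.decode("latin-1")
```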

Regards

Antoine.




Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Steve Holden
A.M. Kuchling wrote:
> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>> I will leave the profiler output to speak for itself, since I can find
>> nothing much to say about it except that there's a hell of a lot of
>> decoding going on inside mailbox.iterkeys().
>
> The problem is actually in _generate_toc(), which is reading through
> the entire file to figure out where all the 'From' lines that start
> messages are located.  TextIOWrapper()'s tell() method seems to be
> very slow, so one help is to only call tell() when necessary; patch:
>
> -> svn diff Lib/
> Index: Lib/mailbox.py
> ===================================================================
> --- Lib/mailbox.py  (revision 82346)
> +++ Lib/mailbox.py  (working copy)
> @@ -775,13 +775,14 @@
>          starts, stops = [], []
>          self._file.seek(0)
>          while True:
> -            line_pos = self._file.tell()
>              line = self._file.readline()
>              if line.startswith('From '):
> +                line_pos = self._file.tell()
>                  if len(stops) < len(starts):
>                      stops.append(line_pos - len(os.linesep))
>                  starts.append(line_pos)
>              elif not line:
> +                line_pos = self._file.tell()
>                  stops.append(line_pos)
>                  break
>          self._toc = dict(enumerate(zip(starts, stops)))
>
> But should mailboxes really be opened in a UTF-8 encoding, or should
> they be treated as 7-bit text?  I'll have to think about this.

Neither! You can't open them as 7-bit text, because real-world email
does contain bytes whose ordinal value exceeds 127. You can't open them
using a text encoding because theoretically there might be ASCII headers
that indicate that parts of the content are in specific character sets
or encodings.

If only we had a data structure that easily allowed us to manipulate
8-bit characters ...

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Guido van Rossum
It should probably be opened in binary mode. Binary files do have a
.readline() method (returning a bytes object), and bytes objects have
a .startswith() method. The tell positions computed this way are even
compatible with those used by the text file. So you could do it this
way:

- open binary stream
- compute TOC by reading through it using .readline() and .tell()
- rewind (don't close)
- wrap the binary stream in a text stream
- use that for the rest of the code
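
A minimal sketch of the steps above, not the eventual mailbox.py change.
An in-memory io.BytesIO stands in for the real mailbox file, build_toc is
a hypothetical helper, latin-1 is chosen only because it can decode every
byte, and each message is simply taken to end where the next "From " line
begins (the real code also trims the separating blank line).

```python
import io


def build_toc(binary_file):
    """Return {key: (start, stop)} byte offsets for each mbox message."""
    starts, stops = [], []
    binary_file.seek(0)
    while True:
        line_pos = binary_file.tell()    # offset of the line about to be read
        line = binary_file.readline()    # bytes; no decoding happens here
        if line.startswith(b"From "):
            if len(stops) < len(starts):
                stops.append(line_pos)   # previous message ends here
            starts.append(line_pos)
        elif not line:                   # end of file
            if len(stops) < len(starts):
                stops.append(line_pos)
            break
    return dict(enumerate(zip(starts, stops)))


sample = (
    b"From alice@example.com Tue Jun 29 09:56:11 2010\n"
    b"Subject: one\n\nfirst body, caf\xe9\n\n"
    b"From bob@example.com Tue Jun 29 10:00:00 2010\n"
    b"Subject: two\n\nsecond body\n"
)

binary_stream = io.BytesIO(sample)       # step 1: binary stream
toc = build_toc(binary_stream)           # step 2: TOC via readline()/tell()
binary_stream.seek(0)                    # step 3: rewind, do not close
text_stream = io.TextIOWrapper(          # step 4: wrap in a text stream
    binary_stream, encoding="latin-1", newline=""
)
first_line = text_stream.readline()      # step 5: text from here on
```

Because the scan works on bytes, tell() returns a plain byte offset and no
text decoding is involved while the index is built.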

--Guido

On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden <st...@holdenweb.com> wrote:
> A.M. Kuchling wrote:
>> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>>> I will leave the profiler output to speak for itself, since I can find
>>> nothing much to say about it except that there's a hell of a lot of
>>> decoding going on inside mailbox.iterkeys().
>>
>> The problem is actually in _generate_toc(), which is reading through
>> the entire file to figure out where all the 'From' lines that start
>> messages are located.  TextIOWrapper()'s tell() method seems to be
>> very slow, so one help is to only call tell() when necessary; patch:
>>
>> -> svn diff Lib/
>> Index: Lib/mailbox.py
>> ===================================================================
>> --- Lib/mailbox.py  (revision 82346)
>> +++ Lib/mailbox.py  (working copy)
>> @@ -775,13 +775,14 @@
>>          starts, stops = [], []
>>          self._file.seek(0)
>>          while True:
>> -            line_pos = self._file.tell()
>>              line = self._file.readline()
>>              if line.startswith('From '):
>> +                line_pos = self._file.tell()
>>                  if len(stops) < len(starts):
>>                      stops.append(line_pos - len(os.linesep))
>>                  starts.append(line_pos)
>>              elif not line:
>> +                line_pos = self._file.tell()
>>                  stops.append(line_pos)
>>                  break
>>          self._toc = dict(enumerate(zip(starts, stops)))
>>
>> But should mailboxes really be opened in a UTF-8 encoding, or should
>> they be treated as 7-bit text?  I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
>
> regards
>  Steve




-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Steve Holden
Guido van Rossum wrote:
> It should probably be opened in binary mode. Binary files do have a
> .readline() method (returning a bytes object), and bytes objects have
> a .startswith() method. The tell positions computed this way are even
> compatible with those used by the text file. So you could do it this
> way:
>
> - open binary stream
> - compute TOC by reading through it using .readline() and .tell()
> - rewind (don't close)

Because closing is inefficient, or because it breaks the algorithm?

> - wrap the binary stream in a text stream

wrap how? The ultimate destiny of the text is twofold:

1) To be stored as some kind of LOB in a database, and
2) Therefrom to be reconstituted and parsed into email.Message objects.

Is the wrapping a one-off operation or a software layer? Sorry, being a
bit dense here, I know.

regards
 Steve

> - use that for the rest of the code
>
> --Guido
>
> On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden <st...@holdenweb.com> wrote:
>> A.M. Kuchling wrote:
>>> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>>>> I will leave the profiler output to speak for itself, since I can find
>>>> nothing much to say about it except that there's a hell of a lot of
>>>> decoding going on inside mailbox.iterkeys().
>>> The problem is actually in _generate_toc(), which is reading through
>>> the entire file to figure out where all the 'From' lines that start
>>> messages are located.  TextIOWrapper()'s tell() method seems to be
>>> very slow, so one help is to only call tell() when necessary; patch:
>>>
>>> -> svn diff Lib/
>>> Index: Lib/mailbox.py
>>> ===================================================================
>>> --- Lib/mailbox.py  (revision 82346)
>>> +++ Lib/mailbox.py  (working copy)
>>> @@ -775,13 +775,14 @@
>>>          starts, stops = [], []
>>>          self._file.seek(0)
>>>          while True:
>>> -            line_pos = self._file.tell()
>>>              line = self._file.readline()
>>>              if line.startswith('From '):
>>> +                line_pos = self._file.tell()
>>>                  if len(stops) < len(starts):
>>>                      stops.append(line_pos - len(os.linesep))
>>>                  starts.append(line_pos)
>>>              elif not line:
>>> +                line_pos = self._file.tell()
>>>                  stops.append(line_pos)
>>>                  break
>>>          self._toc = dict(enumerate(zip(starts, stops)))
>>>
>>> But should mailboxes really be opened in a UTF-8 encoding, or should
>>> they be treated as 7-bit text?  I'll have to think about this.
>> Neither! You can't open them as 7-bit text, because real-world email
>> does contain bytes whose ordinal value exceeds 127. You can't open them
>> using a text encoding because theoretically there might be ASCII headers
>> that indicate that parts of the content are in specific character sets
>> or encodings.
>>
>> If only we had a data structure that easily allowed us to manipulate
>> 8-bit characters ...
>>
>> regards
>>  Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
All I want for my birthday is another birthday -
 Ian Dury, 1942-2000



Re: [Python-Dev] Pickle security and remote logging

2010-06-29 Thread anatoly techtonik
On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:
>
> I've updated the documentation of SocketHandler.makePickle to mention security
> concerns, and that the method can be overridden to use a more secure
> implementation (e.g. HMAC-signed pickles).

Thanks. But I doubt the added HMAC complication helps to protect a logging
server. If the shared key is compromised, the server becomes vulnerable. I
would prefer an approach where no code execution is possible: some
alternative serialization for transmitting log data structures over the
network. Protocol buffers first come to mind, but they seem to be overkill,
and the stdlib doesn't include any implementation.
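
A minimal sketch of a data-only wire format, assuming JSON is acceptable.
JSON is not proposed anywhere in this thread; it is used here only because
the json module ships with the stdlib and decoding it cannot run code.
JSONSocketHandler, its FIELDS selection and decode_frame are hypothetical
names.

```python
import json
import logging
import logging.handlers
import struct


class JSONSocketHandler(logging.handlers.SocketHandler):
    """Hypothetical handler: frames each record as length-prefixed JSON."""

    # Assumption: a fixed set of plain attributes is enough for the receiver.
    FIELDS = ("name", "levelno", "levelname", "pathname", "lineno", "created")

    def makePickle(self, record):
        data = {field: getattr(record, field) for field in self.FIELDS}
        data["msg"] = record.getMessage()     # merge args into the message
        payload = json.dumps(data).encode("utf-8")
        return struct.pack(">L", len(payload)) + payload


def decode_frame(frame):
    """Receiver side: parse one frame into a plain dict of values."""
    (length,) = struct.unpack(">L", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))


# No connection is made here; the socket is only created on first send.
handler = JSONSocketHandler("localhost", 9020)
record = logging.makeLogRecord({
    "name": "demo",
    "levelno": logging.WARNING,
    "levelname": "WARNING",
    "msg": "disk %s is full",
    "args": ("sda",),
})
decoded = decode_frame(handler.makePickle(record))
handler.close()
```

The trade-off is that only plain values survive the trip, so exception
objects and arbitrary message arguments have to be rendered to text on the
sending side.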

-- 
anatoly t.


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread R. David Murray
On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden <st...@holdenweb.com> wrote:
> A.M. Kuchling wrote:
>> But should mailboxes really be opened in a UTF-8 encoding, or should
>> they be treated as 7-bit text?  I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...

email6 *will* handle this use case.  When it exists :)  But note that it
is *not* just a matter of easily handling 8 bit characters.  There are
a whole bunch of algorithms needed for interpreting that 7 and 8 bit data.
All the info is there in the email headers, but being able to do string
operations on 8 bit byte strings doesn't get you the answers you need
by itself.

It really is the case that the Python3 bytes/unicode split forces us
to redo most of the algorithms so that they handle bytes and text
*correctly*.  This isn't a trivial undertaking, but the end result
will be well worth it.

--
R. David Murray  www.bitdance.com


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread R. David Murray
On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden <st...@holdenweb.com> wrote:
> Guido van Rossum wrote:
>
>> - wrap the binary stream in a text stream
>
> wrap how? The ultimate destiny of the text is twofold:

I would imagine Guido is talking about an io.TextIOWrapper...in other
words, take the binary file you've just finished grabbing info
from, and reread it as a text file in order to grab the actual
message content.

If you have messages in your files that are using an 8bit content
transfer encoding, then you (currently) will have some problems
unless the charset happens to be the one you use when you wrap
the binary stream as a text stream.
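
A minimal sketch of that mismatch, using made-up sample bytes: a body
written as UTF-8 but wrapped with cp1252 decodes without error into the
wrong characters, while wrapping it with the matching charset round-trips.

```python
import io

# An 8-bit body as it might sit in the mailbox file, encoded as UTF-8.
raw = "caf\u00e9\n".encode("utf-8")

# Wrapping with a charset that does not match the message gives mojibake.
as_cp1252 = io.TextIOWrapper(
    io.BytesIO(raw), encoding="cp1252", newline=""
).read()

# Wrapping with the charset the message actually used gives the right text.
as_utf8 = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", newline=""
).read()

print(repr(as_cp1252), repr(as_utf8))
```

No single encoding passed to the wrapper can be right for a mailbox whose
messages each declare their own charset.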

--
R. David Murray  www.bitdance.com


Re: [Python-Dev] Mailbox module - timings and functionality changes

2010-06-29 Thread Steve Holden
R. David Murray wrote:
> On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden <st...@holdenweb.com> wrote:
>> A.M. Kuchling wrote:
>>> But should mailboxes really be opened in a UTF-8 encoding, or should
>>> they be treated as 7-bit text?  I'll have to think about this.
>> Neither! You can't open them as 7-bit text, because real-world email
>> does contain bytes whose ordinal value exceeds 127. You can't open them
>> using a text encoding because theoretically there might be ASCII headers
>> that indicate that parts of the content are in specific character sets
>> or encodings.
>>
>> If only we had a data structure that easily allowed us to manipulate
>> 8-bit characters ...
>
> email6 *will* handle this use case.  When it exists :)  But note that it
> is *not* just a matter of easily handling 8 bit characters.  There are
> a whole bunch of algorithms needed for interpreting that 7 and 8 bit data.
> All the info is there in the email headers, but being able to do string
> operations on 8 bit byte strings doesn't get you the answers you need
> by itself.
>
> It really is the case that the Python3 bytes/unicode split forces us
> to redo most of the algorithms so that they handle bytes and text
> *correctly*.  This isn't a trivial undertaking, but the end result
> will be well worth it.
 
I completely agree. The unusual thing here is that I of all people
should find myself running into these issues, since my use of Python is
normally pretty conservative. Since the course I am currently writing is
already overdue I have to find answers now to problems that were present
in the initial 3.0 release and have not received much attention since.

You know that I support your work to revise the email package. I hope
that we can eventually have it incorporate mailbox readers as well.

regards
 Steve



[Python-Dev] OS X buildbots: why am I skipping these tests?

2010-06-29 Thread Bill Janssen
My Leopard and Tiger PPC buildbots are momentarily green!  But I'm
looking into why I'm skipping some tests.  My buildbots are up-to-date
OS-wise and very vanilla, with the latest applicable Xcode.

4 skips unexpected on darwin:
test_gdb test_ioctl test_readline test_ttk_guionly

Three of these (gdb, readline, ttk_guionly) are just bad predictions of
which tests should skip on Darwin, I think -- gdb is only version 6, so
that test won't run, readline doesn't get built, ttk doesn't work
without Tcl/Tk 8.5.  But the skip of test_ioctl baffles me.

test_ioctl skipped -- Unable to open /dev/tty

But when I log in via ssh and try it with the system python:

~ wjanssen$ python
python
Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/dev/tty")
open("/dev/tty")
<open file '/dev/tty', mode 'r' at 0x597b8>
>>> 

Seems to work fine.  So this I don't understand.  Any ideas, anyone?

Bill


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-29 Thread Stephen J. Turnbull
Steve Holden writes:

  I agree - trying to step through -O2 optimized code isn't going to
  help debug your code, it's going to help you debug the
  optimizer. That's a very rare use case.

Not really.  I don't have a lot of practice in debugging at that
level, so take it with a grain of salt, but what I've found with
XEmacs code is that debugging at -O0 is less often helpful than
debugging at -O2.  Quite often a naive compilation strategy is used
which basically turns those C statements into macros for the
underlying assembler, and the code works the way the author thinks it
should.  But his assumptions are invalid, and when optimized it fails.

So I guess you can call that debugging the optimizer if you like.


Re: [Python-Dev] OS X buildbots: why am I skipping these tests?

2010-06-29 Thread Guido van Rossum
On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen jans...@parc.com wrote:
 My Leopard and Tiger PPC buildbots are momentarily green!  But I'm
 looking into why I'm skipping some tests.  My buildbots are up-to-date
 OS-wise and very vanilla, with the latest applicable Xcode.

 4 skips unexpected on darwin:
    test_gdb test_ioctl test_readline test_ttk_guionly

 Three of these (gdb, readline, ttk_guionly) are just bad predictions of
 which tests should skip on Darwin, I think -- gdb is only version 6, so
 that test won't run, readline doesn't get built, ttk doesn't work
 without Tcl/Tk 8.5.

So it looks like you could get readline and ttk to run and pass by
separately downloading and installing readline (I've done this many
times before) and Tcl/Tk (no idea but I suppose it should work).

 But the skip of test_ioctl baffles me.

 test_ioctl skipped -- Unable to open /dev/tty

 But when I log in via ssh and try it with the system python:

 ~ wjanssen$ python
 python
 Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34)
 [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> open("/dev/tty")
 open("/dev/tty")
 <open file '/dev/tty', mode 'r' at 0x597b8>


 Seems to work fine.  So this I don't understand.  Any ideas, anyone?

Maybe the buildbot runs the tests as a tty-less daemon process. If you
ask me it's pretty crazy to have a test that requires a tty. But there
you have it -- and it's the same in Python 3. (But then again, who
knows, I might have written that test. ;-)
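
For illustration, a test that depends on a tty can be written so that it
skips cleanly when none is available. This is only a sketch of the pattern,
not the actual test_ioctl code; can_open_tty and TtyTest are hypothetical
names:

```python
import unittest


def can_open_tty(path="/dev/tty"):
    """Return True if the controlling terminal can be opened for reading."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


class TtyTest(unittest.TestCase):
    # Evaluated once at class-definition time; a daemonized process
    # without a controlling terminal gets a skip instead of a failure.
    @unittest.skipUnless(can_open_tty(), "Unable to open /dev/tty")
    def test_tty_opens(self):
        with open("/dev/tty", "rb") as f:
            self.assertGreaterEqual(f.fileno(), 0)
```

Run from an interactive shell (for example with python -m unittest) the test
executes; run from a process with no controlling terminal it is reported as
skipped.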

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] OS X buildbots: why am I skipping these tests?

2010-06-29 Thread Martin v. Löwis
 Seems to work fine.  So this I don't understand.  Any ideas, anyone?

Didn't we discuss this before? The buildbot slave has no controlling
terminal anymore, hence it cannot open /dev/tty. If you are curious,
just patch your checkout to output the exact errno (e.g. to stdout),
and trigger a build through the web.
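
One way to get at the exact errno, as a sketch rather than a patch against
the real test (tty_open_errno is a hypothetical helper name):

```python
import errno
import os


def tty_open_errno(path="/dev/tty"):
    """Return None if path can be opened, else the errno of the failure."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


err = tty_open_errno()
if err is None:
    print("opened /dev/tty")
else:
    # Print both the symbolic name and the OS message, so the buildbot
    # log shows exactly why the open failed.
    name = errno.errorcode.get(err, "?")
    print("open failed: errno=%d (%s): %s" % (err, name, os.strerror(err)))
```

Running this both from an ssh login and from inside the buildbot slave
should make the difference between the two environments visible in the
output.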

Regards,
Martin


[Python-Dev] Taking over the Mercurial Migration

2010-06-29 Thread Martin v. Löwis
It seems that both Dirkjan and Brett are very caught up
with real life for the coming months. So I suggest that
some other committer who favors the Mercurial transition
step forward and take over this project.

If nobody volunteers, I propose that we release 3.2
from Subversion, and reconsider Mercurial migration
next year.

Regards,
Martin