[issue46011] Python 3.10 email returns invalid Date: header unchanged.

2021-12-08 Thread Mark Sapiro


Mark Sapiro added the comment:

Upon further research, I realized this is related to
https://bugs.python.org/issue30681. Although message.defects is empty, the
Date: header itself has an InvalidDateDefect and its datetime attribute is
None, so I consider this resolved.
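A minimal sketch of that check, assuming policy.default and Python 3.10 or later, where an unparsable date is recorded as a defect on the header object rather than in message.defects. The helper name `header_date` is hypothetical. It also catches the ValueError that Python 3.9 and earlier raise when the header is fetched.

```python
from email import message_from_bytes, policy


def header_date(msg):
    """Hypothetical helper: aware datetime for the Date: header, or None.

    None means the header is missing or its value is not a valid date.
    Assumes msg was parsed with policy.default (headers are DateHeader objects).
    """
    try:
        header = msg['date']
    except ValueError:
        # Python 3.9 and earlier raise here for an out-of-range UTC offset.
        return None
    if header is None:
        return None
    # On 3.10+, an invalid date leaves .datetime as None and records an
    # InvalidDateDefect in header.defects (not in msg.defects).
    return header.datetime


raw = b"""From: user@example.com
Date: Tue, 30 Nov 1999 23:56:33 -3000 (CST)
To: list@example.org

msg1
"""
msg = message_from_bytes(raw, policy=policy.default)
print(header_date(msg))   # None
```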

--
stage:  -> resolved

___
Python tracker 
<https://bugs.python.org/issue46011>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46011] Python 3.10 email returns invalid Date: header unchanged.

2021-12-07 Thread Mark Sapiro


New submission from Mark Sapiro:

Here is an interactive Python session
```
Python 3.10.1 (main, Dec  7 2021, 15:44:39) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import message_from_bytes, policy
>>> msg_raw = b"""Return-Path: 
... Delivered-To: mailman-us...@dinsdale.python.org
... From: u...@example.com
... Message-Id: 
... Date: Tue, 30 Nov 1999 23:56:33 -3000 (CST)
... To: mailman-us...@python.org
... 
... msg1
... """
>>> message = message_from_bytes(msg_raw, policy=policy.default)
>>> message.get('date')
'Tue, 30 Nov 1999 23:56:33 -3000 (CST)'
>>> message.defects
[]
>>> 
```
The same session in Python 3.9 throws ValueError: offset must be a timedelta 
strictly between -timedelta(hours=24) and timedelta(hours=24), not 
datetime.timedelta(days=-2, seconds=64800).

At first I thought this was related to https://bugs.python.org/issue30681, but
that seems not to be the case, because
utils.parsedate_to_datetime('Tue, 30 Nov 1999 23:56:33 -3000 (CST)') throws the
same exception in Python 3.10.1.
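The arithmetic behind that exception can be checked directly. In this sketch, `parsedate_tz()` turns "-3000" into an offset of -30 hours, which is exactly the `timedelta(days=-2, seconds=64800)` quoted in the 3.9 error, and `datetime.timezone` rejects any offset outside the open interval of plus or minus 24 hours.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_tz, parsedate_to_datetime

BAD_DATE = 'Tue, 30 Nov 1999 23:56:33 -3000 (CST)'

# parsedate_tz() returns a 10-tuple; index 9 is the UTC offset in seconds.
offset = parsedate_tz(BAD_DATE)[9]
print(offset)                            # -108000, i.e. -30 hours
print(repr(timedelta(seconds=offset)))   # datetime.timedelta(days=-2, seconds=64800)

# timezone() only accepts offsets strictly between -24h and +24h,
# so no aware datetime can be built from this offset.
try:
    timezone(timedelta(seconds=offset))
except ValueError as exc:
    print('timezone() raised ValueError:', exc)

try:
    parsedate_to_datetime(BAD_DATE)
except ValueError as exc:
    print('parsedate_to_datetime() raised ValueError:', exc)
```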

I think fetching a Date: header that has an invalid timezone should either
raise the exception as before or return None, rather than return the invalid
header value unchanged.

--
components: email
keywords: 3.10regression
messages: 407997
nosy: barry, msapiro, r.david.murray
priority: normal
severity: normal
status: open
title: Python 3.10 email returns invalid Date: header unchanged.
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue46011>
___



[issue45921] codecs module doesn't support iso-8859-6-i, iso-8859-6-e, iso-8859-8-i or iso-8859-8-i

2021-11-29 Thread Mark Sapiro


Mark Sapiro added the comment:

The mailman-us...@python.org list received a post whose From: header
contained a Hebrew display name RFC 2047-encoded with the iso-8859-8-i charset.
Processing it raised LookupError: unknown encoding: iso-8859-8-i, and the
message was shunted. The message body also declared its charset as
iso-8859-8-i, although it contained only ASCII. Unfortunately, I don't have the
original message, so I can't say which MUA created it or how common this usage
is.

I do think that just adding these as aliases for the non-annotated encodings is 
an appropriate response.
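Until such aliases exist in the standard library, an application can register them itself. This is a sketch under stated assumptions: the alias table and function name are hypothetical, and it simply maps the implicit (-i) and explicit (-e) directionality variants onto the plain iso8859_6 and iso8859_8 codecs Python already ships, which is the aliasing proposed above.

```python
import codecs

# Hypothetical alias table: IANA "-i"/"-e" variants -> existing Python codecs.
_BIDI_VARIANT_ALIASES = {
    'iso-8859-6-i': 'iso8859_6',
    'iso-8859-6-e': 'iso8859_6',
    'iso-8859-8-i': 'iso8859_8',
    'iso-8859-8-e': 'iso8859_8',
}


def _search_bidi_variants(name):
    """codecs search function; returns None for names it does not handle."""
    # Since Python 3.9, codecs.lookup() passes names with hyphens turned
    # into underscores, so normalize both spellings before the lookup.
    target = _BIDI_VARIANT_ALIASES.get(name.lower().replace('_', '-'))
    if target is None:
        return None
    return codecs.lookup(target)


codecs.register(_search_bidi_variants)

print(b'\xe0'.decode('iso-8859-8-i'))   # HEBREW LETTER ALEF
```

Because search functions are only consulted after the built-in encodings search fails, registering this does not change the behavior of names Python already knows.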

--

___
Python tracker 
<https://bugs.python.org/issue45921>
___



[issue45921] codecs module doesn't support iso-8859-6-i, iso-8859-6-e, iso-8859-8-i or iso-8859-8-i

2021-11-28 Thread Mark Sapiro


New submission from Mark Sapiro:

iso-8859-6-i, iso-8859-6-e, iso-8859-8-i and iso-8859-8-e are all
IANA-recognized character sets per
https://www.iana.org/assignments/character-sets/character-sets.xhtml. None of
them is recognized by codecs.lookup().

--
components: Library (Lib)
messages: 407240
nosy: msapiro
priority: normal
severity: normal
status: open
title: codecs module doesn't support iso-8859-6-i, iso-8859-6-e, iso-8859-8-i 
or iso-8859-8-i
type: behavior
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue45921>
___



[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

2021-07-06 Thread Mark Sapiro


Change by Mark Sapiro:


--
versions: +Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue44560>
___



[issue43996] Doc for mutable sequence pop() method implies argument is a slice or sequence.

2021-04-30 Thread Mark Sapiro


Mark Sapiro added the comment:

Thank you for the explanation, which I understand and accept. I also fully (or
maybe not quite fully) understand the use of square brackets to indicate
optional arguments. It's just that in the context of the table at
https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types, every
other use of square brackets indicates a list or a slice, and that's what
confused me. Granted, none of the other square brackets enclosed a method
argument, and I accept that the doc is correct, but I still found it
confusing.

--

___
Python tracker 
<https://bugs.python.org/issue43996>
___



[issue43996] Doc for mutable sequence pop() method implies argument is a slice or sequence.

2021-04-30 Thread Mark Sapiro


New submission from Mark Sapiro:

In several places in the documentation including:

```
grep -rn 'pop.\[i\]'
Lib/pydoc_data/topics.py:13184: '| "s.pop([i])"   | retrieves the item at *i* '
Lib/pydoc_data/topics.py:13647: '| "s.pop([i])"   | retrieves the item at '
Doc/tutorial/datastructures.rst:47:.. method:: list.pop([i])
Doc/library/array.rst:193:.. method:: array.pop([i])
Doc/library/stdtypes.rst:1116:| ``s.pop([i])``   | retrieves the item at *i* and  | \(2)|
```
the mutable sequence and array `pop()` methods are documented as shown above, in
a way that implies the argument to `pop()` is a slice or sequence when it is
actually just an integer. All those references should be `pop(i)` rather than
`pop([i])`.
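A quick check makes the actual semantics concrete. In the docs' notation the brackets mark the argument as optional, so `s.pop([i])` stands for either `s.pop()` or `s.pop(i)` with an integer `i`; passing a real list is rejected. The same holds for `array.pop()`.

```python
from array import array

items = ['a', 'b', 'c', 'd']
last = items.pop()     # no argument: removes and returns the last item
first = items.pop(0)   # integer index: removes and returns items[0]
print(last, first, items)   # d a ['b', 'c']

try:
    items.pop([0])     # a list is not a valid index
except TypeError as exc:
    print('TypeError:', exc)

nums = array('i', [10, 20, 30])
print(nums.pop(), nums.pop(0), nums)   # 30 10 array('i', [20])
```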

--
assignee: docs@python
components: Documentation
messages: 392551
nosy: docs@python, msapiro
priority: normal
severity: normal
status: open
title: Doc for mutable sequence pop() method implies argument is a slice or 
sequence.
type: behavior
versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43996>
___



[issue42054] email message get_content throws KeyError for content main types font and model

2020-10-16 Thread Mark Sapiro


New submission from Mark Sapiro:

With policy = email.policy.default, get_content() has handlers only for the
content types 'text', 'audio', 'image', 'video', 'application',
'message/rfc822', 'message/external-body' and 'message'. Although these are the
only main types listed in RFC 6838, RFC 8081 adds 'font' and RFC 2077 defines
'model', and there are several registered 'font' and 'model' types at
https://www.iana.org/assignments/media-types/media-types.xhtml

It would be good if get_content() returned content, even if only raw bytes, for 
those types.
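As an application-side stopgap, a policy can be given its own content manager that handles these main types. A minimal sketch, assuming it is acceptable to return the decoded payload as raw bytes; `get_raw_bytes` and `font_model_policy` are hypothetical names. It works on a deep copy so the shared `raw_data_manager` behind policy.default is not modified.

```python
import copy
from email import message_from_bytes, policy
from email.contentmanager import raw_data_manager


def get_raw_bytes(msg):
    """Hypothetical get handler: return the transfer-decoded payload bytes."""
    return msg.get_payload(decode=True)


# Copy the stock manager so policy.default itself is left untouched.
font_model_manager = copy.deepcopy(raw_data_manager)
for maintype in ('font', 'model'):
    font_model_manager.add_get_handler(maintype, get_raw_bytes)

font_model_policy = policy.default.clone(content_manager=font_model_manager)

raw = b"""MIME-Version: 1.0
Content-Type: font/woff
Content-Transfer-Encoding: base64

d09GRg==
"""
msg = message_from_bytes(raw, policy=font_model_policy)
print(msg.get_content())   # b'wOFF'
```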

--
messages: 378738
nosy: msapiro
priority: normal
severity: normal
status: open
title: email message get_content throws KeyError for content main types font 
and model
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue42054>
___



[issue27321] Email parser creates a message object that can't be flattened

2020-10-05 Thread Mark Sapiro


Mark Sapiro added the comment:

I work around it with
```
import email.message
import email.utils


class Message(email.message.Message):

    def as_string(self):
        # Work around for https://bugs.python.org/issue27321 and
        # https://bugs.python.org/issue32330.
        try:
            value = email.message.Message.as_string(self)
        except (KeyError, LookupError, UnicodeEncodeError):
            value = email.message.Message.as_bytes(self).decode(
                'ascii', 'replace')
        # Also ensure no unicode surrogates in the returned string
        # (email.utils._sanitize is a private helper).
        return email.utils._sanitize(value)
```
This is easy for me because Mailman already subclasses email.message.Message
for other reasons. It may be more difficult if you aren't already subclassing
email.message.Message.
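For code that does not subclass Message, the same fallback can be written as a plain function. This is a sketch, not Mailman's code: `safe_as_string` is a hypothetical name, and it omits the private `email.utils._sanitize()` step used in the override above. Non-ASCII bytes in the fallback path come out as U+FFFD.

```python
import email.message


def safe_as_string(msg):
    """Hypothetical stand-alone version of the as_string() fallback.

    Try normal string flattening first; if the generator raises one of the
    errors seen in issue27321/issue32330, flatten as bytes instead and decode
    with 'replace'.
    """
    try:
        return msg.as_string()
    except (LookupError, UnicodeEncodeError):   # KeyError is a LookupError
        return msg.as_bytes().decode('ascii', 'replace')


msg = email.message.Message()
msg['Subject'] = 'hello'
msg.set_payload('plain ASCII body\n')
print(safe_as_string(msg))
```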

--

___
Python tracker 
<https://bugs.python.org/issue27321>
___



[issue40597] generated email message exceeds RFC-mandated limit of 998 characters

2020-05-30 Thread Mark Sapiro


Change by Mark Sapiro:


--
pull_requests: +19786
stage: resolved -> patch review
pull_request: https://github.com/python/cpython/pull/20542

___
Python tracker 
<https://bugs.python.org/issue40597>
___



[issue40597] generated email message exceeds RFC-mandated limit of 998 characters

2020-05-30 Thread Mark Sapiro


Mark Sapiro added the comment:

With the fix in PR 20038, committed at
https://github.com/python/cpython/commit/6f2f475d5a2cd7675dce844f3af436ba919ef92b,
it is no longer possible to call set_content(''). Attempts to do so produce the
following:
```
  File "/var/MM/3/hk_39/hyperkitty/.tox/py39-django30/lib/python3.9/site-packages/django_mailman3/lib/scrub.py", line 95, in _get_all_attachments
    part.set_content('')
  File "/usr/local/lib/python3.9/email/message.py", line 1171, in set_content
    super().set_content(*args, **kw)
  File "/usr/local/lib/python3.9/email/message.py", line 1101, in set_content
    content_manager.set_content(self, *args, **kw)
  File "/usr/local/lib/python3.9/email/contentmanager.py", line 37, in set_content
    handler(msg, obj, *args, **kw)
  File "/usr/local/lib/python3.9/email/contentmanager.py", line 185, in set_text_content
    cte, payload = _encode_text(string, charset, cte, msg.policy)
  File "/usr/local/lib/python3.9/email/contentmanager.py", line 149, in _encode_text
    if max(len(x) for x in lines) <= policy.max_line_length:
ValueError: max() arg is an empty sequence
```
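The last frame calls max() over the encoded lines of the new content, and empty content has no lines. A minimal sketch, assuming the lines come from `splitlines()` on the encoded text; `longest_line_fits` is a hypothetical name, and supplying `default=` to max() is one possible fix, not necessarily the one adopted in the linked PR.

```python
def longest_line_fits(text, charset='utf-8', max_line_length=78):
    """Hypothetical check modeled on the failing line in _encode_text().

    ''.encode().splitlines() is [], and max() of an empty sequence raises
    ValueError unless a default is supplied.
    """
    lines = text.encode(charset).splitlines()
    return max((len(x) for x in lines), default=0) <= max_line_length


print(longest_line_fits(''))          # True instead of ValueError
print(longest_line_fits('x' * 100))   # False
```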

--
nosy: +msapiro

___
Python tracker 
<https://bugs.python.org/issue40597>
___



[issue39384] Email parser creates a message object that can't be flattened as bytes.

2020-02-05 Thread Mark Sapiro


Mark Sapiro added the comment:

I've researched this further, and I now know how this happens. The original
message contains a text/html part (in my case, the only part) with a base64 or
quoted-printable body that contains non-ASCII once decoded. It is parsed
correctly by email.message_from_bytes.

It is then processed by Mailman's content filtering, which retrieves the HTML
payload via

part.get_payload(decode=True).decode(ctype, errors='replace')

where part is the text/html part and ctype is 'utf-8' in this case. It then
uses elinks, lynx or some other configured command to convert the HTML payload
to plain text, and that plain text still contains non-ASCII.

It then replaces the payload and sets the content type via

del part['content-transfer-encoding']
part.set_payload(plain_text)
part.set_type('text/plain')

This results in a message that can't be flattened with as_bytes().

The issue is that set_payload() should encode the payload appropriately, and in
fact it does if an appropriate charset is given, so the error is ours for not
providing a charset= argument to set_payload().
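A sketch of the corrected sequence, using a hypothetical stand-in for the filtered part. With `charset='utf-8'`, set_payload() encodes the text and, because the old Content-Transfer-Encoding header was deleted, adds a new one (base64 for utf-8), so as_bytes() yields pure ASCII.

```python
import email.message

# Hypothetical stand-in for the text/html part after content filtering.
part = email.message.Message()
part['Content-Type'] = 'text/html; charset="utf-8"'
part['Content-Transfer-Encoding'] = 'quoted-printable'
plain_text = 'It\u2019s the converted plain text.\n'

del part['content-transfer-encoding']
# Passing charset= makes set_payload() encode the text and add a matching
# Content-Transfer-Encoding header.
part.set_payload(plain_text, charset='utf-8')
part.set_type('text/plain')   # keeps the charset parameter

data = part.as_bytes()
print(data.decode('ascii'))
```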

Closing this and the corresponding PR.

--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue39384>
___



[issue39384] Email parser creates a message object that can't be flattened as bytes.

2020-02-04 Thread Mark Sapiro


Mark Sapiro added the comment:

Other Mailman3 installations are also encountering this issue. See 
https://lists.mailman3.org/archives/list/mailman-us...@mailman3.org/message/VQZORIDL5PNQ4W33KIMVTFTANSGZD46S/

--

___
Python tracker 
<https://bugs.python.org/issue39384>
___



[issue39384] Email parser creates a message object that can't be flattened as bytes.

2020-01-20 Thread Mark Sapiro


Mark Sapiro added the comment:

This came about because of an actual situation in a Mailman 3 installation. I
can't say for sure what the original message looked like, but it was received
by Mailman's LMTP server and parsed with email.message_from_bytes(), so it
clearly wasn't exactly like the message excerpt I posted in the report above.
However, all I had to go by was the message object from the shunted pickle file
created as a result of the exception.

The message was processed by Mailman, but when Mailman's handler pipeline
attempted to save it for the digest, it used an instance of mailbox.MMDF to add
the message to the mailbox accumulating messages for the digest. That in turn
calls the flatten() method of an email.generator.BytesGenerator instance, which
is where the exception was thrown.
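Since the failure only surfaces inside the mailbox code, a caller could test a message up front by flattening it with BytesGenerator into a throwaway buffer, roughly as the mailbox module does internally. A sketch; `can_flatten_as_bytes` is a hypothetical helper, and the failing example is a simplified message with a non-ASCII str payload and no charset.

```python
import io
from email.generator import BytesGenerator
from email.message import Message


def can_flatten_as_bytes(msg):
    """Hypothetical pre-check: flatten msg the way a bytes mailbox would,
    without touching any mailbox."""
    buffer = io.BytesIO()
    try:
        BytesGenerator(buffer, mangle_from_=False, maxheaderlen=0).flatten(msg)
    except UnicodeEncodeError:
        return False
    return True


ok = Message()
ok.set_payload('plain ASCII\n')

bad = Message()
bad.set_payload('caf\u00e9\n')   # non-ASCII str payload, no charset given

print(can_flatten_as_bytes(ok), can_flatten_as_bytes(bad))   # True False
```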

Perhaps the suggested patch in https://github.com/python/cpython/pull/18056 
doesn't address every possible case, and it can result in a slightly garbled 
message due to replacing 'invalid' characters, but in my case at least, it is 
much preferable to the alternative.

--

___
Python tracker 
<https://bugs.python.org/issue39384>
___



[issue27321] Email parser creates a message object that can't be flattened

2020-01-19 Thread Mark Sapiro


Change by Mark Sapiro:


--
pull_requests: +17467
pull_request: https://github.com/python/cpython/pull/18074

___
Python tracker 
<https://bugs.python.org/issue27321>
___



[issue32330] Email parser creates a message object that can't be flattened

2020-01-18 Thread Mark Sapiro


Change by Mark Sapiro:


--
keywords: +patch
pull_requests: +17453
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18059

___
Python tracker 
<https://bugs.python.org/issue32330>
___



[issue32330] Email parser creates a message object that can't be flattened

2020-01-18 Thread Mark Sapiro


Change by Mark Sapiro:


--
versions: +Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue32330>
___



[issue39384] Email parser creates a message object that can't be flattened as bytes.

2020-01-18 Thread Mark Sapiro


Change by Mark Sapiro:


--
versions: +Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue39384>
___



[issue39384] Email parser creates a message object that can't be flattened as bytes.

2020-01-18 Thread Mark Sapiro

New submission from Mark Sapiro:

This is similar to https://bugs.python.org/issue32330 but is the opposite 
behavior. In that issue, the message couldn't be flattened as a string but 
could be flattened as bytes. Here, the message can be flattened as a string but 
can't be flattened as bytes.

The original message was created by an arguably defective email client that
quoted a message containing a UTF-8-encoded RIGHT SINGLE QUOTATION MARK and
UTF-8-encoded each of its three bytes separately, resulting in `â**` instead of
`’`. That's not really relevant; it just shows how such a message can be
generated.
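The double encoding can be reproduced in a few lines. Which single-byte charset the client actually used is unknown; this sketch assumes Latin-1 (cp1252 would instead produce the familiar "â€™"). Either way the first character is 'â' (U+00E2), which is the '\xe2' the ASCII codec rejects in the traceback.

```python
quote = '\u2019'                     # RIGHT SINGLE QUOTATION MARK
utf8 = quote.encode('utf-8')         # b'\xe2\x80\x99'

# Assumption: each of those bytes was taken as a separate Latin-1
# character and the result was UTF-8-encoded again.
mojibake = utf8.decode('latin-1')    # 'â' followed by two C1 control chars
double_encoded = mojibake.encode('utf-8')
print(double_encoded)                # b'\xc3\xa2\xc2\x80\xc2\x99'

# Undoing the extra round trip recovers the original character.
print(double_encoded.decode('utf-8').encode('latin-1').decode('utf-8'))
```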

The following interactive python session shows the issue.

```
>>> import email
>>> msg = email.message_from_string("""From u...@example.com Sat Jan 18 04:09:40 2020
... From: u...@example.com
... To: re...@example.com
... Subject: Century Dates for Insurance purposes
... Date: Fri, 17 Jan 2020 20:09:26 -0800
... Message-ID: <75ccdd72-d71c-407c-96bd-0ca95abcf...@email.android.com>
... MIME-Version: 1.0
... Content-Type: text/plain; charset="utf-8"
... Content-Transfer-Encoding: 8bit
... 
... Thursday-Monday will cover both days of staging and then storing goods
... post-century. I think thatâ**s the way to go.
... 
... """)
>>> msg.as_bytes()
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.7/email/message.py", line 178, in as_bytes
g.flatten(self, unixfrom=unixfrom)
  File "/usr/local/lib/python3.7/email/generator.py", line 116, in flatten
self._write(msg)
  File "/usr/local/lib/python3.7/email/generator.py", line 181, in _write
self._dispatch(msg)
  File "/usr/local/lib/python3.7/email/generator.py", line 214, in _dispatch
meth(msg)
  File "/usr/local/lib/python3.7/email/generator.py", line 432, in _handle_text
super(BytesGenerator,self)._handle_text(msg)
  File "/usr/local/lib/python3.7/email/generator.py", line 249, in _handle_text
self._write_lines(payload)
  File "/usr/local/lib/python3.7/email/generator.py", line 155, in _write_lines
self.write(line)
  File "/usr/local/lib/python3.7/email/generator.py", line 406, in write
self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe2' in position 33: ordinal not in range(128)
>>> 
```

--
components: email
messages: 360249
nosy: barry, msapiro, r.david.murray
priority: normal
severity: normal
status: open
title: Email parser creates a message object that can't be flattened as bytes.
versions: Python 3.5, Python 3.6, Python 3.7

___
Python tracker 
<https://bugs.python.org/issue39384>
___



[issue37919] nntplib throws spurious NNTPProtocolError

2019-08-22 Thread Mark Sapiro


New submission from Mark Sapiro:

This is really due to an NNTP server bug, but here's the scenario.

A connection is opened to the server.

An article is posted via the connection's post() method.

The server responds to the article data with

240 Article posted 

but due to the server bug, if the message-id is long, this response comes on 
two lines as

240 Article posted
 

The post() method reads only the first line and returns it.

Then the connection's quit() method (or some other method) is called, and it 
sees the second line of the prior response as the server's response rather than 
the actual response, and raises NNTPProtocolError.

Arguably, NNTPProtocolError is appropriate in this scenario, but if so, it 
should be raised by the post() method and not by a subsequent method.
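The desynchronization can be simulated without a server. nntplib itself was removed in Python 3.13, so this sketch uses an in-memory stream, a hypothetical `read_response()` with a simplified status-line check, and a stand-in `ProtocolError`; the message-id is a placeholder. Each call reads exactly one line, so the stray continuation line is consumed by the next command.

```python
import io


class ProtocolError(Exception):
    """Hypothetical stand-in for nntplib.NNTPProtocolError."""


def read_response(stream):
    """Read one response line; NNTP status lines start with a 3-digit code."""
    line = stream.readline().rstrip(b'\r\n')
    if not line[:3].isdigit():
        raise ProtocolError(line)
    return line


# Simulated output from the buggy server: the 240 response is split across
# two lines, followed by the real reply to the client's QUIT.
server = io.BytesIO(
    b'240 Article posted\r\n'
    b' <long-message-id@example.invalid>\r\n'
    b'205 Connection closing\r\n'
)

print(read_response(server))      # post() sees b'240 Article posted'
try:
    read_response(server)         # quit() reads the stray second line
except ProtocolError as exc:
    print('ProtocolError on quit():', exc)
```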

--
components: Library (Lib)
messages: 350214
nosy: msapiro
priority: normal
severity: normal
status: open
title: nntplib throws spurious NNTPProtocolError
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue37919>
___



[issue36910] Certain Malformed email causes email.parser to throw AttributeError

2019-05-14 Thread Mark Sapiro


Mark Sapiro added the comment:

I do intend to submit a PR. I haven't yet worked it out though.

--

___
Python tracker 
<https://bugs.python.org/issue36910>
___



[issue36910] Certain Malformed email causes email.parser to throw AttributeError

2019-05-13 Thread Mark Sapiro


New submission from Mark Sapiro:

When run with Python 3.5, 3.6 or 3.7, the code in the attached parse_bug.py
file throws AttributeError with this traceback:

```
Traceback (most recent call last):
  File "parse_bug.py", line 9, in 
""")
  File "/usr/local/lib/python3.7/email/parser.py", line 124, in parsebytes
return self.parser.parsestr(text, headersonly)
  File "/usr/local/lib/python3.7/email/parser.py", line 68, in parsestr
return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/local/lib/python3.7/email/parser.py", line 58, in parse
return feedparser.close()
  File "/usr/local/lib/python3.7/email/feedparser.py", line 187, in close
self._call_parse()
  File "/usr/local/lib/python3.7/email/feedparser.py", line 180, in _call_parse
self._parse()
  File "/usr/local/lib/python3.7/email/feedparser.py", line 323, in _parsegen
if (self._cur.get('content-transfer-encoding', '8bit').lower()
AttributeError: 'Header' object has no attribute 'lower'
```

The triggering condition appears to be the Content-Transfer-Encoding: header 
with a non-ascii character in the headers of a multipart part.

The parser should probably throw email.errors.HeaderParseError with a 
MalformedHeaderDefect in this case rather than AttributeError.

While arguably code should defend against unanticipated exceptions, the fact 
that such an exception can be thrown while parsing an arbitrary message could 
be considered a security issue.
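On the defensive side, code that inspects a part's Content-Transfer-Encoding can convert the value with str() before calling string methods. A sketch; `content_transfer_encoding` is a hypothetical helper. With the compat32 policy a header value containing non-ASCII bytes can come back as an email.header.Header object, which has no .lower(), and the example below simulates that by storing a Header directly.

```python
from email.header import Header
from email.message import Message


def content_transfer_encoding(part, default='8bit'):
    """Hypothetical defensive CTE lookup that tolerates Header objects."""
    return str(part.get('content-transfer-encoding', default)).strip().lower()


part = Message()
part['Content-Transfer-Encoding'] = Header('BASE64')   # a Header, not a str
print(content_transfer_encoding(part))                 # base64
```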

--
components: email
files: parse_bug.py
messages: 342415
nosy: barry, msapiro, r.david.murray
priority: normal
severity: normal
status: open
title: Certain Malformed email causes email.parser to throw AttributeError
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7
Added file: https://bugs.python.org/file48330/parse_bug.py

___
Python tracker 
<https://bugs.python.org/issue36910>
___



[issue34155] email.utils.parseaddr mistakenly parse an email

2018-11-06 Thread Mark Sapiro


Mark Sapiro added the comment:

I agree that my example with an @ in the 'display name', although actually seen 
in the wild, is non-compliant, and that the behavior of parseaddr() in this 
case is not a bug.

Sorry for the noise.
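For contrast, the compliant way to put an @ in a display name is to quote it, which makes it part of a quoted-string. A minimal check with illustrative addresses (not the ones from the original report):

```python
from email.utils import parseaddr

# Illustrative addresses; the quotes make the '@' part of the display name.
quoted = '"John Doe jdoe@example.com" <other@example.net>'
print(parseaddr(quoted))   # ('John Doe jdoe@example.com', 'other@example.net')
```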

--

___
Python tracker 
<https://bugs.python.org/issue34155>
___



[issue34155] email.utils.parseaddr mistakenly parse an email

2018-11-06 Thread Mark Sapiro


Mark Sapiro added the comment:

The issue is illustrated much more simply as follows:

email.utils.parseaddr('John Doe j...@example.com ')

returns

('', 'John Doe j...@example.com')

whereas it should return

('John Doe j...@example.com', 'ot...@example.net')

I'll look at developing a patch.

--
nosy: +msapiro

___
Python tracker 
<https://bugs.python.org/issue34155>
___



[issue32330] Email parser creates a message object that can't be flattened

2017-12-15 Thread Mark Sapiro

Mark Sapiro <m...@msapiro.net> added the comment:

> I do wonder where you are using the string version of messages :)

Probably some places where we could use bytes, but one of the problem areas is 
where we save the content of a message held for moderation.

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32330>
___



[issue32330] Email parser creates a message object that can't be flattened

2017-12-14 Thread Mark Sapiro

Mark Sapiro <m...@msapiro.net> added the comment:

Yes. I think errors=replace is a good solution. In Mailman, we have our own
mailman.email.message.Message class, a subclass of email.message.Message, and
to work around this and issue27321 we override as_string() with:

def as_string(self):
    # Work around for https://bugs.python.org/issue27321 and
    # https://bugs.python.org/issue32330.
    try:
        value = email.message.Message.as_string(self)
    except (KeyError, UnicodeEncodeError):
        value = email.message.Message.as_bytes(self).decode(
            'ascii', 'replace')
    return value

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32330>
___



[issue32330] Email parser creates a message object that can't be flattened

2017-12-14 Thread Mark Sapiro

New submission from Mark Sapiro <m...@msapiro.net>:

This is related to https://bugs.python.org/issue27321 but a different exception 
is thrown for a different reason. This is caused by a defective spam message. I 
don't actually have the offending message from the wild, but the attached 
bad_email_2.eml illustrates the problem.

The defect is that the message declares its content charset as us-ascii, but
the body contains non-ASCII. When the message is parsed into an
email.message.Message object and the object's as_string() method is called,
UnicodeEncodeError is thrown as follows:

>>> import email
>>> with open('bad_email_2.eml', 'rb') as fp:
...     msg = email.message_from_binary_file(fp)
... 
>>> msg.as_string()
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
msg.set_payload(payload, charset)
  File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-33: ordinal not in range(128)

--
components: email
files: bad_email_2.eml
messages: 308353
nosy: barry, msapiro, r.david.murray
priority: normal
severity: normal
status: open
title: Email parser creates a message object that can't be flattened
type: behavior
versions: Python 3.5, Python 3.6
Added file: https://bugs.python.org/file47333/bad_email_2.eml

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32330>
___



[issue32144] email.policy.SMTP and SMTPUTF8 doesn't honor linesep's value

2017-11-26 Thread Mark Sapiro

Change by Mark Sapiro <m...@msapiro.net>:


--
nosy: +msapiro

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32144>
___



[issue27321] Email parser creates a message object that can't be flattened

2017-06-03 Thread Mark Sapiro

Mark Sapiro added the comment:

It looks like Johannes beat me to it. Thanks for that, but see my comments in 
the diff at 
https://github.com/kyrias/cpython/commit/a986a8274a522c73d87360da6930e632a3eb4ebb

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27321>
___



[issue27321] Email parser creates a message object that can't be flattened

2017-06-03 Thread Mark Sapiro

Mark Sapiro added the comment:

I considered 'look before you leap', but decided that since we're munging the
headers anyway, preserving their order is not that critical. Still, the patch is
easy enough; I'll work on that and a test.
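The trade-off between the two approaches is header order. In this sketch (hypothetical helper names, not the generator's actual code), the look-before-you-leap version replaces the header in place when it exists, while the delete-then-set version described in the generator.patch comment never raises KeyError but moves the header to the end.

```python
from email.message import Message


def set_cte_lbyl(msg, value):
    """Look before you leap: replace in place if present, else append."""
    if 'content-transfer-encoding' in msg:     # lookup is case-insensitive
        msg.replace_header('content-transfer-encoding', value)
    else:
        msg['Content-Transfer-Encoding'] = value


def set_cte_del_set(msg, value):
    """Delete then set: no KeyError, but the header moves to the end."""
    del msg['content-transfer-encoding']       # no error if absent
    msg['Content-Transfer-Encoding'] = value


a = Message()
a['Content-Transfer-Encoding'] = '7bit'
a['Subject'] = 'test'
b = Message()
b['Content-Transfer-Encoding'] = '7bit'
b['Subject'] = 'test'

set_cte_lbyl(a, '8bit')
set_cte_del_set(b, '8bit')
print(a.keys())   # ['Content-Transfer-Encoding', 'Subject']
print(b.keys())   # ['Subject', 'Content-Transfer-Encoding']

empty = Message()
set_cte_lbyl(empty, '8bit')   # no KeyError when the header is missing
```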

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27321>
___



[issue27321] Email parser creates a message object that can't be flattened

2016-06-14 Thread Mark Sapiro

Mark Sapiro added the comment:

One additional observation. The original message contained no 
Content-Transfer-Encoding header even though the message body was raw koi8-r 
characters. Adding

Content-Transfer-Encoding: 8bit

to the message headers avoids the issue, but that is not a practical solution 
as the message was Russian spam received by a Mailman list and the resultant 
KeyError caused problems in Mailman.

We can work on defending against this in Mailman, but I suggest that the
munge_cte code in generator._write() avoid the documented KeyError that
replace_header() can raise, by using __delitem__() and __setitem__() instead, as
in the attached generator.patch.

--
keywords: +patch
Added file: http://bugs.python.org/file43394/generator.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27321>
___



[issue27321] Email parser creates a message object that can't be flattened

2016-06-14 Thread Mark Sapiro

New submission from Mark Sapiro:

The attached file, bad_email, can be parsed via

msg = email.message_from_binary_file(open('bad_email', 'rb'))

but then msg.as_string() produces the following:

Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 189, in _write
msg.replace_header('content-transfer-encoding', munge_cte[0])
  File "/usr/lib/python3.5/email/message.py", line 559, in replace_header
raise KeyError(_name)
KeyError: 'content-transfer-encoding'

--
components: email
files: bad_email
messages: 268580
nosy: barry, msapiro, r.david.murray
priority: normal
severity: normal
status: open
title: Email parser creates a message object that can't be flattened
versions: Python 3.4, Python 3.5
Added file: http://bugs.python.org/file43391/bad_email

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27321>
___



Re: Threading is foobared?

2016-04-04 Thread Mark Sapiro
Mark Sapiro wrote:
> Random832 wrote:
> 
>> Any chance that it could fix reference headers to match?
>> 
>> Actually, merely prepending the original Message-ID itself to the
>> references header might be enough to change the reply's situation from
>> "nephew" ("reply to [missing] sibling") to "grandchild" ("reply to
>> [missing] reply"), which might be good enough to make threading work
>> right on most clients, and would be *easy* (whereas maintaining an
>> ongoing reversible mapping may not be).
>> 
>> And if it's not too much additional work, maybe throw in an
>> X-Mailman-Original-Message-ID (and -References if anything is done with
>> that) field, so that the original state can be recovered.
> 
> 
> I think these are good ideas. I'm going to try to do something along
> these lines.


This is now implemented on mail.python.org for python-list@python.org
and the others that gateway to Usenet.

I hope this will mitigate at least some of the threading issues.

As noted earlier in this thread, the original Message-ID: is appended,
not prepended to References:. More specifically, if there is a
References: header, the original Message-ID: is appended. If not, one is
created with the In-Reply-To: value if any and the original Message-ID:.
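The References: rule described above can be sketched with the standard email package. This is an illustration only, not Mailman's code: add_original_id() is a hypothetical name, and the caller is assumed to have saved the original Message-ID: before it was replaced.

```python
from email.message import Message


def add_original_id(msg: Message, original_id: str) -> None:
    """Hypothetical sketch of the References: handling for gated posts.

    If References: exists, append the original Message-ID to it.
    Otherwise create References: from In-Reply-To: (if any) followed by
    the original Message-ID.
    """
    refs = msg.get('References')
    if refs:
        msg.replace_header('References', f'{refs} {original_id}')
    else:
        irt = msg.get('In-Reply-To')
        msg['References'] = f'{irt} {original_id}' if irt else original_id


reply = Message()
reply['In-Reply-To'] = '<parent@example.com>'
add_original_id(reply, '<orig@example.com>')
# reply['References'] is now '<parent@example.com> <orig@example.com>'
```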

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
-- 
https://mail.python.org/mailman/listinfo/python-list


[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread Mark Sapiro

Mark Sapiro added the comment:

Added Python 2.7 to versions:

--
versions: +Python 2.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26686>
___



[issue26686] email.parser stops parsing headers too soon.

2016-04-01 Thread Mark Sapiro

New submission from Mark Sapiro:

Given an admittedly defective (the folded Content-Type: isn't indented) message 
part with the following headers/body

---
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
name="04EBD_._A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
---

email.parser parses the headers as

---
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
---

and the body as

---
name="04EBD_._A546BB.zip"
Content-Transfer-Encoding: base64

UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...
---

and shows no defects.

This is wrong. RFC5322 section 2.1 is clear that everything up to the first 
empty line is headers. Even the docstring in the email/parser.py module says 
"The header block is terminated either by the end of the string or by a blank 
line."

Since the message is defective, it isn't clear what the correct result should 
be, but I think

Headers:
Content-Disposition: inline; filename="04EBD_._A546BB.zip"
Content-Type: application/x-rar-compressed; x-unix-mode=0600;
Content-Transfer-Encoding: base64

Body:
UmFyIRoHAM+QcwAADQBKRXQgkC4ApAMAAEAHAAACJLrQXYFUfkgdMwkAIGEw
ZjEwZi5qcwDwrrI/DB2NDI0TzcGb3Gpb8HzsS0UlpwELvdyWnVaBQt7Sl2zbJpx1qqFCGGk6
...

Defects:
name="04EBD_._A546BB.zip"

would be more appropriate. The problem is that the Content-Transfer-Encoding:
base64 header is not in the headers, so get_payload(decode=True) doesn't
decode the base64-encoded body, which makes malware recognition difficult.

--
components: Library (Lib)
messages: 262750
nosy: msapiro
priority: normal
severity: normal
status: open
title: email.parser stops parsing headers too soon.
type: behavior
versions: Python 3.4

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26686>
___



Re: Threading is foobared?

2016-03-31 Thread Mark Sapiro
Random832 wrote:

> One additional thing that would be nice and would solve most of the
> duplicate problem with hypothetically including the rewritten
> Message-IDs in outgoing emails, would be to detect crossposts to
> multiple lists in the same Mailman instance, and to send them to Usenet
> (and to subscribers) as a single message, with appropriate headers for a
> crosspost.


This is difficult to do for various reasons. The main issue is that gating
to news is done asynchronously by a separate process. Even if that process
could reliably determine that another gatewayed list in the installation
was a recipient of this post (which it could only do by examining
explicit addressees, and the other list might be a Bcc:), we'd still have
to arbitrate somehow which post gets gatewayed to the multiple news
groups and which ones get dropped. I suppose, though, that we could send
each one to all the news groups and let the news server figure it out.

Anyway, I don't plan to try this.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: Threading is foobared?

2016-03-31 Thread Mark Sapiro
Random832 wrote:

> Any chance that it could fix reference headers to match?
> 
> Actually, merely prepending the original Message-ID itself to the
> references header might be enough to change the reply's situation from
> "nephew" ("reply to [missing] sibling") to "grandchild" ("reply to
> [missing] reply"), which might be good enough to make threading work
> right on most clients, and would be *easy* (whereas maintaining an
> ongoing reversible mapping may not be).
> 
> And if it's not too much additional work, maybe throw in an
> X-Mailman-Original-Message-ID (and -References if anything is done with
> that) field, so that the original state can be recovered.


I think these are good ideas. I'm going to try to do something along
these lines.


> Rather than exclusively rewriting for usenet, maybe the rewritten
> headers could also be included in outgoing emails and the archive?
> 
> Putting it in outgoing emails would solve the problem entirely, though
> it would mean people get duplicates if they're subscribed to multiple
> lists to which something is posted or get CC'd. The archive wouldn't
> have this issue.


This is more difficult since archiving, gatewaying to Usenet and
delivery to list members are asynchronous processes that have no way to
communicate with each other.

It could be accomplished by doing a Usenet check in the incoming
pipeline and putting the Mailman Message-ID in the message metadata or
doing the modifications at that point, but I don't think I want to expand
the scope of something that is not RFC compliant in the first place.

I need to think about these things some more.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: Threading is foobared?

2016-03-30 Thread Mark Sapiro
Hi all,

I'm jumping in on this thread because Tim asked.

I'm here because I'm a Mailman developer and the primary maintainer of
Mailman for the @python.org lists.

Regarding the initial post in this thread from Steven D'Aprano
suggesting that broken threading is more common recently and quoting a
couple of Message-ID:/References: headers wherein a message ID was
apparently munged from

<1392737302.749065.1459024715818.javamail.ya...@mail.yahoo.com>
to
<1392737302.749065.1459024715818.javamail.yahoo@mail.yahoo.com>

Some Background:

Our long-time mail.python.org server, provided by xs4all, died of severe
hardware failure late last October. We were able to get a replacement
server through the PSF and get it configured and running within a couple
of days, but this new server couldn't access the nntp server at xs4all.
With the kind assistance of members of the community we were able to get
access to a news server at the Free University of Berlin which is now
our gateway to Usenet.

This server undoubtedly has different policies and behaviors from the
prior server at xs4all. I'm not sure what mail or news server is
responsible for munging the IDs as above, but it could be our new Usenet
gateway. All I know for sure is that Mailman doesn't do that specific
munging.

What Mailman does do as noted by Random832 is replace the Message-ID:
header value in posts gated to Usenet with a list specific, Mailman
generated unique value. There is a reason for this, and that reason is
if a message is cross-posted to two lists which both gateway to Usenet,
and Mailman didn't make the Message-IDs unique, the news server would
discard one of the two posts as a duplicate and the post would be
missing from one of the recipient Usenet groups.

Granted, this is bad and breaks threading, but avoiding message loss
is a more important goal.
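The uniqueness argument above can be illustrated with email.utils.make_msgid(). This is a minimal sketch, not Mailman's implementation: gate_copy() is a hypothetical helper, the list names and domain are placeholders, and the ID format is whatever make_msgid() produces rather than the format Mailman uses.

```python
import copy
from email.message import Message
from email.utils import make_msgid


def gate_copy(post: Message, listname: str, domain: str) -> Message:
    """Hypothetical sketch: copy a post for one list's Usenet gateway and
    give the copy its own list-specific Message-ID."""
    gated = copy.deepcopy(post)
    del gated['Message-ID']  # removes every Message-ID header; no error if absent
    # make_msgid() combines a timestamp and random data with the optional
    # idstring, so each call returns a new value.
    gated['Message-ID'] = make_msgid(idstring=listname, domain=domain)
    return gated


post = Message()
post['Message-ID'] = '<orig@example.com>'
post.set_payload('cross-posted text')

# The same cross-post gated by two lists now carries two different IDs,
# so the news server has no reason to discard either copy as a duplicate.
first = gate_copy(post, 'list-one', 'example.org')
second = gate_copy(post, 'list-two', 'example.org')
```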

I understand I'm not providing any solutions here, but perhaps a more
complete understanding of what the issues are will ease the pain.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan





[issue1409460] email.Utils.parseaddr() gives arcane result

2010-07-17 Thread Mark Sapiro

Mark Sapiro m...@msapiro.net added the comment:

> parsing 'merwok'
>   expected ('merwok', '')
>   got      ('', 'merwok')


I think ('', 'merwok') is the correct result. I think most if not all MUAs/MTAs 
will interpret an address without an '@', albeit invalid, as a local-part in 
the local domain, thus parsing 'merwok' as the address 'merwok' with no real 
name is probably the right thing to do with this input. The alternative would 
be to return ('', '') indicating failure.
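For reference, here is the call being discussed and a stricter variant. parseaddr_require_domain() is a hypothetical wrapper (not in email.utils) for callers that prefer the ('', '') failure result when no '@' is present:

```python
from email.utils import parseaddr


def parseaddr_require_domain(value: str) -> tuple[str, str]:
    """Hypothetical wrapper: like parseaddr(), but report failure as ('', '')
    when the parsed address is only a bare local-part (no '@')."""
    name, addr = parseaddr(value)
    if '@' not in addr:
        return ('', '')
    return (name, addr)


lenient = parseaddr('merwok')                # ('', 'merwok'): local-part, no real name
strict = parseaddr_require_domain('merwok')  # ('', ''): treated as a failure
```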


> parsing 'merwok w...@rusty'
>   expected ('', 'w...@rusty')
>   got      ('', 'merwok...@rusty')


Here, I think failure is a more appropriate return.

In any case, I think this is a new bug deserving of a new report. It is not 
really relevant to this issue which has to do with nested parentheses.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1409460
___



[issue5713] smtplib gets out of sync if server returns a 421 status

2009-10-22 Thread Mark Sapiro

Mark Sapiro m...@msapiro.net added the comment:

I'm not completely sure about this, but here are my thoughts. In the
scenarios I've seen, the 421 reply/disconnect only occurs in response to
a RCPT which has an invalid address and follows several prior refused
RCPTs. In this case, I think the proper action is to close the
connection and raise SMTPRecipientsRefused with a dictionary containing
the actual responses for the RCPTs refused before the 421, plus the 421
response for the RCPT that produced it.

If the 421 comes at another time, I think the current process does the
right thing. It will raise the appropriate exception if it gets the
chance. It just needs to ensure that, if the response was 421, it calls
self.close() instead of self.rset().

I have attached a patch against the 2.6.1 smtplib.py which I think does
the right thing. I haven't tested this at all, but I think it should
work. The documentation may need to be updated to emphasize that, even
though not all recipients are listed in the dictionary returned with the
SMTPRecipientsRefused exception, no one got the mail.
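The proposed handling can be sketched without a live server. Below, rcpt and close stand in for SMTP.rcpt() and SMTP.close(), and FakeServer is a hypothetical server so the example runs offline. The loop records every refusal; on a 421 it closes the connection and raises smtplib.SMTPRecipientsRefused with the refusals collected so far, the 421 included, instead of carrying on against a dead connection.

```python
import smtplib
from typing import Callable

Reply = tuple[int, bytes]


def send_rcpts(rcpt: Callable[[str], Reply], close: Callable[[], None],
               recipients: list[str]) -> dict[str, Reply]:
    """Send RCPT for each recipient and return the refused ones.

    A 421 means the server is closing the connection, so close our side and
    raise SMTPRecipientsRefused with every refusal seen so far (the 421
    included) rather than continuing or calling rset().
    """
    refused: dict[str, Reply] = {}
    for addr in recipients:
        code, resp = rcpt(addr)
        if code not in (250, 251):
            refused[addr] = (code, resp)
        if code == 421:
            close()
            raise smtplib.SMTPRecipientsRefused(refused)
    return refused


class FakeServer:
    """Hypothetical stand-in that sends 421 after `limit` rejected RCPTs."""

    def __init__(self, valid: set[str], limit: int) -> None:
        self.valid = valid
        self.limit = limit
        self.rejects = 0
        self.closed = False

    def rcpt(self, addr: str) -> Reply:
        if addr in self.valid:
            return (250, b'OK')
        self.rejects += 1
        if self.rejects > self.limit:
            return (421, b'Too many errors')
        return (550, b'No such user')

    def close(self) -> None:
        self.closed = True


server = FakeServer(valid={'ok@example.com'}, limit=2)
addrs = ['ok@example.com'] + [f'bad{i}@example.com' for i in range(1, 5)]
report = None
try:
    send_rcpts(server.rcpt, server.close, addrs)
except smtplib.SMTPRecipientsRefused as exc:
    report = exc.recipients  # bad1, bad2 (550) and bad3 (421); bad4 never tried
```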

--
keywords: +patch
Added file: http://bugs.python.org/file15183/smtplib.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5713
___



[issue5713] smtplib gets out of sync if server returns a 421 status

2009-04-06 Thread Mark Sapiro

New submission from Mark Sapiro m...@msapiro.net:

RFC821 upon which smtplib was originally based does not define a 421
status code and implies the server should only disconnect in response to
a QUIT command.

Subsequent extensions in RFC2821 (and now RFC5321) define situations
under which the server may return a 421 status and disconnect. This
leads to the following problem.

An smtplib.SMTP() instance is created and its sendmail() method is
called with a list of recipients which contains several invalid, local
addresses. sendmail() processes the recipient list, calling the rcpt()
method for each. Some of these may be accepted with a 250 or 251 status
and some may be rejected with a 550 or other status. The rejects are
kept in a dictionary to be eventually returned as the sendmail() result.

However, with the Postfix server at least, after 20 rejects, the server
sends a "421 Too many errors" reply and disconnects, but sendmail()
continues to process the list. This results in raising
SMTPServerDisconnected("Connection unexpectedly closed"), and the
response dictionary containing the invalid addresses and their responses
is lost.

The caller may see the exception as retryable and may retry the send
after some delay, but since the caller has received no information about
the invalid addresses, it sends the same recipient list and the scenario
repeats.

--
components: Library (Lib)
messages: 85666
nosy: msapiro
severity: normal
status: open
title: smtplib gets out of sync if server returns a 421 status
type: behavior
versions: Python 2.4, Python 2.5, Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5713
___



[issue5277] email message.get_params() and related methods sometimes fail.

2009-02-15 Thread Mark Sapiro

New submission from Mark Sapiro m...@msapiro.net:

The message method get_params() and the related get_param() and
get_filename() do not properly decode an RFC 2231 encoded parameter such
as the following:

Content-Disposition: inline;
 filename*0=Re: [Mailman-Users] Messages shunted with \TypeError: ;
 filename*1=decodingUnicode is not supported\.eml

This is because the message helper function _parseparams() mistakenly
thinks the second semicolon is inside a quoted string because it counts
the quoted (escaped) quote and sees an odd number.

The attached patch will fix this.
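The quoting rule at issue can be illustrated with a small scanner that splits a parameter list on semicolons only when they are outside a quoted-string, treating a backslash-escaped quote as a quoted-pair rather than a delimiter. This is a character-by-character sketch, not the quote-counting approach in _parseparams() or the attached patch; split_params() and the sample header value are hypothetical.

```python
def split_params(value: str) -> list[str]:
    """Hypothetical sketch: split a header value on ';' outside quoted-strings.

    Inside a quoted-string, a backslash starts a quoted-pair, so an escaped
    '"' neither opens nor closes the string.
    """
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:               # character after a backslash: keep as-is
            escaped = False
        elif in_quotes and ch == '\\':
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            parts.append(''.join(current).strip())
            current = []
            continue              # drop the separator itself
        current.append(ch)
    parts.append(''.join(current).strip())
    return parts


value = 'inline; title*0="say \\"hi\\"; "; title*1="there"'
params = split_params(value)
# params[1] is the whole title*0 parameter, including its quoted ';'
```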

--
components: Library (Lib)
files: message.patch
keywords: patch
messages: 82215
nosy: barry, msapiro
severity: normal
status: open
title: email message.get_params() and related methods sometimes fail.
type: behavior
versions: Python 2.4, Python 2.5, Python 2.6, Python 3.0, Python 3.1
Added file: http://bugs.python.org/file13105/message.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5277
___



[issue4279] Module 'parser' fails to build

2009-01-10 Thread Mark Sapiro

Mark Sapiro m...@msapiro.net added the comment:

This problem also occurs when building the 2.6.1 parser module on Cygwin
1.5.25. It did not occur with Python 2.6 or 2.5.x.

The error from 'make' is

building 'parser' extension
gcc -shared -Wl,--enable-auto-image-base
build/temp.cygwin-1.5.25-i686-2.6/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.o
-L/usr/local/lib -L. -lpython2.6 -o
build/lib.cygwin-1.5.25-i686-2.6/parser.dll
build/temp.cygwin-1.5.25-i686-2.6/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.o:
In function `parser_expr':
/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.c:552:
undefined reference to `__PyParser_Grammar'
build/temp.cygwin-1.5.25-i686-2.6/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.o:
In function `parser_suite':
/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.c:552:
undefined reference to `__PyParser_Grammar'
collect2: ld returned 1 exit status

I was able to work around the error and build a parser module that
passed its unit tests by manually running

gcc -shared -Wl,--enable-auto-image-base
build/temp.cygwin-1.5.25-i686-2.6/cygdrive/c/Python_dist/Python-2.6.1/Modules/parsermodule.o
Python/graminit.o -L/usr/local/lib -L. -lpython2.6 -o
build/lib.cygwin-1.5.25-i686-2.6/parser.dll

i.e. by including Python/graminit.o in the explicit object files to load.

I have also confirmed that applying the parser-grammar.patch from #4288
will allow make to successfully build a parser module that passes unit
tests.

--
nosy: +msapiro
versions: +Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4279
___



[issue4288] parsermodule and grammar variable

2009-01-10 Thread Mark Sapiro

Changes by Mark Sapiro m...@msapiro.net:


--
nosy: +msapiro

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4288
___



[issue4789] Documentation changes break existing URIs

2009-01-10 Thread Mark Sapiro

Mark Sapiro m...@msapiro.net added the comment:

Thank you for adding the redirects, and for getting them right in spite
of my garbling some of them in the original report.

I have updated the links for the next Mailman release.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4789
___



[issue4789] Documentation changes break existing URIs

2008-12-30 Thread Mark Sapiro

New submission from Mark Sapiro m...@msapiro.net:

The Mailman GUI contains a few links to the python.org documentation
which are now broken. These links and the current equivalents are:

http://www.python.org/doc/
  works, but could map to http://docs.python.org/
http://www.python.org/doc/current/
  works, but could map to http://docs.python.org/
http://www.python.org/doc/current/lib/
  -> http://docs.python.org/library/
http://www.python.org/doc/current/lib/module-re.htm
  -> http://docs.python.org/library/re.html
http://www.python.org/doc/current/lib/re-syntax
  -> http://docs.python.org/library/re.html#regular-expression-syntax
http://www.python.org/doc/current/lib/typesseq-strings.html
  -> http://docs.python.org/library/stdtypes.html#string-formatting-operations

It would be really cool if these old URIs could redirect to the new ones.

--
assignee: georg.brandl
components: Documentation
messages: 78583
nosy: barry, georg.brandl, msapiro
severity: normal
status: open
title: Documentation changes break existing URIs
type: behavior
versions: Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4789
___



Re: urllib2.HTTPError: HTTP Error 204: NoContent

2008-10-19 Thread Mark Sapiro
On Oct 19, 9:49 am, Philip Semanchuk [EMAIL PROTECTED] wrote:
> On Oct 19, 2008, at 6:13 AM, silk.odyssey wrote:
>
>> I am getting the following error trying to download an html page using
>> urllib2.
>>
>> urllib2.HTTPError: HTTP Error 204: NoContent
>>
>> The url is of this type:
>>
>> http://www.amazon.com/gp/offer-listing/B000KJX3A0%3FSubscriptionId%3D...
>>
>> I can open it in my browser without problems. Any ideas on a solution?
>
> Are you changing the user-agent? Some sites sniff user agents and
> return different results to browsers than to suspected bots.


I tried it.

>>> import urllib2
>>> url = 'http://www.amazon.com/gp/offer-listing/B000KJX3A0%3FSubscriptionId%3D183VXJS74KNQ89D0NRR2%26tag%3Dws%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3DB000KJX3A0'
>>> op = urllib2.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 380, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 204: NoContent
>>> headers = {}
>>> headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3'
>>> ro = urllib2.Request(url, None, headers)
>>> op = urllib2.urlopen(ro)
>>> page = op.read()
>>> page
(lots of HTML)

So the answer is as Philip suggests: amazon.com doesn't like
'Python-urllib/2.5' as a User-Agent. You have to give it something that
looks like a browser.
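In Python 3, urllib2 became urllib.request, and the same fix is to pass a browser-like User-Agent: header to urllib.request.Request. A minimal sketch; the URL and the User-Agent string are placeholders, and the request is only built here, not sent:

```python
import urllib.request

URL = 'https://www.example.com/'  # placeholder, not the Amazon URL above
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'  # placeholder

req = urllib.request.Request(URL, headers={'User-Agent': BROWSER_UA})

# Request stores header names via str.capitalize(), so the key is 'User-agent';
# this value replaces the default 'Python-urllib/<version>' agent.
print(req.get_header('User-agent'))

# To actually fetch the page (requires network access):
# with urllib.request.urlopen(req) as resp:
#     page = resp.read()
```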

--
(for email use this address please - you can figure it out)

Mark Sapiro mark at msapiro net       Any clod can have the facts;
San Francisco Bay Area, California    having opinions is an art. -
                                        C. McCabe, The Fearless Spectator


Re: [Mailman-Developers] Parsing and Rendering rfc8222

2006-07-04 Thread Mark Sapiro
Brad Knowles wrote:

>Ethan said:
>
>> I plan on using [2] to generate mbox thread indexes for rapid navigation
>> of lists. Any suggestions for more robust variants would be welcome;
>> feedback on how to handle threading for message-id-less messages would
>> also be welcome.
>
>All messages should have message-ids -- this is one of the most basic
>requirements of the Internet e-mail related RFCs.  If nothing else, the
>local MTA on the Mailman server should have provided a message-id.

I interpreted Ethan's concern to be messages that lack References: and
In-Reply-To: rather than a Message-ID: per se. Also, generating a
Message-ID: at archiving time does no good (at least in the absence of
an archive interface to allow replying to an archive post) because
it's too late to get that id into References: and/or In-Reply-To: of
email replies.

Also there is a related issue if A posts, B replies, A replies off list
to B, and B replies on list. If threading relies solely on References:
or In-Reply-To:, and either A's or B's MUA generates only In-Reply-To,
this thread is broken at the 'missing' post. I don't have any really
good suggestions for alternative threading algorithms however. I think
there was something on this not too long ago on mailman-users or maybe
mailman-developers - I looked and found what I think I remember. The
relevant post is at
http://mail.python.org/pipermail/mailman-developers/2005-January/017660.html
and points to a description of an algorithm at
http://www.jwz.org/doc/threading.html.
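The break described above can be reproduced with a much simplified version of the first step of the algorithm at the jwz.org link: every Message-ID seen, including ones that are only referenced, gets a container, and each message is linked under the last ID in its References: (falling back to In-Reply-To:). This is a hypothetical sketch that omits the subject grouping, pruning, and re-parenting steps of the full algorithm. In the example, A's post is <a@x>, B's reply is <b@x>, A's off-list reply is <c@x> (never archived), and B's on-list reply is <d@x>.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    msg_id: str
    present: bool = False  # False: referenced but never seen (e.g. off-list)
    parent: Optional["Container"] = None
    children: list = field(default_factory=list)


def _is_ancestor(a: Container, b: Optional[Container]) -> bool:
    """True if a is b or an ancestor of b."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False


def _link(parent: Container, child: Container) -> None:
    # Keep an existing parent and never create a loop.
    if child.parent is None and not _is_ancestor(child, parent):
        child.parent = parent
        parent.children.append(child)


def build_threads(messages):
    """messages: iterable of (Message-ID, References list, In-Reply-To or None)."""
    table: dict[str, Container] = {}

    def get(mid: str) -> Container:
        return table.setdefault(mid, Container(mid))

    for mid, refs, in_reply_to in messages:
        this = get(mid)
        this.present = True
        chain = list(refs) or ([in_reply_to] if in_reply_to else [])
        prev = None
        for ref in chain:
            cont = get(ref)
            if prev is not None:
                _link(prev, cont)
            prev = cont
        if prev is not None:
            _link(prev, this)
    # Roots are containers with no parent, in first-seen order.
    return [c for c in table.values() if c.parent is None]


a = ('<a@x>', [], None)
b = ('<b@x>', ['<a@x>'], '<a@x>')
d_full = ('<d@x>', ['<a@x>', '<b@x>', '<c@x>'], '<c@x>')
d_irt_only = ('<d@x>', [], '<c@x>')

whole = [r.msg_id for r in build_threads([a, b, d_full])]       # ['<a@x>']
broken = [r.msg_id for r in build_threads([a, b, d_irt_only])]  # ['<a@x>', '<c@x>']
```

With a full References: chain the missing <c@x> becomes an empty placeholder inside A's thread; with In-Reply-To: alone, <d@x> hangs off a placeholder that has no link back to <a@x>.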

-- 
Mark Sapiro [EMAIL PROTECTED]   The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



email.Utils.parseaddr() gives arcane result

2006-01-09 Thread Mark Sapiro
email.Utils.parseaddr('Real Name ((comment)) [EMAIL PROTECTED]')

returns

('comment [EMAIL PROTECTED]', 'Real')

Granted the string above is invalid as RFC 2822 does not allow
parentheses within comments, but most mail agents seem to at least take
the contents of the angle brackets as the address.

rfc822.parseaddr() returns the same result in this case.

If these functions aren't going to return their respective failure
indication in this case, I think they should at least return
'[EMAIL PROTECTED]' as the second item of the returned tuple.
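The fallback suggested here, taking whatever is inside the angle brackets as the address, can be sketched with a regular expression. The helper name and the sample address are hypothetical, and this is a heuristic rather than an RFC 2822 parser:

```python
import re
from email.utils import parseaddr

# Contents of the first <...> that contains no nested angle brackets.
ANGLE_ADDR = re.compile(r'<([^<>]+)>')


def address_with_fallback(value: str) -> str:
    """Hypothetical heuristic: if the value contains an <angle-addr>, return
    its contents; otherwise fall back to parseaddr()'s address."""
    match = ANGLE_ADDR.search(value)
    if match:
        return match.group(1).strip()
    return parseaddr(value)[1]


addr = address_with_fallback('Real Name ((comment)) <user@example.com>')
# 'user@example.com', regardless of how parseaddr() handles the nested comment
```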

--
(for email use this address please - you can figure it out)

Mark Sapiro msapiro -at- value net    Any clod can have the facts;
San Francisco Bay Area, California    having opinions is an art. -
                                        C. McCabe, The Fearless Spectator
