[issue39380] ftplib uses latin-1 as default encoding

2020-04-13 Thread STINNER Victor


STINNER Victor  added the comment:

It's now fixed. Thanks Eric V. Smith for suggesting the change and thanks 
Sebastian Pedersen for the very complete implementation (with tests and new 
constructor parameters!).

The default encoding change is documented properly in the "Changes in the 
Python API" section of What's New in Python 3.9.

Hopefully, it was already possible to override the encoding in Python 3.8 and 
older by modifying FTP.encoding class attribute.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-04-13 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset a1a0eb4a394a5ac7a8422616ce1ee4125a3ef74f by Sebastian Pedersen in 
branch 'master':
bpo-39380: Change ftplib encoding from latin-1 to utf-8 (GH-18048)
https://github.com/python/cpython/commit/a1a0eb4a394a5ac7a8422616ce1ee4125a3ef74f


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-03-22 Thread Sebastian G Pedersen


Sebastian G Pedersen  added the comment:

Yes, I will update the PR before the end of next week.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-03-20 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

+1 from me as well. @SebastianGPedersen could you update the PR (constructor + 
doc changes)?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-03-20 Thread Eric V. Smith


Eric V. Smith  added the comment:

I agree with Victor.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-03-20 Thread STINNER Victor


STINNER Victor  added the comment:

I'm now fine with changing the default. But I would still prefer to have an 
encoding parameter in the constructor. Making these two changes at once now 
makes sense to me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-03-15 Thread Sebastian G Pedersen


Sebastian G Pedersen  added the comment:

Thanks again for the engagement.
I am also in favor of adding the encoding to the constructor. However, 
following Giampaolo's comment, that utf-8 is standard and has been for a long 
time (and the RFC dating back to year 1999), I am also in favor of changing the 
default encoding in the same PR. As mentioned in the previous comments, most 
users use ASCII and will not be affected by this anyway - which is probably 
also why this issue hasn't been raised before.
Please let me know your thoughts on this.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-27 Thread STINNER Victor


STINNER Victor  added the comment:

Before changing the default, I would prefer to see a PR adding an encoding 
parameter to the FTP constructor.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-26 Thread STINNER Victor


STINNER Victor  added the comment:

I'm in favor of changing the default encoding to UTF-8, but it requires good 
documentation, especially to provide a solution working on Python 3.8 and 3.9 
to change the encoding (see below).

--

The encoding is used to encode commands with the FTP server and decode the 
server replies. I expect that most replies are basically letters, digits and 
spaces. I guess that the most problematic commands are:

* send user and password
* decode filenames of LIST command reply
* encode filename in STOR command

I expect that the original FTP protocol doesn't specify any encoding and so 
that FTP server implementations took some freedom. I would not be surprised to 
use ANSI code pages used on servers running on Windows.

Currently, encoding is a class attribute: it's not convenient to override it 
depending on the host. I would prefer to have a new parameter for the 
constructor.

Giampaolo:
> some servers may enable UTF-8 only if client explicitly sends "OPTS UTF-8 ON" 
> first, but that is based on an draft RFC. Server implementors usually treat 
> this command as a no-op and simply assume UTF-8 as the default.
> With that said, I am -1 about implementing logic based on FEAT/OPTS: that 
> should be done before login, and at that point some servers may erroneously 
> reject any command other than USER, PASS and ACCT. 

Oh. In this case, always send "OPTS UTF-8 ON" just after the socket is 
connected sounds like a bad idea.


Sebastian:
> Since RFC 2640, the industry standard within FTP Clients is UTF-8 (see e.g. 
> FileZilla here: https://wiki.filezilla-project.org/Character_Encoding, or 
> WinSCP here: https://winscp.net/eng/docs/faq_utf8).

"Internationalization of the File Transfer Protocol" was published in 1999. It 
recommends the UTF-8. Following a RFC recommendation is good argument to change 
the default encoding to UTF-8.
https://tools.ietf.org/html/rfc2640


Giampaolo:
> Personally I think it makes more sense to just use UTF-8 without going 
> through a deprecation period

I concur. Deprecation is usually used for features which are going to be 
removed (module, function or function parameter). Here it's just about a 
default parameter value. I expect to have encoding="utf-8" default in the 
constructor.

The annoying part is that Python 3.8 only has a class attribute. The simplest 
option seems to be creating a FTP object, modify its encoding attribute and 
*then* logs in. Another options is to subclass the FTP class. IMO the worst is 
to modify ftplib.FTP.encoding attribute (monkey patch the module).

I expect that most users use username, password and filenames encodable to 
ASCII and so will not notify the change to UTF-8. We can document a solution 
working on all Python versions to use different encoding name.

--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-26 Thread Inada Naoki


Inada Naoki  added the comment:

I'm not FTP user so I don't have strong opinion.
If it is too late to change already, change it in 3.9 might be OK.

> However, shouldn't it be a FutureWarning, so it will be reported by default 
> at initialisation?

If it is warning for end users, it should be FutureWarning.
If the warning is for developers, it should be DeprecationWarning.
Warning for API changes is DeprecationWarning in general.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-25 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

It's been a long time since I implemented UTF-8 support in pyftpdlib, but long 
story short is that:
- most recent servers are supposed to use UTF-8 by default
- such servers must include "UTF-8" in the FEAT command response
- some servers may enable UTF-8 only if client explicitly sends "OPTS UTF-8 ON" 
first, but that is based on an draft RFC. Server implementors usually treat 
this command as a no-op and simply assume UTF-8 as the default.

With that said, I am -1 about implementing logic based on FEAT/OPTS: that 
should be done before login, and at that point some servers may erroneously 
reject any command other than USER, PASS and ACCT. 

Personally I think it makes more sense to just use UTF-8 without going through 
a deprecation period, document *encoding* attribute and mention that in order 
to deal with servers not supporting UTF8 you can pre-emptively check FEAT 
response and set ASCII encoding. But I am not a unicode expert, so I would like 
to hear some other opinion re. the implications of going from latin-1 to utf8 
in terms of potential code breakage.

--
nosy: +benjamin.peterson, ezio.melotti, lemburg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-25 Thread Sebastian G Pedersen


Sebastian G Pedersen  added the comment:

Thank you for the feedback.

I will elaborate a little bit on the reasons and thoughts behind the pull 
request:
Since RFC 2640, the industry standard within FTP Clients is UTF-8 (see e.g. 
FileZilla here: https://wiki.filezilla-project.org/Character_Encoding, or 
WinSCP here: https://winscp.net/eng/docs/faq_utf8).

Given this, I believe the majority of the users that have not investigated the 
code wrongly assumes UTF-8 encoding for ftplib as well (as it is now).

I am new to contributing, and not sure, how much deprecation warnings are used 
(I simply followed the previous encoding change on ftplib), so I will change it 
based on the feedback. However, shouldn't it be a FutureWarning, so it will be 
reported by default at initialisation?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-24 Thread Dong-hee Na


Dong-hee Na  added the comment:

I agree with inada.naoki

--
nosy: +corona10

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-24 Thread Inada Naoki


Inada Naoki  added the comment:

Can we have some deprecation period?

def __init__(self, ..., encoding=None):
...
if encoding is not None:
self.encoding = encoding
else:
warnings.warn("The default encoding of the FTP class will be changed 
from 'latin1' to 'utf-8' in 3.11. Pass the encoding explicitly for now.",
DeprecationWarning)

--
nosy: +inada.naoki

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-23 Thread Eric V. Smith


New submission from Eric V. Smith :

What's the reason behind this change? I love UTF-8 and all, but is there some 
standard (de facto or de jure) that discusses this?

What does this change fix? What's the implication to existing clients?

I'm not opposed to the change per se, but I'd like to understand the reasoning.

--
nosy: +eric.smith

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-18 Thread SilentGhost


Change by SilentGhost :


--
versions:  -Python 3.5, Python 3.6, Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-18 Thread Sebastian G Pedersen


Change by Sebastian G Pedersen :


--
keywords: +patch
pull_requests: +17443
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18048

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-18 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +giampaolo.rodola

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39380] ftplib uses latin-1 as default encoding

2020-01-18 Thread Sebastian G Pedersen


Change by Sebastian G Pedersen :


--
components: Library (Lib)
nosy: SebastianGPedersen
priority: normal
severity: normal
status: open
title: ftplib uses latin-1 as default encoding
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com