Re: [Python-Dev] Can I submit more support of standard library for VxWorks after 3.8.0 beta1?

2019-03-19 Thread Terry Reedy
One main purpose of the beta period is to discover and fix bugs and 
otherwise tweak behavior in the new features included in the first beta. 
This usually presupposes that the feature is thought to be 
'ready-to-go' as-is, absent new discoveries.


Interpretation and issuance of exceptions is usually up to the release 
manager.  For 3.8, this is Łukasz Langa.  Victor Stinner is otherwise 
most involved with platform support.


FWIW, I personally might allow 'support added by beta 1' to include 
failures in up to 4 modules, to be fixed as bugs as soon as possible.


On 3/19/2019 1:39 PM, Victor Stinner wrote:
> You have to find someone to review your PRs. It takes time. We are all 
> volunteers. I looked and merged some VxWorks PRs.
>
> Would it be possible to pay a core dev to do the reviews? That's an open 
> question for core devs and for WindRiver.


I think that this is an excellent idea (as in 'likely needed to meet the 
deadline').  (And no, I am not a candidate for the contract.)


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Nathaniel Smith
On Tue, Mar 19, 2019 at 3:43 PM Giampaolo Rodola'  wrote:
>
> On Tue, Mar 19, 2019 at 6:21 PM Guido van Rossum  wrote:
>
> >> On Tue, Mar 19, 2019 at 10:14 AM Giampaolo Rodola'  
> >> wrote:
> >> Technically you cannot make it 100% safe, only less likely to occur.
> > That seems unnecessarily pedantic. A cryptographic random generator, like 
> > the one in the secrets module, *is* safe enough for all practical purposes 
> > (with a large margin).
> > [...]
> > Hm, the random sequence (implemented in tempfile._RandomNameSequence) is 
> > currently derived from the random module, which is not cryptographically 
> > secure. Maybe all we need to do is replace its source of randomness with 
> > one derived from the secrets module. That seems a one-line change.
>
> Seems to make sense to me. Currently the random string is 8 chars
> long, meaning (if I'm not mistaken) max 40320 possible combinations:
>
> >>> len(list(itertools.permutations('wuoa736m')))
> 40320

It's 8 chars taken from a-z0-9_, so there are 37**8 ~= 10**12 possible
names, or ~41 bits.
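
The arithmetic can be checked with a short sketch. The alphabet string below
is an assumption spelled out from "a-z0-9_"; it is not copied from the
tempfile source.

```python
import math

# Assumed alphabet: lowercase letters, digits and underscore (37 characters).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789_"
NAME_LENGTH = 8

# Each of the 8 positions is chosen independently from the alphabet.
possible_names = len(ALPHABET) ** NAME_LENGTH
entropy_bits = math.log2(possible_names)

# itertools.permutations() on one fixed 8-char string only counts the
# reorderings of that single string, which is 8! and not the name space.
reorderings_of_one_name = math.factorial(NAME_LENGTH)

print(possible_names, round(entropy_bits, 1), reorderings_of_one_name)
```

The gap between the two counts is why the 40320 figure understates the size
of the name space.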

> We may use 9 chars and get to 362880 and increase "security".
> Regardless, it seems to me the original deprecation warning for
> mktemp() was not really justified and should be removed.

Historically, mktemp variants have caused *tons* of serious security
vulnerabilities. It's not a theoretical issue. See e.g.
https://www.owasp.org/index.php/Insecure_Temporary_File

I believe that if we used sufficient unguessable randomness to
generate the name, then that would resolve the issues and let us
un-deprecate it. Though like Brett I would like to double-check this
with security specialists :-).

This would also simplify the implementation a *lot*, down to just:

def mktemp(suffix='', prefix='tmp', dir=None):
    if dir is None:
        dir = gettempdir()
    return _os.path.join(dir, prefix +
                         secrets.token_urlsafe(ENTROPY_BYTES) + suffix)

The choice of ENTROPY_BYTES is an interesting question. 16 (= 128
bits) would be a nice "obviously safe" number, and gives 22-byte
filenames. We might be able to get away with fewer, if we had a
plausible cost model for the attack. This is another point where a
security specialist might be helpful :-).
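
For reference, a small sketch of where the 22 comes from, assuming
ENTROPY_BYTES = 16 as suggested above. It only measures the token and is not
a proposed implementation.

```python
import math
import secrets

ENTROPY_BYTES = 16  # 128 bits, the "obviously safe" value discussed above

token = secrets.token_urlsafe(ENTROPY_BYTES)

# URL-safe Base64 without padding encodes n bytes in ceil(4 * n / 3) chars.
expected_length = math.ceil(4 * ENTROPY_BYTES / 3)

print(len(token), expected_length)
```

A smaller ENTROPY_BYTES shortens the name by the same formula, so the
trade-off between name length and guessing cost is easy to tabulate.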

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Can I submit more support of standard library for VxWorks after 3.8.0 beta1?

2019-03-19 Thread Kuhl, Brian
Hi Victor,
I pinged our product manager and he’s open to the idea of setting up a 
consulting arrangement with a core developer to help move things along, 
at least in principle.
If anyone on the core team is interested, and reasonably unaligned with Wind 
River’s competitors, please contact me off list, Brian dot Kuhl at Wind River 
dot com.

Brian

From: Python-Dev 
[mailto:python-dev-bounces+brian.kuhl=windriver@python.org] On Behalf Of 
Victor Stinner
Sent: Tuesday, March 19, 2019 1:40 PM
To: Xin, Peixing 
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Can I submit more support of standard library for 
VxWorks after 3.8.0 beta1?

You have to find someone to review your PRs. It takes time. We are all 
volunteers. I looked and merged some VxWorks PRs.

Would it be possible to pay a core dev to do the reviews? That's an open 
question for core devs and for WindRiver.

Victor

On Tuesday, 19 March 2019, Xin, Peixing wrote:
> Hi, Experts:
>
>
>
> Seeing from the Python 3.8.0 
> schedule(https://www.python.org/dev/peps/pep-0569/#schedule), new features 
> will not be allowed to submit after 3.8.0 beta1. For VxWorks RTOS platform 
> supporting CPython, we are using 
> bpo-31904(https://bugs.python.org/issue31904) for PRs to submit our codes. 
> Now I want to know whether we can add more standard library support for 
> VxWorks AFTER 3.8.0 beta1?
>
>
>
> 
>
>
>
> Thanks,
>
> Peixing
>
>

--
Night gathers, and now my watch begins. It shall not end until my death.


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread eryk sun
On 3/19/19, Victor Stinner  wrote:
>
> When I write tests, I don't really care about security, but
> NamedTemporaryFile caused me many troubles on Windows: you cannot
> delete a file if it's still open in another program. It's way more
> convenient to use tempfile.mktemp().

Opening the file again for normal access is problematic.
NamedTemporaryFile opens it with delete access, but Python's open()
function doesn't support delete-access sharing unless an opener is
used that calls CreateFileW.

NamedTemporaryFile does open files with delete-access sharing, so any
process can delete the file if it's allowed by the file's security and
attributes. You may be thinking of unlinking. In Windows versions
prior to 10, that's always a two-step process. A file with its delete
disposition set doesn't get unlinked until all references for all open
instances are closed.

In Windows 10 (release 1709+), we have the option of using
SetFileInformationByHandle: FileDispositionInfoEx (21) with
FILE_DISPOSITION_FLAG_POSIX_SEMANTICS (2) and
FILE_DISPOSITION_FLAG_DELETE (1). The online documentation hasn't been
updated to include this, but it's supported in the headers for
_WIN32_WINNT_WIN10_RS1 and later. This operation unlinks the file as
soon as we close our handle, even if it has existing references. This
is explained in the remarks for the underlying NT system call [1]. In
particular this resolves the race condition related to handles opened
by anti-malware programs.

It may be worth adding support for deleting files by handle that tries
FileDispositionInfoEx in 1709+. This will work in about half of all
Windows systems. (About 40% still run Windows 7.) It's not a panacea
for Windows file-deleting woes. We still need to be able to open the
file with delete access, which requires existing opens to share delete
access.

[1]: 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntddk/ns-ntddk-_file_disposition_information_ex


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Giampaolo Rodola'
On Tue, Mar 19, 2019 at 6:21 PM Guido van Rossum  wrote:

>> On Tue, Mar 19, 2019 at 10:14 AM Giampaolo Rodola'  
>> wrote:
>> Technically you cannot make it 100% safe, only less likely to occur.
> That seems unnecessarily pedantic. A cryptographic random generator, like the 
> one in the secrets module, *is* safe enough for all practical purposes (with 
> a large margin).
> [...]
> Hm, the random sequence (implemented in tempfile._RandomNameSequence) is 
> currently derived from the random module, which is not cryptographically 
> secure. Maybe all we need to do is replace its source of randomness with one 
> derived from the secrets module. That seems a one-line change.

Seems to make sense to me. Currently the random string is 8 chars
long, meaning (if I'm not mistaken) max 40320 possible combinations:

>>> len(list(itertools.permutations('wuoa736m')))
40320

We may use 9 chars and get to 362880 and increase "security".
Regardless, it seems to me the original deprecation warning for
mktemp() was not really justified and should be removed. IMO it
probably makes more sense to just clarify in the doc that
NamedTemporaryFile is the right / safe way to create a file with a
unique name and avoid the theoretical, rare name collision you'd get
by using open(mktemp(), 'w') instead. Also I'm not sure I understand
what the code sample provided in the doc aims to achieve:
https://docs.python.org/3/library/tempfile.html#tempfile.mktemp
The way I see it, the main (or "correct") use case for mktemp() is
when you explicitly want a file name which does not exist, in order to
exercise legitimate scenarios like these ones:

>>> self.assertRaises(FileNotFoundError, os.unlink, tempfile.mktemp())

>>> dst = tempfile.mktemp()
>>> os.rename(src, dst)
>>> assert not os.path.exists(src) and os.path.exists(dst)

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Greg Ewing

Antoine Pitrou wrote:

> Does it always work? According to the docs, """Whether the name can be
> used to open the file a second time, while the named temporary file is
> still open, varies across platforms


So use NamedTemporaryFile(delete = False) and close it before passing
it to the other program.
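
A minimal sketch of that pattern. The "other program" is a child Python
process here, which is an assumption made only to keep the example
self-contained.

```python
import os
import subprocess
import sys
import tempfile

# Create the file, write to it, and close it before anyone else opens it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello from the parent\n")
    path = tmp.name

try:
    # Stand-in for the other program: a child interpreter that reads the file.
    child_code = "import sys; print(open(sys.argv[1]).read(), end='')"
    result = subprocess.run(
        [sys.executable, "-c", child_code, path],
        capture_output=True,
        text=True,
        check=True,
    )
    child_output = result.stdout
finally:
    # With delete=False, removing the file is the caller's responsibility.
    os.unlink(path)

print(child_output, end="")
```

Because the parent has closed its handle before the child starts, no
platform-specific sharing rules come into play.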

--
Greg


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Gregory P. Smith
On Tue, Mar 19, 2019, 4:53 AM Ned Batchelder  wrote:

> On 3/19/19 4:13 AM, Serhiy Storchaka wrote:
> > 19.03.19 00:41, Raymond Hettinger wrote:
> >> 4) Fix the tests in the third-party modules to be more focused on
> >> their actual test objectives, the semantics of the generated XML
> >> rather than the exact serialization.  This option would seem like the
> >> right-thing-to-do but it isn't trivial because the entire premise of
> >> the existing test is invalid.  For every case, we'll actually have to
> >> think through what the test objective really is.
> >
>
> Option 4 is misleading.  Is anyone here really offering to "fix the
> tests in third-party modules"?  Option 4 is actually, "do nothing, and
> let a multitude of projects figure out how to fix their tests, slowing
> progress in those projects as they try to support Python 3.8."
>

We've done Option 4 for every past behavior change of any form on feature
.Next releases.  We do try to encourage projects to run their tests on the
3.Next betas so that they can be ready before 3.Next.0 lands; some of us
even do it ourselves when we're interested.  Many things won't get ready
ahead of time, but the actual .0 release forces the issue as their users
start demanding it, on occasion offering patches.  We don't block a release
on existing user code being ready for it.

> In my case, the test code has a generic function to compare an actual
> directory of files to an expected directory of files, so it isn't quite
> as simple as "just use the right XML comparison."  And I support Python
> 2.7, 3.5, etc, so tests still need to work under those versions.  None
> of this is impossible, but please try not to preach to us maintainers
> that we are doing it wrong, that it will be easy to fix, etc.  Using
> language like "the entire premise of the test is invalid" seems
> needlessly condescending.
>

Agreed, that was poor wording.  Let's not let that type of wording escape
python-dev into docs about a behavior change.

Wording aside, a test relying on undefined behavior is testing for things
the code under test doesn't actually need to care about being true, even if
it has happened to work for years.  Such a test is overspecified,
potentially without its authors having consciously realized that.
It'll need refactoring to loosen its requirements.  How to loosen it is
always an implementation detail based on the constraints imposed upon the
test.  Difficulty lies within range(0, "Port Mercurial to Python 3").  But
the end result is nice: The code is healthier as tests focus more on what
was actually important rather than what was quicker to write that got the
original job done many years ago.

> One of the suggested solutions, a DOM comparison is not enough. I
> don't just want to know that my actual XML is different than my expected
> XML.  I want to know where and how it differs.
>
> Textual comparison may be the "wrong" way to check XML, but it gives me
> many tools for working with the test results.  It was simple and it
> worked.  Now in Python 3.8, because Python doesn't want to add an
> optional flag to continue doing what it has always done, I need to
> re-engineer my tests.
>
> --Ned.
>

I understand that from a code owner's perspective having to do any work, no
matter what the reason, is counted as re-engineering.  But that doesn't
make it wrong.  If "what it has always done" was unspecified and arbitrary,
despite happening not to change in the past, rather than something whose
stability is easy to continue, such as sorted attributes, then it is a
burden to maintain that **unspecifiable** behavior in the language or
library going forward.

(Note that I have no idea what order the xml code in question happened to
impose upon attributes; if it went from sorted to not, a "fix" to provide
users is clear: provide a way to keep it sorted for those who need that.
If it relied on insertion order or hash table iteration order or the phase
of the moon when the release was cut, we are right to refuse to maintain
unspecifiable implementation side effect behavior.)

-gps


Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread Raymond Hettinger



> On Mar 19, 2019, at 1:52 PM, MRAB  wrote:
> 
> Thinking ahead, could there ever be anything else that you might want also to 
> attach to member objects?

Our experience with property object suggests that once docstrings are 
supported, there don't seem to be any other needs.   But then, you never can 
tell ;-)


Raymond


"Difficult to see. Always in motion is the future." -- Master Yoda




Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread MRAB

On 2019-03-19 18:55, Raymond Hettinger wrote:

> I'm working on ways to improve help() by giving docstrings to member
> objects.
>
> One way to do it is to wait until after the class definition and then make
> individual, direct assignments to __doc__ attributes.  This way widely
> separates the docstrings from their initial __slots__ definition.  Working
> downstream from the class definition feels awkward and doesn't look pretty.
>
> There's another way I would like to propose¹.  The __slots__ definition already
> works with any iterable including a dictionary (the dict values are ignored),
> so we could use the values for the docstrings.
>
> This keeps all the relevant information in one place (much like we already do
> with property() objects).  This way already works; we just need a few lines in
> pydoc to check to see if a dict is present.  This way also looks pretty and
> doesn't feel awkward.
>
> I've included worked out examples below.  What do you all think about the
> proposal?


[snip]

Thinking ahead, could there ever be anything else that you might want 
also to attach to member objects?


I suppose that if that's ever the case, the value could itself be 
expanded to be a dict, something like this:


__slots__ = {'mu': {'__doc__': 'Arithmetic mean.'},
             'sigma': {'__doc__': 'Standard deviation.'}}


But that could be left to the future...


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Tim Delaney
On Wed, 20 Mar 2019 at 00:29, Serhiy Storchaka  wrote:

> 19.03.19 15:10, Tim Delaney wrote:
> > Now Calibre is definitely in the wrong here - it should be able to
> > import regardless of the order of attributes. But the fact is that there
> > are a lot of tools out there that are semi-broken in a similar manner.
>
> Is not Calibre going to sit on Python 2 forever? This makes it
> irrelevant to the discussion about Python 3.8.
>

I was simply using Calibre as an example of a tool I'd encountered recently
that works correctly with input files with attributes in one order, but not
the other. That it happens to be using Python (of any vintage) is
irrelevant - it could have been written in C, Go, Lua ... The problem is the
same: XML libraries that arbitrarily sort (or otherwise manipulate the order
of) attributes can produce files that may not work with third-party tools.

Tim Delaney


Re: [Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread Abdur-Rahmaan Janhangeer
I have the impression that the line between variables and docs is a bit
too blurred.

Yours,

Abdur-Rahmaan Janhangeer
Mauritius


[Python-Dev] Best way to specify docstrings for member objects

2019-03-19 Thread Raymond Hettinger
I'm working on ways to improve help() by giving docstrings to member 
objects.

One way to do it is to wait until after the class definition and then make 
individual, direct assignments to __doc__ attributes.  This way widely 
separates the docstrings from their initial __slots__ definition.  Working 
downstream from the class definition feels awkward and doesn't look pretty.

There's another way I would like to propose¹.  The __slots__ definition already 
works with any iterable including a dictionary (the dict values are ignored), 
so we could use the values for the docstrings.

This keeps all the relevant information in one place (much like we already do 
with property() objects).  This way already works; we just need a few lines in 
pydoc to check to see if a dict is present.  This way also looks pretty and 
doesn't feel awkward.

I've included worked out examples below.  What do you all think about the 
proposal?


Raymond


¹ https://bugs.python.org/issue36326


== Desired help() output ==

>>> help(NormalDist)
Help on class NormalDist in module __main__:

class NormalDist(builtins.object)
 |  NormalDist(mu=0.0, sigma=1.0)
 |  
 |  Normal distribution of a random variable
 |  
 |  Methods defined here:
 |  
 |  __init__(self, mu=0.0, sigma=1.0)
 |      NormalDist where mu is the mean and sigma is the standard deviation.
 |  
 |  cdf(self, x)
 |      Cumulative distribution function.  P(X <= x)
 |  
 |  pdf(self, x)
 |      Probability density function.  P(x <= X < x+dx) / dx
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  mu
 |      Arithmetic mean.
 |  
 |  sigma
 |      Standard deviation.
 |  
 |  variance
 |      Square of the standard deviation.



== Example of assigning docstrings after the class definition ==

from math import erf, exp, sqrt, tau

class NormalDist:
    'Normal distribution of a random variable'

    __slots__ = ('mu', 'sigma')

    def __init__(self, mu=0.0, sigma=1.0):
        'NormalDist where mu is the mean and sigma is the standard deviation.'
        self.mu = mu
        self.sigma = sigma

    @property
    def variance(self):
        'Square of the standard deviation.'
        return self.sigma ** 2.0

    def pdf(self, x):
        'Probability density function.  P(x <= X < x+dx) / dx'
        variance = self.variance
        return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance)

    def cdf(self, x):
        'Cumulative distribution function.  P(X <= x)'
        return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0))))

NormalDist.mu.__doc__ = 'Arithmetic mean'
NormalDist.sigma.__doc__ = 'Standard deviation'



== Example of assigning docstrings with a dict ==

from math import erf, exp, sqrt, tau

class NormalDist:
    'Normal distribution of a random variable'

    __slots__ = {'mu' : 'Arithmetic mean.', 'sigma': 'Standard deviation.'}

    def __init__(self, mu=0.0, sigma=1.0):
        'NormalDist where mu is the mean and sigma is the standard deviation.'
        self.mu = mu
        self.sigma = sigma

    @property
    def variance(self):
        'Square of the standard deviation.'
        return self.sigma ** 2.0

    def pdf(self, x):
        'Probability density function.  P(x <= X < x+dx) / dx'
        variance = self.variance
        return exp((x - self.mu)**2.0 / (-2.0*variance)) / sqrt(tau * variance)

    def cdf(self, x):
        'Cumulative distribution function.  P(X <= x)'
        return 0.5 * (1.0 + erf((x - self.mu) / (self.sigma * sqrt(2.0))))
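
As a rough illustration of the lookup a tool would need, here is a sketch
with a hypothetical Point class and a hypothetical slot_doc() helper. Neither
is part of pydoc or of the proposal; the sketch only shows that the dict
stays available on the class after the slots are created.

```python
class Point:
    'A 2-D point whose slot docstrings live in the __slots__ dict.'

    __slots__ = {'x': 'Horizontal coordinate.', 'y': 'Vertical coordinate.'}

    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y


def slot_doc(cls, name):
    """Return the docstring stored for slot *name*, or None.

    Hypothetical helper: it only inspects cls.__slots__ and does not
    walk base classes.
    """
    slots = getattr(cls, '__slots__', None)
    if isinstance(slots, dict):
        return slots.get(name)
    return None


p = Point(1.0, 2.0)
print(slot_doc(Point, 'x'))
```

The dict keys still define the slots, so instances behave exactly as they
would with a tuple of names.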



Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Stefan Behnel
Ned Batchelder wrote on 19.03.19 at 12:53:
> I need to re-engineer my tests.

… or sort the attributes before serialisation, or use C14N always, or
change your code to create the attributes in sorted-by-name order. The new
behaviour allows for a couple of ways to deal with the issue of backwards
compatibility.
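
The first of these options might look roughly like the sketch below. The
sort_attributes() name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root):
    """Reorder every element's attributes by name, in place."""
    for element in root.iter():
        attrib = element.attrib
        if len(attrib) > 1:
            ordered = sorted(attrib.items())
            attrib.clear()
            attrib.update(ordered)


# Attributes are deliberately created out of alphabetical order.
root = ET.Element('point')
root.set('y', '2')
root.set('x', '1')
ET.SubElement(root, 'label', {'lang': 'en', 'id': 'p1'})

sort_attributes(root)
serialized = ET.tostring(root, encoding='unicode')
print(serialized)
```

Calling the helper just before serialisation keeps the textual output stable
regardless of the order in which the attributes were set.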

Stefan



Re: [Python-Dev] Can I submit more support of standard library for VxWorks after 3.8.0 beta1?

2019-03-19 Thread Brett Cannon
On Mon, Mar 18, 2019 at 7:22 PM Xin, Peixing 
wrote:

> Hi, Experts:
>
>
>
> Seeing from the Python 3.8.0 schedule(
> https://www.python.org/dev/peps/pep-0569/#schedule), new features will
> not be allowed to submit after 3.8.0 beta1. For VxWorks RTOS platform
> supporting CPython, we are using bpo-31904(
> https://bugs.python.org/issue31904) for PRs to submit our codes. Now I
> want to know whether we can add more standard library support for VxWorks
> AFTER 3.8.0 beta1?
>

If the question is whether we as core devs will merge PRs that extend
support in the stdlib, that will be up to the release manager. Since it
technically isn't a feature enhancement but a "bugfix" for VxWorks then it
will probably come down to what changes are required.

-Brett


>
>
>
>
> Thanks,
>
> Peixing
>
>
>


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Brett Cannon
On Tue, Mar 19, 2019 at 10:22 AM Guido van Rossum  wrote:

> On Tue, Mar 19, 2019 at 10:14 AM Giampaolo Rodola' 
> wrote:
>
>>
>> On Tue, 19 Mar 2019 at 17:47, Sebastian Rittau 
>> wrote:
>>
>>> On 19.03.19 at 17:23, Giampaolo Rodola' wrote:
>>> > @Sebastian
>>> >> If there are valid use cases for mktemp(), I recommend renaming
>>> >> it to mkname_unsafe() or something equally obvious.
>>> > I'm -1 about adding an alias (there should be one and preferably only
>>> > one way to do it). Also mkstemp() and mkdtemp() are somewhat poorly
>>> > named IMO, but I wouldn't add an alias for them either.
>>> >
>>> Just to clarify: I was not suggesting creating an alias, I was suggesting
>>> renaming the function, but keeping the old name for a normal
>>> deprecation cycle.
>>>
>>> But I had another thought: If I understand correctly, the exploitability
>>> of mktemp() relies on the fact that between determining whether the
>>> file exists and creation an attacker can create the file themselves.
>>> Couldn't this problem be solved by generating a filename of sufficient
>>> length using the secrets module? This way the filename should be
>>> "unguessable" and safe.
>>
>>
>> Technically you cannot make it 100% safe, only less likely to occur.
>>
>
> That seems unnecessarily pedantic. A cryptographic random generator, like
> the one in the secrets module, *is* safe enough for all practical purposes
> (with a large margin).
>
>
>> And on a second thought (I retract :)) since this could be used in real
>> apps other than tests (I was too focused on that) I think this should be a
>> doc warning after all, not info. Doc may suggest to use mode=x when
>> creating the file, in order to remove the security implications.
>>
>
> Hm, the random sequence (implemented in tempfile._RandomNameSequence) is
> currently derived from the random module, which is not cryptographically
> secure. Maybe all we need to do is replace its source of randomness with
> one derived from the secrets module. That seems a one-line change.
>

If the only security vulnerability is due to the ability to guess a path,
that would make sense (but I honestly don't know since I'm not a security
expert).

If Guido's suggestion isn't enough, then I think that renaming the function
to make it obvious that there are potential issues, and deprecating the old
name, makes the most sense.
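
For what it's worth, the mode=x suggestion quoted above can be sketched like
this. The scratch directory and the file name are assumptions made for the
example.

```python
import os
import tempfile

# Assumption for the example: a private scratch directory stands in for
# whatever directory the candidate name would normally live in.
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "candidate-name.tmp")

    # First creation succeeds because nothing exists at that path yet.
    with open(path, "x") as f:
        f.write("first writer\n")

    # A second exclusive creation of the same path is refused rather than
    # silently reusing whatever is already there.
    try:
        with open(path, "x"):
            pass
        second_attempt_failed = False
    except FileExistsError:
        second_attempt_failed = True

    with open(path) as f:
        contents = f.read()

print(second_attempt_failed)
```

With exclusive creation, a name that somebody else grabbed first turns into
an exception the caller can handle instead of a silent overwrite.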


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Stefan Behnel
Nathaniel Smith wrote on 19.03.19 at 00:15:
> That seems potentially simpler to implement than canonical XML
> serialization

C14N is already implemented for ElementTree, just needs to be ported to
Py3.8 and merged.

https://bugs.python.org/issue13611

Stefan



Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Brett Cannon
On Tue, Mar 19, 2019 at 6:15 AM Serhiy Storchaka 
wrote:

> 19.03.19 13:53, Ned Batchelder wrote:
> > Option 4 is misleading.  Is anyone here really offering to "fix the
> > tests in third-party modules"?  Option 4 is actually, "do nothing, and
> > let a multitude of projects figure out how to fix their tests, slowing
> > progress in those projects as they try to support Python 3.8."
>
> Any option except option 1 (and option 2 with sorting by default)
> requires changing third-party code. You should either pass additional
> argument to serialization functions, or use special canonization functions.
>
> We should look at the problem from long perspective. Freezing the
> current behavior forever does not look good. If we need to break the
> user code, we should minimize the harm and provide convenient tools for
> reproducing the current behavior. And this is an opportunity to rewrite
> user tests in more appropriate form. In your case textual comparison may
> be the most appropriate form, but this may be not so in other cases.
>

In situations like this I think it's best to bite the bullet sooner rather
than later, while acknowledging that folks like Ned are in a bind when they
have to support older versions and thus have long-term support costs, too,
and to try to make the transition as painless as possible (my guess is Ned's
need to support older versions will drop off faster than us having to support
the xml libraries in the stdlib going forward, hence my viewpoint).


>
> > Now in Python 3.8, because Python doesn't want to add an
> > optional flag to continue doing what it has always done, I need to
> > re-engineer my tests.
>
> Please wait yet some time. I hope to add canonicalization before the
> first beta.
>

For me, canonicalization/stable pretty-print is the best option,
especially if we can put the canonicalization code up on PyPI for
supporting older versions of Python. Otherwise, a function that does
something like an XOR to help diagnose what differs between 2 XML documents
also seems like a good option to me.


Re: [Python-Dev] Can I submit more support of standard library for VxWorks after 3.8.0 beta1?

2019-03-19 Thread Victor Stinner
You have to find someone to review your PRs. It takes time. We are all
volunteers. I looked and merged some VxWorks PRs.

Would it be possible to pay a core dev to do the reviews? That's an open
question for core devs and for WindRiver.

Victor

On Tuesday, 19 March 2019, Xin, Peixing wrote:
> Hi, Experts:
>
>
>
> Seeing from the Python 3.8.0 schedule(
> https://www.python.org/dev/peps/pep-0569/#schedule), new features will not
> be allowed to submit after 3.8.0 beta1. For VxWorks RTOS platform
> supporting CPython, we are using bpo-31904(
> https://bugs.python.org/issue31904) for PRs to submit our codes. Now I want
> to know whether we can add more standard library support for VxWorks AFTER
> 3.8.0 beta1?
>
>
>
> 
>
>
>
> Thanks,
>
> Peixing
>
>

-- 
Night gathers, and now my watch begins. It shall not end until my death.


Re: [Python-Dev] What is the workflow for announcing tier 2 platform support?

2019-03-19 Thread Victor Stinner
Hi,

I don't think that we can say that Python supports VxWorks yet. Many PRs
are not merged yet. You are free to distribute a modified version with your
changes.

PEP 11 lists conditions to fully support a platform:
https://www.python.org/dev/peps/pep-0011/

My summary is: all tests must pass (it's ok to skip a few), a buildbot must
run and a core dev should handle any regression.

I expect that VxWorks will better fit in the "Best Effort" category: only
fix issues when someone proposes a fix.

See also my notes:
https://pythondev.readthedocs.io/platforms.html

Victor


On Tuesday, 19 March 2019, Xin, Peixing wrote:
> Hi, Experts:
>
>
>
> VxWorks RTOS will release a new version at the end of June. For this new
> version, we(Wind River) plan to announce the Python support officially here
> (https://www.python.org/download/other/) for VxWorks as tier 2 platform.
> Now my 2 questions come:
>
> 1.   To be qualified as tier 2 platform here
> https://www.python.org/download/other/, do I need to get all test cases
> pass on VxWorks RTOS? Now we have the following PRs for VxWorks. With these
> PRs merged, we can get around 70% test cases PASS.
>
>
>
> 2.   If we reach the announcement level, what is the workflow to
> update the website statements here https://www.python.org/download/other/.
> The following picture shows what we want to show.
>
>
>
> 
>
>
>
> Thanks,
>
> Peixing
>
>

-- 
Night gathers, and now my watch begins. It shall not end until my death.


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Sebastian Krause
Guido van Rossum  wrote:
> If all you need is a random name, why not just use a random number
> generator?
> E.g. I see code like this:
>
> binascii.hexlify(os.urandom(8)).decode('ascii')

I tend to use os.path.join(dir, str(uuid.uuid4())) if I need to
create a random filename to pass to another program. However, it
would be nice to have a standard helper function that also allows me
to specify a prefix and suffix. Shouldn't it be possible to just
modify tempfile.mktemp() to generate much longer random filenames so
that it is virtually impossible that another program has already
created a file with the same name? Then the security problem is
gone, there is no need to continue deprecating this function and all
programs using it should continue to work.
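A minimal sketch of the kind of helper described above, assuming the name random_filename and its parameters (both hypothetical, not an existing stdlib API). It only builds a name from uuid.uuid4() and never creates the file:

```python
import os
import tempfile
import uuid


def random_filename(prefix="tmp", suffix="", dir=None):
    """Return a hard-to-guess path; the file itself is NOT created."""
    if dir is None:
        dir = tempfile.gettempdir()
    # uuid4().hex is 32 hexadecimal digits taken from a random UUID.
    return os.path.join(dir, prefix + uuid.uuid4().hex + suffix)
```

The caller (or the other program) still has to create the file, ideally in a way that fails if the path already exists.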


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Guido van Rossum
On Tue, Mar 19, 2019 at 10:14 AM Giampaolo Rodola' 
wrote:

>
> On Tue, 19 Mar 2019 at 17:47, Sebastian Rittau  wrote:
>
>> On 19.03.19 at 17:23, Giampaolo Rodola' wrote:
>> > @Sebastian
>> >> If there are valid use cases for mktemp(), I recommend renaming
>> >> it to mkname_unsafe() or something equally obvious.
>> > I'm -1 about adding an alias (there should be one and preferably only
>> > one way to do it). Also mkstemp() and mkdtemp() are somewhat poorly
>> > named IMO, but I wouldn't add an alias for them either.
>> >
>> Just to clarify: I was not suggesting creating an alias, I was suggesting
>> renaming the function, but keeping the old name for a normal
>> deprecation cycle.
>>
>> But I had another thought: If I understand correctly, the exploitability
>> of mktemp() relies on the fact that between determining whether the
>> file exists and creation an attacker can create the file themselves.
>> Couldn't this problem be solved by generating a filename of sufficient
>> length using the secrets module? This way the filename should be
>> "unguessable" and safe.
>
>
> Technically you cannot make it 100% safe, only less likely to occur.
>

That seems unnecessarily pedantic. A cryptographic random generator, like
the one in the secrets module, *is* safe enough for all practical purposes
(with a large margin).


> And on a second thought (I retract :)) since this could be used in real
> apps other than tests (I was too focused on that) I think this should be a
> doc warning after all, not info. The docs could suggest using mode="x" when
> creating the file, in order to remove the security implications.
>

Hm, the random sequence (implemented in tempfile._RandomNameSequence) is
currently derived from the random module, which is not cryptographically
secure. Maybe all we need to do is replace its source of randomness with
one derived from the secrets module. That seems a one-line change.
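To illustrate the idea, here is a rough sketch of a name sequence that draws from the secrets module. The class name SecureNameSequence and the ALPHABET constant are hypothetical; this is not the actual tempfile code, only an assumption about what such a replacement could look like:

```python
import secrets
import string

# Hypothetical alphabet for generated names (an assumption for this sketch).
ALPHABET = string.ascii_lowercase + string.digits + "_"


class SecureNameSequence:
    """Iterator yielding random name fragments drawn from the OS CSPRNG."""

    def __init__(self, length=8):
        self.length = length

    def __iter__(self):
        return self

    def __next__(self):
        # secrets.choice() picks each character using a cryptographically
        # strong source instead of the default Mersenne Twister generator.
        return "".join(secrets.choice(ALPHABET) for _ in range(self.length))
```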

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Giampaolo Rodola'
On Tue, 19 Mar 2019 at 17:47, Sebastian Rittau  wrote:

> On 19.03.19 at 17:23, Giampaolo Rodola' wrote:
> > @Sebastian
> >> If there are valid use cases for mktemp(), I recommend renaming
> >> it to mkname_unsafe() or something equally obvious.
> > I'm -1 about adding an alias (there should be one and preferably only
> > one way to do it). Also mkstemp() and mkdtemp() are somewhat poorly
> > named IMO, but I wouldn't add an alias for them either.
> >
> Just to clarify: I was not suggesting creating an alias, I was suggesting
> renaming the function, but keeping the old name for a normal
> deprecation cycle.
>
> But I had another thought: If I understand correctly, the exploitability
> of mktemp() relies on the fact that between determining whether the
> file exists and creation an attacker can create the file themselves.
> Couldn't this problem be solved by generating a filename of sufficient
> length using the secrets module? This way the filename should be
> "unguessable" and safe.


Technically you cannot make it 100% safe, only less likely to occur. And on
second thought (I retract :)), since this could be used in real apps other
than tests (I was too focused on that), I think this should be a doc warning
after all, not info. The docs could suggest using mode="x" when creating the
file, in order to remove the security implications.
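A small sketch of what that suggestion amounts to (the function name create_exclusively is hypothetical). Mode "x" asks for exclusive creation, so the open fails rather than reusing a path that already exists:

```python
import os
import tempfile


def create_exclusively(path, data):
    """Create path and write data; raise FileExistsError if path exists."""
    # "x" requests exclusive creation, so an existing file (or a symlink
    # planted at that path) makes open() fail instead of being written to.
    with open(path, "xb") as f:
        f.write(data)
```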

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Paul Moore
On Tue, 19 Mar 2019 at 16:47, Sebastian Rittau  wrote:
> But I had another thought: If I understand correctly, the exploitability
> of mktemp() relies on the fact that between determining whether the
> file exists and creation an attacker can create the file themselves.
> Couldn't this problem be solved by generating a filename of sufficient
> length using the secrets module? This way the filename should be
> "unguessable" and safe.

IMO, there's not much point trying to "fix" mktemp(). The issues with
it are clear and there are far better alternatives already available
for people who need them. The question here is simply about removing
the function "because people might mistakenly use it and create
security risks".

Personally, I don't think we should break the code of people who are
using mktemp() correctly, in awareness of its limitations, just out of
some idea of protecting people from themselves. Certainly we should
provide safe library functions wherever possible, but we should have
better reasons for removing functions that have been around for many,
many years than just "people might be using it wrongly".

Paul


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Sebastian Rittau

On 19.03.19 at 17:23, Giampaolo Rodola' wrote:
> @Sebastian
>> If there are valid use cases for mktemp(), I recommend renaming
>> it to mkname_unsafe() or something equally obvious.
> I'm -1 about adding an alias (there should be one and preferably only
> one way to do it). Also mkstemp() and mkdtemp() are somewhat poorly
> named IMO, but I wouldn't add an alias for them either.


Just to clarify: I was not suggesting creating an alias, I was suggesting
renaming the function, but keeping the old name for a normal
deprecation cycle.

But I had another thought: If I understand correctly, the exploitability
of mktemp() relies on the fact that between determining whether the
file exists and creation an attacker can create the file themselves.
Couldn't this problem be solved by generating a filename of sufficient
length using the secrets module? This way the filename should be
"unguessable" and safe.

 - Sebastian



Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Giampaolo Rodola'
On Tue, Mar 19, 2019 at 3:57 PM Antoine Pitrou  wrote:
>
>
> On 19/03/2019 at 15:52, Guido van Rossum wrote:
> >
> > If all you need is a random name, why not just use a random number
> > generator?
> > E.g. I see code like this:
> >
> > binascii.hexlify(os.urandom(8)).decode('ascii')
>
> mktemp() already does this, probably in a better way, including the
> notion of prefix, suffix, and parent directory.  Why should I have to
> reimplement it myself?

On top of that mktemp() tries exists() in a loop, diminishing the risk
of name collisions.
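Roughly, the loop in question looks like the following sketch. The function unused_name and its parameters are hypothetical and this is not the tempfile source; it only shows the check-then-return pattern and where the race remains:

```python
import os
import secrets


def unused_name(dir, prefix="tmp", suffix="", attempts=100):
    """Return a path under dir that did not exist at the time of the check."""
    for _ in range(attempts):
        candidate = os.path.join(dir, prefix + secrets.token_hex(4) + suffix)
        # The check lowers the chance of a collision, but another process
        # can still create the path between this test and its later use.
        if not os.path.exists(candidate):
            return candidate
    raise FileExistsError("no unused temporary name found")
```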

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Giampaolo Rodola'
On Tue, Mar 19, 2019 at 2:06 PM Stéphane Wirtel  wrote:
>
> Hi,
>
> Context: raise a warning or remove tempfile.mktemp()
> BPO: https://bugs.python.org/issue36309
>
> Since 2.3, this function is deprecated in the documentation, just in the
> documentation. In the code, there is a commented RuntimeWarning.
> Commented by Guido in 2002, because the warning was too annoying (and I
> understand ;-)).
>
> So, in this BPO, we start to discuss about the future of this function
> and Serhiy proposed to discuss on the Python-dev mailing list.
>
> Question: Should we drop it or add a (Pending)DeprecationWarning?
>
> Suggestion and timeline:
>
> 3.8, we raise a PendingDeprecationWarning
> * update the code
> * update the documentation
> * update the tests
>   (check a PendingDeprecationWarning if sys.version_info == 3.8)
>
> 3.9, we change PendingDeprecationWarning to DeprecationWarning
>   (check DeprecationWarning if sys.version_info == 3.9)
>
> 3.9+, we drop tempfile.mktemp()
>
> What do you suggest?

I concur with others who think this should not be removed. It's used
in different stdlib and third party modules' test suites. I see
tempfile.mktemp() very similar to  test.support.find_unused_port() and
os.path.exists/isfile/isdir functions: they are all subject to race
conditions but still they are widely used and have reason to exist
(practicality beats purity). I suggest turning the doc deprecation
into a note:: or warning::, so that extra visibility is maintained.
Because the use case is legitimate and many fs-related APIs such as
this one are inevitably racy, I lean more towards a note:: rather than
a warning:: though, and we could probably do the same for os.path.is*
functions.

@Sebastian
> If there are valid use cases for mktemp(), I recommend renaming
> it to mkname_unsafe() or something equally obvious.

I'm -1 about adding an alias (there should be one and preferably only
one way to do it). Also mkstemp() and mkdtemp() are somewhat poorly
named IMO, but I wouldn't add an alias for them either.

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Anders Munch
Antoine Pitrou:
> And if there is an easy replacement, then how about re-implementing
> mktemp() using that replacement, instead of removing it?

Indeed.  The principal security issue with mktemp is the difficulty in creating 
a user-specific thing under a shared /tmp folder in a multi-user setup.

But if it hurts when you use /tmp, why use /tmp? Use a path with no 
world-accessible ancestor, or at least no world-writable ancestor.

On Windows, that means creating it somewhere under the CSIDL_LOCAL_APPDATA 
folder. Which is already the default for %TEMP% and %TMP%.
On Unix, it's a $HOME subfolder with mode 700.
How about switching mktemp over to use that?
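As an illustration of the "no world-writable ancestor" idea with what the stdlib already offers: tempfile.mkdtemp() creates a directory that only the creating user can read, write and search, and a name can then be chosen inside it. The helper name private_temp_name and the default file name are hypothetical; this is a sketch, not a proposal for how mktemp() itself should change:

```python
import os
import tempfile


def private_temp_name(filename="output.dat"):
    """Return a path inside a fresh directory owned by the current user."""
    # mkdtemp() creates the directory so that it is readable, writable and
    # searchable only by the creating user.
    private_dir = tempfile.mkdtemp(prefix="private-")
    # The file itself is not created; the caller or another program does that.
    return os.path.join(private_dir, filename)
```

The caller is responsible for removing the directory afterwards, for example with shutil.rmtree().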

regards, Anders



Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou
On Tue, 19 Mar 2019 15:12:06 +0100
Sebastian Rittau  wrote:
> On 19.03.19 at 14:53, Victor Stinner wrote:
> >
> > When I write tests, I don't really care about security, but
> > NamedTemporaryFile caused me many troubles on Windows: you cannot
> > delete a file if it's still open in another program. It's way more
> > convenient to use tempfile.mktemp().
> >
> > O_EXCL, open(tmpname, "x"), os.open(tmpname, os.O_CREAT | os.O_EXCL |
> > os.O_WRONLY), etc. can be used to get an error if the file already
> > exists.
> >
> > I agree that for production code where security matters,
> > tempfile.mktemp() must be avoided. But I would prefer to keep it for
> > tests.  
> 
> If there are valid use cases for mktemp(), I recommend renaming
> it to mkname_unsafe() or something equally obvious.
> [...]
> Adding a new function and following the deprecation process for the
> old one should only be a minor inconvenience for existing users that
> need it, should wake up existing users that should not use it in the
> first place, and still allows it to be used for relevant use cases.

That would be fine with me.

Regards

Antoine.




Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou

On 19/03/2019 at 15:52, Guido van Rossum wrote:
> 
> If all you need is a random name, why not just use a random number
> generator?
> E.g. I see code like this:
> 
>     binascii.hexlify(os.urandom(8)).decode('ascii')

mktemp() already does this, probably in a better way, including the
notion of prefix, suffix, and parent directory.  Why should I have to
reimplement it myself?

Regards

Antoine.


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Guido van Rossum
On Tue, Mar 19, 2019 at 6:27 AM Antoine Pitrou  wrote:

>
> -1.  Please don't remove tempfile.mktemp().  mktemp() is useful to
> create a temporary *name*.  All other tempfile functions create an
> actual file and impose additional burden, for example by making the
> file inaccessible to other processes.  But sometimes all I want is a
> temporary name that *another* program will create / act on, not Python.
> It's a very common use case when writing scripts.
>
> The only reasonable workaround I can think of is to first create a
> temporary directory using mkdtemp(), then use a well-known name inside
> that directory.  But that has the same security implications AFAICT,
> since another process can come and create the file / symlink first.
>

If all you need is a random name, why not just use a random number
generator?
E.g. I see code like this:

binascii.hexlify(os.urandom(8)).decode('ascii')

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou
On Tue, 19 Mar 2019 15:10:40 +0100
Stéphane Wirtel  wrote:
> totally agree with you but this function is deprecated (2002) since 2.3,
> with a simple comment about a security issue.

"Deprecated" doesn't mean anything here.  It's just a mention in the
documentation.  It doesn't produce actual warnings when used.  And for
good reason: there are valid use cases.

> so, from today to 3.9+ there are approximately 43 months -> 3.5 years.
> I think that's enough time for the big projects to improve
> their code.

Please explain what the "improvement" would look like.  What is the
intended replacement for the use case I have explained, and how does it
improve on the status quo?

And if there is an easy replacement, then how about re-implementing
mktemp() using that replacement, instead of removing it?

Regards

Antoine.




Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Paul Ganssle
I'm not sure what the relationship between mkdir and mktemp is here. I don't see
any uses of tempfile.mktemp in pip or setuptools, though they do use
os.mkdir (which is not deprecated).

Both pip and setuptools use pytest's tmpdir_factory.mktemp() in their
test suites, but I believe that is not the same thing.

On 3/19/19 9:39 AM, Antoine Pitrou wrote:
> On Tue, 19 Mar 2019 15:32:25 +0200
> Serhiy Storchaka  wrote:
>> 19.03.19 15:03, Stéphane Wirtel wrote:
>>> Suggestion and timeline:
>>>
>>> 3.8, we raise a PendingDeprecationWarning
>>>  * update the code
>>>  * update the documentation
>>>  * update the tests
>>>(check a PendingDeprecationWarning if sys.version_info == 3.8)
>>>
>>> 3.9, we change PendingDeprecationWarning to DeprecationWarning
>>>(check DeprecationWarning if sys.version_info == 3.9)
>>>
>>> 3.9+, we drop tempfile.mktemp()  
>> This plan LGTM.
>>
>> Currently mkdir() is widely used in distutils, Sphinx, pip, setuptools, 
>> virtualenv, and many other third-party projects, so it will take time to 
>> fix all these places. But we should do this, because all this code 
>> likely contains security flaws.
> The fact that many projects, including well-maintained ones such as Sphinx
> or pip, use mktemp(), may be a hint that replacing it is not as easy as
> the people writing the Python documentation seem to think.
>
> Regards
>
> Antoine.
>
>





Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Sebastian Rittau

On 19.03.19 at 14:53, Victor Stinner wrote:

> When I write tests, I don't really care about security, but
> NamedTemporaryFile caused me many troubles on Windows: you cannot
> delete a file if it's still open in another program. It's way more
> convenient to use tempfile.mktemp().
>
> O_EXCL, open(tmpname, "x"), os.open(tmpname, os.O_CREAT | os.O_EXCL |
> os.O_WRONLY), etc. can be used to get an error if the file already
> exists.
>
> I agree that for production code where security matters,
> tempfile.mktemp() must be avoided. But I would prefer to keep it for
> tests.


If there are valid use cases for mktemp(), I recommend renaming
it to mkname_unsafe() or something equally obvious. Experience
(and the list of packages still using mktemp() posted here) shows
that just adding a warning to documentation is not enough. Users
often discover functions by experimentation or looking at examples
on the internet.

mktemp() is also unfortunately named, as it does not create a temp
file as implied. This can also add to the impression that it is the
proper function to use.

Adding a new function and following the deprecation process for the
old one should only be a minor inconvenience for existing users that
need it, should wake up existing users that should not use it in the
first place, and still allows it to be used for relevant use cases.

I believe for security reasons sometimes inconvenient changes like
this are necessary.

 - Sebastian



Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Stéphane Wirtel
I totally agree with you, but this function has been deprecated since 2.3
(2002), with a simple comment about a security issue.

2.3 -> 2.7, 3.0 -> 3.7, 13 releases and 17 years.

Maybe we could remove it with an official PendingDeprecationWarning.

On 19/03/19 at 14:39, Antoine Pitrou wrote:
> The fact that many projects, including well-maintained ones such as Sphinx
> or pip, use mktemp(), may be a hint that replacing it is not as easy as
> the people writing the Python documentation seem to think.
What's the relation with the people writing the Python documentation?

The suggestion about the deprecation warning was proposed by Brett
Cannon, and Serhiy has proposed to deprecate this function with some
releases.

The final release for 3.8 is scheduled for October 2019
(PendingDeprecationWarning).
Maybe 3.9 will be released 18 months later (DeprecationWarning).
and maybe 3.10 or 4.0 will be released 18 months after 3.9.

So, from today to 3.9+ there are approximately 43 months -> 3.5 years.
I think that's enough time for the big projects to improve
their code.
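For reference, a rough sketch of what the first step of the timeline could look like. The wrapper name mktemp_pending_deprecation, the message text and the check function are all hypothetical; the real change would live inside tempfile itself:

```python
import tempfile
import warnings


def mktemp_pending_deprecation(suffix="", prefix="tmp", dir=None):
    """Hypothetical wrapper: warn, then delegate to tempfile.mktemp()."""
    warnings.warn(
        "mktemp() is subject to race conditions; "
        "consider mkstemp() or mkdtemp() instead",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return tempfile.mktemp(suffix=suffix, prefix=prefix, dir=dir)


def emits_pending_deprecation():
    """Test-style check: call the wrapper and report whether it warned."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mktemp_pending_deprecation()
    return any(
        issubclass(w.category, PendingDeprecationWarning) for w in caught
    )
```

Note that a version check in the tests would have to compare tuples, e.g. sys.version_info[:2] == (3, 8), since sys.version_info is not a float.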


Stéphane


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Victor Stinner
Hi,

I would prefer to keep tempfile.mktemp(), remove the deprecation, but
better explain the risk of race condition affecting security.

On Tue, 19 March 2019 at 14:41, Chris Angelico wrote:
> Can't you create a NamedTemporaryFile and permit the other program to
> use it? I just tried that (with TiMidity, even though it's quite
> capable of just writing to stdout) and it worked fine.

When I write tests, I don't really care about security, but
NamedTemporaryFile caused me many troubles on Windows: you cannot
delete a file if it's still open in another program. It's way more
convenient to use tempfile.mktemp().

O_EXCL, open(tmpname, "x"), os.open(tmpname, os.O_CREAT | os.O_EXCL |
os.O_WRONLY), etc. can be used to get an error if the file already
exists.

I agree that for production code where security matters,
tempfile.mktemp() must be avoided. But I would prefer to keep it for
tests.

"with NamedTemporaryFile() as tmp: name = tmp.name" isn't a great
replacement for tempfile.mktemp(): it creates the file and it opens
it, whereas I only want a file name, and to be the first to create
and open it.
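A sketch of the O_EXCL variant mentioned above, combined with a name from tempfile.mktemp(). The helper name create_first and the 0o600 permission default are assumptions made for illustration:

```python
import os
import tempfile


def create_first(path, mode=0o600):
    """Open path for writing only if this call is the one that creates it."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    # With O_CREAT | O_EXCL the call raises FileExistsError when the path
    # already exists, so a pre-created file is never silently reused.
    fd = os.open(path, flags, mode)
    return os.fdopen(fd, "wb")
```

Usage would be along the lines of create_first(tempfile.mktemp()), handling FileExistsError by picking another name.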

Victor


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou
On Wed, 20 Mar 2019 00:37:56 +1100
Chris Angelico  wrote:
> On Wed, Mar 20, 2019 at 12:25 AM Antoine Pitrou  wrote:
> >
> >
> > -1.  Please don't remove tempfile.mktemp().  mktemp() is useful to
> > create a temporary *name*.  All other tempfile functions create an
> > actual file and impose additional burden, for example by making the
> > file inaccessible to other processes.  But sometimes all I want is a
> > temporary name that *another* program will create / act on, not Python.
> > It's a very common use case when writing scripts.
> >
> > The only reasonable workaround I can think of is to first create a
> > temporary directory using mkdtemp(), then use a well-known name inside
> > that directory.  But that has the same security implications AFAICT,
> > since another process can come and create the file / symlink first.  
> 
> Can't you create a NamedTemporaryFile and permit the other program to
> use it? I just tried that (with TiMidity, even though it's quite
> capable of just writing to stdout) and it worked fine.

Does it always work? According to the docs, """Whether the name can be
used to open the file a second time, while the named temporary file is
still open, varies across platforms (it can be so used on Unix; it
cannot on Windows NT or later)""".

tempfile.mktemp() is portable.
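For comparison, one commonly used way around the Windows limitation quoted above is to create the file with delete=False, close it, and only then hand the name to the other program. This is a sketch (reserved_temp_name is a hypothetical helper), and it differs from mktemp() in that an empty file already exists at the returned path:

```python
import os
import tempfile


def reserved_temp_name(suffix=""):
    """Create an empty temporary file, close it, and return its path."""
    # delete=False keeps the file on disk once the with-block closes it,
    # so another program can open it by name afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        return f.name
```

The caller has to remove the file explicitly, for example with os.unlink().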

Regards

Antoine.




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Stéphane Wirtel
and why not with a very long PendingDeprecationWarning? this warning
could be used in this case.


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou
On Tue, 19 Mar 2019 15:32:25 +0200
Serhiy Storchaka  wrote:
> 19.03.19 15:03, Stéphane Wirtel wrote:
> > Suggestion and timeline:
> > 
> > 3.8, we raise a PendingDeprecationWarning
> >  * update the code
> >  * update the documentation
> >  * update the tests
> >(check a PendingDeprecationWarning if sys.version_info == 3.8)
> > 
> > 3.9, we change PendingDeprecationWarning to DeprecationWarning
> >(check DeprecationWarning if sys.version_info == 3.9)
> > 
> > 3.9+, we drop tempfile.mktemp()  
> 
> This plan LGTM.
> 
> Currently mkdir() is widely used in distutils, Sphinx, pip, setuptools, 
> virtualenv, and many other third-party projects, so it will take time to 
> fix all these places. But we should do this, because all this code 
> likely contains security flaws.

The fact that many projects, including well-maintained ones such as Sphinx
or pip, use mktemp(), may be a hint that replacing it is not as easy as
the people writing the Python documentation seem to think.

Regards

Antoine.




Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Chris Angelico
On Wed, Mar 20, 2019 at 12:25 AM Antoine Pitrou  wrote:
>
>
> -1.  Please don't remove tempfile.mktemp().  mktemp() is useful to
> create a temporary *name*.  All other tempfile functions create an
> actual file and impose additional burden, for example by making the
> file inaccessible to other processes.  But sometimes all I want is a
> temporary name that *another* program will create / act on, not Python.
> It's a very common use case when writing scripts.
>
> The only reasonable workaround I can think of is to first create a
> temporary directory using mkdtemp(), then use a well-known name inside
> that directory.  But that has the same security implications AFAICT,
> since another process can come and create the file / symlink first.

Can't you create a NamedTemporaryFile and permit the other program to
use it? I just tried that (with TiMidity, even though it's quite
capable of just writing to stdout) and it worked fine.

>>> import subprocess, tempfile
>>> f = tempfile.NamedTemporaryFile(suffix=".flac")
>>> subprocess.check_call(["timidity", "-OF", "-o", f.name,
...                        "Music/gp_peers.mid"])
... snip ...
Wrote 29645816/55940900 bytes(52.9949% compressed)
>>> data = f.read()
>>> len(data)
29645816

ChrisA


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Serhiy Storchaka

19.03.19 15:03, Stéphane Wirtel wrote:

> Suggestion and timeline:
>
> 3.8, we raise a PendingDeprecationWarning
>  * update the code
>  * update the documentation
>  * update the tests
>    (check a PendingDeprecationWarning if sys.version_info == 3.8)
>
> 3.9, we change PendingDeprecationWarning to DeprecationWarning
>    (check DeprecationWarning if sys.version_info == 3.9)
>
> 3.9+, we drop tempfile.mktemp()


This plan LGTM.

Currently mkdir() is widely used in distutils, Sphinx, pip, setuptools, 
virtualenv, and many other third-party projects, so it will take time to 
fix all these places. But we should do this, because all this code 
likely contains security flaws.




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Serhiy Storchaka

19.03.19 15:10, Tim Delaney wrote:
> Now Calibre is definitely in the wrong here - it should be able to 
> import regardless of the order of attributes. But the fact is that there 
> are a lot of tools out there that are semi-broken in a similar manner.


Isn't Calibre going to sit on Python 2 forever? That makes it 
irrelevant to the discussion about Python 3.8.




Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Antoine Pitrou

-1.  Please don't remove tempfile.mktemp().  mktemp() is useful to
create a temporary *name*.  All other tempfile functions create an
actual file and impose additional burden, for example by making the
file inaccessible to other processes.  But sometimes all I want is a
temporary name that *another* program will create / act on, not Python.
It's a very common use case when writing scripts.

The only reasonable workaround I can think of is to first create a
temporary directory using mkdtemp(), then use a well-known name inside
that directory.  But that has the same security implications AFAICT,
since another process can come and create the file / symlink first.

Regards

Antoine.


On Tue, 19 Mar 2019 14:03:11 +0100
Stéphane Wirtel  wrote:
> Hi,
> 
> Context: raise a warning or remove tempfile.mktemp()
> BPO: https://bugs.python.org/issue36309
> 
> Since 2.3, this function is deprecated in the documentation, just in the
> documentation. In the code, there is a commented RuntimeWarning.
> Commented by Guido in 2002, because the warning was too annoying (and I
> understand ;-)).
> 
> So, in this BPO, we start to discuss about the future of this function
> and Serhiy proposed to discuss on the Python-dev mailing list.
> 
> Question: Should we drop it or add a (Pending)DeprecationWarning?
> 
> Suggestion and timeline:
> 
> 3.8, we raise a PendingDeprecationWarning
> * update the code
> * update the documentation
> * update the tests
>   (check a PendingDeprecationWarning if sys.version_info == 3.8)
> 
> 3.9, we change PendingDeprecationWarning to DeprecationWarning
>   (check DeprecationWarning if sys.version_info == 3.9)
> 
> 3.9+, we drop tempfile.mktemp()
> 
> What do you suggest?
> 
> Have a nice day and thank you for your feedback.





Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Serhiy Storchaka

19.03.19 14:50, Antoine Pitrou пише:

2). Go into every XML module and add attribute sorting options to each function 
that generates XML.


What do you mean with "every XML module"? Are there many of them?


ElementTree and minidom. Maybe xmlrpc. And perhaps we need to add 
arguments in calls at a higher level where these modules are used.




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Serhiy Storchaka

19.03.19 13:53, Ned Batchelder wrote:
Option 4 is misleading.  Is anyone here really offering to "fix the 
tests in third-party modules"?  Option 4 is actually, "do nothing, and 
let a multitude of projects figure out how to fix their tests, slowing 
progress in those projects as they try to support Python 3.8."


Any option except option 1 (and option 2 with sorting by default) 
requires changing third-party code. You must either pass an additional 
argument to the serialization functions or use special canonicalization functions.


We should look at the problem from a long-term perspective. Freezing the 
current behavior forever does not look good. If we need to break 
user code, we should minimize the harm and provide convenient tools for 
reproducing the current behavior. And this is an opportunity to rewrite 
user tests in a more appropriate form. In your case textual comparison may 
be the most appropriate form, but this may not be so in other cases.


Now in Python 3.8, because Python doesn't want to add an 
optional flag to continue doing what it has always done, I need to 
re-engineer my tests.


Please wait a little longer. I hope to add canonicalization before the 
first beta.
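
For readers on Python 3.8 or later, where xml.etree.ElementTree provides a canonicalize() function, a comparison that ignores attribute order might look like the sketch below. The element and attribute names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two serializations of the same element, differing only in attribute order.
sorted_attrs = '<item id="1" name="x">text</item>'
insertion_order = '<item name="x" id="1">text</item>'

# A bytewise comparison is sensitive to attribute order...
assert sorted_attrs != insertion_order

# ...whereas the canonical (C14N 2.0) forms compare equal, because
# canonicalization writes attributes in a fixed order.
canonical_a = ET.canonicalize(sorted_attrs)
canonical_b = ET.canonicalize(insertion_order)
assert canonical_a == canonical_b
```

The canonical strings are ordinary text, so existing textual comparison and diff tools keep working on them.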




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Tim Delaney
On Tue, 19 Mar 2019 at 23:13, David Mertz  wrote:

> In a way, this case makes bugs worse because they are not only a Python
> internal matter. XML is used to communicate among many tools and
> programming languages, and relying on assumptions those other tools will
> not follow is a bad habit.
>

I have a recent example I encountered where the 3.7 behaviour (sorting
attributes) results in a third-party tool behaving incorrectly, whereas
maintaining attribute order works correctly. The particular case was using
HTML tags for importing into Calibre for converting to an ebook. The
most common symptom was that series indexes were sometimes being correctly
imported, and sometimes not. Occasionally other tags would also fail
to be correctly imported.

Turns out that a tag with the name attribute first gave consistently
correct results, whilst the same tag with its attributes sorted was
erratic. And whilst I'd specified the tags with the name attribute
first, I was then passing the HTML through BeautifulSoup, which sorted the
attributes.

Now Calibre is definitely in the wrong here - it should be able to import
regardless of the order of attributes. But the fact is that there are a lot
of tools out there that are semi-broken in a similar manner.

This to me is an argument to default to maintaining order, but provide a
way for the caller to control the order of attributes when formatting e.g.
pass an ordering function. If you want sorted attributes, pass the built-in
sorted function as your ordering function. But I think that's getting
beyond the scope of this discussion.
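
A rough sketch of what such an ordering hook could look like. The function name format_start_tag, its order parameter, and the attribute values are all hypothetical; nothing like this exists in the standard library:

```python
from xml.sax.saxutils import quoteattr


def format_start_tag(tag, attrib, order=None):
    """Format a start tag, optionally reordering its attributes.

    `order` is a callable that takes a list of (name, value) pairs and
    returns them in the desired order; None keeps insertion order.
    """
    items = list(attrib.items())
    if order is not None:
        items = list(order(items))
    attrs = "".join(" %s=%s" % (name, quoteattr(value)) for name, value in items)
    return "<%s%s>" % (tag, attrs)


# Hypothetical attribute values, for illustration only.
attrib = {"name": "series_index", "content": "1"}

preserved = format_start_tag("meta", attrib)
alphabetical = format_start_tag("meta", attrib, order=sorted)
```

Here preserved keeps the name attribute first, while passing the built-in sorted puts content before name.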

Tim Delaney


[Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread Stéphane Wirtel
Hi,

Context: raise a warning or remove tempfile.mktemp()
BPO: https://bugs.python.org/issue36309

This function has been deprecated since 2.3, but only in the
documentation. The code contains a commented-out RuntimeWarning,
which Guido commented out in 2002 because the warning was too annoying
(and I understand ;-)).

So, in this BPO, we started to discuss the future of this function,
and Serhiy proposed discussing it on the Python-dev mailing list.

Question: Should we drop it or add a (Pending)DeprecationWarning?

Suggestion and timeline:

3.8, we raise a PendingDeprecationWarning
* update the code
* update the documentation
* update the tests
  (check for a PendingDeprecationWarning if sys.version_info[:2] == (3, 8))

3.9, we change PendingDeprecationWarning to DeprecationWarning
  (check for a DeprecationWarning if sys.version_info[:2] == (3, 9))

3.9+, we drop tempfile.mktemp()
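
To make the first step of this timeline concrete, here is a minimal sketch of the warning and of a test for it. The wrapper name mktemp_with_warning and the warning message are assumptions made for illustration; this is not the actual patch:

```python
import sys
import tempfile
import warnings


def mktemp_with_warning(suffix="", prefix="tmp", dir=None):
    """Hypothetical wrapper showing where the warning would be raised."""
    warnings.warn(
        "mktemp() is insecure; use mkstemp() or mkdtemp() instead",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return tempfile.mktemp(suffix=suffix, prefix=prefix, dir=dir)


# A test along the lines of the proposal: record the warnings raised
# during the call and inspect their categories.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name = mktemp_with_warning()

categories = [w.category for w in caught]

# sys.version_info is tuple-like, so a version-specific test compares a
# slice against a tuple rather than against a float.
is_38 = sys.version_info[:2] == (3, 8)
```

PendingDeprecationWarning is ignored by the default filters, which is why the test enables all warnings before calling the function.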

What do you suggest?

Have a nice day and thank you for your feedback.


Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Antoine Pitrou


Hi Raymond,

As long as the new serialization order is deterministic (i.e. it's the
same every run and doesn't depend on e.g. hash randomization), then I
think it's fine to change it.

Some more comments / questions:

> 2). Go into every XML module and add attribute sorting options to each 
> function that generates XML.

What do you mean with "every XML module"? Are there many of them?

> Regardless of option chosen, we should make explicit whether or not the 
> Python standard library modules guarantee cross-release bytewise identical 
> output for XML.

IMO we certainly shouldn't.  XML is a serialization format used for
machine interoperability (even though "human-editable" was one of its
selling points at the start, rather misguidedly).  However, the output
should ideally be stable and deterministic across all releases of a
given bugfix branch.

(i.e., if I run the same code multiple times on all 3.7.x versions, I
should always get the same output)

Regards

Antoine.




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread David Mertz
In my opinion, any test that relied on a non-promised accident of
serialization is broken today. Very often, such bad tests mask bad
production code that makes the same unsafe assumptions.

This is similar to tests that assumed a certain dictionary order, before we
got guaranteed insertion order. Or like tests that rely on object identity
of short strings or small ints. Or like non-guaranteed identities in
pickles across versions.

In a way, this case makes bugs worse because they are not only a Python
internal matter. XML is used to communicate among many tools and
programming languages, and relying on assumptions those other tools will
not follow is a bad habit. Sure, most tests probably don't get to the point
of touching those external tools themselves, but starting from bad
assumptions about the domain isn't best practice.

That said, I think an XML canonicalization function is generally a good
thing for Python to have. But it shouldn't be a stopper in releases.

On Mon, Mar 18, 2019, 6:47 PM Raymond Hettinger 
wrote:

> We're having a super interesting discussion on
> https://bugs.python.org/issue34160 .  It is now marked as a release
> blocker and warrants a broader discussion.
>
> Our problem is that at least two distinct and important users have written
> tests that depend on exact byte-by-byte comparisons of the final
> serialization.  So any changes to the XML modules will break those tests
> (not the applications themselves, just the test cases that assume the
> output will be forever, byte-by-byte identical).
>
> In theory, the tests are incorrectly designed and should not treat the
> module output as a canonical normal form.  In practice, doing an equality
> test on the output is the simplest, most obvious approach, and likely is
> being done in other packages we don't know about yet.
>
> With pickle, json, and __repr__, the usual way to write a test is to
> verify a roundtrip:  assert pickle.loads(pickle.dumps(data)) == data.  With
> XML, the problem is that the DOM doesn't have an equality operator.  The
> user is left with either testing specific fragments with
> element.find(xpath) or with using a standards compliant canonicalization
> package (not available from us). Neither option is pleasant.
>
> The code in the current 3.8 alpha differs from 3.7 in that it removes
> attribute sorting and instead preserves the order the user specified when
> creating an element.  As far as I can tell, there is no objection to this
> as a feature.  The problem is what to do about the existing tests in
> third-party code, what guarantees we want to make going forward, and what
> do we recommend as a best practice for testing XML generation.
>
> Things we can do:
>
> 1) Revert to the 3.7 behavior. This, of course, makes all the tests
> pass :-)  The downside is that it perpetuates the practice of bytewise
> equality tests and locks in all implementation quirks forever.  I don't
> know of anyone advocating this option, but it is the simplest thing to do.
>
> 2). Go into every XML module and add attribute sorting options to each
> function that generates XML.  This gives users a way to make their tests
> pass for now. There are several downsides. a) It grows the API in a way
> that is inconsistent with all the other XML packages I've seen. b) We'll
> have to test, maintain, and document the API forever -- the API is already
> large and time consuming to teach. c) It perpetuates the notion that
> bytewise equality tests are the right thing to do, so we'll have this
> problem again if we substitute in another code generator or alter any of the
> other implementation quirks (e.g. how CDATA sections are serialized).
>
> 3) Add a standards compliant canonicalization tool (see
> https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the
> right-way-to-do-it but takes time and energy.
>
> 4) Fix the tests in the third-party modules to be more focused on their
> actual test objectives, the semantics of the generated XML rather than the
> exact serialization.  This option would seem like the right-thing-to-do but
> it isn't trivial because the entire premise of the existing test is
> invalid.  For every case, we'll actually have to think through what the
> test objective really is.
>
> Of these, option 2 is my least preferred.  Ideally, we don't guarantee
> bytewise identical output across releases, and ideally we don't grow a new
> API that perpetuates the issue. That said, I'm not wedded to any of these
> options and just want us to do what is best for the users in the long run.
>
> Regardless of option chosen, we should make explicit whether or not the
> Python standard library modules guarantee cross-release bytewise identical
> output for XML. That is really the core issue here.  Had we had an explicit
> notice one way or the other, there wouldn't be an issue now.
>
> Any thoughts?
>
>
>
> Raymond Hettinger
>
>
> P.S.   Stefan Behnel is planning to remove attribute sorting from lxml.
> On the bug 

Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Ned Batchelder

On 3/19/19 4:13 AM, Serhiy Storchaka wrote:

19.03.19 00:41, Raymond Hettinger wrote:
3) Add a standards compliant canonicalization tool (see 
https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be 
the right-way-to-do-it but takes time and energy.


4) Fix the tests in the third-party modules to be more focused on 
their actual test objectives, the semantics of the generated XML 
rather than the exact serialization.  This option would seem like the 
right-thing-to-do but it isn't trivial because the entire premise of 
the existing test is invalid.  For every case, we'll actually have to 
think through what the test objective really is.


I think the combination of options 3 and 4 is the right thing. Stable 
output is not always needed; in those cases option 4 should be 
considered. But there are valid use cases for stable output, and for 
those we need to provide an alternative in the stdlib. I am 
working on this.


Option 4 is misleading.  Is anyone here really offering to "fix the 
tests in third-party modules"?  Option 4 is actually, "do nothing, and 
let a multitude of projects figure out how to fix their tests, slowing 
progress in those projects as they try to support Python 3.8."


In my case, the test code has a generic function to compare an actual 
directory of files to an expected directory of files, so it isn't quite 
as simple as "just use the right XML comparison."  And I support Python 
2.7, 3.5, etc, so tests still need to work under those versions.  None 
of this is impossible, but please try not to preach to us maintainers 
that we are doing it wrong, that it will be easy to fix, etc.  Using 
language like "the entire premise of the test is invalid" seems 
needlessly condescending.


As one of the suggested solutions, a DOM comparison is not enough. I 
don't just want to know that my actual XML is different than my expected 
XML.  I want to know where and how it differs.


Textual comparison may be the "wrong" way to check XML, but it gives me 
many tools for working with the test results.  It was simple and it 
worked.  Now in Python 3.8, because Python doesn't want to add an 
optional flag to continue doing what it has always done, I need to 
re-engineer my tests.
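
One possible middle ground, sketched under the assumption of Python 3 and ElementTree, is to keep the textual comparison but normalize attribute order first, so that a diff still shows where and how two documents differ. The helper names normalize and xml_diff are hypothetical:

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Re-serialize XML with attributes in sorted order on every element."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Re-insert the attributes in sorted order so the serializer
        # emits them sorted whether or not it sorts by itself.
        items = sorted(elem.attrib.items())
        elem.attrib.clear()
        elem.attrib.update(items)
    return ET.tostring(root, encoding="unicode")


def xml_diff(expected, actual):
    """Return a unified diff of the two normalized documents, line by line."""
    return list(difflib.unified_diff(
        normalize(expected).splitlines(),
        normalize(actual).splitlines(),
        fromfile="expected", tofile="actual", lineterm="",
    ))
```

An empty list means the documents match once attribute order is ignored; otherwise the diff lines point at the elements that differ.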


--Ned.





Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Serhiy Storchaka

19.03.19 00:41, Raymond Hettinger wrote:

3) Add a standards compliant canonicalization tool (see 
https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the 
right-way-to-do-it but takes time and energy.

4) Fix the tests in the third-party modules to be more focused on their actual 
test objectives, the semantics of the generated XML rather than the exact 
serialization.  This option would seem like the right-thing-to-do but it isn't 
trivial because the entire premise of the existing test is invalid.  For every 
case, we'll actually have to think through what the test objective really is.


I think the combination of options 3 and 4 is the right thing. Stable 
output is not always needed; in those cases option 4 should be 
considered. But there are valid use cases for stable output, and for 
those we need to provide an alternative in the stdlib. I am 
working on this.




Re: [Python-Dev] Is XML serialization output guaranteed to be bytewise identical forever?

2019-03-19 Thread Gregory P. Smith
On Mon, Mar 18, 2019 at 9:44 PM Terry Reedy  wrote:

> On 3/18/2019 6:41 PM, Raymond Hettinger wrote:
> > We're having a super interesting discussion on
> https://bugs.python.org/issue34160 .  It is now marked as a release
> blocker and warrants a broader discussion.
> >
> > Our problem is that at least two distinct and important users have
> written tests that depend on exact byte-by-byte comparisons of the final
> serialization.  So any changes to the XML modules will break those tests
> (not the applications themselves, just the test cases that assume the
> output will be forever, byte-by-byte identical).
> >
> > In theory, the tests are incorrectly designed and should not treat the
> module output as a canonical normal form.  In practice, doing an equality
> test on the output is the simplest, most obvious approach, and likely is
> being done in other packages we don't know about yet.
> >
> > With pickle, json, and __repr__, the usual way to write a test is to
> verify a roundtrip:  assert pickle.loads(pickle.dumps(data)) == data.  With
> XML, the problem is that the DOM doesn't have an equality operator.  The
> user is left with either testing specific fragments with
> element.find(xpath) or with using a standards compliant canonicalization
> package (not available from us). Neither option is pleasant.
> >
> > The code in the current 3.8 alpha differs from 3.7 in that it removes
> attribute sorting and instead preserves the order the user specified when
> creating an element.  As far as I can tell, there is no objection to this
> as a feature.  The problem is what to do about the existing tests in
> third-party code, what guarantees we want to make going forward, and what
> do we recommend as a best practice for testing XML generation.
> >
> > Things we can do:
> >
> > 1) Revert to the 3.7 behavior. This, of course, makes all the tests
> pass :-)  The downside is that it perpetuates the practice of bytewise
> equality tests and locks in all implementation quirks forever.  I don't
> know of anyone advocating this option, but it is the simplest thing to do.
>
> If it comes down to doing *something* to unblock the release ...
> 1b) Revert to 3.7 *and* document that byte equality with current output
> is *not* guaranteed.
>
> > 2). Go into every XML module and add attribute sorting options to each
> function that generates XML.  This gives users a way to make their tests
> pass for now. There are several downsides. a) It grows the API in a way
> that is inconsistent with all the other XML packages I've seen. b) We'll
> have to test, maintain, and document the API forever -- the API is already
> large and time consuming to teach. c) It perpetuates the notion that
> bytewise equality tests are the right thing to do, so we'll have this
> problem again if we substitute in another code generator or alter any of the
> other implementation quirks (e.g. how CDATA sections are serialized).
> >
> > 3) Add a standards compliant canonicalization tool (see
> https://en.wikipedia.org/wiki/Canonical_XML ).  This is likely to be the
> right-way-to-do-it but takes time and energy.

>
> > 4) Fix the tests in the third-party modules to be more focused on their
> actual test objectives, the semantics of the generated XML rather than the
> exact serialization.  This option would seem like the right-thing-to-do but
> it isn't trivial because the entire premise of the existing test is
> invalid.  For every case, we'll actually have to think through what the
> test objective really is.

>
> > Of these, option 2 is my least preferred.  Ideally, we don't guarantee
> bytewise identical output across releases, and ideally we don't grow a new
> API that perpetuates the issue. That said, I'm not wedded to any of these
> options and just want us to do what is best for the users in the long run.
>

For (1) - don't revert in 3.8. Do not worry about the order or formatting of
serialized data changing between major Python releases.  A change in 3.8?
That's 100% okay.  This already happens all the time between Python
releases.  We've changed dict iteration order between releases twice this
decade.

Within point releases of stable versions, i.e. 3.7.x? Up to the release
manager; it is semi-rude to change something like this within a stable
release unless there is a good reason, but I *believe* we have done it
before. A general rule of thumb is to avoid such changes without good
reason, unless the code needed to avoid them would be overcomplicated.

It is always the user code depending on the non-declared ordering within
output that is wrong; when we preserve it we're only doing them a temporary
favor that ultimately allows more problems to grow in the future.  Nobody
should use a text comparison on serialized data not explicitly stated as
canonical and call that test good by any standard, unless they are writing a
test for canonical output by a library that explicitly guarantees its
output will be canonical.
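
As a sketch of the alternative to text comparison, a test can parse both documents and compare the resulting trees. The helper elements_equal below is hypothetical, and ignoring surrounding whitespace in text is an assumption that suits data-oriented XML rather than mixed content:

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Compare two elements by tag, attributes, text, and children.

    Attribute order is ignored because the attribute mappings are
    compared as dicts. Surrounding whitespace in text and tail is
    ignored as well.
    """
    if a.tag != b.tag or dict(a.attrib) != dict(b.attrib):
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))


# Same content, different attribute order: equal as trees.
left = ET.fromstring('<a x="1" y="2"><b>t</b></a>')
right = ET.fromstring('<a y="2" x="1"><b>t</b></a>')
same = elements_equal(left, right)
```

Such a helper reports only whether the trees match, not where they differ, so it complements diff-based tooling rather than replacing it.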

Agreed that your option (2) is not good for