[Python-Dev] Re: Small lament...

2023-04-01 Thread Eryk Sun
On 4/1/23, Skip Montanaro  wrote:
> Just wanted to throw this out there... I lament the loss of waking up on
> April 1st to see a creative April Fool's Day joke on one or both of these
> lists, often from our FLUFL... Maybe such frivolity still happens, just not
> in the Python ecosystem?

I thought this one was funny:

https://github.com/python/cpython/issues/103172
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G2Z7QUZA4E3J6BE7HLIWM6R3FWHFYWTV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Debugging of native extensions on windows

2023-03-14 Thread Eryk Sun
On 3/13/23, Rokas Kupstys  wrote:
> I eventually stumbled on to process list showing
> ".venv/Scripts/python.exe" having spawned a subprocess... Which led me
> to "PC/launcher.c" which is what ".venv/Scripts/python.exe" really is.

For a standard Python installation, you can create a virtual
environment with the --symlinks option instead of the default
configuration that uses the venv launcher. Note, however, that using
symlinks doesn't work with the store app distribution of Python.
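
As a rough sketch, the same option is available from Python through the
standard-library venv module. The helper name and the target directory
are illustrative only, and with_pip=False is there just to keep the
example quick:

```python
import venv
from pathlib import Path


def create_symlink_venv(target):
    """Create a virtual environment that links to the base interpreter."""
    # symlinks=True corresponds to "python -m venv --symlinks <target>".
    # with_pip=False only keeps this illustration fast.
    builder = venv.EnvBuilder(symlinks=True, with_pip=False)
    builder.create(target)
    return Path(target) / "pyvenv.cfg"
```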

If your system doesn't have developer mode enabled, creating symlinks
requires "SeCreateSymbolicLinkPrivilege". By default this privilege is
only granted to administrators. However, an administrator can use the
management console "secpol.msc" snap-in to grant the symlink privilege
directly to a user account, or to one of the account's default enabled
groups such as "Authenticated Users". Add the user or group to the
"Create symbolic links" policy in "Security Settings" -> "Local
Policies" -> "User Rights Assignment". You'll have to log off and back
on again to get a new access token that has the symlink privilege.
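
One way to check whether the current token can create symlinks is simply
to try it in a scratch directory. This is a minimal sketch; the function
name is made up for illustration, and it probes the effective result
rather than inspecting the privilege itself:

```python
import os
import tempfile


def can_create_symlinks():
    """Return True if this process is allowed to create a symbolic link."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("probe")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this is typically WinError 1314 when the
            # privilege is missing and developer mode is off.
            return False
        return os.path.islink(link)
```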

Unfortunately, the shell API -- e.g. os.startfile() -- resolves the
final path of an executable before running it. This allows using
filesystem symlinks as if they're shortcuts (LNK files), but it
prevents using a symlink to change the name or path of an executable
to get different expected behavior, such as a Python virtual
environment that uses symlinks.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/3PJJDU6WVNV7K65RZEDMBERCCAVIS5P6/


[Python-Dev] Re: glob's new include_hidden parameter

2022-09-12 Thread Eryk Sun
On 9/12/22, Mats Wichmann  wrote:
>
> If `include_hidden` is true, the patterns '*', '?', '**'  will
> match hidden directories.

Shouldn't this explain what a "hidden directory" is? For example, a
Windows user may think this means a directory with
FILE_ATTRIBUTE_HIDDEN set, but that's not what's meant here. Also, I
think it should note that enabling include_hidden negates the earlier
claim that "files beginning with a dot (.) can only be matched by
patterns that also start with a dot". For example, glob.glob('*',
include_hidden=True) includes all of the conventionally hidden
directories and hidden files in the current directory.
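
To make the difference concrete, here is a small sketch that builds a
scratch directory and compares the two calls. It assumes Python 3.11+,
where include_hidden exists; the file names are arbitrary:

```python
import glob
import os
import tempfile


def compare_star_matches():
    """Return names matched by '*' without and with include_hidden."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("visible.txt", ".dotfile"):
            open(os.path.join(tmp, name), "w").close()
        os.mkdir(os.path.join(tmp, ".dotdir"))
        pattern = os.path.join(glob.escape(tmp), "*")
        default = sorted(os.path.basename(p) for p in glob.glob(pattern))
        hidden = sorted(
            os.path.basename(p)
            for p in glob.glob(pattern, include_hidden=True)
        )
    return default, hidden
```

Here "hidden" means only "name starts with a dot"; the Windows hidden
file attribute plays no part in the result.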
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VGEQQGHJDI2JMQ2SO6V6ULBUBNNTKDA2/


[Python-Dev] Re: Add -P command line option to not add sys.path[0]

2022-04-26 Thread Eryk Sun
On 4/26/22, Victor Stinner  wrote:
>
> There are 4 main ways to run Python:
>
> (1) python -m module [...]
> (2) python script.py [...]
> (3) python -c code [...]
> (4) python [...]
>
> (1) and (2) insert the directory of the module/script at sys.path[0].

Running a module with -m inserts the current working directory (the
path, not an empty string) at sys.path[0], followed by the module
directory at sys.path[1]. Only one entry is added if they're the same
directory.
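
A quick way to see the first entry is to run a throwaway module with -m
from a scratch directory. This sketch assumes -P and PYTHONSAFEPATH are
not in effect; the module name show_path is made up for the example:

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry_for_dash_m():
    """Run a throwaway module with -m; return (sys.path[0], is_cwd)."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "show_path.py"), "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")
        # Drop PYTHONSAFEPATH so the child uses the default behavior.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        result = subprocess.run(
            [sys.executable, "-m", "show_path"],
            cwd=tmp, env=env, capture_output=True, text=True, check=True,
        )
        entry = result.stdout.strip()
        is_cwd = bool(entry) and os.path.samefile(entry, tmp)
        return entry, is_cwd
```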
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/63CBL373SWD7P24TMQOHCJYDP76J4NTL/


[Python-Dev] Re: Restrict the type of __slots__

2022-03-19 Thread Eryk Sun
On 3/19/22, Eryk Sun  wrote:
> On 3/18/22, Ronald Oussoren via Python-Dev  wrote:
>>
>> - if __slots__ is a dict keep it as is
>> - Otherwise use tuple(__slots__) while constructing the class and store
>> that
>> value in the __slots__ attribute of the class
>
> If this is just for pydoc, then it can be updated with new behavior.
> For example, if the given __slots__ is a dict, set it as something
> like __slots_doc__, and rewrite __slots__ as a tuple of the keys.

Or extend pydoc to support a mappingproxy for __slots__.
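
For reference, a small sketch of what is accepted today. The class names
and the slot_names() helper are illustrative only; __slots_doc__ from
the earlier message is a proposal and is not used here:

```python
from types import MappingProxyType


class Point:
    # A dict is accepted today: keys are the slot names and the values
    # are per-slot docstrings that help() can display.
    __slots__ = {"x": "horizontal position", "y": "vertical position"}


class FrozenDocPoint:
    # Any iterable of names works, so a read-only mapping is accepted too.
    # Whether pydoc shows its values is the open question in this thread.
    __slots__ = MappingProxyType({"x": "horizontal position"})


def slot_names(cls):
    """Return slot names as a tuple, whatever type __slots__ has."""
    slots = cls.__slots__
    return (slots,) if isinstance(slots, str) else tuple(slots)
```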
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/NPIZQIZ5OPYU6V7YIGK5TZ75Z6WIOK5O/


[Python-Dev] Re: Restrict the type of __slots__

2022-03-19 Thread Eryk Sun
On 3/18/22, Ronald Oussoren via Python-Dev  wrote:
>
> - if __slots__ is a dict keep it as is
> - Otherwise use tuple(__slots__) while constructing the class and store that
> value in the __slots__ attribute of the class

If this is just for pydoc, then it can be updated with new behavior.
For example, if the given __slots__ is a dict, set it as something
like __slots_doc__, and rewrite __slots__ as a tuple of the keys.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Y6RDNJZGO4FN3DCNTWJM7YA7WFLKIXDV/


[Python-Dev] Re: Improvements to the sys.path initialization documentation

2022-03-04 Thread Eryk Sun
On 3/4/22, Victor Stinner  wrote:
> it would be nice to move the last bits of the sys.path initialization
> from the site module to the getpath module. It's unpleasant to
> have a different sys.path depending if the site module is loaded
> or not.

I don't understand. The site packages directories, including virtual
environments, are a site extension.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5ZO73YHNL3BHXY4MHRITOGOECL2SZKPO/


[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

2021-11-18 Thread Eryk Sun
On 11/13/21, Terry Reedy  wrote:
> On 11/13/2021 4:35 PM, pt...@austin.rr.com wrote:
>>
>> _퓟Ⅼ햠홲험ℋ풪Lᴰ푬핽﹏핷피헡 = 12
>>
>> def _픰ʰ퓸ʳ핥홚푛(픰, p푟픢fi햝핝횎푛, sᵤ푓헳헂푥헹ₑ횗):
>>
>>  ˢ헸i헽 = 퐥e혯(햘) - pr횎햋퐢x헅ᵉ퓷 - 풔홪ffi혅헹홚ₙ
>>
>>  if ski혱 > _퐏헟햠혊홴H핺L핯홀혙﹏L픈풩:
>>
>> 혴 = '%s[%d chars]%s' % (홨[:혱퐫핖푓핚xℓ풆핟], ₛ횔풊p, 퓼[퓁풆햓(횜) -
>> 홨횞풇fix홡ᵉ혯:])
>>
>>  return ₛ
>>
> * Does not at all work in CommandPrompt

It works for me when pasted into the REPL using the console in Windows
10. I pasted the code into a raw multiline string assignment and then
executed the string with exec(). The only issue is that most of the
pasted characters are displayed using the font's default glyph since
the console host doesn't have font fallback support. Even Windows
Terminal doesn't have font fallback support yet in the command-line
editing mode that Python's REPL uses. But Windows Terminal does
implement font fallback for normal output rendering, so if you assign
the pasted text to string `s`, then print(s) should display properly.
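
The exec() approach can be sketched with a much smaller source string.
The quoted code runs at all because the compiler NFKC-normalizes
identifiers (PEP 3131), so a styled letter names the same variable as
its plain form. The source string below is a made-up stand-in for the
pasted function:

```python
import unicodedata

# "\U0001d431" is MATHEMATICAL BOLD SMALL X; NFKC folds it to plain "x".
fancy_source = "\U0001d431 = 12\nresult = x + 1\n"


def run_fancy_source():
    """Execute the source with exec() and return the value of `result`."""
    namespace = {}
    exec(fancy_source, namespace)
    return namespace["result"]


def normalized(name):
    """Return the NFKC form that the compiler uses for an identifier."""
    return unicodedata.normalize("NFKC", name)
```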

> even after supposedly changing to a utf-8 codepage with 'chcp 65000'.

Changing the console code page is unnecessary with Python 3.6+, which
uses the console's wide-character API. Also, even though it's
irrelevant for the REPL, UTF-8 is code page 65001. Code page 65000 is
UTF-7.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/7FGNJ7TMASDOMQAS2LSSQAD2PPURT5W6/


[Python-Dev] Re: What is __int__ still useful for?

2021-10-15 Thread Eryk Sun
On 10/15/21, Mark Dickinson  wrote:
>
> the proposal would be to remove that special role of `__trunc__` and
> reduce the `int` constructor to only looking at `__int__` and `__index__`.

For Real and Rational numbers, currently the required method to
implement is __trunc__(). ISTM that this proposal should include a
change to require __int__() in numbers.Real.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/R3JI6EGLMMBDNPCKFSNPLUNR2Q3ISAID/


[Python-Dev] Re: What is __int__ still useful for?

2021-10-14 Thread Eryk Sun
On 10/14/21, Antoine Pitrou  wrote:
> On Wed, 13 Oct 2021 17:00:49 -0700
> Guido van Rossum  wrote:
>>
>> so int() can't call __trunc__ (as was explained earlier in
>> the thread).

I guess this was meant to be "*just* call __trunc__". It's documented
that the int constructor calls the initializing object's __trunc__()
method if the object doesn't implement __int__() or __index__().

> Note that PyNumber_Long() is now the only place inside the interpreter
> calling the `nb_int` slot.  But since it also has those undesirable code
> paths accepting str and buffer-like objects, it's usable in fewer
> situations than you'd expect.

Maybe an alternate constructor could be added -- such as
int.from_number() -- which would be restricted to calling __int__(),
__index__(), and __trunc__().
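
A pure-Python approximation of such a constructor might look like the
following. int.from_number() does not exist at the time of this message,
so int_from_number() is a hypothetical stand-in, and the lookup order is
an assumption rather than a specification:

```python
import operator


def int_from_number(obj):
    """Hypothetical stand-in for the suggested int.from_number()."""
    cls = type(obj)
    # Special methods are looked up on the type, as the interpreter does.
    for name in ("__int__", "__index__", "__trunc__"):
        method = getattr(cls, name, None)
        if method is not None:
            # operator.index() rejects results that are not integers.
            return operator.index(method(obj))
    raise TypeError(
        f"{cls.__name__!r} object does not support conversion to int"
    )
```

Unlike int(), this rejects str and bytes-like arguments because they
implement none of the three methods.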
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/Q77PFIMCHDGB36LZTNMFG6NF7DE2UOSF/


[Python-Dev] Re: Have virtual environments led to neglect of the actual environment?

2021-02-28 Thread Eryk Sun
On 2/28/21, Oscar Benjamin  wrote:
>
> Oh, okay. So does that mean that it's always on PATH unless the user
> *explicitly unticks* the "install the launcher" box for both single
> user and all user installs?

If the launcher gets installed, it will be available in PATH. IIRC,
the installer only allows installing the launcher for the current user
if it is not installed already for all users. Thus only one version
should ever exist in PATH, which is set in either the user or system
"PATH" value in the registry, but not both.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/3QWZRQSI5BTM3GQRWNH2UQN4L7RONAPJ/


[Python-Dev] Re: Have virtual environments led to neglect of the actual environment?

2021-02-28 Thread Eryk Sun
On 2/28/21, Oscar Benjamin  wrote:
>
> - It is possible to configure a default version (although I think you
> have to do it with an environment variable)

The py launcher in Windows supports a "py.ini" file beside the
executable and in %LocalAppData%. The equivalent of the PY_PYTHON,
PY_PYTHON2, and PY_PYTHON3 environment variables can be set in the
"[defaults]" section as "python", "python2", and "python3" settings.

The ini file also supports a "[commands]" section to define additional
virtual commands for shebangs. Regular filepaths are also supported in
shebangs -- e.g. #!"path\to\any\program.exe".
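
As an illustration, the text of such a "py.ini" can be generated with
configparser. The version numbers, the "mytool" command name, and its
path are placeholders, not recommendations:

```python
import configparser
import io


def build_py_ini(default="3.8", default3="3.8"):
    """Return py.ini text with [defaults] and [commands] sections."""
    config = configparser.ConfigParser()
    config["defaults"] = {"python": default, "python3": default3}
    # Hypothetical virtual command; the path is only a placeholder.
    config["commands"] = {"mytool": r"C:\path\to\mytool.exe"}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

The resulting text would be saved as "py.ini" in %LocalAppData% or
beside the launcher executable.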

> - I think that the launcher is only installed in an all users install.

It defaults to an all-users install, but it can also be installed for
just the current user in "%LocalAppData%\Programs\Python\Launcher". In
this case, the installation directory always gets added to PATH.

> - Listing installations with "py -0p" is somewhat cryptic

There's also the long-form options `--list` and `--list-paths`.

> - It would be better if you could use the launcher itself to set the
> default Python e.g. "py --set-default-python=3.8"

It's pretty simple to run `set PY_PYTHON=3.8`, and persist the value
in the registry with `setx.exe PY_PYTHON 3.8`. (But don't use setx.exe
naively to set PATH.)

> On the last point I think that although Anaconda doesn't install the
> launcher you can use the launcher to run the python from the Anaconda
> installation.

I don't use Anaconda, but I don't think that's supposed to be the case
according to PEP 514. The launcher only looks for PSF development
distributions in the "PythonCore" registry key.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/JPQLBEYVWV2XGXO44JJ4YPN3KFJJYN2Q/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-11 Thread Eryk Sun
On 2/11/21, Inada Naoki  wrote:
>
> There is little difference between `encoding=None` and
> `encoding=locale.getpreferredencoding(False)`. The difference is:
>
> * When Python is using Windows, and
> * When the file is a console, and
> * (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
> * (for TextIOWrapper()) When the file is not _WindowsConsoleIO
>
> encoding=None uses console codepage but

os.device_encoding() -- i.e. _Py_device_encoding() -- only works for
hard-coded file descriptors 0, 1, and 2, instead of detecting a
console file. So opening "CON", "CONIN$", or "CONOUT$" has never used
the console input or output code page, nor has opening a duped
standard I/O fd such as open(os.dup(0)). It would be easy to
generalize _Py_device_encoding() to detect console files, but it's new
behavior.
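
A small sketch for inspecting what os.device_encoding() reports. The
function name is illustrative. Note that the restriction to descriptors
0-2 described above is specific to Windows; on POSIX the call answers
for any descriptor that is a terminal:

```python
import os
import tempfile


def device_encodings():
    """Map a few descriptors to what os.device_encoding() reports."""
    report = {}
    for fd in (0, 1, 2):
        try:
            report[fd] = os.device_encoding(fd)
        except OSError:
            report[fd] = None  # descriptor is closed
    with tempfile.TemporaryFile() as f:
        # A regular file is not a terminal or console, so this is None.
        report["regular file"] = os.device_encoding(f.fileno())
    return report
```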

Python 3.8+ introduced a bug (issue 42261) in which, even with legacy
standard I/O enabled and file descriptors 0-2, the console input and
output code pages are ignored. For example:

C:\>chcp 437
Active code page: 437
C:\>set PYTHONLEGACYWINDOWSSTDIO=1
C:\>py -3.9 -c "import sys; print(sys.stdout.encoding)"
cp1252

Regarding the last bullet point, io.TextIOWrapper doesn't know
anything about io._WindowsConsoleIO. The decision to use UTF-8 is in
io.open(). So manually wrapping a _WindowsConsoleIO file with
TextIOWrapper uses the locale preferred encoding instead of UTF-8. For
example:

>>> import io
>>> fb = open('conin$', 'rb')
>>> fb.raw
<_io._WindowsConsoleIO mode='rb' closefd=True>
>>> f = io.TextIOWrapper(fb)
>>> f.encoding
'cp1252'

I don't know whether it's worth making TextIOWrapper check for
_WindowsConsoleIO in order to make it use UTF-8. It's not common to
manually wrap a binary-mode file.
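
In the meantime, code that does wrap a binary file manually can avoid
the locale fallback by naming the encoding itself. A minimal sketch,
with wrap_binary() as an illustrative helper name:

```python
import io


def wrap_binary(binary_file, encoding="utf-8"):
    """Wrap a binary file object, stating the encoding explicitly."""
    # Omitting `encoding` makes TextIOWrapper fall back to the locale
    # encoding (unless UTF-8 mode is enabled).
    return io.TextIOWrapper(binary_file, encoding=encoding)
```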
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QBNH3XGSNBQ7XIJ5E542JIQ5Q5E63MCU/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-28 Thread Eryk Sun
On 10/28/20, Stephen J. Turnbull  wrote:
>
> Note: you can "fix" directory updates by mounting the filesystem r/o.

Mounting the filesystem as readonly is the extreme case. Popular Unix
systems support a "noatime" mount option that disables updating file
access times, unless one of the other timestamps changes. In Windows,
NTFS and ReFS support a system setting (but not per-volume) to disable
updating access times -- "NtfsDisableLastAccessUpdate" and
"RefsDisableLastAccessUpdate".
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/E5AWEB3U5ZCQBWABOKAGL6CADRHBLEEP/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-26 Thread Eryk Sun
On 10/26/20, Victor Stinner  wrote:
> On Mon, Oct 19, 2020 at 13:50, Steve Dower wrote:
>> Feel free to file a bug, but we'll likely only add a vague note to the
>> docs about how Windows works here rather than changing anything.
>
> I agree that this surprising behavior can be documented. Attempting to
> provide accurate access time in os.scandir() is likely to slow-down
> the function which would defeat its whole purpose.

I don't think the access time (st_atime) is a significant concern. I'm
concerned with the reliability of the file size (st_size) and
last-write time (st_mtime) in stat() results. Developers are used to
various filesystem policies on various platforms that limit when the
access time gets updated, if at all. FAT32 filesystems only have an
access date, and the driver in Windows fixes the access time at
midnight. Updating the access time in NTFS and ReFS can be completely
disabled at the system level; otherwise it's updated with a
granularity of one hour if it's only the access time that would be
updated.

The biggest concern for me is NTFS hardlinks, for which the st_size
and st_mtime in the directory entry is unreliable. When a file with
multiple hardlinks is modified, the filesystem only updates the
duplicated information in the directory entry of the opened link.
Because the entry in the directory doesn't include the link count or
even a boolean value to indicate that a file has multiple hardlinks,
if you don't know whether or not there's a possibility of hardlinks,
then os.stat() is required in order to reliably determine st_size and
st_mtime, to the extent that reliably knowing st_mtime is possible.
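
In code, that means paying for a full os.stat() per entry whenever
hardlinks cannot be ruled out. A minimal sketch, with an illustrative
function name:

```python
import os


def reliable_sizes(directory):
    """Yield (name, size, mtime, nlink) using a full os.stat() per entry."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                # os.stat() queries the file itself; entry.stat() on
                # Windows may return the cached directory-entry data.
                st = os.stat(entry.path, follow_symlinks=False)
                yield entry.name, st.st_size, st.st_mtime, st.st_nlink
```

The st_nlink value is included so a caller can see which files actually
have more than one link.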

A general problem that affects even os.stat() is that a modified file
may only be noted by setting a flag (FO_FILE_MODIFIED) in the kernel
file object of the particular open. Whether it's immediately noted in
the last-write time of the shared FCB (file control block) is up to
filesystem policy.

Starting with Windows 10 1809 (as noted in [MS-FSA]), NTFS immediately
notes the modification time, so the st_mtime value from os.stat() is
current. In prior versions of NTFS, and with other Microsoft
filesystems such as FAT32, the last-write time is only noted when the
file is flushed to disk via FlushFileBuffers (i.e. os.fsync) or when
the open is closed.

This means that st_size may change without also changing st_mtime. I'm
using Windows 10 2004 currently, so I can't show an NTFS example, but
the following shows the behavior with FAT32:

import os
import time

f = open('spam.txt', 'w')
st1 = os.stat('spam.txt')
time.sleep(10)
f.write('spam')
f.flush()  # flushes Python's buffer to the OS; not FlushFileBuffers
st2 = os.stat('spam.txt')

The above write was noted only by setting the FO_FILE_MODIFIED flag on
the kernel file object. (The file object can be inspected with a local
kernel debugger.) The write time wasn't noted in the FCB, i.e.
st_mtime hasn't changed in st2:

>>> st2.st_size - st1.st_size
4
>>> st2.st_mtime - st1.st_mtime
0.0

The last-write time is noted when FlushFileBuffers (os.fsync) is
called on the open:

>>> os.fsync(f.fileno())
>>> st3 = os.stat('spam.txt')
>>> st3.st_mtime - st1.st_mtime
10.0

Note also that, with NTFS, to the extent that the FCB metadata is
current, calling os.stat() on a link updates the duplicated
information in the directory entry. So calling os.stat() on a NTFS
file may update the entry that's returned by a subsequent os.scandir()
call.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/LEBCSKGSL7PMAFH6AQR5LFL7UJ4T5774/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-20 Thread Eryk Sun
On 10/19/20, Greg Ewing  wrote:
> On 20/10/20 4:52 am, Gregory P. Smith wrote:
>> Those of us with a traditional posix filesystem background may raise
>> eyeballs at this duplication, seeing a directory as a place that merely
>> maps names to inodes
>
> This is probably a holdover from MS-DOS, where there was no separate
> inode-like structure -- it was all in the directory entry.

DOS implemented a find-first/find-next API (int 21h 4E/4F) that
provided a file's name, attributes, size, and last write time/date. I
think it's clear that the design was influenced by the
readily-available contents of a FAT dirent. The Win32 API extended
this to FindFirstFile/FindNextFile, with added support for the long
filename, create and access times, and, in NT 5+, the reparse tag for
a reparse point.

NTFS had to support this metadata in the directory index, else
FindFirstFile/FindNextFile would be too expensive if the filesystem
had to fetch the metadata from the MFT for every matching file in a
listing. It tries to keep the duplicated metadata in sync -- such as
when a file is open, closed, manually extended in size, when the cache
is flushed, or when metadata is explicitly set (e.g.
SetFileInformationByHandle: FileBasicInfo). But for performance it
doesn't update the duplicated data every time a file is read from or
written to. And, in particular, if it's just the access time that
changed, it updates the duplicated access time with a one-hour
granularity. (There's also a registry value, as I mentioned
previously, that disables updating access times completely -- in both
the MFT record and the directory index.)

That said, if a file has multiple hardlinks the current NTFS
implementation for updating duplicated data is totally unreliable. It
only updates the accessed link. All other links go stale. We don't
have any reasonable way to special case this situation because the
directory entry doesn't include the number of links a file has. It has
to be opened and queried directly, but then one might as well do a
full stat() for every file.

I recommend relying on only the high-level is_dir(), is_file(), and
is_symlink() methods of os.scandir() items, to quickly process a
directory. inode() is reliable -- as much as is possible in Windows --
because the implementation gets the full stat info, but check to
ensure it's not 0. It's based on the file ID, which Windows
filesystems aren't required to support (or reliably support; it's not
stable in FAT). NTFS and ReFS support reliable 64-bit file IDs, and
opening by file ID.
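
That recommendation can be sketched as follows. The function names are
illustrative; only the DirEntry methods named above are used:

```python
import os


def classify(directory):
    """Group entry names by kind using only the cheap DirEntry methods."""
    kinds = {"dir": [], "file": [], "symlink": [], "other": []}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                kinds["symlink"].append(entry.name)
            elif entry.is_dir(follow_symlinks=False):
                kinds["dir"].append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                kinds["file"].append(entry.name)
            else:
                kinds["other"].append(entry.name)
    return kinds


def file_id(entry):
    """Return entry.inode(), or None when the filesystem reports 0."""
    inode = entry.inode()
    return inode if inode != 0 else None
```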
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JKK47AWKUOWPPBEAIRGIFRMW6FCPZILG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
>
> Resolving the path is the most expensive part, even if the file is not
> opened (I've been working with the NTFS team on this area, and we've
> been benchmarking/analysing all of it).

If you say it's been extensively benchmarked and there's no direct way
around the speed bottleneck, then I take your word for it. To clarify
what I had in mind, I was hoping that because NTFS implements the fast
I/O function FastIoQueryOpen [1] (via NtfsNetworkOpenCreate, as given
by its FastIoDispatch table) that IRP_MJ_CREATE would be bypassed and
that the filesystem would not incur a significant cost to parse the
remaining path. I figured that most of the work would be in the
ObOpenObjectByName and IopParseDevice executive calls that lead up
to querying the filesystem.

Anyway, it's unfortunate that the Windows API doesn't support NT
handle-relative names, except in the registry API. If we could call
NTAPI NtQueryAttributesFile [2] directly, then the ObjectAttributes
argument could be relative to a directory handle set in the
RootDirectory field. That would eliminate the vast majority of the
path-resolution cost. A handle-relative open or query goes straight to
the filesystem device, which goes straight to the directory that
contains the file.

To eliminate the cost of opening the directory handle, scandir() could
be rewritten to use CreateFileW and GetFileInformationByHandleEx:
FileIdBothDirectoryInfo [3] instead of FindFirstFileW / FindNextFileW.
Just cache the directory handle in place of caching the find handle.
scandir() would gain fd support in Windows. Opening a directory via
os.open requires the flag _O_OBTAIN_DIR (0x2000), defined in fcntl.h.
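
For comparison, this is roughly what fd-based scanning already looks
like on platforms that support it. The sketch returns None where
os.scandir() does not accept a descriptor, which is assumed to include
Windows at the time of this thread; the function name is illustrative:

```python
import os


def names_via_directory_fd(directory):
    """List names by scanning an open directory descriptor, if supported."""
    if os.scandir not in os.supports_fd:
        return None  # e.g. Windows at the time of this thread
    fd = os.open(directory, os.O_RDONLY)
    try:
        with os.scandir(fd) as entries:
            return sorted(entry.name for entry in entries)
    finally:
        # scandir() does not take ownership of the descriptor.
        os.close(fd)
```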

FileIdBothDirectoryInfo provides the file ID, so the implementation
would support the inode() method without calling stat(). It would
still directly support is_dir() and is_file() based on the file
attributes, and is_symlink() based on the file attributes and the
EaSize field. The Windows Protocols document that the latter contains
the reparse tag for a reparse point. The field is reused because a
reparse point can't have extended attributes.

All that said, I don't prefer to call NtQueryAttributesFile or any
other NTAPI function in Windows Python. I'd rather do the best
possible with just the Windows API. I wish there were a new
GetFileAttributesExExW function that supported handle-relative names.
Even better would be a new function that calls
NtQueryInformationByName -- something like GetFileInformationByName --
for FileStatInfo (and FileCaseSensitiveInfo as well, which is becoming
more of an issue), also with support for handle-relative names.

[1] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_fast_io_dispatch
[2] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwqueryfullattributesfile
[3] 
https://docs.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_id_both_dir_info
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GODUIB5WKVZLX4BVPEM2NS37JFHUXIID/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>>> rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while
>> os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in
> directory entries* on reads. os.stat() always[1] looks at the file entry
> metadata, while os.scandir() always looks at the directory entry metadata.
>
> My suggested approach still applies, other than the bit where we might
> fix os.stat(). The best we can do is regress os.scandir() to have
> similarly poor performance, but the best *you* can do is use os.stat()
> for accurate timings when files might be being modified while your
> program is running, and don't do it when you just need names/kinds (and
> I'm okay adding that note to the docs).

If this is the correction to which you're referring in the previous
message, I assumed you stood by the claim that os.stat() may update
st_atime. That shouldn't be the case, so there shouldn't be anything
that needs to be fixed there, unless I'm missing what you think needs
to be fixed. If it's actually a problem, then I'd really, really like
a test case that reproduces it. If it was just a misinterpreted test
case or mis-remembered fact, then that's good news for me. ;-)

Regarding updating the access time in the directory entry, in my
previous reply I explained that NTFS should update it with a one-hour
granularity. With FAT, it's daily.

Regarding the view that this is only about "accurate timings when
files might be being modified while your program is running", in my
previous messages I stressed that the directory entry for a hard link
may have the wrong size, change time, write time, and access time if
it wasn't the last link used to update the file. That has nothing to
do with the file being modified while the program is running. It's a
stale directory entry. If you call os.stat() on the stale link, NTFS
will update it with the correct metadata.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SUGIZ6OAXOD37USVBWAW7JRSUDBSMG7Q/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-19 Thread Eryk Sun
On 10/19/20, Steve Dower  wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file
data. That has me curious if it can be reproduced.

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change
immediately when the file data is read or modified. With other
filesystems, it may not be updated until the kernel file object that
was used to access the file's data is closed.

Note that updating the access time in NTFS can be disabled by the
"NtfsDisableLastAccessUpdate" value in
"HKLM\System\CurrentControlSet\Control\FileSystem". The default value
in Windows 10 should be 0x8002, which means the value is system
managed and updating the access time is enabled.
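
The current value can be read with the winreg module. A minimal sketch:
the function name is illustrative, and it returns None on other
platforms or when the value is absent:

```python
import sys


def last_access_update_setting():
    """Return the raw NtfsDisableLastAccessUpdate value, or None."""
    if sys.platform != "win32":
        return None  # the setting only exists on Windows
    import winreg

    key_path = r"System\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _kind = winreg.QueryValueEx(
                key, "NtfsDisableLastAccessUpdate")
    except FileNotFoundError:
        return None
    return value
```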

If it's only the access time that changes, the directory entry may be
updated with a significant granularity such as hourly or daily. For
NTFS, it's hourly. To confirm this, wait an hour from the current
access time in the directory entry; open the file; read some data; and
close the file. The access time in the directory entry should be
updated.

For details, download the [MS-FSA] PDF [1] and look for all references
to the following sections:

* 2.1.4.17 Algorithm for Noting That a File Has Been Modified
* 2.1.4.19 Algorithm for Noting That a File Has Been Accessed
* 2.1.4.18 Algorithm for Updating Duplicated Information

Also check the tables in Appendix A, which provide the update
granularity of file time stamps (presumably for directory entries) for
common Windows filesystems.

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516-c452-47b4-bdbc-625d344e2041

Going back to my initial message, I can't stress enough that this
problem is at its worst when a file has multiple hardlinks. If a
particular link in a directory wasn't the last link used to access the
file, its duplicated metadata may have the wrong file size, access
time, modify time, and change time (the latter is not reported by
Python). As is, for the current implementation, I'd only rely on the
basic attributes such as whether it's a directory or reparse point
(symlink, mountpoint, etc) when using scandir() to quickly process a
directory. For reliable stat information, call os.stat().
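To make that advice concrete, here is a minimal sketch that uses the directory entry only for the cheap type check and then calls os.stat() for the times and size. The helper name list_files_with_fresh_stat is made up for illustration.

```python
import os

def list_files_with_fresh_stat(directory):
    """Return a list of (name, cached_stat, fresh_stat) for regular files.

    cached_stat comes from the directory entry (in Windows it is filled
    from the directory listing and may be stale); fresh_stat is a real
    os.stat() call on the path.
    """
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() only needs the basic attributes from the entry.
            if not entry.is_file(follow_symlinks=False):
                continue
            cached = entry.stat(follow_symlinks=False)
            fresh = os.stat(entry.path, follow_symlinks=False)
            results.append((entry.name, cached, fresh))
    return results
```

Comparing cached.st_size and cached.st_mtime against the fresh values is one way to spot a stale entry, for example a link that wasn't the last one used to access the file.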

I do think, however, that os.scandir() can be improved in Windows
without significant performance loss if it calls GetFileAttributesExW
to get st_file_attributes, st_size, st_ctime (create time), st_mtime,
and st_atime. This API call is relatively fast because it doesn't
require opening the file via CreateFileW, which is one of the more
expensive operations in os.stat(). But I haven't tried modifying
scandir() to benchmark it.

Ultimately, I'm waiting for Windows 10 to provide a WinAPI function
that calls the relatively new NTAPI function NtQueryInformationByName
[2] (by name, not by handle!) to get the FileStatInformation, as well
as for this information to be made available by handle via
GetFileInformationByHandleEx. Compared to GetFileAttributesExW, the
FileStatInformation additionally provides the file ID (if implemented
by the filesystem), change time, reparse tag, number of links, and the
effective access of the security context of the caller (i.e. process
or thread access token). The latter is something that we've never
implemented with os.stat(). It's not the same as POSIX
owner-group-other permissions. It would need a new attribute such as
st_effective_access. It could be used to provide a real implementation
of os.access() in Windows.

[2] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntqueryinformationbyname
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/NPP6GKAEI7SOVA45WTJ222YVEALTF6WO/


[Python-Dev] Re: os.scandir bug in Windows?

2020-10-18 Thread Eryk Sun
On 10/15/20, Rob Cliffe via Python-Dev  wrote:
>
> TLDR: In os.scandir directory entries, atime is always a copy of mtime
> rather than the actual access time.

There are inconsistencies in various scenarios between the
stat info from the directory entry and the stat info from the File
Control Block (FCB) -- the filesystem's in-memory record that's common
to all opens for a file/directory.

The worst case is for an NTFS file with multiple hardlinks, for which
the directory entry information is from the last time the file was
opened using a particular hardlink. The accurate NTFS file information
is in the file's Master File Table (MFT) record, which gets accessed
to populate the FCB and update the particular link when a file is
opened.

If you're looking for file times and file size, the only reliable
information comes from directly opening the file and querying the info
via GetFileInformationByHandle (called by os.stat),
GetFileInformationByHandleEx (FileBasicInfo, FileStandardInfo),
GetFileTime, and GetFileSizeEx.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IJIFZHPEEMVPD2LN6H3MY4KGRKNQ4TBQ/


[Python-Dev] Re: os.add_dll_directory and DLL search order

2020-06-22 Thread Eryk Sun
On 6/22/20, Steve Dower  wrote:
>
> What is likely happening here is that _sqlite3.pyd is being imported
> before _mapscript, and so there is already a SQLITE3 module in memory.
> Like Python, Windows will not attempt to import a second module with the
> same name, but will return the original one.

Qualified DLL loads won't interfere with each other, but dependent
DLLs are loaded by base name only. In these cases a SxS assembly
allows loading multiple DLLs that have the same base name. If the
assembly is referenced by a DLL, embed the manifest in the DLL as
resource 2. For example:

>>> import ctypes
>>> test1 = ctypes.CDLL('./test1')
>>> test2 = ctypes.CDLL('./test2')
>>> test1.call_spam.restype = None
>>> test2.call_spam.restype = None

>>> test1.call_spam()
spam v1.0
>>> test2.call_spam()
spam v2.0

>>> import win32process, win32api
>>> names = [win32api.GetModuleFileName(x)
... for x in win32process.EnumProcessModules(-1)]
>>> spams = [x for x in names if 'spam' in x]
>>> print(*spams, sep='\n')
C:\Temp\test\c\spam.dll
C:\Temp\test\c\spam_assembly\spam.dll

Source

spam1.c (spam.dll):

#include <stdio.h>

void __declspec(dllexport) spam()
{
printf("spam v1.0\n");
}


test1.c (test1.dll):

#pragma comment(lib, "spam")
void __declspec(dllimport) spam();

void __declspec(dllexport) call_spam()
{
spam();
}

---

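spam_assembly/spam_assembly.manifest:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<assemblyIdentity
type="win32"
name="spam_assembly"
version="2.0.0.0"
processorArchitecture="amd64" />
<file name="spam.dll" />
</assembly>

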
spam2.c (spam_assembly/spam.dll):

#include <stdio.h>

void __declspec(dllexport) spam()
{
printf("spam v2.0\n");
}


test2.c (test2.dll -- link with /manifest:embed,id=2):

#pragma comment(lib, "spam")
#pragma comment(linker, "/manifestdependency:\"\
type='win32' \
name='spam_assembly' \
version='2.0.0.0' \
processorArchitecture='amd64' \"")

void __declspec(dllimport) spam();

void __declspec(dllexport) call_spam()
{
spam();
}
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5PDVL7KOBCCIVRSYQH4WXHBCZ23KYKG3/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, Rob Cliffe via Python-Dev  wrote:
> On 10/08/2019 11:50:35, eryk sun wrote:
>> On 8/9/19, Steven D'Aprano  wrote:
>>> I'm also curious why the string needs to *end* with a backslash. Both of
>>> these are the same path:
>>>
>>>  C:\foo\bar\baz\
>>>  C:\foo\bar\baz
>
> Also, the former is simply more *informative* - it tells the reader that
> baz is expected to be a directory, not a file.

This is an important point that I overlooked. The trailing backslash
is more than just a redundant character to inform human readers. Refer
to [MS-FSA] 2.1.5.1 "Server Requests an Open of a File" [1]. A
create/open fails with STATUS_OBJECT_NAME_INVALID if either of the
following is true:

* PathName contains a trailing backslash and
  CreateOptions.FILE_NON_DIRECTORY_FILE is
  TRUE.

* PathName contains a trailing backslash and
  StreamTypeToOpen is DataStream

For NtCreateFile or NtOpenFile (in the NT API), the
FILE_NON_DIRECTORY_FILE option restricts the call to a regular file,
and FILE_DIRECTORY_FILE restricts it to a directory. With neither
option, the call can target either a file or directory. A trailing
backslash is another information channel. It tells the filesystem that
the target has to be a directory. If we specify
FILE_NON_DIRECTORY_FILE with a trailing backslash on the name, this is
an immediate failure as an invalid name without even checking the
entry. If we specify neither option and use a trailing backslash, it's
an invalid name if the filesystem finds a regular file or data stream.
Had the call specified the FILE_DIRECTORY_FILE option, it would
instead fail with STATUS_NOT_A_DIRECTORY.

We can see this in practice in the published source for the fastfat
filesystem driver. FatCommonCreate [2] (for a create or open) has the
following code to handle the second case (in this code, an FCB is a
file control block for a regular file, and a DCB is a directory
control block):

if (NodeType(Fcb) == FAT_NTC_FCB) {
//
//  Check if we were only to open a directory
//
if (OpenDirectory) {
DebugTrace(0, Dbg, "Cannot open file as directory\n", 0);
try_return( Iosb.Status = STATUS_NOT_A_DIRECTORY );
}
DebugTrace(0, Dbg, "Open existing fcb, Fcb = %p\n", Fcb);
if ( TrailingBackslash ) {
try_return( Iosb.Status = STATUS_OBJECT_NAME_INVALID );
}

We observe the first case with a typical CreateFileW call, which uses
the option FILE_NON_DIRECTORY_FILE. In the following example "baz" is
a regular file:

>>> f = open(r'foo\bar\baz') # success
>>> try: open('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[Errno 22] Invalid argument: 'foo\\bar\\baz\\'

C EINVAL (22) is mapped from Windows ERROR_INVALID_NAME (123), which
is mapped from NT STATUS_OBJECT_NAME_INVALID (0xC0000033).

We can observe the second case with os.stat(), which calls CreateFileW
with backup semantics, which omits the FILE_NON_DIRECTORY_FILE option
in order to allow the call to open either a file or directory. In this
case the filesystem has to actually check that "baz" is a data file
before it can fail the call, as was shown in the fastfat code snippet
above:

>>> try: os.stat('foo\\bar\\baz\\')
... except OSError as e: print(e)
...
[WinError 123] The filename, directory name, or
volume label syntax is incorrect: 'foo\\bar\\baz\\'

[1] 
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/8ada5fbe-db4e-49fd-aef6-20d54b748e40
[2] 
https://github.com/microsoft/Windows-driver-samples/blob/74200/filesys/fastfat/create.c#L1398
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QPDXUY4OXR2XOCNUHSKC7QRQGAXWV5WQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/10/19, eryk sun  wrote:
>
> The per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon
> Session ID>". In the Windows API, it's accessible as "//?/" or "//./",
> or with any mix of forward slashes or backslashes, but only the
> all-backslash form is special-cased to bypass the normalization step.

Correction: I slipped up in that last sentence. Only the all-backslash
form that's in the "?" namespace bypasses normalization, as most
Windows users should at least have seen in passing. These special
device paths pop up here and there. For example, r'\\?\C:\Temp\spam. .
.' allows creating or opening a file named "spam. . .", which the
Windows API would normalize as "spam". But I don't recommend
sidestepping the normal rules -- except for the path length limit
because there are ways to make long paths conveniently accessible
(e.g. symbolic links, bind-like mountpoints, and subst drives).

Sometimes people also come across "\\??\\" paths and come to the
mistaken conclusion that these can be used in Windows API programs.
No, they're for NT. The runtime library mangles them, e.g.
nt._getfullpathname(r'\??\C:') == 'C:\\??\\C:'.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VANNT2SIH7EBPEOUC6M7HI7PYASJPYC7/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-10 Thread eryk sun
On 8/9/19, Steven D'Aprano  wrote:
>
> I'm also curious why the string needs to *end* with a backslash. Both of
> these are the same path:
>
> C:\foo\bar\baz\
> C:\foo\bar\baz

The above two cases are equivalent. But that's not the case for the
root directory. Unlike Unix, filesystem namespaces are implemented
directly on devices. For example, "//./C:" might resolve to a volume
device such as "\\Device\\HarddiskVolume2". With a trailing slash
added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is
the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to
the working directory on drive "C:". The system runtime library looks
for this path in a hidden environment variable named "=C:". The
Windows API never sets these hidden "=X:" drive variables. The C
runtime sets them, as does Python's os.chdir.

Some volume-management functions require a trailing slash or
backslash, such as GetVolumeInformationW [1].
GetVolumeNameForVolumeMountPointW [2] actually requires it to be a
trailing backslash. It will not accept a trailing forward slash such
as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name
(e.g. "\\\\?\\Volume{----}\\")
returned by the latter includes a trailing backslash, which must be
present in the target path in order for a mountpoint to function
properly as a directory, else it would resolve to the volume device
instead of the root directory.

[1] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumeinformationw
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvolumenameforvolumemountpointw

> If they're Windows developers, they ought to be aware that the Windows
> file system API allows / anywhere you can use \ and it is the
> common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere
that we can use backslash. It's usually allowed, but not always. For
the most part, the conditions where forward slash is not supported are
intentional.

Windows replaces forward slash with backslash in normal DOS paths and
normal device paths. But sometimes we have to use a special form of
device path that bypasses normalization. A path that isn't normalized
can only use backslash as the path separator. For example, the most
common case is that the process doesn't have long paths enabled. In
this case we're limited to MAX_PATH, which limits file paths to a
paltry 259 characters (sans the terminating null); the current
directory to 258 characters (sans a trailing backslash and null); and
the path of a new directory to 247 characters (subtract 12 from 259 to
leave space for an 8.3 filename). By skipping DOS normalization, we
can access a path with up to about 32,750 characters (i.e. 32,767 sans
the length of the device name in the final NT path under
"\\Device\\").

(Long normalized paths are available starting in Windows 10, but the
system policy that allows this is disabled by default, and even if
enabled, each application has to declare itself to be long-path aware
in its manifest. This is declared for python[w].exe in Python 3.6+.)
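For illustration, here is a sketch of converting an absolute DOS or UNC path to the backslash-only "\\?\" form that bypasses normalization. It's pure string manipulation with ntpath, so it can be tried on any platform. The function name to_extended_path is hypothetical, and the handling of edge cases is my own assumption rather than anything the Windows API provides.

```python
import ntpath

def to_extended_path(path):
    r"""Convert an absolute DOS or UNC path to the \\?\ form.

    Windows doesn't normalize this form, so the path is normalized
    here first: forward slashes are replaced and "." and ".." are
    resolved. Relative and drive-relative paths are rejected.
    """
    if path.startswith('\\\\?\\'):
        return path  # already in the extended form; leave it untouched
    norm = ntpath.normpath(path)
    if norm.startswith(('\\\\?\\', '\\\\.\\')):
        return norm  # some other device path; don't add a prefix
    drive, rest = ntpath.splitdrive(norm)
    if not drive or not rest.startswith('\\'):
        raise ValueError('an absolute path is required: %r' % (path,))
    if norm.startswith('\\\\'):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return '\\\\?\\UNC\\' + norm[2:]
    return '\\\\?\\' + norm
```

For example, to_extended_path('C:/Temp/spam.txt') returns r'\\?\C:\Temp\spam.txt'.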

A device path is an explicit reference to a user's local device
directory (in the object namespace), which shadows the global device
directory. In NT, this directory is aliased to a special "\\??\\"
prefix (backslash only). A local device directory is created for each
logon session (not terminal session) by the security system that runs
in terminal session 0 (i.e. the system services session). The
per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon
Session ID>". In the Windows API, it's accessible as "//?/" or "//./",
or with any mix of forward slashes or backslashes, but only the
all-backslash form is special-cased to bypass the normalization step.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/3SDFM2EKFO3UNTATS7KVBY2WOUTFMAF5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-07 Thread eryk sun
On 8/7/19, Steve Dower  wrote:
>
> * change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to
> append (or chain) an extra message when either of the filenames contains
> control characters (or change OSError to do it, or the default
> sys.excepthook)

On a related note for Windows, if the error is specifically
ERROR_INVALID_NAME, we could extend this to look for and warn about
the five reserved wildcard characters (asterisk, question mark, double
quote, less than, greater than), pipe, and colon. It's only sometimes
the case for colon because it's allowed in device names and used as
the name and type delimiter for stream names.
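A check along those lines could look something like the sketch below. The name reserved_chars_in is hypothetical, and it takes a single name component rather than a full path. Colon is reported separately since it's only sometimes invalid.

```python
# The five wildcard characters plus pipe.
RESERVED_CHARS = frozenset('*?"<>|')

def reserved_chars_in(name):
    """Return (reserved characters found, whether a colon is present).

    The colon is only a hint: it's valid in a drive or device name
    and as the delimiter in a stream name.
    """
    found = sorted(set(name) & RESERVED_CHARS)
    return found, ':' in name
```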

Kernel object names don't reserve wildcard characters, pipe, and
colon. So I wouldn't want anything but the control-character warning
if it's say ERROR_FILE_NOT_FOUND. An example would be
SharedMemory(name='Global\test'), or a similar error for registry key
and value names such as OpenKey(hkey, 'spam\test'), that is if winreg
were updated to include the name in the exception. Note that forward
slash is just a name character in these cases, not a path separator,
so we have to use backslash, even if just via replace('/', '\\').
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/UFMVFL4QDUXLZFBWVW4YLAKPHQ6LTPDK/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-06 Thread eryk sun
On 8/5/19, Steve Dower  wrote:
>
> though I do also see many people bitten by FileNotFoundError
> because of a '\n' in their filename.

Thankfully the common filesystems used in Windows reserve ASCII
control characters in filenames (except not in stream names or
named-pipe names). So a mistaken string literal usually fails with a
more obvious ERROR_INVALID_NAME or C EINVAL instead of a mysterious
ERROR_FILE_NOT_FOUND or C ENOENT.
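To show what such a mistaken literal looks like from the program's side, here is a small sketch that lists the ASCII control characters in a path string. The helper name control_chars_in is made up.

```python
def control_chars_in(path):
    """Return (index, character) pairs for ASCII control characters."""
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) < 32]
```

With the non-raw literal 'C:\temp\new' it reports a tab at index 2 and a newline at index 6. The raw literal r'C:\temp\new' has neither.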
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/HTOH6MOHYIDD2UX7YSM2ZVY4BP32ATYL/


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-23 Thread eryk sun
On 3/23/19, Cameron Simpson  wrote:
>
> Also, the common examples are attackers who are not the user making the
> tempfile, in which case the _default_ mktemp is sort of secure with the
> above because it gets made in /tmp which on a modern POSIX system
> prevents _other_ users from removing/renaming a file. (And Eryk I think
> described the Windows situation which is similarly protected).

Using NamedTemporaryFile(delete=False) or mkstemp() ensures that the
file is created and opened securely. In contrast, the filename from
mktemp() might be used naively in POSIX, such as open(path, "w"). This
file might grant read access to everyone depending on the file-mode
creation mask (umask). Also, since it neglects to use exclusive mode
("x"), it might open an existing file that grants read-write
permission to the world, or maybe it's a symlink.

By default, even naive use of the mktemp() name in Windows remains
secure, since every user has a separate temp directory that's only
accessible by privileged users such as SYSTEM, Administrators, and
Backup Operators (with SeBackupPrivilege and SeRestorePrivilege
enabled). The primary issue with a short name is an accidental name
collision with another program that's not as careful as Python's
tempfile. Using a longer name decreases the chance of this to
practically nothing.


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-21 Thread eryk sun
On 3/20/19, Greg Ewing  wrote:
> Antoine Pitrou wrote:
>
>> How is it more secure than using mktemp()?
>
> It's not, but it solves the problem someone suggested of another
> program not being able to access and/or delete the file.

NamedTemporaryFile(delete=False) is more secure than naive use of
mktemp(). The file is created exclusively (O_EXCL). Another standard
user can't overwrite it. Nor can another standard user delete it if
it's created in the default temp directory (e.g. POSIX "/tmp" has the
sticky bit set). mkstemp() is similar but lacks the convenience and
reliable resource management of a Python file wrapper.
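For example, a minimal sketch of the NamedTemporaryFile(delete=False) pattern. The helper name write_temp is made up, and the caller is responsible for removing the file.

```python
import tempfile

def write_temp(data, suffix=''):
    """Write data to a new, exclusively created temp file.

    Returns the path. The file is closed but not deleted on return.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name
```

The returned path can be passed to another program and then removed with os.remove() once it's no longer needed.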

There's still the problem of accidental name collisions with other
processes that can access the file, i.e. processes running as the same
user or, in POSIX, processes running as the super user. I saw a
suggestion in this thread to increase the length of the random
sequence from 8 characters up to 22 characters in order to make this
problem extremely improbable.
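As a back-of-the-envelope check of that claim, the birthday approximation gives the chance that any two of n random names collide. The alphabet size of 37 used below (lowercase letters, digits, and underscore) is my assumption about what tempfile draws from, and collision_probability is a made-up name.

```python
import math

def collision_probability(alphabet_size, length, names_in_use):
    """Approximate chance that any two of the random names collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)), where
    N = alphabet_size ** length is the number of possible names.
    """
    space = alphabet_size ** length
    exponent = names_in_use * (names_in_use - 1) / (2 * space)
    return -math.expm1(-exponent)

# Assumed alphabet: 26 letters + 10 digits + underscore = 37.
p8 = collision_probability(37, 8, 1000)
p22 = collision_probability(37, 22, 1000)
```

With 1000 names in use at once, the 8-character case is already small, and the 22-character case is smaller by many orders of magnitude.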


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-20 Thread eryk sun
On 3/20/19, Anders Munch  wrote:
>
> You are right, I must have mentally reversed the polarity of the delete
> argument.  And I didn't realise that the access right on a file had the
> power to prevent itself from being removed from the folder that it's in.  I
> thought the access flags were a property of the file itself and not the
> directory entry. Not sure how that works.

In POSIX, it's secure so long as we use a directory that doesn't grant
write access to other users, or one that has the sticky bit set such
as "/tmp". A directory that has the sticky bit set allows only root
and the owner of the file to unlink the file.

In Windows, a user's default %TEMP% directory is only accessible by
the user, SYSTEM, and Administrators. The only way others can delete a
file there is if the file security is modified to allow it (possible
for individual files, unlike POSIX). This works even with no access to
the temp directory itself because users have SeChangeNotifyPrivilege,
which bypasses traverse (execute) access checks.
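For the POSIX side, checking whether a directory has the sticky bit set is a one-liner with the stat module. This is just a sketch with a made-up helper name. In Windows it always returns False since the bit isn't reported there.

```python
import os
import stat

def has_sticky_bit(directory):
    """Return True if the directory's mode has the sticky bit set."""
    return bool(os.stat(directory).st_mode & stat.S_ISVTX)
```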


Re: [Python-Dev] Remove tempfile.mktemp()

2019-03-19 Thread eryk sun
On 3/19/19, Victor Stinner  wrote:
>
> When I write tests, I don't really care of security, but
> NamedTemporaryFile caused me many troubles on Windows: you cannot
> delete a file if it's still open in another program. It's way more
> convenient to use tempfile.mktemp().

Opening the file again for normal access is problematic.
NamedTemporaryFile opens it with delete access, but Python's open()
function doesn't support delete-access sharing unless an opener is
used that calls CreateFileW.

NamedTemporaryFile does open files with delete-access sharing, so any
process can delete the file if it's allowed by the file's security and
attributes. You may be thinking of unlinking. In Windows versions
prior to 10, that's always a two-step process. A file with its delete
disposition set doesn't get unlinked until all references for all open
instances are closed.

In Windows 10 (release 1709+), we have the option of using
SetFileInformationByHandle: FileDispositionInfoEx (21) with
FILE_DISPOSITION_FLAG_POSIX_SEMANTICS (2) and
FILE_DISPOSITION_FLAG_DELETE (1). The online documentation hasn't been
updated to include this, but it's supported in the headers for
_WIN32_WINNT_WIN10_RS1 and later. This operation unlinks the file as
soon as we close our handle, even if it has existing references. This
is explained in the remarks for the underlying NT system call [1]. In
particular this resolves the race condition related to handles opened
by anti-malware programs.

It may be worth adding support for deleting files by handle that tries
FileDispositionInfoEx in 1709+. This will work in about half of all
Windows systems. (About 40% still run Windows 7.) It's not a panacea
for Windows file-deleting woes. We still need to be able to open the
file with delete access, which requires existing opens to share delete
access.

[1]: 
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntddk/ns-ntddk-_file_disposition_information_ex


Re: [Python-Dev] Adding test.support.safe_rmpath()

2019-02-19 Thread eryk sun
On 2/16/19, Richard Levasseur  wrote:
>
> First: The tempfile module is a poor fit for testing (don't get me wrong,
> it works, but it's not *nice for use in tests*). This is because:
> 1. Using it as a context manager is distracting. The indentation signifies
> a conceptual scope the reader needs to be aware of, but in a test context,
> it's usually not useful. At worst, it covers most of the test. At best, it's
> constrained to a block at the start.
> 2. tempfile defaults to binary mode instead of text; just another thing to
> bite you.
> 3. On windows, you can't reopen the file, so for cross-platform stuff, you
> can't even use it for this case.

Python opens files with at least read and write sharing in Windows, so
typically there's no problem with opening a file multiple times. The
problem is with deleting and renaming open files. Typically delete
access is not shared, and, even if it is, a normal delete just sets a
disposition. A deleted file is unlinked only after all handles have
been closed. Similarly, replacing an open file via os.replace will
fail because it can't be unlinked.

In Windows 10 we can delete and rename files with POSIX-like
semantics. To do this, open a handle with delete access and call
SetFileInformationByHandle to set the FileDispositionInfoEx or
FileRenameInfoEx information. Thus far this is supported by NTFS, and
I think it's only NTFS. It's still not completely like POSIX, since it
requires delete-access sharing. But it does provide immediate
unlinking, which avoids the race condition when trying to remove a
directory that has watched files. Programs that have open files that
have been unlinked can continue to access them normally.


Re: [Python-Dev] ctypes: is it intentional that id() is the only way to get the address of an object?

2019-01-19 Thread eryk sun
On 1/18/19, Steven D'Aprano  wrote:
> On Thu, Jan 17, 2019 at 07:50:51AM -0600, eryk sun wrote:
>>
>> It's kind of dangerous to pass an object to C without an increment of
>> its reference count.
>
> "Kind of dangerous?" How dangerous?

I take that back. Dangerous is too strong of a word. It can be managed
if we're careful to avoid expressions like c_function(id(f())). Using
py_object simply avoids that problem.

Bear with me while I make a few more comments about py_object, even
though it's straying off topic.

For a type "O" argument (i.e. py_object is in the function's
`argtypes`), we might be able to borrow the reference from the
argument tuple. As implemented, however, the argument actually keeps
its own reference. For example, we can observe this by calling the
from_param method:

>>> b = bytearray(b'spam')
>>> arg = ctypes.py_object.from_param(b)
>>> print(arg)

>>> print(arg._obj)
bytearray(b'spam')

This is due to the type "O" setfunc, which needs to keep a reference
to the object when setting the value of a py_object instance. The
reference is stored as the _objects attribute. (For non-simple pointer
and aggregate types, _objects is instead a dict keyed by the index as
a hexadecimal string.)

(The getfunc and setfunc of a simple ctypes object are called to get
and set the value, which also includes cases in which we don't have an
actual py_object instance, such as function call arguments; pointer
and array indexes; and struct and union fields. These functions are
defined in Modules/_ctypes/cfield.c.)

IMO, a downside of py_object is that it's a simple type, so the
getfunc gets called automatically when getting fields or indexes. This
is annoying for py_object since a NULL value raises ValueError.
Returning None in this case isn't possible, in contrast to other
simple pointer types. We can work around this by subclassing
py_object. For example:

>>> a1 = (ctypes.py_object * 1)()
>>> a1[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: PyObject is NULL

py_object = type('py_object', (ctypes.py_object,), {})

>>> a2 = (py_object * 1)()
>>> a2[0]


Then, like all ctypes pointers, a false boolean value means it's NULL:

>>> bool(a2[0])
False
>>> a2[0] = b'spam'
>>> bool(a2[0])
True

py_object doesn't help if a library holds onto the pointer and tries
to use it later on. For example, with Python's C API there are
functions that 'steal' a reference (with the assumption that it's a
newly created object, in which case it's more like 'claiming'), such
as PyTuple_SetItem. In this case, we need to increment the reference
count via Py_IncRef.
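A sketch of that, assuming CPython: declare Py_IncRef and Py_DecRef from ctypes.pythonapi with py_object as the argument type, and add a reference before handing the object to code that will keep it. The hand_off and take_back names are hypothetical.

```python
import ctypes
import sys

# Declare prototypes so ctypes passes the object pointer itself.
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.py_object]
Py_IncRef.restype = None

Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.py_object]
Py_DecRef.restype = None

def hand_off(obj):
    """Add a reference on behalf of foreign code that will keep `obj`."""
    Py_IncRef(obj)
    return obj

def take_back(obj):
    """Drop the reference that hand_off() added."""
    Py_DecRef(obj)
```

Calling hand_off(b) on a bytearray raises sys.getrefcount(b) by one, and take_back(b) restores it.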

py_object can be returned from a callback without leaking a reference,
assuming the library manages the new reference. In contrast, other
types that need memory support have to leak a reference (e.g.
c_wchar_p, i.e. type "Z", needs a capsule object for the wchar_t
buffer). In case of a leak, we get warned with RuntimeWarning('memory
leak in callback function.').

> If I am reading this correctly, I think you are saying that using id()
> in this way is never(?) correct.

Yes, it's incorrect, but I've been guilty of using id() like this,
too, because it's convenient. Perhaps we could provide a function
that's explicitly specified to return the address, if implemented.
Maybe call it sys.getaddress()?

In my first reply, I provided two alternatives that use ctypes to
return the address instead of id(). So there's that as well. The fine
print is that ctypes is optional in the standard library. Platforms
and implementations don't have to support it.


Re: [Python-Dev] ctypes: is it intentional that id() is the only way to get the address of an object?

2019-01-17 Thread eryk sun
On 1/17/19, Steven D'Aprano  wrote:
>
> I understand that the only way to pass the address of an object to
> ctypes is to use that id. Is that intentional?

It's kind of dangerous to pass an object to C without an increment of
its reference count. The proper way is to use a simple pointer of type
"O" (object), which is already created for you as the "py_object"
type.

>>> ctypes.py_object._type_
'O'
>>> ctypes.py_object.__bases__
(<class '_ctypes._SimpleCData'>,)

It keeps a reference in the readonly _objects attribute. For example:

>>> b = bytearray(b'spam')
>>> sys.getrefcount(b)
2
>>> cb = ctypes.py_object(b)
>>> sys.getrefcount(b)
3
>>> cb._objects
bytearray(b'spam')
>>> del cb
>>> sys.getrefcount(b)
2

If you need the address without relying on id(), cast to a void pointer:

>>> ctypes.POINTER(ctypes.c_void_p)(cb)[0] == id(b)
True

Or instantiate a c_void_p from the py_object as a buffer:

>>> ctypes.c_void_p.from_buffer(cb).value == id(b)
True

Note that ctypes.cast() doesn't work in this case. It's implemented as
an FFI function that takes the object address as a void pointer. The
from_param method of c_void_p doesn't support py_object:

>>> ctypes.c_void_p.from_param(cb)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: wrong type
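Wrapped up as a function, the from_buffer approach might look like the sketch below. It's CPython specific, and address_of is a made-up name.

```python
import ctypes

def address_of(obj):
    """Return the address of `obj` without calling id() (CPython only)."""
    holder = ctypes.py_object(obj)  # keeps a reference in holder._objects
    # View the py_object's pointer-sized buffer as a void pointer.
    return ctypes.c_void_p.from_buffer(holder).value
```

The py_object instance holds a reference only for the duration of the call, so the returned address is valid only while the caller keeps the object alive.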


Re: [Python-Dev] Questions about signal handling.

2018-09-24 Thread eryk sun
On Fri, Sep 21, 2018 at 6:10 PM, Victor Stinner  wrote:
>
> Moreover, you can get the signal while you don't hold the GIL :-)

Note that, in Windows, SIGINT and SIGBREAK are implemented in the C
runtime and linked to the corresponding console control events in a
console application, such as python.exe. Console control events are
delivered on a new thread (i.e. no Python thread state) that starts at
CtrlRoutine in kernelbase.dll. The session server (csrss.exe) creates
this thread remotely upon request from the console host process
(conhost.exe).


Re: [Python-Dev] Tests failing on Windows with TESTFN

2018-07-29 Thread eryk sun
On Sun, Jul 29, 2018 at 2:21 PM, Jeremy Kloth  wrote:
>
>  try:
>  os.rename(new_file.name, self._path)
>  except FileExistsError:
> -os.remove(self._path)
> +temp_name = _create_temporary_name(self._path)
> +os.rename(self._path, temp_name)
>  os.rename(new_file.name, self._path)
> +os.remove(temp_name)

This should call os.replace to allow the file system to replace the
existing file.
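
A minimal sketch of that pattern, assuming the new content is first written
to a temporary file in the same directory. The helper name write_replacing
is hypothetical and is not the code from the patch under review.

```python
import os
import tempfile

def write_replacing(path, data):
    """Write data to a temporary file, then move it over path.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        # Unlike os.rename on Windows, os.replace overwrites an existing
        # destination file instead of raising FileExistsError.
        os.replace(temp_name, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(temp_name):
            os.remove(temp_name)
        raise
```

This can still raise PermissionError on Windows if the destination is open
elsewhere, so it removes the remove-then-rename race but not every failure.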


Re: [Python-Dev] Tests failing on Windows with TESTFN

2018-07-29 Thread eryk sun
On Sun, Jul 29, 2018 at 12:35 PM, Steve Dower  wrote:
>
> One additional thing that may help (if support.unlink doesn't already do it)
> is to rename the file before deleting it. Renames are always possible even
> with open handles, and then you can create a new file at the original name.

Renaming open files typically fails with a sharing violation (32).
Most programs open files with read and write sharing but not delete
sharing. This applies to Python, except temporary files (i.e.
os.O_TEMPORARY) do share delete access. Renaming a file is effectively
adding a new link and deleting the old link, so it requires opening
the file with delete access. Also, renaming a directory that has open
files in the tree fails with access denied (5).
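
To tell a sharing violation apart from other permission errors, one could
check the winerror attribute, roughly as follows. The helper try_rename is
made up for this example; on non-Windows platforms winerror does not exist,
hence the getattr.

```python
import os
import tempfile

ERROR_SHARING_VIOLATION = 32

def try_rename(src, dst):
    """Return True if renamed, False if blocked by a sharing violation.

    Hypothetical helper; any other error is re-raised unchanged.
    """
    try:
        os.rename(src, dst)
        return True
    except PermissionError as e:
        if getattr(e, 'winerror', None) == ERROR_SHARING_VIOLATION:
            return False
        raise

# Demo with a closed file, which can be renamed on any platform.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'old')
    dst = os.path.join(d, 'new')
    open(src, 'w').close()
    renamed = try_rename(src, dst)
    print(renamed, os.path.exists(dst))
```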


Re: [Python-Dev] Tests failing on Windows with TESTFN

2018-07-29 Thread eryk sun
On Sun, Jul 29, 2018 at 9:13 AM, Tim Golden  wrote:
>
> For an example:
>
> http://tjg.org.uk/test.log
>
> Thinkpad T420, 4Gb, i5, SSD
>
> Recently rebuilt and reinstalled: Win10, VS2017, TortoiseGit, standard
> Windows Antimalware, usual developer tools. That particular run was done
> with the laptop unattended (ie nothing else going on at the front end).
> But the problem is certainly not specific to this laptop.

On my last run I had one test directory that wasn't removed properly,
but nothing like the flood of EACCES and ERROR_ACCESS_DENIED errors you
have in that log. Then again, I had Defender disabled by policy. I'll
enable it and add exceptions for my source and build directories, and
see how it goes.

It would be nice if OSError instances always captured the last Windows
error and NT status values when instantiated. We have no guarantees
that these values are valid, but in many contexts they are. In the
case of a test log, it would certainly help to clarify errors without
having to individually investigate each one.

For example, trying to open a directory as a file is a common error,
but all Python tells us on Windows is that it failed with EACCES. In
this case the last Windows error is ERROR_ACCESS_DENIED, which doesn't
help, but the last NT status code is STATUS_FILE_IS_A_DIRECTORY
(0xc00000ba).
Here's a file opener that adds last_winerror and last_ntstatus values.

import ctypes
import os

# Windows only: ntdll exposes the last NT status, kernel32 the last error.
ntdll = ctypes.WinDLL('ntdll')
kernel32 = ctypes.WinDLL('kernel32')

def nt_opener(path, flags):
    try:
        return os.open(path, flags)
    except OSError as e:
        last_ntstatus = ntdll.RtlGetLastNtStatus()
        last_winerror = kernel32.GetLastError()
        e.last_ntstatus = last_ntstatus & 2**32 - 1
        e.last_winerror = (last_winerror if e.winerror is None
                           else e.winerror)
        if e.errno is not None or e.winerror is not None:
            # hack the last error/status into the error message
            e.strerror = '[Last NtStatus {:#08x}] {}'.format(
                e.last_ntstatus, e.strerror or '')
            if e.winerror is None:
                e.strerror = '[Last WinError {}] {}'.format(
                    e.last_winerror, e.strerror or '')
        raise e from None

Opening a directory as a file:

>>> open('C:/Windows', opener=nt_opener)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in nt_opener
  File "<stdin>", line 3, in nt_opener
PermissionError: [Errno 13] [Last WinError 5] [Last NtStatus 0xc00000ba]
Permission denied: 'C:/Windows'


Re: [Python-Dev] Tests failing on Windows with TESTFN

2018-07-28 Thread eryk sun
On Sat, Jul 28, 2018 at 9:17 PM, Jeremy Kloth  wrote:
>
> *PLEASE*, don't use tempfile to create files/directories in tests.  It
> is unfriendly to (Windows) buildbots.  The current approach of
> directory-per-process ensures no test turds are left behind, whereas
> the tempfile solution slowly fills up my buildbot.  Windows doesn't
> natively clean out the temp directory.

FYI, Windows 10 storage sense (under system->storage) can be
configured to delete temporary files on a schedule. Of course that
doesn't help with older systems.


Re: [Python-Dev] Tests failing on Windows with TESTFN

2018-07-28 Thread eryk sun
On Sat, Jul 28, 2018 at 5:20 PM, Tim Golden  wrote:
>
> I've got a mixture of Permission (winerror 13) & Access errors (winerror 5)

EACCES (13) is a CRT errno value. Python raises PermissionError for
EACCES and EPERM (1, not used). It also does the reverse mapping for
WinAPI calls, so PermissionError is raised either way. About 25 WinAPI
error codes map to EACCES. Commonly it's due to either
ERROR_ACCESS_DENIED (5) or ERROR_SHARING_VIOLATION (32).

open() uses read-write sharing but not delete sharing. In this case
trying to either delete an already open file or open a file that's
already open with delete access (e.g. an O_TEMPORARY open) both fail
with a sharing violation.

An access-denied error could be due to a range of causes. Over 20
NTAPI status codes map to ERROR_ACCESS_DENIED. Commonly for a file
it's due to one of the following status codes:

STATUS_ACCESS_DENIED (0xc0000022)
The file security doesn't grant the requested access
to the caller.

STATUS_DELETE_PENDING (0xc0000056)
The file's delete disposition is set, i.e. it's flagged to be
deleted when the last handle is closed. Opening a new
handle is disallowed for any access.

STATUS_FILE_IS_A_DIRECTORY (0xc00000ba)
Except when using backup semantics, CreateFile calls
NtCreateFile with the flag FILE_NON_DIRECTORY_FILE,
so only non-directory files/devices can be opened.

STATUS_CANNOT_DELETE (0xc0000121)
The file is either readonly or memory mapped.
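
For reading a test log, the four status codes named above could go into a
small lookup table, written here with their full 32-bit values. This is only
a sketch; describe_ntstatus and the table name are made up, and the table
covers just these four codes, not every status that maps to
ERROR_ACCESS_DENIED.

```python
# The four status codes named above, as full 32-bit values.
ACCESS_DENIED_STATUSES = {
    0xC0000022: 'STATUS_ACCESS_DENIED',
    0xC0000056: 'STATUS_DELETE_PENDING',
    0xC00000BA: 'STATUS_FILE_IS_A_DIRECTORY',
    0xC0000121: 'STATUS_CANNOT_DELETE',
}

def describe_ntstatus(status):
    """Return a readable name for an NT status value (hypothetical helper)."""
    status &= 0xFFFFFFFF  # normalize a signed 32-bit value
    name = ACCESS_DENIED_STATUSES.get(status, 'unknown status')
    return '{} ({:#010x})'.format(name, status)

print(describe_ntstatus(0xC00000BA))
```

The mask matters because a status fetched through ctypes with the default
int return type comes back as a negative number.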


Re: [Python-Dev] [Windows] how to prevent the wrong version of zlib1.dll to be used by lib-dynload modules

2018-07-24 Thread eryk sun
On Mon, Jul 23, 2018 at 2:31 PM, Eric Le Lay  wrote:
>
> I encountered a problem with the Windows packaging of gPodder[1]
> using msys2:

Are you using regular Windows Python with msys2, or their custom port?

I installed msys2 and used pacman to install Python 3.6. The msys2
environment names libraries with an "msys-" prefix in the "/usr/bin"
directory, such as msys-python3.6m.dll, msys-readline7.dll, and
msys-z.dll (zlib). This is also the application directory of the msys2
build of Python (i.e. "/usr/bin/python.exe"), so it's the first
directory in the default DLL search path (ahead of system directories
and PATH). Unlike Windows Python, msys2 Python does not use the
alternate search path that replaces the application directory with the
DLL directory in the search path.

A way to implement this that allows multiple versions of a DLL to be
loaded in the same process is to use a side-by-side (SxS) assembly that
lists the DLL file in its ".manifest" file. Add the assembly as a
dependency in the extension module's #2 manifest (typically embedded,
but it can also be a separate file). The system looks for a subdirectory
named after the assembly in the module directory. In Windows 7+ you can
also add a probing path in a ".config" file [1] that extends the SxS
search path with up to 9 relative paths, which can be up to two levels
above the module directory (i.e. "..\..").

[1]: 
https://docs.microsoft.com/en-us/windows/desktop/SbsCs/application-configuration-files


Re: [Python-Dev] subprocess not escaping "^" on Windows

2018-01-09 Thread eryk sun
On Mon, Jan 8, 2018 at 9:26 PM, Steve Dower <steve.do...@python.org> wrote:
> On 09Jan2018 0744, eryk sun wrote:
>>
>> It's common to discourage using `shell=True` because it's considered
>> insecure. One of the reasons to use CMD in Windows is that it tries
>> ShellExecuteEx if CreateProcess fails. ShellExecuteEx supports "App
>> Paths" commands, file actions (open, edit, print), UAC elevation (via
>> "runas" or if requested by the manifest), protocols (including
>> "shell:"), and opening folders in Explorer. It isn't a scripting
>> language, however, so it doesn't pose the same risk as using CMD.
>> Calling ShellExecuteEx could be integrated in subprocess as a new
>> Popen parameter, such as `winshell` or `shellex`.
>
> This can also be used directly as os.startfile, the only downside being that
> you can't wait for the process to complete (but that's due to the underlying
> API, which may not end up starting a process but rather sending a message to
> an existing long-running one such as explorer.exe). I'd certainly recommend
> it for actions like "open this file with its default editor" or "browse to
> this web page with the default browser".

Yes, I forgot to mention that os.startfile can work sometimes. But
often one needs to pass command-line parameters. Also, os.startfile
can't set a different working directory, nShow SW_* window state,
or flags such as SEE_MASK_NO_CONSOLE (prevent allocating a new
console). Rather than extend os.startfile, it seems more useful in
general to wrap ShellExecuteEx in _winapi and extend subprocess. Then
os.startfile can be reimplemented in terms of subprocess.Popen, like
os.popen.


Re: [Python-Dev] subprocess not escaping "^" on Windows

2018-01-08 Thread eryk sun
On Sun, Jan 7, 2018 at 6:48 PM, Christian Tismer  wrote:
> That is true.
> list2cmdline escapes partially, but on NT and Windows10, the "^" must
> also be escaped, but is not. The "|" pipe symbol must also be escaped
> by "^", as many others as well.
>
> The effect was that passing a rexexp as parameter to a windows program
> gave me strange effects, and I recognized that "^" was missing.
>
> So I was asking for a coherent solution:
> Escape things completely or omit "shell=True".
>
> Yes, there is a list of chars to escape, and it is Windows version
> dependent. I can provide it if it makes sense.

subprocess.list2cmdline is meant to help support cross-platform code,
since Windows uses a command-line instead of an argv array. The
command-line parsing rules used by VC++ (and CommandLineToArgvW) are
the most common in practice. list2cmdline is intended for this set of
applications. Otherwise pass args as a string instead of a list.
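
To illustrate what list2cmdline does and doesn't do, here is a small
example. The argument values are made up; the point is that it quotes
whitespace and escapes double quotes for the VC++ rules, while CMD
metacharacters such as "^" and "|" pass through untouched.

```python
import subprocess

args = ['prog', 'a b', 'c"d', '^x|y']
# Spaces trigger double quotes, a literal quote gets a backslash,
# and "^" and "|" are left alone because they only matter to CMD.
print(subprocess.list2cmdline(args))
```

This prints: prog "a b" c\"d ^x|y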

In CMD we can quote part of a command line in double quotes to escape
special characters. The quotes are preserved in the application
command line. This can get complicated when we need to preserve
literal quotes in the command line of an application that uses VC++
backslash escaping. CMD doesn't recognize backslash as an escape
character, which gives rise to a quoting conflict between CMD and the
application. Some applications support translating single quotes to
double quotes in this case (e.g. schtasks.exe). Single quotes
generally aren't used in CMD, except in a `for /f` loop, but this can
be forced to use backquotes instead via `usebackq`.

Quoting doesn't escape the percent character that's used for
environment variables. In batch scripts percent can be escaped by
doubling it, but not in /c commands. Some applications can translate a
substitute character in this case, such as "~" (e.g. setx.exe).
Otherwise, we can usually disrupt matching an existing variable by
adding a "^" character after the first percent character. The "^"
escape character gets consumed later on in parsing -- as long as it's
not quoted (see the previous paragraph for complications).
Nonetheless, "^" is a valid name character, so there's still a
possibility of matching an environment variable (perhaps a malicious
one).  For example:

C:\>python -c "print('"%^"time%')"
%time%

C:\>set "^"time=spam"
C:\>python -c "print('"%^"time%')"
spam

Anyway, we're supposed to pass args as a string when using the shell
in POSIX, so we may as well stay consistent with this in Windows.
Practically no one wants the resulting behavior when passing a shell
command as a list in POSIX. For example:

>>> subprocess.call(['echo \\$0=$0 \\$1=$1', 'spam', 'eggs'], shell=True)
$0=spam $1=eggs

It's common to discourage using `shell=True` because it's considered
insecure. One of the reasons to use CMD in Windows is that it tries
ShellExecuteEx if CreateProcess fails. ShellExecuteEx supports "App
Paths" commands, file actions (open, edit, print), UAC elevation (via
"runas" or if requested by the manifest), protocols (including
"shell:"), and opening folders in Explorer. It isn't a scripting
language, however, so it doesn't pose the same risk as using CMD.
Calling ShellExecuteEx could be integrated in subprocess as a new
Popen parameter, such as `winshell` or `shellex`.


Re: [Python-Dev] ctypes, memory mapped files and context manager

2017-01-08 Thread eryk sun
On Sun, Jan 8, 2017 at 8:25 AM, Armin Rigo  wrote:
>
> c_raw = ctypes.PYFUNCTYPE(ctypes.c_void_p, ctypes.c_void_p)(lambda p: p)

Use ctypes.addressof.

> addr = c_raw(ctypes.pointer(T.from_buffer(m)))
> b = ctypes.cast(addr, ctypes.POINTER(T)).contents

ctypes.cast uses an FFI call. In this case you can more simply use from_address:

b = T.from_address(ctypes.addressof(T.from_buffer(m)))

There's no supporting connection between b and m. If m was allocated
from a heap/pool/freelist, as opposed to a separate mmap
(VirtualAlloc) call, then you won't necessarily get a segfault (access
violation) if b is used after m has been deallocated or internally
realloc'd. It can lead to corrupt data and difficult to diagnose
errors. You're lucky if it segfaults.
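
The difference can be seen by peeking at the private _objects attribute,
which is an implementation detail and used here only for demonstration. In
this sketch a bytearray stands in for the mmap, and T is an arbitrary
example type.

```python
import ctypes

T = ctypes.c_uint32
m = bytearray(ctypes.sizeof(T))

a = T.from_buffer(m)                      # keeps a memoryview of m
b = T.from_address(ctypes.addressof(a))   # bare address, no reference

a.value = 0xDEADBEEF
print(hex(b.value))                       # same memory, so same value
print(a._objects is not None, b._objects is None)
```

Both objects alias the same bytes, but only a keeps m alive and pinned;
b is valid only as long as something else does.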


Re: [Python-Dev] ctypes, memory mapped files and context manager

2017-01-05 Thread eryk sun
On Thu, Jan 5, 2017 at 11:28 PM, Hans-Peter Jansen  wrote:
> Leaves the question, how stable this "interface" is?
> Accessing _objects here belongs to voodoo programming practices of course, but
> the magic is locally limited to just two lines of code, which is acceptable in
> order to get this context manager working without messing with the rest of the
> code.

My intent was not to suggest that anyone directly use the _objects
value / dict in production code. It's a private implementation
detail. I was demonstrating the problem of simply releasing the buffer
and the large number of checks that would be required if b_ptr is
cleared. It would be simpler for a release() method to allocate new
memory for the object and set the b_needsfree flag, but this may hide
bugs. Operating on a released object should raise an exception.


Re: [Python-Dev] ctypes, memory mapped files and context manager

2017-01-05 Thread eryk sun
On Thu, Jan 5, 2017 at 2:37 AM, Nick Coghlan  wrote:
> On 5 January 2017 at 10:28, Hans-Peter Jansen  wrote:
>> In order to get this working properly, the ctypes mapping needs a method to
>> free the mapping actively. E.g.:
>>
>> @contextmanager
>> def map_struct(m, n):
>> m.resize(n * mmap.PAGESIZE)
>> yield T.from_buffer(m)
>>  T.unmap_buffer(m)
>>
>> Other attempts with weakref and the like do not work due to the nature of the
>> ctypes types.
>
> I don't know ctypes well enough myself to comment on the idea of
> offering fully deterministic cleanup, but the closest you could get to
> that without requiring a change to ctypes is to have the context
> manager introduce a layer of indirection:

I think that's the best you can do with the current state of ctypes.
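
A layer of indirection along those lines might look roughly like this. It's
only a sketch under my own assumptions: StructProxy and Point are made-up
names, the proxy forwards plain attribute access only, and it relies on
CPython's reference counting to free the ctypes object as soon as the proxy
drops it.

```python
import ctypes
import mmap
from contextlib import contextmanager

class Point(ctypes.Structure):
    # Example structure, chosen only for illustration.
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

class StructProxy:
    """Forward attribute access to a ctypes object until released."""

    def __init__(self, struct):
        object.__setattr__(self, '_struct', struct)

    def _target(self):
        struct = object.__getattribute__(self, '_struct')
        if struct is None:
            raise ValueError('operation on released mapping')
        return struct

    def __getattr__(self, name):
        return getattr(self._target(), name)

    def __setattr__(self, name, value):
        setattr(self._target(), name, value)

    def release(self):
        # Dropping the only reference lets ctypes free its memoryview.
        object.__setattr__(self, '_struct', None)

@contextmanager
def map_struct(m, struct_type):
    proxy = StructProxy(struct_type.from_buffer(m))
    try:
        yield proxy
    finally:
        proxy.release()

m = mmap.mmap(-1, mmap.PAGESIZE)
with map_struct(m, Point) as p:
    p.x = 42
    print(p.x)
m.close()  # succeeds because no exported buffer remains
```

Using p after the with block raises ValueError instead of touching freed
memory. Anything that grabs the real ctypes object from inside the block
defeats the scheme.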

from_buffer was made safer in Python 3 by ensuring it keeps a
memoryview reference in the _objects attribute (i.e.
CDataObject.b_objects). Hans-Peter's problem is a consequence of this
reference. Simply calling release() on the underlying memoryview is
unsafe. For example:

>>> b = bytearray(2**20)
>>> a = ctypes.c_char.from_buffer(b)
>>> a._objects
<memory at 0x...>
>>> a._objects.release()
>>> del b
>>> a.value
Segmentation fault (core dumped)

A release() method on ctypes objects could release the memoryview and
also clear the CDataObject b_ptr field. In this case, any function
that accesses b_ptr would have to be modified to raise a ValueError
for a NULL value. Currently ctypes assumes b_ptr is valid, so this
would require adding a lot of checks.

On a related note, ctypes objects aren't tracking the number of
exported views like they should. resize() should raise a BufferError
in the following example:

>>> b = (ctypes.c_char * (2**20))(255)
>>> m = memoryview(b).cast('B')
>>> m[0]
255
>>> ctypes.resize(b, 2**22)
>>> m[0]
Segmentation fault (core dumped)


Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

2016-09-05 Thread eryk sun
On Mon, Sep 5, 2016 at 9:45 PM, Steve Dower  wrote:
>
> So it works, though the behaviour is a little strange when you do it from
> the interactive prompt:
>
> >>> sys.stdin.buffer.raw.read(1)
> ɒprint('hi')
> b'\xc9'
> >>> hi
> >>> sys.stdin.buffer.raw.read(1)
> b'\x92'
> >>>
>
> What happens here is the raw.read(1) rounds one byte up to one character,
> reads the turned alpha, returns a single byte of the two byte encoded form
> and caches the second byte. Then interactive mode reads from stdin and gets
> the rest of the characters, starting from the print() and executes that.
> Finally the next call to raw.read(1) returns the cached second byte of the
> turned alpha.
>
> This is basically only a problem because the readline implementation is
> totally separate from the stdin object and doesn't know about the small
> cache (and for now, I think it's going to stay that way - merging readline
> and stdin would be great, but is a fairly significant task that won't make
> 3.6 at this stage).

It needs to read a minimum of 2 codes in case the first character is a
lead surrogate. It can use a length 2 WCHAR buffer and remember how
many bytes have been written (for the general case -- not specifically
for this case).

Example failure using your 3rd patch:

>>> _ = write_console_input("\U00010000print('hi')\r\n");\
... raw_read(1)
print('hi')
b'\xef'
>>>   File "<stdin>", line 1
�print('hi')
 ^
SyntaxError: invalid character in identifier
>>> raw_read(1)
b'\xbf'
>>> raw_read(1)
b'\xbd'

The raw read captures the first surrogate code, and transcodes it as
the replacement character b'\xef\xbf\xbd' (U+FFFD). Then PyOS_Readline
captures the 2nd surrogate and decodes it as the replacement
character.

In the general case in which a lead surrogate is the last code read,
but not at index 0, it can use the internal buffer to save the code
for the next call.

Surrogates that aren't in valid pairs should be allowed to pass
through via surrogatepass. This aims for consistency with the
filesystem encoding PEP.
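
The buffering idea can be sketched in pure Python, with a list of integers
standing in for the WCHAR codes that ReadConsoleW returns. The class name is
made up, and this is not the code from the patch.

```python
class LeadSurrogateBuffer:
    """Transcode UTF-16 code units to UTF-8 across separate reads."""

    def __init__(self):
        self._pending = []  # at most one held-back lead surrogate

    def feed(self, units):
        units = self._pending + list(units)
        self._pending = []
        # Hold back a trailing lead surrogate until its partner arrives.
        if units and 0xD800 <= units[-1] <= 0xDBFF:
            self._pending = [units.pop()]
        raw = b''.join(u.to_bytes(2, 'little') for u in units)
        text = raw.decode('utf-16-le', 'surrogatepass')
        # Unpaired surrogates pass through instead of becoming U+FFFD.
        return text.encode('utf-8', 'surrogatepass')

buf = LeadSurrogateBuffer()
first = buf.feed([0xD800])             # lead surrogate of U+10000
second = buf.feed([0xDC00, ord('p')])  # trail surrogate, then 'p'
print(first, second)
```

The first call returns no bytes, which is why a real read would have to
fetch at least 2 codes before returning.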


Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

2016-09-05 Thread eryk sun
On Mon, Sep 5, 2016 at 7:54 PM, Steve Dower <steve.do...@python.org> wrote:
> On 05Sep2016 1234, eryk sun wrote:
>>
>> Also, the console is UCS-2, which can't be transcoded between UTF-16
>> and UTF-8. Supporting UCS-2 in the console would integrate nicely with
>> the filesystem PEP. It makes it always possible to print
>> os.listdir('.'), copy and paste, and read it back without data loss.
>
> Supporting UTF-8 actually works better for this. We already use
> surrogatepass explicitly (on the filesystem side, with PEP 529) and
> implicitly (on the console side, using the Windows conversion API).

CP_UTF8 requires valid UTF-16 text. MultiByteToWideChar and
WideCharToMultiByte are of no practical use here. For example:

>>> raw_read = sys.stdin.buffer.raw.read
>>> _ = write_console_input('\ud800\ud800\r\n'); raw_read(16)
��
b'\xef\xbf\xbd\xef\xbf\xbd\r\n'

This requires Python's "surrogatepass" error handler. It's also
required to decode UTF-8 that's potentially WTF-8 from
os.listdir(b'.'). Coming from the wild, there's a chance that
arbitrary bytes have invalid sequences other than lone surrogates, so
it needs to fall back on "replace" to deal with errors that
"surrogatepass" doesn't handle.
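
One way to get that mix with the existing codec machinery is a custom error
handler that tries "surrogatepass" first and falls back on "replace". This
is a sketch; the handler name "surrogatepass_or_replace" is made up and not
something Python provides.

```python
import codecs

_surrogatepass = codecs.lookup_error('surrogatepass')
_replace = codecs.lookup_error('replace')

def surrogatepass_or_replace(exc):
    """Try 'surrogatepass' first; fall back to 'replace' (hypothetical)."""
    try:
        return _surrogatepass(exc)
    except UnicodeError:
        # Not an encoded surrogate, so substitute U+FFFD instead.
        return _replace(exc)

codecs.register_error('surrogatepass_or_replace', surrogatepass_or_replace)

wtf8 = b'\xed\xa0\x80 ok \xff'  # lone surrogate, text, invalid byte
print(ascii(wtf8.decode('utf-8', 'surrogatepass_or_replace')))
```

The lone surrogate survives as '\ud800', while the stray 0xff byte becomes
U+FFFD.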

> Writing a partial character is easily avoidable by the user. We can either
> fail with an error or print garbage, and currently printing garbage is the
> most compatible behaviour. (Also occurs on Linux - I have a VM running this
> week for testing this stuff.)

Are you sure about that? The internal screen buffer of a Linux
terminal is bytes; it doesn't transcode to a wide-character format. In
the Unix world, almost everything is "get a byte, get a byte, get a
byte, byte, byte". Here's what I see in Ubuntu using GNOME Terminal,
for example:

>>> raw_write = sys.stdout.buffer.raw.write
>>> b = 'αβψδε\n'.encode()
>>> b
b'\xce\xb1\xce\xb2\xcf\x88\xce\xb4\xce\xb5\n'
>>> for c in b: _ = raw_write(bytes([c]))
...
αβψδε

Here it is on Windows with your patch:

>>> raw_write = sys.stdout.buffer.raw.write
>>> b = 'αβψδε\n'.encode()
>>> b
b'\xce\xb1\xce\xb2\xcf\x88\xce\xb4\xce\xb5\n'
>>> for c in b: _ = raw_write(bytes([c]))
...
��

For the write case this can be addressed by identifying an incomplete
sequence at the tail end and either buffering it as 'written' or
rejecting it for the user/buffer to try again with the complete
sequence. I think rejection isn't a good option when the incomplete
sequence starts at index 0. That should be buffered. I prefer
buffering in all cases.
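
Buffering the tail is what Python's incremental UTF-8 decoder already does,
so the write side could be sketched as below. BufferedUtf8Writer is a
made-up name, and write_text stands in for whatever performs the actual
wide-character console write.

```python
import codecs

class BufferedUtf8Writer:
    """Accept UTF-8 bytes in arbitrary pieces; emit only whole characters."""

    def __init__(self, write_text):
        # write_text stands in for the wide-character console write.
        self._write_text = write_text
        self._decoder = codecs.getincrementaldecoder('utf-8')('replace')

    def write(self, data):
        # The incremental decoder keeps an incomplete trailing sequence
        # and prepends it to the next call's data.
        text = self._decoder.decode(data)
        if text:
            self._write_text(text)
        return len(data)  # buffered bytes count as written

out = []
w = BufferedUtf8Writer(out.append)
for byte in 'αβψδε\n'.encode():
    w.write(bytes([byte]))
print(''.join(out) == 'αβψδε\n')
```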

>> It would probably be simpler to use UTF-16 in the main pipeline and
>> implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16
>> buffer could be renamed as "wbuffer", for expert use. However, if
>> you're fully committed to transcoding in the raw layer, I'm certain
>> that these problems can be addressed with small buffers and using
>> Python's codec machinery for a flexible mix of "surrogatepass" and
>> "replace" error handling.
>
> I don't think it actually makes things simpler. Having two buffers is
> generally a bad idea unless they are perfectly synced, which would be
> impossible here without data corruption (if you read half a utf-8 character
> sequence and then read the wide buffer, do you get that character or not?).

Martin's idea, as I understand it, is a UTF-8 buffer that reads from
and writes to the text wrapper. It necessarily consumes at least one
character and buffers it to allow reading per byte. Likewise for
writing, it buffers bytes until it can write a character to the text
wrapper. ISTM, it has to look for incomplete lead-continuation byte
sequences at the tail end, to hold them until the sequence is
complete, at which time it either decodes to a valid character or the
U+FFFD replacement character.

Also, I found that read(n) has to read a character at a time. That's
the only way to emulate line-input mode to detect "\n" and stop
reading. Technically this is implemented in a RawIOBase, which
dictates that operations should use a single system call, but since
it's interfacing with a text wrapper around a buffer around the actual
UCS-2 raw console stream, any notion of a 'system call' would be a
sham.

Because of the UTF-8 buffering there is a synchronization issue, but
it has character granularity. For example, when decoding UTF-8, you
don't get half of a surrogate pair. You decode the full character, and
write that as a discrete unit to the text wrapper. I'd have to
experiment to see how bad this can get. If it's too confusing the idea
isn't practical.

On the plus side, when working with text it's all native UCS-2 up to
the TextIOWrapper, so it's as 

Re: [Python-Dev] PEP 528: Change Windows console encoding to UTF-8

2016-09-05 Thread eryk sun
I have some suggestions. With ReadConsoleW, CPython can use the
pInputControl parameter to set a CtrlWakeup mask. This enables a
Unix-style Ctrl+D for ending a read without having to press enter. For
example:

>>> CTRL_MASK = 1 << 4
>>> inctrl = (ctypes.c_ulong * 4)(16, 0, CTRL_MASK, 0)
>>> _ = kernel32.ReadConsoleW(hStdIn, buf, 100, pn, inctrl); print()
spam
>>> buf.value
'spam\x04'
>>> pn[0]
5

read() would have to manually replace '\x04' with NUL. Ctrl+Z can also
be added to the mask:

>>> CTRL_MASK = 1 << 4 | 1 << 26
>>> inctrl = (ctypes.c_ulong * 4)(16, 0, CTRL_MASK, 0)
>>> _ = kernel32.ReadConsoleW(hStdIn, buf, 100, pn, inctrl); print()
spam
>>> buf.value
'spam\x1a'
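
Going by the two examples above, the mask has one bit per control character,
so it could be built from key letters like this. The helper name is
hypothetical.

```python
def ctrl_wakeup_mask(*letters):
    """Build a CtrlWakeup mask from letters such as 'D' or 'Z'.

    Hypothetical helper: Ctrl+<letter> produces the control character
    ord(letter) - 64, and the mask sets the bit with that number.
    """
    mask = 0
    for letter in letters:
        code = ord(letter.upper()) - 64  # 'D' -> 4, 'Z' -> 26
        if not 0 <= code < 32:
            raise ValueError('not a control-key letter: %r' % letter)
        mask |= 1 << code
    return mask

print(ctrl_wakeup_mask('D') == 1 << 4)
print(ctrl_wakeup_mask('D', 'Z') == (1 << 4 | 1 << 26))
```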

I'd like a method to query, set and unset
ENABLE_VIRTUAL_TERMINAL_PROCESSING mode for the screen buffer
(sys.stdout and sys.stderr) without having to use ctypes. The console
in Windows 10 has built-in VT100 emulation, but it's initially
disabled. The cmd shell enables it, but Python scripts aren't always
run from cmd.exe. Sometimes they're run in a new console from Explorer
or via "start", etc. For example, IPython could check for this to
provide more bells and whistles when PyReadline isn't installed.
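
For reference, the ctypes workaround currently looks something like the
following sketch. The function name is made up, the constants are from the
Windows SDK headers, and it simply returns False anywhere that isn't a
Windows console.

```python
import sys

ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
STD_OUTPUT_HANDLE = -11

def enable_vt_mode():
    """Try to enable VT processing for stdout; return True on success."""
    if sys.platform != 'win32':
        return False  # nothing to do outside Windows
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = (wintypes.HANDLE, wintypes.LPDWORD)
    kernel32.SetConsoleMode.argtypes = (wintypes.HANDLE, wintypes.DWORD)
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # stdout is not a console screen buffer
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))

print(enable_vt_mode())
```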

Finally, functions such as WriteConsoleInputW and
ReadConsoleOutputCharacter require opening CONIN$ or CONOUT$ with
GENERIC_READ | GENERIC_WRITE access. The initial handles given to a
console process have read-write access. For opening a new handle by
device name, WindowsConsoleIO should first try GENERIC_READ |
GENERIC_WRITE -- with a fallback to either GENERIC_READ or
GENERIC_WRITE. The fallback is necessary for CON, which uses the
desired access to determine whether to open the input buffer or screen
buffer.

---

Paul, do you have example code that uses the 'raw' stream? Using the
buffer should behave as it always has -- at least in this regard.
sys.stdin.buffer requests a large block, such as 8 KB. But since the
console defaults to a cooked mode (i.e. processed input and line input
-- control keys, command-line editing, input history, and aliases),
ReadConsole returns when enter is pressed or when interrupted. It
returns at least '\r\n', unless interrupted by Ctrl+C, Ctrl+Break or a
custom CtrlWakeup key. However, if line-input mode is disabled,
ReadConsole returns as soon as one or more characters is available in
the input buffer.

As to kbhit() returning true, this does not mean that read(1) from
console input won't block (not unless line-input mode is disabled). It
does mean that getwch() won't block (note the "w" in there; this one
reads Unicode characters). The CRT's conio functions (e.g. kbhit,
getwch) put the console input buffer in a raw mode (e.g. ^C is read as
'\x03' instead of generating a CTRL_C_EVENT) and call the lower-level
functions PeekConsoleInputW (kbhit) and ReadConsoleInputW (getwch), to
peek at and read input event records.

---

Splitting surrogate pairs across reads is a problem. Granted, this
should rarely be an issue given the size of the reads that the buffer
requests and the typical line length. In most cases the buffer
completely consumes the entire line in one read. But in principle the
raw stream shouldn't replace split surrogates with the U+FFFD
replacement character. For example, with Steve's patch from issue
1602:

>>> _ = write_console_input('\U00010000\r\n');\
... b1 = raw_read(4); b2 = raw_read(4); b3 = raw_read(8)

>>> b1, b2
(b'\xef\xbf\xbd', b'\xef\xbf\xbd')

Splitting UTF-8 sequences across writes is more common. Currently a
raw write doesn't handle this correctly:

>>> b = 'eggs \U00010000 spam\n'.encode('utf-8')
>>> _ = raw_write(b[:6]); _ = raw_write(b[6:])
eggs  spam

Also, the console is UCS-2, which can't be transcoded between UTF-16
and UTF-8. Supporting UCS-2 in the console would integrate nicely with
the filesystem PEP. It makes it always possible to print
os.listdir('.'), copy and paste, and read it back without data loss.

It would probably be simpler to use UTF-16 in the main pipeline and
implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16
buffer could be renamed as "wbuffer", for expert use. However, if
you're fully committed to transcoding in the raw layer, I'm certain
that these problems can be addressed with small buffers and using
Python's codec machinery for a flexible mix of "surrogatepass" and
"replace" error handling.


Re: [Python-Dev] File system path encoding on Windows

2016-08-22 Thread eryk sun
On Mon, Aug 22, 2016 at 3:58 PM, Steve Dower  wrote:
> All MSVC users have been pushed towards Unicode for many years. The .NET
> Framework has defaulted to UTF-8 its entire existence. The use of code pages
> has been discouraged for decades. We're not going first :)

I just wrote a simple function to enumerate the 822 system locales on
my Windows box (using EnumSystemLocalesEx and GetLocaleInfoEx, which
are Unicode-only functions), and 36.7% of them lack an ANSI codepage.
They're Unicode-only locales. UTF-8 is the only way to support these
locales with a bytes API.


Re: [Python-Dev] Status of the Argument Clinic DSL

2016-08-04 Thread eryk sun
On Thu, Aug 4, 2016 at 11:33 PM, Alexander Belopolsky
 wrote:
>
> On Thu, Aug 4, 2016 at 7:12 PM, Larry Hastings  wrote:
>>
>> C extension functions get the module passed in automatically, but this is
>> done internally and from the Python level you can't see it.
>
> Always something new to learn!  This was not so in Python 2.x - self was
> passed as NULL to the C module functions.  When did this change?

In 2.x this is the `self` parameter (actually named "passthrough" in
the source) of Py_InitModule4 [1, 2]. You probably use the
Py_InitModule or Py_InitModule3 macros, which pass NULL for this
parameter:

    #define Py_InitModule(name, methods) \
        Py_InitModule4(name, methods, (char *)NULL, (PyObject *)NULL, \
                       PYTHON_API_VERSION)

    #define Py_InitModule3(name, methods, doc) \
        Py_InitModule4(name, methods, doc, (PyObject *)NULL, \
                       PYTHON_API_VERSION)

Python 3's PyModule_Create2 [3-5] API makes this a reference to the
module. It's currently implemented in PyModule_AddFunctions [6, 7].

[1]: https://docs.python.org/2/c-api/allocation.html#c.Py_InitModule4
[2]: https://hg.python.org/cpython/file/v2.7.12/Python/modsupport.c#l31
[3]: https://docs.python.org/3/c-api/module.html#c.PyModule_Create2
[4]: https://hg.python.org/cpython/file/v3.5.2/Objects/moduleobject.c#l133
[5]: https://hg.python.org/cpython/file/v3.0b1/Objects/moduleobject.c#l63
[6]: https://docs.python.org/3/c-api/module.html#c.PyModule_AddFunctions
[7]: https://hg.python.org/cpython/file/v3.5.2/Objects/moduleobject.c#l387
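
The Python 3 behavior can be observed from Python code through the
__self__ attribute of a built-in function. This is only a small check,
assuming CPython 3; the math module is used purely as an example:

```python
import math
import types

# For a function defined in a C extension module, __self__ is the object
# that gets passed as the first C argument. In CPython 3 this is the
# module object itself rather than NULL.
func = math.sqrt
is_builtin = isinstance(func, types.BuiltinFunctionType)
self_is_module = func.__self__ is math

print(is_builtin, self_is_module)
```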


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread eryk sun
On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev
 wrote:
>   [^3]: Say you write a program that assumes it will only be run on
> Shift-JIS systems, and you use CreateFileA to create a file named
> "ハローワールド". The actual bytes you're sending are cp436 for
> "ânâìü[âÅü[âïâh", so the file on the CD is named, in Unicode,
> "ânâìü[âÅü[âïâh".

Unless the system default was changed or the program called
SetFileApisToOEM, CreateFileA would decode using the ANSI codepage
1252, not the OEM codepage 437 (not 436), i.e.
"ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh". Otherwise the example is right. But the
transcoding strategy won't work in general. For example, if the tables
are turned such that the ANSI codepage is 932 and the program passes a
bytes name from codepage 1252, the user on the other end won't be able
to transcode without error if the original bytes contained invalid
DBCS sequences that were mapped to the default character, U+30FB. This
transcodes as the meaningless string "\x81E". The user can replace
that string with "--" and enjoy a nice game of hangman.
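
The strings above can be reproduced with Python's own codecs. This is a
sketch only: Python's cp1252 codec leaves 0x81, 0x8D and 0x8F undefined
and raises on them, so the fallback to chr() below is my stand-in for
the C1-control mapping, not something the codec does by itself.

```python
name = 'ハローワールド'
raw = name.encode('cp932')      # the bytes a Shift-JIS program would pass

# Decoding with OEM codepage 437 gives the string from the quoted example.
as_oem = raw.decode('cp437')

# A strict cp1252 decode fails on the undefined bytes.
try:
    raw.decode('cp1252')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Decode byte by byte, falling back to the C1 control with the same
# ordinal wherever cp1252 has no mapping (assumed stand-in, see above).
as_ansi = ''.join(
    bytes([b]).decode('cp1252', 'ignore') or chr(b) for b in raw)

# The codepage 932 default character encodes to the bytes 0x81 0x45.
middle_dot = '\u30fb'.encode('cp932')

print(ascii(as_oem), ascii(as_ansi), middle_dot)
```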


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread eryk sun
On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun <eryk...@gmail.com>:
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>> >>> locale.getpreferredencoding()
>> 'cp932'
>> >>> open(b'\xe05', 'w').close()
>> >>> os.listdir('.')
>> ['・']
>> >>> os.listdir(b'.')
>> [b'\x81E']
>>
>> All invalid sequences get mapped to '・', which roundtrips as
>> b'\x81\x45', so you can't reliably create and open files with
>> arbitrary bytes paths in this locale.
>
> Oh, and I forgot to ask: what is your filesystem? Is it the same
> behaviour for NTFS, FAT32, network shared directories, etc.?

That was tested using NTFS, but the same would apply to FAT32, exFAT,
and UDF since they all use Unicode [1]. CreateFile[A|W] wraps the
NtCreateFile system call. The NT executive is Unicode, so the system
call receives the filename using a Unicode-only OBJECT_ATTRIBUTES [2]
record. I can't say what an arbitrary non-Microsoft filesystem will do
with the U+30FB character when it processes the IRP_MJ_CREATE. I was
only concerned with ANSI<=>Unicode conversion that's implemented in
the ntdll.dll runtime library.

[1]: https://msdn.microsoft.com/en-us/library/ee681827
[2]: https://msdn.microsoft.com/en-us/library/ff557749


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread eryk sun
On Tue, Feb 9, 2016 at 3:21 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun <eryk...@gmail.com>:
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>> >>> locale.getpreferredencoding()
>> 'cp932'
>> >>> open(b'\xe05', 'w').close()
>> >>> os.listdir('.')
>> ['・']
>> >>> os.listdir(b'.')
>> [b'\x81E']
>
> Hum, I'm not sure that I understand your example.

Say I create a sequence of files with the names "file_à[N].txt"
encoded in Latin-1, where N is 0-2. They all map to the same file in a
Japanese system locale:

>>> open(b'file_\xe00.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe01.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe02.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> os.listdir(b'.')
[b'file_\x81E.txt']
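
The collision can be modeled without a Japanese system. The function
below is a simplified model of the rule as I described it (a lead byte
followed by a trailing byte below 0x40 becomes the default character).
It is not the actual conversion done by the system, and
model_ansi_decode is a made-up name:

```python
LEAD_BYTES = set(range(0x81, 0xA0)) | set(range(0xE0, 0xFD))
DEFAULT_CHAR = '\u30fb'     # Katakana middle dot

def model_ansi_decode(name: bytes) -> str:
    """Simplified model of the codepage 932 substitution rule."""
    out = []
    i = 0
    while i < len(name):
        b = name[i]
        if b in LEAD_BYTES and i + 1 < len(name):
            pair = name[i:i + 2]
            if pair[1] < 0x40:
                # Invalid trailing byte: the whole pair is replaced.
                out.append(DEFAULT_CHAR)
            else:
                out.append(pair.decode('cp932', 'replace'))
            i += 2
        else:
            out.append(bytes([b]).decode('cp932', 'replace'))
            i += 1
    return ''.join(out)

names = [b'file_\xe00.txt', b'file_\xe01.txt', b'file_\xe02.txt']
decoded = {model_ansi_decode(n) for n in names}
listed_as_bytes = decoded.copy().pop().encode('cp932')

print(decoded, listed_as_bytes)
```

All three byte names collapse to one Unicode name, and encoding that
name back yields none of the original byte strings.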

This isn't a problem with a single-byte codepage such as 1251. For
example, codepage 1251 doesn't map b"\x98" to any character, but
harmlessly maps it to "\x98" (SOS in the C1 Controls block).

Single-byte code pages still have the problem that when a filename is
created using the wide-character API, listing it as bytes may use
either an approximate mapping (e.g. "à" => "a" in 1251) or the
codepage default character (e.g. "\xd7" => "?" in 1251).
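
To check ahead of time whether a name survives a bytes listing, a strict
round trip through Python's own codec is a conservative test. This is
only a sketch: roundtrips is a hypothetical helper, and Python's codecs
do no best-fit mapping, so they reject names that the ANSI API would
approximate instead.

```python
def roundtrips(name: str, codepage: str) -> bool:
    """Return True if name encodes to codepage and decodes back unchanged."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeError:
        return False

results = {
    'файл.txt': roundtrips('файл.txt', 'cp1251'),   # Cyrillic is in 1251
    'à.txt': roundtrips('à.txt', 'cp1251'),         # no exact mapping
    '\xd7.txt': roundtrips('\xd7.txt', 'cp1251'),   # no mapping at all
}
print(results)
```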


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread eryk sun
On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker  wrote:
> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes is it assuming
> some Windows ANSI-compatible encoding? (and what does it return?)

UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
bytes path that's passed to CreateFileA matches the listing from
FindFirstFileA. But for a DBCS codepage arbitrary bytes paths do not
roundtrip. Invalid byte sequences map to the default character. Note
that an ASCII question mark is not always the default character. It
depends on the codepage.

For example, in codepage 932 (Japanese), it's an error if a lead byte
(i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
uncommon). In this case the ANSI API substitutes the default character
for Japanese, '・' (U+30FB, Katakana middle dot).

>>> locale.getpreferredencoding()
'cp932'
>>> open(b'\xe05', 'w').close()
>>> os.listdir('.')
['・']
>>> os.listdir(b'.')
[b'\x81E']

All invalid sequences get mapped to '・', which roundtrips as
b'\x81\x45', so you can't reliably create and open files with
arbitrary bytes paths in this locale.


Re: [Python-Dev] When does `PyType_Type.tp_alloc` get assigned to PyType_GenericAlloc?

2016-02-07 Thread eryk sun
On Sun, Feb 7, 2016 at 7:58 AM, Randy Eels  wrote:
>
> Yet, I can't seem to understand where and when does the `tp_alloc` slot of
> PyType_Type get re-assigned to PyType_GenericAlloc. Does that even happen?
> Or am I missing something bigger?

_Py_InitializeEx_Private in Python/pylifecycle.c calls _Py_ReadyTypes
in Objects/object.c. This calls PyType_Ready(&PyType_Type) in
Objects/typeobject.c, which assigns type->tp_base = &PyBaseObject_Type
and then calls inherit_slots. This executes COPYSLOT(tp_alloc), which
assigns PyType_Type.tp_alloc = PyBaseObject_Type.tp_alloc, which is
statically assigned as PyType_GenericAlloc.
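
As a toy model of that sequence (plain dicts standing in for the static
type structs, None standing in for NULL; this is not CPython's code, and
the real COPYSLOT has extra conditions that are left out here):

```python
# Slots as they look before initialization: object defines tp_alloc
# statically, type leaves it unset.
object_slots = {'tp_base': None, 'tp_alloc': 'PyType_GenericAlloc'}
type_slots = {'tp_base': None, 'tp_alloc': None}

def inherit_slots(slots, base, names):
    for name in names:                      # COPYSLOT(name), simplified
        if slots[name] is None and base[name] is not None:
            slots[name] = base[name]

def type_ready(slots, default_base):
    # A type without an explicit base gets the base object type.
    if slots['tp_base'] is None and slots is not default_base:
        slots['tp_base'] = default_base
    base = slots['tp_base']
    if base is not None:
        inherit_slots(slots, base, ['tp_alloc'])

type_ready(object_slots, object_slots)
type_ready(type_slots, object_slots)

print(type_slots['tp_alloc'])
# The base assignment is visible from Python as well.
print(type.__base__ is object)
```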

Debug trace on Windows:

0:000> bp python35!PyType_Ready
0:000> g
Breakpoint 0 hit
python35!PyType_Ready:
00000000`6502d160 4053            push    rbx
0:000> ?? ((PyTypeObject *)@rcx)->tp_name
char * 0x00000000`650e4044
 "object"
0:000> g
Breakpoint 0 hit
python35!PyType_Ready:
00000000`6502d160 4053            push    rbx
0:000> ?? ((PyTypeObject *)@rcx)->tp_name
char * 0x00000000`651d8e5c
 "type"
0:000> bp python35!inherit_slots
0:000> g
Breakpoint 1 hit
python35!inherit_slots:
00000000`6502c440 48895c2408      mov     qword ptr [rsp+8],rbx
ss:00000000`0028f960={python35!PyType_Type (00000000`6527cba0)}

At entry to inherit_slots, PyType_Type.tp_alloc is NULL:

0:000> ?? python35!PyType_Type.tp_alloc
 * 0x00000000`00000000
0:000> pt
python35!inherit_slots+0xd17:
00000000`6502d157 c3              ret

At exit it's set to PyType_GenericAlloc:

0:000> ?? python35!PyType_Type.tp_alloc
 * 0x00000000`65025580
0:000> ln 65025580
(00000000`65025580)   python35!PyType_GenericAlloc   |
(00000000`650256a0)   python35!PyType_GenericNew
Exact matches:
python35!PyType_GenericAlloc (void)


Re: [Python-Dev] Python environment registration in the Windows Registry

2016-02-03 Thread eryk sun
On Wed, Feb 3, 2016 at 7:33 PM, Eric Snow  wrote:
> Just wanted to quickly point out another use of the Windows registry
> in Python: WindowsRegistryFinder [1].  This is an import "meta-path"
> finder that locates modules declared (*not* defined) in the registry.
> I'm not familiar with the Windows registry nor do I know if anyone is
> using this finder.

The "Modules" key (WindowsRegistryFinder in 3.3+ and previously
PyWin_FindRegisteredModule) adds individual modules by subkey name,
with the filepath in the default value (the filename can differ, but
it can't use an arbitrary extension). The "PythonPath" and "Modules"
keys both date back to Mark Hammond's Windows port in the mid 1990s.
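
For anyone who wants to look at what's registered, here's a rough sketch
of the lookup. The key layout is my reading of importlib's
WindowsRegistryFinder and should be treated as an assumption; modules_key
and registered_module_path are made-up helper names.

```python
import sys

# Assumed layout: one subkey per module name under "Modules".
MODULES_KEY = r'Software\Python\PythonCore\{version}\Modules\{fullname}'

def modules_key(fullname, version=None):
    """Build the registry subkey path for a registered module."""
    if version is None:
        version = '%d.%d' % sys.version_info[:2]
    return MODULES_KEY.format(version=version, fullname=fullname)

def registered_module_path(fullname):
    """Return the file path stored in the key's default value, or None."""
    try:
        import winreg
    except ImportError:         # not running on Windows
        return None
    for root in (winreg.HKEY_CURRENT_USER, winreg.HKEY_LOCAL_MACHINE):
        try:
            with winreg.OpenKey(root, modules_key(fullname)) as key:
                return winreg.QueryValue(key, '')
        except OSError:         # key missing under this root
            continue
    return None

print(modules_key('spam', '3.5'))
```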


Re: [Python-Dev] Python environment registration in the Windows Registry

2016-02-03 Thread eryk sun
On Wed, Feb 3, 2016 at 10:46 AM, Steve Dower  wrote:
>
> sys.path.extend(read_subkeys(fr'HKCU\Software\Python\PythonCore\{sys.winver}\PythonPath\**'))
> sys.path.extend(read_subkeys(fr'HKLM\Software\Python\PythonCore\{sys.winver}\PythonPath\**'))

It seems like a bug (in spirit at least) that this step isn't skipped
for -E and -I (Py_IgnoreEnvironmentFlag, Py_IsolatedFlag).
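
The Python-level counterparts of those two flags are
sys.flags.ignore_environment and sys.flags.isolated. A quick way to see
what each option sets (flags_with is a hypothetical helper; it says
nothing about whether the registry step is skipped):

```python
import subprocess
import sys

def flags_with(*options):
    """Run a child interpreter and report (ignore_environment, isolated)."""
    code = ('import sys; '
            'print(sys.flags.ignore_environment, sys.flags.isolated)')
    out = subprocess.run([sys.executable, *options, '-c', code],
                         capture_output=True, text=True, check=True).stdout
    return tuple(int(field) for field in out.split())

print(flags_with('-E'))
print(flags_with('-I'))     # -I implies -E
```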

> I haven't looked into pywin32's use of this recently - I tend to only use
> Christoph Gohlke's  wheels that don't register anything.

I install the pypiwin32 wheel using pip, which uses pypiwin32.pth:

# .pth file for the PyWin32 extensions
win32
win32\lib
Pythonwin

import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pypiwin32_system32"))

This is different from a PythonPath subkey in a couple of respects.
The paths listed in .pth files are appended to sys.path instead of
prepended. They also don't get added when run with -S or for a venv
environment that excludes site-packages.
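
The .pth behavior is easy to try with site.addsitedir on a throwaway
directory. A minimal sketch; demo.pth, the lib subdirectory and the
DEMO_PTH_SITEDIR variable are all made up for the example:

```python
import os
import site
import sys
import tempfile

def demo_pth():
    """Create a temporary site dir with a .pth file and process it."""
    site_dir = tempfile.mkdtemp()
    lib_dir = os.path.join(site_dir, 'lib')
    os.mkdir(lib_dir)           # listed paths are only added if they exist
    with open(os.path.join(site_dir, 'demo.pth'), 'w') as f:
        f.write('# .pth file for a hypothetical package\n')
        f.write('lib\n')
        # Lines starting with "import" are executed; sitedir is in scope.
        f.write("import os; os.environ['DEMO_PTH_SITEDIR'] = sitedir\n")
    site.addsitedir(site_dir)
    return site_dir, lib_dir

if __name__ == '__main__':
    before = len(sys.path)
    site_dir, lib_dir = demo_pth()
    # Entries from the .pth file land at the end of sys.path.
    print(sys.path[before:])
```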