Re: tempname.mktemp functionality deprecation

2017-05-02 Thread Cameron Simpson

On 01May2017 14:31, Tim Chase wrote:
> On 2017-05-01 18:40, Gregory Ewing wrote:
> > The following function should be immune to race conditions
> > and doesn't use mktemp. [loop trying names until os.link does not
> > fail due to an existing name]
>
> Ah, this is a good alternative and solves the problem at hand.
>
> As a side-note, apparently os.rename() is only atomic on *nix
> systems, but not on Windows.  For the time being, I'm okay with that.


Regarding your point about my (bogus) NamedTemporaryFile suggestion: the only
alternative I can see is to create a temporary directory and then make the file
inside that. But Gregory Ewing's suggestion is much more direct.
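
A minimal sketch of the temporary-directory alternative, assuming POSIX
rename() semantics and that the temporary directory can be created next to the
destination (so everything stays on one filesystem). The helper name
replace_with_link and the fixed entry name "link" are illustrative, not from
the thread:

```python
import os
import tempfile


def replace_with_link(source, dest):
    """Make dest a hard link to source, replacing any existing dest."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # mkdtemp() creates a new, private directory, so a name inside it
    # cannot already be taken by another process.
    tmp_dir = tempfile.mkdtemp(dir=dest_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.link(source, tmp_link)
        try:
            os.rename(tmp_link, dest)  # replaces dest in one step on POSIX
        finally:
            # tmp_link is still there if rename() failed, or if dest was
            # already a link to the same file (POSIX rename then does nothing).
            if os.path.lexists(tmp_link):
                os.unlink(tmp_link)
    finally:
        os.rmdir(tmp_dir)
```

If the process dies between the link and the rename, the leftover is a stray
directory holding one extra hard link; dest itself is never missing.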


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: tempname.mktemp functionality deprecation

2017-05-01 Thread Tim Chase
On 2017-05-01 18:40, Gregory Ewing wrote:
> The following function should be immune to race conditions
> and doesn't use mktemp.
> 
> def templink(destpath):
>     """Create a hard link to the given file with a unique name.
>     Returns the name of the link."""
>     pid = os.getpid()
>     i = 1
>     while True:
>         linkpath = "%s-%s-%s" % (destpath, pid, i)
>         try:
>             os.link(destpath, linkpath)
>         except FileExistsError:
>             i += 1
>         else:
>             break
>     return linkpath

Ah, this is a good alternative and solves the problem at hand.

As a side-note, apparently os.rename() is only atomic on *nix
systems, but not on Windows.  For the time being, I'm okay with that.

Thanks for your assistance!

-tkc



Re: tempname.mktemp functionality deprecation

2017-05-01 Thread eryk sun
On Sat, Apr 29, 2017 at 6:45 PM, Tim Chase wrote:
> Working on some deduplication code, I want to do my best at
> performing an atomic re-hard-linking atop an existing file, akin to
> "ln -f source.txt dest.txt"
>
> However, when I issue
>
>   os.link("source.txt", "dest.txt")
>
> it fails with an OSError (EEXIST).  This isn't surprising as it's
> documented.  Unfortunately, os.link doesn't support something like
>
>   os.link("source.txt", "dest.txt", force=True)

FYI, on Windows this is possible if you use the NTAPI functions
NtOpenFile and NtSetInformationFile instead of WinAPI CreateHardLink.
Using the NT API can also support src_dir_fd, dst_dir_fd, and
follow_symlinks=True [1]. I have a prototype that uses ctypes. I named
this parameter "replace_existing". It's atomic, but it will fail if
the destination is open. An open file can't be unlinked on Windows.

[1]: MSDN claims that "if the path points to a symbolic link, the function
 [CreateHardLink] creates a hard link to the target". Unless I'm
 misreading, this statement is wrong because it actually links to the
 symlink, i.e. the reparse point.


Re: tempname.mktemp functionality deprecation

2017-05-01 Thread Gregory Ewing

The following function should be immune to race conditions
and doesn't use mktemp.

import os

def templink(destpath):
    """Create a hard link to the given file with a unique name.
    Returns the name of the link."""
    pid = os.getpid()
    i = 1
    while True:
        # Candidate name: original path plus our pid and a counter.
        linkpath = "%s-%s-%s" % (destpath, pid, i)
        try:
            os.link(destpath, linkpath)
        except FileExistsError:
            # Name already taken; try the next counter value.
            i += 1
        else:
            break
    return linkpath
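
A sketch of how templink() might be combined with os.rename() to get the
"ln -f" effect asked about in this thread, assuming POSIX rename() semantics.
templink() is repeated so the example runs on its own; force_link is a
hypothetical helper name:

```python
import os


def templink(destpath):
    """Create a uniquely named hard link to destpath; return its name."""
    pid = os.getpid()
    i = 1
    while True:
        linkpath = "%s-%s-%s" % (destpath, pid, i)
        try:
            os.link(destpath, linkpath)
        except FileExistsError:
            i += 1
        else:
            break
    return linkpath


def force_link(source, dest):
    """Make dest a hard link to source, replacing any existing dest."""
    tmp = templink(source)  # extra link to source, created beside it
    try:
        os.rename(tmp, dest)  # swaps the new link into place on POSIX
    finally:
        # Clean up if rename() failed, or did nothing because dest was
        # already a link to the same file.
        if os.path.lexists(tmp):
            os.unlink(tmp)
```

For example, force_link("source.txt", "dest.txt") leaves both names pointing
at the same file, and dest.txt exists at every moment in between.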

--
Greg


Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Tim Chase
On 2017-05-01 08:41, Cameron Simpson wrote:
> On 30Apr2017 06:52, Tim Chase  wrote:
> >> > - use a GUID-named temp-file instead for less chance of
> >> > collision?
> 
> You could, but mktemp is supposed to robustly perform that task,
> versus "very very probably".

Though with the potential of its race-condition, mktemp() isn't a much
stronger guarantee.  A GUID seems like the best route.
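
A minimal sketch of the GUID route, assuming Python 3. The function name
link_with_uuid_name and the "tmp-" prefix are illustrative only. Because
os.link() refuses to overwrite an existing name, a collision is detected
rather than silently clobbering anything:

```python
import os
import uuid


def link_with_uuid_name(source, directory):
    """Hard-link source under a random UUID-based name in directory."""
    while True:
        candidate = os.path.join(directory, "tmp-" + uuid.uuid4().hex)
        try:
            os.link(source, candidate)  # fails instead of overwriting
        except FileExistsError:
            continue  # extremely unlikely; just pick another name
        return candidate
```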

> >> > - I happen to already have a hash of the file contents, so use
> >> >   the .hexdigest() string as the temp-file name?
> 
> Hashes collide. (Yes, I know that for your purposes we consider
> that they don't; I have a very similar situation of my own). And
> what if your process is running twice, or leaves around a previous
> temp file by accident (or interruption) _or_ the file tree contains
> filenames named after the hash of their content (not actually
> unheard of)?

In both case #1 (a *file* happens to have the name of the SHA256 hash,
but has different file contents) and case #2 (another process running
generates a *link* with the SHA256 of the matching content), the
os.link() should fail with EEXIST, which I'm okay with.
Likewise, if there's an interruption, I'd rather have the stray
SHA-named link floating around than lose an existing file-name.

> What about some variation on:
> 
>   from tempfile import NamedTemporaryFile
>   ...
>   with NamedTemporaryFile(dir=your_target_directory) as T:
>       ...  # use T.name, and do your rename/unlink in here

As mentioned in my follow-up (which, strangely, your reply arrived with
a References header pointing to), NamedTemporaryFile creates the
file on disk, which means os.link(source, T.name) fails with
EEXIST.

-tkc






Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Tim Chase
On 2017-05-01 09:15, Ben Finney wrote:
> I reported this – for a different use case – in issue26362 [0].
> 
> The suggested solutions in the documentation do not address the use
> case described there; and they do not address the use case you've
> described here either.
> 
> Would you be kind enough to update that issue with a description of
> your use case as well?

Done, linking to this thread:  http://bugs.python.org/msg292648

-tkc





Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Dan Stromberg
On Sat, Apr 29, 2017 at 11:45 AM, Tim Chase wrote:
> Working on some deduplication code, I want to do my best at
> performing an atomic re-hard-linking atop an existing file, akin to
> "ln -f source.txt dest.txt"
>
> However, when I issue
>
>   os.link("source.txt", "dest.txt")
>
> it fails with an OSError (EEXIST).  This isn't surprising as it's
> documented.  Unfortunately, os.link doesn't support something like
>
>   os.link("source.txt", "dest.txt", force=True)

FWIW, ln -f appears to unlink the destination and then link again on
Linux Mint 18 (GNU coreutils 8.25):
$ strace -f ln -f file file2 2>&1 | tail -15
below cmd output started 2017 Sun Apr 30 04:47:53 PM PDT
munmap(0x7f804fcb4000, 147404)  = 0
brk(NULL)   = 0x225c000
brk(0x227d000)  = 0x227d000
stat("file2", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("file2", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
linkat(AT_FDCWD, "file", AT_FDCWD, "file2", 0) = -1 EEXIST (File exists)
unlink("file2") = 0
linkat(AT_FDCWD, "file", AT_FDCWD, "file2", 0) = 0
lseek(0, 0, SEEK_CUR)   = -1 ESPIPE (Illegal seek)
close(0)   = 0
close(1)   = 0
close(2)   = 0
exit_group(0)   = ?
+++ exited with 0 +++
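
The system-call sequence in the trace can be sketched in Python roughly as
follows. This only mirrors the link / unlink / link steps shown above (not the
stat checks ln performs first), and ln_f_like is an illustrative name. It
assumes source and dest are different paths:

```python
import os


def ln_f_like(source, dest):
    """Mimic the traced sequence: link, unlink on EEXIST, link again."""
    try:
        os.link(source, dest)
    except FileExistsError:
        os.unlink(dest)
        # A crash or power loss at this point leaves no file named dest,
        # which is the window the original question wants to avoid.
        os.link(source, dest)
```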

HTH


Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Ben Finney
Tim Chase  writes:

> Unfortunately, tempfile.mktemp() is described as deprecated since 2.3
> (though appears to still exist in the 3.4.2 that is the default Py3 on
> Debian Stable). While the deprecation notice says "In version 2.3 of
> Python, this module was overhauled for enhanced security. It now
> provides three new functions, NamedTemporaryFile(), mkstemp(), and
> mkdtemp(), which should eliminate all remaining need to use the
> insecure mktemp() function", as best I can tell, all of the other
> functions/objects in the tempfile module return a file object, not a
> string suitable for passing to link().

The problem you describe is that ‘tempfile.mktemp’ is deprecated, but
there is no other supported standard-library API which does its job.

I reported this – for a different use case – in issue26362 [0].

The suggested solutions in the documentation do not address the use case
described there; and they do not address the use case you've described
here either.

Would you be kind enough to update that issue with a description of your
use case as well?


[0] The issue currently has a message from me, over a year ago, saying
that I will “work on a patch soon”. I'd welcome someone else taking
that job.

-- 
 \ “Books and opinions, no matter from whom they came, if they are |
  `\ in opposition to human rights, are nothing but dead letters.” |
_o__)  —Ernestine Rose |
Ben Finney



Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Cameron Simpson

On 30Apr2017 06:52, Tim Chase wrote:
> On 2017-04-29 20:51, Devin Jeanpierre wrote:
> > On Sat, Apr 29, 2017 at 11:45 AM, Tim Chase wrote:
> > > So which route should I pursue?
> > > - go ahead and use tempfile.mktemp() ignoring the deprecation?

I'd be tempted to. But...

> > > - use a GUID-named temp-file instead for less chance of collision?

You could, but mktemp is supposed to robustly perform that task, versus
"very very probably".

> > > - I happen to already have a hash of the file contents, so use
> > >   the .hexdigest() string as the temp-file name?

Hashes collide. (Yes, I know that for your purposes we consider that they
don't; I have a very similar situation of my own). And what if your process
is running twice, or leaves around a previous temp file by accident (or
interruption) _or_ the file tree contains filenames named after the hash of
their content (not actually unheard of)?

> > > - some other solution I've missed?

What about some variation on:

 from tempfile import NamedTemporaryFile
 ...
 with NamedTemporaryFile(dir=your_target_directory) as T:
     ...  # use T.name, and do your rename/unlink in here

Cheers,
Cameron Simpson 


Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Tim Chase
On 2017-04-29 20:51, Devin Jeanpierre wrote:
> On Sat, Apr 29, 2017 at 11:45 AM, Tim Chase wrote
> > So which route should I pursue?
> >
> > - go ahead and use tempfile.mktemp() ignoring the deprecation?
> >
> > - use a GUID-named temp-file instead for less chance of collision?
> >
> > - I happen to already have a hash of the file contents, so use
> >   the .hexdigest() string as the temp-file name?
> >
> > - some other solution I've missed?
> 
> I vote the last one: you can read the .name attribute of the
> returned file(-like) object from NamedTemporaryFile to get a path
> to a file, which can be passed to other functions.

Unfortunately, this entails the file pre-existing, causing the same
EEXIST problem as before:

  $ cd ~/tmp
  $ echo hello > a
  $ python
  ...
  >>> from tempfile import NamedTemporaryFile as NTF
  >>> f = NTF(dir='.')
  >>> import os
  >>> os.link('a', f.name)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  OSError: [Errno 17] File exists
  >>> f.name
  '/home/tim/tmp/tmpokEpht'

-tkc




Re: tempname.mktemp functionality deprecation

2017-04-30 Thread Gregory Ewing

Devin Jeanpierre wrote:
> I vote the last one: you can read the .name attribute of the returned
> file(-like) object from NamedTemporaryFile to get a path to a file,
> which can be passed to other functions.


I don't think that helps. You would have to delete the file
first before you could create a link with that name, and that
would leave a window of opportunity for another process to
create something with the same name.

I would generate a name that's likely to be unique (using
mktemp() or otherwise) and try to create a link with that name.
If it fails because the name is in use, generate another name
and try again.

--
Greg


Re: tempname.mktemp functionality deprecation

2017-04-29 Thread Chris Angelico
On Sun, Apr 30, 2017 at 1:51 PM, Devin Jeanpierre wrote:
> I guess ideally, one would use linkat instead of os.link[*], but that's
> platform-specific and not exposed in Python AFAIK.

It is, actually: os.link() takes src_dir_fd and dst_dir_fd parameters.

https://docs.python.org/3/library/os.html#os.link
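
A short sketch of those parameters in use, assuming a platform where
os.link() supports directory descriptors (checked via os.supports_dir_fd).
The helper name link_in_dirs is illustrative:

```python
import os


def link_in_dirs(src_dir, src_name, dst_dir, dst_name):
    """Hard-link src_dir/src_name to dst_dir/dst_name via directory fds."""
    if os.link not in os.supports_dir_fd:
        raise NotImplementedError("os.link() lacks dir_fd support here")
    src_fd = os.open(src_dir, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_dir, os.O_RDONLY)
        try:
            # Both names are resolved relative to the open directories.
            os.link(src_name, dst_name, src_dir_fd=src_fd, dst_dir_fd=dst_fd)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```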

ChrisA


Re: tempname.mktemp functionality deprecation

2017-04-29 Thread Devin Jeanpierre
On Sat, Apr 29, 2017 at 11:45 AM, Tim Chase wrote:
> Unfortunately, tempfile.mktemp() is described as deprecated
> since 2.3 (though appears to still exist in the 3.4.2 that is the
> default Py3 on Debian Stable). While the deprecation notice says
> "In version 2.3 of Python, this module was overhauled for enhanced
> security. It now provides three new functions, NamedTemporaryFile(),
> mkstemp(), and mkdtemp(), which should eliminate all remaining need
> to use the insecure mktemp() function", as best I can tell, all of
> the other functions/objects in the tempfile module return a file
> object, not a string suitable for passing to link().
>
> So which route should I pursue?
>
> - go ahead and use tempfile.mktemp() ignoring the deprecation?
>
> - use a GUID-named temp-file instead for less chance of collision?
>
> - I happen to already have a hash of the file contents, so use
>   the .hexdigest() string as the temp-file name?
>
> - some other solution I've missed?

I vote the last one: you can read the .name attribute of the returned
file(-like) object from NamedTemporaryFile to get a path to a file,
which can be passed to other functions.

I guess ideally, one would use linkat instead of os.link[*], but that's
platform-specific and not exposed in Python AFAIK. Maybe things would
be better if all the functions that accept filenames also accepted
files, and did the best job they can? (If a platform supports using the
fd instead, use that; otherwise use f.name.)

.. *: 
http://stackoverflow.com/questions/17127522/create-a-hard-link-from-a-file-handle-on-unix/18644492#18644492

-- Devin


tempname.mktemp functionality deprecation

2017-04-29 Thread Tim Chase
Working on some deduplication code, I want to do my best at
performing an atomic re-hard-linking atop an existing file, akin to
"ln -f source.txt dest.txt"

However, when I issue

  os.link("source.txt", "dest.txt")

it fails with an OSError (EEXIST).  This isn't surprising as it's
documented.  Unfortunately, os.link doesn't support something like

  os.link("source.txt", "dest.txt", force=True)

However, I don't want to

  os.unlink("dest.txt")
  os.link("source.txt", "dest.txt")

in the event the power goes out between the unlink() and the link(),
leaving me in a state where dest.txt is deleted but the link hasn't
yet happened.

So my plan was to do something like

  temp_name = tempfile.mktemp(dir=DIRECTORY_CONTAINING_SOURCE_TXT)
  os.link("source.txt", temp_name)
  try:
      os.rename(temp_name, "dest.txt") # docs guarantee this is atomic
  except OSError:
      os.unlink(temp_name)

There's still the potential leakage if a crash occurs, but I'd rather
have an extra hard-link floating around than lose an original
file-name.

Unfortunately, tempfile.mktemp() is described as deprecated
since 2.3 (though appears to still exist in the 3.4.2 that is the
default Py3 on Debian Stable). While the deprecation notice says
"In version 2.3 of Python, this module was overhauled for enhanced
security. It now provides three new functions, NamedTemporaryFile(),
mkstemp(), and mkdtemp(), which should eliminate all remaining need
to use the insecure mktemp() function", as best I can tell, all of
the other functions/objects in the tempfile module return a file
object, not a string suitable for passing to link().
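
For reference, a small sketch of what the three replacement functions hand
back, assuming Python 3. mkstemp() and mkdtemp() do yield path strings, but in
every case the path already exists by the time the call returns, so it cannot
serve as the not-yet-existing target that os.link() needs. Variable names are
illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # mkstemp() returns an (open descriptor, path) pair for a new file.
    fd, stemp_path = tempfile.mkstemp(dir=workdir)
    os.close(fd)
    # mkdtemp() returns the path of a newly created directory.
    dtemp_path = tempfile.mkdtemp(dir=workdir)
    # NamedTemporaryFile() returns a file object; the path is its .name.
    with tempfile.NamedTemporaryFile(dir=workdir) as ntf:
        ntf_exists = os.path.exists(ntf.name)

    results = {
        "mkstemp": (type(stemp_path).__name__, os.path.exists(stemp_path)),
        "mkdtemp": (type(dtemp_path).__name__, os.path.isdir(dtemp_path)),
        "NamedTemporaryFile": ntf_exists,
    }
```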

So which route should I pursue?

- go ahead and use tempfile.mktemp() ignoring the deprecation?

- use a GUID-named temp-file instead for less chance of collision?

- I happen to already have a hash of the file contents, so use
  the .hexdigest() string as the temp-file name?

- some other solution I've missed?


Thanks,

-tkc



