Re: [Python-Dev] file() vs open(), round 7
On Sun, 2005-12-25 at 20:38 -0800, Aahz wrote:
> Guido sez in
> http://mail.python.org/pipermail/python-dev/2004-July/045921.html
> that it is not correct to recommend using ``file()`` instead of
> ``open()``. However, because ``open()`` currently *is* an alias to
> ``file()``, we end up with the following problem (verified in current
> HEAD) where doing ``help(open)`` brings up the docs for ``file()``:
[...]
> This is confusing. I suggest that we make ``open()`` a factory function
> right now. (I'll submit a bug report (and possibly a patch) after I get
> agreement.)

Not totally related but... way back in 2001-2002, I did some work on writing a Virtual File System interface for Python. See:

http://minkirri.apana.org.au/~abo/projects/osVFS

The idea was that you could import a module "vfs" as "os", and then any file operations would go through the virtual file system. I had modules for things like "fakeroot", "mountable", "ftpfs" etc. The vfs module had full os functionality, so it was a "drop in replacement".

The one wart was open(), because it is the only filesystem operation that wasn't in the os module. At the time I worked around this by adding a vfs.file() method, and suggesting that people alias open() to vfs.file(). Note that os.open() already exists as a low-level file open function, and hence could not be used as a file-object-factory method.

I'm wondering if it wouldn't be a good idea to centralise all filesystem operations into the os module, including open() or file(). Perhaps the builtin open() and file() could call os.file()... or P3K could remove the builtins... I dunno... it just felt ugly at the time that open() was the one oddity.

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
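Donovan's point that `os.open()` is a low-level call and not a file-object factory still holds in Python 3. A minimal sketch (modern Python 3, not the 2005 API) showing that `os.open()` returns a bare integer descriptor, while `os.fdopen()` and the built-in `open()` produce file objects:

```python
import os
import tempfile

# Scratch file to work with.
fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)

# Low level: os.open() returns an int file descriptor, not a file object.
fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
print(type(fd).__name__)  # int
os.write(fd, b"hello\n")
os.close(fd)

# os.fdopen() wraps an existing descriptor in a file object.
fd = os.open(path, os.O_RDONLY)
with os.fdopen(fd, "r") as f:  # closing f also closes fd
    via_fdopen = f.read()

# The built-in open() is the high-level file-object factory.
with open(path, "r") as f:
    via_open = f.read()

os.remove(path)
print(via_fdopen == via_open == "hello\n")  # True
```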
Re: [Python-Dev] file() vs open(), round 7
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>>> Here's a rough draft:
>>>
>>>     def textopen(name, mode="r", encoding=None):
>>>         if "U" not in mode:
>>>             mode += "U"
>>
>> The "U" is not needed when opening files using codecs -
>> these always break lines using .splitlines() which
>> breaks lines according to the Unicode rules and also
>> knows about the various line break variants on different
>> platforms.
>
> Still, codecs typically don't implement universal newlines
> correctly. If you specify 'U', then do .read(), you deserve
> to get \n (U+000A) as the line separator; with most codecs,
> you get whatever line breaks were in the file.
>
> Passing 'U' to the underlying stream is wrong, as well:
> if the stream is double-byte oriented (e.g. UTF-16),
> the 'U' filtering will rarely do anything, but if it does
> something, it will be wrong.
>
> I agree that it would be desirable to have textopen always
> default to universal newlines; however, this is difficult
> to implement.

I think that codecs solve the problem in a better way. If you want to read lines from a stream, you'd use .readline() or .readlines() to read the lines, and not expect .read() to magically apply some conversion to the original data.

Both line methods have a parameter keepends (which defaults to True). This parameter specifies whether you will get the original line end markers or not, which lets the application implement whatever logic it finds appropriate.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 27 2005)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
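Marc-Andre's point about `keepends` can be checked directly. A minimal Python 3 sketch, assuming an in-memory UTF-8 stream with made-up mixed line endings: a `codecs` stream reader returns the original line breaks from `.read()` and lets `readlines(keepends=...)` decide whether to keep them, while `io.TextIOWrapper` in universal-newline mode (`newline=None`) translates them, which is the behaviour Martin asks for.

```python
import codecs
import io

data = "a\r\nb\rc\n".encode("utf-8")  # CRLF, CR and LF line ends

# codecs: .read() passes line ends through unchanged.
raw = codecs.getreader("utf-8")(io.BytesIO(data)).read()
print(repr(raw))  # 'a\r\nb\rc\n'

# codecs: readlines() splits with str.splitlines(); keepends decides
# whether the original markers are kept.
kept = codecs.getreader("utf-8")(io.BytesIO(data)).readlines(keepends=True)
stripped = codecs.getreader("utf-8")(io.BytesIO(data)).readlines(keepends=False)
print(kept)      # ['a\r\n', 'b\r', 'c\n']
print(stripped)  # ['a', 'b', 'c']

# io: universal newlines translate every line end to '\n' on read().
translated = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=None).read()
print(repr(translated))  # 'a\nb\nc\n'
```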
Re: [Python-Dev] file() vs open(), round 7
M.-A. Lemburg wrote:
>>Here's a rough draft:
>>
>>    def textopen(name, mode="r", encoding=None):
>>        if "U" not in mode:
>>            mode += "U"
>
> The "U" is not needed when opening files using codecs -
> these always break lines using .splitlines() which
> breaks lines according to the Unicode rules and also
> knows about the various line break variants on different
> platforms.

Still, codecs typically don't implement universal newlines correctly. If you specify 'U', then do .read(), you deserve to get \n (U+000A) as the line separator; with most codecs, you get whatever line breaks were in the file.

Passing 'U' to the underlying stream is wrong, as well: if the stream is double-byte oriented (e.g. UTF-16), the 'U' filtering will rarely do anything, but if it does something, it will be wrong.

I agree that it would be desirable to have textopen always default to universal newlines; however, this is difficult to implement.

Regards,
Martin
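Martin's warning about filtering newlines on the underlying byte stream can be illustrated. This Python 3 sketch simulates a byte-level universal-newline filter by decoding the raw bytes as Latin-1 (which maps each byte to exactly one character) through `io.TextIOWrapper(newline=None)`; it is a stand-in for the Python 2 'U' flag, not the actual 2005 implementation. On UTF-16 data, a single CRLF comes out as two line breaks:

```python
import io

text = "a\r\nb"
utf16 = text.encode("utf-16-le")  # b'a\x00\r\x00\n\x00b\x00'

# Byte-level universal newlines: Latin-1 maps bytes 1:1 to characters,
# so the newline translation sees raw bytes, not UTF-16 code units.
# '\r' is followed by '\x00' here, so it is treated as a lone CR.
filtered = io.TextIOWrapper(io.BytesIO(utf16), encoding="latin-1", newline=None).read()
mangled = filtered.encode("latin-1").decode("utf-16-le")
print(repr(mangled))  # 'a\n\nb' (one CRLF became two line breaks)

# Decoding first and translating afterwards gives the intended result.
correct = io.TextIOWrapper(io.BytesIO(utf16), encoding="utf-16-le", newline=None).read()
print(repr(correct))  # 'a\nb'
```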
Re: [Python-Dev] file() vs open(), round 7
Phillip J. Eby wrote:
> At 04:20 PM 12/27/2005 +0100, M.-A. Lemburg wrote:
>> Phillip J. Eby wrote:
>> > At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
>> >> M.-A. Lemburg wrote:
>> >> >> can we add an opentext factory for file/codecs.open while we're at it ?
>> >>> Why a new factory function ? Can't we just redirect to codecs.open()
>> >>> in case an encoding keyword argument is passed to open() ?!
>> >> I think open is overloaded enough as it is. Using separate functions for
>> >> distinct use cases is also a lot better than keyword trickery.
>> >>
>> >> Here's a rough draft:
>> >>
>> >>     def textopen(name, mode="r", encoding=None):
>> >>         if "U" not in mode:
>> >>             mode += "U"
>> >>         if encoding:
>> >>             return codecs.open(name, mode, encoding)
>> >>         return file(name, mode)
>> >
>> > Nice. It should probably also check whether there's a 'b' or 't' in 'mode'
>> > and raise an error if so.
>>
>> Why should it do that ?
>
> It's not necessary if both codecs.open() and file() raise an error when
> there's both a 'U' and a 't' or 'b' in the mode string, I suppose.

I see what you mean. codecs.open() doesn't work with 'U'.

>> FYI: codecs.open() explicitly adds the 'b' to the mode since
>> we don't want the platform text mode to interfere with the
>> Unicode line breaking.
>
> I think maybe you're confusing the wrapped file's mode with the
> passed-in mode, here. The passed-in mode should contain at most one of
> 'b', 't', or 'U', IIUC. The mode used for the wrapped file should of
> course always be 'b', but that's not visible to the user of the routine.

Thinking about this some more, I think it's better to make encoding mandatory and to not use file() at all in the API. When we move to "all text is Unicode" in Py3k, we'll have to require this anyway, so why not start with it now.

That said, I think that a name "textfile" would be more appropriate for the factory function, like you suggested. Valid values for mode would then be 'r', 'w' and 'a'. 'U' is not needed. Nor are 'b' and 't'. The '+' modes don't work well with codecs.

>> > I'd also prefer to call it 'textfile', as that
>> > reads more nicely with "for line in textfile(...):" use cases, and it does
>> > return a file object.
>>
>> Nope: open() is only guaranteed to return a file-like object,
>> e.g. codecs.open() will return a wrapped version of a file
>> object.
>
> I meant it's a "file object" in use case terms, not that
> isinstance(ob,file).

We usually call this an "xyz-like object" (meaning that the object provides a certain kind of interface).

--
Marc-Andre Lemburg
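A minimal sketch of the stricter variant Marc-Andre describes: `encoding` is required, and `mode` must be exactly 'r', 'w' or 'a' (no 'U', 'b', 't' or '+'). The name `textfile` comes from the thread, but this body and the `_TEXT_MODES` constant are hypothetical, and the sketch uses Python 3's built-in `open(..., encoding=...)` in place of `codecs.open()`:

```python
import os
import tempfile

_TEXT_MODES = {"r", "w", "a"}  # hypothetical: no 'U', 'b', 't' or '+' variants

def textfile(name, mode, encoding):
    """Hypothetical factory: open a text file with a mandatory encoding."""
    if mode not in _TEXT_MODES:
        raise ValueError(f"mode must be one of 'r', 'w', 'a', not {mode!r}")
    if not encoding:
        raise ValueError("an explicit encoding is required")
    # newline=None: universal newlines on read, os.linesep on write.
    return open(name, mode, encoding=encoding, newline=None)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with textfile(path, "w", "utf-8") as f:
        f.write("caf\u00e9\nna\u00efve\n")
    with textfile(path, "r", "utf-8") as f:
        lines = [line.rstrip("\n") for line in f]  # "for line in textfile(...)"
    rejected = None
    try:
        textfile(path, "r+", "utf-8")  # '+' modes are refused up front
    except ValueError as exc:
        rejected = str(exc)

print(lines)  # ['café', 'naïve']
```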
Re: [Python-Dev] file() vs open(), round 7
Phillip J. Eby wrote:

> >but that was made at a time when it wasn't clear if "open" or "file" would
> >be the preferred way to open a file. now that we've settled on the verb
> >form, I think "textopen" or "opentext" would be slightly more consistent.
>
> Sorry, I'm confused. Who settled on the verb form? Are you saying Guido's
> 2002 post supports open() instead of file(), or is there some more recent
> reference to this?

see:

http://mail.python.org/pipermail/python-dev/2004-July/045921.html

    "I recently saw a checkin that changed a call to open() into a call to file(), suggesting that using file() is more "politically correct" than open(). I'm not sure I agree with this."

http://mail.python.org/pipermail/python-dev/2004-July/045967.html

    "Anyway, here's my future-proof distinction: use open() as the factory function, and file for type-testing."
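Guido's "open() as the factory, file for type-testing" rule carries over to Python 3, where the `file` type no longer exists and the abstract base classes in the `io` module serve for type tests. A short Python 3 sketch (not part of the 2004 discussion) showing that `open()` returns different concrete classes depending on mode and buffering, while `isinstance` checks against the `io` ABCs still work:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# open() is a factory: the concrete class depends on mode and buffering.
with open(path, "r") as t, open(path, "rb") as b, open(path, "rb", buffering=0) as r:
    classes = [type(t).__name__, type(b).__name__, type(r).__name__]
    # Type-test against the io ABCs rather than a concrete class.
    all_io = all(isinstance(f, io.IOBase) for f in (t, b, r))
    text_only = [isinstance(f, io.TextIOBase) for f in (t, b, r)]

os.remove(path)
print(classes)            # ['TextIOWrapper', 'BufferedReader', 'FileIO']
print(all_io, text_only)  # True [True, False, False]
```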
Re: [Python-Dev] file() vs open(), round 7
At 04:37 PM 12/27/2005 +0100, Fredrik Lundh wrote:
>but that was made at a time when it wasn't clear if "open" or "file" would
>be the preferred way to open a file. now that we've settled on the verb
>form, I think "textopen" or "opentext" would be slightly more consistent.

Sorry, I'm confused. Who settled on the verb form? Are you saying Guido's 2002 post supports open() instead of file(), or is there some more recent reference to this?
Re: [Python-Dev] file() vs open(), round 7
Phillip J. Eby wrote:

> >Here's a rough draft:
> >
> >    def textopen(name, mode="r", encoding=None):
> >        if "U" not in mode:
> >            mode += "U"
> >        if encoding:
> >            return codecs.open(name, mode, encoding)
> >        return file(name, mode)
>
> Nice. It should probably also check whether there's a 'b' or 't' in 'mode'
> and raise an error if so. I'd also prefer to call it 'textfile', as that
> reads more nicely with "for line in textfile(...):" use cases, and it does
> return a file object.

textfile was my original proposal:

http://mail.python.org/pipermail/python-dev/2002-March/021115.html

but that was made at a time when it wasn't clear if "open" or "file" would be the preferred way to open a file. now that we've settled on the verb form, I think "textopen" or "opentext" would be slightly more consistent.

but I agree that textfile looks a bit better. how about "opentextfile" ? ;-)
Re: [Python-Dev] file() vs open(), round 7
At 04:20 PM 12/27/2005 +0100, M.-A. Lemburg wrote:
>Phillip J. Eby wrote:
> > At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
> >> M.-A. Lemburg wrote:
> >> > can we add an opentext factory for file/codecs.open while we're at it ?
> >>> Why a new factory function ? Can't we just redirect to codecs.open()
> >>> in case an encoding keyword argument is passed to open() ?!
> >> I think open is overloaded enough as it is. Using separate functions for
> >> distinct use cases is also a lot better than keyword trickery.
> >>
> >> Here's a rough draft:
> >>
> >>     def textopen(name, mode="r", encoding=None):
> >>         if "U" not in mode:
> >>             mode += "U"
> >>         if encoding:
> >>             return codecs.open(name, mode, encoding)
> >>         return file(name, mode)
> >
> > Nice. It should probably also check whether there's a 'b' or 't' in 'mode'
> > and raise an error if so.
>
>Why should it do that ?

It's not necessary if both codecs.open() and file() raise an error when there's both a 'U' and a 't' or 'b' in the mode string, I suppose.

>FYI: codecs.open() explicitly adds the 'b' to the mode since
>we don't want the platform text mode to interfere with the
>Unicode line breaking.

I think maybe you're confusing the wrapped file's mode with the passed-in mode, here. The passed-in mode should contain at most one of 'b', 't', or 'U', IIUC. The mode used for the wrapped file should of course always be 'b', but that's not visible to the user of the routine.

> > I'd also prefer to call it 'textfile', as that
> > reads more nicely with "for line in textfile(...):" use cases, and it does
> > return a file object.
>
>Nope: open() is only guaranteed to return a file-like object,
>e.g. codecs.open() will return a wrapped version of a file
>object.

I meant it's a "file object" in use case terms, not that isinstance(ob,file).
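Phillip's distinction between the passed-in mode and the wrapped file's mode can be observed directly. In CPython 3's `codecs.open()`, a 'b' is appended to the mode whenever an encoding is given, and the returned `StreamReaderWriter` keeps the underlying file in its `stream` attribute. A minimal sketch; recent Python versions deprecate `codecs.open()` in favour of the built-in `open()`, hence the warning filter:

```python
import codecs
import os
import tempfile
import warnings

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write("line one\r\nline two\n".encode("utf-8"))

with warnings.catch_warnings():
    # codecs.open() may emit a DeprecationWarning on recent Pythons.
    warnings.simplefilter("ignore", DeprecationWarning)
    with codecs.open(path, "r", "utf-8") as f:  # user passes plain 'r'
        wrapped_mode = f.stream.mode  # mode of the underlying binary file
        content = f.read()            # line ends are not translated

os.remove(path)
print(wrapped_mode)   # rb
print(repr(content))  # 'line one\r\nline two\n'
```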
Re: [Python-Dev] file() vs open(), round 7
Phillip J. Eby wrote:
> At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
>> M.-A. Lemburg wrote:
>> can we add an opentext factory for file/codecs.open while we're at it ?
>>> Why a new factory function ? Can't we just redirect to codecs.open()
>>> in case an encoding keyword argument is passed to open() ?!
>> I think open is overloaded enough as it is. Using separate functions for
>> distinct use cases is also a lot better than keyword trickery.
>>
>> Here's a rough draft:
>>
>>     def textopen(name, mode="r", encoding=None):
>>         if "U" not in mode:
>>             mode += "U"
>>         if encoding:
>>             return codecs.open(name, mode, encoding)
>>         return file(name, mode)
>
> Nice. It should probably also check whether there's a 'b' or 't' in 'mode'
> and raise an error if so.

Why should it do that ?

FYI: codecs.open() explicitly adds the 'b' to the mode since we don't want the platform text mode to interfere with the Unicode line breaking.

> I'd also prefer to call it 'textfile', as that
> reads more nicely with "for line in textfile(...):" use cases, and it does
> return a file object.

Nope: open() is only guaranteed to return a file-like object, e.g. codecs.open() will return a wrapped version of a file object.

--
Marc-Andre Lemburg
Re: [Python-Dev] file() vs open(), round 7
At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
>M.-A. Lemburg wrote:
>
> >> can we add an opentext factory for file/codecs.open while we're at it ?
> >
> > Why a new factory function ? Can't we just redirect to codecs.open()
> > in case an encoding keyword argument is passed to open() ?!
>
>I think open is overloaded enough as it is. Using separate functions for
>distinct use cases is also a lot better than keyword trickery.
>
>Here's a rough draft:
>
>     def textopen(name, mode="r", encoding=None):
>         if "U" not in mode:
>             mode += "U"
>         if encoding:
>             return codecs.open(name, mode, encoding)
>         return file(name, mode)

Nice. It should probably also check whether there's a 'b' or 't' in 'mode' and raise an error if so. I'd also prefer to call it 'textfile', as that reads more nicely with "for line in textfile(...):" use cases, and it does return a file object.
Re: [Python-Dev] file() vs open(), round 7
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>>> can we add an opentext factory for file/codecs.open while we're at it ?
>> Why a new factory function ? Can't we just redirect to codecs.open()
>> in case an encoding keyword argument is passed to open() ?!
>
> I think open is overloaded enough as it is. Using separate functions for
> distinct use cases is also a lot better than keyword trickery.

Fair enough.

> Here's a rough draft:
>
>     def textopen(name, mode="r", encoding=None):
>         if "U" not in mode:
>             mode += "U"

The "U" is not needed when opening files using codecs - these always break lines using .splitlines() which breaks lines according to the Unicode rules and also knows about the various line break variants on different platforms.

>         if encoding:
>             return codecs.open(name, mode, encoding)
>         return file(name, mode)

--
Marc-Andre Lemburg
Re: [Python-Dev] file() vs open(), round 7
M.-A. Lemburg wrote:
>> can we add an opentext factory for file/codecs.open while we're at it ?
>
> Why a new factory function ? Can't we just redirect to codecs.open()
> in case an encoding keyword argument is passed to open() ?!

I think open is overloaded enough as it is. Using separate functions for distinct use cases is also a lot better than keyword trickery.

Here's a rough draft:

    def textopen(name, mode="r", encoding=None):
        if "U" not in mode:
            mode += "U"
        if encoding:
            return codecs.open(name, mode, encoding)
        return file(name, mode)
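Fredrik's draft targets Python 2, where `file()` and the 'U' mode flag existed. A hedged Python 3 translation of the same idea (a sketch, not a proposal from the thread): text mode already defaults to universal newlines (`newline=None`) and the 'U' flag was removed in Python 3.11, so this port strips 'U' instead of adding it, and the built-in `open()` covers both the explicit-encoding and default-encoding branches:

```python
import os
import tempfile

def textopen(name, mode="r", encoding=None):
    """Python 3 sketch of the textopen() draft quoted above."""
    # Text mode already uses universal newlines, and open() rejects 'U'
    # since Python 3.11, so strip it rather than adding it.
    mode = mode.replace("U", "")
    # encoding=None means open()'s locale-dependent default encoding.
    return open(name, mode, encoding=encoding, newline=None)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\rthree\n")  # CRLF, CR and LF line ends
    with textopen(path, "rU", encoding="ascii") as f:
        content = f.read()

print(repr(content))  # 'one\ntwo\nthree\n'
```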
Re: [Python-Dev] file() vs open(), round 7
Fredrik Lundh wrote:
> Aahz wrote:
>
>>     class file(object)
>>      |  file(name[, mode[, buffering]]) -> file object
>>      |
>>      |  Open a file. The mode can be 'r', 'w' or 'a' for reading (default),
>> [...]
>>      |  Note: open() is an alias for file().
>>
>> This is confusing. I suggest that we make ``open()`` a factory function
>> right now. (I'll submit a bug report (and possibly a patch) after I get
>> agreement.)
>
> +1.
>
> can we add an opentext factory for file/codecs.open while we're at it ?

Why a new factory function ? Can't we just redirect to codecs.open() in case an encoding keyword argument is passed to open() ?!

--
Marc-Andre Lemburg
Re: [Python-Dev] file() vs open(), round 7
Aahz wrote:

>     class file(object)
>      |  file(name[, mode[, buffering]]) -> file object
>      |
>      |  Open a file. The mode can be 'r', 'w' or 'a' for reading (default),
> [...]
>      |  Note: open() is an alias for file().
>
> This is confusing. I suggest that we make ``open()`` a factory function
> right now. (I'll submit a bug report (and possibly a patch) after I get
> agreement.)

+1.

can we add an opentext factory for file/codecs.open while we're at it ?
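For comparison with the `help(open)` problem Aahz describes: in Python 3 the built-in `open()` is a plain function, the same object as `io.open`, rather than an alias for a file class, so `help(open)` documents the factory itself. A quick check, assuming CPython 3:

```python
import builtins
import inspect
import io

same_object = builtins.open is io.open     # one factory function, two names
is_class = inspect.isclass(builtins.open)  # a function, not a type
has_file = hasattr(builtins, "file")       # Python 2's file type is gone

print(same_object, is_class, has_file)  # True False False
```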