[Python-Dev] Re: The repr of a sentinel
Hello, and thanks for the PEP. I feel like the 3-line declaration of a new sentinel would somewhat discourage its adoption compared to just "sentinel = object()". From what I understand from the PEP, if new classes are defined inside the closure of a factory function, some Python implementations would have trouble copying/pickling them? Would it be doable to have a single Sentinel class, whose instances store their representation and some autogenerated UUID, and which automatically returns internally stored singletons (depending on this UUID) when called multiple times or unpickled? This would require some __new__() and unpickling magic, but nothing too CPython-specific (or am I missing something?).

regards, Pascal

Le 24/05/2021 à 16:28, Tal Einat a écrit :

On Mon, May 24, 2021 at 3:30 AM Luciano Ramalho wrote:

On Sun, May 23, 2021 at 3:37 AM Tal Einat wrote: I put up an early draft of a PEP on a branch in the PEPs repo: https://github.com/python/peps/blob/sentinels/pep-.rst

Thanks for that PEP, Tal. Good ideas and recap there. I think repr= should have a default: the name of the class within <>. Sentinels don't have state or any other data besides a name, so I would prefer not to force users to create a class just so they can instantiate it. Why not just this? NotGiven = sentinel('')

I'm seriously considering that now. The issues I ran into with this approach are perhaps not actually problematic.

On the other hand, if the user must create a class, the class itself should be the sentinel. Class objects are already singletons, so that makes sense. Here is a possible class-based API:

    class NotGiven(Sentinel):
        pass

That's it. Now I can use NotGiven as the sentinel, and its default repr follows the convention above. Behind the scenes we can have a SentinelMeta metaclass with all the magic that could be required--including the default __repr__ method. What do you think?
One issue with that is that such sentinels don't have their own class, so you can't write a strict type signature, such as `Union[str, NotGivenType]`. Another issue is that having these objects be classes, rather than normal instances of classes, could be surprising and confusing. For those two reasons, for now, I think generating a unique object with its own unique class is preferable.

Sorry about my detour into the rejected idea of a factory function.

Please don't apologize! I put those ideas in the "Rejected Ideas" section mostly to have them written down with a summary of the considerations related to them. They shouldn't be considered finally rejected unless and until the PEP is finished and accepted. - Tal

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HL74JC3OF7Y3F5RDYVACAFODL4E3CBI6/ Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)
Le 17/11/2013 12:27, Steven D'Aprano a écrit : What I would like to know is if people *knowingly* add costly asserts to performance-critical code, with the intent of disabling them at runtime using -OO.

Yes, I have knowingly added costly asserts to code with the intent of disabling them at runtime. Was it *performance-critical* code? I don't know; that was the point of my earlier rambling -- I could demonstrate a speedup of the individual functions in benchmarks, but nobody spent the effort to determine which functions were performance-critical.

Hi, my 2 cents: asserts have been of great help for the robustness of our provisioning framework. They are like tests embedded in the code, which *consistently* check what would be VERY hard to test from the outside, from unit tests. They save us much time during development, because asserts (often used for method contract checking) immediately break things if we make dumb programming errors, like passing the wrong type of variable as a parameter (if you send a string instead of a list of strings to a method, it could take a while before the error gets noticed, since their behaviours are quite close).

We also add asserts with very expensive operations (like fully checking the proper synchronization of our DBs with the mockups of remote partners, after each provisioning command is treated), so that we don't need to call something like that after every line of unit test we write. In production, we then make sure we use the -O flag to avoid doubling our treatment times and traffic.

regards, Pascal
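The contract-checking pattern described above can be sketched like this (the function and its checks are made up for illustration); running with `python -O` or `-OO` strips the asserts entirely, since `assert` statements are compiled away when `__debug__` is false:

```python
def send_names(names):
    # contract check: cheap to write, removed entirely under -O/-OO
    assert isinstance(names, list) and all(isinstance(n, str) for n in names), \
        "names must be a list of strings"
    return ",".join(names)

def db_in_sync():
    # placeholder for an expensive consistency check, in the same
    # spirit as the DB-synchronization asserts mentioned above
    return True

# costly, development-only: disappears in production runs with -O
assert db_in_sync(), "DB and mockup have diverged"
```

Passing a plain string where a list of strings is expected fails immediately at the assert, instead of surfacing much later, which is exactly the "wrong type of variable" case described above.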
Re: [Python-Dev] Solving the import-deadlock case
Thanks for the comments. In my particular case we're actually on a provisioning *framework*, so we chose the easy (lazy?) way, i.e. initializing miscellaneous modules at loading time (like Django or others do, I think), rather than building a proper initialization dispatcher to be called from e.g. a wsgi launcher. It works pretty well actually, except for that nasty (but fortunately very rare) import deadlock. ^^

Since module loading errors *might* occur for tons of reasons (i.e. searching the disk for py files already IS a side effect...), and since the current behaviour (letting child modules survive disconnected from their parent) is more harmful than useful, I guess the cleanup that Nick mentioned would be the path to follow, wouldn't it?

thanks, Regards, Pascal

Le 02/07/2013 23:32, Nick Coghlan a écrit :

On 3 Jul 2013 04:34, Pascal Chambon python...@gmail.com wrote:

Hello everyone, I'd like to bring your attention to this issue, since it touches the fundamentals of python's import workflow: http://bugs.python.org/issue17716 I've tried to post it on the python-import ML for weeks, but it must still be blocked somewhere in a moderation queue, so here I come ^^

TL;DR version: because of the way import currently works, if importing a package temporarily fails whereas importing one of its children succeeded, we reach an unusable state: all subsequent attempts at importing that package will fail if a from...import is used somewhere. Typically, it leaves a web worker broken, even though the typical behaviour of such a process would be to retry loading the failing view, again and again.

I agree that module loading should be, as much as possible, side-effect free, and thus shouldn't have temporary errors.
But well, in practice, module loading is typically the time when process-wide initializations are done (modifying sys.path, os.environ, instantiating connection or thread pools, registering atexit handlers, starting maintenance threads...), so that case has a chance of happening at some moment or another, especially if accesses to the filesystem or network (SQL...) are done at module loading, due to the lack of an initialization system at upper levels.

That's why I propose modifying the behaviour of module import, so that submodules are deleted as well when a parent module import fails. True, it means they will be reloaded as well when importing the parent starts again, but we already have a double-execution problem with the reloading of the parent module anyway, so it shouldn't make a big difference. The only other solution I'd see would be to SYSTEMATICALLY perform name (re)binding when processing a from...import statement, to recover from the previously failed initialization. Dunno if it's a good idea.

On a (separate but related) topic, to be safer on module reimports or reloads, it could be interesting to add some idempotency to common initialization tasks; for example, for the atexit registration system, wouldn't it be worth adding a boolean flag to explicitly skip registration if a callable with the same fully qualified name is already registered? Do you have opinions on these subjects?

Back on topic... As I stated on the issue, I think purging the whole subtree when a package implicitly imports child modules is the least bad of the available options, and better than leaving the child modules in place in violation of the "all parent packages can be assumed to be in sys.modules" invariant (which is what we do now). Cheers, Nick.
thanks, regards, Pascal
[Python-Dev] Solving the import-deadlock case
Hello everyone, I'd like to bring your attention to this issue, since it touches the fundamentals of python's import workflow: http://bugs.python.org/issue17716 (I've tried to post it on the python-import ML for weeks, but it must still be blocked somewhere in a moderation queue, so here I come ^^)

TL;DR version: because of the way import currently works, if importing a package temporarily fails whereas importing one of its children succeeded, we reach an unusable state: all subsequent attempts at importing that package will fail if a from...import is used somewhere. Typically, it leaves a web worker broken, even though the typical behaviour of such a process would be to retry loading the failing view, again and again.

I agree that module loading should be, as much as possible, side-effect free, and thus shouldn't have temporary errors. But well, in practice, module loading is typically the time when process-wide initializations are done (modifying sys.path, os.environ, instantiating connection or thread pools, registering atexit handlers, starting maintenance threads...), so that case has a chance of happening at some moment or another, especially if accesses to the filesystem or network (SQL...) are done at module loading, due to the lack of an initialization system at upper levels.

That's why I propose modifying the behaviour of module import, so that submodules are deleted as well when a parent module import fails. True, it means they will be reloaded as well when importing the parent starts again, but we already have a double-execution problem with the reloading of the parent module anyway, so it shouldn't make a big difference. The only other solution I'd see would be to SYSTEMATICALLY perform name (re)binding when processing a from...import statement, to recover from the previously failed initialization. Dunno if it's a good idea.
On a (separate but related) topic, to be safer on module reimports or reloads, it could be interesting to add some idempotency to common initialization tasks; for example, for the atexit registration system, wouldn't it be worth adding a boolean flag to explicitly skip registration if a callable with the same fully qualified name is already registered? Do you have opinions on these subjects?

thanks, regards, Pascal
Re: [Python-Dev] Attribute lookup ambiguity
Greg Ewing a écrit :

Pascal Chambon wrote: I don't follow you there - in my mind, the default __getattribute__ could simply have wrapped all its operations inside some kind of try...except AttributeError mechanism, and thus been able to fall back to __getattr__ anyway.

But then it would be incorrect to say that __getattribute__ raises an exception. When we say that a function raises an exception, we normally mean that the exception propagates out of the function and can be seen by the caller, not that it was raised and caught somewhere inside the function.

Indeed, but I've never run into any doc mentioning that the default __getattribute__ raised an exception instead of forwarding to __getattr__ by itself. All I've found is "If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError"; that sentence simply offers two alternatives for the behaviour of customized __getattribute__ methods, without giving any hint about the behaviour that was chosen when implementing object.__getattribute__. Or am I missing some other doc which I'm supposed to know? "In the face of ambiguity, refuse the temptation to guess", as we say anyway, so I propose we patch the doc to clarify this point for newcomers ^^

Regards, Pascal
Re: [Python-Dev] Attribute lookup ambiguity
Michael Foord a écrit :

On 20/03/2010 12:00, Pascal Chambon wrote: But the point which for me is still unclear is: does the default implementation of __getattribute__ (the one of object) call __getattr__ by itself, or does it rely on its caller for that, by raising an AttributeError? For Python 2, it's blatantly the latter case which is favoured, but since it looks like an implementation detail at the moment, I propose we settle it (and document it) once and for all.

Ah right, my apologies. So it is still documented behaviour - __getattr__ is obviously called by the Python runtime and not by __getattribute__. (It isn't just by getattr, as the same behaviour is shown when doing a normal attribute lookup and not via the getattr function.)

I really don't see the docs you're referring to; until I tested myself, I think I had no obvious reason to guess that __getattribute__ relied on the upper-level caller instead of finishing the hard job itself.

Nick Coghlan a écrit :

Michael Foord wrote: Well, the documentation you pointed to specifies that __getattr__ will be called if __getattribute__ raises an AttributeError, it just doesn't specify that it is done by object.__getattribute__ (which it isn't).

And as for why not: because __getattribute__ implementations need to be able to call object.__getattribute__ without triggering the fallback behaviour. Cheers, Nick.

I guess there are cases in which it is beneficial indeed.

Michael Foord wrote: Well, the documentation you pointed to specifies that __getattr__ will be called if __getattribute__ raises an AttributeError, it just doesn't specify that it is done by object.__getattribute__ (which it isn't).

If __getattribute__ raises an exception, it won't get a chance to do anything else, so something outside of __getattribute__ must catch the AttributeError and call __getattr__. So I think the docs *are* specifying the behaviour here, if only by implication.
I don't follow you there - in my mind, the default __getattribute__ could simply have wrapped all its operations inside some kind of try...except AttributeError mechanism, and thus been able to fall back to __getattr__ anyway.

If I sum it up properly, the semantics are:
- A.obj and getattr(A, "obj") are exactly the same.
- They trigger the calling of __getattribute__ on the object (or its python core equivalent).
- By default, this __getattribute__ browses the whole object hierarchy according to well-known rules (__dict__, type, type's ancestors...), handling descriptor protocols and the like. But it doesn't fall back to __getattr__ - it raises an AttributeError instead.
- getattr() falls back to __getattr__ if __getattribute__ fails.
- customized __getattribute__ methods have the choice between calling __getattr__ by themselves, or delegating it to getattr() by raising an exception.

Wouldn't it be worth completing the doc with these points? They really didn't seem obvious to me at first (even though, after analysis, some behaviours make more sense than others). I might submit a patch.

regards, Pascal
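The semantics summed up above can be checked directly; this snippet (Python 3 syntax) shows object.__getattribute__ raising on its own, while the surrounding attribute-lookup machinery performs the __getattr__ fallback:

```python
class A:
    def __getattr__(self, name):
        # fallback, reached only after normal lookup fails
        return "hello from __getattr__"

a = A()

# object.__getattribute__ does NOT call __getattr__: it raises
try:
    object.__getattribute__(a, "missing")
except AttributeError:
    print("__getattribute__ raised AttributeError")

# ...while getattr() / a.missing catch that error and fall back
print(getattr(a, "missing"))
print(a.missing)
```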
Re: [Python-Dev] Attribute lookup ambiguity
Michael Foord a écrit :

On 19/03/2010 18:58, Pascal Chambon wrote: Hello, I've already crossed a bunch of articles detailing python's attribute lookup semantics (__dict__, descriptors, order of base class traversal...), but I have never seen, so far, an explanation of WHICH method did what, exactly. I assumed that getattr(a, "b") was the same as a.__getattribute__("b"), and that this __getattribute__ method (or the hidden routine replacing it when we don't override it in our class) was in charge of doing the whole job of traversing the object tree, checking descriptors, binding methods, calling __getattr__ on failure etc. However, the test case below shows that __getattribute__ does NOT call __getattr__ on failure. So it seems it's an upper-level machinery, in getattr(), which is in charge of that last action.

Python 3 has the behavior you are asking for. It would be a backwards incompatible change to do it in Python 2 as __getattribute__ *not* calling __getattr__ is the documented behaviour.

Python 3.2a0 (py3k:78770, Mar 7 2010, 20:32:50) [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin

>>> class x:
...     def __getattribute__(s, name):
...         print('__getattribute__', name)
...         raise AttributeError
...     def __getattr__(s, name):
...         print('__getattr__', name)
...
>>> a = x()
>>> a.b
__getattribute__ b
__getattr__ b

I'm confused there, because the script you gave behaves the same in python 2.6. And according to the doc, it's normal: getattr() reacts to an AttributeError from __getattribute__ by calling __getattr__:

Python 2.6.5 documentation: object.__getattribute__(self, name) Called unconditionally to implement attribute accesses for instances of the class.
If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError. This method should return the (computed) attribute value or raise an AttributeError exception. In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

But the point which for me is still unclear is: does the default implementation of __getattribute__ (the one of object) call __getattr__ by itself, or does it rely on its caller for that, by raising an AttributeError? For Python 2, it's blatantly the latter case which is favoured, but since it looks like an implementation detail at the moment, I propose we settle it (and document it) once and for all.

This list is not really an appropriate place to ask questions like this though, comp.lang.python would be better. All the best, Michael Foord

Sorry if I misposted; I just (wrongly?) assumed that it was more of an undecided, implementation-specific point (since the doc gave possible behaviours for __getattribute__, without specifying which one was the default), and thus targeted the hands-in-core-code audience only. Regards, Pascal
[Python-Dev] Attribute lookup ambiguity
Hello, I've already crossed a bunch of articles detailing python's attribute lookup semantics (__dict__, descriptors, order of base class traversal...), but I have never seen, so far, an explanation of WHICH method did what, exactly. I assumed that getattr(a, "b") was the same as a.__getattribute__("b"), and that this __getattribute__ method (or the hidden routine replacing it when we don't override it in our class) was in charge of doing the whole job of traversing the object tree, checking descriptors, binding methods, calling __getattr__ on failure etc. However, the test case below shows that __getattribute__ does NOT call __getattr__ on failure. So it seems it's an upper-level machinery, in getattr(), which is in charge of that last action.

Is that on purpose? Considering that __getattribute__ (at least, object.__getattribute__) does 90% of the hard job, why are those 10% left out? Can we find somewhere the details of who must do what when customizing attribute access? Shouldn't we inform people about the fact that __getattribute__ isn't sufficient in itself to look up an attribute?

Thanks for the attention, regards, Pascal

=== INPUT ===

class A(object):
    def __getattribute__(self, name):
        print "A getattribute", name
        return object.__getattribute__(self, name)
    def __getattr__(self, name):
        print "A getattr", name
        return "hello A"

class B(A):
    def __getattribute__(self, name):
        print "B getattribute", name
        return A.__getattribute__(self, name)
    def __getattr__(self, name):
        print "B getattr", name
        return "hello B"

print A().obj
print "---"
print B().obj
print "---"
print getattr(B(), "obj")
print "-"
print object.__getattribute__(B(), "obj")  # DOES NOT CALL __getattr__() !!!
=== OUTPUT ===

A getattribute obj
A getattr obj
hello A
---
B getattribute obj
A getattribute obj
B getattr obj
hello B
---
B getattribute obj
A getattribute obj
B getattr obj
hello B
-
Traceback (most recent call last):
  File "C:\Users\Pakal\Desktop\test_object_model.py", line 34, in <module>
    print object.__getattribute__(B(), "obj")  # DOES NOT CALL __getattr__() !!!???
AttributeError: 'B' object has no attribute 'obj'
Re: [Python-Dev] Buffered streams design + raw io gotchas
Allright, so in the case of regular files I may content myself with BufferedRandom. And maybe I'll put some warnings concerning the returning of raw streams by factory functions. Thanks, Regards, Pascal

Guido van Rossum a écrit : IIRC here is the use case for buffered reader/writer vs. random: a disk file opened for reading and writing uses a random access buffer; but a TCP stream, while both writable and readable, should use separate read and write buffers. The reader and writer don't have to worry about reversing the I/O direction. But maybe I'm missing something about your question? --Guido

On Thu, Feb 18, 2010 at 1:59 PM, Pascal Chambon chambon.pas...@gmail.com wrote:

Hello, As I continue experimenting with advanced streams, I'm currently beginning an important modification of io's Buffered and Text streams (removal of locks, addition of methods...), to fit the optimization process of the whole library. However, I'm now wondering what the idea is behind the 3 main buffer classes: BufferedWriter, BufferedReader and BufferedRandom. The I/O PEP claimed that the two first ones were for sequential streams only, and the latter for all kinds of seekable streams; but as it is implemented, actually all 3 classes can be returned by open() for seekable files. Am I missing some use case in which this distinction would be useful (for optimizations?)? Else, I guess I should just create a RSBufferedStream class which handles all kinds of situations, raising UnsupportedOperation exceptions whenever needed; after all, text streams act that way (there is no TextWriter or TextReader stream), and they seem fine.

Also, io.open() might return a raw file stream when we set buffering=0. The problem is that raw file streams are NOT like buffered streams with a buffer limit of zero: raw streams might fail to write/read all the data asked, without raising errors. I agree this case should be rare, but it might be a gotcha for people wanting direct control of the stream (e.g.
for locking purposes), but no silently incomplete read/write operation. Shouldn't we rather return a write-through buffered stream in this case (buffering=0), to cleanly handle partial read/write ops?

regards, Pascal

PS: if you have 3 minutes, I'd be very interested in your opinion on the advanced modes draft below. Does it seem intuitive to you? In particular, shouldn't the + and - flags have the opposite meaning? http://bytebucket.org/pchambon/python-rock-solid-tools/wiki/rsopen.html
[Python-Dev] Buffered streams design + raw io gotchas
Hello, As I continue experimenting with advanced streams, I'm currently beginning an important modification of io's Buffered and Text streams (removal of locks, addition of methods...), to fit the optimization process of the whole library. However, I'm now wondering what the idea is behind the 3 main buffer classes: BufferedWriter, BufferedReader and BufferedRandom. The I/O PEP claimed that the two first ones were for sequential streams only, and the latter for all kinds of seekable streams; but as it is implemented, actually all 3 classes can be returned by open() for seekable files. Am I missing some use case in which this distinction would be useful (for optimizations?)? Else, I guess I should just create a RSBufferedStream class which handles all kinds of situations, raising UnsupportedOperation exceptions whenever needed; after all, text streams act that way (there is no TextWriter or TextReader stream), and they seem fine.

Also, io.open() might return a raw file stream when we set buffering=0. The problem is that raw file streams are NOT like buffered streams with a buffer limit of zero: raw streams might fail to write/read all the data asked, without raising errors. I agree this case should be rare, but it might be a gotcha for people wanting direct control of the stream (e.g. for locking purposes), but no silently incomplete read/write operation. Shouldn't we rather return a write-through buffered stream in this case (buffering=0), to cleanly handle partial read/write ops?

regards, Pascal

PS: if you have 3 minutes, I'd be very interested in your opinion on the advanced modes draft below. Does it seem intuitive to you? In particular, shouldn't the + and - flags have the opposite meaning? http://bytebucket.org/pchambon/python-rock-solid-tools/wiki/rsopen.html
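The buffering=0 behaviour described above is easy to observe: open() then returns a raw FileIO object, whose write() maps to (at most) one system call and reports short writes through its return value rather than by raising. A small sketch using a throwaway temp file:

```python
import io
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.bin")

# buffering=0 yields the raw stream itself, not a buffered wrapper
f = open(path, "wb", buffering=0)
assert isinstance(f, io.FileIO)

# raw write() returns the number of bytes actually written; a short
# count is NOT an error here, so the caller must check it (and loop)
data = b"hello raw io"
written = f.write(data)
f.close()

# the default buffered stream hides this: it loops internally and
# either consumes all of `data` or raises
g = open(path, "wb")
assert isinstance(g, io.BufferedWriter)
g.write(data)
g.close()

os.remove(path)
os.rmdir(tmpdir)
```

On a regular disk file the raw write will normally be complete, but on pipes, sockets or signal-interrupted calls it may be partial, which is exactly the gotcha discussed above.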
Re: [Python-Dev] Forking and Multithreading - enemy brothers
Hello, some update about the spawnl() thingy; I've adapted the win32 code to have a new unix Popen object, which works with spawn() semantics. It's quite straightforward, and the multiprocessing call of a python function works OK. But I've run into some trouble: synchronization primitives.

Win32 semaphores can be teleported to another process via the DuplicateHandle() call. But unix named semaphores don't work that way - instead, they must be opened with the same name by each spawned subprocess. The problem is, the current semaphore C code is optimized to forbid semaphore sharing (other than via fork): use of (O_EXCL|O_CREAT) on opening, immediate unlinking of new semaphores. So if we want to benefit from sync primitives with this spawn() implementation, we need a working named semaphore implementation, too...

What's the best in your opinion? Editing the current multiprocessing semaphore's behaviour to allow (with specific options, attributes and methods) its use in this case? Or adding a new NamedSemaphore type like this one? http://semanchuk.com/philip/posix_ipc/

Regards, Pascal
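For context, today's multiprocessing does ship a "spawn" start method, and its synchronization primitives can be handed to a child at Process creation time (they are reconstructed in the spawned process as part of the startup handshake). A minimal sketch of what the discussion above was working toward:

```python
import multiprocessing as mp

def worker(sem, q):
    # the semaphore passed at Process creation is usable in the
    # spawned child: it is rebuilt there, not inherited via fork
    with sem:
        q.put("acquired in child")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    sem = ctx.Semaphore(1)
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(sem, q))
    p.start()
    msg = q.get(timeout=30)
    p.join()
    print(msg)
```

Note that the `if __name__ == "__main__"` guard is mandatory under spawn, since the child re-imports the main module.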
Re: [Python-Dev] IO module improvements
Antoine Pitrou a écrit : What is the difference between a file handle and a regular C file descriptor? Is it some Windows-specific thing? If so, then perhaps it deserves some Windows-specific attribute (handle?).

At the moment it's windows-specific, but it's not impossible that some other OSes also rely on specific file handles (only emulating C file descriptors for compatibility). I've indeed mirrored the fileno concept, with a handle argument for constructors, and a handle() getter.

On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou solip...@pitrou.net wrote: Pascal Chambon pythoniks at gmail.com writes: By the way, I'm having trouble with the name attribute of raw files, which can be a string or an integer (confusing), ambiguous if containing a relative path,

Why is it ambiguous? It sounds like you're using str() of the name and then can't tell whether the file is named e.g. '1' or whether it refers to file descriptor 1 (i.e. sys.stdout).

As Jean-Paul mentioned, I find confusing the fact that it can be a relative path, and sometimes not a path at all. I'm pretty sure many programmers haven't even cared in their library code that it could be a non-string, using concatenation etc. on it... However I guess the history is so strong on it that I'll have to conform to this semantic, putting all paths/fileno/handle in the same name property, and adding an origin property telling how to interpret the name...

Methods too would deserve some auto-forwarding. If you want to bufferize a raw stream which also offers size(), times(), lock_file() and other methods, how can these be accessed from a top-level buffering/text stream?

I think it's a bad idea. If you forget to implement one of the standard IO methods (e.g. seek()), it will get forwarded to the raw stream, but with the wrong semantics (because it won't take buffering into account). It's better to require the implementor to do the forwarding explicitly if desired, IMO.
The problem is, doing that forwarding is quite complicated. IO is a collection of core tools for working with streams, but it's currently not flexible enough to let people customize them too... For example, if I want to add a new series of methods to all standard streams, which simply forward calls to new raw stream features, what do I do? Monkey-patch base classes (RawIOBase, BufferedIOBase...)? Not a good pattern. Subclass FileIO+BufferedWriter+BufferedReader+BufferedRandom+TextIOWrapper? That's really redundant...

And there are especially flaws around BufferedRandom. This stream inherits from BufferedWriter and BufferedReader, and overrides some methods. How do I extend it? I'd want to reuse its methods, but then have it forward calls to MY buffered classes, not the original BufferedWriter or BufferedReader classes. Should I modify its __bases__ to edit the inheritance tree? Handy, but not a good pattern... I'm currently getting what I want with a triple inheritance (praying for the MRO to be as I expect), but it's really not straightforward. Having BufferedRandom as an additional layer would slow down the system, but would allow its reuse with custom buffered writers and readers...

- I feel thread-safety locking and stream status checking are currently overly complicated. All methods are filled with locking calls and checkClosed() calls, which is both a performance loss (most io streams will have 3 such levels of locking, when 1 would suffice)

FileIO objects don't have a lock, so there are 2 levels of locking at worst, not 3 (and, actually, TextIOWrapper doesn't have a lock either, although perhaps it should). As for the checkClosed() calls, they are probably cheap, especially if they bypass regular attribute lookup.

checkClosed calls are cheap, but they can easily be forgotten in one of the dozens of methods involved...
My own FileIO class alas needs locking, because for example, on Windows, truncating a file means seeking + setting the end of file + restoring the pointer. And TextIOWrapper seems to deserve locks too. Maybe excerpts like this one really are thread-safe, but a long study would be required to ensure it.

if whence == 2:  # seek relative to end of file
    if cookie != 0:
        raise IOError("can't do nonzero end-relative seeks")
    self.flush()
    position = self.buffer.seek(0, 2)
    self._set_decoded_chars('')
    self._snapshot = None
    if self._decoder:
        self._decoder.reset()
    return position

Since we're anyway in a mood of nesting streams, why not simply add a safety stream on top of each stream chain returned by open()? That layer could gracefully handle mutex locking, checkClosed() calls, and even, maybe, the attribute/method forwarding I evoked above. It's an interesting idea, but it could also end up slower than the current situation. First because you are adding a level
[Python-Dev] IO module improvements
Hello The new modular io system of Python is awesome, but I'm running into some of its limits currently, while replacing the raw FileIO with a more advanced stream. So here are a few ideas and questions regarding the mechanisms of this IO system. Note that I'm speaking in Python terms, but these ideas should also apply to the C implementation (with more programming hassle of course). - Some streams have specific attributes (i.e. mode, name...), but since they'll often be wrapped inside buffering or encoding streams, these attributes will not be available to the end user. So wouldn't it be great to implement some transversal inheritance, simply by delegating to the underlying buffer/raw stream those attribute retrievals which fail on the current stream? A little __getattr__ should do it fine, shouldn't it? By the way, I'm having trouble with the name attribute of raw files, which can be a string or an integer (confusing), ambiguous if it contains a relative path, and which isn't able to handle the new case of my library, i.e. opening a file from an existing file handle (which is ALSO an integer, like C file descriptors...); I propose we deprecate it for the benefit of more precise attributes, like path (absolute path) and origin (which can be path, fileno, handle and can be extended...). Methods too would deserve some auto-forwarding. If you want to buffer a raw stream which also offers size(), times(), lock_file() and other methods, how can these be accessed from a top-level buffering/text stream? So it would be interesting to have a system through which a stream can expose its additional features to top-level streams, and at the same time tell these whether they must flush() or not before calling these new methods (e.g. asking the inode number of a file doesn't require flushing, but knowing its real size DOES require it). - I feel thread-safety locking and stream status checking are currently overly complicated.
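The "little __getattr__" delegation suggested here can be sketched like this; the extra inode() feature and the class names are purely illustrative, not part of the io module:

```python
import io
import os
import tempfile

# Hypothetical raw stream exposing an extra feature (not in the stdlib).
class RawWithExtras(io.FileIO):
    def inode(self):
        return os.fstat(self.fileno()).st_ino

# A wrapper whose failed attribute lookups fall back to the raw stream.
class ForwardingBuffer(io.BufferedReader):
    def __getattr__(self, name):
        # Only called when normal lookup on the wrapper fails.
        return getattr(self.raw, name)

fd, path = tempfile.mkstemp()
os.close(fd)
buf = ForwardingBuffer(RawWithExtras(path, "r"))
print(buf.inode())  # resolved on the raw stream via __getattr__
buf.close()
os.remove(path)
```

Note this sketch exhibits exactly the danger Antoine points out: any standard method missing on the wrapper would be silently forwarded with buffering-unaware semantics.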
All methods are filled with locking calls and checkClosed() calls, which is both a performance loss (most io streams will have 3 such levels of locking, when 1 would suffice) and error-prone (some time ago I saw in the sources several functions in which checks and locks seemed to be lacking). Since we're anyway in a mood of nesting streams, why not simply add a safety stream on top of each stream chain returned by open()? That layer could gracefully handle mutex locking, checkClosed() calls, and even, maybe, the attribute/method forwarding I evoked above. I know a pure metaprogramming solution would maybe not suffice for performance-seekers, but static implementations should be doable as well. - Some semantic decisions of the current system are somewhat dangerous. For example, flushing errors occurring on close are swallowed. It seems to me that it's of the utmost importance that the user be warned if the bytes he wrote disappeared before reaching the kernel; shouldn't we decidedly enforce a "don't hide errors" policy everywhere in the io module? Regards, Pascal ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
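The "safety stream" layer proposed in this message could be sketched roughly as below: a single wrapper that takes one lock and performs the closed-stream check once per call, instead of every layer repeating both. Everything here is illustrative, not an io module API:

```python
import io
import threading

class SafetyLayer:
    """Illustrative wrapper: one lock + one closed-check per call."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.RLock()

    def __getattr__(self, name):
        attr = getattr(self._stream, name)
        if not callable(attr):
            return attr  # plain attributes pass through unguarded
        def guarded(*args, **kwargs):
            with self._lock:
                if self._stream.closed:
                    raise ValueError("I/O operation on closed stream")
                return attr(*args, **kwargs)
        return guarded

stream = SafetyLayer(io.BytesIO(b"hello"))
print(stream.read(2))  # b'he'
```

As Antoine replies, this adds an indirection level of its own, so it is not obviously a performance win.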
Re: [Python-Dev] Forking and Multithreading - enemy brothers
Matt Knox wrote: Jesse Noller jnoller at gmail.com writes: We already have an implementation that spawns a subprocess and then pushes the required state to the child. The fundamental need for things to be pickleable *all the time* kinda makes it annoying to work with. Just a lurker here... but this topic hits home with me so I thought I'd chime in. I'm a Windows user and I would *love* to use multiprocessing a lot more because *in theory* it solves a lot of the problems I deal with very nicely (lots of financial data number crunching). However, the pickling requirement makes it very very difficult to actually get any reasonably complex code to work properly with it. A lot of the time the functions I want to call in the spawned processes are actually fairly self-contained and don't need most of the environment of the parent process shoved into them, so it's annoying that it fails because some data I don't even need in the child process can't be pickled. What about having an option to skip all the parent environment data pickling and require the user to manually invoke any imports that are needed in the target functions as the first step inside their target function? For example... def target_function(object_from_module_xyz): import xyz return object_from_module_xyz.do_something() ...and if I forgot to import all the stuff necessary for the arguments being passed into my function to work, then it's my own problem. Although maybe there is some obvious problem with this that I am not seeing. Anyway, just food for thought. - Matt Hello I don't really get it there... it seems to me that multiprocessing only requires picklability for the objects it needs to transfer, i.e. those given as arguments to the called function, and those put into multiprocessing queues/pipes. Global program data needn't be picklable - on Windows it gets wholly recreated by the child process, from Python bytecode.
So if you're having pickle errors, it must be because object_from_module_xyz itself is *not* picklable, maybe because it contains references to unpicklable objects. In such a case, properly implementing the pickle magic methods on the object should do it, shouldn't it? Regards, Pascal
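The "pickle magic methods" fix Pascal suggests can be sketched as follows: an object carrying an unpicklable member (here a lock, a classic offender) drops it from its pickled state and recreates it on unpickling. The Crunchable class is purely illustrative:

```python
import pickle
import threading

class Crunchable:
    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()   # locks are not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]              # drop the unpicklable part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()   # recreate it on unpickling

clone = pickle.loads(pickle.dumps(Crunchable([1, 2, 3])))
print(clone.data)  # [1, 2, 3]
```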
Re: [Python-Dev] Forking and Multithreading - enemy brothers
Although I would be in favor of an atfork callback registration system (similar to atexit), it seems there is no way to solve the fork() problem automatically with this. Any attempt to acquire/release locks automatically will lead to deadlocks, as it is necessary to know the exact program workflow to take locks in the right order. I guess the spawnl semantic (i.e., like win32's CreateProcess()) can't become the default multiprocessing behaviour, as too many programs implicitly rely on the wholesale sharing of data under unix (and py3k itself is maybe becoming a little too mature for new compatibility breaks); but well, as long as there are options to enforce this behaviour, it should be fine for everyone. I'm quite busy with other libraries at the moment, but I'll study the integration of spawnl into the multiprocessing package during the coming weeks. B-) Regards, Pascal
Re: [Python-Dev] Forking and Multithreading - enemy brothers
The word dogma is a good one in this context however. We ( ;-)) have accepted and promoted the dogma that multiprocessing is the solution to parallelism in the face of the GIL. While it needn't be applicable in any and every situation, we should make it so that it is applicable often enough. Again, wishing won't make it so: there is no sane way to mix threading and fork-without-exec except by keeping the parent process single-threaded until after any fork() calls. Some applications may seem to work when violating this rule, but their developers are doomed to hair loss over time. You pointed it out: fork() was not designed to work together with multithreading; furthermore, in many cases its data-duplication semantic is absolutely unneeded to solve the real problem. So we can leave fork-without-exec multiprocessing (with or without threads) for those who need it, and offer safer multiprocessing for those who just seek ease of use and portability - via spawn() semantics. Regards, Pascal
Re: [Python-Dev] Forking and Multithreading - enemy brothers
So, if a patch were proposed for multiprocessing, allowing a unified, thread-safe spawnl semantic, do you think anything could prevent its integration? We may ignore the subprocess module, since fork+exec shouldn't be bothered by the (potentially disastrous) state of child process data. But it bothers me to think multithreading and multiprocessing are currently opposed whereas theoretically nothing justifies it... Regards, Pascal
Re: [Python-Dev] Forking and Multithreading - enemy brothers
/[...] What dangers do you refer to specifically? Something reproducible? -L / Since it's a race condition issue, it's not easily reproducible with normal libraries - which only take threading locks for short moments. But it can appear if your threads make heavy use of the threading module. By forking at a random moment, you have a chance that the main locks of the logging module get frozen in an acquired state (even though their owner threads do not exist in the child process), and your next attempt to use logging will result in a pretty deadlock (on some *nix platforms, at least). This issue led to the creation of python-atfork, by the way. Stefan Behnel wrote: Stefan Behnel, 30.01.2010 07:36: Pascal Chambon, 29.01.2010 22:58: I've just recently realized the huge problems surrounding the mix of multithreading and fork() - i.e. that only the main thread actually survives the fork(), and that process data (in particular, synchronization primitives) can be left in a dangerously broken state by such forks in multithreaded programs. I would *never* have even tried that, but it doesn't surprise me that it works basically as expected. I found this as a quick intro: http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-09/0672.html ... and another interesting link that also describes exec() usage in this context. http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them Stefan Yep, these links sum it up quite well. But to me it's not a matter of trying to mix threads and fork - most people won't seek trouble on purpose. It's simply the fact that, in a multithreaded program (i.e., any program of some importance), multiprocessing modules will be impossible to use safely without a complex synchronization of all threads to prepare the underlying forking (and we know that using multiprocessing can be a serious benefit, for GIL/performance reasons).
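For the record, CPython did eventually grow an atfork registration mechanism much later: os.register_at_fork() (Python 3.7, POSIX only). A minimal sketch of the acquire-before / reinitialize-after pattern discussed here:

```python
import os

log = []

if hasattr(os, "register_at_fork"):  # POSIX-only API, Python 3.7+
    os.register_at_fork(
        before=lambda: log.append("before"),             # e.g. acquire known locks
        after_in_parent=lambda: log.append("in_parent"), # release them again
        after_in_child=lambda: log.append("in_child"),   # reinitialize them fresh
    )
    pid = os.fork()
    if pid == 0:
        # The child's own copy of log contains "before" + "in_child".
        os._exit(0)
    os.waitpid(pid, 0)
    print(log)  # parent sees: ['before', 'in_parent']
```

As the message above argues, this mechanism cannot fix the problem automatically; it only gives libraries like logging a hook to reinitialize their locks.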
Solutions to fork() issues clearly exist - just add a use_forking=yes attribute to subprocess functions, and users will be free to use the spawnl() semantic, which is already implemented on win32 platforms, and which gives full control over both threads and subprocesses. Honestly, I don't see how it would complicate things, except slightly for the programmer, who would have to edit the code to add spawnl() support (I might help on that). Regards, Pascal
[Python-Dev] Forking and Multithreading - enemy brothers
Hello, I've just recently realized the huge problems surrounding the mix of multithreading and fork() - i.e. that only the main thread actually survives the fork(), and that process data (in particular, synchronization primitives) can be left in a dangerously broken state by such forks in multithreaded programs. What bothers me most is that I've actually never seen, in the Python docs, any mention of these problems (the Linux docs are very discreet as well). It's as if multithreading and multiprocessing were orthogonal designs, whereas it can quickly happen that someone has a slightly multithreaded program, and suddenly uses the multiprocessing module to perform a separate, performance-demanding task; with disasters in store, since few people are blatantly aware of the underlying dangers... So here are a few propositions to improve this matter: * documenting the fork/multithreading danger, in the fork(), multiprocessing and maybe subprocess (is it concerned, or is fork+exec always safe?) modules. If it's welcome, I might provide documentation patches of course. * providing means of taming the fork() beast: is there a possibility for the inclusion of python-atfork and similar projects into the stdlib (I mean, their semantics, not the monkey-patch way they currently use)? It would also help a lot with the proper management of file handle inheritance. * maybe the most important: providing means to get rid of fork() whenever wanted. I'm especially thinking about the multiprocessing module: it seems it always uses forking on *nix platforms. Wouldn't it be better to also offer a spawnl() semantic, to allow safe multiprocessing use even in applications crowded with threads? Win32 already uses something like that, so all the infrastructure of data transfer is already there, and it would enforce cross-platform compatibility.
Since multiprocessing theoretically means low coupling and little sharing of data, I guess this kind of spawnl() semantic would be highly sufficient for most situations, which don't require fork-based multiprocessing and its huge sharing of process data (in my opinion, inheriting file descriptors is all a child process can require from its parent). Does it make sense to you? Regards, Pascal Chambon
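What this message asks for did land, much later: since Python 3.4, multiprocessing supports selectable start methods, including a fork-free "spawn" semantic on all platforms. A minimal sketch of using it explicitly:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # CreateProcess-like behaviour: no fork(); the child re-imports the
    # main module instead of inheriting the parent's whole address space.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

The __main__ guard matters with spawn: the child imports the main module, and without the guard it would recursively spawn children.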
Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition V2
Antoine Pitrou wrote: Hello, So here is the proposed semantic, which matches established conventions: *IOBase.truncate(n: int = None) -> int* [...] I still don't think there is a sufficient benefit in breaking compatibility. If you want the file pointer to remain the same, you can save it first and restore it afterwards manually. Sure, but won't this truncate become some kind of a burden for py3k, if it's twice misleading (it's not a real truncation since it can extend the file, and it's not even a truncation or resizing in posix/win32 style, since the file pointer is moved)? Since it was an undocumented behaviour, and py3k doesn't seem to be present yet in production environments (or is it?), I'd promote this late-but-maybe-not-too-late change. But if the consensus prefers the current behaviour, well, it'll be OK for me too, as long as it's sufficiently documented and advertised. *Propositions of doc update* Please open tracker issues for these kinds of suggestions. Is the tracker OK for simple suggestions too? I thought it was rather for obvious bugfixes, and that to-be-discussed propositions had better be in mailing-lists... OK then, I'll open bugtracker issues for these. B-) Instead of "than size", perhaps "than n". Whoops, indeed. Actually the signature would rather be: *IOBase.truncate(size: int = None) -> int* And I forgot to mention that truncate returns the new file size (according to the current PEP)... Should an exception be raised if start and/or end are out of range? I'd advocate it, yep, for the sake of explicit errors. However, should it be a ValueError (the one io functions normally use) or an IndexError (which is technically more accurate, but might confuse the user)? Regards, Pascal
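For what it's worth, the semantics argued for in this exchange are the ones CPython eventually adopted: today, truncate() resizes the stream without touching the file position. Demonstrated here with a BytesIO, used purely as a convenient in-memory stream:

```python
import io

f = io.BytesIO(b"abcdef")
f.seek(2)
new_size = f.truncate(4)   # resize to 4 bytes; returns the new size
print(new_size, f.tell())  # 4 2  -- the file pointer is unchanged
print(f.getvalue())        # b'abcd'
```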
Re: [Python-Dev] IO module precisions and exception hierarchy
+-InvalidFileNameError (filepath max lengths, or ? / : characters in a windows file name...) This might be a bit too precise. Unix just has EINVAL, which covers any kind of invalid parameter, not just file names. All right, thanks; an InvalidParameter (or similar) exception should do it better then. Personally I'd love to see a richer set of exceptions for IO errors, so long as they can be implemented for all supported platforms and no information (err number from the os) is lost. I've been implementing a fake 'file' type [1] for Silverlight which does IO operations using local browser storage. The use case is for an online Python tutorial running in the browser [2]. Whilst implementing the exception behaviour (writing to a file open in read mode, etc.) I considered improving the exception messages as they are very poor - but decided that being similar to CPython was more important. Michael [1] http://code.google.com/p/trypython/source/browse/trunk/trypython/app/storage.py and http://code.google.com/p/trypython/source/browse/trunk/trypython/app/tests/test_storage.py [2] http://www.trypython.org/ Cool stuff :-) It's indeed quite unsure at the moment which exceptions it will really be possible (and relevant) to implement in a cross-platform way... I guess I should use my own fileio implementation as a playground and a proof of concept, before we specify anything for CPython. What happens isn't specified, but in practice (with the current implementation) the overwriting will happen at the byte level, without any check for correctness at the character level. Actually, read+write text streams are implemented quite crudely, and little testing is done of them. The reason, as you discovered, is that the semantics are too weak, and it is not obvious what stronger semantics could look like. People wanting to do sophisticated random reads+writes over a text file should probably handle the encoding themselves and access the file at the binary level.
It sounds OK to me, as long as we notify users about this danger (I've myself just realized it). Most newcomers may happily open a UTF-8 text file, and read/write in it carelessly, without realizing that the characters they write actually screw up the file... How about just making IOError = OSError, and introducing your proposed subclasses? Does the usage of IOError vs OSError have *any* useful semantics? I thought that OSError dealt with a larger set of errors than IOError, but after checking the errno codes, it seems that they're all more or less related to IO problems (if we include interprocess communication in I/O). So theoretically, IOErrors and OSErrors might be merged. Note that in this case, WindowsErrors would have to become children of EnvironmentError, because Windows error codes really seem to go farther than io errors (they deal with recursion limits, thousands of PC parameters...). The legacy is so heavy that OSError would have to remain as is, I think, but we might simply forget it in new io modules, and concentrate on an IOError hierarchy to provide all the info needed by the developer. Some of the error messages are truly awful though as things stand, especially for someone new to Python. Try to read from a file handle opened in write mode for example: IOError: [Errno 9] Bad file descriptor Subdividing the IOError exception won't help with that, because all you have to go on when deciding which exception to raise is the error code returned by the OS. If the same error code results from a bunch of different things, there's not much Python can do to sort them out. Well, you don't only have the error number, you also have the context of this exception. IOError subclasses would particularly be useful in a high-level IO context, when each single method can issue lots of system calls (to check the file, lock it, edit it...).
If the error is raised during your locking operation, you can decide to report it as a LockingError even if the error code provided might appear in several different situations. Regards, Pascal
Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition V2
Hello Below is a corrected version of the PEP update, adding the start/end indexes proposition and fixing function signatures. Does anyone disagree with these specifications? Or can we consider it a target for the next versions of the io module? I would have no problem implementing this behaviour in my own pure-Python FileIO system; however, if someone is willing to patch the _fileio implementation, it'd save a lot of time - I most probably won't have the means to set up a C compilation environment under windows and linux, and properly update/test this, before January (when I go freelance...). I'll launch another thread on other to-be-discussed IO points B-) Regards, Pascal PEP UPDATE for new I/O system - v2 === **Truncate and file pointer semantics** Rationale: The current implementation of truncate() always moves the file pointer to the new end of file. This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize. However, there are several arguments against this semantic: * Most common standards (posix, win32...) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case * In many cases, moving the filepointer when truncating has no reason to happen (if we're extending the file, or reducing it without going beneath the file pointer) * Making 0 <= filepointer <= filesize a global rule of the python IO module doesn't seem possible, since it would require modifications of the semantics of other methods (e.g. seek() should raise exceptions or silently disobey when asked to move the filepointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you're writing?)
So here is the proposed semantic, which matches established conventions: *IOBase.truncate(n: int = None) -> int* Resizes the file to the size specified by the positive integer n, or to the current filepointer position if n is None. The file must be opened with write permissions. If the file was previously larger than n, the extra data is discarded. If the file was previously shorter than n, its size is increased, and the extended area appears as if it were zero-filled. In any case, the file pointer is left unchanged, and may point beyond the end of file. Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zeroing the gap. On rare platforms which don't support file pointers beyond the end of file, all these behaviours shall be faked thanks to internal storage of the wanted file pointer position (silently extending the file, if necessary, when a write operation occurs). *Propositions of doc update* *RawIOBase*.read(n: int) -> bytes Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None. *RawIOBase*.readinto(b: bytearray, [start: int = None], [end: int = None]) -> int start and end are used as slice indexes, so that the bytearray taken into account is actually range = b[start:end] (or b[start:], b[:end] or b[:], depending on the arguments which are not None). Read up to len(range) bytes from the object and store them in b, returning the number of bytes read. Like read(), fewer than len(range) bytes may be read, and 0 indicates end of file if len(range) is not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
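The start/end slicing proposed for readinto() can already be approximated today with a memoryview, which gives readinto() a writable window into the bytearray without copying (this is a sketch of the idea, not the proposed API itself):

```python
import io

raw = io.BytesIO(b"hello world")
buf = bytearray(11)
n = raw.readinto(memoryview(buf)[2:7])  # fill only buf[2:7]
print(n, bytes(buf))  # 5 b'\x00\x00hello\x00\x00\x00\x00'
```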
[Python-Dev] IO module precisions and exception hierarchy
Found in the current io PEP: Q: Do we want to mandate in the specification that switching between reading and writing on a read-write object implies a .flush()? Or is that an implementation convenience that users should not rely on? -> It seems that the only important matter is: file pointer positions and bytes/characters read should always be the ones that the user expects, as if there were no buffering. So flushing or not may stay a non-mandatory behaviour, as long as the buffered stream ensures this data integrity. E.g. if a user opens a file in r/w mode, writes two bytes in it (which stay buffered), and then reads 2 bytes, the two bytes read should be those in range [2:4] of course, even though the file pointer would, due to python buffering, still be at index 0. Q from me: What happens in read/write text files, when overwriting a three-byte character with a single-byte character? Or, on the contrary, when a single chinese character overrides 3 ASCII characters in a UTF-8 file? Is there any system designed to avoid this data corruption? Or should TextIO classes forbid read+write streams? IO Exceptions: Currently, the situation is kind of fuzzy around EnvironmentError subclasses. * OSError represents errors notified by the OS via errno.h error codes (as mirrored in the python errno module). errno.h errors (fewer than 125 error codes) seem to represent the whole of *nix system errors. However, Windows has many more system errors (15000+). So windows errors, when they can't be mapped to one of the errno errors, are raised as WindowsError instances (a subclass of OSError), with the special attribute winerror indicating the win32 error code. * IOErrors are errors raised because of I/O problems, but they use errno codes, like OSError. Thus, at the moment IOErrors rather have the semantics of a particular case of OSError, and it's kind of confusing to have them remain in their own separate tree...
Furthermore, OSErrors are often used where IOErrors would perfectly fit, e.g. in the low-level I/O functions of the os module. Since OSErrors and IOErrors are slightly mixed up when we deal with IO operations, maybe the easiest way to make it clearer would be to push already existing designs to their limits: - the os module should only raise OSErrors, whatever the os operation involved (maybe it's already the case in CPython, isn't it?) - the io module should only raise IOErrors and its subclasses, so that devs can easily take measures depending on the cause of the io failure (except for 1 OSError exception, it's already the case in _fileio) - other modules referring to i/o might maybe keep their current (fuzzy) behaviour, since they're more platform-specific, and should in the end be replaced by a cross-platform solution (at least I'd love it to happen) Until then, there would be no real benefit for the user, compared to catching EnvironmentErrors as most probably do. But the sweet thing would be to offer a concise but meaningful IOError hierarchy, so that we can easily handle most specific errors gracefully (having a disk full is not the same level of gravity as simply having another process locking your target file). Here is a very rough beginning of an IOError hierarchy. I'd like to have people's opinion on the relevance of these, as well as on what other exceptions should be distinguished from basic IOErrors. IOError +-InvalidStreamError (e.g. we try to write on a stream opened in readonly mode) +-LockingError +-PermissionError (mostly *nix chmod stuff) +-FileNotFoundError +-DiskFullError +-MaxFileSizeError (maybe hard to implement, happens when we exceed 4GB on fat32 and the like...) +-InvalidFileNameError (filepath max lengths, or ? / : characters in a windows file name...)
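As it turned out, this discussion fed into what later became PEP 3151 (Python 3.3): IOError was merged into OSError, and OS-level errors are now mapped automatically to subclasses such as FileNotFoundError and PermissionError, much as sketched in the hierarchy above. A quick check on any modern Python:

```python
import errno

print(IOError is OSError)  # True since Python 3.3

try:
    open("/nonexistent/path/for/demo")
except FileNotFoundError as exc:  # subclass chosen from errno by the interpreter
    print(type(exc).__name__, exc.errno == errno.ENOENT)
```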
Regards, Pascal
Re: [Python-Dev] Fuzziness in io module specs
Well, system compatibility argues strongly in favor of not letting the filepointer go past EOF. However, is it really necessary to move the pointer to EOF in ANY case? I mean, if I extend the file, or if I reduce it without going lower than my current filepointer, I really don't expect the io system to move my pointer to the end of file, just for fun. In these patterns, people would have to remember their current filepointer, to come back to where they were, and that's not pretty imo... If we agree on the simple mandatory expression 0 <= filepointer <= EOF (for cross-platform safety), then we just have to enforce it when the rule is broken: reducing the size lower than the filepointer, and seeking past the end of file. All other conditions should leave the filepointer where the user put it. Shouldn't it be so? Concerning the naming of truncate(), would it be possible to deprecate it and alias it to resize()? It's not very gratifying to have duplicated methods at the beginning of a major release, but I feel too that truncate is a misleading term, which had better be replaced asap. Regards, Pascal
Re: [Python-Dev] POSIX [Fuzziness in io module specs]
What we could do with is better platform-independent ways of distinguishing particular error conditions, such as file not found, out of space, etc., either using subclasses of IOError or mapping error codes to a set of platform-independent ones.

Well, mapping all errors (including C ones and Windows-specific ones) to a common set would be extremely useful for developers indeed. I guess some advanced Windows errors will never have equivalents elsewhere, but does anyone know an error code set which would be relevant to cover all memory, filesystem, I/O and locking aspects?

Regards, Pascal
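As a sketch of one possible direction (the category names and the mapping below are purely illustrative), the errno module already provides a portable, POSIX-derived error-code set that such a common classification could be built on:

```python
import errno

# Illustrative categories; the errno values are the portable codes
# that the C library and the os module already use.
_CATEGORIES = {
    "not_found":   {errno.ENOENT},
    "permission":  {errno.EACCES, errno.EPERM},
    "disk_full":   {errno.ENOSPC},
    "would_block": {errno.EAGAIN},
}

def classify(exc):
    """Map an environment error to a platform-independent category name."""
    for name, codes in _CATEGORIES.items():
        if exc.errno in codes:
            return name
    return "other"
```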
Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition
Hello,

After weighing up this and that, here is what I have come up with. Comments and issue notifications more than welcome, of course. The exception thingy is not yet addressed.

Regards, Pascal

*Truncate and file pointer semantics*

Rationale: The current implementation of truncate() always moves the file pointer to the new end of file. This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize. However, there are several arguments against this semantic:

* Most common standards (POSIX, win32...) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case
* In many cases, moving the file pointer when truncating has no reason to happen (if we're extending the file, or reducing it without going beneath the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the Python IO module doesn't seem possible, since it would require modifications of the semantics of other methods (e.g. seek() should raise exceptions or silently disobey when asked to move the file pointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you're writing?)

So here is the proposed semantic, which matches established conventions:

*RawIOBase.truncate(n: int = None) -> int*
*(same for BufferedIOBase.truncate(pos: int = None) -> int)*

Resizes the file to the size specified by the positive integer n, or to the current file pointer position if n is None. The file must be opened with write permissions. If the file was previously larger than n, the extra data is discarded. If the file was previously shorter than n, its size is increased, and the extended area appears as if it were zero-filled. In any case, the file pointer is left unchanged, and may point beyond the end of file.
Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zeroing the gap. On rare platforms which don't support file pointers beyond the end of file, all these behaviours shall be faked thanks to internal storage of the wanted file pointer position (silently extending the file, if necessary, when a write operation occurs).

*Proposition of doc update*

*RawIOBase*.read(n: int) -> bytes

Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.

*RawIOBase*.readinto(b: bytes) -> int

Read up to len(b) bytes from the object and store them in b, returning the number of bytes read. Like .read, fewer than len(b) bytes may be read, and 0 indicates end of file if len(b) was not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
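The note about reads and writes past the end of file can be checked directly, on a platform (e.g. Linux) that allows the file pointer to sit beyond EOF:

```python
import tempfile

# Raw, unbuffered binary stream.
with tempfile.TemporaryFile(buffering=0) as f:
    f.write(b"abc")                 # file is 3 bytes long
    f.seek(10)                      # seeking past EOF is allowed
    assert f.read(5) == b""         # reading past EOF yields an empty result
    f.write(b"xyz")                 # writing past EOF zero-fills the gap
    f.seek(0)
    assert f.read() == b"abc" + b"\x00" * 7 + b"xyz"
```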
Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition
Daniel Stutzbach wrote:

On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon chambon.pas...@gmail.com wrote: *RawIOBase*.readinto(b: bytes) -> int

bytes are immutable. The signature is: *RawIOBase*.readinto(b: bytearray) -> int

Your efforts in working on clarifying these important corner cases are appreciated. :-)

You're welcome B-)

Indeed my copy/paste of the current PEP was an epic fail - you'll all have recognized that readinto actually deals with bytearrays, contrary to what the current PEP tells - http://www.python.org/dev/peps/pep-3116/

RawIOBase.read(int) takes a positive-or-zero integer indeed (I am used to understanding "positive" this way, as opposed to "strictly positive").

Does MRAB's suggestion of providing beginning and end offsets for the bytearray meet people's expectations? Personally, I feel readinto is a very low-level method, mostly used by read() to get a result from low-level native functions (fread, ReadFile), and read() always provides a buffer with the proper size... are there cases in which these two additional arguments would provide some real gain?

Concerning the backward compatibility problem, I agree we should not break specifications, but breaking implementation details is another thing for me. It's a golden rule in the programmers' world: thou shalt NEVER rely on implementation details. Programs that count on these (e.g. thinking that listdir() will always return . and .. as first items... until it doesn't anymore) encounter huge problems when changing platform or API version. When programming with the current truncate(), I would always have moved the file pointer after truncating the file, simply because I have no idea of what might happen to it (nothing was documented on this at the moment, and looking at the sources is really not a sustainable behaviour).
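A small demonstration of the bytearray-based readinto() signature; it also shows how the begin/end offsets MRAB suggested can be emulated with a memoryview slice, without changing the method's signature:

```python
import io

stream = io.BytesIO(b"hello")
buf = bytearray(8)                  # pre-allocated mutable buffer

n = stream.readinto(buf)            # fills the buffer in place
assert n == 5                       # number of bytes actually read
assert buf == bytearray(b"hello\x00\x00\x00")
assert len(buf) == 8                # the buffer length is never changed

# Begin/end offsets without new arguments: pass a memoryview slice.
stream.seek(0)
n = stream.readinto(memoryview(buf)[2:6])
assert n == 4 and buf[2:6] == b"hell"
```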
So well, it's a pity if some early 3.1 users relied on it, but if we stick to the current semantics we still have a real coherency problem - seek() is not limited in range, and some experienced programmers might be trapped by this non-conventional truncate() if they rely on POSIX or previous Python versions... I really dislike the idea that truncate() might move my file offset even when there is no reason for it.

Regards, Pascal
Re: [Python-Dev] POSIX [Fuzziness in io module specs]
@pitrou: non-blocking IO in Python? Which ones are you thinking about? I currently have no plan to work on asynchronous IO like win32's ReadFileEx() etc. (too much trouble for the benefit), however I'd be interested in getting non-blocking operations on IPC pipes (I've crossed several people in trouble with that, having a process never end on some OSes because they couldn't stop threads blocked on pipes).

This reimplementation is actually necessary to get file locking, because advanced win32 operations only work on real file handles, not the handles that underlie the C API layer. Furthermore, some interesting features (like O_EXCL | O_CREAT) are not possible with the current io implementations. So well, reimplementation required ^^

Else, alright, I'll try to summarize the various points in a PEP update. Concerning the truncate method however, on second thought I feel we might take distance from the POSIX API for naming, precisely because it's anyway too platform-specific (Windows knows nothing about POSIX, and even common unix-like systems modify it in one way or another - several systems don't zero-fill files when extending them). When seeing truncate, in my opinion, most people will think it's only for reducing the file size (for beginners), or will immediately get in mind all the tips of posix-like systems (for more experienced developers). Shouldn't we, like other cross-platform APIs, use a more unambiguous notion, like setLength (Java) or resize (Qt)? And leave the file pointer untouched, simply because there is no reason to move it, especially when extending the file (yep, on Windows we're forced to move the pointer, but it's easy to fix)? If it's too late to modify the IO API, too bad, but I don't feel comfortable with the truncate word. And I don't like the fact that we move the file pointer to prevent it from exceeding the file size, whereas on the other hand we can seek() anywhere without getting exceptions (and so set the file pointer past the end of file).
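For reference, the O_EXCL | O_CREAT combination mentioned above is the atomic create-or-fail open; os.open exposes it directly (the io module only gained an equivalent 'x' open mode later, in Python 3.3):

```python
import errno
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "exclusive.bin")

# First creation succeeds, and atomically: there is no window in which
# another process could sneak in between an existence check and creation.
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
os.close(fd)

# A second O_EXCL open must fail, since the file now exists.
try:
    os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    exclusive_failed = False
except OSError as e:
    exclusive_failed = (e.errno == errno.EEXIST)
assert exclusive_failed
```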
Having 0 <= filepointer <= EOF is OK to me, but then we have to enforce it for all functions, not just truncate.

Concerning exceptions, which one is raised is not so important to me, as long as it's well documented and not tricky (e.g. WindowsErrors are OK to me, because they subclass OSError, so most cross-platform programs won't even have to know about them). I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes. This semantic would be perfect for me, and it's already 95% here; we would just have to fix some unwelcome OSError exceptions in the io module. Isn't that worth it? It'd simplify programmers' job a lot, and allow a more subtle treatment of exceptions (if everyone just catches EnvironmentErrors, without being sure of which subclass is actually raised, we miss the point of IOError and OSError).

Regards, Pascal

James Y Knight wrote: On Sep 18, 2009, at 8:58 PM, Antoine Pitrou wrote: I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.

The POSIX specs are quite easily accessible, without payment. I got my quote by doing: man 3p ftruncate. I had previously done: apt-get install manpages-posix-dev to install the POSIX manpages. That package contains the POSIX standard as of 2003, which is good enough for most uses.
It seems to be available here, if you don't have a debian system: http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/ There's also a webpage, containing the official POSIX 2008 standard: http://www.opengroup.org/onlinepubs/9699919799/ And to navigate to ftruncate from there, click System Interfaces in the left pane, System Interfaces in the bottom pane, and then ftruncate in the bottom pane. James
Re: [Python-Dev] POSIX [Fuzziness in io module specs]
Good example with os.write(f.fileno(), 'blah') - and you obtain the same error if you try to open an io.FileIO by providing a file descriptor instead of a file name as first argument. This would really deserve a unification.

Actually, since Windows error codes concern any possible error (IO, file permissions, memory problems...), I thought the best would be to convert them to the most appropriate Python standard exception, only defaulting to WindowsError (i.e. OSError's hierarchy) when no other exception type matches. So at the moment, I use a decorator to automatically convert all errors on stream operations into IOErrors. Error codes are not the same as unix ones indeed, but I don't know if it's really important (imo, most people just want to know if the operation was successful; I don't know if many developers scan error codes to act accordingly). For IOError types that really matter (e.g. file already locked, buffer full), the easiest is actually to use subclasses of IOError (the io module already does that, even though I'll maybe have to create new exceptions for errors like "file already exists" or "file already locked by another process").

Regards, Pascal

Daniel Stutzbach wrote: On Sat, Sep 19, 2009 at 2:46 AM, Pascal Chambon chambon.pas...@gmail.com wrote: This reimplementation is actually necessary to get file locking, because advanced win32 operations only work on real file handles, not the handles that underlie the C API layer. Furthermore, some interesting features (like O_EXCL | O_CREAT) are not possible with the current io implementations. So well, reimplementation required ^^ Concerning exceptions, which one is raised is not so important to me, as long as it's well documented and not tricky (e.g. WindowsErrors are OK to me, because they subclass OSError, so most cross-platform programs won't even have to know about them).
If you use real Windows file handles (instead of the POSIX-ish Windows API), won't you need to return WindowsErrors?

I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.

If that were documented and a firm rule, that would certainly be great. It's not too hard to find counterexamples in the current codebase. Also, I'm not sure how one could avoid needing to raise WindowsError in some cases. Maybe someone with more knowledge of the history of IOError vs. OSError could chime in.

Python 2.6:

>>> os.write(f.fileno(), 'blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
>>> f.write('blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 9] Bad file descriptor

-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
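The error-conversion decorator Pascal describes could look roughly like this. It's a sketch under the assumption that any environment error escaping a stream operation should be funnelled into the IOError family, keeping the original errno; the function names are made up. (Historical note: PEP 3151 later merged IOError and OSError into one class in Python 3.3, dissolving the distinction debated here.)

```python
import functools

def raises_ioerrors(method):
    """Re-raise any environment error from a stream operation as IOError,
    preserving the original errno (illustrative sketch, not an existing API)."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except EnvironmentError as e:
            raise IOError(e.errno, e.strerror or str(e)) from e
    return wrapper

@raises_ioerrors
def read_first_byte(path):
    # Any OSError raised by open() or read() surfaces as an IOError.
    with open(path, "rb") as f:
        return f.read(1)
```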
Re: [Python-Dev] POSIX [Fuzziness in io module specs]
Antoine Pitrou wrote: Hello, Pascal Chambon pythoniks at gmail.com writes: @pitrou: non-blocking IO in Python? Which ones are you thinking about?

I was talking about the existing support for non-blocking IO in the FileIO class (look up EAGAIN in fileio.c), as well as in the Buffered* objects.

Alright, I'll check that EAGAIN stuff, which I hadn't even noticed :)

And I don't like the fact that we move the filepointer to prevent it from exceeding the file size,

I don't see what you mean:

Well, the sample code you showed is not shocking, but I'd like to have coherency with file.seek(), because if truncate() prevents an out-of-bound file pointer, other methods should do the same as well (and raise IOError when seeking out of file bounds).

I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.

Ok, but the distinction is certainly fuzzy in many cases. I have no problem with trying to change the corner cases you mention, though.

The case which could be problematic there is file opening, because it can involve problems at all levels of the OS (filesystem not existing, permission problems, file locking...), so we should keep it in the EnvironmentError area. But as soon as a file is open, I guess only IOErrors can be involved (no space left, range locked, etc.), so enforcing all this to raise IOError would be OK I think.
[Python-Dev] Fuzziness in io module specs
Hello everyone,

I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features; however I'm slightly stuck due to some specification fuzziness in the IO docs. Cf. http://bugs.python.org/issue6939

The main points that annoy me at the moment:

- it is unclear what truncate() methods do with the file pointer, and even if the current implementation simply moves it to the truncation point, it's very contrary to the standard way of doing things under unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly?
- exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases an OSError is raised instead (e.g. if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't refuse the temptation to guess and only get prepared to catch IOErrors.
- the doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive an empty string, even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value, by preventing read(0)? Or at least, note in the doc that (we receive an empty string) => (the file is at EOF OR we called read with 0 as parameter)?

Are there some arguments that I don't know of, which led to this or that particular implementation choice? I'd strongly advocate very detailed specifications, leaving no room for cross-platform subtleties (that's also a strong goal of my reimplementation), since that new IO system (which saved me a lot of coding time, by the way) should become the base of many programs.
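The read(0) ambiguity described above is easy to reproduce (shown here with an in-memory stream; the behaviour is the same for files):

```python
import io

f = io.BytesIO(b"data")
assert f.read(0) == b""   # empty result, but we are NOT at end of file
assert f.tell() == 0      # the position did not move
assert f.read() == b"data"
assert f.read(4) == b""   # empty result with n != 0: genuine end of file
```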
So wouldn't it be a good idea to write some kind of mini-PEP, just to fix the corner cases of the current IO documentation? I might handle it, if no more knowledgeable people feel like it.

Regards, Pascal
Re: [Python-Dev] Hello everyone + little question around Cpython/stackless
Alright then, I understand the problem... Thanks a lot, regards, Pascal
[Python-Dev] Hello everyone + little question around Cpython/stackless
Hello snakemen and snakewomen,

I'm Pascal Chambon, a French engineer just leaving my telecom school, blatantly fond of Python, of its miscellaneous offsprings and of everything around dynamic languages and high-level programming concepts. I'm currently studying all I can find on Stackless Python, PyPy and the concepts they've brought to Python, and so far I wonder: since Stackless Python claims to be 100% compatible with CPython's extensions, faster, and brings lots of fun stuff (tasklets, coroutines and no C stack), how come it hasn't been merged back, to become the standard 'fast' Python implementation? Would I have missed some crucial point around there? Isn't it a pity to maintain two separate branches if they actually complement each other very well?

Waiting for your lights on this subject, regards, Pascal