[Python-Dev] Re: The repr of a sentinel

2021-05-25 Thread Pascal Chambon

Hello, and thanks for the PEP,

I feel like the 3-line declaration of a new sentinel would discourage 
its adoption a bit, compared to just "sentinel = object()".
From what I understand from the PEP, if new classes are defined inside 
the closure of a factory function, some Python implementations would 
have trouble copying/pickling them?


Would it be doable to have a single Sentinel class, whose instances 
store their representation and some autogenerated UUID, and which 
automatically returns internally stored singletons (keyed on this 
UUID) when called multiple times or unpickled?
This would require some __new__() and unpickling magic, but nothing too 
CPython-specific (or am I missing something?).
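
For illustration, a rough sketch of what I have in mind (keyed on the 
sentinel name rather than an autogenerated UUID, for brevity - just a 
sketch, not a concrete proposal for the PEP):

class Sentinel:
    _registry = {}

    def __new__(cls, name, repr=None):
        # Return the already-created singleton for this name, if any.
        if name in cls._registry:
            return cls._registry[name]
        instance = super().__new__(cls)
        instance._name = name
        instance._repr = repr or f"<{name}>"
        cls._registry[name] = instance
        return instance

    def __repr__(self):
        return self._repr

    def __reduce__(self):
        # Unpickling re-enters __new__, so the stored singleton is returned.
        return (self.__class__, (self._name, self._repr))

NotGiven = Sentinel("NotGiven")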


regards,

Pascal


On 24/05/2021 at 16:28, Tal Einat wrote:

On Mon, May 24, 2021 at 3:30 AM Luciano Ramalho  wrote:

On Sun, May 23, 2021 at 3:37 AM Tal Einat  wrote:

I put up an early draft of a PEP on a branch in the PEPs repo:
https://github.com/python/peps/blob/sentinels/pep-.rst

Thanks for that PEP, Tal. Good ideas and recap there.

I think repr= should have a default: the name of the class within <>:
<NotGiven>.

Sentinels don't have state or any other data besides a name, so I
would prefer not to force users to create a class just so they can
instantiate it.

Why not just this?

NotGiven = sentinel('<NotGiven>')

I'm seriously considering that now. The issues I ran into with this
approach are perhaps not actually problematic.


On the other hand, if the user must create a class, the class itself
should be the sentinel. Class objects are already singletons, so that
makes sense.

Here is a possible class-based API:

class NotGiven(Sentinel):
    pass

That's it. Now I can use NotGiven as the sentinel, and its default
repr is <NotGiven>.

Behind the scenes we can have a SentinelMeta metaclass with all the
magic that could be required--including the default __repr__ method.

What do you think?

One issue with that is that such sentinels don't have their own class,
so you can't write a strict type signature, such as `Union[str,
NotGivenType]`.

Another issue is that having these objects be classes, rather than
normal instances of classes, could be surprising and confusing.

For those two reasons, for now, I think generating a unique object
with its own unique class is preferable.


Sorry about my detour into the rejected idea of a factory function.

Please don't apologize! I put those ideas in the "Rejected Ideas"
section mostly to have them written down with a summary of the
considerations related to them. They shouldn't be considered finally
rejected unless and until the PEP is finished and accepted.

- Tal



Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)

2013-11-17 Thread Pascal Chambon


On 17/11/2013 12:27, Steven D'Aprano wrote:

What I would like to know is if people *knowingly* add costly asserts
to performance-critical code, with the intent of disabling them at
runtime using -OO.

Yes, I have knowingly added costly asserts to code with the intent of
disabling them at runtime. Was it *performance-critical* code? I don't
know, that was the point of my earlier rambling -- I could demonstrate a
speedup of the individual functions in benchmarks, but nobody spent the
effort to determine which functions were performance critical.


Hi,

my 2 cents:
asserts have been a great help for the robustness of our provisioning 
framework; they are like tests embedded in the code, *consistently* 
checking what would be VERY hard to test from the outside, from unit tests.


They save us a lot of time during development, because asserts (often used 
for method contract checking) immediately break things if we make dumb 
programming errors, like giving the wrong type of variable as a parameter 
etc. (if you send a string instead of a list of strings to a method, it 
could take a while before the error gets noticed, since their behaviours 
are quite close).
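
As a concrete illustration (function and parameter names are made up), 
this is the kind of cheap contract-checking assert I mean; it breaks 
immediately on the string-instead-of-list-of-strings mistake, and 
vanishes when running with -O/-OO:

def provision_accounts(account_ids):
    # Contract checks: catch a lone string passed instead of a list of
    # strings right away; both lines are stripped under -O/-OO.
    assert isinstance(account_ids, (list, tuple)), account_ids
    assert all(isinstance(account_id, str) for account_id in account_ids), account_ids
    for account_id in account_ids:
        pass  # actual provisioning work goes here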


We also add asserts with very expensive operations (like fully checking 
the proper synchronization of our DBs with the mockups of remote 
partners, after each provisioning command is processed), so that we don't 
need to call something like that after every line of unit test we write.


In production, we then make sure to use the -O flag, to avoid doubling our 
processing times and traffic.


regards,
Pascal


Re: [Python-Dev] Solving the import-deadlock case

2013-07-03 Thread Pascal Chambon

Thanks for the comments,

in my particular case we're actually working on a provisioning *framework*, so 
we chose the easy (lazy?) way, i.e. initializing miscellaneous modules at 
load time (like Django and others do, I think), rather than building 
a proper initialization dispatcher to be called from e.g. a WSGI launcher.
It works pretty well actually, except for that nasty (but fortunately very 
rare) import deadlock. ^^


Since module loading errors *might* occur for tons of reasons (e.g. 
searching the disk for .py files already IS a side effect...), and since 
the current behaviour (letting child modules survive disconnected from 
their parent) is more harmful than useful, I guess that the cleanup that 
Nick suggested would be the path to follow, wouldn't it?


thanks,
Regards,
Pascal

On 02/07/2013 23:32, Nick Coghlan wrote:



On 3 Jul 2013 04:34, Pascal Chambon python...@gmail.com wrote:


 Hello everyone,

 I'd like to bring your attention to this issue, since it touches the 
fundamentals of python's import workflow:

 http://bugs.python.org/issue17716

 I've tried to post it on the python-import ML for weeks, but it must 
still be blocked somewhere in a moderation queue, so here I come ^^


 TLDR version: because of the way import currently works, if importing 
a package temporarily fails whereas importing one of its children 
succeeded, we reach an unusable state: all subsequent attempts at 
importing that package will fail if a from...import is used 
somewhere. Typically, it leaves a web worker broken, even though the 
typical behaviour of such a process would be to retry loading, again and 
again, the failing view.


 I agree that module loading should be, as much as possible, free of 
side effects, and thus shouldn't have temporary errors. But well, in 
practice, module loading is typically the time when process-wide 
initializations are done (modifying sys.path, os.environ, instantiating 
connection or thread pools, registering atexit handlers, starting 
maintenance threads...), so that case has chances to happen at one 
moment or another, especially if accesses to the filesystem or network 
(SQL...) are done at module loading, due to the lack of an initialization 
system at upper levels.


 That's why I propose modifying the behaviour of module import, so 
that submodules are deleted as well when a parent module import fails. 
True, it means they will be reloaded as well when the import of the parent 
starts again, but anyway we already have a "double execution" 
problem with the reloading of the parent module, so it shouldn't make 
a big difference.
 The only other solution I'd see would be to SYSTEMATICALLY perform 
name (re)binding when processing a from...import statement, to recover 
from the previously failed initialization. Dunno if it's a good idea.


 On a (separate but related) topic, to be safer on module reimports 
or reloadings, it could be interesting to add some idempotency to 
common initialization tasks; for example, in the atexit registration 
system, wouldn't it be worth adding a boolean flag to explicitly skip 
registration if a callable with the same fully qualified name is already 
registered?


 Do you have opinions on these subjects ?

Back on topic...

As I stated on the issue, I think purging the whole subtree when a 
package implicitly imports child modules is the least bad of the 
available options, and better than leaving the child modules in place 
in violation of the "all parent packages can be assumed to be in 
sys.modules" invariant (which is what we do now).


Cheers,
Nick.

 thanks,
 regards,
 Pascal








[Python-Dev] Solving the import-deadlock case

2013-07-02 Thread Pascal Chambon

Hello everyone,

I'd like to bring your attention to this issue, since it touches the 
fundamentals of python's import workflow:

http://bugs.python.org/issue17716

I've tried to post it on the python-import ML for weeks, but it must 
still be blocked somewhere in a moderation queue, so here I come ^^


TLDR version: because of the way import currently works, if importing a 
package temporarily fails whereas importing one of its children 
succeeded, we reach an unusable state: all subsequent attempts at 
importing that package will fail if a from...import is used somewhere. 
Typically, it leaves a web worker broken, even though the typical 
behaviour of such a process would be to retry loading, again and again, 
the failing view.


I agree that module loading should be, as much as possible, free of side 
effects, and thus shouldn't have temporary errors. But well, in 
practice, module loading is typically the time when process-wide 
initializations are done (modifying sys.path, os.environ, instantiating 
connection or thread pools, registering atexit handlers, starting 
maintenance threads...), so that case has chances to happen at one moment 
or another, especially if accesses to the filesystem or network (SQL...) are 
done at module loading, due to the lack of an initialization system at 
upper levels.


That's why I propose modifying the behaviour of module import, so that 
submodules are deleted as well when a parent module import fails. True, 
it means they will be reloaded as well when the import of the parent 
starts again, but anyway we already have a "double execution" problem 
with the reloading of the parent module, so it shouldn't make a big 
difference.
The only other solution I'd see would be to SYSTEMATICALLY perform name 
(re)binding when processing a from...import statement, to recover from 
the previously failed initialization. Dunno if it's a good idea.
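
For what it's worth, here is a rough sketch of the cleanup I propose, 
written as an explicit import wrapper rather than as the actual change 
to the import machinery (so it is only an illustration of the intended 
behaviour):

import sys

def import_package_strictly(name):
    # Import a package; if the import fails, also purge its
    # submodules from sys.modules before propagating the error.
    try:
        __import__(name)
        return sys.modules[name]
    except Exception:
        stale = [mod for mod in sys.modules
                 if mod == name or mod.startswith(name + ".")]
        for mod in stale:
            del sys.modules[mod]
        raise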


On a (separate but related) topic, to be safer on module reimports or 
reloadings, it could be interesting to add some idempotency to common 
initialization tasks; for example, in the atexit registration system, 
wouldn't it be worth adding a boolean flag to explicitly skip 
registration if a callable with the same fully qualified name is already 
registered?
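
For instance, the intended semantic could look like this user-level 
helper (just an illustration of the idea, not a patch):

import atexit

_registered_atexit_names = set()

def register_once(func):
    # Register only if no callable with the same fully qualified name
    # has already been registered through this helper.
    key = "%s.%s" % (func.__module__, getattr(func, "__qualname__", func.__name__))
    if key not in _registered_atexit_names:
        _registered_atexit_names.add(key)
        atexit.register(func)
    return func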


Do you have opinions on these subjects ?

thanks,
regards,
Pascal


Re: [Python-Dev] Attribute lookup ambiguity

2010-03-23 Thread Pascal Chambon

Greg Ewing wrote:


Pascal Chambon wrote:

I don't follow you there - in my mind, the default __getattribute__ 
could simply have wrapped all its operations inside some kind of 
"try...except AttributeError:" mechanism, and thus been able to 
fall back to __getattr__ anyway.


But then it would be incorrect to say that __getattribute__
raises an exception.

When we say that a function raises an exception, we normally
mean that the exception propagates out of the function and
can be seen by the caller, not that it was raised and caught
somewhere inside the function.

Indeed, but I've never run into any doc mentioning that the default 
__getattribute__ raised an exception instead of forwarding to 
__getattr__ by itself.
All I've found is "If the class also defines __getattr__(), the latter 
will not be called unless __getattribute__() either calls it explicitly 
or raises an AttributeError" 
(http://docs.python.org/reference/datamodel.html#object.__getattr__); 
that sentence simply offers two alternatives for the behaviour of 
customized __getattribute__ methods, without giving any hint on the 
behaviour that was chosen when implementing object.__getattribute__.


Or am I missing some other doc which I'm supposed to know  :?

"In the face of ambiguity, refuse the temptation to guess", as we say 
anyway, so I propose we patch the doc to clarify this point for newcomers ^^


Regards,
Pascal



Re: [Python-Dev] Attribute lookup ambiguity

2010-03-22 Thread Pascal Chambon

Michael Foord wrote:

On 20/03/2010 12:00, Pascal Chambon wrote:



But the point which for me is still unclear is: does the default 
implementation of __getattribute__ (the one of object) call 
__getattr__ by itself, or does it rely on its caller for that, by 
raising an AttributeError? For Python 2, it's blatantly the latter 
case which is favoured, but since it looks like an implementation 
detail at the moment, I propose we settle it (and document it) once 
and for all.


Ah right, my apologies. So it is still documented behaviour - 
__getattr__ is obviously called by the Python runtime and not by 
__getattribute__. (It isn't just by getattr as the same behaviour is 
shown when doing a normal attribute lookup and not via the getattr 
function.)


I really don't see the docs you're referring to; until I tested it myself, 
I think I had no obvious reason to guess that __getattribute__ relied 
on the upper-level caller instead of finishing the hard job itself.



Nick Coghlan wrote:

Michael Foord wrote:

Well, the documentation you pointed to specifies that __getattr__ will
be called if __getattribute__ raises an AttributeError, it just doesn't
specify that it is done by object.__getattribute__ (which it isn't).


And as for why not: because __getattribute__ implementations need to be
able to call object.__getattribute__ without triggering the fallback
behaviour.

Cheers,
Nick.

I guess there are cases in which it is beneficial indeed.



Michael Foord wrote:

Well, the documentation you pointed to specifies that __getattr__ 
will be called if __getattribute__ raises an AttributeError, it just 
doesn't specify that it is done by object.__getattribute__ (which it 
isn't).


If __getattribute__ raises an exception, it won't get a chance to
do anything else, so something outside of __getattribute__ must
catch the AttributeError and call __getattr__. So I think the
docs *are* specifying the behaviour here, if only by implication.

I don't follow you there - in my mind, the default __getattribute__ 
could simply have wrapped all its operations inside some kind of 
"try...except AttributeError:" mechanism, and thus been able to fall back 
to __getattr__ anyway.



If I sum it up properly, the semantic is:
- A.obj and getattr(A, "obj") are exactly the same
- They trigger the calling of __getattribute__ on the object (or its 
python core equivalent)
- By default, this __getattribute__ browses the whole object hierarchy 
according to well-known rules (__dict__, type, type's ancestors...), 
handling descriptor protocols and the like. But it doesn't fall back to 
__getattr__ - it raises an AttributeError instead.

- getattr() falls back to __getattr__ if __getattribute__ fails
- customized __getattribute__ methods have the choice between calling 
__getattr__ by themselves, or delegating it to getattr() by raising an 
exception.
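
A short script illustrating these points (same spirit as the test case 
from my original mail):

class A(object):
    def __getattr__(self, name):
        return "fallback for %s" % name

a = A()
print a.missing                            # attribute machinery falls back to __getattr__
print getattr(a, "missing")                # same behaviour through getattr()
try:
    object.__getattribute__(a, "missing")  # no fallback: raises AttributeError itself
except AttributeError:
    print "object.__getattribute__ raised AttributeError"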


Wouldn't it be worth completing the doc with these points? They really 
didn't seem obvious to me initially (even though, after analysis, some 
behaviours make more sense than others).

I might submit a patch.

regards,
Pascal


  




Re: [Python-Dev] Attribute lookup ambiguity

2010-03-20 Thread Pascal Chambon

Michael Foord wrote:


On 19/03/2010 18:58, Pascal Chambon wrote:

Hello

I've already crossed a bunch of articles detailing python's attribute 
lookup semantic (__dict__, descriptors, order of base class 
traversing...), but I have never seen, so far, an explanation of 
WHICH method did waht, exactly.


I assumed that getattr(a, "b") was the same as a.__getattribute__("b"), 
and that this __getattribute__ method (or the hidden routine 
replacing it when we don't override it in our class) was in charge of 
doing the whole job of traversing the object tree, checking 
descriptors, binding methods, calling __getattr__ on failure etc.


However, the test case below shows that __getattribute__ does NOT 
call __getattr__ on failure. So it seems it's an upper-level 
machinery, in getattr(), which is in charge of that last action.


Python 3 has the behavior you are asking for. It would be a backwards 
incompatible change to do it in Python 2 as __getattribute__ *not* 
calling __getattr__ is the documented behaviour.


Python 3.2a0 (py3k:78770, Mar 7 2010, 20:32:50)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
>>> class x:
...     def __getattribute__(s, name):
...         print ('__getattribute__', name)
...         raise AttributeError
...     def __getattr__(s, name):
...         print ('__getattr__', name)
...
>>> a = x()
>>> a.b
__getattribute__ b
__getattr__ b
I'm confused there, because the script you gave behaves the same in 
Python 2.6. And according to the doc, it's normal: getattr() reacts to 
an AttributeError from __getattribute__ by calling __getattr__:



Python 2.6.5 documentation

object.__getattribute__(self, name)

   Called unconditionally to implement attribute accesses for instances
   of the class. If the class also defines __getattr__(), the latter
   will not be called unless __getattribute__() either calls it
   explicitly or raises an AttributeError. This method should return
   the (computed) attribute value or raise an AttributeError exception.
   In order to avoid infinite recursion in this method, its
   implementation should always call the base class method with the
   same name to access any attributes it needs, for example,
   object.__getattribute__(self, name).



But the point which for me is still unclear is: does the default 
implementation of __getattribute__ (the one of object) call 
__getattr__ by itself, or does it rely on its caller for that, by 
raising an AttributeError? For Python 2, it's blatantly the latter case 
which is favoured, but since it looks like an implementation detail at 
the moment, I propose we settle it (and document it) once and for all.





This list is not really an appropriate place to ask questions like 
this though, comp.lang.python would be better.


All the best,

Michael Foord
Sorry if I misposted, I just (wrongly?) assumed that it was more of an 
undecided, implementation-specific point (since the doc gave possible 
behaviours for __getattribute__, without specifying which one was the 
default one), and thus targeted the hands-in-core-code audience only.


Regards,
Pascal




[Python-Dev] Attribute lookup ambiguity

2010-03-19 Thread Pascal Chambon

Hello

I've already crossed a bunch of articles detailing Python's attribute 
lookup semantics (__dict__, descriptors, order of base class 
traversal...), but I have never seen, so far, an explanation of WHICH 
method did what, exactly.


I assumed that getattr(a, "b") was the same as a.__getattribute__("b"), and 
that this __getattribute__ method (or the hidden routine replacing it 
when we don't override it in our class) was in charge of doing the whole 
job of traversing the object tree, checking descriptors, binding 
methods, calling __getattr__ on failure etc.


However, the test case below shows that __getattribute__ does NOT call 
__getattr__ on failure. So it seems it's an upper-level machinery, in 
getattr(), which is in charge of that last action.


Is that on purpose? Considering that __getattribute__ (at least, 
object.__getattribute__) does 90% of the hard job, why are these 10% left?
Can we find somewhere the details of who must do what when customizing 
attribute access?
Shouldn't we inform people about the fact that __getattribute__ isn't 
sufficient in itself to look up an attribute?


Thanks for the attention,
regards,
Pascal



===
INPUT
===

class A(object):

    def __getattribute__(self, name):
        print "A getattribute", name
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        print "A getattr", name
        return "hello A"


class B(A):

    def __getattribute__(self, name):
        print "B getattribute", name
        return A.__getattribute__(self, name)

    def __getattr__(self, name):
        print "B getattr", name
        return "hello B"


print A().obj
print "---"
print B().obj
print "---"
print getattr(B(), "obj")
print "-"
print object.__getattribute__(B(), "obj") # DOES NOT CALL __getattr__() !!!


===
OUTPUT
===

A getattribute obj
A getattr obj
hello A
---
B getattribute obj
A getattribute obj
B getattr obj
hello B
---
B getattribute obj
A getattribute obj
B getattr obj
hello B
-
Traceback (most recent call last):
  File "C:\Users\Pakal\Desktop\test_object_model.py", line 34, in <module>
    print object.__getattribute__(B(), "obj") # DOES NOT CALL 
__getattr__() !!!???

AttributeError: 'B' object has no attribute 'obj'


Re: [Python-Dev] Buffered streams design + raw io gotchas

2010-02-20 Thread Pascal Chambon


Alright, so in the case of regular files I may content myself with 
BufferedRandom.
And maybe I'll put some warnings concerning the returning of raw streams 
by factory functions.


Thanks,

Regards,
Pascal


Guido van Rossum wrote:

IIRC here is the use case for buffered reader/writer vs. random: a
disk file opened for reading and writing uses a random access buffer;
but a TCP stream stream, while both writable and readable, should use
separate read and write buffers. The reader and writer don't have to
worry about reversing the I/O direction.

But maybe I'm missing something about your question?

--Guido

On Thu, Feb 18, 2010 at 1:59 PM, Pascal Chambon
chambon.pas...@gmail.com wrote:
  

Hello,

As I continue experimenting with advanced streams, I'm currently beginning
an important modification of io's Buffered and Text streams (removal of
locks, adding of methods...), to fit the optimization process of the whole
library.
However, I'm now wondering what the idea is behind the 3 main buffer
classes: BufferedWriter, BufferedReader and BufferedRandom.

The i/o PEP claimed that the first two were for sequential streams
only, and the latter for all kinds of seekable streams; but as it is
implemented, all 3 classes can actually be returned by open() for seekable
files.

Am I missing some use case in which this distinction would be useful (for
optimizations?)? Else, I guess I should just create a RSBufferedStream
class which handles all kinds of situations, raising UnsupportedOperation
exceptions whenever needed; after all, text streams act that way (there
is no TextWriter or TextReader stream), and they seem fine.

Also, io.open() might return a raw file stream when we set buffering=0. The
problem is that raw file streams are NOT like buffered streams with a buffer
limit of zero: raw streams might fail to write/read all the data asked,
without raising errors. I agree this case should be rare, but it might be a
gotcha for people wanting direct control of the stream (e.g. for locking
purposes), but no silently incomplete read/write operation.
Shouldn't we rather return a write-through buffered stream in this case
(buffering=0), to cleanly handle partial read/write ops?

regards,
Pascal

PS : if you have 3 minutes, I'd be very interested by your opinion on the
advanced modes draft below.
Does it seem intuitive to you ? In particular, shouldn't the + and -
flags have the opposite meaning ?
http://bytebucket.org/pchambon/python-rock-solid-tools/wiki/rsopen.html









  




[Python-Dev] Buffered streams design + raw io gotchas

2010-02-18 Thread Pascal Chambon

Hello,

As I continue experimenting with advanced streams, I'm currently 
beginning an important modification of io's Buffered and Text streams 
(removal of locks, addition of methods...), to fit the optimization 
process of the whole library.
However, I'm now wondering what the idea is behind the 3 main buffer 
classes: BufferedWriter, BufferedReader and BufferedRandom.


The i/o PEP claimed that the first two were for sequential streams 
only, and the latter for all kinds of seekable streams; but as it is 
implemented, all 3 classes can actually be returned by open() for 
seekable files.


Am I missing some use case in which this distinction would be useful 
(for optimizations?)? Else, I guess I should just create a 
RSBufferedStream class which handles all kinds of situations, raising 
UnsupportedOperation exceptions whenever needed; after all, text 
streams act that way (there is no TextWriter or TextReader stream), and 
they seem fine.


Also, io.open() might return a raw file stream when we set buffering=0. 
The problem is that raw file streams are NOT like buffered streams with 
a buffer limit of zero: raw streams might fail to write/read all the 
data asked, without raising errors. I agree this case should be rare, 
but it might be a gotcha for people wanting direct control of the stream 
(e.g. for locking purposes), but no silently incomplete read/write operation.
Shouldn't we rather return a write-through buffered stream in this 
case (buffering=0), to cleanly handle partial read/write ops?
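
To make the gotcha concrete, here is a small sketch (filename made up); 
with buffering=0 the caller must loop on write() itself:

import io

raw = io.open("data.bin", "wb", buffering=0)   # raw FileIO, no buffering layer
payload = b"x" * (1024 * 1024)
written = raw.write(payload)        # may report fewer bytes than len(payload)
while written < len(payload):       # the caller has to loop to guarantee a full write
    written += raw.write(payload[written:])
raw.close()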


regards,
Pascal

PS : if you have 3 minutes, I'd be very interested by your opinion on 
the advanced modes draft below.
Does it seem intuitive to you ? In particular, shouldn't the + and - 
flags have the opposite meaning ?

http://bytebucket.org/pchambon/python-rock-solid-tools/wiki/rsopen.html





Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-09 Thread Pascal Chambon

Hello

Some update on the spawnl() thingy:

I've adapted the win32 code to have a new unix Popen object, which works 
with a spawn() semantic. It's quite straightforward, and the 
multiprocessing call of a Python function works OK.

But I've run into some trouble: synchronization primitives.
Win32 semaphores can be teleported to another process via the 
DuplicateHandle() call. But unix named semaphores don't work that way - 
instead, they must be opened with the same name by each spawned subprocess.
The problem here is that the current semaphore C code is optimized to forbid 
semaphore sharing (other than via fork): use of (O_EXCL|O_CREAT) on 
opening, immediate unlinking of new semaphores...


So if we want to benefit from sync primitives with this spawn() 
implementation, we need a working named semaphore implementation, too...


What's the best in your opinion ? Editing the current multiprocessing 
semaphore's behaviour to allow (with specific options, attributes and 
methods) its use in this case ? Or adding a new NamedSemaphore type like 
this one ?

http://semanchuk.com/philip/posix_ipc/

Regards,
Pascal









Re: [Python-Dev] IO module improvements

2010-02-06 Thread Pascal Chambon

Antoine Pitrou wrote:
  
What is the difference between file handle and a regular C file descriptor?

Is it some Windows-specific thing?
If so, then perhaps it deserves some Windows-specific attribute (handle?).
  
At the moment it's windows-specific, but it's not impossible that some 
other OSes also rely on specific file handles (only emulating C file 
descriptors for compatibility).
I've indeed mirrored the fileno concept, with a handle argument for 
constructors, and a handle() getter.




On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou solip...@pitrou.net wrote:
  

Pascal Chambon pythoniks at gmail.com writes:


By the way, I'm having trouble with the name attribute of raw files,
which can be string or integer (confusing), ambiguous if containing a
relative path,
  


Why is it ambiguous? It sounds like you're using str() of the name and
then can't tell whether the file is named e.g. '1' or whether it
refers to file descriptor 1 (i.e. sys.stdout).

  
As Jean-Paul mentioned, I find it confusing that it can be a 
relative path, and sometimes not a path at all. I'm pretty sure many 
programmers haven't even cared in their library code that it could be a 
non-string, using concatenation etc. on it...
However, I guess there is so much history behind it that I'll have to 
conform to this semantic, putting all paths/fileno/handle in the same 
"name" property, and adding an "origin" property telling how to 
interpret the name...


Methods too would deserve some auto-forwarding. If you want to bufferize 
a raw stream which also offers size(), times(), lock_file() and other 
methods, how can these be accessed from a top-level buffering/text 
stream ?



I think it's a bad idea. If you forget to implement one of the standard IO
methods (e.g. seek()), it will get forwarded to the raw stream, but with the
wrong semantics (because it won't take buffering into account).

It's better to require the implementor to do the forwarding explicitly if
desired, IMO.
  
The problem is, doing that forwarding is quite complicated. IO is a 
collection of core tools for working with streams, but it's currently 
not flexible enough to let people customize them too...
For example, if I want to add a new series of methods to all standard 
streams, which simply forward calls to new raw stream features, what do 
I do? Monkey-patch base classes (RawFileIO, BufferedIOBase...)? Not 
a good pattern. Subclass 
FileIO+BufferedWriter+BufferedReader+BufferedRandom+TextIOWrapper? 
That's really redundant...


And there are especially flaws around BufferedRandom. This stream 
inherits BufferedWriter and BufferedReader, and overrides some methods. 
How do I extend it? I'd want to reuse its methods, but then have 
it forward calls to MY buffered classes, not the original BufferedWriter or 
BufferedReader classes. Should I modify its __bases__ to edit the 
inheritance tree? Handy, but not a good pattern... I'm currently getting 
what I want with a triple inheritance (praying for the MRO to be as I 
expect), but it's really not straightforward.
Having BufferedRandom as an additional layer would slow down the system, 
but allow its reuse with custom buffered writers and readers...


- I feel thread-safety locking and stream status checking are 
currently overly complicated. All methods are filled with locking calls 
and CheckClosed() calls, which is both a performance loss (most io 
streams will have 3 such levels of locking, when 1 would suffice)



FileIO objects don't have a lock, so there are 2 levels of locking at worst, not
3 (and, actually, TextIOWrapper doesn't have a lock either, although perhaps it
should).
As for the checkClosed() calls, they are probably cheap, especially if they
bypass regular attribute lookup.
  
CheckClosed calls are cheap, but they can easily be forgotten in one of 
the dozens of methods involved...
My own FileIO class alas needs locking, because, for example, on windows 
truncating a file means seeking + setting the end of file + restoring the pointer.
And TextIOWrapper seems to deserve locks. Maybe excerpts like this one 
really are thread-safe, but a long study would be required to ensure it.


        if whence == 2: # seek relative to end of file
            if cookie != 0:
                raise IOError("can't do nonzero end-relative seeks")
            self.flush()
            position = self.buffer.seek(0, 2)
            self._set_decoded_chars('')
            self._snapshot = None
            if self._decoder:
                self._decoder.reset()
            return position

  
Since we're anyway in a mood of imbricating streams, why not simply 
add a "safety stream" on top of each stream chain returned by open()? 
That layer could gracefully handle mutex locking, CheckClosed() calls, 
and even, maybe, the attribute/method forwarding I mentioned above.



It's an interesting idea, but it could also end up slower than the current
situation.
First because you are adding a level

[Python-Dev] IO module improvements

2010-02-05 Thread Pascal Chambon

Hello

The new modular io system of python is awesome, but I'm running into 
some of its limits currently, while replacing the raw FileIO with a more 
advanced stream.
So here are a few ideas and questions regarding the mechanisms of this 
IO system. Note that I'm speaking in python terms, but these ideas 
should also apply to the C implementation (with more programming hassle 
of course).


- some streams have specific attributes (e.g. mode, name...), but since 
they'll often be wrapped inside buffering or encoding streams, these 
attributes will not be available to the end user.


So wouldn't it be great to implement some transversal inheritance, 
simply by delegating to the underlying buffer/raw-stream the attribute 
retrievals which fail on the current stream? A little __getattr__ 
should do it fine, shouldn't it?
By the way, I'm having trouble with the "name" attribute of raw files, 
which can be a string or an integer (confusing), ambiguous if containing a 
relative path, and which isn't able to handle the new case of my 
library, i.e. opening a file from an existing file handle (which is ALSO 
an integer, like C file descriptors...); I propose we deprecate it for 
the benefit of more precise attributes, like "path" (absolute path) and 
"origin" (which can be "path", "fileno", "handle" and can be extended...).
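
Concretely, for the attribute-delegation idea above, I'm thinking of 
roughly this (a standalone wrapper, only to illustrate the mechanism):

class DelegatingStreamWrapper(object):
    # Forward attribute lookups that fail here to the wrapped raw/buffer stream.
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called when normal lookup on the wrapper itself fails.
        return getattr(self._wrapped, name)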


Methods too would deserve some auto-forwarding. If you want to bufferize 
a raw stream which also offers size(), times(), lock_file() and other 
methods, how can these be accessed from a top-level buffering/text 
stream? So it would be interesting to have a system through which a 
stream can expose its additional features to top-level streams, and at 
the same time tell these if they must flush() or not before calling 
these new methods (e.g. asking the inode number of a file doesn't require 
flushing, but knowing its real size DOES require it).


- I feel thread-safety locking and stream status checking are 
currently overly complicated. All methods are filled with locking calls 
and CheckClosed() calls, which is both a performance loss (most io 
streams will have 3 such levels of locking, when 1 would suffice) and 
error-prone (some time ago I've seen in the sources several functions in 
which checks and locks seemed lacking).
Since we're anyway in a mood of imbricating streams, why not simply 
add a "safety stream" on top of each stream chain returned by open()? 
That layer could gracefully handle mutex locking, CheckClosed() calls, 
and even, maybe, the attribute/method forwarding I mentioned above. I 
know a pure metaprogramming solution would maybe not suffice for 
performance-seekers, but static implementations should be doable as well.


- some semantic decisions of the current system are somehow dangerous. 
For example, flushing errors occurring on close are swallowed. It seems 
to me that it's of the utmost importance that the user be warned if the 
bytes he wrote disappeared before reaching the kernel; shouldn't we 
decidedly enforce a "don't hide errors" rule everywhere in the io module?


Regards,
Pascal





Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-04 Thread Pascal Chambon

Matt Knox wrote:

Jesse Noller jnoller at gmail.com writes:

  

We already have an implementation that spawns a
subprocess and then pushes the required state to the child. The
fundamental need for things to be pickleable *all the time* kinda
makes it annoying to work with.




just a lurker here... but this topic hits home with me so thought I'd chime
in. I'm a windows user and I would *love* to use multiprocessing a lot more
because *in theory* it solves a lot of the problems I deal with very nicely
(lot sof financial data number crunching). However, the pickling requirement
makes it very very difficult to actually get any reasonably complex code to
work properly with it.

A lot of the time the functions I want to call in the spawned processes are
actually fairly self contained and don't need most of the environment of the
parent process shoved into it, so it's annoying that it fails because some data
I don't even need in the child process can't be pickled.

What about having an option to skip all the parent environment data pickling
and require the user to manually invoke any imports that are needed in the
target functions as the first step inside their target function?

for example...

def target_function(object_from_module_xyz):
import xyz
return object_from_module_xyz.do_something()

and if I forgot to import all the stuff necessary for the arguments being
passed into my function to work, then it's my own problem.

Although maybe there is some obvious problem with this that I am not seeing.

Anyway, just food for thought.

- Matt

  


Hello

I don't really get it there... it seems to me that multiprocessing only 
requires picklability for the objects it needs to transfer, i.e. those 
given as arguments to the called function, and those put into 
multiprocessing queues/pipes. Global program data needn't be picklable - 
on windows it gets wholly recreated by the child process, from python 
bytecode.


So if you're having pickle errors, it must be because the 
object_from_module_xyz itself is *not* picklable, maybe because it 
contains references to unpicklable objects. In such a case, properly 
implementing the pickle magic methods inside the object should do it, 
shouldn't it?
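
For example, with a made-up class holding an unpicklable file object, 
the kind of pickle support I mean is:

import pickle

class NumberCruncher(object):
    def __init__(self, path):
        self.path = path
        self._handle = open(path)          # file objects are not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_handle", None)         # drop the unpicklable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._handle = open(self.path)     # recreate it on the receiving side

clone = pickle.loads(pickle.dumps(NumberCruncher(__file__)))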


Regards,
Pascal


Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-02 Thread Pascal Chambon


Although I would be in favor of an atfork callback registration system 
(similar to atexit), it seems there is no way to solve the fork() 
problem automatically with this. Any attempt to acquire/release locks 
automatically will lead to deadlocks, as it is necessary to know the 
exact program workflow to take locks in the right order.


I guess the spawnl semantic (i.e. like win32's CreateProcess()) can't become 
the default multiprocessing behaviour, as too many programs implicitly 
rely on the whole sharing of data under unix (and py3k itself is maybe 
becoming a little too mature for new compatibility breaks); but well, as 
long as there are options to enforce this behaviour, it should be fine 
for everyone.


I'm quite busy with other libraries at the moment, but I'll study the 
integration of spawnl into the multiprocessing package during the coming 
weeks. B-)



Regards,
Pascal


Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-02 Thread Pascal Chambon


  

The word dogma is a good one in this context however. We ( ;-)) have
accepted and promoted the dogma that multiprocessing is the solution to
parallelism in the face of the GIL. While it needn't be applicable in any and
every situation, we should make it so that it is applicable often enough.



Again, wishing won't make it so:  there is no sane way to mix threading
and fork-without-exec except by keeping the parent process single
threaded until after any fork() calls.  Some applications may seem to
work when violating this rule, but their developers are doomed to hair
loss over time.

  
You pointed it out: fork() was not designed to work together with 
multithreading; furthermore, in many cases its data-duplication semantic 
is absolutely unneeded to solve the real problem.


So we can leave fork-without-exec multiprocessing (with or without 
threads) to those who need it, and offer safer multiprocessing for 
those who just seek ease of use and portability - via a spawn() semantic.


Regards, Pascal


Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-01 Thread Pascal Chambon


So, if a patch was proposed for the multiprocessing module, allowing a unified, 
thread-safe spawnl semantic, do you think something could prevent 
its integration?


We may ignore the subprocess module, since fork+exec shouldn't be 
bothered by the (potentially disastrous) state of child process data.
But it bothers me to think multithreading and multiprocessing are 
currently opposed whereas theoretically nothing justifies it...


Regards,
Pascal







Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-01-30 Thread Pascal Chambon


[...]
What dangers do you refer to specifically? Something reproducible?
-L


Since it's a race condition issue, it's not easily reproducible with 
"normal" libraries - which only take threading locks for small moments.
But it can appear if your threads make good use of the threading module. 
By forking randomly, you have chances that the main locks of the logging 
module get frozen in an acquired state (even though their owner 
threads do not exist in the child process), and your next attempt to 
use logging will result in a pretty deadlock (on some *nix platforms, at 
least). This issue led to the creation of python-atfork, by the way.
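
A minimal sketch of the kind of race I mean, on a POSIX box (the sleeps 
just make the bad interleaving deterministic):

import os, threading, time

lock = threading.Lock()

def worker():
    with lock:
        time.sleep(5)              # hold the lock while the main thread forks

threading.Thread(target=worker).start()
time.sleep(0.5)                    # make sure the worker owns the lock by now
pid = os.fork()
if pid == 0:
    lock.acquire()                 # deadlock: the owning thread doesn't exist in the child
    os._exit(0)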



Stefan Behnel wrote:

Stefan Behnel, 30.01.2010 07:36:
  

Pascal Chambon, 29.01.2010 22:58:


I've just recently realized the huge problems surrounding the mix of
multithreading and fork() - i.e. that only the main thread actually
survives the fork(), and that process data (in particular,
synchronization primitives) can be left in a dangerously broken state
because of such forks, in multithreaded programs.
  

I would *never* have even tried that, but it doesn't surprise me that it
works basically as expected. I found this as a quick intro:

http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-09/0672.html



... and another interesting link that also describes exec() usage in this
context.

http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

Stefan

  

Yep, these links sum it up quite well.
But to me it's not a matter of trying to mix threads and fork - most 
people won't on purpose seek trouble.
It's simply the fact that, in a multithreaded program (i.e, any program 
of some importance), multiprocessing modules will be impossible to use 
safely without a complex synchronization of all threads to prepare the 
underlying forking (and we know that using multiprocessing can be a 
serious benefit, for GIL/performance reasons).
Solutions to fork() issues clearly exist - just add a use_forking=yes 
attribute to subprocess functions, and users will be free to use the 
spawnl() semantic, which is already implemented on win32 platforms, and 
which gives full control over both threads and subprocesses. Honestly, I 
don't see how it will complicate stuffs, except slightly for the 
programmer which will have to edit the code to add spwawnl() support (I 
might help on that).


Regards,
Pascal




[Python-Dev] Forking and Multithreading - enemy brothers

2010-01-29 Thread Pascal Chambon

Hello,

I've just recently realized the huge problems surrounding the mix of 
multithreading and fork() - i.e. that only the main thread actually 
survives the fork(), and that process data (in particular, 
synchronization primitives) can be left in a dangerously broken state 
because of such forks, in multithreaded programs.


What bothers me most is that I've actually never seen, in the python docs, 
any mention of those problems (the linux docs are very discreet as well). 
It's as if multithreading and multiprocessing were orthogonal designs, 
whereas it can quickly happen that someone has a slightly multithreaded 
program, and suddenly uses the multiprocessing module to perform a 
separate, performance-demanding task; with disasters in store, since 
few people are blatantly aware of the underlying dangers...


So here are a few propositions to improve this matter :

* documenting the fork/multithreading danger, in fork(), multiprocessing 
and maybe subprocess (is it concerned, or is the fork+exec always safe 
?) modules. If it's welcome, I might provide documentation patches of 
course.


* providing means of taming the fork() beast: is there a possibility 
for the inclusion of python-atfork and similar projects into the stdlib 
(I mean, their semantics, not the monkey-patch way they currently use)? 
It would also help a lot with the proper management of file handle inheritance.


* maybe the most important: providing means to get rid of fork() 
whenever wanted. I'm especially thinking about the multiprocessing 
module: it seems it always uses forking on *nix platforms. Wouldn't it 
be better to also offer a spawnl() semantic, to allow safe 
multiprocessing use even in applications crowded with threads? Win32 
already uses something like that, so all the infrastructure of data 
transfer is already there, and it would enforce cross-platform 
compatibility. Since multiprocessing theoretically means low coupling 
and little sharing of data, I guess this kind of spawnl() semantic would 
be highly sufficient for most situations, which don't require fork-based 
multiprocessing and its huge sharing of process data (in my opinion, 
inheriting file descriptors is all a child process can require from its 
parent).


Does it make sense to you ?

Regards,
Pascal Chambon



Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition V2

2009-09-28 Thread Pascal Chambon

Antoine Pitrou wrote:

Hello,

  

So here is the proposed semantic, which matches established conventions:

*IOBase.truncate(n: int = None) -> int*


[...]

I still don't think there is a sufficient benefit in breaking 
compatibility. If you want the file pointer to remain the same, you can 
save it first and restore it afterwards manually.


  
Sure, but won't this truncate become some kind of a burden for py3k, if 
it's doubly misleading (it's not a real truncation since it can extend 
the file, and it's not even a truncation or resizing in posix/win32 
style, since the filepointer is moved)?
Since it was an undocumented behaviour, and py3k doesn't seem to be 
present yet in production environments (or is it?), I'd promote this 
late-but-maybe-not-too-late change.
But if the consensus prefers the current behaviour, well, it'll be OK for 
me too, as long as it's sufficiently documented and advertised.
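
For reference, the manual save-and-restore you mention would just be 
(filename made up):

import io

with io.open("data.bin", "w+b") as f:
    f.write(b"x" * 256)
    pos = f.tell()       # remember the file pointer
    f.truncate(128)      # resize (the behaviour under discussion moves the pointer)
    f.seek(pos)          # restore it by hand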



*Propositions of doc update*



Please open tracker issues for these kinds of suggestions.
  
Is the tracker OK for simple suggestions too? I thought it was rather 
for obvious bugfixes, and that to-be-discussed propositions had better be in 
mailing lists... OK then, I'll open bug tracker issues for these. B-)





Instead of "than size", perhaps "than n".


Whoops, indeed.
Actually the signature would rather be:
*IOBase.truncate(size: int = None) -> int*
And I forgot to mention that truncate returns the new file size 
(according to the current PEP)...




Should an exception be raised if start and/or end are out of range?
I'd advocate it, yes, for the sake of explicit errors. However, should 
it be a ValueError (the one io functions normally use) or an IndexError 
(which is technically more accurate, but might confuse the user)?



Regards,
Pascal






Re: [Python-Dev] IO module precisions and exception hierarchy

2009-09-28 Thread Pascal Chambon




 +-InvalidFileNameError (filepath max lengths, or ? / :  characters 
in a windows file name...)


This might be a bit too precise. Unix just has EINVAL, which
covers any kind of invalid parameter, not just file names.

Alright, thanks - an InvalidParameter (or similar) exception should do it 
better then.



Personally I'd love to see a richer set of exceptions for IO errors, 
so long as they can be implemented for all supported platforms and no 
information (err number from the os) is lost.


I've been implementing a fake 'file' type [1] for Silverlight which 
does IO operations using local browser storage. The use case is for an 
online Python tutorial running in the browser [2]. Whilst implementing 
the exception behaviour (writing to a file open in read mode, etc) I 
considered improving the exception messages as they are very poor - 
but decided that being similar to CPython was more important.


Michael

[1] 
http://code.google.com/p/trypython/source/browse/trunk/trypython/app/storage.py 
and 
http://code.google.com/p/trypython/source/browse/trunk/trypython/app/tests/test_storage.py 


[2] http://www.trypython.org/


Cool stuff  :-)
It's indeed quite unclear at the moment which exceptions it will really 
be possible (and relevant) to implement in a cross-platform way... I 
guess I should use my own fileio implementation as a playground and a 
proof of concept, before we specify anything for CPython.



What happens isn't specified, but in practice (with the current 
implementation) the overwriting will happen at the byte level, without 
any check for correctness at the character level.


Actually, read+write text streams are implemented quite crudely, and 
little testing is done of them. The reason, as you discovered, is that 
the semantics are too weak, and it is not obvious how stronger semantics 
could look like. People wanting to do sophisticated random reads+writes 
over a text file should probably handle the encoding themselves and 
access the file at the binary level.
  
It sounds OK to me, as long as we notify users about this danger (I've 
myself just realized it). Most newcomers may happily open a UTF-8 
text file, and read/write in it carelessly, without realizing that the 
characters they write actually screw up the file...



How about just making IOError = OSError, and introducing your proposed 
subclasses? Does the usage of IOError vs OSError have *any* useful 
semantics?


I though that OSError dealt with a larger set of errors than IOError, 
but after checking the errno codes, it seems that they're all more or 
less related to IO problems (if we include interprocess communication in 
I/O). So theoretically, IOErrors and OSErrors might be merged. Note that 
in this case, WindowsErrors would have to become children of 
EnvironmentError, because windows error codes really seem to go farther 
than io errors (they deal with recursion limits, thousands of PC 
parameters...).
The legacy is so heavy that OSError would have to remain as is, I think, 
but we might simply forget it in new io modules, and concentrate on an 
IOError hierarchy to provide all the info needed by the developer.






Some of the error messages are truly awful though as things stand, 
especially for someone new to Python. Try to read from a file handle 
opened in read mode for example: IOError: [Errno 9] Bad file descriptor

Subdividing the IOError exception won't help with
that, because all you have to go on when deciding
which exception to raise is the error code returned
by the OS. If the same error code results from a
bunch of different things, there's not much Python
can do to sort them out.

Well, you don't only have the error number, you also have the context of 
this exception. IOErrors subclasses would particularly be useful in a 
high level IO context, when each single method can issue lots of 
system calls (to check the file, lock it, edit it...). If the error is 
raised during your locking operation, you can decide to sort it as 
LockingError even if the error code provided might appear in several 
different situations.
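
A rough sketch of what I mean, assuming a LockingError subclass of IOError 
existed (the name is hypothetical, used only for this illustration):

# Hedged sketch: re-tag a generic IOError according to the context
# (the locking step) in which it was raised.
class LockingError(IOError):
    """The locking step of a high-level IO operation failed."""

def locked_write(stream, acquire_lock, data):
    try:
        acquire_lock(stream)          # platform-specific call, assumed to raise IOError
    except IOError as exc:
        # whatever the raw errno was, in this context it means "locking failed"
        raise LockingError(*exc.args)
    stream.write(data)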


Regards,
Pascal




Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition V2

2009-09-27 Thread Pascal Chambon

Hello

Below is a corrected version of the PEP update, adding the start/end 
indexes proposition and fixing function signatures. Does anyone 
disagree with these specifications ? Or can we consider it as a target 
for the next versions of the io module ?
I would have no problem to implement this behaviour in my own pure 
python FileIO system, however if someone is willing to patch the _fileio 
implementation, it'd save a lot of time - I most probably won't have the 
means to setup a C compilation environment under windows and linux, and 
properly update/test this, before January (when I get freelance...).


I launch another thread on other to-be-discussed IO points B-)

Regards,
Pascal

 PEP UPDATE for new I/O system - v2 ===

**Truncate and file pointer semantics**

Rationale :

The current implementation of truncate() always moves the file pointer to 
the new end of file.


This behaviour is interesting for compatibility, if the file has been 
reduced and the file pointer is now past its end, since some platforms 
might require 0 <= filepointer <= filesize.


However, there are several arguments against this semantic:

   * Most common standards (posix, win32…) allow the file pointer to be
 past the end of file, and define the behaviour of other stream
 methods in this case
   * In many cases, moving the filepointer when truncating has no
 reasons to happen (if we’re extending the file, or reducing it
 without going beneath the file pointer)
   * Making 0 <= filepointer <= filesize a global rule of the python IO
 module doesn’t seem possible, since it would require
 modifications of the semantic of other methods (eg. seek() should
 raise exceptions or silently disobey when asked to move the
 filepointer past the end of file), and lead to incoherent
 situations when concurrently accessing files without locking (what
 if another process truncates to 0 bytes the file you’re writing ?)

So here is the proposed semantic, which matches established conventions:

*IOBase.truncate(n: int = None) -> int*

Resizes the file to the size specified by the positive integer n, or by 
the current filepointer position if n is None.


The file must be opened with write permissions.

If the file was previously larger than size, the extra data is discarded.
If the file was previously shorter than size, its size is increased, and
the extended area appears as if it were zero-filled.

In any case, the file pointer is left unchanged, and may point beyond
the end of file.

Note: trying to read past the end of file returns an empty string, and
trying to write past the end of file extends it by zero-ing the gap. On
rare platforms which don't support file pointers to be beyond the end of
file, all these behaviours shall be faked thanks to internal storage of
the wanted file pointer position (silently extending the file, if
necessary, when a write operation occurs).
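
To make the proposed behaviour concrete, here is a small sketch of what it 
implies in practice (this illustrates the specification above, not 
necessarily what every current implementation does; the file name is 
invented for the demo):

with open("demo.bin", "wb") as f:
    f.write(b"0123456789")

with open("demo.bin", "r+b") as f:
    f.seek(8)
    print(f.truncate(4))   # 4  - the new file size is returned
    print(f.tell())        # 8  - the pointer stays where it was, beyond the new EOF
    print(f.read())        # b'' - reading past the end of file returns an empty string
    f.write(b"Z")          # writing past the end of file zero-fills the gap
    f.seek(0)
    print(f.read())        # b'0123\x00\x00\x00\x00Z'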



*Propositions of doc update*

*RawIOBase*.read(n: int) -> bytes

Read up to n bytes from the object and return them. Fewer than n bytes
may be returned if the operating system call returns fewer than n bytes.
If 0 bytes are returned, and n was not 0, this indicates end of file. If
the object is in non-blocking mode and no bytes are available, the call
returns None.


*RawIOBase*.readinto(b: bytearray, [start: int = None], [end: int = 
None]) -> int


start and end are used as slice indexes, so that the bytearray taken 
into account is actually range = b[start:end] (or b[start:], b[:end] or 
b[:], depending on the arguments which are not None).


Read up to len(range) bytes from the object and store them in b, returning
the number of bytes read. Like .read, fewer than len(range) bytes may be
read, and 0 indicates end of file if len(range) is not 0.
None is returned if a non-blocking object has no bytes available. The 
length of b is never changed.
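
Since the start/end parameters are only a proposal, here is a hedged sketch 
of the intended effect, emulated on top of the existing readinto() with a 
writable memoryview slice (readinto_range is a made-up helper name):

import io

def readinto_range(raw, b, start=None, end=None):
    # only b[start:end] is filled, len(b) never changes, the byte count is returned
    window = memoryview(b)[slice(start, end)]   # writable view over the bytearray
    return raw.readinto(window)

buf = bytearray(10)
n = readinto_range(io.BytesIO(b"abcdef"), buf, 2, 6)
print(n, buf)   # 4 bytearray(b'\x00\x00abcd\x00\x00\x00\x00')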






[Python-Dev] IO module precisions and exception hierarchy

2009-09-27 Thread Pascal Chambon

Found in current io PEP :
Q: Do we want to mandate in the specification that switching between 
reading and writing on a read-write object implies a .flush()? Or is 
that an implementation convenience that users should not rely on?
- it seems that the only important matter is : file pointer positions 
and bytes/characters read should always be the ones that the user 
expects, as if there
were no buffering. So flushing or not may stay a non-mandatory 
behaviour, as long as the buffered streams ensure this data integrity.
Eg. If a user opens a file in r/w mode, writes two bytes in it (which 
stay buffered), and then reads 2 bytes, the two bytes read should be 
those on range [2:4] of course, even though the file pointer would, due 
to python buffering, still be at index 0.
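
A tiny illustration of that integrity requirement (file name invented for 
the demo):

with open("demo.bin", "wb") as f:
    f.write(b"ABCDEF")

with open("demo.bin", "r+b") as f:   # buffered random-access stream
    f.write(b"xy")                   # may well still sit in the write buffer
    print(f.read(2))                 # b'CD' - bytes [2:4], as the user expects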



Q from me : What happens in read/write text files, when overwriting a 
three-bytes character with a single-byte character ? Or at the contrary, 
when a single chinese character overrides 3 ASCII characters in an UTF8 
file ? Is there any system designed to avoid this data corruption ? Or 
should TextIO classes forbid read+write streams ?



IO Exceptions :
Currently, the situation is kind of fuzzy around EnvironmentError 
subclasses.
* OSError represents errors notified by the OS via errno.h error codes 
(as mirrored in the python errno module).
errno.h errors (less than 125 error codes) seem to represent the whole 
of *nix system errors. However, Windows has many more system errors 
(15000+). So windows errors, when they can't be mapped to one of the 
errno errors, are raised as WindowsError instances (a subclass of 
OSError), with the special attribute winerror indicating the win32 
error code.
* IOErrors are errors raised because of I/O problems, but they use 
errno codes, like OSError.


Thus, at the moment IOErrors rather have the semantics of a particular 
case of OSError, and it's kind of confusing to have them remain in 
their own separate tree... Furthermore, OSErrors are often used where 
IOErrors would perfectly fit, eg. in low level I/O functions of the OS 
module.
Since OSErrors and IOErrors are slightly mixed up when we deal with IO 
operations, maybe the easiest way to make it clearer would be to push to 
their limits already existing designs.


- the os module should only raise OSErrors, whatever the os operation 
involved (maybe it's already the case in CPython, isn't it ?)
- the io module should only raise IOErrors and its subclasses, so that 
devs can easily take measures depending on the cause of the io failure 
(except 1 OSError exception, it's already the case in _fileio)
- other modules referring to i/o might maybe keep their current (fuzzy) 
behaviour, since they're more platform-specific, and should in the end 
be replaced by a cross-platform solution (at least I'd love it to happen)
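
A short sketch of what that split would look like from the caller's side 
(purely illustrative; in current Python versions IOError and OSError are 
actually the same class, so the distinction is shown only for illustration):

import os

def remove_quietly(path):
    try:
        os.remove(path)              # filesystem manipulation -> OSError family
    except OSError as exc:
        print("filesystem error, errno", exc.errno)

def read_quietly(path):
    try:
        with open(path, "rb") as f:  # stream operation -> IOError family
            return f.read()
    except IOError as exc:
        print("stream error, errno", exc.errno)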


Until then, there would be no real benefit for the user, compared to 
catching EnvironmentErrors as most probably do. But the sweet thing 
would be to offer a concise but meaningful IOError hierarchy, so that 
we can easily handle most specific errors gracefully (having a disk full 
is not the same level of gravity as simply having another process 
locking your target file).


Here is a very rough beginning of IOError hierarchy. I'd like to have 
people's opinion on the relevance of these, as well as on what other 
exceptions should be distinguished from basic IOErrors.


IOError
 +-InvalidStreamError  (eg. we try to write on a stream opened in 
readonly mode)

 +-LockingError
 +-PermissionError (mostly *nix chmod stuffs)
 +-FileNotFoundError
 +-DiskFullError
 +-MaxFileSizeError (maybe hard to implement, happens when we exceed 
4Gb on fat32 and stuffs...)
 +-InvalidFileNameError (filepath max lengths, or ? / : < > characters 
in a windows file name...)
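
As a sketch (none of these classes exist in the stdlib; the names simply 
mirror the proposal above), the whole tree boils down to a handful of 
declarations that callers could then catch selectively:

class InvalidStreamError(IOError): pass    # e.g. writing to a read-only stream
class LockingError(IOError): pass
class PermissionError(IOError): pass       # note: shadows the builtin of the same name in recent Pythons
class FileNotFoundError(IOError): pass     # likewise
class DiskFullError(IOError): pass
class MaxFileSizeError(IOError): pass
class InvalidFileNameError(IOError): pass

# A caller could then abort on DiskFullError but simply retry later on
# LockingError, instead of treating every IOError the same way.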


Regards,
Pascal





Re: [Python-Dev] Fuzziness in io module specs

2009-09-20 Thread Pascal Chambon
Well, system compatibility argues strongly in favor of not letting 
filepointer > EOF.
However, is that really necessary to move the pointer to EOF in ANY case 
? I mean, if I extend the file, or if I reduce it without going lower 
than my current filepointer, I really don't expect at all the io system 
to move my pointer to the end of file, just for fun. In these 
patterns, people would have to remember their current filepointer, to 
come back to where they were, and that's not pretty imo...


If we agree on the simple mandatory expression 0 <= filepointer <= EOF 
(for cross-platform safety), then we just have to enforce it when the 
rule is broken : reducing the size lower than the filepointer, and 
seeking past the end of file. All other conditions should leave the 
filepointer where the user put it. Shouldn't it be so ?


  
Concerning the naming of truncate(), would it be possible to deprecate 
it and alias it to resize() ? It's not very gratifying to have 
duplicated methods at the beginning of a major release, but I feel too 
that truncate is a misleading term, that had better be replaced asap.


Regards,
Pascal


Re: [Python-Dev] POSIX [Fuzziness in io module specs]

2009-09-20 Thread Pascal Chambon




What we could do with is better platform-independent
ways of distinguishing particular error conditions,
such as file not found, out of space, etc., either
using subclasses of IOError or mapping error codes
to a set of platform-independent ones.



Well, mapping all errors (including C ones and windows-specific ones) to 
a common set would be extremely useful for developers indeed.
I guess some advanced windows errors will never have equivalents 
elsewhere, but does anyone know an error code set which would be 
relevant to cover all memory, filesystem, io and locking aspects ?
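
To make the idea concrete, here is a hedged sketch of such a mapping layer; 
the errno choices and exception names are purely illustrative, not a 
researched cross-platform table:

import errno

class DiskFullError(IOError): pass
class FileNotFoundIOError(IOError): pass

_ERRNO_MAP = {
    errno.ENOSPC: DiskFullError,
    errno.ENOENT: FileNotFoundIOError,
}

def translate_error(exc):
    # return a more specific exception for a raw IOError/OSError/WindowsError
    cls = _ERRNO_MAP.get(getattr(exc, "errno", None), IOError)
    return cls(exc.errno, str(exc))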



Regards,
Pascal


Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Pascal Chambon

Hello

After weighing up here and that, here is what I have come with. Comments 
and issue notifications more than welcome, of course. The exception 
thingy is not yet addressed.


Regards,
Pascal


*Truncate and file pointer semantics*

Rationale :

The current implementation of truncate() always moves the file pointer to 
the new end of file.


This behaviour is interesting for compatibility, if the file has been 
reduced and the file pointer is now past its end, since some platforms 
might require 0 <= filepointer <= filesize.


However, there are several arguments against this semantic:

   * Most common standards (posix, win32...) allow the file pointer to
 be past the end of file, and define the behaviour of other stream
 methods in this case
   * In many cases, moving the filepointer when truncating has no
 reasons to happen (if we're extending the file, or reducing it
 without going beneath the file pointer)
   * Making 0 <= filepointer <= filesize a global rule of the python IO
 module doesn't seem possible, since it would require
 modifications of the semantic of other methods (eg. seek() should
 raise exceptions or silently disobey when asked to move the
 filepointer past the end of file), and lead to incoherent
 situations when concurrently accessing files without locking (what
 if another process truncates to 0 bytes the file you're writing ?)

So here is the proposed semantic, which matches established conventions:

*RawIOBase.truncate(n: int = None) -> int*

*(same for BufferedIOBase.truncate(pos: int = None) -> int)*

Resizes the file to the size specified by the positive integer n, or by 
the current filepointer position if n is None.


The file must be opened with write permissions.

If the file was previously larger than n, the extra data is discarded. 
If the file was previously shorter than n, its size is increased, and 
the extended area appears as if it were zero-filled.


In any case, the file pointer is left unchanged, and may point beyond 
the end of file.


Note: trying to read past the end of file returns an empty string, and 
trying to write past the end of file extends it by zero-ing the gap. On 
rare platforms which don't support file pointers to be beyond the end of 
file, all these behaviours shall be faked thanks to internal storage of 
the wanted file pointer position (silently extending the file, if 
necessary, when a write operation occurs).




*Proposition of doc update*

*RawIOBase*.read(n: int) -> bytes

Read up to n bytes from the object and return them. Fewer than n bytes 
may be returned if the operating system call returns fewer than n bytes. 
If 0 bytes are returned, and n was not 0, this indicates end of file. If 
the object is in non-blocking mode and no bytes are available, the call 
returns None.


*RawIOBase*.readinto(b: bytes) -> int

Read up to len(b) bytes from the object and stores them in b, returning 
the number of bytes read. Like .read, fewer than len(b) bytes may be 
read, and 0 indicates end of file if len(b) is not 0. None is returned if a 
non-blocking object has no bytes available. The length of b is never 
changed.







Re: [Python-Dev] Fuzziness in io module specs - PEP update proposition

2009-09-20 Thread Pascal Chambon

Daniel Stutzbach wrote:
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon 
chambon.pas...@gmail.com wrote:


*RawIOBase*.readinto(b: bytes) -> int


bytes are immutable.  The signature is:

*RawIOBase*.readinto(b: bytearray) -> int

Your efforts in working on clarifying these important corner cases is 
appreciated. :-)



You're welcome B-)

Indeed my copy/paste of the current pep was an epic fail - you'll all 
have recognized that readinto actually deals with bytearrays, contrary to 
what the current pep says

- http://www.python.org/dev/peps/pep-3116/

RawIOBase.read(int) takes a positive-or-zero integer indeed (I am used 
to understanding this, as opposed to strictly positive)


Does MRAB's suggestion of providing beginning and end offsets for the 
bytearray meet people's expectations ? Personally, I feel readinto is a 
very low-level method, mostly used by read() to get a result from 
low-level native functions (fread, readfile), and read() always provides 
a buffer with the proper size... are there cases in which these two 
additional arguments would provide some real gain ?



Concerning the backward compatibility problem, I agree we should not 
break specifications, but breaking impelmentation details is another 
thing for me. It's a golden rule in programmers' world : thou shalt 
NEVER rely on implementation details. Programs that count on these (eg. 
thinking that listdir() will always returns . and .. as first 
item0... until it doesnt anymore) encounter huge problems when changing 
of platform or API version. When programming with the current 
truncate(), I would always have moved the file pointer after truncating 
the file, simply because I have no idea of what might happen to it 
(nothing was documented on this at the moment, and looking at the 
sources is really not a sustainable behaviour).
So well, it's a pity if some early 3.1 users relied on it, but if we 
stick to the current semantic we still have a real coherency problem - 
seek() is not limited in range, and some experienced programmers might 
be trapped by this non-conventional truncate() if they rely on posix or 
previous python versions... I really dislike the idea that truncate() 
might move my file offset even when there are no reasons for it.


Regards,
Pascal






Re: [Python-Dev] POSIX [Fuzziness in io module specs]

2009-09-19 Thread Pascal Chambon


@pitrou: non-blocking IO in python ? which ones are you thinking about ?
I have currently no plan to work on asynchronous IO like win32's 
readFileEx() etc. (too many troubles for the benefit), however I'd be 
interested in getting non-blocking operations on IPC pipes (I've come across 
several people in trouble with that, having a process never end on some 
OSes because they couldn't stop threads blocked on pipes).
This reimplementation is actually necessary to get file locking, because 
advanced win32 operations only work on real file handles, not the 
handles that are underlying the C API layer. Furthermore, some 
interesting features (like O_EXCL | O_CREAT) are not possible with the 
current io implementations. So well, reimplementation required ^^
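
For the O_EXCL | O_CREAT case specifically, a workaround already exists by 
going through the os layer and wrapping the resulting descriptor (a hedged 
sketch, not part of the io API itself):

import io
import os

def create_exclusively(path):
    # atomically create the file, failing with EEXIST if it already exists
    flags = os.O_CREAT | os.O_EXCL | os.O_RDWR | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    return io.FileIO(fd, "r+", closefd=True)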


Else, alright, I'll try to summarize the various points in a PEP update.

Concerning the truncate method however, on second thought I feel we 
might take distance from Posix API for naming, precisely since it's 
anyway too platform-specific (windows knows nothing about Posix, and 
even common unix-like systems modify it in one way or another - several 
systems don't zero-fill files when extending them).
When seeing truncate, in my opinion, most people will think it's only 
to reduce file size (for beginners), or will immediately get in mind 
all the tips of posix-like systems (for more experienced developers). 
Shouldn't we, like other cross-platform APIs, use a more unambiguous 
notion, like setLength (java) or resize (Qt) ? And leave the 
filepointer untouched, simply because there are no reasons to move it, 
especially when extending the file (yep, on windows we're forced to move 
the pointer, but it's easy to fix) ?
If it's too late to modify the IO API, too bad, but I don't feel 
comfortable with the "truncate" word. And I don't like the fact that we 
move the filepointer to prevent it from exceeding the file size, whereas 
on the other hand we can seek() anywhere without getting exceptions (and 
so, set the filepointer past the end of file). Having 0 = filepointer 
= EOF is OK to me, but then we have to enforce it for all functions, 
not just truncate.


Concerning exceptions, which one is raised is not so important to me, as 
long as it's well documented and not tricky (eg. WindowsErrors are OK to 
me, because they subclass OSError, so most cross-platform programs won't 
even have to know about them).
I had the feeling that IOErrors were for operations on file streams 
(opening, writing/reading, closing...), whereas OSErrors were for 
manipulations on filesystems (renaming, linking, stating...) and 
processes. This semantic would be perfect for me, and it's already 95% 
here, we would just have to fix some unwelcome OSError exceptions in 
the io module. Isn't that worth it ? It'd simplify programmers' job a 
lot, and allow a more subtle treatment of exceptions (if everyone just 
catches EnvironmentErrors, without being sure of which subclass is 
actually raised, we miss the point of IOError and OSError).


Regards,
Pascal







James Y Knight wrote:


On Sep 18, 2009, at 8:58 PM, Antoine Pitrou wrote:

I'm not sure that's true. Various Unix/Linux man pages are readily
available on the Internet, but they regard specific implementations,
which often depart from the spec in one way or another. POSIX specs
themselves don't seem to be easily reachable; you might even have to pay
for them.



The POSIX specs are quite easily accessible, without payment.

I got my quote by doing:
man 3p ftruncate

I had previously done:
apt-get install manpages-posix-dev
to install the posix manpages. That package contains the POSIX 
standard as of 2003. Which is good enough for most uses. It seems to 
be available here, if you don't have a debian system:

http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/

There's also a webpage, containing the official POSIX 2008 standard:
   http://www.opengroup.org/onlinepubs/9699919799/

And to navigate to ftruncate from there, click System Interfaces in 
the left pane, System Interfaces in the bottom pane, and then 
ftruncate in the bottom pane.


James


Re: [Python-Dev] POSIX [Fuzziness in io module specs]

2009-09-19 Thread Pascal Chambon
Good example with os.write(f.fileno(), 'blah') - and you obtain the 
same error if you try to open an io.FileIO by providing a file 
descriptor instead of a file name as first argument. This would really 
deserve unification.


Actually, since Windows Error Codes concern any possible error (IO, file 
permissions, memory problems...), I thought the best would be to convert 
them to the most appropriate python standard exception, only defaulting 
to WindowsError (i.e, OSError's hierarchy) when no other exception type 
matches. So at the moment, I use a decorator to automatically convert 
all errors on stream operations into IOErrors. Error codes are not the 
same as unix ones indeed, but I don't know if it's really important 
(imo, most people just want to know if the operation was successful, I 
don't know if many developers scan error codes to act accordingly). For 
IOError types that really matter (eg. file already locked, buffer full), 
the easiest is actually to use subclasses of IOError (the io module 
already does that, even though I'll maybe have to create new exceptions 
for errors like file already exists or file already locked by another 
process)
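
Here is a hedged sketch of that decorator (the real one wraps win32 calls; 
the name ioerror_on_failure is made up, and the code assumes the pre-3.3 
exception layout where IOError and OSError are still distinct classes):

import functools

def ioerror_on_failure(method):
    # re-raise OSError/WindowsError leaking from a stream operation as IOError
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except IOError:
            raise                            # already in the right family
        except EnvironmentError as exc:      # OSError, WindowsError...
            raise IOError(exc.errno, str(exc))
    return wrapper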


Regards,
Pascal

Daniel Stutzbach wrote:
On Sat, Sep 19, 2009 at 2:46 AM, Pascal Chambon 
chambon.pas...@gmail.com wrote:


This reimplementation is actually necessary to get file locking,
because advanced win32 operations only work on real file handles,
not the handles that are underlying the C API layer. Furthermore,
some interesting features (like O_EXCL | O_CREAT) are not possible
with the current io implementations. So well, reimplementation
required ^^

 


Concerning exceptions, which one is raised is not so important to
me, as long as it's well documented and not tricky (eg.
WindowsErrors are OK to me, because they subclass OSError, so most
cross-platform programs wont even have to know about them).


If you use real Windows file handles (instead of the POSIX-ish Windows 
API), won't you need to return WindowsErrors?
 


I had the feeling that IOErrors were for operations on file
streams (opening, writing/reading, closing...), whereas OSErrors
were for manipulations on filesystems (renaming, linking,
stating...) and processes.


If that were documented and a firm rule, that would certainly be 
great.  It's not too hard to find counterexamples in the current 
codebase.  Also, I'm not sure how one could avoid needing to raise 
WindowsError in some cases.


Maybe someone with more knowledge of the history of IOError vs. 
OSError could chime in.


Python 2.6:

>>> os.write(f.fileno(), 'blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
>>> f.write('blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 9] Bad file descriptor

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com 








Re: [Python-Dev] POSIX [Fuzziness in io module specs]

2009-09-19 Thread Pascal Chambon




Antoine Pitrou wrote:

Hello,

Pascal Chambon pythoniks at gmail.com writes:
  

@pitrou: non-blocking IO in python ? which ones are you thinking about ?



I was talking about the existing support for non-blocking IO in the FileIO class
(look up EAGAIN in fileio.c), as well as in the Buffered* objects.

  


Alright, I'll check that EAGAIN stuff, which I hadn't even noticed :)



And I don't like the fact that we 
move the filepointer to prevent it from exceeding the file size,



I don't see what you mean:

  


Well the sample code you showed is not shocking, but I'd like to have 
coherency with file.seek(), because if truncate() prevents 
out-of-bound file pointer, other methods should do the same as well (and 
raise IOError when seeking out of file bounds).




I had the feeling that IOErrors were for operations on file streams 
(opening, writing/reading, closing...), whereas OSErrors were for 
manipulations on filesystems (renaming, linking, stating...) and 
processes.



Ok, but the distinction is certainly fuzzy in many cases. I have no problem with
trying to change the corner cases you mention, though.
  
The case which could be problematic there is the file opening, because 
it can involve problems at all levels of the OS (filesystem not 
existing, permission problems, file locking...), so we should keep it in 
the EnvironmentError area.
But as soon as a file is open, I guess only IOErrors can be involved (no 
space left, range locked etc), so enforcing all this to raise IOError 
would be OK I think.


[Python-Dev] Fuzziness in io module specs

2009-09-18 Thread Pascal Chambon

Hello everyone

I'm currently working on a reimplementation of io.FileIO, which would 
allow cross-platform file range locking and all kinds of other safety 
features ; however I'm slightly stuck due to some specification 
fuzziness in the IO docs.

CF http://bugs.python.org/issue6939

The main points that annoy me at the moment :
- it is unclear what truncate() methods do with the file pointer, and 
even if the current implementation simply moves it to the truncation 
point, it's very contrary to the standard way of doing things under unix, where 
the file pointer is normally left unchanged. Shouldn't we specify that 
the file pointer remains unmoved, and fix the _fileio module accordingly ?
- exceptions are not always specified, and even if most of them are 
IOErrors, weirdly, in some cases, an OSError is raised instead (ie, if 
we try to wrap a wrong file descriptor when instantiating a new FileIO). 
This might lead to bad program crashes if some people don't refuse the 
temptation to guess and only get prepared to catch IOErrors
- the doc sometimes says that when we receive an empty string from a 
read() operation, without exceptions, it means we have reached the end of file. 
However, with the current implementation, if we call file.read(0), we 
simply receive "", even though it doesn't mean that we're at EOF. 
Shouldn't we avoid this (rare, I admit) ambiguity on the return value, 
by preventing read(0) ? Or at least, note in the doc that (we receive an 
empty string) -> (the file is at EOF OR we called read with 0 as 
parameter) ?
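
A two-line demonstration of the ambiguity:

import io

f = io.BytesIO(b"some data")
print(f.read(0))   # b'' - empty result, yet we are nowhere near end of file
print(f.read(4))   # b'some' - plenty of data was still available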


Are there some arguments that I don't know, which lead to this or that 
particular implementation choice ?
I'd strongly advocate very detailed specifications, letting no room for 
cross-platform subtleties (that's also a strong goal of my 
reimplementation), since that new IO system (which saved me a lot of 
coding time, by the way) should become the base of many programs.


So wouldn't it be a good idea to write some kind of mini-pep, just to 
fix the corner cases of the current IO documentation ? I might handle 
it, if no more-knowledgeable people feel like it.


Regards,
Pascal


Re: [Python-Dev] Hello everyone + little question around Cpython/stackless

2008-12-23 Thread Pascal Chambon

Allright then, I understand the problem...

Thanks a lot,
regards,
Pascal


  





[Python-Dev] Hello everyone + little question around Cpython/stackless

2008-12-22 Thread Pascal Chambon


Hello snakemen and snakewomen

I'm Pascal Chambon, a french engineer just leaving my Telecom School, 
blatantly fond of Python, of its miscellaneous offsprings and of all 
what's around dynamic languages and high level programming concepts.



I'm currently studying all I can find on stackless python, PYPY and the 
concepts they've brought to Python, and so far I wonder : since 
stackless python claims to be 100% compatible with CPython's extensions, 
faster, and brings lots of fun stuffs (tasklets, coroutines and no C 
stack), how comes it hasn't been merged back, to become the standard 
'fast' python implementation ? Would I have missed some crucial point 
around there ? Isn't that a pity to maintain two separate branches if 
they actually complete each other very well ?


Waiting for your lights on this subject,
regards,
Pascal


