New submission from David Beazley d...@dabeaz.com:
Is the struct.pack() function supposed to automatically encode Unicode strings
into binary? For example:
struct.pack('10s', 'Jalape\u00f1o')
b'Jalape\xc3\xb1o\x00'
This is Python 3.2b1.
--
components: Library (Lib)
messages: 124727
David Beazley d...@dabeaz.com added the comment:
Note: This is what happens in Python 2.6.4:
import struct
struct.pack('10s', u'Jalape\u00f1o')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a string
David Beazley d...@dabeaz.com added the comment:
Hmmm. Well, the docs seem to say that it's allowed and that it will be encoded
as UTF-8.
Given the treatment of Unicode/bytes elsewhere in Python 3, all I can say is
that this behavior is rather surprising
David Beazley d...@dabeaz.com added the comment:
Why is it even encoding at all? Almost every other part of Python 3 forces you
to be explicit about bytes/string conversion. For example:
struct.pack('10s', x.encode('utf-8'))
Given that automatic conversion is documented, it's not clear what
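The explicit style being argued for here can be shown as a short runnable sketch (the sample string is the one from the original report):

```python
import struct

# Encode the string to bytes explicitly before packing, as the rest of
# Python 3 encourages. '10s' zero-pads the result to exactly 10 bytes.
x = 'Jalape\u00f1o'
packed = struct.pack('10s', x.encode('utf-8'))
print(packed)   # b'Jalape\xc3\xb1o\x00'
```

This makes the str-to-bytes conversion visible at the call site instead of happening implicitly inside struct.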
David Beazley d...@dabeaz.com added the comment:
I encountered this issue is in the context of distributed
computing/interprocess communication involving binary-encoded records (and
encoding/decoding such records using struct). At its core, this is all about
I/O--something where encodings
David Beazley d...@dabeaz.com added the comment:
Actually, here's another one of my favorite examples:
import struct
struct.pack('s', '\xf1')
b'\xc3'
Not only does this not encode the correct value, it doesn't even encode the
entire UTF-8 encoding (just the first byte of it). Like I said
David Beazley d...@dabeaz.com added the comment:
As a user of Python 3, I would like to echo Victor's comment about fixing the API
right now as opposed to having to deal with it later. I can only speak for
myself, but I would guess that anyone using Python 3 already understands that
it's
David Beazley d...@dabeaz.com added the comment:
Thanks everyone for looking at this!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10783
New submission from David Beazley d...@dabeaz.com:
Is something like this supposed to work:
import gzip
import io
f = io.TextIOWrapper(gzip.open('foo.gz'), encoding='ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: readable
In a nutshell--reading
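For reference, the wrapping being attempted does work in later Pythons, where GzipFile implements the full io.BufferedIOBase interface that TextIOWrapper requires. A round-trip sketch (the file path is made up for illustration):

```python
import gzip
import io
import os
import tempfile

# Create a small gzip file, then read it back as text by wrapping the
# binary file object in a TextIOWrapper.
path = os.path.join(tempfile.mkdtemp(), 'foo.gz')
with gzip.open(path, 'wb') as f:
    f.write(b'Hello World\n')

with io.TextIOWrapper(gzip.open(path, 'rb'), encoding='ascii') as f:
    print(f.read())
```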
David Beazley d...@dabeaz.com added the comment:
It goes without saying that this also needs to be checked with the bz2 module.
A quick check seems to indicate that it has the same problem.
While you're at it, maybe someone could add an 'open' function to bz2 to make
it symmetrical with gzip
David Beazley d...@dabeaz.com added the comment:
C or not, wrapping a BZ2File instance with a TextIOWrapper to get text still
seems like something that someone might want to do. I doubt it would take much
modification to give BZ2File instances the required set of methods
David Beazley d...@dabeaz.com added the comment:
Do Python devs really view gzip and bz2 as two totally completely different
animals? They both have the same functionality and would be used for the same
kinds of things. Maybe I'm missing something
David Beazley d...@dabeaz.com added the comment:
Hmmm. Interesting. In the big picture, it might be an interesting project for
someone (not necessarily the core devs) to sit down and refactor both of these
modules so that they play nice with Python 3 I/O system. Obviously that's a
project
David Beazley d...@dabeaz.com added the comment:
Have any other programming environments ever had a feature where a socket
timeout returns an exception containing partial data? I'm not aware of one
offhand and speaking as a systems programmer, something like this might be
somewhat
David Beazley d...@dabeaz.com added the comment:
A comment from the training world: The instability of IDLE on the Mac makes
teaching introductory Python courses a nightmare at the moment. Sure, one
might argue that students should install an alternative editor, but then you
usually end up
David Beazley d...@dabeaz.com added the comment:
Just wanted to say that I agree it's nonsense to continue reading on a socket
that timed out (I'm not even sure what I might have been thinking when I first
submitted this bug other than just experimenting with edge cases of the socket
David Beazley d...@dabeaz.com added the comment:
Anyone contemplating the use of aio_ functions should first go read The Story
of Mel.
http://www.catb.org/jargon/html/story-of-mel.html
--
nosy: +dabeaz
___
Python tracker rep...@bugs.python.org
http
David Beazley d...@dabeaz.com added the comment:
Glad you liked it! I think there is a bit of a cautionary tale in there
though. With aio_, there is the promise of better performance, but you're also
going to need a *LOT* of advance planning and thought to avoid creating a
tangled coding
David Beazley d...@dabeaz.com added the comment:
Bump. This is still broken in Python 3.2.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10791
David Beazley d...@dabeaz.com added the comment:
If I can find some time, I may take a look at this. I just noticed that
similar problems arise trying to wrap TextIOWrapper around the file-like
objects returned by urllib.request.urlopen as well.
In the big picture, some discussion of what
David Beazley d...@dabeaz.com added the comment:
Python 3.2 (r32:88445, Feb 20 2011, 21:51:21)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
import gzip
import io
f = io.TextIOWrapper(gzip.open('file.gz'), encoding='latin-1
New submission from David Beazley d...@dabeaz.com:
Is io.FileIO.write() supposed to accept and implicitly encode Unicode strings
as illustrated by this simple example?
f = open('/dev/null', 'wb', buffering=0)
f.write('Hello World\n')
12
Moreover, is the behavior of BufferedWriter objects supposed
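In current Pythons the raw io.FileIO layer accepts only bytes-like data, which contrasts with the implicit-encoding behavior asked about above. A small sketch:

```python
import io
import os

# FileIO is the raw binary layer; it accepts bytes-like objects only.
f = io.FileIO(os.devnull, 'w')
print(f.write(b'Hello World\n'))   # 12 bytes written
try:
    f.write('Hello World\n')       # a str is rejected at this layer
except TypeError as e:
    print('TypeError:', e)
f.close()
```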
New submission from David Beazley d...@dabeaz.com:
Documentation (e.g., docstrings) for the io module make mention of a
BlockingIOError exception that might be raised if operations are performed on a
file that's in non-blocking mode. However, I am unable to get this exception
on any
New submission from David Beazley d...@dabeaz.com:
Background
---
In order to multitask with threads, a critical part of the Python
interpreter implementation concerns the behavior of I/O operations
such as read, write, send, and receive. Specifically, whenever an I/O
operation
David Beazley d...@dabeaz.com added the comment:
The comment on the CPU-bound workload is valid--it is definitely true that
Python 2.6 results will degrade as the workload of each tick is increased.
Maybe a better way to interpret those results is as a baseline of what kind
of I/O
David Beazley d...@dabeaz.com added the comment:
I posted some details about the priority GIL modifications I showed during my
PyCON open-space session here:
http://www.dabeaz.com/blog/2010/02/revisiting-thread-priorities-and-new.html
I am attaching the .tar.gz file with modifications
David Beazley d...@dabeaz.com added the comment:
Here's a short benchmark for everyone who thinks that my original benchmark was
somehow related to TCP behavior. This one doesn't even involve sockets:
from threading import Thread
import time
def writenums(f, n):
    start = time.time
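The message is truncated here; a hypothetical reconstruction of the benchmark's shape (names and counts are made up): one thread writes numbers to a file while a second, CPU-bound thread spins and competes for the GIL.

```python
import os
import tempfile
import time
from threading import Thread

def writenums(f, n):
    # I/O-bound: write n numbers to an already-open file
    start = time.time()
    for i in range(n):
        f.write(str(i) + '\n')
    print('writenums:', time.time() - start)

def spin(count):
    # Pure CPU-bound work, to force GIL contention
    while count > 0:
        count -= 1

path = os.path.join(tempfile.mkdtemp(), 'nums.txt')
with open(path, 'w') as f:
    t = Thread(target=spin, args=(1_000_000,))
    t.start()
    writenums(f, 10_000)
    t.join()
```

The follow-up comments note that opening the file line-buffered (bufsize=1) makes the slowdown with the spinning thread dramatic.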
David Beazley d...@dabeaz.com added the comment:
Whoa, that's pretty diabolically evil with bufsize=1. On my machine, doing
that just absolutely kills the performance (13 seconds without the spinning
thread versus 557 seconds with the thread!). Or, put another way, the writing
performance
David Beazley d...@dabeaz.com added the comment:
Almost forgot--if I turn off one of the CPU cores, the time drops from 557
seconds to 32 seconds. Gotta love it!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
Oh the situation definitely matters. Although, in the big picture, most
programmers would probably prefer to have fast I/O performance over slow I/O
performance :-).
David Beazley d...@dabeaz.com added the comment:
I absolutely agree 100% that it is not worth trying to fix the GIL for every
conceivable situation (although if you could, I wouldn't complain).
To me, there are really only two scenarios worth worrying about:
1. Get rid of all
David Beazley d...@dabeaz.com added the comment:
Without looking at this patch, I think it would wise to proceed with caution on
incorporating any kind of GIL patch into 2.X. If there is anything to be taken
away from my own work studying the GIL, it's that the problem is far more
tricky
David Beazley d...@dabeaz.com added the comment:
I'm not sure where you're getting your information, but the original GIL
problem *DEFINITELY* exists on multicore Windows machines. I've had numerous
participants try it in training classes and workshops, and they've all observed
severely degraded
David Beazley d...@dabeaz.com added the comment:
Just ran the CPU-bound GIL test on my wife's dual core Windows Vista machine.
The code runs twice as slow using two threads as it does using no threads
(original observed behavior in my GIL talk
David Beazley d...@dabeaz.com added the comment:
It's not a simple mutex because if you did that, you would have performance
problems much worse than those described in issue 7946.
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
The analysis of instruction cache behavior is interesting---I could definitely
see that coming into play given the heavy penalty that one sees going to
multiple cores (it's a side effect in addition to everything else that goes wrong
David Beazley d...@dabeaz.com added the comment:
I must be missing something, but why, exactly would you want multiple CPU-bound
threads to yield every 100 ticks? Frankly, that sounds like a horrible idea
that is going to hammer your system with excessive context switching overhead
David Beazley d...@dabeaz.com added the comment:
Sorry, but I don't see how you can say that the round-robin GIL and the legacy
GIL have the same behavior based solely on the result of a performance
benchmark. Do you have any kind of thread scheduling trace that proves they
are scheduling
David Beazley d...@dabeaz.com added the comment:
I'm sorry, I still don't get the supposed benefits of this round-robin patch
over the legacy GIL. Given that using interpreter ticks as a basis for thread
scheduling is problematic to begin with (mostly due to the fact that ticks have
totally
David Beazley d...@dabeaz.com added the comment:
What bothers me most about this discussion is that the Windows implementation
(legacy GIL) is being held up as an example of what we should be doing on
posix. Yet, if I go run the same thread tests that I presented in my GIL talks
David Beazley d...@dabeaz.com added the comment:
I hope everyone realizes that all of this bike-shedding about emulated
semaphores versus real semaphores is mostly a non-issue. For one thing, go
look at how a real semaphore is implemented by reading the source code to
pthreads or some other
David Beazley d...@dabeaz.com added the comment:
I'm sorry, but even in the presence of fair locking, I still don't like this
patch. The main problem is that it confuses fair locking with fair CPU
use---something that this patch does not and can not achieve on any platform.
The main problem
David Beazley d...@dabeaz.com added the comment:
I've attached a test fair.py that gives an example of the fair CPU scheduling
issue. In this test, there are two threads, one of which has fast-running
ticks, one of which has slow-running ticks.
Here is their sequential performance (OS-X
David Beazley d...@dabeaz.com added the comment:
I'm not trying to be a pain here, but do you have any explanation as to why,
with fair scheduling, the observed execution time of multiple CPU-bound threads
is substantially worse than with unfair scheduling?
From your own benchmarks, consider
David Beazley d...@dabeaz.com added the comment:
One other comment. Running the modified fair.py file on my Linux system using
Python compiled with semaphores shows that they are *definitely* not fair.
Here's the relevant part of your test:
Treaded, balanced execution, with quickstop:
fast
David Beazley d...@dabeaz.com added the comment:
I'm definitely sure that semaphores were being used in my test---I stuck a
print statement inside the code that creates locks just to make sure it was
using the semaphore version :-).
Unfortunately, at this point I think most of this discussion
David Beazley d...@dabeaz.com added the comment:
As a followup, since I'm not sure anyone here actually tried a fair
GIL on Linux, I incorporated your suggested fairness patch to the
condition-variable version of the GIL (using this pseudocode you wrote as a
guide):
with gil.cond
David Beazley d...@dabeaz.com added the comment:
Here are the results of running the fair.py test on a Mac OS-X system using a
fair GIL implementation (modified condition variable):
[ Fair GIL, Dual-Core, OS-X ]
Sequential execution
slow: 5.490943 (0 left)
fast: 0.369257 (0 left)
Threaded
David Beazley d...@dabeaz.com added the comment:
I know that multicore processors are all the rage right now, but one thing that
concerns me about this patch is its effect on single-core systems. If you
apply this on a single-CPU, are threads just going to sit there and thrash
New submission from David Beazley d...@dabeaz.com:
The attached patch makes two simple refinements to the new GIL implemented in
Python 3.2. Each is briefly described below.
1. Changed mechanism for thread time expiration
In the current implementation, threads perform a timed-wait
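The message is cut off here, but the timed-wait idea can be sketched in Python (names, numbers, and structure are made up for illustration; the real implementation is C code in the interpreter core): a waiting thread sleeps on a condition variable with a timeout, and on timeout sets a flag asking the current holder to yield.

```python
import threading

class MiniGIL:
    # Simplified sketch of the new GIL's timed-wait handoff.
    def __init__(self, interval=0.005):
        self.cond = threading.Condition()
        self.locked = False
        self.drop_request = False   # holder would check this at checkpoints
        self.interval = interval

    def acquire(self):
        with self.cond:
            while self.locked:
                # wait() returns False on timeout: ask the holder to yield
                if not self.cond.wait(timeout=self.interval):
                    self.drop_request = True
            self.locked = True
            self.drop_request = False

    def release(self):
        with self.cond:
            self.locked = False
            self.cond.notify()
```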
David Beazley d...@dabeaz.com added the comment:
Can't decide whether this should be attached to Issue 7946 or not.
I will also post it there. (Feel free to close this issue if you want to keep
7946 alive).
David Beazley d...@dabeaz.com added the comment:
The attached patch makes two simple refinements to the new GIL implemented in
Python 3.2. Each is briefly described below.
1. Changed mechanism for thread time expiration
In the current implementation, threads perform a timed-wait
David Beazley d...@dabeaz.com added the comment:
One comment on that patch I just submitted. Basically, it's an attempt to make
an extremely simple tweak to the GIL that fixes most of the problems discussed
here in an extremely simple manner. I don't have any special religious
attachment
David Beazley d...@dabeaz.com added the comment:
Here is the result of running the writes.py test with the patch I submitted.
This is on OS-X.
bash-3.2$ ./python.exe writes.py
t1 2.83990693092 0
t2 3.27937912941 0
t1 5.54346394539 1
t2 6.68237304688 1
t1 8.9648039341 2
t2 9.60041999817 2
t1
David Beazley d...@dabeaz.com added the comment:
Greg,
I like the idea of the monitor suspending if no thread owns the GIL. Let me
work on that. Good point on embedded systems.
Antoine,
Yes, the gil monitor is completely independent and simply ticks along every 5
ms. A worst case
Changes by David Beazley d...@dabeaz.com:
Removed file: http://bugs.python.org/file17084/dabeaz_gil.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
I've updated the GIL patch to reflect concerns about the monitor thread running
forever. This version has a suspension mechanism where the monitor goes to
sleep if nothing is going on for awhile. It gets resumed if threads try to
acquire
David Beazley d...@dabeaz.com added the comment:
I've also attached a new file schedtest.py that illustrates a subtle difference
between having the GIL monitor thread and not having the monitor.
Without the monitor, every thread is responsible for its own scheduling. If
you have a lot
Changes by David Beazley d...@dabeaz.com:
Removed file: http://bugs.python.org/file17094/dabeaz_gil.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
New version of patch that will probably fix Windows-XP problems. Was doing
something stupid in the monitor (not sure how it worked on Unix).
--
Added file: http://bugs.python.org/file17102/dabeaz_gil.patch
Changes by David Beazley d...@dabeaz.com:
Removed file: http://bugs.python.org/file17102/dabeaz_gil.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
Added extra pointer check to avoid possible segfault.
--
Added file: http://bugs.python.org/file17104/dabeaz_gil.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
That second access of gil_last_holder->cpu_bound is safe because that block of
code is never entered unless some other thread currently holds the GIL. If a
thread holds the GIL, then gil_last_holder is guaranteed to have a valid value
David Beazley d...@dabeaz.com added the comment:
I stand corrected. However, I'm going to have to think of a completely
different approach for carrying out that functionality as I don't know how the
take_gil() function is able to determine whether gil_last_holder has been
deleted
Changes by David Beazley d...@dabeaz.com:
Removed file: http://bugs.python.org/file17104/dabeaz_gil.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue7946
David Beazley d...@dabeaz.com added the comment:
One more attempt at fixing tricky segfaults. Glad someone had some eagle eyes
on this :-).
--
Added file: http://bugs.python.org/file17106/dabeaz_gil.patch
David Beazley added the comment:
Just as a note, there is a distinct possibility that a property in a
superclass could be some other kind of descriptor object that's not a property.
To handle that case, the solution of
super(self.__class__, self.__class__).x.fset(self, value)
would actually
New submission from David Beazley:
Suppose you subclass a dictionary:
class mdict(dict):
    def __getitem__(self, index):
        print('Getting:', index)
        return super().__getitem__(index)
Now, suppose you define a function and perform these steps that reassign the
function's
David Beazley added the comment:
I have run into this bug myself. Agree that a file-like object should never
report itself as closed unless .close() has been explicitly called on it.
HTTPResponse should not return itself as closed after the end-of-file has been
reached.
I think
New submission from David Beazley:
The bz2 library in Python3.3b1 doesn't support iteration for text-mode
properly. Example:
f = bz2.open('access-log-0108.bz2')
next(f) # Works
b'140.180.132.213 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply.html HTTP/1.1" 200 97238\n'
g = bz2.open
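A round-trip sketch of the text-mode iteration at issue, on a file created in place (the file name is made up; bz2.open() and its text modes exist from Python 3.3 on):

```python
import bz2
import os
import tempfile

# Write a small bz2 file in text mode, then iterate over it in text mode.
path = os.path.join(tempfile.mkdtemp(), 'demo.bz2')
with bz2.open(path, 'wt', encoding='ascii') as f:
    f.write('line1\nline2\n')

with bz2.open(path, 'rt', encoding='ascii') as g:
    print(next(g))   # first decoded line
```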
David Beazley added the comment:
File attached. The file can be read in its entirety in binary mode.
--
Added file: http://bugs.python.org/file26673/access-log-0108.bz2
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15546
David Beazley added the comment:
I could have used this feature myself somewhat recently. It was in some code
involving document matching where zero or more possible candidates were
assigned a score and I was trying to find the max score. The fact that an
empty list was a possibility
David Beazley added the comment:
To me, the fact that m = max(s) if s else default doesn't work with iterators
alone makes this worthy of consideration.
I would also note that min/max are the only reduction functions that don't have
the ability to work with a possibly empty sequence
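For the record, max() and min() gained a default= keyword in Python 3.4, which covers exactly the empty-iterator case described above:

```python
# default= is returned when the iterable is empty; unlike
# "max(s) if s else default", this works with iterators too.
scores = (s for s in [])        # an empty iterator, not a sequence
print(max(scores, default=0))   # 0
```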
New submission from David Beazley:
I've been playing with the interaction of ctypes and memoryviews and am curious
about intended behavior. Consider the following:
import ctypes
d = ctypes.c_double()
m = memoryview(d)
m.ndim
0
m.shape
()
m.readonly
False
m.itemsize
8
As you can see
David Beazley added the comment:
I don't want to read the representation by copying it into a bytes object. I
want direct access to the underlying memory--including the ability to modify
it. As it stands now, it's completely useless
David Beazley added the comment:
Even with the 'd' format, I'm not sure why it can't be cast to a simple byte view.
None of that seems to work at all.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15944
David Beazley added the comment:
I don't think memoryviews should be imposing any casting restrictions at all.
It's low level. Get out of the way.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15944
David Beazley added the comment:
No, I want to be able to access the raw bytes sitting behind a memoryview as
bytes without all of this casting and reinterpretation. Just show me the raw
bytes. Not doubles, not ints, not structure packing, not copying into byte
strings, or whatever
David Beazley added the comment:
Just to be specific, why is something like this not possible?
d = ctypes.c_double()
m = memoryview(d)
m[0:8] = b'abcdefgh'
d.value
8.540883223036124e+194
(Doesn't have to be exactly like this, but what's wrong with overwriting bytes
with bytes
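One way to get exactly this overwrite-the-raw-bytes effect, as a workaround sketch rather than the memoryview API being requested, is a ctypes unsigned-byte array sharing the same memory:

```python
import ctypes
import struct

# Overwrite the raw bytes of a c_double through a byte-array view
# over the same memory (a workaround, not the memoryview slicing API).
d = ctypes.c_double()
raw = (ctypes.c_ubyte * ctypes.sizeof(d)).from_buffer(d)
raw[:] = b'abcdefgh'
print(d.value)   # the double reinterprets those 8 bytes
```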
David Beazley added the comment:
I should add that 0-dim indexing doesn't work as described either:
import ctypes
d = ctypes.c_double()
m = memoryview(d)
m[()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format <d
David Beazley added the comment:
There's probably a bigger discussion about memoryviews for a rainy day.
However, the number one thing that would save all of this in my book would be
to make sure cast('B') is universally supported regardless of format including
endianness--especially
David Beazley added the comment:
One followup note---I think it's fine to punt on cast('B') if the memoryview is
non-contiguous. That's a rare case that's probably not as common.
New submission from David Beazley:
This is somewhat related to an earlier bug report concerning memory views, but
as far as I can tell, ctypes is not encoding the '.format' attribute correctly
in most cases. Consider this example:
First, create a ctypes array:
a = (ctypes.c_double * 3
New submission from David Beazley:
The PyUnicode_AsWideCharString() function is described as creating a new buffer
of type wchar_t allocated by PyMem_Alloc() (which must be freed by the user).
However, if you use this function, it causes the size of the original string
object to permanently
David Beazley added the comment:
I should quickly add, is there any way to simply have this function not keep
the wchar_t buffer around afterwards? That would be great.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16254
David Beazley added the comment:
Maybe it's not a bug, but I still think it's undesirable. Basically, you have
a function that allocates a buffer, fills it with data, and allows the buffer
to be destroyed. Yet, as a side effect, it allocates a second buffer, fills
it, and permanently
David Beazley added the comment:
Another note: the PyUnicode_AsUTF8String() doesn't leave the UTF-8 encoded byte
string behind on the original string object. I got into this thinking that
PyUnicode_AsWideCharString() might have similar behavior
David Beazley added the comment:
Funny thing, this feature breaks the interactive interpreter in the most basic
way on OS X systems. For example, the tab key won't even work to indent. You
can't even type the most basic programs into the interactive interpreter. For
example:
for i
David Beazley added the comment:
There are other kinds of libraries that might want to access the .buf
attribute. For example, the llvmpy extension. Exposing it would be useful.
--
David Beazley added the comment:
Well, a lot of things in this big bad world are dangerous. Don't see how this
is any more dangerous than all of the peril that tools like ctypes and llvmpy
already provide.
David Beazley added the comment:
One of the other goals of memoryviews is to make memory access less hacky. To
that end, it would be nice to have the .buf attribute available given that all
of the other attributes are already there. I don't see why people should need
to do some even more
David Beazley added the comment:
Final comment. It seems that one can generally avoid a lot of nastiness if
importlib.reload() is used instead. For example:
mod = sys.modules[spec.name] = module_from_spec(spec)
importlib.reload(mod)
This works for both source and Extension modules
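A runnable version of that pattern, using a stdlib module as a stand-in (the module choice is arbitrary):

```python
import importlib
import sys
from importlib.util import find_spec, module_from_spec

# Create the module object from its spec, register it under its name,
# then let importlib.reload() execute it -- the pattern described above.
spec = find_spec('json')                       # stand-in module
mod = sys.modules[spec.name] = module_from_spec(spec)
importlib.reload(mod)
print(mod.dumps({'ok': True}))
```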
David Beazley added the comment:
Sorry. I take back the previous message. It still doesn't quite do what I
want. Anyways, any insight or thoughts about this would be appreciated ;-).
New submission from David Beazley:
I have been investigating some of the new importlib machinery and the addition
of ModuleSpec objects. I am a little curious about the intended handling of C
Extension modules going forward.
Backing up for a moment, consider a pure Python module. It seems
David Beazley added the comment:
Note: Might be related to Issue 19713.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23642
David Beazley added the comment:
This is great news. Read the PEP draft and think this is a very good thing to
be addressing. Thanks, Brett.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23642
New submission from David Beazley:
Just a note that Python-3.5.0rc1 fails to compile on Mac OS X 10.8.5 with the
following compiler:
bash$ clang --version
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.6.0
Thread model: posix
bash$
Here
David Beazley added the comment:
It's still broken on Python 3.5b4.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23441
David Beazley added the comment:
This bug is still present in Python 3.5, but it occurs if you attempt to do a
readline() on a socket that's in non-blocking mode. In that case, you probably
DO want to retry at a later time (unlike the timeout case
David Beazley added the comment:
Please don't make flush() close the file on a BlockingIOError. That would be
an unfortunate mistake and make it impossible to implement non-blocking I/O
correctly with buffered I/O.