[issue21074] Too aggressive constant folding

2017-05-23 Thread Andrew Dalke

Andrew Dalke added the comment:

Again, I do not propose any changes to the existing optimizer. I do not need 
anything changed for my code to work.

My goal is to counter-balance comments which suggest that perfectly normal code 
is somehow folly and arcane. These caused me some bewilderment and self-doubt 
as I tried to establish that my test suite was not, in fact, poorly written. 
Others with the same issue should not face the same confusion. 

I especially do not want to see the years of experience with the current 
optimizer used to justify repeating the same decisions in some future AST-based 
optimizer. http://bugs.python.org/issue2506#msg64764 gives an example of how 
the lack of complaints over several years is used to argue against changing 
compiler behavior.

Terms like "folly" and "arcane" also suggest an outright rejection of 
considering to support in the future what seems like totally reasonable code. 

I realize now that there is a more immediately actionable item. I have just 
added #30440 as a request to document these effects. I have removed my name 
from its nosy list in hopes of reducing Raymond Hettinger's concerns about 
comfort and safety, and thus perhaps increase the likelihood that this will be 
documented.

"I apologize if you were offended", which I will take as being sincere, happens 
to also be one of the most common examples of an insincere apology. Bowing out 
when there is a reference to the CoC gives undue power to others, and hinders 
the ability to apply its spirit to all but the most egregious situations.

Even if I accept the idea that "sane" and "insane" have technical meanings, 
that does not exempt their use from questions about being considerate and 
respectful. Django and others replaced their use of the technical terms 
"master" and "slave", following a trend which is at least 13 years old; see 
http://edition.cnn.com/2003/TECH/ptech/11/26/master.term.reut/ . Note that I am 
not proposing to avoid using the terms "sane" and "insane", only asserting that 
there is no clean exception for words which also have a technical sense or 
meaning, even when used for that technical sense.

--




[issue30440] document peephole optimizer effects

2017-05-23 Thread Andrew Dalke

New submission from Andrew Dalke:

The peephole optimizer is an overall benefit to Python but it has some 
side-effects that occasionally cause problems. These are well-known in the 
issue tracker, but there is no other documentation which will help a Python 
programmer figure out which constructs may be a problem.

1) it will compute large integer constants and save them in the .pyc file. The 
following results in a 19M .pyc file. 

def compute_largest_known_prime():
  return 2**74207281 - 1
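
One way to see the effect directly is to compile a snippet and look at the nested 
code object's co_consts, which is what gets written to the .pyc. A minimal sketch, 
using a much smaller exponent so it runs quickly (names are mine; whether both 
operations are folded depends on the Python version):

src = "def f():\n    return 2**100 - 1\n"
code = compile(src, "<demo>", "exec")
# The function's code object sits inside the module code object's co_consts.
inner = next(c for c in code.co_consts if hasattr(c, "co_consts"))
print(inner.co_consts)  # includes the pre-computed 1267650600228229401496703205375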

As an example of someone affected by the compile-time evaluation, see 
https://stackoverflow.com/questions/34113609/why-does-python-preemptively-hang-when-trying-to-calculate-a-very-large-number/
 . Note the many people who struggled to find definitive documentation.

2) it will create and discard large strings. Consider this variant of the code 
in test_zlib.py, where I have replaced the imported module variable "_1G" with 
its value:

@bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
def test_big_buffer(self, size):
    data = b"nyan" * (2**30 + 1)  # the original uses "_1G"
    self.assertEqual(zlib.crc32(data), 1044521549)
    self.assertEqual(zlib.adler32(data), 2256789997)

The byte compiler will create the ~4GB string then discard it, even though the 
function will not be called on machines with insufficient RAM.

As an example of how I was affected by this, see #21074 .

3) The optimizer affects control flow such that coverage.py gives false 
positives about unreached code.

As examples of how people are affected, see #2506 and 
https://bitbucket.org/ned/coveragepy/issues/198/continue-marked-as-not-covered 
. The last item on the coverage.py tracker asks "Is this limitation documented 
anywhere?"

I do not believe that the current peephole optimizer should be changed to 
support these use cases, only that there should be documentation about how the 
optimizer may negatively affect real-world code.

The domain expert on this topic is Raymond Hettinger. He does not feel safe in 
issues where I am involved. As I believe my continued presence on this issue 
will inhibit the documentation changes which I think are needed, I will remove 
my name from this issue and not be further involved.

--
assignee: docs@python
components: Documentation
messages: 294248
nosy: dalke, docs@python
priority: normal
severity: normal
status: open
title: document peephole optimizer effects
versions: Python 3.7




[issue30440] document peephole optimizer effects

2017-05-23 Thread Andrew Dalke

Changes by Andrew Dalke <da...@dalkescientific.com>:


--
nosy:  -dalke




[issue21074] Too aggressive constant folding

2017-05-22 Thread Andrew Dalke

Andrew Dalke added the comment:

I do not think quoting the Zen of Python helps anything. As I wrote, "it gives 
different answers depending on where one draws the line." This includes 
"practicality beats purity".

From my viewpoint, the peephole optimizer isn't going to change because the 
core developers prioritize the purity of not adding special cases over the 
practicality of supporting reasonable real-world code. Or the purity of the 
long-awaited AST optimizer over the practicality of changing the existing, 
fragile peephole optimizer.

I also appreciate the viewpoint that the practicality of a maintainable 
peephole optimizer beats the impossible purity of trying to support all use 
cases gracefully. My line, of course, only wants it to handle my use case, 
which is the issue reported here.

My goal from this is not to re-open the topic. It is to provide a 
counter-balance to opinions expressed here that place all blame onto the 
programmer whose 'folly' lead to 'arcane' and 'insane' code. (The 'insane' is 
used at http://bugs.python.org/issue30293#msg293172 as "Burdening the optimizer 
with insanity checks just slows down the compilation of normal, sane code.")

The use case pulled from my project, which is very near to the original report 
by INADA Naoki, seems entirely sane and not at all arcane. How else might one 
test 64-bit addressing than by constructing values which are over 4GB in 
length? Indeed, Python itself has similar test code. Quoting 
Lib/test/test_zlib.py:


# Issue #10276 - check that inputs >=4GB are handled correctly.
class ChecksumBigBufferTestCase(unittest.TestCase):

    @bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
    def test_big_buffer(self, size):
        data = b"nyan" * (_1G + 1)
        self.assertEqual(zlib.crc32(data), 1044521549)
        self.assertEqual(zlib.adler32(data), 2256789997)


Is the difference between happiness and "folly" really the difference between 
writing "_1G" and "2**30"? If so, how are people supposed to learn the true 
path? Is that not exactly the definition of 'arcane'?

The Code of Conduct which governs comments here requests that we be considerate 
and respectful. Terms like 'folly' and 'arcane', at least for what I think is 
an entirely reasonable use case, seem to run counter to that spirit.

--




[issue30416] constant folding opens compiler to quadratic time hashing

2017-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

A complex solution is to stop constant folding when there are more than a few 
levels of tuples. I suspect there aren't that many cases where there are more 
than 5 levels of tuples and where constant creation can't simply be assigned 
and used as a module variable.

This solution would become even more complex should constant propagation be 
supported.

Another option is to check the value about to be added to co_consts. If it is a 
container, then check if it would require more than a few levels of hash calls. 
If so, then simply add it without ensuring uniqueness.

This could be implemented because the compiler could be told how to carry out 
that check for the handful of supported container types.
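
A rough sketch of what such a check could look like (the name and the depth 
cut-off are mine, not from any patch): walk the candidate constant and bail 
out as soon as the nesting passes a limit, so the check stays cheap for the 
deeply nested pathological cases:

def needs_deep_hash(const, limit=5):
    # True if hashing `const` would recurse through more than `limit` levels
    # of the container types the optimizer folds (tuples, frozensets).
    if limit < 0:
        return True
    if isinstance(const, (tuple, frozenset)):
        return any(needs_deep_hash(item, limit - 1) for item in const)
    return False

The compiler could then skip the co_consts de-duplication, or fall back to an 
identity check, whenever needs_deep_hash(candidate) is True.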

--




[issue21074] Too aggressive constant folding

2017-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

I know this issue was closed many years ago, and I don't propose re-opening it. 
I write this comment because some of the earlier comments here make it sound 
like only a foolish or perverse programmer might be affected by this 'too 
aggressive constant folding'. I'll provide a real-world example of how it 
affected me. It took me several hours to track it down, and even longer to 
decide that the fault shouldn't be solely attributed to poor coding practices 
on my side.

I recently updated a code base from Python 2.7 to Python 3.5+. It contains a C 
extension which handles 64-bit indexing. One of the test files, 
"test_large.py", exercises the API with multi-gigabyte strings. It typically 
takes about 10 minutes to run so it isn't part of the normal test suite. 
Instead, it's decorated with a @unittest.skipUnless(), and only enabled if the 
file is executed directly or if an environment variable is set.

The file creates about 25 multi-GB string constants, like 's = b"\xfe" * 
(2**32+1)'. Those alone require a minute to create, but that's acceptable 
overhead because these tests are usually skipped, and when not skipped are only 
10% of the total run-time. Here is an example extracted from my code; this 
tests the population count on a byte string:

RUN_ALL = "LARGE_TESTS" in os.environ
if __name__ ==  "__main__":
RUN_ALL = True

@unittest.skipUnless(RUN_ALL, "large tests not enabled; set LARGE_TESTS")
class LargeTests(unittest.TestSuite):
def test_popcount(self):
s = b"\xfe\xff" * (2**31 + 1)
self.assertEqual(bitops.byte_popcount(s), 15*(2**31 + 1))

if __name__ ==  "__main__":
unittest.main()

As part of the update I did a 'move function' refactoring across the code base 
and re-ran the tests. Unit test discovery seemed to hang and ^C didn't work. 
Investigation showed it was in __import__("test_large"), which was needed 
because I had changed code in test_large.py. I finally figured out it was due 
to constant folding, which created the string, found it was too large, and 
discarded it. Test discovery took a minute, even though all of the tests were 
marked as 'skip' and would not be called.

Once done, the compiler generated a .pyc file. I hadn't noticed the problem 
earlier because the .py file rarely changed, so it rarely needed to be recompiled. 
It would have been a bigger issue if I had run test_large.py directly, as that 
will always trigger the one-minute compilation, even if I specified a test which 
didn't use those constants. (There were no bugs in 64-bit handling during the 
update so I didn't need to do that.)

I was going to report the issue, then found that INADA Naoki had reported 
almost exactly the same issue here, back in 2014.

I was bewildered by some of the comments here, because they seemed to suggest I 
was at fault for writing such poor quality code in the first place. Others may 
be in the same boat as me, so I'll make a few points to add some 
counter-balance.

"Are we really supposed to protect programmers from their own folly by 
second-guessing when constant folding might be required and when it might not?"

If there is a 'folly', it is shared with the developers of Python's 
constant-folding implementation who thought there wouldn't be a problem, and 
provide no mechanisms (like #2506 proposed in 2008 to disable optimization; 
also available in #26145) which might help people like me diagnose a problem. 
But I don't think there was any folly. There was an engineering decision that 
the benefits of constant folding outweighed the negatives. Just like in my case 
there was an engineering decision that constant expressions which worked in 
Python 2.5-2.7 didn't need to be made future-proof against improved 
constant-folding.

"How is the interpreter supposed to know the function isn't called?"

Perhaps a standard-library decorator which says that a function will be skipped 
when run as a unit test? But really, the question should be "how is the 
*byte-code compiler* supposed to know". This highlights a shift between the 
Python I started with, which left everything up to the run-time virtual machine 
interpreter, and the smarter compile-time optimizer we have now. As it gets 
smarter, we developers seem to need to know more about how the optimizer works 
in order to avoid unwanted side-effects. Currently this knowledge is 'arcane'.

"simply declare a manifest constant and use that instead"

The fundamental problem is that there's no way for a programmer to create a large 
constant value which is safe from a sufficiently smart compiler, and nothing 
which outlines how smart the compiler will get. Instead, people figure out what 
works operationally, but that's specific to a given CPython version.
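
For what it's worth, the pattern that works operationally today (and is 
effectively what test_zlib's imported "_1G" amounts to) is to route the big 
value through a name, since the current peephole optimizer does not propagate 
constants. A minimal sketch (make_big_string is my own illustrative name, and 
nothing guarantees a future optimizer keeps this property):

_1G = 2 ** 30   # folded, but only a small int ends up in the .pyc

def make_big_string():
    # _1G is looked up at run time, so the multi-GB bytes object is only
    # built when (and if) the function is actually called.
    return b"\xfe" * (_1G + 1)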

My code ran into problems because Python's constant folding improved from under 
me. Even if I follow that advice, how do I avo

[issue30416] constant folding opens compiler to quadratic time hashing

2017-05-20 Thread Andrew Dalke

New submission from Andrew Dalke:

Others have reported issues like #21074 where the peephole compiler generates 
and discards large strings, and #30293 where it generates multi-MB integers and 
stores them in the .pyc.

This is a different issue. The code:

  def tuple20():
      return ((((((((1,)*20,)*20,)*20,)*20,)*20,)*20,)*20,)*20

takes over four minutes (257 seconds) to compile on my machine. The seemingly 
larger:

  def tuple30():
      return ((((((((1,)*30,)*30,)*30,)*30,)*30,)*30,)*30,)*30

takes a small fraction of a second to compile, and is equally fast to run.

Neither code generates a large data structure. In fact, each only needs about 
1 KB.

A sampling profiler of the first case, taken around 30 seconds into the run, 
shows that nearly all of the CPU time is spent in computing the hash of the 
highly-nested tuple20, which must visit a very large number of elements even 
though there are only a small number of unique elements. The call chain is:

Py_Main -> PyRun_SimpleFileExFlags -> PyAST_CompileObject -> compiler_body -> 
compiler_function -> compiler_make_closure -> compiler_add_o -> PyDict_GetItem 
and then into the tuple hash code.

It appears that the peephole optimizer converts the highly-nested tuple20 into 
a constant value. The compiler then creates the code object with its co_consts. 
Specifically, compiler_make_closure uses a dictionary to ensure that the 
elements in co_consts are unique, and mapped to the integer used by LOAD_CONST.

It takes about 115 seconds to compute hash(tuple20). I believe the hash is 
computed twice, once to check if the object is present, and the second to 
insert it. I suspect most of the other 26 seconds went to computing the 
intermediate constants in the tuple.
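
The same cost is easy to reproduce without the compiler by building the same 
shape at run time and hashing it directly. A small sketch (names are mine; each 
extra level should multiply the hash time by roughly the repeat factor):

import time

def nested(depth, width=20):
    # Same shape as tuple20: `width`-fold repetition, `depth` levels deep.
    t = (1,) * width
    for _ in range(depth - 1):
        t = (t,) * width
    return t

for depth in range(4, 8):
    t = nested(depth)
    start = time.perf_counter()
    hash(t)
    print(depth, round(time.perf_counter() - start, 3))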

Based on the previous issues I highlighted in my first paragraph, I believe 
this report will be filed under "Doctor, doctor, it hurts when I do this"/"Then 
don't do it." I see no easy fix, and cannot think of how it would come about in 
real-world use.

I point it out because in reading the various issues related to the peephole 
optimizer there's a subset of people who propose a look-before-you-leap 
technical solution of avoiding an optimization where the estimated result is 
too large. While it does help, it does not avoid all of the negatives of the 
peephole optimizer, or any AST-based optimizer with similar capabilities. I 
suspect even most core developers aren't aware of this specific negative.

--
components: Interpreter Core
messages: 294050
nosy: dalke
priority: normal
severity: normal
status: open
title: constant folding opens compiler to quadratic time hashing
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7




[issue29211] assertRaises with exceptions re-raised from a generator kills generator

2017-01-08 Thread Andrew Dalke

New submission from Andrew Dalke:

The unittest assertRaises/assertRaisesRegex implementation calls 
traceback.clear_frames() because of issue9815 ("assertRaises as a context 
manager keeps tracebacks and frames alive").

However, if the traceback is from an exception created in a generator, caught, 
and re-raised outside of the generator, then the clear_frames() will cause the 
generator to raise a StopIteration exception the next time it is used.

Here is a reproducible where I create a generator and wrap it inside of an 
object API:

def simple_gen():
    yield 1, None
    try:
        1/0
    except ZeroDivisionError as err:
        yield None, err
    yield 3, None

class Spam:
    def __init__(self):
        self.gen = simple_gen()
    def get_next(self):
        value, err = next(self.gen)
        if err is not None:
            raise err
        return value

I can test this without unittest using the following:

def simple_test():
    spam = Spam()
    assert spam.get_next() == 1
    try:
        spam.get_next()
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError
    assert spam.get_next() == 3
    print("simple test passed")

simple_test()


This prints "simple test passed", as expected.

The unittest implementation is simpler:

import unittest

class TestGen(unittest.TestCase):
    def test_gen(self):
        spam = Spam()
        self.assertEqual(spam.get_next(), 1)
        with self.assertRaises(ZeroDivisionError):
            spam.get_next()
        self.assertEqual(spam.get_next(), 3)

unittest.main()

but it reports an unexpected error:

======================================================================
ERROR: test_gen (__main__.TestGen)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "clear.py", line 40, in test_gen
    self.assertEqual(spam.get_next(), 3)
  File "clear.py", line 13, in get_next
    value, err = next(self.gen)
StopIteration

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

I have tracked it down to the call to traceback.clear_frames(tb) in 
unittest/case.py. The following ClearFrames context manager will call 
traceback.clear_frames() if requested. The test code uses ClearFrames to 
demonstrate that the call to clear_frames() is what causes the unexpected 
StopIteration exception:


import traceback

class ClearFrames:
    def __init__(self, clear_frames):
        self.clear_frames = clear_frames

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        assert exc_type is ZeroDivisionError, exc_type
        if self.clear_frames:
            traceback.clear_frames(tb)  # This is the only difference between the tests.
        return True

# This is essentially the same test case as before, but structured using
# a context manager that either does or does not clear the traceback frames.
def clear_test(clear_frames):
    spam = Spam()
    assert spam.get_next() == 1
    with ClearFrames(clear_frames):
        spam.get_next()
    try:
        assert spam.get_next() == 3
    except StopIteration:
        print(" ... got StopIteration")
        return
    print(" ... clear_test passed")

print("\nDo not clear frames")
clear_test(False)
print("\nClear frames")
clear_test(True)


The output from this test is:

Do not clear frames
 ... clear_test passed

Clear frames
 ... got StopIteration

There are only a dozen or so tests in my code which are affected by this. 
(These are from a test suite which I am porting from 2.7 to 3.5.) I can easily 
re-write them to avoid using assertRaisesRegex.

I have no suggestion for a longer-term solution.

--
components: Library (Lib)
messages: 285006
nosy: dalke
priority: normal
severity: normal
status: open
title: assertRaises with exceptions re-raised from a generator kills generator
type: behavior
versions: Python 3.5




[issue23455] file iterator deemed broken; can resume after StopIteration

2015-02-12 Thread Andrew Dalke

New submission from Andrew Dalke:

The file iterator is deemed broken. As I don't think it should be made 
non-broken, I suggest the documentation should be changed to point out when 
file iteration is broken. I also think the term 'broken' is a label with 
needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from 
https://docs.python.org/3.4/library/stdtypes.html):

  Once an iterator’s __next__() method raises StopIteration,
  it must continue to do so on subsequent calls. Implementations
  that do not obey this property are deemed broken.

(Older versions comment "This constraint was added in Python 2.3; in Python 
2.2, various iterators are broken according to this rule.")

An IOBase is supposed to support the iterator protocol (says 
https://docs.python.org/3.4/library/io.html#io.IOBase ). However, it does not, 
nor does the documentation say that it's broken in the face of a changing file 
(eg, when another process appends to a log file).

  % ./python.exe
  Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46) 
  [GCC 4.8.4] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> f = open("empty")
  >>> next(f)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  StopIteration
  >>> 
  >>> ^Z
  Suspended
  % echo "Hello!" >> empty
  % fg
  ./python.exe

  >>> next(f)
  'Hello!\n'

This is apparently well-known behavior, as I've come across several references 
to it on various Python-related lists, including this one from Miles in 2008: 
https://mail.python.org/pipermail/python-list/2008-September/491920.html .

  Strictly speaking, file objects are broken iterators:

Fredrik Lundh in the same thread ( 
https://mail.python.org/pipermail/python-list/2008-September/521090.html ) says:

  it's a design guideline, not an absolute rule

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But 
while 'broken' could be considered a meaningless label, it carries with it some 
rather negative connotations. It sounds like developers are supposed to make 
every effort to avoid broken code, when that's not something Python itself 
does. It also means that my code can be called broken solely because it 
assumed Python file iterators are non-broken. I am not happy when people say my 
code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause 
next(it) to not continue to raise a StopIteration exception. However, errors 
can arise when using Python file objects, as an iterator, to parse a log file 
or any other files which are appended to by another process.

Here's an example of code that can break. It extracts the first and last 
elements of an iterator; more specifically, the first and last lines of a file. 
If there are no lines it returns None for both values; and if there's only one 
line then it returns the same line as both values.

  def get_first_and_last_elements(it):
      first = last = next(it, None)
      for last in it:
          pass
      return first, last

This code expects a non-broken iterator. If passed a file, and the file were 1) 
initially empty when the next() was called, and 2) appended to by the time 
Python reaches the for loop, then it's possible for the first value to be None 
while the last is a string.

This is unexpected, undocumented, and may lead to subtle errors.

There are work-arounds, like ensuring that the StopIteration only occurs once:

  def get_first_and_last_elements(it):
      first = last = next(it, None)
      if last is not None:
          for last in it:
              pass
      return first, last

but much existing code expects non-broken iterators, such as the Python example 
implementation at 
https://docs.python.org/2/library/itertools.html#itertools.dropwhile . (I have 
a reproducible failure using it, a fork(), and a file iterator with a sleep() 
if that would prove useful.)

Another option is to have a wrapper around file object iterators to keep 
raising StopIteration, like:

   def safe_iter(it):
       yield from it

   # -or-  (line for line in file_iter)

but people need to know to do this with file iterators or other potentially 
broken iterators. The current documentation does not say when file iterators 
are broken, and I don't know which other iterators are also broken.

I realize this is a tricky issue.

I don't think it's possible now to change the file's StopIteration behavior. I 
expect that there is code which depends on the current brokenness, the ability 
to seek() and re-iterate is useful, and the idea that next() returns text if 
and only if readline() is not empty is useful and well-entrenched. Pypy has the 
same behavior as CPython so any change will take some time to propagate to the 
other implementations.

Instead, I'm fine with a documentation change in io.html . It currently says:

  IOBase (and its subclasses) support the iterator protocol,
  meaning that an IOBase object can be iterated over yielding

[issue21523] quadratic-time compilation in the number of 'and' or 'or' expressions

2014-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

Live and learn. I did my first bisect today.

The first bad revision is:
changeset:   51920:ef8fe9088696
branch:      legacy-trunk
parent:      51916:4e1556012584
user:        Jeffrey Yasskin jyass...@gmail.com
date:        Sat Feb 28 19:03:21 2009 +
summary:     Backport r69961 to trunk, replacing JUMP_IF_{TRUE,FALSE} with

I confirmed that the parent did not have the problem.

If you want me to diagnose this further, then I'll need some hints on what to 
do next.

--




[issue21523] quadratic-time compilation in the number of 'and' or 'or' expressions

2014-05-18 Thread Andrew Dalke

New submission from Andrew Dalke:

Python's compiler has quadratic-time behavior based on the number of 'and' or 
'or' expressions. A profile shows that stackdepth_walk is calling itself in 
a stack at least 512 levels deep. (My profiler doesn't go higher than that.)

I've reduced it to a simple test case. Compiling functions of the form

def f(x):
    x * x  # Repeat N times

takes linear time in the number of lines N, while functions of the form

def g(x):
    x and x  # Repeat N times

takes quadratic time in N. Here's an example of running the attached 
demonstration code on a fresh build of Python from version control:

Results from 3.5.0a0 (default:de01f7c37b53, May 18 2014, 13:18:43)  

  num    using  using
 tests    '*'   'and'
   100   0.002  0.002
   200   0.003  0.004
   400   0.005  0.010
   800   0.012  0.040
  1600   0.023  0.133
  3200   0.042  0.906
  6400   0.089  5.871
 12800   0.188  27.581
 25600   0.404  120.800

The same behavior occurs when I replace 'and' with 'or'.
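
The attached quadratic_shortcircuit_compilation.py is not reproduced here; a 
minimal sketch of the same measurement, using only the built-in compile(), is:

import time

def time_compile(n, op):
    # Build a function body with n copies of "x OP x" and time compiling it.
    body = "\n".join("    x %s x" % op for _ in range(n))
    start = time.time()
    compile("def f(x):\n" + body + "\n", "<generated>", "exec")
    return time.time() - start

for n in (100, 200, 400, 800, 1600, 3200):
    print(n, round(time_compile(n, "*"), 3), round(time_compile(n, "and"), 3))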

The same behavior also occurs under Python 2.7.2, 3.3.5, 3.4.0. (I don't have 
builds of 3.1 or 3.2 for testing.)

However, the demonstration code shows linear time under Python 2.6.6:

Results from 2.6.6 (r266:84374, Aug 31 2010, 11:00:51)  

  num    using  using
 tests    '*'   'and'
   100   0.003  0.001
   200   0.002  0.002
   400   0.006  0.008
   800   0.010  0.010
  1600   0.019  0.022
  3200   0.039  0.045
  6400   0.085  0.098
 12800   0.176  0.203
 25600   0.359  0.423
 51200   0.726  0.839

I came across this problem because my code imports a large machine-generated 
module. It was originally written for Python 2.6, where it worked just fine. 
When I tried to import it under new Pythons, the import appeared to hang, and I 
killed it after a minute or so.

As a work-around, I have re-written the code generator to use chained 
if-statements instead of the short-circuit 'and' operator.

--
components: Interpreter Core
files: quadratic_shortcircuit_compilation.py
messages: 218742
nosy: dalke
priority: normal
severity: normal
status: open
title: quadratic-time compilation in the number of 'and' or 'or' expressions
type: performance
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5
Added file: 
http://bugs.python.org/file35279/quadratic_shortcircuit_compilation.py




[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2012-02-04 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

It does look like #8104 resolved it. I tested on 2.7.2 and verified that it's 
no longer a problem, so I moved this to closed/duplicate.

--
resolution:  -> duplicate
status: open -> closed




[issue13653] reorder set.intersection parameters for better performance

2011-12-23 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

My belief is that the people who use set.intersection with more than two terms 
are 1) going to pass in a list of sets, and 2) don't care about the specific 
order.

To check the validity of my belief, I did a Google Code Search to find cases of 
people using set intersection in Python. I searched for "set\.intersection\(\*" 
and "\.intersection\(.*\, lang:^python$", among others.

I am sad to report that the most common way to compute set.intersection(*list) 
is by using reduce, like:

possible = (set(index[c]) for c in set(otp))
possible = reduce(lambda a, b: a.intersection(b), possible)


That comes from:
  git://github.com/Kami/python-yubico-client.git /yubico/modhex.py
and similar uses are in:
  git://github.com/sburns/PyCap.git /redcap/rc.py
  http://hltdi-l3.googlecode.com/hg//xdg/languages/morpho/fst.py
  http://dsniff.googlecode.com/svn/trunk/dsniff/lib/fcap.py


As well as in the Rosetta Code example for a simple inverted index, at:
  http://rosettacode.org/wiki/Inverted_index#Python

This was also implemented more verbosely in:

http://eats.googlecode.com/svn/trunk/server/eats/views/main.py
intersected_set = sets[0]
for i in range(1, len(sets)):
    intersected_set = intersected_set.intersection(sets[i])

and 

http://iocbio.googlecode.com/svn/trunk/iocbio/microscope/cluster_tools.py
s = set (range (len (data[0])))
for d in zip(*data):
    s = s.intersection(set(find_outliers(d, zoffset=zoffset)))
return sorted(s)

In other words, 7 codebases use manual pairwise reduction rather than use the 
equivalent code in Python. (I have not checked for which are due to backwards 
compatibility requirements.)
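
For reference, the equivalent single call for the first example would be 
something along these lines (assuming the same "index" and "otp" names as in 
the yubico snippet above):

# All the per-letter sets are passed at once, so CPython can choose its own
# evaluation strategy (which is the subject of this report).
possible = set.intersection(*(set(index[c]) for c in set(otp)))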

On the other hand, if someone really wants to have a specific intersection 
order, this shows that it's very easy to write.


I found 4 other code bases where set intersection was used for something other 
than binary intersection, and used the built-in intersection().



git://github.com/valda/wryebash.git/experimental/bait/bait/presenter/impl/filters.py
def get_visible_node_ids(self, filterId):
    if filterId in self.idMask:
        visibleNodeIdSets = [f.get_visible_node_ids(filterId)
                             for f in self._filters]
        return set.intersection(*[v for v in visibleNodeIdSets
                                  if v is not None])
    return None



http://wallproxy.googlecode.com/svn/trunk/local/proxy.py
if threads[ct].intersection(*threads.itervalues()):
    raise ValueError('All threads failed')
(here, threads' values contain sets)



git://github.com/argriffing/xgcode.git/20100623a.py
header_sets = [set(x) for x in header_list]
header_intersection = set.intersection(*header_sets)




http://pyvenn.googlecode.com/hg//venn.py
to_exclude = set()
for ii in xrange(0, len(self.sets)):
    if (i & (2**ii)):
        sets_to_intersect.append(sets_by_power_of_two[i & (2**ii)])
    else:
        to_exclude = to_exclude.union(sets_by_power_of_two[(2**ii)])
final = set.intersection(*sets_to_intersect) - to_exclude



These all find the intersection of sets (not iterators), and the order of 
evaluation does not appear like it will affect the result.

I do not know though if there will be a performance advantage in these cases to 
reordering. I do know that in my code, and any inverted index, there is an 
advantage.

And I do know that the current CPython implementation has bad worst-case 
performance.

--




[issue13653] reorder set.intersection parameters for better performance

2011-12-22 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

In Issue3069, Arnaud Delobelle proposed support for multiple values to 
set.intersection() and set.union(), writing "Intersection is optimized by 
sorting all sets/frozensets/dicts in increasing order of size and only 
iterating over elements in the smallest."

Raymond Hettinger commented therein that he had just added support for multiple 
parameters. However, he did not pick up the proposed change in the attached 
patch which attempts to improve the intersection performance.

Consider the attached benchmark, which constructs an inverted index mapping a 
letter to the set of words which contain that letter. (Rather, to word index.) 
Here's the output:

## Example output:
# a has 144900 words
# j has 3035 words
# m has 62626 words
# amj takes 5.902/1000 (verify: 289)
# ajm takes 0.292/1000 (verify: 289)
# jma takes 0.132/1000 (verify: 289)


Searching set.intersection(inverted_index["j"], inverted_index["m"], 
inverted_index["a"]) is fully 44 times faster than searching "a", "m", "j"!

Of course, set.intersection() supports any iterable, so this would only be an 
optimization for when all of the inputs are set types.
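
The attached set_intersection_benchmark.py is not inlined here; a rough sketch 
of its shape (hypothetical names, any word list such as /usr/share/dict/words 
will do) is:

import time
from collections import defaultdict

def build_index(words):
    # letter -> set of indices of the words containing that letter
    index = defaultdict(set)
    for i, word in enumerate(words):
        for letter in word:
            index[letter].add(i)
    return index

def time_order(index, letters, repeat=1000):
    sets = [index[c] for c in letters]
    start = time.time()
    for _ in range(repeat):
        result = set.intersection(*sets)
    return time.time() - start, len(result)

Comparing time_order(index, "amj") against time_order(index, "jma") then 
reproduces the ordering effect shown above.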

BTW, my own experiments suggest that sorting isn't critical. It's more 
important to find the most anti-correlated set to the smallest set, and the 
following does that dynamically by preferentially choosing sets which are 
likely to not match elements of the smallest set:

def set_intersection(*input_sets):
    N = len(input_sets)
    min_index = min(range(len(input_sets)), key=lambda x: len(input_sets[x]))
    best_mismatch = (min_index+1)%N

    new_set = set()
    for element in input_sets[min_index]:
        # This failed to match last time; perhaps it's a mismatch this time?
        if element not in input_sets[best_mismatch]:
            continue

        # Scan through the other sets
        for i in range(best_mismatch+1, best_mismatch+N):
            j = i % N
            if j == min_index:
                continue
            # If the element isn't in the set then perhaps this
            # set is a better rejection test for the next input element
            if element not in input_sets[j]:
                best_mismatch = j
                break
        else:
            # The element is in all of the other sets
            new_set.add(element)
    return new_set


Using this in the benchmark gives

amj takes 0.972/1000 (verify: 289)
ajm takes 0.972/1000 (verify: 289)
jma takes 0.892/1000 (verify: 289)

which clearly shows that this Python algorithm is still 6 times faster (for the 
worst case) than the CPython code.

However, the simple sort solution:


def set_intersection_sorted(*input_sets):
    input_sets = sorted(input_sets, key=len)
    new_set = set()
    for element in input_sets[0]:
        if element in input_sets[1]:
            if element in input_sets[2]:
                new_set.add(element)
    return new_set

gives times of 

amj takes 0.492/1000 (verify: 289)
ajm takes 0.492/1000 (verify: 289)
jma takes 0.422/1000 (verify: 289)

no doubt because there's much less Python overhead than my experimental 
algorithm.

--
components: Interpreter Core
files: set_intersection_benchmark.py
messages: 150124
nosy: dalke
priority: normal
severity: normal
status: open
title: reorder set.intersection parameters for better performance
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file24081/set_intersection_benchmark.py




[issue1602133] non-framework built python fails to define environ properly

2011-07-17 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

I confirm that under Python 2.7.2 while trying to build a 3rd-party package 
(from rdkit.org) I get the error


Linking CXX shared library ../../lib/libRDBoost.dylib
ld: warning: path '/usr/local/lib/libpython2.7.a' following -L not a directory
Undefined symbols:
  _environ, referenced from:
  _initposix in libpython2.7.a(posixmodule.o)
 (maybe you meant: cstring=ignore_environment)
ld: symbol(s) not found
collect2: ld returned 1 exit status

My Python-2.7 was configured with ./configure and is not a framework install. 
I applied the patch to my local 2.7 copy and the third party package builds 
without a problem.

--
nosy: +dalke




[issue10809] complex() comments wrongly say it supports NaN and inf

2011-01-02 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

complex("nan") raises "ValueError: complex() arg is a malformed string" while 
complex(float("nan")) returns (nan+0j). This was reported in 
http://bugs.python.org/issue2121 with the conclusion "won't fix".

complex("inf") has the same behaviors.

The implementation in complexobject.c says 


/* a valid complex string usually takes one of the three forms:

     <float>                  - real part only
     <float>j                 - imaginary part only
     <float><signed-float>j   - real and imaginary parts

   where <float> represents any numeric string that's accepted by the
   float constructor (including 'nan', 'inf', 'infinity', etc.), and
   <signed-float> is any string of the form <float> whose first
   character is '+' or '-'.

This comment is wrong and it distracted me for a while as I tried to figure out 
why complex("nan") wasn't working. It should be fixed, with the word 
"including" replaced by "excluding".

I don't have a real need for complex("nan") support - this was of intellectual 
interest only. Also of intellectual interest, PyPy 1.4 does accept 
complex("nan") but converts complex("nan+nanj") to (nannanj), so it suffers 
from the strange corner cases which Raymond points out when advocating for 
"won't fix".

Because

--
assignee: d...@python
components: Documentation
messages: 125104
nosy: dalke, d...@python
priority: normal
severity: normal
status: open
title: complex() comments wrongly say it supports NaN and inf
versions: Python 2.7




[issue10809] complex() comments wrongly say it supports NaN and inf

2011-01-02 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Well that's ... interesting. While I compiled 2.7 and was looking at the 2.7 
code, my tests were against 2.6. 


Python 2.7 (trunk:74969:87651M, Jan  2 2011, 21:58:12) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> complex("nan-nanj")
(nan+nanj)
>>> 

This means that the comments are correct and the error was in my understanding, 
as influenced by issue2121.

I therefore closed this.

--
resolution:  -> out of date
status: open -> closed




[issue10698] doctest load_tests() typo

2010-12-13 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

doctest.html Section 24.2.5 Unittest API says:


def load_tests(loader, tests, ignore):
    tests.addTests(doctest.DocTestSuite(my_module_with_doctests))
    return test

That last line should be "return tests"

--
assignee: d...@python
components: Documentation
messages: 123904
nosy: dalke, d...@python
priority: normal
severity: normal
status: open
title: doctest load_tests() typo
versions: Python 3.2




[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2010-02-01 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Since I see that a change to the test is needed, I've attached a diff against Python 
2.6's test_socket.py. I would have generated one against the 2.7 version in 
subversion, but that test doesn't exist.

--
keywords: +patch
Added file: http://bugs.python.org/file16082/test_socket.py.diff




[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2010-01-31 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

In Python 2.6 and Python 2.7a2+, I can't socket.recv_into(a byte array 
instance).

I get a TypeError which complains about a pinned buffer. I have only an 
inkling of what that means. Since an array.array("b") works there, and since 
it works in Python 3.1.1, and since I thought the point of a bytearray was to 
make things like recv_into easier, I think this exception is a bug in Python 
2.6 and 2.7.

Here's my reproducibles: 

Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import socket 
>>> sock = socket.socket() 
>>> sock.connect( ("python.org", 80) ) 
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n") 
18 
>>> buf = bytearray(b" " * 10) 
>>> sock.recv_into(buf) 

Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray 

I expected a bytearray to work there. In fact, I thought the point of 
bytearray was to allow this to work. 
By comparison, an array of bytes does work: 
>>> import array 
>>> arr = array.array("b") 
>>> arr.extend(map(ord, "This is a test")) 
>>> len(arr) 
14 
>>> sock.recv_into(arr) 
14 
>>> arr
array('b', [72, 84, 84, 80, 47, 49, 46, 49, 32, 51, 48, 50, 32, 70]) 
>>> "".join(map(chr, arr))
'HTTP/1.1 302 F' 

I don't even know what a pinned buffer means, and searching 
python.org isn't helpful.

Using a bytearray in Python 3.1.1 *does* work: 
Python 3.1.1 (r311:74480, Jan 31 2010, 23:07:16) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import socket 
>>> sock = socket.socket() 
>>> sock.connect( ("python.org", 80) ) 
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n") 
18 
>>> buf = bytearray(b" " * 10) 
>>> sock.recv_into(buf) 
10 
>>> buf 
bytearray(b'HTTP/1.1 3') 

For reference, here's an example with 2.7a2+ (freshly built out of version 
control) showing that it does not work there.

Python 2.7a2+ (trunk:74969:77901M, Feb  1 2010, 02:44:24) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> b = bytearray(b" " * 10)
>>> sock.recv_into(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray


--
components: IO
messages: 98644
nosy: dalke
severity: normal
status: open
title: recv_into() argument 1 must be pinned buffer, not bytearray
type: behavior
versions: Python 2.6, Python 2.7




[issue7192] webbrowser.get(firefox) does not work on Mac with installed Firefox

2009-10-23 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

I have Firefox and Safari installed on my Mac. Safari is the default.

I wanted to try out Crunchy (http://code.google.com/p/crunchy/). It's 
developed under Firefox and does not work under Safari. I tried. ;)

It starts the web browser with the following.

try:
    client = webbrowser.get("firefox")
    client.open(url)
    return
except:
    try:
        client = webbrowser.get()
        client.open(url)
        return
    except:
        print('Please open %s in Firefox.' % url)

On my Mac, webbrowser.get("firefox") fails, so this ends up opening in 
Safari, which does not work for viewing the code.

Thing is, I have Firefox installed, so it should work. But the Mac code in 
webbrowser appears to only open in the default browser.

The following bit of code works well enough to get crunchy to work

class MacOSXFirefox(BaseBrowser):
    def open(self, url, new=0, autoraise=True):
        subprocess.check_call(["/usr/bin/open", "-b",
                               "org.mozilla.firefox", url])

register("firefox", None, MacOSXFirefox('firefox'), -1)

but I don't know enough about the Mac nor about webbrowser to know if I'm 
on the right path. For example, I don't know if there are ways to support 
'new' and 'autoraise' through /usr/bin/open or if there's a better 
solution.

Attached is the full diff.

--
components: Library (Lib)
files: webbrowser.py.diff
keywords: patch
messages: 94387
nosy: dalke
severity: normal
status: open
title: webbrowser.get(firefox) does not work on Mac with installed Firefox
type: feature request
Added file: http://bugs.python.org/file15188/webbrowser.py.diff




[issue7172] BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake

2009-10-19 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

BaseHTTPServer.BaseHTTPRequestHandler.responses contains a mapping from 
HTTP status codes to the 2-tuple (shortmessage, longmessage), based on RFC 
2616.

The 2-tuple for 405 is ('Method Not Allowed','Specified method is invalid 
for this server.'),

RFC 405 says "An origin server SHOULD return the status code 405 (Method 
Not Allowed) if the method is known by the origin server but not allowed 
for the requested resource."

I think the message should be "Specified method is invalid for this 
resource." That is, change "server" to "resource".

--
components: Library (Lib)
messages: 94262
nosy: dalke
severity: normal
status: open
title: BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake
type: feature request




[issue7172] BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake

2009-10-19 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Wasn't thinking. I'm not quoting from RFC 405, I'm quoting the 405 
section from RFC 2616.

--




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-09-11 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I'm still undecided on if this is a bug or not.  The problem occurs even 
when I'm not reading data from a file of an unknown size.  My example 
causes a MemoryError on my machine even though the file I'm reading 
contains 0 bytes.

The problem is Python's implementation is "alloc the requested bytes and 
truncate if needed" vs what I expected, "read chunks at a time up to the 
requested number of bytes".  There's nothing in the documentation which 
states the implementation, although "Note that this method may call the 
underlying C function fread more than once in an effort to acquire as 
close to size bytes as possible." leans slightly towards my 
interpretation.
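
A user-level work-around is to do the chunking oneself; a minimal sketch 
(hypothetical helper, not part of this report):

def read_up_to(f, size, chunksize=1024*1024):
    # Never ask for more than `chunksize` bytes at once, so a huge `size`
    # hint against a small or empty file cannot trigger a huge allocation.
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = f.read(min(chunksize, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return "".join(chunks)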

I looked a little for real-world cases that could cause a denial-of-
service attack but didn't find one.

If there is a problem, it will occur very rarely.  Go ahead and mark it 
as "will not fix" or something similar.  I don't think the change in the 
code is justifiable.




[issue2271] msi installs to the incorrect location (C drive)

2008-08-18 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

Yes, that installed Python 2.6 into the correct location (C:\Python26 
instead of into the root directory).




[issue2271] msi installs to the incorrect location (C drive)

2008-08-15 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I also have this problem.  (2.5 msi installer under Win2K with a non-
admin account granted admin privs).  Python installs just fine under 
C:\ (instead of C:\Python25) but then I run into problems installing 
the win32 extensions.

Searching the web I found this posting from 2005:
  http://mail.python.org/pipermail/python-list/2005-September/341874.html

That poster created an SF bug report which is now issue1298962.  He 
linked to http://tinyurl.com/82dt2 which states:

Windows Installer has no recognition of power
users, so these users fall into the category of 
non admin when running an install. 

That describes exactly my situation.  The solution is, apparently:

To mark certain properties as safe for configuration,
you can add them to the SecureCustomProperties list
in the property table of the MSI file. 

which Martin reported here.  Martin suggested using orca, but I have no 
idea of what that is (unix/mac dweeb that I am), and it doesn't exist 
on this machine.

I know this is pretty much a me too report.  I'm doing so to say that 
it has been an ongoing problem here at my client's site.  They are not 
software developers here, and rather than trying to track down the 
right person with full admin rights to come to each person's desktop, 
they've been installing an old pre-msi version of Python.

I would like to see this fixed before 2.6 is released.  All I can do to 
help though is to test an installer, which I will do gladly.

--
nosy: +dalke




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I tested it with Python 2.5 on a Mac, Python 2.5 on FreeBSD, and Python 
2.6b2+ (from SVN as of this morning) on a Mac.

Perhaps the memory allocator on your machine is making a promise it can't 
keep?




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

You're right.  I mistook the string implementation for the list one 
which does keep a preallocated section in case of growth.  Strings of 
course don't grow so there's no need for that.

I tracked the memory allocation all the way down to 
obmalloc.c:PyObject_Realloc .  The call goes to realloc(p, nbytes) which 
is a C lib call.  It appears that the memory space is not reallocated.

That was enough to be able to find the python-dev thread "Darwin's 
realloc(...) implementation never shrinks allocations" from Jan. 2005, 
Bob Ippolito's post "realloc.. doesn’t?" 
(http://bob.pythonmac.org/archives/2005/01/01/realloc-doesnt/ ) and 
Issue1092502 .

Mind you, I also get the problem on FreeBSD 2.6 so it isn't Darwin 
specific.




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

FreeBSD is what my hosting provider uses.  Freebsd.org calls 2.6 "legacy" 
but the latest update was earlier this year.

There is shared history with Macs.  I don't know the details though.  I 
just point out that the problem isn't only on Darwin.




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-08 Thread Andrew Dalke

New submission from Andrew Dalke [EMAIL PROTECTED]:

I wrote a buggy PNG parser which ended up doing several file.read(large 
value).  It causes a MemoryError, which was strange because the file was 
only a few KB long.

I tracked it down to the implementation of read().  When given a size 
hint it preallocates the return string with that size.  If the hint is 
for 10MB then the string returned will be preallocated for 10MB, even if 
the actual read is empty.

Here's a reproducible

BLOCKSIZE = 10*1024*1024

f=open("empty.txt", "w")
f.close()

f=open("empty.txt")
data = []
for i in range(1):
    s = f.read(BLOCKSIZE)
    assert len(s) == 0
    data.append(s)


I wasn't sure if this is properly a bug, but since the MemoryError 
exception I got was quite unexpected and required digging into the 
source code to figure out, I'll say that it is.

--
components: Interpreter Core
messages: 70924
nosy: dalke
severity: normal
status: open
title: file read preallocs 'size' bytes which can cause memory problems
type: resource usage




[issue2009] Grammar change to prevent shift/reduce problem with varargslist

2008-02-05 Thread Andrew Dalke

Andrew Dalke added the comment:

I've been working from the Grammar file from CVS for 2.6 ... I thought.  
For example, I see # except_clause: 'except' [test [('as' | ',') 
test]] which is a 2.6-ism.

svn log says it hasn't changed since 2007-05-19, when except/as was 
added.

What did I miss?




[issue2009] Grammar change to prevent shift/reduce problem with varargslist

2008-02-04 Thread Andrew Dalke

New submission from Andrew Dalke:

I wrote a translator from the CFG used in the Grammar file into a form for PLY. 
 I 
found one problem with

varargslist: ((fpdef ['=' test] ',')*
  ('*' NAME [',' '**' NAME] | '**' NAME) |
  fpdef ['=' test] (',' fpdef ['=' test])* [','])

This grammar definition is ambiguous until the presence or absence of a '*' 
is seen.  PLY complains:

state 469

    (28) varargslist -> fpdef EQUAL test COMMA .
    (32) varargslist_star -> fpdef EQUAL test COMMA .
    (35) varargslist_star3 -> COMMA . fpdef
    (36) varargslist_star3 -> COMMA . fpdef EQUAL test
    (39) fpdef -> . NAME
    (40) fpdef -> . LPAR fplist RPAR

  ! shift/reduce conflict for NAME resolved as shift.
  ! shift/reduce conflict for LPAR resolved as shift.

    RPAR        reduce using rule 28 (varargslist -> fpdef EQUAL test COMMA .)
    COLON       reduce using rule 28 (varargslist -> fpdef EQUAL test COMMA .)
    STAR        reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .)
    DOUBLESTAR  reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .)
    NAME        shift and go to state 165
    LPAR        shift and go to state 163

  ! NAME        [ reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .) ]
  ! LPAR        [ reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .) ]

    fpdef       shift and go to state 515



My fix was to use this definition when I did the translation.

varargslist: ((fpdef ['=' test] (',' fpdef ['=' test])* 
   (',' '*' NAME [',' '**' NAME] | ',' '**' NAME | [','])) |
  ('*' NAME [',' '**' NAME]) |
  ('**' NAME))


So far I've not found a functional difference between these two definitions, 
and the only change to ast.c is to update the comment based on this section.
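
For illustration (this example is not part of the report), both definitions 
are intended to accept exactly the same parameter-list shapes, e.g.:

# Parameter lists the varargslist production must cover; both grammar
# variants above are meant to accept all of these (Python 2 syntax).
def f1(a, b=1): pass
def f2(a, b=1, *args): pass
def f3(a, b=1, *args, **kw): pass
def f4(*args, **kw): pass
def f5(**kw): pass
def f6(a, (b, c)): pass    # fpdef as a parenthesized fplist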

This change would make it easier for the handful of people who write 
parsers for Python based on a yacc-like look-ahead(1) parser to use that file 
more directly.

--
components: None
messages: 62055
nosy: dalke
severity: minor
status: open
title: Grammar change to prevent shift/reduce problem with varargslist
type: rfe




[issue2011] compiler.parse("1; ") adds unexpected extra Discard(Const(None)) to parse tree

2008-02-04 Thread Andrew Dalke

New submission from Andrew Dalke:

Python 2.6a0 (trunk:60565M, Feb  4 2008, 01:21:28) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from compiler import parse
>>> parse("1;")
Module(None, Stmt([Discard(Const(1)), Discard(Const(None))]))

I did not expect the Discard(Const(None)).  Instead, I expected

Module(None, Stmt([Discard(Const(1))]))
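
For comparison (an aside, assuming Python 2.6's ast module built on the _ast 
parser), the newer parser does not add the extra node:

>>> import ast
>>> print ast.dump(ast.parse("1;"))
Module(body=[Expr(value=Num(n=1))])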

--
components: Library (Lib)
messages: 62057
nosy: dalke
severity: minor
status: open
title: compiler.parse(1;) adds unexpected extra Discard(Const(None)) to parse 
tree
type: behavior
versions: Python 2.6




[issue2011] compiler.parse("1; ") adds unexpected extra Discard(Const(None)) to parse tree

2008-02-04 Thread Andrew Dalke

Andrew Dalke added the comment:

This really is a minor point.  I don't track the 3K list and I see now that the 
compiler module won't be in Python 3k - good riddance - so feel free to discard 
this as well as the other open compiler module bugs.

I want to experiment with adding instrumentation for branch coverage.  To do 
that I want to get the character ranges of each term in the AST.  The Python 
compiler module doesn't keep track of that, so I'm developing a new parser 
based on PLY.

I've developed it and I'm now cross-checking the generated ASTs to verify they 
are identical.  In this case the compiler module generates an extra node in 
the AST, so I had to add backwards compatibility support.
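
A minimal sketch of what that compatibility shim can look like (a hypothetical 
helper, not the actual code): drop a trailing Discard(Const(None)) before 
comparing the statement lists.

from compiler.ast import Discard, Const

def strip_trailing_none_discard(nodes):
    # compiler.parse("1;") appends Discard(Const(None)) after the trailing
    # semicolon; ignore it so both trees compare equal.
    if nodes and isinstance(nodes[-1], Discard) and \
       isinstance(nodes[-1].expr, Const) and nodes[-1].expr.value is None:
        return nodes[:-1]
    return nodes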




[issue1889] string literal documentation differs from implementation

2008-01-21 Thread Andrew Dalke

New submission from Andrew Dalke:

The reference manual documentation for raw string literals says

Note also that a single backslash followed by a newline is
interpreted as those two characters as part of the string, *not* as a line
continuation.

This is not the observed behavior.

>>> s = "ABC\
... 123"
>>> s
'ABC123'
>>> 

Line continuations are ignored by triple quoted strings.
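
For comparison, here is the same backslash-newline inside a (non-raw) 
triple-quoted string; this session is an illustration, not part of the report:

>>> s = """ABC\
... 123"""
>>> s
'ABC123'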



In addition, the reference manual documentation for \x escapes says

| ``\xhh``| Character with hex value *hh*   | (4,5) |

where footnote (4) says

   Unlike in Standard C, at most two hex digits are accepted.

However, the implementation requires exactly two hex digits:

>>> "\x41"
'A'
>>> "\x4."
ValueError: invalid \x escape
>>> "\x4"
ValueError: invalid \x escape

--
components: Documentation
messages: 61484
nosy: dalke
severity: minor
status: open
title: string literal documentation differs from implementation
versions: Python 2.5




[issue1367711] Remove usage of UserDict from os.py

2008-01-13 Thread Andrew Dalke

Andrew Dalke added the comment:

Ahh, so the bug here is that the environ dict should use neither UserDict nor 
dict; it should implement the core {get,set,del}item methods and keys() and 
use DictMixin.
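
Roughly like the following hypothetical sketch (not the actual patch; 
os.putenv and os.unsetenv stand in for whatever platform hooks the real code 
would need):

import os
from UserDict import DictMixin

class Environ(DictMixin):
    # Keep a plain dict of values and push every change to the process
    # environment; DictMixin then supplies get(), setdefault(), popitem(),
    # iteration, and the rest of the mapping interface.
    def __init__(self, data):
        self.data = dict(data)
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        os.putenv(key, value)
        self.data[key] = value
    def __delitem__(self, key):
        os.unsetenv(key)
        del self.data[key]
    def keys(self):
        return self.data.keys()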

Martin mentioned that the patch doesn't support setdefault.  He didn't note 
though that the current code also doesn't support the dictionary interface 
consistently.  This shows a problem with popitem.

>>> import os
>>> os.environ["USER"]
'dalke'
>>> os.environ["USER"] = "nobody"
>>> os.system("echo $USER")
nobody
0
>>> del os.environ["USER"]
>>> os.system("echo $USER")

0
>>> os.environ["USER"] = "dalke"
>>> while os.environ: print os.environ.popitem()
... 
('GROUP', 'staff')
('XDG_DATA_HOME', '/Users/dalke/.local/share')
('TERM_PROGRAM_VERSION', '133')
('CVS_RSH', 'ssh')
('LOGNAME', 'dalke')
('USER', 'dalke')
... removed for conciseness ...
('QTDIR', '/usr/local/qt')
>>> os.system("echo $USER")
dalke
0
>>> 

Not enough people know about DictMixin.




[issue1367711] Remove usage of UserDict from os.py

2008-01-12 Thread Andrew Dalke

Andrew Dalke added the comment:

I was optimization tuning and wondered why UserDict was imported by os.  
Replacing UserDict with dict passes all existing regression tests.

I see the concerns that doing that replacement is not future proof.  Strange 
then that Cookie.py is acceptable.  There are three places in Lib which derive 
from dict, and two are in Cookie.py; in both cases it's broken because 
setdefault does not go through the same checks that __setitem__ goes through.

(The other place is an internal class in _strptime.)

In looking over existing third-party code, I see this nuance of when to use 
UserDict vs. dict isn't that well known.  The documentation says "The need for 
this class has been largely supplanted by the ability to subclass directly 
from dict", but that isn't true if anyone is worried about future-proofing and 
where the subclass changes one of the standard methods.

--
nosy: +dalke




[issue1367711] Remove usage of UserDict from os.py

2008-01-12 Thread Andrew Dalke

Andrew Dalke added the comment:

I should have added my preference.  I would like to see UserDict replaced with 
dict.  I didn't like seeing the extra import when I was doing my performance 
testing, though truthfully it's not a big overhead.

As for future-proofing, of course when there's a change in a base class then 
there can be problems with derived classes.  When that happens, change all of 
the affected classes in the code base, and make sure to publish the change so 
third parties know about it.

Yes, there's a subtlety here that most people don't know about.  But it's not 
going to go away.

As for the evil that is 'exec':

  exec "locals().data['MACHTYPE']=1; print MACHTYPE" in {}, os.environ

gives me another way to mess things up.

A point of unit tests is to allow changes like this without worry about code 
breakage.  And it's not like other non-buggy code wasn't updated over time to 
reflect changing style and best practices.

If it's not compatible with Jython or IronPython or PyPy then ignore what I 
said, but fix Cookie and update the docs to make that clear, as people do think 
that it's better to derive from dict for things like this than to derive from 
UserDict or DictMixin.

I can give a lightning talk about this at PyCon.  :)
